<a href="https://colab.research.google.com/github/RyanZR/ColabDock-Vina/blob/main/%F0%9F%8D%8APLIA_V2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🍊 **PLIA_V2**
_**P**rotein-**L**igand **I**nteraction **A**nalysis Version **2**_ is a Jupyter Notebook written to perform protein-ligand binding interaction analysis using **Protein-Ligand Interaction Profiler**.


Proceed to [MOUNTAIN_V2.ipynb](https://colab.research.google.com/github/RyanZR/ColabDock-Vina/blob/main/%F0%9F%8D%8AMOUNTAIN_V2.ipynb) to perform single molecular docking.

Proceed to [UNION_V2.ipynb](https://colab.research.google.com/github/RyanZR/ColabDock-Vina/blob/main/%F0%9F%8D%8AUNION_V2.ipynb) to perform virtual screening.

---
---
# **Setting Up the Environment for Analysis**

Before starting, we need to install all the necessary software and dependencies to perform the interaction analysis.

+ condacolab (https://github.com/con)
+ py3Dmol (https://pypi.org/project/py3Dmol/)
+ PLIP (https://plip-tool.biotec.tu-dresden.de/plip-web/plip/index)

In [None]:
# @title **Install dependencies**
# @markdown This will take a few minutes; grab a coffee while you wait. ;-)

# install dependencies

%%capture
!pip install -q rdkit-pypi
!pip install -q py3Dmol

!pip install -q condacolab
import condacolab
condacolab.install()

!conda install -c defaults conda python=3.6 --yes
!conda update -c defaults --all --yes
import sys
sys.path.append('/usr/local/lib/python3.6/site-packages/')
!conda install -c conda-forge plip --yes

!rm -r /content/sample_data
!rm /content/condacolab_install.log

In [None]:
# @title **Import Python modules**
# @markdown This makes the necessary modules accessible to Python.

import sys
import os
sys.path.append('/usr/local/lib/python3.6/site-packages/')

import ast
import time
import shutil
import py3Dmol
import pandas as pd
import numpy as np

from functools import reduce
from google.colab import drive, files
import matplotlib.pyplot as plt
import seaborn as sns

from rdkit import Chem
from rdkit.Chem import rdFMCS, AllChem, Draw, PandasTools
from rdkit.Chem.Draw import DrawingOptions, IPythonConsole

from plip.structure.preparation import PDBComplex
from plip.exchange.report import BindingSiteReport

%matplotlib inline

class Hide:
  def __enter__(self):
    self._original_stdout = sys.stdout
    sys.stdout = open(os.devnull, "w")
  
  def __exit__(self, exc_type, exc_val, exc_tb):
    sys.stdout.close()
    sys.stdout = self._original_stdout

bond_dict = {
    "hydrophobic": ["0x59e382", "GREEN"], 
    "hbond": ["0x59bee3", "LIGHT BLUE"],
    "waterbridge": ["0x4c4cff", "BLUE"], 
    "saltbridge": ["0xefd033", "YELLOW"], 
    "pistacking": ["0xb559e3", "PURPLE"], 
    "pication": ["0xe359d8", "VIOLET"], 
    "halogen": ["0x59bee3", "LIGHT BLUE"], 
    "metal": ["0xe35959", "ORANGE"] }

response = {"Yes": True, "No": False}
all_interaction = list(bond_dict.keys())
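
The `Hide` helper above silences `print` output while a block runs. As a hedged alternative sketch, the standard library's `contextlib.redirect_stdout` can give the same effect without a custom class. Here `noisy` and `run_quietly` are hypothetical names used only for illustration; `noisy` stands in for a chatty call such as `drive.mount`.

```python
import contextlib
import io
import os


def noisy():
    # Hypothetical stand-in for a call that prints progress messages
    print("mounting ...")
    return "done"


def run_quietly(func):
    # Send anything func prints to the null device, then restore stdout
    with open(os.devnull, "w") as sink, contextlib.redirect_stdout(sink):
        return func()


# Capture our own stdout to confirm nothing leaks out of run_quietly
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = run_quietly(noisy)
leaked = buffer.getvalue()
```

The return value of the wrapped call is still available, while its printed output is discarded.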

In [None]:
# @title **Import Google Drive**
# @markdown This allows data to be stored in Google Drive.

# Flush and mount GDrive
with Hide():
  drive.flush_and_unmount()
  drive.mount("/content/drive", force_remount=True)

print(f"> Mounted at /content/drive")

In [None]:
# @title **Select and create folders**
# @markdown Select a **work directory** name without spaces. An analysis folder will be created to store the data needed for interaction analysis.

# Define path of folder
GDrive_dir = "/content/drive/MyDrive/Docking/virtual_screening" #@param {type: "string"}
work_dir = os.path.abspath(".")  # named work_dir to avoid shadowing the built-in dir()
analysis_folder = os.path.join(work_dir, "analysis")
protein_folder = os.path.join(GDrive_dir, "protein")
ligand_folder = os.path.join(GDrive_dir, "ligand")
experimental_folder = os.path.join(GDrive_dir, "experimental")
docking_folder = os.path.join(GDrive_dir, "docking")

# Create the analysis folder if it does not exist yet
if os.path.exists(analysis_folder):
  print(f"> {analysis_folder} already exists")
else:
  os.mkdir(analysis_folder)
  print(f"> {analysis_folder} was successfully created")

---
---
# **Analyzing Protein-Ligand Interaction** (For Virtual Screening)
This section loads the data obtained from [UNION_V2.ipynb](https://colab.research.google.com/github/RyanZR/ColabDock-Vina/blob/main/%F0%9F%8D%8AUNION_V2.ipynb) and profiles the ligand interactions for analysis.


In [None]:
# @title **Setup analysis**
# @markdown This sets up the functions that determine protein-ligand interactions using **PLIP**.

def retrieve_interaction(inputPL):
  protlig = PDBComplex()
  protlig.load_pdb(inputPL)
  sites = {}
  for ligand in protlig.ligands:
    protlig.characterize_complex(ligand)
  for key, site in sorted(protlig.interaction_sets.items()):
    binding_site = BindingSiteReport(site)
    interactions = { k: [ getattr(binding_site, k + "_features") ] + getattr(binding_site, k + "_info") for k in tuple(bond_dict.keys()) }
    sites[key] = interactions
  # Return only after every binding site has been collected (not inside the loop)
  return sites

def export_inter_csv(interaction_site, site_selected, inputPL):
  export_dir = os.path.dirname(inputPL)
  PL_BN = os.path.basename(inputPL)[:-12]
  inter_csv_afile = os.path.join(export_dir, PL_BN + "_inter.csv")
  bonding = []
  # Start from an empty frame so the standard columns always come first
  frames = [pd.DataFrame(columns = ["RESNR", "RESTYPE", "RESCHAIN", "RESNR_LIG", "RESTYPE_LIG", "RESCHAIN_LIG", "DIST", "LIGCARBONIDX", "PROTCARBONIDX", "LIGCOO", "PROTCOO"])]
  for bond in bond_dict:
    # First element holds the column names, the rest are the data rows
    df = pd.DataFrame.from_records(interaction_site[site_selected][bond][1:], columns = interaction_site[site_selected][bond][0])
    if not df.empty:
      bonding.extend([bond.upper()] * len(df))
      frames.append(df)
  # DataFrame.append was removed in pandas 2.0, so concatenate once instead
  df_tp = pd.concat(frames, sort=False)
  df2 = df_tp.assign(BOND = bonding)
  df2.reset_index(drop=True, inplace=True)
  df2.to_csv(inter_csv_afile, index=False)

def view_interaction(inputCSV, mode="summary"):
  DIST_CALC = []
  COLOR = []
  MIDCOO = []
  interaction = pd.read_csv(inputCSV, converters = {"LIGCOO": lambda x: ast.literal_eval(str(x)), "PROTCOO": lambda x: ast.literal_eval(str(x)), "BOND": lambda x: x.lower()})
  for LC, PC, BT in zip(interaction["LIGCOO"], interaction["PROTCOO"], interaction["BOND"]):
    # Find color of bond
    COLOR.append(bond_dict[BT][1])
    # Find distance between 2 points
    p1 = np.array([LC[0], LC[1], LC[2]])
    p2 = np.array([PC[0], PC[1], PC[2]])
    squared_dist = np.sum((p1-p2)**2, axis=0)
    dist = np.round(np.sqrt(squared_dist) ,2)
    DIST_CALC.append(dist)
    # Find midpoint between 2 points
    mid_x = np.round((LC[0] + PC[0]) / 2, 2)
    mid_y = np.round((LC[1] + PC[1]) / 2, 2)
    mid_z = np.round((LC[2] + PC[2]) / 2, 2)
    p_mid = (mid_x, mid_y, mid_z) 
    MIDCOO.append(p_mid)
  interaction["BOND"] = interaction["BOND"].str.upper()
  interaction["COLOR"] = COLOR
  interaction["MIDCOO"] = MIDCOO
  interaction["DIST_CALC"] = DIST_CALC
  interaction["RESNAME"] = interaction["RESTYPE"] + interaction["RESNR"].astype(str)
  if mode == "summary":
    df = interaction[["RESNR", "RESTYPE", "DIST_CALC", "BOND", "COLOR"]]
  elif mode == "py3Dmol":
    df = interaction[["RESNR", "DIST_CALC", "LIGCOO", "PROTCOO", "MIDCOO", "BOND"]]
  elif mode == "overall":
    df = interaction[["RESNAME", "RESTYPE_LIG", "DIST_CALC", "BOND", "COLOR"]]
  else:
    # Fail early instead of hitting an undefined df below
    raise ValueError(f"Unknown mode: {mode}")
  return df
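
For each interaction, `view_interaction` computes the Euclidean distance between the ligand and protein coordinates and the midpoint between them (used later to place distance labels). A minimal standard-library sketch of that calculation follows; `distance_and_midpoint` is a hypothetical helper, the coordinates are made up, and the text form of the coordinates is assumed to be a Python tuple literal, as implied by the `ast.literal_eval` converters.

```python
import ast
import math


def distance_and_midpoint(lig_coo, prot_coo):
    # Euclidean distance between the two points, rounded to 2 decimals
    dist = round(math.dist(lig_coo, prot_coo), 2)
    # Midpoint of the segment joining them, rounded per coordinate
    mid = tuple(round((a + b) / 2, 2) for a, b in zip(lig_coo, prot_coo))
    return dist, mid


# Assumed CSV text form of the coordinates (made-up values)
lig = ast.literal_eval("(0.0, 0.0, 0.0)")
prot = ast.literal_eval("(3.0, 4.0, 0.0)")
dist, mid = distance_and_midpoint(lig, prot)
```

`math.dist` requires Python 3.8 or later; the notebook's NumPy version gives the same numbers.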

In [None]:
# @title **Generate ProtLig.pdb file**
#@markdown This merges each ligand pose PDB with the protein PDB and exports the result as one **`ProtLig.pdb`** file.

Protein_pdb = "7KNX_prot_A.pdb" #@param {type : "string"}

protein_pdb_dfile = os.path.join(docking_folder, Protein_pdb)
protein_pdb_afile = os.path.join(analysis_folder, Protein_pdb)
shutil.copy(protein_pdb_dfile, protein_pdb_afile)

ligand_dfolder = sorted([
    os.path.join(docking_folder, f)
    for f in os.listdir(docking_folder) 
    if os.path.isdir(os.path.join(docking_folder, f))
])

ligand_afolder = sorted([
     os.path.join(analysis_folder, f)
     for f in os.listdir(docking_folder)
     if os.path.isdir(os.path.join(docking_folder, f))
])

# Keep the same order as ligand_dfolder so the lists stay aligned when zipped
ligand_dfolder_dir = [
    sorted(os.listdir(f))
    for f in ligand_dfolder
]

pose_pdb_listname = [
    list(filter(lambda x: x.endswith(".pdb") and x[len(x)-6] == "_", n)) 
    for n in ligand_dfolder_dir
]

ready = False
check = 0
for dfolder, afolder in zip(ligand_dfolder, ligand_afolder):
  if os.path.basename(afolder) == os.path.basename(dfolder):
    check += 1
  else:
    check += 0

if check == len(ligand_dfolder):
  ready = True
  for afolder in ligand_afolder:
    if os.path.exists(afolder):
      continue
    if not os.path.exists(afolder):
      os.mkdir(afolder)
  print(f"> Operation ready")
else:
  print(f"> Please check your files")

# sum() also handles the case where no ligand folder was found
total_lig_pose = sum(len(f) for f in pose_pdb_listname)

if ready:
  print(f"> Operation begins")
  start = time.time()
  # Count amount of ligand pose detected
  print(f"> {total_lig_pose} of ligand poses detected in your docking results")
  print(f"> Generating ProtLig.pdb ...")
  # Merge protein data and lig pose data into ProtLig.pdb
  count = 0
  for name, dfolder, afolder in zip(pose_pdb_listname, ligand_dfolder, ligand_afolder):
    for pose_pdb in name:
      pose_pdb_dfile = os.path.join(dfolder, pose_pdb)
      pose_pdb_afile = os.path.join(afolder, pose_pdb)
      pose_PL_pdb = pose_pdb[:-4] + "_ProtLig.pdb"
      pose_PL_pdb_file = os.path.join(afolder, pose_PL_pdb)
      shutil.copy(pose_pdb_dfile, pose_pdb_afile)
      # Context managers close all three files, even if an error occurs
      with open(protein_pdb_afile, "r") as prot, open(pose_pdb_afile, "r") as ligs, open(pose_PL_pdb_file, "w") as g:
        # Write protein data
        for line in prot:
          row = line.split()
          # "row and" guards against blank lines
          if row and row[0] == "ATOM":
            g.write(line)
        # Write lig_pose data
        for line in ligs:
          row = line.split()
          if row and row[0] in ("ATOM", "CONECT", "END"):
            g.write(line)
      count += 1
  # Compute the timing once, after all folders are processed
  end = time.time()
  elapsed = np.round(end - start, 2)
  avg = np.round(elapsed / count, 2) if count else 0.0
  print(f"> {count} of ProtLig.pdb successfully created")
  print(f"> Time elapsed: {elapsed} s")
  print(f"> Avg time per file: {avg} s")
  print(f"> Operation ends")
else:
  print(f"> Please check your files")
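
The merge step keeps only `ATOM` records from the protein file and `ATOM`, `CONECT` and `END` records from the ligand pose. The filtering logic can be sketched on in-memory lines as below; `merge_pdb_lines` is a hypothetical helper and the PDB records are made up for illustration.

```python
def merge_pdb_lines(protein_lines, ligand_lines):
    """Return the lines of a combined protein-ligand PDB file."""
    merged = []
    for line in protein_lines:
        fields = line.split()
        # Keep only protein ATOM records; blank lines are skipped safely
        if fields and fields[0] == "ATOM":
            merged.append(line)
    for line in ligand_lines:
        fields = line.split()
        # Keep ligand atoms, connectivity and the END terminator
        if fields and fields[0] in ("ATOM", "CONECT", "END"):
            merged.append(line)
    return merged


# Made-up records, for illustration only
protein = [
    "HEADER    TOY\n",
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504\n",
    "\n",
    "END\n",
]
ligand = [
    "REMARK  pose 1\n",
    "ATOM      1  C1  LIG A   1       1.000   2.000   3.000\n",
    "CONECT    1\n",
    "END\n",
]
merged = merge_pdb_lines(protein, ligand)
```

Dropping the protein's own `END` record matters here: the combined file ends only once, after the ligand records.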

In [None]:
# @title **Perform interaction analysis**
# @markdown This uses the functions created above to determine protein-ligand interactions, which are then exported as **`inter.csv`**.

# Keep the same order as ligand_afolder so the lists stay aligned when zipped
ligand_afolder_dir = [
    sorted(os.listdir(f))
    for f in ligand_afolder
]

pose_PL_pdb_listname = [
    list(filter(lambda x: x.endswith("_ProtLig.pdb"), n))
    for n in ligand_afolder_dir
]

ready = False
# Every ligand pose should have exactly one ProtLig.pdb
lenA = sum(len(f) for f in pose_PL_pdb_listname)
lenB = sum(len(f) for f in pose_pdb_listname)
if lenA == lenB:
  ready = True
  print(f"> Operation ready")
else:
  print(f"> Please check your files")

if ready:
  print(f"> Operation begins")
  start = time.time()
  print(f"> {lenA} of ProtLig.pdb detected in analysis folder")
  print(f"> Generating inter.csv ...")
  count = 0
  for afolder, pose_files in zip(ligand_afolder, pose_PL_pdb_listname):
    for BN in pose_files:
      pose_PL_pdb_afile = os.path.join(afolder, BN)
      # Run PLIP once per complex and reuse the result
      sites = retrieve_interaction(pose_PL_pdb_afile)
      # Export the first binding site (sorted by key)
      export_inter_csv(sites, list(sites.keys())[0], pose_PL_pdb_afile)
      count += 1
  end = time.time()
  elapsed = np.round(end - start, 2)
  avg = np.round(elapsed / count, 2)
  print(f"> {count} of inter.csv successfully created")
  print(f"> Time elapsed: {elapsed} s")
  print(f"> Avg time per file: {avg} s")
  print(f"> Operation ends")
else:
  print(f"> Please check your files")
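
In the structure returned by `retrieve_interaction`, each interaction type maps to a list whose first element holds the column names and whose remaining elements are data rows. A small sketch of flattening that layout into labelled rows is given below; `flatten_site` is a hypothetical helper, and the column names and values are made up rather than taken from real PLIP output.

```python
def flatten_site(site):
    """Turn {bond: [header, row, row, ...]} into a list of labelled dicts."""
    rows = []
    for bond, records in site.items():
        header, data = records[0], records[1:]
        for record in data:
            row = dict(zip(header, record))
            # Tag each row with its interaction type, as export_inter_csv does
            row["BOND"] = bond.upper()
            rows.append(row)
    return rows


# Made-up site with one hydrogen bond and no salt bridges
site = {
    "hbond": [("RESNR", "RESTYPE", "DIST"), (45, "SER", 2.9)],
    "saltbridge": [("RESNR", "RESTYPE", "DIST")],
}
rows = flatten_site(site)
```

Interaction types with a header but no data rows simply contribute nothing to the result.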

In [None]:
# @title **Show overall interaction profile**
# @markdown This generates an overall interaction profile from the best pose of every ligand.

new_ligand_afolder_dir = [
    sorted(os.listdir(f))
    for f in ligand_afolder
]

# Match pose 1 only; the leading underscore excludes poses such as 11 or 21
inter_csv_listname = [
    list(filter(lambda x: x.endswith("_1_inter.csv"), n))
    for n in new_ligand_afolder_dir
]

overall_csv_afile = os.path.join(analysis_folder, "overall_interaction.csv")
overall_columns = ["RESNAME", "RESTYPE_LIG", "DIST_CALC", "BOND", "COLOR"]
profiles = []
for f in inter_csv_listname:
  for csvFile in f:
    # "<ligand>_1_inter.csv" lives in the "<ligand>" sub-folder
    read = os.path.join(analysis_folder, csvFile[:-12], csvFile)
    profiles.append(view_interaction(read, mode="overall"))
# DataFrame.append was removed in pandas 2.0, so concatenate once instead
overall_interaction = pd.concat(profiles, ignore_index=True) if profiles else pd.DataFrame(columns=overall_columns)
overall_interaction.to_csv(overall_csv_afile, index=False)

uBOND = overall_interaction["BOND"].value_counts()
uRESNAME = overall_interaction["RESNAME"].value_counts()

fig, axes = plt.subplots(1,1, figsize=(10,5))
ax1 = sns.barplot(x=uBOND.index, y=uBOND.values)
ax1.set_title("Occurrence of Bonds")
fig.savefig(os.path.join(analysis_folder, "bond_occurrence.png")) 

fig, axes = plt.subplots(1,1, figsize=(10,5))
ax2 = sns.barplot(x=uRESNAME.index, y=uRESNAME.values)
ax2.set_title("Occurrence of Residues")
fig.savefig(os.path.join(analysis_folder, "residue_occurrence.png")) 
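
The two bar plots are built from occurrence counts of bond types and residues across all best poses. As a rough standard-library illustration of that counting step, the sketch below uses `collections.Counter` on made-up `(RESNAME, BOND)` pairs in place of the real `overall_interaction` table.

```python
from collections import Counter

# Made-up (RESNAME, BOND) pairs standing in for rows of the overall table
records = [
    ("SER45", "HBOND"),
    ("PHE102", "HYDROPHOBIC"),
    ("SER45", "HBOND"),
    ("LYS88", "SALTBRIDGE"),
]

bond_counts = Counter(bond for _, bond in records)
residue_counts = Counter(res for res, _ in records)

# most_common() orders by descending count, much like value_counts() in pandas
top_bond = bond_counts.most_common(1)[0]
top_residue = residue_counts.most_common(1)[0]
```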

In [None]:
# @title **Show interaction profile of ligand** {run: "auto"}
# @markdown This shows a summary of the protein-ligand interactions for the selected ligand pose.

Ligand = "A46_1" #@param {type : "string"}
# Split on the last underscore so multi-digit pose numbers (e.g. "A46_12") also work
ligand_inter_csv_filename = os.path.join(analysis_folder, Ligand.rsplit("_", 1)[0], Ligand + "_inter.csv")
view_interaction(ligand_inter_csv_filename, mode="summary")
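
The cell above locates the CSV by deriving the ligand folder from the pose name. Assuming poses are named `<ligand>_<pose number>` (inferred from the sample value `A46_1`), the split can be sketched as follows; `split_pose_name` is a hypothetical helper, not part of the notebook.

```python
def split_pose_name(pose):
    """Split 'LIGAND_POSE' on the last underscore into (ligand, pose_number)."""
    ligand, _, number = pose.rpartition("_")
    if not ligand or not number.isdigit():
        raise ValueError(f"Expected a name like 'A46_1', got {pose!r}")
    return ligand, int(number)


single = split_pose_name("A46_1")
double = split_pose_name("A46_12")
nested = split_pose_name("my_lig_3")
```

Splitting on the last underscore keeps ligand names that themselves contain underscores intact.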

In [None]:
# @title **Setup 3D structure viewer**
# @markdown This creates a 3D viewer for the results of the protein-ligand interaction analysis.

def view_interactions(inputP, 
                      inputL, 
                      interCSV,
                      showInter = all_interaction,
                      showSurface = False,
                      showResLabel = True,
                      showDist = False,
                      showOL = False,
                      viewMode = "Interactive", 
                      capture = False):
  
  count = 0
  mview = py3Dmol.view(1000, 1500)
  if showOL:
    mview.setViewStyle({"style": "outline", "color":  "black", "width": 0.025})

  def interactive(): 
    return mview.show()
  def spin(): 
    capture = False
    return mview.spin()
 
  VM = {"Interactive": interactive, "Animate": spin}

  # Protein model
  count += 1
  prot_model = count
  mol1 = open(inputP, "r").read()
  mview.addModel(mol1, "pdb")  
  mview.setStyle(
      {"model": prot_model})
  if showSurface:
    mview.addSurface(
        py3Dmol.VDW,
        {"color": "white", "opacity": 0.4},
        {"model": prot_model})

  # Ligand model
  count += 1
  lig_model = count
  mol2 = open(inputL, "r").read()
  mview.addModel(mol2, "pdb")
  mview.setStyle(
      {"model": lig_model},
      {"stick": {"colorscheme": "yellowCarbon"}})

  # Interactions
  labelled = []
  interaction = view_interaction(interCSV, mode="py3Dmol")
  for RN, DC, LC, PC, MC, BT in zip(interaction["RESNR"], interaction["DIST_CALC"], interaction["LIGCOO"], interaction["PROTCOO"], interaction["MIDCOO"], interaction["BOND"]):
    BT = BT.lower()

    # Show RES
    mview.addStyle(
        {"and": [{"model": prot_model}, {"resi": RN}]},
        {"stick": {"colorscheme": "whiteCarbon"}})
    
    # Label RES
    if showResLabel:
      if RN not in labelled:
        labelled.append(RN)
        mview.addResLabels(
            {"and": [{"model": prot_model}, {"resi": RN}]},
            {"alignment": "bottomLeft",
            "showBackground": False,
            "fixed": False,
            "fontSize": 14,
            "fontColor": "0x000000",
            "screenOffset": {"x": 10, "y": 10}})
        
    if BT in showInter:
    # Show interaction
      mview.addCylinder(
          {"start": {"x": LC[0], "y": LC[1], "z": LC[2]}, 
          "end": {"x": PC[0], "y": PC[1], "z": PC[2]}, 
          "radius": 0.05, 
          "fromCap": 2, 
          "toCap": 2, 
          "color": bond_dict[BT][0], 
          "dashed": True})
    # Label distance
      if showDist:
        mview.addLabel(
            str(DC) + " Å",
            {"position": {"x": MC[0], "y": MC[1], "z": MC[2]},
            "alignment": "bottomLeft",
            "inFront": False,
            "fixed": False,
            "backgroundColor": bond_dict[BT][0],
            "fontSize": 12,
            "screenOffset": {"x": 5, "y": 5}})
    else:
      pass
    
  mview.center({"model": lig_model})
  mview.rotate(180,"x")
  mview.enableFog(True)

  if capture:
    mview.show()
    time.sleep(5)
    input("> Press ENTER to screenshot\n")
    mview.png()
  else:
    return VM[viewMode]()

In [None]:
# @title **Display interaction** {run: "auto"}
# @markdown Enter the protein and ligand to view their docking interactions. This displays the protein residues, the docked ligand and their interactions in 3D space.

interactions = {
    "ALL": all_interaction
}

# @markdown ---
Protein = "7KNX_prot_A" #@param {type : "string"}
Ligand = "A46_1" #@param {type : "string"}
Show_Interaction = "all" #@param {type:"string"}
show = all_interaction if Show_Interaction == "all" else Show_Interaction

# @markdown ---
Show_Protein_Surface = True #@param {type:"boolean"}
Show_Distance = True #@param {type:"boolean"}
Show_Residue_Label = True #@param {type:"boolean"}
Show_Outline = False #@param {type:"boolean"}

# @markdown ---
Take_Screenshot = "No" #@param ["Yes", "No"]
View_Mode = "Interactive" #@param ["Interactive", "Animate"] 
 

p_pdb_afile = os.path.join(analysis_folder, Protein + ".pdb")
# Split on the last underscore so multi-digit pose numbers also work
l_pdb_afile = os.path.join(analysis_folder, Ligand.rsplit("_", 1)[0], Ligand + ".pdb")
interCSV_afile = l_pdb_afile[:-4] + "_inter.csv"

view_interactions(p_pdb_afile, 
                  l_pdb_afile, 
                  interCSV_afile,
                  showInter = show,
                  showSurface = Show_Protein_Surface,
                  showResLabel = Show_Residue_Label,
                  showDist = Show_Distance,
                  showOL = Show_Outline,
                  viewMode = View_Mode, 
                  capture = response[Take_Screenshot])

---
---
# **Analyzing Protein-Ligand Interaction** (For Single Docking)
*Coming soon ...*

---
---
# **Save to Google Drive**
Save your analysis data to Google Drive.

In [None]:
# @title **Store result in Google Drive**
# @markdown An analysis folder will be created in Google Drive, and all the files generated above will be saved into it.

# Define variables
destination_folder = os.path.join(GDrive_dir, "analysis")

# Copy files to GDrive; dirs_exist_ok (Python 3.8+) allows the cell to be re-run
shutil.copytree(analysis_folder, destination_folder, dirs_exist_ok=True)

print(f"> Data saved at {destination_folder}")
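
By default `shutil.copytree` raises an error when the destination folder already exists, which matters if this cell is run more than once. The sketch below illustrates the behaviour in a temporary directory, assuming Python 3.8 or later for the `dirs_exist_ok` argument; all paths and file names are made up.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "analysis")
    dst = os.path.join(root, "drive", "analysis")
    os.makedirs(src)
    with open(os.path.join(src, "a_inter.csv"), "w") as fh:
        fh.write("BOND\n")

    # First copy creates the destination (and its missing parent folder)
    shutil.copytree(src, dst)

    # A new result appears later in the local analysis folder
    with open(os.path.join(src, "b_inter.csv"), "w") as fh:
        fh.write("BOND\n")

    # A plain second copy fails because the destination exists
    try:
        shutil.copytree(src, dst)
        second_copy_failed = False
    except FileExistsError:
        second_copy_failed = True

    # dirs_exist_ok=True merges into the existing destination instead
    shutil.copytree(src, dst, dirs_exist_ok=True)
    copied = sorted(os.listdir(dst))
```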