<a href="https://colab.research.google.com/github/HBioquant/DiffBindFR/blob/main/notebooks/DiffBindFR_demo_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **🔥🔥DiffBindFR**
<div class="alert alert-info"> Diffusion model based flexible protein-ligand docking
</div>

For more details, see our [paper](https://pubs.rsc.org/en/content/articlelanding/2024/sc/d3sc06803j).

In this demo, we use DiffBindFR to redock the ligand from PDB entry 2ZEC into a predefined pocket of the **AlphaFold2-modelled structure** (UniProt ID: Q15661).

In [1]:
#@title **Install Conda Colab**
#@markdown This restarts the kernel (session); that is expected.
!pip install -q condacolab
import condacolab
condacolab.install()

from google.colab import files
from google.colab import output
output.enable_custom_widget_manager()

⏬ Downloading https://github.com/conda-forge/miniforge/releases/download/23.11.0-0/Mambaforge-23.11.0-0-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:13
🔁 Restarting kernel...


In [2]:
#@title **Install dependencies**
#@markdown This takes a few minutes; grab a coffee while you wait. ;-)
# install dependencies
%%capture

import sys
import os
import subprocess
import tarfile
from pathlib import Path
home = Path(os.path.abspath("DiffBindFR/"))
sys.path.insert(0, str(home))

commands = [
    "git clone https://github.com/HBioquant/DiffBindFR.git",
    "mamba install -c conda-forge openmm=7.7.0 -y",
    "mamba install -c conda-forge pdbfixer -y",
    "mamba install -c conda-forge openmmforcefields -y",
    "mamba install -c conda-forge ambertools -y",
    "mamba install -c conda-forge pymol-open-source -y",
    "mamba install -c conda-forge openbabel -y",
    "mamba install -c conda-forge mpi4py -y",
    "mamba install -c conda-forge 'cudatoolkit==11.7.*' -y",
    "pip install torch==1.13.1 --quiet",
    "pip install ml_collections",
    "tail -n +12 ./DiffBindFR/requirements/runtime.txt > ./DiffBindFR/requirements/pkgs.txt",
    "pip install -r ./DiffBindFR/requirements/pkgs.txt --quiet",
    "chmod +x ./DiffBindFR/druglib/ops/smina/smina.static",
    "chmod +x ./DiffBindFR/druglib/ops/dssp/mkdssp",
    "chmod +x ./DiffBindFR/druglib/ops/msms/msms",
]

for cmd in commands:
  subprocess.run(cmd, shell=True)


import torch

try:
    import torch_geometric
except ModuleNotFoundError:
    !pip uninstall torch-scatter torch-sparse torch-geometric torch-cluster -y
    !pip install --no-index torch-scatter -f https://data.pyg.org/whl/torch-{torch.__version__}.html
    !pip install --no-index torch-sparse -f https://data.pyg.org/whl/torch-{torch.__version__}.html
    !pip install --no-index torch-cluster -f https://data.pyg.org/whl/torch-{torch.__version__}.html
    !pip install git+https://github.com/pyg-team/pytorch_geometric.git --quiet

commands = [
    "pip install -e ./DiffBindFR",
    "wget https://zenodo.org/records/10843568/files/weights.tar.gz",
    "bash ./DiffBindFR/INSTALL_OPENFF.sh",
]

for cmd in commands:
  subprocess.run(cmd, shell=True)

file = tarfile.open('weights.tar.gz')
file.extractall('/content/DiffBindFR/DiffBindFR/')
file.close()
os.remove('weights.tar.gz')

In [3]:
#@title **Run demo**
#@markdown Have fun running flexible docking on the AF2 structure. ;-)
import os, sys, glob, shutil
from pathlib import Path
home = Path(os.path.abspath("DiffBindFR/"))
sys.path.insert(0, str(home))
import pandas as pd
from rdkit import Chem
import MDAnalysis as mda
import nglview as nv
from nglview.color import ColormakerRegistry
import torch
import druglib
from DiffBindFR import common
from DiffBindFR.evaluation import get_traj_id, export_xtc
from DiffBindFR.app.predict import runner
from DiffBindFR.utils import (
    pair_spatial_metrics,
    PDBPocketResidues,
    to_complex_block,
    read_molblock,
    update_mol_pose,
)



No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'


In [4]:
example_path = home / 'examples' / 'AF2'
holo = example_path / '2zec.pdb'
crystal_ligand = example_path / 'ligand.sdf'
af2 = example_path / 'Q15661_AF2.pdb'

#### PoseView of the holo structure

Here, pocket residues within 5 Å of the crystal ligand are visualized (colored <font color='red'>red</font>).

<font color='red'>Unfortunately, nglview does not render properly on Google Colab. For interactive visualization, we suggest running this notebook in Jupyter Notebook instead ☹.</font>

See the [discussion in the Colab issue tracker](https://github.com/googlecolab/colabtools/issues/2853#issuecomment-1171699299) for details.
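Selecting "pocket residues within 5 Å of the crystal ligand" amounts to a nearest-distance test between protein atoms and ligand atoms. The sketch below illustrates that idea with NumPy on plain coordinate arrays. It is not the `PDBPocketResidues` implementation: the helper name `pocket_residues`, the per-atom array layout, and the inclusive `<=` cutoff are assumptions for illustration.

```python
import numpy as np

def pocket_residues(res_ids, res_coords, lig_coords, cutoff=5.0):
    """Residue IDs with at least one atom within `cutoff` (Å) of any ligand atom.

    Hypothetical helper; input layout is an assumption:
    res_ids:    length-N sequence, the residue ID of each protein atom
    res_coords: (N, 3) protein atom coordinates
    lig_coords: (M, 3) ligand atom coordinates
    """
    res_coords = np.asarray(res_coords, dtype=float)
    lig_coords = np.asarray(lig_coords, dtype=float)
    # pairwise protein-ligand atom distances, shape (N, M)
    dists = np.linalg.norm(res_coords[:, None, :] - lig_coords[None, :, :], axis=-1)
    # a protein atom counts if its nearest ligand atom is within the cutoff
    close = dists.min(axis=1) <= cutoff
    # collect residue IDs in first-seen order, without duplicates
    selected = []
    for rid, is_close in zip(res_ids, close):
        if is_close and rid not in selected:
            selected.append(rid)
    return selected

# toy example: a single ligand atom at the origin
ids = ['A:218:ASP', 'A:218:ASP', 'A:300:LEU']
coords = [[3.0, 0.0, 0.0], [8.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
print(pocket_residues(ids, coords, [[0.0, 0.0, 0.0]]))  # ['A:218:ASP']
```

A residue is kept as soon as any one of its atoms is close enough, which is why the second atom of `A:218:ASP` (8 Å away) does not matter.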

In [5]:
pocket_buffer = 5
holo_pocket = PDBPocketResidues.RDmolPocketResidues(
    str(holo), str(crystal_ligand),
)
view = holo_pocket.visualize_pocket(pocket_buffer)
view._remote_call('setSize', target='Widget', args=['','600px'])
view

NGLWidget()

#### Visual inspection of the pocket conformation

Before docking, compare the AF2-modelled pocket conformation (<font color='yellow'>yellow cartoon</font> and <font color='blue'>blue sticks</font>) with the crystal structure.

The comparison shows that:

- the AF2-modelled structure has a holo-like pocket backbone (CA RMSD = 0.32 Å);
- the pocket side-chain conformations differ markedly (side-chain RMSD = 1.24 Å), mainly at A:218:ASP, A:219:SER, A:221:GLN, A:244:TRP and A:246:GLU.
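Both numbers above are root-mean-square deviations over matched atoms, RMSD = sqrt((1/N) Σᵢ ‖xᵢ − yᵢ‖²): CA atoms for the backbone value, side-chain heavy atoms for the side-chain value. A minimal NumPy sketch, assuming the two structures are already in the same coordinate frame and atoms are matched one-to-one. Whether `pair_spatial_metrics` superimposes structures first, and how its `mean_*` columns average, is not shown in this notebook. The name `rmsd` is hypothetical.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """RMSD (Å) between two (N, 3) arrays of matched atom coordinates, without superposition."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays with the same shape")
    # squared distance per matched atom pair -> mean over atoms -> square root
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

ref = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
model = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]]  # every atom shifted by 1 Å
print(rmsd(ref, model))  # 1.0
```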

In [6]:
view = holo_pocket.compare(str(af2))
view._remote_call('setSize', target='Widget', args=['','600px'])
view

NGLWidget()

In [7]:
# Quantitative comparison of the pocket conformation between the AF2 structure and the holo structure
holo_chainid = 'A' # receptor chain ID, as shown in the PoseView
results_df = pair_spatial_metrics(
    str(holo), str(crystal_ligand), str(af2),
    holo_chainid, 'A', # the AF2 model is a monomer prediction, so its chain ID is A
    bs_cutoff = pocket_buffer,
)
ca_rmsd = results_df.iloc[0].mean_ca_rmsd
sc_rmsd = results_df.iloc[0].mean_sc_rmsd
print('pocket CA RMSD within 5A of ligand:', round(ca_rmsd, 2))
print('pocket side chain heavy atoms RMSD within 5A of ligand:', round(sc_rmsd, 2))

pocket CA RMSD within 5A of ligand: 0.32
pocket side chain heavy atoms RMSD within 5A of ligand: 1.24


In [8]:
# let's see the key residues
flexible_residues = 'A:218:ASP,A:219:SER,A:221:GLN,A:244:TRP,A:246:GLU'
holo_chainid = 'A' # see the receptor chain ID in poseview
# swap the inputs: the residue IDs refer to the AF2 structure; crystal_ligand is not used here
results_df = pair_spatial_metrics(
    str(af2), str(crystal_ligand), str(holo),
    'A', holo_chainid,
    bs_res_str = flexible_residues.split(','),
)
ca_rmsd = results_df.iloc[0].mean_ca_rmsd
sc_rmsd = results_df.iloc[0].mean_sc_rmsd
print(f'pocket CA RMSD of {flexible_residues}:', round(ca_rmsd, 2))
print(f'pocket side chain heavy atoms RMSD of {flexible_residues}:', round(sc_rmsd, 2))

pocket CA RMSD of A:218:ASP,A:219:SER,A:221:GLN,A:244:TRP,A:246:GLU: 0.23
pocket side chain heavy atoms RMSD of A:218:ASP,A:219:SER,A:221:GLN,A:244:TRP,A:246:GLU: 1.78


#### Run demo

Here, we use **DiffBindFR** for flexible docking: it docks the ligand into the pocket and refines the pocket side-chain conformations, aiming to bring the refined structure close to the holo structure.

In [9]:
experiment_name = 'Q15661'
export_dir = 'demo_af2_docking'
export_dir = os.path.abspath(export_dir)
seed = 888

In [None]:
# pass command-line style arguments to argparse as an explicit list (inside Jupyter, sys.argv holds the kernel's arguments)
parser = common.parse_args()
args = parser.parse_args(
    [
        '-l', str(crystal_ligand),
        '-p', str(af2),
        '-o', export_dir,
        '-np', '40',
        '-gpu', '0',
        '-cpu', '1',
        '-bs', '16',
        '-eval', '-rp', # here we automatically evaluate the redock performance
        '-cl',
        '-st', # export sampling trajectories (required for the trajectory movie)
        '-n', experiment_name,
        '--seed', str(seed),
    ]
)
args.cfg_options = None
job_df = common.make_inference_jobs(args)
runner(job_df, args)
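Passing an explicit list to `parse_args` is what makes argparse usable in a notebook. Called with no arguments, argparse reads `sys.argv`, which inside a Jupyter kernel holds the kernel's own launch arguments (for example `-f <connection file>`). The toy parser below shows the pattern with the standard library only. It is hypothetical and is not DiffBindFR's `common.parse_args()`; only the flag names `-l`, `-p`, `-o` and `--seed` are borrowed from the cell above.

```python
import argparse

def build_parser():
    # Toy parser for illustration only; NOT DiffBindFR's common.parse_args().
    parser = argparse.ArgumentParser(description='toy docking CLI')
    parser.add_argument('-l', dest='ligand', required=True, help='ligand file')
    parser.add_argument('-p', dest='protein', required=True, help='protein PDB file')
    parser.add_argument('-o', dest='out_dir', default='.', help='output directory')
    parser.add_argument('--seed', type=int, default=None, help='random seed')
    return parser

# An explicit list bypasses sys.argv entirely.
args = build_parser().parse_args(['-l', 'ligand.sdf', '-p', 'protein.pdb', '--seed', '888'])
print(args.ligand, args.protein, args.out_dir, args.seed)  # ligand.sdf protein.pdb . 888
```

The same parser still works from a terminal (`parse_args()` with no list), so one definition can serve both the CLI and the notebook.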

In [18]:
results_dir = os.path.join(export_dir, experiment_name, 'results')
smina_top1 = os.path.join(results_dir, 'results_ec_smina_top1.csv')
smina_top1 = pd.read_csv(smina_top1)
smina_top1 = smina_top1.iloc[0]
smina_top1_protein = smina_top1.protein_pdb
smina_top1_pose = smina_top1.docked_lig
mdn_top1 = os.path.join(results_dir, 'results_ec_mdn_top1.csv')
mdn_top1 = pd.read_csv(mdn_top1)
mdn_top1 = mdn_top1.iloc[0]
mdn_top1_protein = mdn_top1.protein_pdb
mdn_top1_pose = mdn_top1.docked_lig
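The Smina and MDN lookups above repeat the same steps: read a top-1 CSV and take its first row. A small helper could factor that out. The sketch below uses the standard-library `csv` module rather than pandas so it stays self-contained. The helper name `load_top1` is hypothetical; it assumes only the file names and columns already used in this notebook (`protein_pdb`, `docked_lig`, `l-rmsd_ec`).

```python
import csv
import os
import tempfile

def load_top1(results_dir, ranker):
    """First data row of results_ec_<ranker>_top1.csv as a dict of strings.

    `ranker` is 'smina' or 'mdn', matching the two file names used above.
    """
    path = os.path.join(results_dir, f'results_ec_{ranker}_top1.csv')
    with open(path, newline='') as fh:
        row = next(csv.DictReader(fh), None)  # first row after the header
    if row is None:
        raise ValueError(f'{path} contains no data rows')
    return row

# toy demo on a made-up CSV (placeholder values, not DiffBindFR output)
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'results_ec_smina_top1.csv'), 'w', newline='') as fh:
        fh.write('protein_pdb,docked_lig,l-rmsd_ec\n'
                 'sample/prot_final.pdb,sample/lig_final_ec.sdf,1.5\n')
    top1 = load_top1(tmp, 'smina')
    # csv values are strings, so convert before rounding
    print(top1['protein_pdb'], round(float(top1['l-rmsd_ec']), 2))
```

In this notebook the call would look like `smina_top1 = load_top1(results_dir, 'smina')`, followed by `smina_top1['protein_pdb']` and `smina_top1['docked_lig']`.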

#### Inspect the top-1 predictions

In [None]:
holo_pocket = PDBPocketResidues.RDmolPocketResidues(
    str(holo), str(crystal_ligand),
)
view = holo_pocket.visualize_pocket(pocket_buffer)
view = holo_pocket.compare(smina_top1_protein, ligand_sdf = smina_top1_pose)
view._remote_call('setSize', target='Widget', args=['','600px'])
view

NGLWidget()

In [None]:
print('DiffBindFR-Smina')
print('ligand RMSD:', round(smina_top1['l-rmsd_ec'], 2))

results_df = pair_spatial_metrics(
    str(holo), str(crystal_ligand), str(smina_top1_protein),
    holo_chainid, 'A', # af2 chain id is A as it is monomer prediction
    bs_cutoff = pocket_buffer,
)
sc_rmsd = results_df.iloc[0].mean_sc_rmsd
print('pocket side chain heavy atoms RMSD within 5A of ligand:', round(sc_rmsd, 2))

results_df = pair_spatial_metrics(
    str(smina_top1_protein), str(crystal_ligand), str(holo),
    'A', holo_chainid,
    bs_res_str = flexible_residues.split(','),
)
sc_rmsd = results_df.iloc[0].mean_sc_rmsd
print(f'pocket side chain heavy atoms RMSD of {flexible_residues}:', round(sc_rmsd, 2))

DiffBindFR-Smina
ligand RMSD: 3.52
pocket side chain heavy atoms RMSD within 5A of ligand: 1.43
pocket side chain heavy atoms RMSD of A:218:ASP,A:219:SER,A:221:GLN,A:244:TRP,A:246:GLU: 1.65


In [None]:
print('DiffBindFR-MDN')
print('ligand RMSD:', round(mdn_top1['l-rmsd_ec'], 2))

results_df = pair_spatial_metrics(
    str(holo), str(crystal_ligand), str(mdn_top1_protein),
    holo_chainid, 'A', # af2 chain id is A as it is monomer prediction
    bs_cutoff = pocket_buffer,
)
sc_rmsd = results_df.iloc[0].mean_sc_rmsd
print('pocket side chain heavy atoms RMSD within 5A of ligand:', round(sc_rmsd, 2))

results_df = pair_spatial_metrics(
    str(mdn_top1_protein), str(crystal_ligand), str(holo),
    'A', holo_chainid,
    bs_res_str = flexible_residues.split(','),
)
sc_rmsd = results_df.iloc[0].mean_sc_rmsd
print(f'pocket side chain heavy atoms RMSD of {flexible_residues}:', round(sc_rmsd, 2))

#### Protein-ligand complex relaxation with OpenMM (Optional)

In [12]:
from DiffBindFR.relax.pl import relax_pl

use_cpu = not torch.cuda.is_available()
kwargs = dict(
  sanitize=True,
  removeHs=True,
  strictParsing=True,
  proximityBonding=True,
  cleanupSubstructures=True,
  p_restraint_type='protein',
  p_stiffness=100.,
  l_restraint_type='non_H',
  l_stiffness=0.,
  tolerance=0.01,
  maxIterations=0,
  gpu=(not use_cpu),
  ccd_int=0,
  keepIds=True,
  seed=None,
  rst_mask=None,
  num_workers=12,
  verbose=True,
)
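The `p_stiffness` and `l_stiffness` options suggest harmonic positional restraints, as in AlphaFold's Amber relaxation, where each restrained atom adds 0.5·k·‖x − x₀‖² to the energy. That `relax_pl` follows this form is an assumption, not something this notebook states. The NumPy sketch of such a term shows why a stiffness of `0.` leaves the corresponding atoms (here the ligand) unrestrained; `restraint_energy` is a hypothetical name.

```python
import numpy as np

def restraint_energy(coords, ref_coords, stiffness, mask=None):
    """Assumed harmonic restraint: sum over restrained atoms of 0.5 * k * |x - x0|^2.

    Units follow the inputs (e.g. k in kcal/mol/Å^2 with coordinates in Å).
    `mask` optionally selects which atoms are restrained (e.g. heavy atoms only).
    """
    x = np.asarray(coords, dtype=float)
    x0 = np.asarray(ref_coords, dtype=float)
    sq_disp = np.sum((x - x0) ** 2, axis=1)  # squared displacement per atom
    if mask is not None:
        sq_disp = sq_disp[np.asarray(mask, dtype=bool)]
    return float(0.5 * stiffness * np.sum(sq_disp))

start = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
moved = [[0.5, 0.0, 0.0], [1.0, 1.0, 1.0]]  # first atom displaced by 0.5
print(restraint_energy(moved, start, stiffness=100.0))  # 12.5
print(restraint_energy(moved, start, stiffness=0.0))    # 0.0
```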

In [None]:
DIR = os.path.join(export_dir, experiment_name, 'DiffBindFR-Smina-top1', Path(smina_top1_protein).parents[1].stem)
DIR = os.path.abspath(DIR)
Path(DIR).mkdir(parents=True, exist_ok=True)
shutil.copy(smina_top1_protein, DIR)
shutil.copy(smina_top1_pose, DIR)
input_pdb_file = f'{DIR}/prot_final.pdb'
ligand_file = f'{DIR}/lig_final_ec.sdf'
out_fixed_pdb_file = f'{DIR}/fixed.pdb'
out_relax_pdb_file = f'{DIR}/relaxed_protein.pdb'
out_relax_lig_file = f'{DIR}/relaxed_ligand.sdf'
out_relax_complex_file = f'{DIR}/relaxed_complex.pdb'


# Step 1: relax the protein alone (no ligand input, no ligand/complex outputs)
relax_pl(
  input_pdb_file,
  None,
  out_fixed_pdb_file,
  out_relax_pdb_file,
  None,
  None,
  **kwargs,
)
input_pdb_file = out_relax_pdb_file

# Step 2: relax the protein-ligand complex, starting from the relaxed protein
relax_pl(
  input_pdb_file,
  ligand_file,
  out_fixed_pdb_file,
  out_relax_pdb_file,
  out_relax_lig_file,
  out_relax_complex_file,
  **kwargs,
)

In [None]:
DIR = os.path.join(export_dir, experiment_name, 'DiffBindFR-MDN-top1', Path(mdn_top1_protein).parents[1].stem)
DIR = os.path.abspath(DIR)
Path(DIR).mkdir(parents=True, exist_ok=True)
shutil.copy(mdn_top1_protein, DIR)
shutil.copy(mdn_top1_pose, DIR)
input_pdb_file = f'{DIR}/prot_final.pdb'
ligand_file = f'{DIR}/lig_final_ec.sdf'
out_fixed_pdb_file = f'{DIR}/fixed.pdb'
out_relax_pdb_file = f'{DIR}/relaxed_protein.pdb'
out_relax_lig_file = f'{DIR}/relaxed_ligand.sdf'
out_relax_complex_file = f'{DIR}/relaxed_complex.pdb'


# Step 1: relax the protein alone (no ligand input, no ligand/complex outputs)
relax_pl(
  input_pdb_file,
  None,
  out_fixed_pdb_file,
  out_relax_pdb_file,
  None,
  None,
  **kwargs,
)
input_pdb_file = out_relax_pdb_file

# Step 2: relax the protein-ligand complex, starting from the relaxed protein
relax_pl(
  input_pdb_file,
  ligand_file,
  out_fixed_pdb_file,
  out_relax_pdb_file,
  out_relax_lig_file,
  out_relax_complex_file,
  **kwargs,
)

##### Download the top1 results

In [None]:
DIR = os.path.abspath(os.path.join(export_dir, experiment_name))

tar_file = 'DiffBindFR-predicted-structures.tar.gz'
if os.path.exists(tar_file):
  os.remove(tar_file)
tar = tarfile.open(tar_file, "w:gz")
for root, _dirs, filenames in os.walk(DIR):  # don't shadow google.colab's `files`, used below
  root_ = os.path.relpath(root, start = DIR)
  for f in filenames:
    full_path = os.path.join(root, f)
    if any(x in full_path for x in ['DiffBindFR-Smina-top1', 'DiffBindFR-MDN-top1']):
      tar.add(full_path, arcname = os.path.join(root_, f))
tar.close()

In [38]:
files.download(tar_file)


##### Save to Google Drive (Optional)


In [21]:
from pydrive2.drive import GoogleDrive
from pydrive2.auth import GoogleAuth
from google.colab import auth
from oauth2client.client import GoogleCredentials
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
print("You are logged into Google Drive and are good to go!")

You are logged into Google Drive and are good to go!


In [None]:
uploaded = drive.CreateFile({'title': tar_file})
uploaded.SetContentFile(tar_file)
uploaded.Upload()
print(f"Uploaded {tar_file} to Google Drive with ID {uploaded.get('id')}")

#### Make trajectory movie
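The helper `add_ec_to_xtc` in the next cell finds the highest frame index among the `prl_<id>.pdb` files and writes the final pose as frame `max_id + 1`. The index has to be compared as an integer: sorting the file names as strings puts `prl_10.pdb` before `prl_2.pdb`. A stdlib sketch of that bookkeeping follows; `parse_frame_id` and `order_frames` are hypothetical stand-ins, and the exact behaviour of `get_traj_id` is not shown in this notebook.

```python
import re
from pathlib import Path

def parse_frame_id(stem):
    """Integer index from a frame stem such as 'prl_12' (hypothetical stand-in for get_traj_id)."""
    match = re.fullmatch(r'prl_(\d+)', stem)
    if match is None:
        raise ValueError(f'unexpected frame name: {stem!r}')
    return int(match.group(1))

def order_frames(paths):
    """Sort prl_<id>.pdb paths by numeric index; return (sorted_paths, next_free_index)."""
    paths = list(paths)
    if not paths:
        raise ValueError('no prl_*.pdb frames found; run sampling with -st')
    frames = sorted(paths, key=lambda p: parse_frame_id(Path(p).stem))
    next_id = parse_frame_id(Path(frames[-1]).stem) + 1  # index for an appended final frame
    return frames, next_id

names = ['prl_10.pdb', 'prl_2.pdb', 'prl_1.pdb']
print(sorted(names))        # string order: ['prl_1.pdb', 'prl_10.pdb', 'prl_2.pdb']
print(order_frames(names))  # (['prl_1.pdb', 'prl_2.pdb', 'prl_10.pdb'], 11)
```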

In [None]:
cm = ColormakerRegistry
cm.add_scheme_func('lig_atomwise','''
 this.atomColor = function (atom) {
     if (atom.element == "C") {
       return 0x7272e6 // C
     } else if (atom.element == "H") {
       return 0xecf0f1
     } else if (atom.element == "S") {
       return 0xf1c40f
     } else if (atom.element == "N") {
       return 0x2d2de1
     } else if (atom.element == "O") {
       return 0xff5252
     }
 }
''')
cm.add_scheme_func('prot_atomwise','''
 this.atomColor = function (atom) {
     if (atom.element == "C") {
       return 0xf9f902 // C
     } else if (atom.element == "H") {
       return 0xecf0f1
     } else if (atom.element == "S") {
       return 0xf1c40f
     } else if (atom.element == "N") {
       return 0x2d2de1
     } else if (atom.element == "O") {
       return 0xff5252
     }
 }
''')

def add_ec_to_xtc(
    sample_dir: str,
    topology: str,
    new_name: str = 'new_prl_traj.xtc',
) -> str:
    pdb_final = os.path.join(sample_dir, 'prot_final.pdb')
    lig_final = os.path.join(sample_dir, 'lig_final_ec.sdf')
    lig_final_mol = Chem.SDMolSupplier(lig_final)[0]
    lig_final_mol = Chem.MolFromPDBBlock(Chem.MolToPDBBlock(lig_final_mol))

    traj_dir = os.path.join(sample_dir, 'prl_traj')
    trajs = list(Path(traj_dir).glob('prl_*.pdb'))
    assert len(trajs) > 0, 'No trajectory frames found; rerun DiffBindFR sampling with -st to export trajectories.'
    ids = []
    for traj in trajs:
        stem = traj.stem
        traj_id = get_traj_id(stem)
        ids.append(traj_id)
    max_id = max(ids)
    final_id = max_id + 1
    final_traj_path = os.path.join(traj_dir, f'prl_{final_id}.pdb')
    seed_traj_path = trajs[0]

    mol_seed_block = read_molblock(seed_traj_path)
    mol_seed = Chem.MolFromPDBBlock(mol_seed_block) # use mol_seed topology to export PDB block
    lig_final_mol = update_mol_pose(mol_seed, lig_final_mol)

    trajectory = os.path.join(sample_dir, new_name)
    p_pdbblock = Path(pdb_final).read_text()
    l_pdbblock = Chem.MolToPDBBlock(lig_final_mol)
    try:
        complex_pdb_block = to_complex_block(p_pdbblock, l_pdbblock, final_traj_path)
        export_xtc(
            topology,
            traj_dir,
            trajectory,
        )
    finally:
        if os.path.exists(final_traj_path):
            os.remove(final_traj_path) # avoid increment by multiple run
    return trajectory

def show_nv_traj(
    sample_dir: str,
    repr_sel: str,
    add_ec_to_xtc_flag = True,
):
    topology = os.path.join(sample_dir, '../prl_topol.pdb')

    if add_ec_to_xtc_flag:
        # add ec ligand into xtc
        trajectory = add_ec_to_xtc(sample_dir, topology)
    else:
        trajectory = os.path.join(sample_dir, 'prl_traj.xtc')

    u = mda.Universe(topology, trajectory)
    system = u.select_atoms('all')
    t = nv.MDAnalysisTrajectory(system)
    w = nv.NGLWidget(t)
    w.clear_representations()
    w.add_cartoon(colorScheme = 'sstruc')
    w.add_representation(
        repr_type='ball+stick',
        selection='[UNL]', # ligand resname
        color_scheme = 'lig_atomwise'
    )
    w.add_representation('licorice', selection=repr_sel, color_scheme='prot_atomwise')

    if add_ec_to_xtc_flag:
        os.remove(trajectory)

    return w

# make nglview selection expression
flex_residue_list = flexible_residues.split(',')
flex_resnumber = [x.split(':')[1] for x in flex_residue_list]
flex_resnumber = ':A and ' + '( ' + ' or '.join(flex_resnumber) + ' )'
flex_resnumber
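The cell above builds the NGL selection with the chain hard-coded to `:A`. If the flexible residues spanned several chains, one option would be to group residue numbers by chain. The sketch below parses the `chain:resnum:resname` specs used throughout this notebook. `ngl_selection` is a hypothetical helper, and combining chains with parenthesized `or` groups assumes NGL accepts that nesting, as the original `and ( ... )` expression already relies on parentheses.

```python
from collections import defaultdict

def ngl_selection(spec):
    """Turn 'A:218:ASP,A:219:SER,B:12:GLY' into an NGL selection grouped by chain."""
    by_chain = defaultdict(list)
    for item in spec.split(','):
        chain, resnum, _resname = item.strip().split(':')
        by_chain[chain].append(resnum)
    # one ':CHAIN and ( n or m )' clause per chain, same pattern as the cell above
    clauses = [f":{chain} and ( {' or '.join(nums)} )" for chain, nums in by_chain.items()]
    if len(clauses) == 1:
        return clauses[0]
    # parenthesize each clause so the chain filter applies only to its own residues
    return ' or '.join(f'( {c} )' for c in clauses)

print(ngl_selection('A:218:ASP,A:219:SER,A:221:GLN,A:244:TRP,A:246:GLU'))
# :A and ( 218 or 219 or 221 or 244 or 246 )
```

For the single-chain case here, the output matches the `flex_resnumber` string built in the cell above.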

In [None]:
sample_dir = os.path.dirname(smina_top1_protein)
w = show_nv_traj(sample_dir, flex_resnumber, True)
w

In [None]:
sample_dir = os.path.dirname(mdn_top1_protein)
w = show_nv_traj(sample_dir, flex_resnumber, True)
w

### 🎉🎉End

Thanks for your interest in DiffBindFR. We are still working hard to further improve performance and extend it to other applications.

If you have any questions, feel free to open a [GitHub issue](https://github.com/HBioquant/DiffBindFR/issues) or contact me at [zhujt@stu.pku.edu.cn](mailto:zhujt@stu.pku.edu.cn).

👋👋👋