<a href="https://colab.research.google.com/github/Jahan08/Bioinformatics/blob/main/DiffDock_SingleComplex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DiffDock
Dock a small molecules on to protein structures using DiffDock approach

1.   This notebook allows you to run diffdock on single protein/ligands and also multiple proteins/ligands.

2.   Colab basic version works fine with single simulations. "Premium GPU" (colab pro), and even then it may fail on large complexes.

## References:

[Research Article](https://arxiv.org/abs/2210.01776)

[Github](https://github.com/gcorso/DiffDock)

[Interactive Online tool by Simon Duerr](https://huggingface.co/spaces/simonduerr/diffdock)

[Colab Notebook by Brian Naughton](https://colab.research.google.com/drive/1nvCyQkbO-TwXZKJ0RCShVEym1aFWxlkX). The current notebook revised from Brain's work/code.






In [1]:
# Start with mapping Google Drive to Colab
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


**Step 1**: Setup working directory named "DiffDock_V2" in your Google Drive and update directory path.

Copy or move this colab notebook to the current directory.

In [2]:
## Enable this code inorder to create DiffDock_V2 directory
#Pls ignore this step if you have already created one
%cd /content/drive/MyDrive
%mkdir DiffDock_V2
%cd DiffDock_V2
%ls

/content/drive/MyDrive
/content/drive/MyDrive/DiffDock_V2


If you have already created or would like to work on different directory; please update the path accordingly

In [3]:
%cd /content/drive/MyDrive/DiffDock_V2
%ls

/content/drive/MyDrive/DiffDock_V2


## Step 2:
Install the dependencies for DiffDock

## Install prerequisites

In [4]:
!pip install ipython-autotime
%load_ext autotime

Collecting ipython-autotime
  Downloading ipython_autotime-0.3.1-py2.py3-none-any.whl (6.8 kB)
Collecting jedi>=0.16 (from ipython->ipython-autotime)
  Downloading jedi-0.19.0-py2.py3-none-any.whl (1.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.6/1.6 MB[0m [31m6.4 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: jedi, ipython-autotime
Successfully installed ipython-autotime-0.3.1 jedi-0.19.0
time: 462 µs (started: 2023-09-25 17:41:57 +00:00)


In [5]:
%cd /content/drive/MyDrive/DiffDock_V2
!git clone https://github.com/gcorso/DiffDock.git
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock
!git checkout 0f9c419 # remove/update for more up to date code

/content/drive/MyDrive/DiffDock_V2
Cloning into 'DiffDock'...
remote: Enumerating objects: 305, done.[K
remote: Counting objects: 100% (158/158), done.[K
remote: Compressing objects: 100% (54/54), done.[K
remote: Total 305 (delta 127), reused 104 (delta 104), pack-reused 147[K
Receiving objects: 100% (305/305), 232.37 MiB | 14.07 MiB/s, done.
Resolving deltas: 100% (156/156), done.
Updating files: 100% (56/56), done.
/content/drive/MyDrive/DiffDock_V2/DiffDock
Note: switching to '0f9c419'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detache

In [6]:
!pip install pyg==0.7.1 --quiet
!pip install pyyaml==6.0 --quiet
!pip install scipy==1.7.3 --quiet
!pip install networkx==2.6.3 --quiet
!pip install biopython==1.79 --quiet
!pip install rdkit-pypi==2022.03.5 --quiet
!pip install e3nn==0.5.0 --quiet
!pip install spyrmsd==0.5.2 --quiet
!pip install pandas==1.3.5 --quiet
!pip install biopandas==0.4.1 --quiet
!pip install torch==1.12.1+cu113 --quiet
!pip install nglview --quiet
!pip install -q nglview pytraj --quiet

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/65.0 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━[0m [32m61.4/65.0 kB[0m [31m1.9 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m65.0/65.0 kB[0m [31m1.4 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m40.5/40.5 kB[0m [31m3.6 MB/s[0m eta [36m0:00:00[0m
[?25h  Building wheel for pyg (setup.py) ... [?25l[?25hdone
  Building wheel for pkgtools (setup.py) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m682.2/682.2 kB[0m [31m8.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m39.9/39.9 MB[0m [31m20.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━

In [7]:
import torch

try:
    import torch_geometric
except ModuleNotFoundError:
    !pip uninstall torch-scatter torch-sparse torch-geometric torch-cluster  --y
    !pip install torch-scatter -f https://data.pyg.org/whl/torch-{torch.__version__}.html --quiet
    !pip install torch-sparse -f https://data.pyg.org/whl/torch-{torch.__version__}.html --quiet
    !pip install torch-cluster -f https://data.pyg.org/whl/torch-{torch.__version__}.html --quiet
    !pip install git+https://github.com/pyg-team/pytorch_geometric.git  --quiet # no version for some reason??

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.2/10.2 MB[0m [31m5.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m4.8/4.8 MB[0m [31m22.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.3/3.3 MB[0m [31m19.3 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
  Building wheel for torch_geometric (pyproject.toml) ... [?25l[?25hdone
time: 46.8 s (started: 2023-09-25 17:45:30 +00:00)


### Download 2GB PDBBind dataset
unnecessary for inference

In [None]:
#!test -d /content/DiffDock/data/PDBBind_processed || (wget https://zenodo.org/record/6034088/files/PDBBind.zip && unzip -q PDBBind.zip && mv PDBBind_processed /content/DiffDock/data/)

time: 1.54 ms (started: 2022-10-24 01:37:33 +00:00)


# Upload Input files



**Step 3:**

1.   Upload protein and ligand file in data directory.
2.   DiffDock supports .pdb file format for protein
3.   and it supports, .sdf or .mol2, and SMILES format for ligand
4.   For example, i have saved protein as 'protein.pdb' and ligand as 'ligand.sdf'.
5.   Update the respective file names in esm embedding preparation and inference steps.
6.   Alternatively, you can also provide SMILES as an input. For example, **--ligand "COc(cc1)ccc1C#N"** instead of *--ligand ligand.sdf*





In [9]:
%cd /data
from google.colab import files
uploaded = files.upload()

[Errno 2] No such file or directory: '/data'
/content/drive/MyDrive/DiffDock_V2/DiffDock


Saving 1CG_ideal.sdf to 1CG_ideal.sdf
time: 8.89 s (started: 2023-09-25 17:50:20 +00:00)


For demo files refer my [github profile](https://github.com/suneelbvs/DiffDock)

## Install ESM and prepare PDB file for ESM

In [10]:
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock
!git clone https://github.com/facebookresearch/esm
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock/esm
!git checkout f07aed6 # remove/update for more up to date code
!sudo pip install -e .
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock

/content/drive/MyDrive/DiffDock_V2/DiffDock
Cloning into 'esm'...
remote: Enumerating objects: 1511, done.[K
remote: Counting objects: 100% (151/151), done.[K
remote: Compressing objects: 100% (113/113), done.[K
remote: Total 1511 (delta 42), reused 126 (delta 36), pack-reused 1360[K
Receiving objects: 100% (1511/1511), 11.78 MiB | 10.28 MiB/s, done.
Resolving deltas: 100% (891/891), done.
Updating files: 100% (476/476), done.
/content/drive/MyDrive/DiffDock_V2/DiffDock/esm
error: Your local changes to the following files would be overwritten by checkout:
	examples/lm-design/paper-data/artificial_sequence_purge_ids.txt
	examples/lm-design/paper-data/uniref90_jackhmmer_purge_ids.txt
Please commit your changes or stash them before you switch branches.
Aborting
Obtaining file:///content/drive/MyDrive/DiffDock_V2/DiffDock/esm
  Installing build dependencies ... [?25l[?25hdone
  Checking if build backend supports build_editable ... [?25l[?25hdone
  Getting requirements to build edit

In [11]:
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock
!python datasets/esm_embedding_preparation.py --protein_path /content/drive/MyDrive/DiffDock_V2/DiffDock/data/protein.pdb --out_file data/prepared_for_esm.fasta

/content/drive/MyDrive/DiffDock_V2/DiffDock
100% 1/1 [00:00<00:00, 12.19it/s]
time: 1.32 s (started: 2023-09-25 17:51:28 +00:00)


In [12]:
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock
%env HOME=esm/model_weights
%env PYTHONPATH=$PYTHONPATH:/content/drive/MyDrive/DiffDock_V2/DiffDock/esm
!python /content/drive/MyDrive/DiffDock_V2/DiffDock/esm/scripts/extract.py esm2_t33_650M_UR50D data/prepared_for_esm.fasta data/esm2_output --repr_layers 33 --include per_tok

/content/drive/MyDrive/DiffDock_V2/DiffDock
env: HOME=esm/model_weights
env: PYTHONPATH=$PYTHONPATH:/content/drive/MyDrive/DiffDock_V2/DiffDock/esm
Downloading: "https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t33_650M_UR50D.pt" to esm/model_weights/.cache/torch/hub/checkpoints/esm2_t33_650M_UR50D.pt
Downloading: "https://dl.fbaipublicfiles.com/fair-esm/regression/esm2_t33_650M_UR50D-contact-regression.pt" to esm/model_weights/.cache/torch/hub/checkpoints/esm2_t33_650M_UR50D-contact-regression.pt
Read data/prepared_for_esm.fasta with 1 sequences
Processing 1 of 1 batches (1 sequences)
time: 1min 11s (started: 2023-09-25 17:51:47 +00:00)


## Run DiffDock

In [None]:
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock
!python -m inference --protein_path data/protein.pdb --ligand data/ligand.sdf --out_dir results/singlecomplx --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise
#!mv 'index0_data-testing-6w70.pdb____data-testing-6w70_ligand.sdf' out #update the folder name, if you provide custom names for inputs
#%cd ./out
#%ls

/content/drive/MyDrive/DiffDock_V2/DiffDock


## Download results

In [None]:
%cd ./results/singlecomplx
!mv 'index0_data-protein.pdb____data-ligand.sdf' out
#%cp ./data/*.*pdb
%cd ./out
%ls

/content/drive/MyDrive/DiffDock_V2/DiffDock/results/singlecomplx
/content/drive/MyDrive/DiffDock_V2/DiffDock/results/singlecomplx/out
rank10_confidence0.01.sdf   rank29_confidence-1.25.sdf
rank11_confidence0.00.sdf   rank2_confidence0.38.sdf
rank12_confidence-0.08.sdf  rank30_confidence-1.33.sdf
rank13_confidence-0.14.sdf  rank31_confidence-1.33.sdf
rank14_confidence-0.21.sdf  rank32_confidence-1.47.sdf
rank15_confidence-0.24.sdf  rank33_confidence-1.53.sdf
rank16_confidence-0.26.sdf  rank34_confidence-1.64.sdf
rank17_confidence-0.27.sdf  rank35_confidence-1.93.sdf
rank18_confidence-0.31.sdf  rank36_confidence-2.18.sdf
rank19_confidence-0.35.sdf  rank37_confidence-2.87.sdf
rank1_confidence0.44.sdf    rank38_confidence-2.96.sdf
rank1.sdf                   rank39_confidence-3.29.sdf
rank20_confidence-0.36.sdf  rank3_confidence0.38.sdf
rank21_confidence-0.44.sdf  rank40_confidence-3.41.sdf
rank22_confidence-0.45.sdf  rank4_confidence0.37.sdf
rank23_confidence-0.50.sdf  rank5_confidence0.3

In [None]:
from google.colab import output
output.enable_custom_widget_manager()

time: 4 ms (started: 2022-10-24 01:40:34 +00:00)


In [None]:
%ls

rank10_confidence0.01.sdf   rank29_confidence-1.25.sdf
rank11_confidence0.00.sdf   rank2_confidence0.38.sdf
rank12_confidence-0.08.sdf  rank30_confidence-1.33.sdf
rank13_confidence-0.14.sdf  rank31_confidence-1.33.sdf
rank14_confidence-0.21.sdf  rank32_confidence-1.47.sdf
rank15_confidence-0.24.sdf  rank33_confidence-1.53.sdf
rank16_confidence-0.26.sdf  rank34_confidence-1.64.sdf
rank17_confidence-0.27.sdf  rank35_confidence-1.93.sdf
rank18_confidence-0.31.sdf  rank36_confidence-2.18.sdf
rank19_confidence-0.35.sdf  rank37_confidence-2.87.sdf
rank1_confidence0.44.sdf    rank38_confidence-2.96.sdf
rank1.sdf                   rank39_confidence-3.29.sdf
rank20_confidence-0.36.sdf  rank3_confidence0.38.sdf
rank21_confidence-0.44.sdf  rank40_confidence-3.41.sdf
rank22_confidence-0.45.sdf  rank4_confidence0.37.sdf
rank23_confidence-0.50.sdf  rank5_confidence0.36.sdf
rank24_confidence-0.52.sdf  rank6_confidence0.34.sdf
rank25_confidence-0.60.sdf  rank7_confidence0.30.sdf
rank26_confidence-0.63



# Work In Progress: Analysis Part
**bold text**

