# Fast and Accurate Protein-Peptide Docking with DiffPepDock

<img src="https://github.com/YuzheWangPKU/DiffPepBuilder/blob/main/examples/figures/dpd_model.jpg?raw=true">

This notebook demonstrates how to use DiffPepDock to dock batches of peptide sequences to a target protein. As an example, we redock the native peptide fragment into the substrate-binding protein YejA (PDB ID: [7Z6F](https://www.rcsb.org/structure/7Z6F)) to illustrate the batch docking workflow.

## Setup

In [None]:
#@title ### Download model assets
import os

diffpep_folder = "DiffPepBuilder"
checkpoint_dir = os.path.join(diffpep_folder, "experiments", "checkpoints")
checkpoint_file = os.path.join(checkpoint_dir, "diffpepdock_v1.pth")

if not (os.path.isdir(diffpep_folder) and os.path.isfile(checkpoint_file)):
  print("Installing DiffPepDock...")

  # Clone the repository only if it is not already present.
  if not os.path.isdir(diffpep_folder):
    os.system("git clone https://github.com/YuzheWangPKU/DiffPepBuilder.git")

  # Download the weights straight into the checkpoint directory.
  # Paths stay relative to the notebook root, so no chdir is needed.
  if not os.path.isfile(checkpoint_file):
    print("Downloading model weights...")
    os.makedirs(checkpoint_dir, exist_ok=True)
    os.system(f"wget -O {checkpoint_file} https://zenodo.org/records/15398020/files/diffpepdock_v1.pth")

  print("DiffPepDock is installed and ready.")

else:
  print("DiffPepDock is already installed and ready.")

Installing DiffPepBuilder...
Installing SSBLIB...
Downloading model weights...
DiffPepBuilder is installed and ready.


In [2]:
#@title ### Install dependencies
os.system("pip install wget wandb fair-esm biotite pyrootutils easydict biopython tqdm ml-collections mdtraj GPUtil dm-tree tmtools py3Dmol")

pdbfixer_folder = "pdbfixer"
if not os.path.isdir(pdbfixer_folder):
  print("Installing pdbfixer...")
  os.system("git clone https://github.com/openmm/pdbfixer.git")
  os.chdir(pdbfixer_folder)
  os.system("pip install .")  # `python setup.py install` is deprecated
  os.chdir("..")
  print("pdbfixer is installed.")
else:
  print("pdbfixer is already cloned.")

os.system("pip install hydra-core hydra-joblib-launcher")

Installing pdbfixer...
pdbfixer is installed.


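Because `os.system` only returns an exit code, a failed install in the cell above is easy to miss. A minimal sketch that checks whether the installed packages can be imported, using `importlib.util.find_spec`. The import names below are assumptions mapped from the pip package names (for example, `fair-esm` installs `esm`, `dm-tree` installs `tree`, and `biopython` installs `Bio`), and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names assumed from the pip packages installed above; they can
# differ from the package names (fair-esm -> esm, dm-tree -> tree, ...).
REQUIRED_MODULES = [
    "esm", "biotite", "Bio", "hydra", "ml_collections", "mdtraj",
    "tree", "tmtools", "py3Dmol", "pdbfixer", "easydict",
    "pyrootutils", "GPUtil", "wandb",
]


def missing_modules(names):
    """Return the module names that cannot be found on the current sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

`find_spec` locates a module without executing it, so this check is cheap and does not trigger heavy imports such as the ESM model code.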

## Inference

In [None]:
#@title ### Specify receptor information
from google.colab import files
import json

os.makedirs("test_case", exist_ok=True)
receptor_type = "default (7Z6F)" #@param ["default (7Z6F)", "uploaded"]

if receptor_type == "uploaded":
  uploaded_pdb = files.upload(accept=".pdb")
  file_name = next(iter(uploaded_pdb))
  os.system(f"mv {file_name} test_case/")
else:
  file_name = "7Z6F.pdb"
  os.system("cp DiffPepBuilder/examples/docking_data/7Z6F.pdb test_case/")
#@markdown - **Note**: please remove non-protein components from the PDB file and ensure that the CA atoms are present.

lig_chain = "A" #@param {type:"string"}
#@markdown  - Chain ID of the **reference** ligand. Please set to `None` if no reference ligand is included in the PDB file.
#@markdown  The model will prioritize reference ligand information over the binding motif if both are given.
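motif = None #@param {type:"string"}
#@markdown  - Binding motif as a comma-separated list (converted to `-`-separated internally). Set to `None` if not used.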

key = os.path.splitext(file_name)[0]
data = {}
if lig_chain and lig_chain != "None":
  data['lig_chain'] = lig_chain
if motif and motif != "None":
  data['motif'] = motif.replace(",", "-")

json_file_write_path = "test_case/docking_cases.json"
final_data = {key: data}
with open(json_file_write_path, 'w') as file:
  json.dump(final_data, file, indent=4)

peptide_seq_mode = "single" #@param ["single", "batch"]
if peptide_seq_mode == "single":
  peptide_seq = "VLGEPRYAFNFN" #@param {type:"string"}
  #@markdown - Peptide sequence in single-letter code.
  peptide_id = "nat" #@param {type:"string"}
  #@markdown - Identifier written to the FASTA header for this peptide.
  with open(os.path.join("test_case", "peptide_seq.fasta"), 'w') as f:
    f.write(f">{peptide_id}\n{peptide_seq}\n")
elif peptide_seq_mode == "batch":
  peptide_fasta = files.upload(accept=".fasta")
  #@markdown - Upload a FASTA file containing multiple peptide sequences.
  fasta_name = next(iter(peptide_fasta))
  # The preprocessing step reads test_case/peptide_seq.fasta, so rename the upload.
  os.system(f"mv '{fasta_name}' test_case/peptide_seq.fasta")
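In batch mode, the uploaded FASTA file needs one record per peptide. A minimal sketch for preparing such a file locally, assuming the preprocessing script accepts standard FASTA with a `>id` header line followed by a single-line sequence (the layout the single-sequence branch writes). `write_peptide_fasta` is a hypothetical helper, and rejecting letters outside the 20 canonical amino acids is an assumption about what the model supports, not a documented requirement.

```python
import os

# Canonical one-letter amino acid codes; restricting input to these is an assumption.
CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")


def write_peptide_fasta(peptides, path):
    """Write a {peptide_id: sequence} mapping to a FASTA file, one record per peptide.

    Hypothetical helper mirroring the '>id' + sequence layout used for a single peptide.
    """
    with open(path, "w") as handle:
        for peptide_id, seq in peptides.items():
            if not peptide_id or any(c.isspace() for c in peptide_id):
                raise ValueError(f"invalid peptide id: {peptide_id!r}")
            seq = seq.strip().upper()
            bad = set(seq) - CANONICAL_AA
            if not seq or bad:
                raise ValueError(f"{peptide_id}: empty sequence or invalid residues {sorted(bad)}")
            handle.write(f">{peptide_id}\n{seq}\n")
    return os.path.abspath(path)
```

For example, `write_peptide_fasta({"nat": "VLGEPRYAFNFN"}, "peptides.fasta")` produces a file that can then be uploaded when `peptide_seq_mode` is set to `"batch"`.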


In [None]:
#@title ### Preprocess receptor and peptide sequence data
!python DiffPepBuilder/experiments/process_batch_dock.py \
  --pdb_dir test_case \
  --write_dir test_case \
  --receptor_info_path test_case/docking_cases.json \
  --peptide_seq_path test_case/peptide_seq.fasta

Files will be written to test_case
Finished test_case/alk1.pdb in 0.04s
Finished processing 1/1 files. Start ESM embedding...
Model file /content/DiffPepBuilder/experiments/checkpoints/esm2_t33_650M_UR50D.pt not found. Downloading...
Model file /content/DiffPepBuilder/experiments/checkpoints/esm2_t33_650M_UR50D-contact-regression.pt not found. Downloading...
Read sequence data with 1 sequences
Processing protein sequence batches:   0% 0/1 [00:00<?, ?it/s]Processing 1 of 1 batches (1 sequences)
Processing protein sequence batches: 100% 1/1 [00:00<00:00,  1.07it/s]
100% 1/1 [00:00<00:00, 556.57it/s]


In [None]:
#@title ### Customize docking settings
import yaml

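#@markdown #### Sampling parameters
denoising_steps = "200" #@param [100, 200, 500]
#@markdown - Number of denoising steps; more steps are slower.
noise_scale = "1.0" #@param [0.5, 1.0, 1.5, 2.0, 2.5]
#@markdown - Scale of the noise injected during sampling.
samples_per_sequence = 4 #@param {type:"integer"}
#@markdown - Number of docked poses generated for each peptide sequence.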

yaml_file_path = "DiffPepBuilder/config/docking.yaml"
with open(yaml_file_path, 'r') as file:
  yaml_data = yaml.safe_load(file)

yaml_data['data']['num_t'] = int(denoising_steps)
yaml_data['experiment']['noise_scale'] = float(noise_scale)
yaml_data['data']['num_repeat_per_eval_sample'] = int(samples_per_sequence)

with open(yaml_file_path, 'w') as file:
  yaml.dump(yaml_data, file, default_flow_style=False)
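The cell above rewrites `DiffPepBuilder/config/docking.yaml` in place. Since `run_docking.py` already accepts `key=value` overrides on the command line (for example `data.val_csv_path=...`), the same settings could presumably be passed as Hydra overrides instead, leaving the YAML file untouched. A minimal sketch that flattens a nested settings dict into override strings; `to_hydra_overrides` is a hypothetical helper, and it assumes each key already exists in the config (Hydra rejects overrides for unknown keys unless they are prefixed with `+`).

```python
def to_hydra_overrides(settings, prefix=""):
    """Flatten nested settings into Hydra-style 'a.b=value' override strings."""
    overrides = []
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as 'data' or 'experiment'.
            overrides.extend(to_hydra_overrides(value, path))
        else:
            overrides.append(f"{path}={value}")
    return overrides


settings = {
    "data": {"num_t": 200, "num_repeat_per_eval_sample": 4},
    "experiment": {"noise_scale": 1.0},
}
print(" ".join(to_hydra_overrides(settings)))
# data.num_t=200 data.num_repeat_per_eval_sample=4 experiment.noise_scale=1.0
```

The printed string could be appended to the `torchrun` command in the batch docking cell; passing overrides this way keeps the repository's default config intact between runs.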


In [None]:
#@title ### Run batch docking
os.environ['BASE_PATH'] = "/content/DiffPepBuilder"

!torchrun --nproc-per-node=1 DiffPepBuilder/experiments/run_docking.py \
  data.val_csv_path=test_case/metadata_test.csv \
  experiment.use_ddp=False \
  experiment.num_gpus=1 \
  experiment.num_loader_workers=1

[2025-05-15 04:50:26,648][experiments.train][INFO] - Loading checkpoint from /content/DiffPepBuilder/experiments/checkpoints/diffpepbuilder_v1.pth
[2025-05-15 04:50:32,484][data.so3_diffuser][INFO] - Computing IGSO3. Saving in /content/DiffPepBuilder/runs/cache/eps_1000_omega_1000_min_sigma_0_1_max_sigma_1_5_schedule_logarithmic
[2025-05-15 04:51:52,768][experiments.train][INFO] - Number of model parameters: 103.66 M
[2025-05-15 04:51:57,911][experiments.train][INFO] - Evaluation mode only, no checkpoint being saved.
[2025-05-15 04:51:57,913][experiments.train][INFO] - Evaluation saved to: /content/DiffPepBuilder/runs/inference/15D_05M_2025Y_04h_51m
[2025-05-15 04:51:58,034][experiments.train][INFO] - Using device: cuda:0
[2025-05-15 04:51:58,044][data.pdb_data_loader][INFO] - Validation: 1 examples
[2025-05-15 04:52:32,305][experiments.train][INFO] - Done sample alk1 (peptide length: 16, sample: 0), saved to /content/DiffPepBuilder/runs/infer
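Each `Done sample` line reports where one docked sample was written. To gather all generated structures for inspection, here is a minimal sketch that walks a run directory and groups PDB files by the folder that contains them. It assumes samples are saved as `.pdb` files somewhere below the output directory; the exact directory layout is not documented here, and `collect_pdbs` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import Path


def collect_pdbs(root):
    """Group every .pdb file under `root` by its containing directory (relative to root)."""
    groups = defaultdict(list)
    for pdb_path in sorted(Path(root).rglob("*.pdb")):
        groups[str(pdb_path.parent.relative_to(root))].append(pdb_path.name)
    return dict(groups)
```

For example, `collect_pdbs("/content/DiffPepBuilder/runs")` lists the generated files per sample folder, which makes it easy to confirm that `samples_per_sequence` poses were produced for each peptide.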

In [None]:
#@title ### Download results

!tar --directory=/content/DiffPepBuilder/runs -czf /content/docking_results.tar.gz docking
files.download("/content/docking_results.tar.gz")

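If the `docking` folder does not exist under `runs`, `tar` fails and `files.download` has nothing to fetch. A minimal sketch of the same archiving step in pure Python with `shutil.make_archive`, which checks that the folder exists first. `archive_results` is a hypothetical helper, and the results folder is left as a parameter because the output location depends on the docking configuration.

```python
import os
import shutil


def archive_results(runs_dir, subdir, out_base):
    """Create '<out_base>.tar.gz' containing runs_dir/subdir and return its path."""
    target = os.path.join(runs_dir, subdir)
    if not os.path.isdir(target):
        raise FileNotFoundError(f"No results folder at {target}")
    # root_dir/base_dir keep paths inside the archive relative, like `tar --directory`.
    return shutil.make_archive(out_base, "gztar", root_dir=runs_dir, base_dir=subdir)
```

In Colab this could be used as `files.download(archive_results("/content/DiffPepBuilder/runs", "docking", "/content/docking_results"))`.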

## Postprocessing

Please refer to the [README](https://github.com/YuzheWangPKU/DiffPepBuilder?tab=readme-ov-file#docking) to run side-chain assembly with [Rosetta](https://rosettacommons.org/software/).

- This step is not included in the Colab notebook because Rosetta's installation is too large for Colab's limited storage. Apologies for any inconvenience!