# Target-Specific De Novo Peptide Binder Design with DiffPepBuilder

This notebook demonstrates how to use the [DiffPepBuilder](https://github.com/YuzheWangPKU/DiffPepBuilder) package to design peptides that bind to a target protein. As a worked example, we generate peptide binders for ALK1 (Activin Receptor-like Kinase 1, PDB ID: [6SF1](https://www.rcsb.org/structure/6SF1)).

## Setup

In [1]:
#@title ### Download model assets
!git clone https://github.com/YuzheWangPKU/DiffPepBuilder.git
%cd DiffPepBuilder
!tar -xvf SSbuilder/SSBLIB.tar.gz -C SSbuilder
!wget https://zenodo.org/records/12794439/files/diffpepbuilder_v1.pth
!mkdir -p experiments/checkpoints/
!mv diffpepbuilder_v1.pth experiments/checkpoints/
%cd ..

Cloning into 'DiffPepBuilder'...
remote: Enumerating objects: 145, done.[K
remote: Counting objects: 100% (145/145), done.[K
remote: Compressing objects: 100% (123/123), done.[K
remote: Total 145 (delta 23), reused 136 (delta 18), pack-reused 0[K
Receiving objects: 100% (145/145), 4.57 MiB | 6.19 MiB/s, done.
Resolving deltas: 100% (23/23), done.
/content/DiffPepBuilder
tar: unrecognized option '--quiet'
Try 'tar --help' or 'tar --usage' for more information.
--2024-07-22 18:14:51--  https://zenodo.org/records/12794439/files/diffpepbuilder_v1.pth
Resolving zenodo.org (zenodo.org)... 188.184.103.159, 188.184.98.238, 188.185.79.172, ...
Connecting to zenodo.org (zenodo.org)|188.184.103.159|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1242313672 (1.2G) [application/octet-stream]
Saving to: ‘diffpepbuilder_v1.pth’


2024-07-22 18:16:34 (11.6 MB/s) - ‘diffpepbuilder_v1.pth’ saved [1242313672/1242313672]

/content
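The log above shows `tar` failing (`unrecognized option '--quiet'`), so the SSBLIB archive may not have been unpacked. Below is a minimal sketch, not part of DiffPepBuilder, that re-extracts the archive with Python's standard `tarfile` module and compares the checkpoint's size with the `Length` reported by `wget` (1242313672 bytes). The helper names `extract_archive` and `checkpoint_size_ok` are hypothetical.

```python
import os
import tarfile


def extract_archive(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a tar archive (compression auto-detected) and return its member names."""
    with tarfile.open(archive_path, "r:*") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases support extraction filters; "data" rejects unsafe paths.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return names


def checkpoint_size_ok(path: str, expected_bytes: int) -> bool:
    """Return True if the file exists and its size matches the expected byte count."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes
```

From `/content` (where the cell above ends after `%cd ..`), one would call `extract_archive("DiffPepBuilder/SSbuilder/SSBLIB.tar.gz", "DiffPepBuilder/SSbuilder")` and `checkpoint_size_ok("DiffPepBuilder/experiments/checkpoints/diffpepbuilder_v1.pth", 1242313672)`. A size match only rules out a truncated download; it is not a checksum.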


In [1]:
#@title ### Install dependencies
#@markdown When this cell finishes, restart the runtime as the pip warning suggests before running the next cells.
!pip install wget wandb fair-esm biotite pyrootutils easydict biopython tqdm ml-collections mdtraj GPUtil dm-tree tmtools
!git clone https://github.com/openmm/pdbfixer.git
%cd pdbfixer
!python setup.py install
%cd ..
!pip install hydra-core hydra-joblib-launcher

Collecting wget
  Downloading wget-3.2.zip (10 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting wandb
  Downloading wandb-0.17.5-py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.8/6.8 MB[0m [31m74.1 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting fair-esm
  Downloading fair_esm-2.0.0-py3-none-any.whl (93 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m93.1/93.1 kB[0m [31m18.0 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting biotite
  Downloading biotite-0.41.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (35.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m35.8/35.8 MB[0m [31m15.8 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting pyrootutils
  Downloading pyrootutils-1.0.4-py3-none-any.whl (5.8 kB)
Collecting biopython
  Downloading biopython-1.84-cp310-cp310-manylinux_2_17_x86_64.manylinux2014

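After restarting the runtime, it can help to confirm that every package installed above is importable before moving on. Below is a minimal sketch using `importlib.util.find_spec`. The mapping from pip names to import names is our assumption (for example, `fair-esm` installs `esm`, `dm-tree` installs `tree`, `biopython` installs `Bio`), and `hydra-joblib-launcher` is left out because it installs into Hydra's plugin namespace rather than a top-level module.

```python
import importlib.util

# Assumed import names for the pip packages installed above (pip name -> module name).
EXPECTED_MODULES = {
    "wget": "wget",
    "wandb": "wandb",
    "fair-esm": "esm",
    "biotite": "biotite",
    "pyrootutils": "pyrootutils",
    "easydict": "easydict",
    "biopython": "Bio",
    "tqdm": "tqdm",
    "ml-collections": "ml_collections",
    "mdtraj": "mdtraj",
    "GPUtil": "GPUtil",
    "dm-tree": "tree",
    "tmtools": "tmtools",
    "pdbfixer": "pdbfixer",
    "hydra-core": "hydra",
}


def missing_modules(expected: dict[str, str]) -> list[str]:
    """Return the pip names whose import module cannot be located."""
    missing = []
    for pip_name, module_name in expected.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # find_spec raises for dotted names whose parent package is absent.
            found = False
        if not found:
            missing.append(pip_name)
    return missing
```

If the installs succeeded, `print(missing_modules(EXPECTED_MODULES))` should print an empty list. `find_spec` only locates a module without importing it, so it will not catch packages that fail at import time.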

## Inference

In [2]:
#@title ### Preprocess receptors
%cd DiffPepBuilder
!python experiments/process_receptor.py --pdb_dir examples/receptor_data --write_dir data/receptor_data --peptide_info_path examples/receptor_data/de_novo_cases.json

/content/DiffPepBuilder
Files will be written to data/receptor_data
Finished examples/receptor_data/alk1.pdb in 0.12s
Finished processing 1/1 files. Start ESM embedding...
Model file /content/DiffPepBuilder/experiments/checkpoints/esm2_t33_650M_UR50D.pt not found. Downloading...
Model file /content/DiffPepBuilder/experiments/checkpoints/esm2_t33_650M_UR50D-contact-regression.pt not found. Downloading...
Read sequence data with 1 sequences
Processing protein sequence batches:   0% 0/1 [00:00<?, ?it/s]
Processing 1 of 1 batches (1 sequences)
Processing protein sequence batches: 100% 1/1 [00:01<00:00,  1.35s/it]
100% 1/1 [00:00<00:00, 89.06it/s]
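The generation cell passes `data/receptor_data/metadata_test.csv` to the sampler as `data.val_csv_path`, so it is worth checking that the file exists before sampling. Below is a minimal sketch with Python's `csv` module. It assumes the preprocessing step is what writes this file into `data/receptor_data`, and it deliberately makes no assumption about the column names, which we have not checked against DiffPepBuilder's source. `summarize_metadata` is a hypothetical helper.

```python
import csv
from pathlib import Path


def summarize_metadata(csv_path: str) -> tuple[list[str], int]:
    """Return (column names, number of data rows) for a metadata CSV."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; did preprocessing finish?")
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames is None for an empty file.
        columns = list(reader.fieldnames or [])
    return columns, len(rows)
```

Calling `summarize_metadata("data/receptor_data/metadata_test.csv")` from `/content/DiffPepBuilder` should report a single row here, presumably one per receptor, in line with the `Validation: 1 examples` line in the sampling log.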


In [9]:
#@title ### Run *de novo* generation
import os
os.environ['BASE_PATH'] = "/content/DiffPepBuilder"

!torchrun --nproc-per-node=1 experiments/run_inference.py \
  data.val_csv_path=data/receptor_data/metadata_test.csv \
  experiment.use_ddp=False \
  experiment.num_gpus=1 \
  inference.sampling.min_length=12 \
  inference.sampling.max_length=16

[2024-07-22 18:40:12,090][experiments.train][INFO] - Loading checkpoint from /content/DiffPepBuilder/experiments/checkpoints/diffpepbuilder_v1.pth
[2024-07-22 18:40:17,977][data.so3_diffuser][INFO] - Computing IGSO3. Saving in /content/DiffPepBuilder/tests/cache/eps_1000_omega_1000_min_sigma_0_1_max_sigma_1_5_schedule_logarithmic
[2024-07-22 18:41:30,611][experiments.train][INFO] - Number of model parameters: 103.66 M
[2024-07-22 18:41:33,156][experiments.train][INFO] - Evaluation mode only, no checkpoint being saved.
[2024-07-22 18:41:33,159][experiments.train][INFO] - Evaluation saved to: /content/DiffPepBuilder/tests/inference_outputs/inference/22D_07M_2024Y_18h_41m
[2024-07-22 18:41:33,294][experiments.train][INFO] - Using device: cuda:0
[2024-07-22 18:41:33,306][data.pdb_data_loader][INFO] - Validation: 1 examples
  output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not(), mask_check=False)
[2024-07-22 18:42:32,574][experiments.train][INFO] - Done sample 
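Each run writes into a timestamped folder under `tests/inference_outputs/inference/`, such as `22D_07M_2024Y_18h_41m` in the log above. Because the day comes first, sorting these names as strings does not sort them chronologically. The sketch below parses the names with `datetime.strptime` instead. The format string `%dD_%mM_%YY_%Hh_%Mm` is our inference from that single log line, not something DiffPepBuilder documents, and `latest_run_dir` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Inferred from ".../inference/22D_07M_2024Y_18h_41m" in the log; an assumption.
RUN_DIR_FORMAT = "%dD_%mM_%YY_%Hh_%Mm"


def parse_run_name(name: str) -> Optional[datetime]:
    """Return the timestamp encoded in a run directory name, or None if it does not match."""
    try:
        return datetime.strptime(name, RUN_DIR_FORMAT)
    except ValueError:
        return None


def latest_run_dir(root: str) -> Optional[Path]:
    """Return the most recent timestamped run directory under root, or None if there is none."""
    runs = []
    for child in Path(root).iterdir():
        stamp = parse_run_name(child.name)
        if child.is_dir() and stamp is not None:
            runs.append((stamp, child))
    if not runs:
        return None
    return max(runs, key=lambda run: run[0])[1]
```

From `/content/DiffPepBuilder`, `latest_run_dir("tests/inference_outputs/inference")` should return the folder of the most recent sampling run. Folders whose names do not match the format are skipped rather than guessed at.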

In [None]:
#@title ## Evaluation
#@markdown **Note:** This may take ~15 min
!wget https://downloads.rosettacommons.org/downloads/academic/3.14/rosetta_bin_linux_3.14_bundle.tar.bz2
!tar -xvjf rosetta_bin_linux_3.14_bundle.tar.bz2
!rm -f rosetta_bin_linux_3.14_bundle.tar.bz2
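# `!export` runs in a throwaway subshell, so the variable would be lost; %env sets it for the kernel and later `!` commands
%env ROSETTA_BIN_PATH=rosetta.binary.linux.release-371/main/source/bin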
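Before running the evaluation, a quick check that `ROSETTA_BIN_PATH` is visible to the kernel and points at an existing directory can save a long debugging session. Below is a minimal sketch; `resolve_rosetta_bin` and `list_executables` are hypothetical helpers, and the listing makes no assumption about which Rosetta binaries the bundle contains.

```python
import os
from pathlib import Path


def resolve_rosetta_bin(var: str = "ROSETTA_BIN_PATH") -> Path:
    """Return the absolute directory named by the environment variable, or raise."""
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set in this kernel's environment")
    # Relative paths are resolved against the current working directory.
    path = Path(value).resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"{var}={value!r} does not point to a directory")
    return path


def list_executables(bin_dir: Path) -> list[str]:
    """Return the sorted names of executable regular files (or symlinks to them) in bin_dir."""
    return sorted(
        p.name for p in bin_dir.iterdir()
        if p.is_file() and os.access(p, os.X_OK)
    )
```

A shell `export` issued through `!` does not reach the kernel, so the variable has to be set with `%env` or `os.environ` for this check, and for later `!` commands, to see it. The relative Rosetta path resolves against `/content/DiffPepBuilder`, where the earlier `%cd` left the notebook.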