AbFlow addresses antibody design centered on the complementarity-determining regions (CDRs). It tackles two problems: coupling local CDR generation with all-atom information propagation, and incorporating fine-grained structural information about the antigen. It does so through local flow matching and an antigen-surface-enhanced message-passing mechanism.
- Paratope-Centric Design: Focus on specific CDR-H3 regions
- Multi-CDRs Design: Design multiple complementarity-determining regions (CDRs) simultaneously
- Affinity Optimization: Optimize antibody-antigen binding affinity
- Structure Prediction: Predict antibody structure from sequence
- Python 3.10.14
- CUDA 12.4 (for GPU support)
- Conda (recommended)
git clone https://github.com/wenda8759/AbFlow.git
cd AbFlow
conda env create -f environment.yaml
conda activate AbFlow
pip install torch==2.6.0 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124
pip install torch_scatter -f https://data.pyg.org/whl/torch-2.6.0+cu124.html
pip install -r requirements.txt
cd DockQ
make
cd ..

All datasets and pre-trained model weights are available on Hugging Face:
🤗 https://huggingface.co/wenda8759/AbFlow
# Install huggingface_hub if not already installed
pip install huggingface_hub
# Download all files
huggingface-cli download wenda8759/AbFlow --local-dir ./
# Or download specific files
huggingface-cli download wenda8759/AbFlow checkpoints/multi_cdr_design.ckpt --local-dir ./

from huggingface_hub import snapshot_download, hf_hub_download
# Download entire repository
snapshot_download(repo_id="wenda8759/AbFlow", local_dir="./")
# Or download specific file
hf_hub_download(
repo_id="wenda8759/AbFlow",
filename="checkpoints/multi_cdr_design.ckpt",
local_dir="./"
)

| Model | Description | File |
|---|---|---|
| Paratope-Centric Design | Design based on epitope | checkpoints/paratope_centric_design.ckpt |
| Multi-CDR Design | Design all 6 CDR regions | checkpoints/multi_cdr_design.ckpt |
| Structure Prediction | Predict antibody structure | checkpoints/structure_prediction.ckpt |
| Affinity Optimization | Optimize binding affinity | checkpoints/affinity_optimization.ckpt |
| ΔΔG Predictor | Predict binding energy changes | checkpoints/ddg_predictor.ckpt |
After downloading, organize your data as follows:
AbFlow/
├── datasets/
│   ├── RAbD/
│   │   ├── train.json
│   │   ├── valid.json
│   │   ├── test.json
│   │   ├── train.pkl
│   │   ├── valid.pkl
│   │   ├── test.pkl
│   │   ├── train_surf.pkl
│   │   ├── valid_surf.pkl
│   │   └── test_surf.pkl
│   └── IgFold/
│       ├── train.json
│       ├── valid.json
│       ├── test.json
│       └── ...
└── checkpoints/
    ├── multi_cdr_design.ckpt
    ├── structure_prediction.ckpt
    └── ...
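
To confirm the download landed in the expected place, a small check along these lines may help. It is a minimal sketch, not part of the repository: `EXPECTED_FILES` and `missing_files` are hypothetical names, and only the files spelled out in the layout above are checked (entries shown as `...` are skipped).

```python
from pathlib import Path

# Paths copied from the layout above; entries shown as "..." there are not checked.
EXPECTED_FILES = [
    *(f"datasets/RAbD/{split}{suffix}"
      for suffix in (".json", ".pkl", "_surf.pkl")
      for split in ("train", "valid", "test")),
    *(f"datasets/IgFold/{split}.json" for split in ("train", "valid", "test")),
    "checkpoints/multi_cdr_design.ckpt",
    "checkpoints/structure_prediction.ckpt",
]


def missing_files(root="."):
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All listed files are present.")
```

Run it from the AbFlow/ root; an empty result means every listed file was found.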
# Paratope-CDR Design
GPU=0,1 bash scripts/train/train.sh scripts/train/configs/single_cdr_design.json
# Multi-CDR Design
GPU=0,1 bash scripts/train/train.sh scripts/train/configs/multi_cdr_design.json
# Structure Prediction
GPU=0,1 bash scripts/train/train.sh scripts/train/configs/struct_prediction.json
# Affinity Optimization (a ΔΔG predictor needs to be trained additionally)
GPU=0,1 bash scripts/train/train.sh scripts/train/configs/single_cdr_opt.json
GPU=0 bash scripts/train/train_predictor.sh checkpoints/cdrh3_opt.ckpt

# Basic usage
GPU=0 bash scripts/test/test.sh <checkpoint> <test_set> [save_dir] [task]
# Example: Test multi-CDR design on RAbD dataset
GPU=0 bash scripts/test/test.sh \
checkpoints/multi_cdr_design.ckpt \
datasets/RAbD/test.json \
results/multi_cdr_design \
rabd

GPU=0 bash scripts/test/test.sh \
checkpoints/structure_prediction.ckpt \
datasets/IgFold/test.json \
results/struct_pred \
igfold

GPU=0 bash scripts/test/optimize_test.sh \
checkpoints/affinity_optimization.ckpt \
checkpoints/ddg_predictor.ckpt \
datasets/SKEMPI/test.json \
0 \
50

This runs 50 steps of gradient search with no restriction on the maximum number of changed residues (change 0 to any number to set an upper bound on that count).
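
The positional arguments of the optimization command are easy to mix up, so a thin wrapper that names them can be handy when scripting several runs. The following is a sketch only: `build_optimize_command` and `run_optimize` are hypothetical helpers, and the argument order is taken from the example above (design checkpoint, ΔΔG predictor checkpoint, test set, maximum changed residues, search steps).

```python
import os
import subprocess


def build_optimize_command(checkpoint, predictor, test_set, max_changes=0, steps=50):
    """Return the argv list for scripts/test/optimize_test.sh.

    max_changes=0 means no limit on the number of changed residues,
    matching the example invocation in this README.
    """
    return [
        "bash",
        "scripts/test/optimize_test.sh",
        checkpoint,
        predictor,
        test_set,
        str(max_changes),
        str(steps),
    ]


def run_optimize(gpu="0", **kwargs):
    """Run the script with the GPU environment variable set, as in `GPU=0 bash ...`."""
    env = dict(os.environ, GPU=gpu)
    return subprocess.run(build_optimize_command(**kwargs), env=env, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to execute anywhere.
    print(" ".join(build_optimize_command(
        "checkpoints/affinity_optimization.ckpt",
        "checkpoints/ddg_predictor.ckpt",
        "datasets/SKEMPI/test.json",
    )))
```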
The following directory structure is required:
base_dir/
├── prot_ids.txt          # One PDB entry name per line (e.g., 1abc_H_L_A)
├── fasta/
│   ├── 1abc_H_L_A.fasta  # One fasta file per entry
│   ├── 2xyz_H_L_B.fasta
│   └── ...
└── pdb/
    ├── 1abc.pdb          # Corresponding PDB structure files
    ├── 2xyz.pdb
    └── ...
File format requirements:
- prot_ids.txt: One entry name per line; the first 4 characters must be the PDB ID (e.g., 1abc_H_L_A)
- fasta/*.fasta: Each file must contain at least 2 sequences, labeled as Heavy chain (H), Light chain (L), and optionally Antigen chain (A):

      >H
      QVQLQESGPGLVKPSETLSLTCTVSGSSLTSYGVHWVRQPPGKGLEGLGVIWPGGSTNYNSALMSRVTI
      SKDNSKSQVSLKMSSLTAADTAVYYCARVTGTWYFDVWGQGTTVTVSS
      >L
      DIQMTQSPSSLSASLGDRVTISCSASQGISNYLNWYQQKPDGTVKLLIYYTSTLHSGVPSRFSGSGSGT
      DYTLTISSLQPEDIATYYCQQYSKLPWTFGGGTKLEIK
      >A
      LQDPCSNCPAGTFCDNNRNQICSPCPPNSFSSAGGQRTCDICRQCKGVFRTRKECSSTSNAECDCTPGFH
      CLGAGCSMCEQDCKQGQELTKKGCKDCCFGTFNDQKRGICRPWTNCSLDGKSVLVNGTKERDVVCGPSPA
      DLSPGASSVTPPAPAREPGHSPQLEGGGHHHHHH

  >H denotes the Heavy chain, >L denotes the Light chain, and >A denotes the Antigen chain. Multiple antigen chains (e.g., >A, >B) are supported; antigen chains are optional.
- pdb/: Raw PDB structure files, used by data/download.py in subsequent steps.
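
Before running the preprocessing scripts, it can be useful to check a base directory against the requirements listed above. The sketch below is an illustration rather than repository code: `read_fasta_labels` and `check_base_dir` are hypothetical helpers, and the check covers only what the list states (one fasta file per entry, at least two sequences per fasta file, and a PDB file named after the first 4 characters of the entry).

```python
from pathlib import Path


def read_fasta_labels(path):
    """Return the header labels (text after '>') found in a FASTA file."""
    labels = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                labels.append(line[1:].strip())
    return labels


def check_base_dir(base_dir):
    """Return a list of human-readable problems found under base_dir."""
    base = Path(base_dir)
    ids_file = base / "prot_ids.txt"
    if not ids_file.is_file():
        return [f"missing {ids_file.name}"]

    problems = []
    entries = [ln.strip() for ln in ids_file.read_text().splitlines() if ln.strip()]
    for entry in entries:
        pdb_id = entry[:4]  # the first 4 characters must be the PDB ID
        fasta = base / "fasta" / f"{entry}.fasta"
        pdb = base / "pdb" / f"{pdb_id}.pdb"
        if not fasta.is_file():
            problems.append(f"{entry}: missing fasta/{fasta.name}")
        elif len(read_fasta_labels(fasta)) < 2:
            problems.append(f"{entry}: fewer than 2 sequences in fasta/{fasta.name}")
        if not pdb.is_file():
            problems.append(f"{entry}: missing pdb/{pdb.name}")
    return problems


if __name__ == "__main__":
    import sys

    for problem in check_base_dir(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(problem)
```

An empty result means the directory satisfies the checks covered here; it does not validate the sequences or the PDB contents themselves.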
Run the following script to automatically generate summary.tsv from prot_ids.txt and the fasta/ directory:
python create_summary.py --base_dir <your/base/directory>

The output summary.tsv has the following format:
pdb Hchain Lchain antigen_chain antigen_type
1abc H L A protein
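
To make the mapping from an entry name to a summary row concrete, here is a minimal sketch. It is not the repository's create_summary.py, whose logic is not shown here. It assumes entry names follow the `<pdb>_<Hchain>_<Lchain>_<antigen_chain>` pattern of the example `1abc_H_L_A`, that the file is tab-separated, and that the antigen type is always `protein`; `entry_to_row` and `write_summary` are hypothetical names.

```python
import csv
import io

COLUMNS = ["pdb", "Hchain", "Lchain", "antigen_chain", "antigen_type"]


def entry_to_row(entry, antigen_type="protein"):
    """Split an entry such as '1abc_H_L_A' into one summary row (assumed naming pattern)."""
    parts = entry.strip().split("_")
    if len(parts) != 4:
        raise ValueError(f"expected <pdb>_<H>_<L>_<antigen>, got {entry!r}")
    pdb, hchain, lchain, antigen_chain = parts
    return {
        "pdb": pdb,
        "Hchain": hchain,
        "Lchain": lchain,
        "antigen_chain": antigen_chain,
        "antigen_type": antigen_type,  # assumption: fixed value, as in the example row
    }


def write_summary(entries, handle):
    """Write a tab-separated summary with a header row to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        writer.writerow(entry_to_row(entry))


if __name__ == "__main__":
    buffer = io.StringIO()
    write_summary(["1abc_H_L_A"], buffer)
    print(buffer.getvalue(), end="")
```

Entries with several antigen chains or without an antigen do not fit this four-part pattern and would need handling that this sketch does not attempt.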
Using summary.tsv and the PDB structures in pdb/, generate the standard antibody_data.json:
python data/download.py \
--summary <base_dir>/summary.tsv \
--fout <base_dir>/<dataset>.json \
--type sabdab \
--pdb_dir <base_dir>/pdb \
--numbering imgt \
--pre_numbered \
--n_cpu 8

- --pre_numbered: Indicates that the PDB files are already IMGT-numbered; skips the renumbering step
- --numbering imgt: Uses the IMGT numbering scheme to parse CDR regions
- --n_cpu 8: Number of CPU cores for parallel processing; adjust according to your server
The output <dataset>.json is the standard dataset format required for inference.
Install MSMS (required for surface computation):
conda install -c bioconda msms

Compute the molecular surface feature file:
from data.surface import generate_surf_pkl
generate_surf_pkl("<dataset>.json", "<surface>.pkl")

The output <surface>.pkl contains the antigen surface features required for inference.
This project is licensed under the MIT License - see the LICENSE file for details.
- DockQ for protein docking quality assessment
- IgFold for antibody structure prediction baseline
- SAbDab for antibody structure database
For questions or issues, please open an issue on GitHub or contact the authors.
