- Clone the package
git clone https://github.com/YaoYinYing/rf_diffusion_all_atom.git
cd rf_diffusion_all_atom
git switch pip-installable # switch to pip installable branch
- Download the model weights.
wget http://files.ipd.uw.edu/pub/RF-All-Atom/weights/RFDiffusionAA_paper_weights.pt
- Use the same conda environment as RF2AA:
conda activate rf2aa # assumes you have already created the conda env and installed RF2AA
- Install the remaining dependencies:
pip install 'libclang>=13.0.0' 'protobuf<3.20,>=3.9.2'
pip install -r requirements.txt
- Install RFdiffusionAA:
pip install .
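After the install step, it can help to confirm that the console script is visible in the active environment. The following is a minimal sketch; `check_cli` is a hypothetical helper name, and `rfdaa_inference` is the command used in the examples in this document.

```shell
#!/bin/sh
# Sketch: confirm that a command is on PATH in the current environment.
# check_cli is an illustrative helper, not part of RFdiffusionAA.
check_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found on PATH; is the rf2aa environment active?" >&2
    return 1
  fi
}

check_cli rfdaa_inference || echo "Re-run 'pip install .' inside the rf2aa environment."
```

If the command is reported missing, the most likely causes are that the conda environment is not activated or that `pip install .` was run in a different environment.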
To generate a binder to the ligand OQO from PDB 7v11, run the following:
Example (ligand binder):
HYDRA_FULL_ERROR=1 rfdaa_inference inference.ckpt_path=/path/to/weights/RFdiffusionAA/RFDiffusionAA_paper_weights.pt inference.deterministic=True diffuser.T=100 inference.output_prefix=output/ligand_only/sample inference.input_pdb=input/7v11.pdb 'contigmap.contigs=[150-150]' inference.ligand=OQO inference.num_designs=1 inference.design_startnum=0
Explanation of arguments:
- inference.ckpt_path specifies the path to the checkpoint file.
- inference.deterministic=True seeds the random number generators so that results are reproducible, i.e. rerunning with the same inference.design_startnum=X will produce the same results. Note that torch does not guarantee reproducibility across CPU/GPU architectures: https://pytorch.org/docs/stable/notes/randomness.html
- inference.num_designs=1 specifies that 1 design will be generated.
- 'contigmap.contigs=[150-150]' (note the single quotes) specifies that the generated protein should be 150 residues long.
- diffuser.T=100 specifies the number of denoising steps taken.
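The one-line command above is easier to read and edit when each override sits on its own line. The following is a minimal sketch of such a wrapper: the function name `run_ligand_binder` and the variables `WEIGHTS`, `OUT_PREFIX` and `DRY_RUN` are illustrative assumptions, not options of `rfdaa_inference`; the override values are copied from the example.

```shell
#!/bin/sh
# Sketch: wrap the ligand-binder example with one override per line.
# WEIGHTS, OUT_PREFIX and DRY_RUN are illustrative names, not rfdaa_inference options.
WEIGHTS="${WEIGHTS:-/path/to/weights/RFdiffusionAA/RFDiffusionAA_paper_weights.pt}"
OUT_PREFIX="${OUT_PREFIX:-output/ligand_only/sample}"

run_ligand_binder() {
  # Positional parameters keep every override a single, correctly quoted word.
  set -- \
    "inference.ckpt_path=${WEIGHTS}" \
    "inference.deterministic=True" \
    "diffuser.T=100" \
    "inference.output_prefix=${OUT_PREFIX}" \
    "inference.input_pdb=input/7v11.pdb" \
    "contigmap.contigs=[150-150]" \
    "inference.ligand=OQO" \
    "inference.num_designs=1" \
    "inference.design_startnum=0"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (the default here): only print the overrides, one per line.
    printf '%s\n' "$@"
  else
    HYDRA_FULL_ERROR=1 rfdaa_inference "$@"
  fi
}

run_ligand_binder
```

Setting `DRY_RUN=0` in the environment runs the actual command. Quoting the `contigmap.contigs` entry keeps the shell from treating the square brackets as a glob pattern.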
Expected outputs:
- output/ligand_only/sample_0.pdb: the design PDB
- output/ligand_only/sample_0_Xt-1_traj.pdb: the partially denoised intermediate structures
- output/ligand_only/sample_0_X0-1_traj.pdb: the predictions of the ground truth made by the network at each step
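A quick way to check that a run finished is to look for the three files listed above. This sketch assumes that the file names follow the pattern `<output_prefix>_<design number>...` seen in that listing; `expected_outputs` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the files one design is expected to produce.
# Assumption: names follow <prefix>_<n>..., as in the listing above.
expected_outputs() {
  prefix=$1   # the value passed as inference.output_prefix
  n=$2        # the design number (0 in the example)
  printf '%s\n' \
    "${prefix}_${n}.pdb" \
    "${prefix}_${n}_Xt-1_traj.pdb" \
    "${prefix}_${n}_X0-1_traj.pdb"
}

# Report which of the expected files are present after a run.
expected_outputs output/ligand_only/sample 0 | while IFS= read -r f; do
  if [ -f "$f" ]; then
    echo "found:   $f"
  else
    echo "missing: $f"
  fi
done
```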
Note that the sequences associated with these structures have no meaning, apart from the given motif. LigandMPNN or a similar tool must be used to generate sequences for the backbones if they are to be used for structure prediction or expression.
To include protein residues in the motif (for example A430-435), use the argument contigmap.contigs. For instance, contigmap.contigs=[\'10-120,A84-87,10-120\'] tells the model to design a protein containing the 4-residue motif A84-87 with 10-120 residues on either side.
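To see what total lengths a contig string allows, the segments can be summed by hand or with a small script. The sketch below is a simplified reading of the syntax described above, not the parser used by RFdiffusionAA: it assumes comma-separated segments that are either a length range such as `10-120` or a motif given as a chain letter plus a residue range such as `A84-87`. `contig_length_range` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: minimum and maximum total length implied by a contig string.
# Simplified assumption: segments are "min-max" ranges or "<chain><start>-<end>" motifs.
contig_length_range() (
  # The function body runs in a subshell, so IFS and set -f do not leak out.
  IFS=','
  set -f
  min_total=0
  max_total=0
  for seg in $1; do
    case $seg in
      [A-Za-z]*) range=${seg#?}; motif=1 ;;   # strip the chain letter
      *)         range=$seg;     motif=0 ;;
    esac
    lo=${range%%-*}
    hi=${range##*-}
    if [ "$motif" -eq 1 ]; then
      # A motif contributes a fixed number of residues.
      len=$(( $hi - $lo + 1 ))
      min_total=$(( $min_total + $len ))
      max_total=$(( $max_total + $len ))
    else
      # A free segment contributes anywhere between lo and hi residues.
      min_total=$(( $min_total + $lo ))
      max_total=$(( $max_total + $hi ))
    fi
  done
  echo "$min_total $max_total"
)

contig_length_range "10-120,A84-87,10-120"   # prints "24 244"
contig_length_range "150-150"                # prints "150 150"
```

For the motif example, the two flanking segments contribute 10 to 120 residues each and the motif contributes exactly 4, giving a total between 24 and 244 residues.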
Example (ligand binder with protein motif):
HYDRA_FULL_ERROR=1 rfdaa_inference inference.ckpt_path=/path/to/weights/RFdiffusionAA/RFDiffusionAA_paper_weights.pt inference.deterministic=True diffuser.T=200 inference.output_prefix=output/ligand_protein_motif/sample inference.input_pdb=input/1haz.pdb 'contigmap.contigs=[10-120,A84-87,10-120]' inference.ligand=CYC inference.num_designs=1 inference.design_startnum=0
- 'contigmap.length="150-150"' has been dropped from the commands above because it causes an error such as:
Contig string incompatible with --length range
An end-to-end design pipeline illustrating the design of heme-binding proteins using RFdiffusionAA, proteinMPNN, AlphaFold2, LigandMPNN and PyRosetta is available at: https://github.com/ikalvet/heme_binder_diffusion