# RFDiffusion

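RFDiffusion is a state-of-the-art method for computational protein structure design. This notebook walks through its main use cases: unconditional design, self-conditioned design against a reference structure, sequence inpainting, partial diffusion, and scaffold-guided design.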

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from prtm import protein
from prtm.models.rfdiffusion import config
from prtm.models.rfdiffusion.modeling import RFDiffusionForStructureDesign
from prtm.models.rfdiffusion.samplers import UnconditionalSampler, SelfConditioningSampler, ScaffoldedSampler

There are 8 different sets of model weights for RFDiffusion, and the right set depends on the use case. When `model_name` is set to `auto`, the weights to load are determined by the input inference parameters. Unlike other pipelines, this means the model is loaded at runtime instead of at instantiation.

In [None]:
designer = RFDiffusionForStructureDesign(model_name="auto", random_seed=0)

## Unconditional Design

First, let's look at the simplest design setup where structures are either unconstrained or only lightly constrained by a guiding potential.

The `UnconditionalSamplerConfig` is used when there are no reference structures or scaffolds to be used in design. The only argument that must be set for this sampler is `contigmap_params`. `RFDiffusion` uses a `ContigMap` and `contig` string to specify the lengths of generated chains and the number of chains. For example:

- `["50-50"]` will create a single chain of exactly 50 residues.
- `["50-100"]` will create a single chain with anywhere from 50 to 100 residues.
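
As a rough illustration of how a length range in a contig can be read, the sketch below draws a chain length from a `min-max` string. The `sample_length` helper is hypothetical and is not part of `prtm`; uniform sampling over the inclusive range is an assumption made only for this example.

```python
import random


def sample_length(segment: str, rng: random.Random) -> int:
    """Pick a chain length from a "min-max" range such as "50-100" (inclusive)."""
    low, _, high = segment.partition("-")
    low_n = int(low)
    high_n = int(high) if high else low_n  # a bare "50" is treated like "50-50"
    if low_n > high_n:
        raise ValueError(f"empty length range: {segment!r}")
    # Assumption: lengths are drawn uniformly from the inclusive range.
    return rng.randint(low_n, high_n)


rng = random.Random(0)
fixed = sample_length("50-50", rng)      # always 50
variable = sample_length("50-100", rng)  # some value between 50 and 100
print(fixed, variable)
```

A range whose bounds are equal, such as `50-50`, leaves no freedom, which is why it yields a chain of exactly that length.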

In addition to specifying the length of the protein to be designed, we can provide symmetry and potential constraints to the sampler. Choices for symmetry are:

- `tetrahedral`
- `octahedral`
- `icosahedral`
- C symmetries (`C2`, `C3`, ...)
- D symmetries (`D2`, `D3`, ...)

Examples of using symmetries and potentials are provided in the following sections of this notebook.
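
Each symmetry choice implies a number of symmetry-related copies, given by the order of the corresponding rotation group: `Cn` has n copies, `Dn` has 2n, and the tetrahedral, octahedral, and icosahedral groups have 12, 24, and 60. The sketch below computes that count from a symmetry string. The `symmetry_order` helper is hypothetical and only illustrates the group orders; it does not show how `prtm` interprets contig lengths for symmetric designs.

```python
import re

# Orders of the polyhedral rotation groups.
POLYHEDRAL_ORDER = {"tetrahedral": 12, "octahedral": 24, "icosahedral": 60}


def symmetry_order(symmetry: str) -> int:
    """Return the number of symmetry-related copies for a symmetry name."""
    name = symmetry.strip()
    if name.lower() in POLYHEDRAL_ORDER:
        return POLYHEDRAL_ORDER[name.lower()]
    match = re.fullmatch(r"([CD])(\d+)", name.upper())
    if match is None or int(match.group(2)) < 1:
        raise ValueError(f"unrecognized symmetry: {symmetry!r}")
    n = int(match.group(2))
    # Cyclic groups have n rotations; dihedral groups add n two-fold axes.
    return n if match.group(1) == "C" else 2 * n


orders = {name: symmetry_order(name) for name in ["C6", "D2", "tetrahedral"]}
print(orders)
```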

### Fixed Length

Generate a structure with an exact number of residues.

In [None]:
sampler_config = config.UnconditionalSamplerConfig(
    contigmap_params=config.ContigMap(contigs=["50-50"]),
)
designed_structure, _ = designer(sampler_config)

In [None]:
# Note: every residue in the designed structure is a glycine, which has no side chain
designed_structure.show()

### Monomer ROG Potential

Add a potential during design to encourage a smaller radius of gyration for the monomer.
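
For reference, the radius of gyration of a set of points is the root-mean-square distance from their centroid. The sketch below computes it for equally weighted coordinates (for example, one point per residue). The `radius_of_gyration` helper is illustrative only; how the `monomer_ROG` potential computes or penalizes this quantity internally is not shown here.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of equally weighted points from their centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    centroid = [sum(point[i] for point in coords) / n for i in range(3)]
    mean_sq = sum(
        sum((point[i] - centroid[i]) ** 2 for i in range(3)) for point in coords
    ) / n
    return math.sqrt(mean_sq)


# Eight points in a straight line versus eight points on the corners of a cube.
extended = [(3.8 * i, 0.0, 0.0) for i in range(8)]
compact = [(x, y, z) for x in (0.0, 3.8) for y in (0.0, 3.8) for z in (0.0, 3.8)]
rg_extended = radius_of_gyration(extended)
rg_compact = radius_of_gyration(compact)
print(rg_extended > rg_compact)
```

The compact arrangement has the smaller radius of gyration, which is the direction this kind of potential pushes a design.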

In [None]:
sampler_config = config.UnconditionalSamplerConfig(
    # Variable number of residues from 100 to 200
    contigmap_params=config.ContigMap(contigs=["100-200"]),
    # Potentials are defined in potentials.py and arguments are delimited by commas and colons
    potentials_params=config.PotentialsParams(
        guiding_potentials=["type:monomer_ROG,weight:1,min_dist:5"],
        guide_scale=2,
        guide_decay="quadratic",
    ),
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()

### Contact Potential

Add a potential during design to encourage more contacts in a monomer.
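
Guiding potentials are passed as strings of comma-separated `key:value` pairs. To make that format concrete, the sketch below splits such a string into a dictionary. The `parse_potential` helper is hypothetical and is not the parser used by `prtm`; converting numeric values to floats is an assumption made for readability.

```python
def parse_potential(spec: str) -> dict:
    """Split "key:value,key:value" into a dict; numeric values become floats."""
    settings = {}
    for field in spec.split(","):
        key, sep, value = field.partition(":")
        if not sep or not key:
            raise ValueError(f"malformed field: {field!r}")
        try:
            settings[key] = float(value)
        except ValueError:
            settings[key] = value  # non-numeric values, such as the potential type
    return settings


parsed = parse_potential("type:monomer_contacts,weight:0.05")
print(parsed)
```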

In [None]:
sampler_config = config.UnconditionalSamplerConfig(
    contigmap_params=config.ContigMap(contigs=["100-200"]),
    potentials_params=config.PotentialsParams(guiding_potentials=["type:monomer_contacts,weight:0.05"]),
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()

### Tetrahedral Oligos

Generate a protein with tetrahedral symmetry and add a potential to encourage more contacts within and between chains.
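
The oligomer examples pass `guide_scale` and `guide_decay="quadratic"` alongside the potential. One plausible reading, stated here as an assumption rather than as the `prtm` implementation, is that the guidance strength shrinks with the square of the remaining fraction of diffusion steps, so the potential matters most early in denoising. The `decayed_guide_scale` function, its `power` argument, and the step count of 50 are all hypothetical.

```python
def decayed_guide_scale(t: int, total_steps: int, guide_scale: float, power: int = 2) -> float:
    """Scale guidance by (t / total_steps) ** power; power=2 is a quadratic decay.

    Assumption: t counts down from total_steps (most noisy) toward 0 (final structure).
    """
    if total_steps <= 0 or not 0 <= t <= total_steps:
        raise ValueError("t must lie in [0, total_steps] with total_steps > 0")
    return guide_scale * (t / total_steps) ** power


# Guidance strength at the start, the midpoint, and the last step of a 50-step run.
schedule = [decayed_guide_scale(t, 50, 2.0) for t in (50, 25, 1)]
print(schedule)
```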

In [None]:
sampler_config = config.UnconditionalSamplerConfig(
    contigmap_params=config.ContigMap(contigs=["240-240"]),
    symmetry_params=config.SymmetryParams(symmetry="tetrahedral"),
    potentials_params=config.PotentialsParams(
        guiding_potentials=["type:olig_contacts,weight_intra:1,weight_inter:0.1"],
        olig_inter_all=True,
        olig_intra_all=True,
        guide_scale=2,
        guide_decay="quadratic",
    ),
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()

### Cyclic Oligos

Generate a protein with 6-fold cyclic symmetry and add a potential to encourage more contacts within and between chains.
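
Geometrically, 6-fold cyclic symmetry means that rotating the assembly by any multiple of 60 degrees about a single axis maps it onto itself. The sketch below builds those rotations about the z axis and applies them to one point. The helper names are hypothetical, and the choice of the z axis is an assumption for illustration, not a statement about the coordinate frame `prtm` uses.

```python
import math


def cyclic_rotations(n: int):
    """Return n 3x3 rotation matrices about z, by multiples of 360/n degrees."""
    if n < 1:
        raise ValueError("n must be at least 1")
    matrices = []
    for k in range(n):
        angle = 2.0 * math.pi * k / n
        c, s = math.cos(angle), math.sin(angle)
        matrices.append([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return matrices


def rotate(matrix, point):
    """Apply a 3x3 matrix to a 3D point."""
    return tuple(sum(matrix[r][c] * point[c] for c in range(3)) for r in range(3))


# Place six copies of a subunit center 20 units from the symmetry axis.
subunit_center = (20.0, 0.0, 0.0)
centers = [rotate(m, subunit_center) for m in cyclic_rotations(6)]
for center in centers:
    print(tuple(round(v, 3) for v in center))
```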

In [None]:
sampler_config = config.UnconditionalSamplerConfig(
    contigmap_params=config.ContigMap(contigs=["90-90"]),
    symmetry_params=config.SymmetryParams(symmetry="C6"),
    potentials_params=config.PotentialsParams(
        guiding_potentials=["type:olig_contacts,weight_intra:1,weight_inter:0.1"], 
        olig_intra_all=True, 
        olig_inter_all=True, 
        guide_scale=2.0, 
        guide_decay="quadratic",
    ),
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()

### Dihedral Oligos

Generate a protein with dihedral symmetry and add a potential to encourage more contacts within and between chains.

In [None]:
sampler_config = config.UnconditionalSamplerConfig(
    contigmap_params=config.ContigMap(contigs=["120-120"]),
    symmetry_params=config.SymmetryParams(symmetry="D2"),
    potentials_params=config.PotentialsParams(
        guiding_potentials=["type:olig_contacts,weight_intra:1,weight_inter:0.1"], 
        olig_intra_all=True, 
        olig_inter_all=True, 
        guide_scale=2.0, 
        guide_decay="quadratic",
    ),
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()

## Self Conditioning Design

Now we'll look at a more sophisticated design problem, where a reference structure is provided for motif scaffolding and binder design.

The `SelfConditioningSamplerConfig` is used with a reference structure, which can be any `protein.Protein` object. As with the `UnconditionalSamplerConfig`, we can set potentials and symmetries, but now we can also specify `contigs` that incorporate residues from the reference structure. For example:

- `["30-50/B10-20/40-70"]` will scaffold on residues 10-20 of chain `B` of the provided reference structure, with 30-50 new residues before and 40-70 new residues after that segment.
- `["5-15/A10-25/30-40/0 B1-100"]` will scaffold on chain `A` while accounting for residues 1-100 in chain `B`. The `/0` together with a trailing space denotes a chain break.
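
To make the contig syntax above concrete, the sketch below splits a contig string into chains and segments, distinguishing new residues (a bare length range) from reference residues (a chain letter followed by a residue range). The `Segment` type and `parse_contig` function are hypothetical and are not the `ContigMap` implementation; they only mirror the conventions described in the examples.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    kind: str   # "new" for residues to generate, "motif" for reference residues
    chain: str  # reference chain ID, empty for new residues
    low: int    # minimum length (new) or first residue number (motif)
    high: int   # maximum length (new) or last residue number (motif)


TOKEN = re.compile(r"([A-Za-z]?)(\d+)(?:-(\d+))?")


def parse_contig(contig: str) -> List[List[Segment]]:
    chains = []
    for chain_spec in contig.split():  # whitespace separates chains
        tokens = chain_spec.split("/")
        if tokens[-1] == "0":  # a trailing "/0" marks the chain break
            tokens = tokens[:-1]
        segments = []
        for token in tokens:
            match = TOKEN.fullmatch(token)
            if match is None:
                raise ValueError(f"unrecognized contig token: {token!r}")
            chain = match.group(1)
            low = int(match.group(2))
            high = int(match.group(3) or match.group(2))  # "50" means "50-50"
            segments.append(Segment("motif" if chain else "new", chain, low, high))
        chains.append(segments)
    return chains


parsed = parse_contig("5-15/A10-25/30-40/0 B1-100")
for index, segments in enumerate(parsed):
    print(index, segments)
```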

### Motif Scaffolding

Design a structure using a subchain from a reference structure as the scaffold.

In [None]:
with open('./data/5tpn.pdb', mode="r") as f:
    pdb_str = f.read()

reference_structure = protein.Protein14.from_pdb_string(pdb_str, parse_hetatom=True)

In [None]:
sampler_config = config.SelfConditioningSamplerConfig(
    inference_params=config.InferenceParams(reference_structure=reference_structure),
    contigmap_params=config.ContigMap(contigs=["10-40/A163-181/10-40"]),
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()

### Motif Scaffolding with Target

Design a structure with respect to a particular target chain by using a subchain from a reference structure as a scaffold.

In [None]:
# This could have been done by reinstantiating the pipeline, but the `set_model` method is a convenience
# to change the loaded model
designer.set_model("complex_base")

In [None]:
with open('./data/1ycr.pdb', mode="r") as f:
    pdb_str = f.read()

reference_structure = protein.Protein14.from_pdb_string(pdb_str, parse_hetatom=True)

In [None]:
sampler_config = config.SelfConditioningSamplerConfig(
    inference_params=config.InferenceParams(reference_structure=reference_structure),
    # Length here denotes the total allowable length range of the generated oligomer
    contigmap_params=config.ContigMap(contigs=["A25-109/0 0-70/B17-29/0-70"], length="70-120"),
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()

### Enzyme

Design an enzyme scaffold around three active-site residues taken from the reference structure, and use a potential to encourage contacts with the substrate.

In [None]:
designer.set_model("active_site")

In [None]:
with open("./data/5an7.pdb", mode="r") as f:
    pdb_str = f.read()
    
reference_structure = protein.Protein14.from_pdb_string(pdb_str, parse_hetatom=True)

In [None]:
sampler_config = config.SelfConditioningSamplerConfig(
    inference_params=config.InferenceParams(reference_structure=reference_structure),
    contigmap_params=config.ContigMap(contigs=["10-100/A1083-1083/10-100/A1051-1051/10-100/A1180-1180/10-100"]),
    potentials_params=config.PotentialsParams(
        guiding_potentials=["type:substrate_contacts,s:1,r_0:8,rep_r_0:5.0,rep_s:2,rep_r_min:1"], 
        guide_scale=1,
        guide_decay="quadratic",
        substrate="LLK",
    ),
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()

### Nickel Motif

Design a symmetric structure where each chain scaffolds on top of the same reference chain.

In [None]:
designer.set_model("base_epoch8")

In [None]:
with open('./data/nickel_motif.pdb', mode="r") as f:
    pdb_str = f.read()
    
reference_structure = protein.Protein14.from_pdb_string(pdb_str, parse_hetatom=True)

In [None]:
sampler_config = config.SelfConditioningSamplerConfig(
    inference_params=config.InferenceParams(reference_structure=reference_structure),
    contigmap_params=config.ContigMap(contigs=["50/A2-4/50/0 50/A7-9/50/0 50/A12-14/50/0 50/A17-19/50/0"]),
    symmetry_params=config.SymmetryParams(symmetry="C4"),
    potentials_params=config.PotentialsParams(
        guiding_potentials=["type:olig_contacts,weight_intra:1,weight_inter:0.06"],
        olig_inter_all=True,
        olig_intra_all=True,
        guide_scale=2,
        guide_decay="quadratic",
    ),
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()

### Insulin PPI

Design a structure for a target with hotspot residues.
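
Hotspot residues in this notebook are written as a chain letter followed by a residue number, such as `A59`. The sketch below splits such labels and checks that they fall inside a target range like `A1-150`. The `parse_hotspot` helper is hypothetical and is not part of `prtm`; the range check is just a sanity test one might run before launching a design.

```python
import re


def parse_hotspot(residue: str):
    """Split a label like "A59" into a (chain, residue number) pair."""
    match = re.fullmatch(r"([A-Za-z])(\d+)", residue)
    if match is None:
        raise ValueError(f"unrecognized hotspot label: {residue!r}")
    return match.group(1), int(match.group(2))


hotspots = [parse_hotspot(label) for label in ["A59", "A83", "A91"]]

# Check that every hotspot lies inside the target segment A1-150.
target_chain, first, last = "A", 1, 150
inside_target = all(
    chain == target_chain and first <= number <= last for chain, number in hotspots
)
print(hotspots, inside_target)
```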

In [None]:
designer.set_model("auto")

In [None]:
with open('./data/insulin_target.pdb', mode="r") as f:
    pdb_str = f.read()

reference_structure = protein.Protein14.from_pdb_string(pdb_str, parse_hetatom=True)

In [None]:
sampler_config = config.SelfConditioningSamplerConfig(
    inference_params=config.InferenceParams(reference_structure=reference_structure),
    contigmap_params=config.ContigMap(contigs=["A1-150/0 70-100"]),
    ppi_params=config.PPIParams(hotspot_res=["A59", "A83", "A91"]),
    denoiser_params=config.DenoiserParams(noise_scale_ca=0, noise_scale_frame=0),
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()

### Insulin PPI Beta Model

Design a structure for a target with hotspot residues using a model that favors generating structures with beta sheets.

In [None]:
designer.set_model("complex_beta")

In [None]:
with open('./data/insulin_target.pdb', mode="r") as f:
    pdb_str = f.read()

reference_structure = protein.Protein14.from_pdb_string(pdb_str, parse_hetatom=True)

In [None]:
sampler_config = config.SelfConditioningSamplerConfig(
    inference_params=config.InferenceParams(reference_structure=reference_structure),
    contigmap_params=config.ContigMap(contigs=["A1-150/0 70-100"]),
    ppi_params=config.PPIParams(hotspot_res=["A59", "A83", "A91"]),
    denoiser_params=config.DenoiserParams(noise_scale_ca=0, noise_scale_frame=0),
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()

## Sequence Inpainting

Sequence inpainting uses a reference structure as a scaffold, but RFDiffusion is allowed to redesign the sequence of the scaffold residues listed in `inpaint_seq`.
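
The sketch below expands a residue specification such as `A163-168/A170-171/A179` into individual residues and compares it with a motif range, to show which motif positions would keep their original sequence. The `expand_residues` helper is hypothetical, and reading the specification as "residues open to redesign" is an assumption based on the description above.

```python
def expand_residues(spec: str):
    """Expand "A163-168/A170-171/A179" into a list of (chain, residue number)."""
    residues = []
    for token in spec.split("/"):
        if len(token) < 2:
            raise ValueError(f"malformed residue token: {token!r}")
        chain, numbers = token[0], token[1:]
        first, _, last = numbers.partition("-")
        start = int(first)
        stop = int(last) if last else start  # a single residue such as "A179"
        residues.extend((chain, number) for number in range(start, stop + 1))
    return residues


motif = expand_residues("A163-181")
redesignable = expand_residues("A163-168/A170-171/A179")
kept = [residue for residue in motif if residue not in redesignable]
print(len(motif), len(redesignable), len(kept))
```

With these inputs, 9 of the 19 motif residues are open to redesign and 10 keep their sequence.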

In [None]:
designer.set_model("auto")

In [None]:
with open("./data/5tpn.pdb", mode="r") as f:
    pdb_str = f.read()
    
reference_structure = protein.Protein14.from_pdb_string(pdb_str, parse_hetatom=True)

In [None]:
sampler_config = config.SelfConditioningSamplerConfig(
    inference_params=config.InferenceParams(reference_structure=reference_structure),
    contigmap_params=config.ContigMap(contigs=["10-40/A163-181/10-40"], inpaint_seq=["A163-168/A170-171/A179"]),
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()

## Partial Diffusion

Partial diffusion adds some noise to a reference structure and then denoises it. The `partial_T` value in the `DiffuserConfig` override sets how many noising steps are applied.
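
To give a feel for the noising half of this process, the sketch below applies a generic DDPM-style forward step to a few 3D coordinates: after `t` steps, each coordinate is a blend of the original value and Gaussian noise. This is a simplified stand-in and not RFDiffusion's actual diffuser, which also handles residue orientations. The function names, the linear schedule, and its `beta_start`/`beta_end` values are all assumptions chosen for illustration.

```python
import math
import random


def alpha_bar(t, total_steps=50, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal variance left after t noising steps."""
    product = 1.0
    for step in range(t):
        # Linear beta schedule (an assumption for this sketch).
        beta = beta_start + (beta_end - beta_start) * step / max(total_steps - 1, 1)
        product *= 1.0 - beta
    return product


def noise_coordinates(coords, t, rng, **schedule):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise."""
    a = alpha_bar(t, **schedule)
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [
        tuple(signal * x + noise * rng.gauss(0.0, 1.0) for x in point)
        for point in coords
    ]


reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
partially_noised = noise_coordinates(reference, t=10, rng=random.Random(0))
print(partially_noised)
```

Stopping at a small `t` keeps most of the reference geometry, which is the idea behind noising only part of the way before denoising.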

### Without Sequence

Noise the whole reference structure and then denoise it.

In [None]:
with open('./data/2kl8.pdb', mode="r") as f:
    pdb_str = f.read()

reference_structure = protein.Protein14.from_pdb_string(pdb_str, parse_hetatom=True)

In [None]:
diffuser_config_override = config.DiffuserConfig(partial_T=10)
sampler_config = config.SelfConditioningSamplerConfig(
    inference_params=config.InferenceParams(reference_structure=reference_structure),
    contigmap_params=config.ContigMap(contigs=["79-79"]),
)
designed_structure, _ = designer(sampler_config, diffuser_config_override=diffuser_config_override)

In [None]:
designed_structure.show()

### With Sequence

Noise a simplistic model of a structure with a bound peptide and denoise it into something more plausible, holding constant the sequence given in `provide_seq`.

In [None]:
with open('./data/peptide_complex.pdb', mode="r") as f:
    pdb_str = f.read()

reference_structure = protein.Protein14.from_pdb_string(pdb_str, parse_hetatom=True)

In [None]:
diffuser_config_override = config.DiffuserConfig(partial_T=10)
sampler_config = config.SelfConditioningSamplerConfig(
    inference_params=config.InferenceParams(reference_structure=reference_structure),
    contigmap_params=config.ContigMap(contigs=["172-172/0 34-34"], provide_seq=["172-205"]),
)
designed_structure, _ = designer(sampler_config, diffuser_config_override=diffuser_config_override)

In [None]:
designed_structure.show()

### With Multisequence

Noise a simplistic structure and peptide binding model and denoise it to make something more plausible, but this time specify multiple disjoint sequences to hold constant.
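
The sketch below expands comma-separated index ranges such as `172-177,200-205` into explicit positions. It assumes the ranges are zero-indexed and inclusive, which is consistent with the numbers in this notebook: a 172-residue chain would occupy indices 0-171, so the 34-residue peptide would occupy 172-205. The `expand_index_ranges` helper is hypothetical and not part of `prtm`.

```python
def expand_index_ranges(spec: str):
    """Expand "172-177,200-205" into a flat list of inclusive indices."""
    indices = []
    for part in spec.split(","):
        first, _, last = part.partition("-")
        start = int(first)
        stop = int(last) if last else start  # a single index such as "180"
        if stop < start:
            raise ValueError(f"descending range: {part!r}")
        indices.extend(range(start, stop + 1))
    return indices


whole_peptide = expand_index_ranges("172-205")
disjoint = expand_index_ranges("172-177,200-205")
print(len(whole_peptide), disjoint)
```

Under that assumption, `172-205` covers all 34 peptide positions, while `172-177,200-205` covers only the first six and last six.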

In [None]:
with open('./data/peptide_complex.pdb', mode="r") as f:
    pdb_str = f.read()

reference_structure = protein.Protein14.from_pdb_string(pdb_str, parse_hetatom=True)

In [None]:
diffuser_config_override = config.DiffuserConfig(partial_T=10)
sampler_config = config.SelfConditioningSamplerConfig(
    inference_params=config.InferenceParams(reference_structure=reference_structure),
    contigmap_params=config.ContigMap(contigs=["172-172/0 34-34"], provide_seq=["172-177,200-205"]),
)
designed_structure, _ = designer(sampler_config, diffuser_config_override=diffuser_config_override)

In [None]:
designed_structure.show()

## Scaffold Guided

Here we can specify a target protein and tell RFDiffusion that we want to do scaffold-guided diffusion (i.e., we want to specify the fold of the protein).

The `ScaffoldedSamplerConfig` can be used with or without a reference structure. For scaffold-guided design, a list of scaffold structures is provided and, optionally, a target structure (e.g., a protein to bind) can be given. As with other samplers, we can define potentials and symmetries as desired.

### TIM Barrel

Scaffold-guided design using a TIM barrel as the scaffold structure.

In [None]:
with open('./data/1qys.pdb', mode="r") as f:
    pdb_str = f.read()
    
reference_structure = protein.Protein14.from_pdb_string(pdb_str, parse_hetatom=True)

with open('./data/tim10.pdb', mode="r") as f:
    scaffold_pdb_str = f.read()

scaffold_structure = protein.Protein14.from_pdb_string(scaffold_pdb_str, parse_hetatom=True)

In [None]:
sampler_config = config.ScaffoldedSamplerConfig(
    inference_params=config.InferenceParams(reference_structure=reference_structure),
    denoiser_params=config.DenoiserParams(noise_scale_ca=0.5, noise_scale_frame=0.5),
    scaffoldguided_params=config.ScaffoldGuidedParams(
        target_structure=None,
        target_adj=False,
        target_ss=False,
        scaffold_structure_list=[scaffold_structure],
        sampled_insertion="0-5",
        sampled_N="0-5",
        sampled_C="0-5",
    )
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()

### PPI Scaffolded

Scaffold-guided binder design against a target structure with hotspot residues.

In [None]:
with open('./data/1qys.pdb', mode="r") as f:
    pdb_str = f.read()
reference_structure = protein.Protein14.from_pdb_string(pdb_str, parse_hetatom=True)

with open('./data/insulin_target.pdb', mode="r") as f:
    target_pdb_str = f.read()
target_structure = protein.Protein14.from_pdb_string(target_pdb_str, parse_hetatom=True)

with open('./data/5L33.pdb', mode="r") as f:
    scaffold_pdb_str = f.read()
scaffold_structure = protein.Protein14.from_pdb_string(scaffold_pdb_str, parse_hetatom=True)

In [None]:
sampler_config = config.ScaffoldedSamplerConfig(
    inference_params=config.InferenceParams(reference_structure=reference_structure),
    denoiser_params=config.DenoiserParams(noise_scale_ca=0, noise_scale_frame=0),
    ppi_params=config.PPIParams(hotspot_res=["A59", "A83", "A91"]),
    scaffoldguided_params=config.ScaffoldGuidedParams(
        target_structure=target_structure,
        target_adj=True,
        target_ss=True,
        scaffold_structure_list=[scaffold_structure],
        sampled_insertion="0-5",
        sampled_N="0-5",
        sampled_C="0-5",
    )
)
designed_structure, _ = designer(sampler_config)

In [None]:
designed_structure.show()