# Protein Generator

ProteinGenerator is a method for jointly designing a protein sequence and structure.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from proteome import protein
from proteome.models.protein_generator import config
from proteome.models.protein_generator.modeling import ProteinGeneratorForJointDesign

Like RFDiffusion, ProteinGenerator has several models for different tasks, and the right weights depend on the use case. When `model_name` is set to `"auto"`, the weights to load are chosen from the inference parameters you pass in.

In [None]:
designer = ProteinGeneratorForJointDesign(model_name="auto", random_seed=0)

ProteinGenerator uses a `ContigMap` with a list of `contigs` strings to specify the number of chains, the length of each generated chain, and any scaffolds from a reference structure.

- `["50"]` creates a single chain of exactly 50 residues.
- `["50-100"]` creates a single chain of between 50 and 100 residues.
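
To make the length semantics concrete, here is a minimal sketch of how a single-chain length specification such as `"50"` or `"50-100"` could be interpreted. The helpers `parse_length_spec` and `sample_length` are hypothetical and are not part of `proteome`; the library's own parsing may differ, and treating both endpoints as inclusive is an assumption.

```python
import random
from typing import Tuple


def parse_length_spec(spec: str) -> Tuple[int, int]:
    """Return (min_len, max_len) for a spec like "50" or "50-100".

    Hypothetical helper for illustration only, not the library's parser.
    """
    if "-" in spec:
        low_str, high_str = spec.split("-")
        low, high = int(low_str), int(high_str)
    else:
        low = high = int(spec)
    if low > high:
        raise ValueError(f"lower bound exceeds upper bound in {spec!r}")
    return low, high


def sample_length(spec: str, rng: random.Random) -> int:
    """Draw one chain length uniformly from the (assumed inclusive) range."""
    low, high = parse_length_spec(spec)
    return rng.randint(low, high)


print(parse_length_spec("50"))      # (50, 50)
print(parse_length_spec("50-100"))  # (50, 100)
```

A fixed-length spec always yields the same length, while a range yields a possibly different length on each call unless the random generator is seeded.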

## Unconditional Design

The simplest example: generate a sequence and structure with a fixed number of residues.

In [None]:
designed_structure, designed_sequence, _ = designer(
    config.InferenceConfig(
        contigmap_params=config.ContigMap(contigs=["100"]),
    ),
)

In [None]:
# Unlike other design algorithms, the sidechains we get with ProteinGenerator aren't trivial
# because the sequence is learned instead of being defaulted to glycine
designed_structure.show()

In [None]:
designed_sequence

## Symmetric Design

Design a structure and sequence with symmetry.

The symmetry parameters are less sophisticated than RFDiffusion's: we can only specify an n-fold symmetry and the chains to generate via `contigs`.

In [None]:
designed_structure, designed_sequence, _ = designer(
    config.InferenceConfig(
        diffuser_params=config.DiffuserParams(T=50),
        contigmap_params=config.ContigMap(contigs=["25/0 25/0 25/0"]),
        symmetry_params=config.SymmetryParams(symmetry=3),
    ),
)

In [None]:
designed_structure.show()

In [None]:
designed_sequence
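
The contig string in the symmetric example repeats one `length/0` entry per chain. As a small sketch, it could be built programmatically as shown here. `symmetric_contig` is a hypothetical helper, and it assumes (based only on the example above) that the number of entries should equal the `symmetry` value.

```python
def symmetric_contig(chain_length: int, n_fold: int) -> str:
    """Build a contig string with one "length/0" entry per symmetric chain.

    Hypothetical helper; the entry format is copied from the notebook example.
    """
    if chain_length < 1 or n_fold < 1:
        raise ValueError("chain_length and n_fold must be positive")
    return " ".join([f"{chain_length}/0"] * n_fold)


print(symmetric_contig(25, 3))  # 25/0 25/0 25/0
```

The result can then be passed as `contigs=[symmetric_contig(25, 3)]` alongside `symmetry=3`.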

## Weighted Sequence

Bias the designed sequence toward a particular amino acid composition.

In [None]:
designed_structure, designed_sequence, _ = designer(
    config.InferenceConfig(
        contigmap_params=config.ContigMap(contigs=["100"]),
        potentials_params=config.PotentialsParams(
            # Design a sequence with 20% Ws (i.e., tryptophans)
            potentials=[config.AACompositionalBiasParams(aa_composition="W0.2")],
            potential_scales=[1.75],
        ),
    ),
)

In [None]:
designed_structure.show()

In [None]:
designed_sequence
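
One way to check whether the compositional bias had the intended effect is to measure the fraction of the targeted residue in the output. The sketch below assumes the designed sequence is available as a plain one-letter string; if `designed_sequence` is another type, convert it first. `residue_fraction` is a hypothetical helper, and the example sequence is made up.

```python
from collections import Counter


def residue_fraction(sequence: str, residue: str) -> float:
    """Return the fraction of positions in `sequence` equal to `residue`."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    return counts[residue.upper()] / len(sequence)


# Made-up sequence for illustration: 5 of its 10 residues are W.
example_sequence = "WAWKWLWGEW"
print(residue_fraction(example_sequence, "W"))  # 0.5
```

Because sampling is stochastic, the measured fraction will generally only approximate the requested target.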

## Motif Scaffolding

Design around a given motif from a reference structure.

In [None]:
with open("./data/rsv5_5tpn.pdb", mode="r") as f:
    reference_pdb_str = f.read()

reference_structure = protein.Protein27.from_pdb_string(reference_pdb_str)

In [None]:
designed_structure, designed_sequence, _ = designer(
    config.InferenceConfig(
        reference_structure=reference_structure,
        # Only the structure is considered for the scaffold, not the sequence
        contigmap_params=config.ContigMap(contigs=["0-25/A163-181/25-30"]),
    ),
)

In [None]:
designed_structure.show()

In [None]:
designed_sequence
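
The contig `"0-25/A163-181/25-30"` mixes designed segments with a fixed motif, so the total chain length varies from run to run. The following sketch computes the possible length range. It assumes that segments are separated by `/`, that a segment starting with a chain letter is a motif with inclusive residue numbering, and that a purely numeric segment is a designed length or length range. `contig_length_bounds` is a hypothetical helper that handles a single chain only.

```python
import re
from typing import Tuple

# A motif segment such as "A163-181": chain letter, first and last residue.
_MOTIF = re.compile(r"([A-Za-z])(\d+)-(\d+)")
# A designed segment such as "25" or "25-30".
_RANGE = re.compile(r"(\d+)(?:-(\d+))?")


def contig_length_bounds(contig: str) -> Tuple[int, int]:
    """Return (min_total, max_total) residues for a single-chain contig."""
    total_min = total_max = 0
    for segment in contig.split("/"):
        motif = _MOTIF.fullmatch(segment)
        if motif is not None:
            start, end = int(motif.group(2)), int(motif.group(3))
            if end < start:
                raise ValueError(f"invalid motif range in {segment!r}")
            motif_len = end - start + 1  # assumed inclusive on both ends
            total_min += motif_len
            total_max += motif_len
            continue
        designed = _RANGE.fullmatch(segment)
        if designed is None:
            raise ValueError(f"unrecognized segment {segment!r}")
        low = int(designed.group(1))
        high = int(designed.group(2)) if designed.group(2) is not None else low
        total_min += low
        total_max += high
    return total_min, total_max


print(contig_length_bounds("0-25/A163-181/25-30"))  # (44, 74)
```

Under these assumptions the motif contributes 19 residues, so designs from this contig would be between 44 and 74 residues long.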

## Sequence Conditioning

Design around a particular reference sequence.

In [None]:
designed_structure, designed_sequence, _ = designer(
    config.InferenceConfig(
        # X means designable residue
        sequence="XXXXXXXXXXXXXXXXPEPSEQXXXXXXXXXXXXXXXX",
        # Sequence length is already given, no need to provide a contig
        contigmap_params=config.ContigMap(),
    ),
)

In [None]:
designed_structure.show()

In [None]:
designed_sequence
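
Writing long runs of `X` by hand is error-prone. A small sketch of building such a masked sequence from a fixed motif and the number of designable positions on each side follows. `mask_around_motif` is a hypothetical helper, not part of `proteome`.

```python
def mask_around_motif(motif: str, n_before: int, n_after: int) -> str:
    """Surround a fixed motif with designable positions marked as "X"."""
    if n_before < 0 or n_after < 0:
        raise ValueError("flank lengths must be non-negative")
    return "X" * n_before + motif + "X" * n_after


print(mask_around_motif("PEPSEQ", 3, 2))  # XXXPEPSEQXX
```

The returned string can be passed as the `sequence` argument in the same way as the literal used above.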

## Binder Design

Design a binder to hotspot residues in a reference structure.

In [None]:
with open("./data/cd86.pdb", mode="r") as f:
    reference_pdb_str = f.read()

reference_structure = protein.Protein27.from_pdb_string(reference_pdb_str)

In [None]:
designed_structure, designed_sequence, _ = designer(
    config.InferenceConfig(
        reference_structure=reference_structure,
        contigmap_params=config.ContigMap(contigs=["B1-110/0 25-75"]),
        hotspot_params=config.HotspotParams(hotspot_res=["B40", "B32", "B87", "B96", "B30"]),
    ),
)

In [None]:
designed_structure.show()

In [None]:
designed_sequence
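
Hotspot identifiers such as `"B40"` combine a chain letter with a residue number. As a sanity check before running a design, the sketch below verifies that every hotspot falls inside the target segment named in the contig (`B1-110` in the example above). Both helpers are hypothetical, and the single-letter chain format is an assumption drawn from the example.

```python
import re
from typing import Iterable, Tuple


def parse_residue_id(residue_id: str) -> Tuple[str, int]:
    """Split an identifier like "B40" into ("B", 40)."""
    match = re.fullmatch(r"([A-Za-z])(\d+)", residue_id)
    if match is None:
        raise ValueError(f"unrecognized residue id {residue_id!r}")
    return match.group(1), int(match.group(2))


def hotspots_within(
    hotspots: Iterable[str], chain: str, first: int, last: int
) -> bool:
    """Return True if every hotspot lies on `chain` within [first, last]."""
    for residue_id in hotspots:
        res_chain, number = parse_residue_id(residue_id)
        if res_chain != chain or not first <= number <= last:
            return False
    return True


hotspots = ["B40", "B32", "B87", "B96", "B30"]
print(hotspots_within(hotspots, chain="B", first=1, last=110))  # True
```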

## Design with Secondary Structure

ProteinGenerator allows specifying a desired secondary structure in a few different formats.

### Secondary structure DSSP string

Design from a secondary structure string.

The DSSP secondary structure codes are:

- H: α-helix
- G: 3₁₀-helix
- I: π-helix
- E: extended beta sheet
- B: beta bridge
- S: bend
- T: helix turn
- L: other/loop
- X: anything
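
Long secondary structure strings are easier to write as a list of `(code, count)` segments. The sketch below builds a string from such segments and rejects codes that are not in the list above. `build_secondary_structure` is a hypothetical helper, and the repeating unit is only an illustration of the kind of pattern one might request.

```python
from typing import Iterable, Tuple

# Codes taken from the list above.
DSSP_CODES = set("HGIEBSTLX")


def build_secondary_structure(segments: Iterable[Tuple[str, int]]) -> str:
    """Concatenate (code, count) segments into one secondary structure string."""
    parts = []
    for code, count in segments:
        if code not in DSSP_CODES:
            raise ValueError(f"unknown secondary structure code {code!r}")
        if count < 0:
            raise ValueError("segment length must be non-negative")
        parts.append(code * count)
    return "".join(parts)


# One 20-residue unit: free, short helix, free, loop, free.
unit = [("X", 5), ("H", 4), ("X", 3), ("L", 3), ("X", 5)]
ss = build_secondary_structure(unit * 5)
print(len(ss))  # 100
```

Building the string this way makes it straightforward to keep its length in step with the contig length used for the design.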

In [None]:
secondary_structure_str = "XXXXXHHHHXXXLLLXXXXXXXXXXHHHHXXXLLLXXXXXXXXXXHHHHXXXLLLXXXXXXXXXXHHHHXXXLLLXXXXXXXXXXHHHHXXXLLLXXXXX"

In [None]:
designed_structure, designed_sequence, _ = designer(
    config.InferenceConfig(
        contigmap_params=config.ContigMap(contigs=["100"]),
        secondary_structure_params=config.SecondaryStructureParams(
            secondary_structure=secondary_structure_str
        ),
    ),
)

In [None]:
designed_structure.show()

In [None]:
designed_sequence

### Secondary structure bias

Adjust a bias parameter to encourage designed structures toward a certain secondary structure.

In [None]:
designed_structure, designed_sequence, _ = designer(
    config.InferenceConfig(
        contigmap_params=config.ContigMap(contigs=["100"]),
        structure_bias_params=config.StructureBiasParams(helix_bias=0.01, strand_bias=0.01)
    ),
)

In [None]:
designed_structure.show()

In [None]:
designed_sequence

### Reference protein secondary structure

Compute the secondary structure of a reference protein and use it to guide design.

In [None]:
with open("./data/cd86.pdb", mode="r") as f:
    dssp_pdb_str = f.read()

dssp_structure = protein.Protein27.from_pdb_string(dssp_pdb_str)

In [None]:
designed_structure, designed_sequence, _ = designer(
    config.InferenceConfig(
        contigmap_params=config.ContigMap(contigs=["110"]),
        secondary_structure_params=config.SecondaryStructureParams(dssp_structure=dssp_structure),
    ),
)

In [None]:
designed_structure.show()

In [None]:
designed_sequence

## Partial Diffusion

Start with a noisy structure or sequence and denoise into something more realistic.

### Structure denoising

Denoise a reference structure.

In [None]:
with open("./data/design_000000.pdb", mode="r") as f:
    reference_pdb_str = f.read()

reference_structure = protein.Protein27.from_pdb_string(reference_pdb_str)

In [None]:
designed_structure, designed_sequence, _ = designer(
    config.InferenceConfig(
        reference_structure=reference_structure,
        diffuser_params=config.DiffuserParams(T=50),
        contigmap_params=config.ContigMap(contigs=["38"]),
        sampling_temp=0.3,
    ),
)

In [None]:
designed_structure.show()

In [None]:
designed_sequence

### Sequence denoising

Denoise a reference sequence.

In [None]:
designed_structure, designed_sequence, _ = designer(
    config.InferenceConfig(
        sequence="SAKVEELLETAKALGISEEEVREILELLEAGFIVIEVVSLGDAVILILENKKLGKYYILKNGEIERIKKPENARELKRKIAEILNISVEEIEAIIEKLRAK",
        diffuser_params=config.DiffuserParams(T=50),
        contigmap_params=config.ContigMap(),
        sampling_temp=0.3,
    ),
)

In [None]:
designed_structure.show()

In [None]:
designed_sequence