## BayesDesign

Protein design for stability and conformational specificity, maximizing the p(structure|sequence) objective.


<figure>
<img src="https://github.com/dellacortelab/bayes_design/blob/master/data/figs/bayes_design.png?raw=true" width="700">
</figure>



[Stern J., Free T., Stern K., Gardiner S., Dalley N., Bundy B., Price J., Wingate D., Della Corte D. A probabilistic view of protein stability, conformational specificity, and design](https://www.biorxiv.org/content/10.1101/2022.12.28.521825v1?rss=1)

### Installs

In [None]:
!git clone https://github.com/dellacortelab/bayes_design.git
!pip install transformers==4.20.1 tokenizers==0.12.1 sentencepiece==0.1.96

### Set `backbone_type` to `pdb_id` and enter a `pdb_id`, then select `Runtime` -> `Run all`. Beam search with 128 beams should take about 1-4 s per residue.

Optionally, also set the `fixed_positions` variable, setting the positions in the sequence for which you want to preserve the original amino acids.

To provide your own .pdb file with a custom backbone, use the backbone_type option "custom" and upload your .pdb file below the cell.

In [3]:
fixed_positions = []
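# Optionally set ranges of fixed positions as a flat list of start/end pairs.
# The example below fixes the range 10 to 12 and the single position 34:
# fixed_positions = [10, 12, 34, 34]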
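The flat start/end layout can be easy to misread, so the sketch below expands such a list into individual positions for a quick sanity check before running the design. The helper `expand_fixed_positions` is hypothetical and not part of `bayes_design`; it assumes each consecutive pair is an inclusive range, which you should confirm against `design.py`. The notebook itself passes the flat list to `design.py` unchanged.

```python
def expand_fixed_positions(flat_ranges):
    """Expand [start1, end1, start2, end2, ...] into individual positions.

    Assumption (not confirmed by the notebook): each consecutive pair is an
    inclusive (start, end) range.
    """
    if len(flat_ranges) % 2 != 0:
        raise ValueError("fixed_positions must contain an even number of values")
    positions = []
    # Walk the list two values at a time: (start, end), (start, end), ...
    for start, end in zip(flat_ranges[::2], flat_ranges[1::2]):
        if start > end:
            raise ValueError(f"range start {start} is greater than end {end}")
        positions.extend(range(start, end + 1))
    return positions


print(expand_fixed_positions([10, 12, 34, 34]))  # [10, 11, 12, 34]
```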

In [None]:
from google.colab import files
import os

backbone_type = 'pdb_id' #@param ["pdb_id", "custom"]
#@markdown - Either provide a valid Protein Data Bank (PDB) ID, or upload a custom .pdb file when prompted below (after clicking `"Runtime -> Run all"`)

pdb_id = '6MRR' #@param {type:"string"}
#@markdown - The `pdb_id` argument is only necessary if `backbone_type == 'pdb_id'`

if backbone_type == "pdb_id":
    # Nothing to upload; pdb_id is passed to design.py as-is.
    pass
elif backbone_type == "custom":
    custom_pdb_path = "bayes_design/data"
    os.makedirs(custom_pdb_path, exist_ok=True)
    uploaded = files.upload()
    pdb_file = list(uploaded.keys())[0]
    os.rename(pdb_file, os.path.join(custom_pdb_path, pdb_file))
    pdb_id = os.path.splitext(pdb_file)[0]

model_name = "bayes_design" #@param ["bayes_design", "protein_mpnn", "xlnet"]
#@markdown - `model_name` selects the model used to design the sequence: "bayes_design", "protein_mpnn", or "xlnet"
decode_order = "n_to_c" #@param ["n_to_c", "proximity", "reverse_proximity"]
#@markdown - "n_to_c" = decode from N-terminus to C-terminus
#@markdown - "proximity" = decode amino acids near fixed amino acids first
#@markdown - "reverse_proximity" = decode amino acids far from fixed amino acids first
decode_algorithm = "greedy" #@param ["beam", "greedy", "random", "sample"]
#@markdown - "beam" = beam search
#@markdown - "greedy" = greedy search
#@markdown - "sample" = sample decoded tokens according to probability
#@markdown - "random" = random decoding
n_beams = 128 #@param {type:"integer"}
#@markdown - number of beams, if using `beam` decode_algorithm
redesign = False #@param {type:"boolean"}
#@markdown - If `redesign == True`, we use the sequence in the provided pdb file as bidirectional context for each predicted amino acid. Otherwise (by default), we predict a sequence from only the provided backbone (and fixed amino acids).
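To illustrate why the `beam` option can find a higher-probability sequence than `greedy`, here is a minimal, self-contained sketch on a toy two-letter model. It is not the BayesDesign implementation: the probabilities in `TOY_MODEL` and the functions `greedy_decode` and `beam_decode` are invented for illustration only.

```python
import math

# Hypothetical next-token probabilities, conditioned on the prefix decoded so far.
TOY_MODEL = {
    "": {"A": 0.6, "B": 0.4},
    "A": {"A": 0.55, "B": 0.45},
    "B": {"A": 0.9, "B": 0.1},
}


def next_log_probs(prefix):
    """Return log-probabilities of each next token given the prefix."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[prefix].items()}


def greedy_decode(length):
    """Pick the single most probable token at every step."""
    seq, score = "", 0.0
    for _ in range(length):
        log_probs = next_log_probs(seq)
        tok = max(log_probs, key=log_probs.get)
        seq, score = seq + tok, score + log_probs[tok]
    return seq, score


def beam_decode(length, n_beams):
    """Keep the n_beams highest-scoring partial sequences at every step."""
    beams = [("", 0.0)]
    for _ in range(length):
        candidates = [
            (seq + tok, score + lp)
            for seq, score in beams
            for tok, lp in next_log_probs(seq).items()
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:n_beams]
    return beams[0]


greedy_seq, greedy_score = greedy_decode(2)
beam_seq, beam_score = beam_decode(2, n_beams=2)
print(greedy_seq, round(math.exp(greedy_score), 2))  # AA 0.33
print(beam_seq, round(math.exp(beam_score), 2))      # BA 0.36
```

Greedy decoding commits to "A" first because it is locally most probable, while beam search keeps "B" alive long enough to discover the more probable full sequence. Larger `n_beams` values widen this search at the cost of runtime.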

In [None]:
import os
from subprocess import Popen, PIPE

# Pass the fixed positions to design.py as separate string arguments.
fps = [str(pos) for pos in fixed_positions]
cmd = [
    'python3', 'design.py',
    '--model_name', model_name,
    '--protein_id', pdb_id,
    '--decode_order', decode_order,
    '--decode_algorithm', decode_algorithm,
    '--n_beams', str(n_beams),
    '--fixed_positions', *fps,
]
if redesign:
    cmd += ['--redesign']
print("Command:")
print(" ".join(cmd))
process = Popen(cmd, stdout=PIPE, stderr=PIPE, cwd='/content/bayes_design')
stdout, stderr = process.communicate()
print(stdout.decode('utf-8'))
print(stderr.decode('utf-8'))

To verify your designed sequence, try folding it with [AlphaFold](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) or [ESMFold](https://esmatlas.com/resources/fold/result?fasta_header=%3Ecd1a&sequence=KTPEWWWPIINKWTMETMYYNTGTNEVTKEKPIG).
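Before pasting a design into a folding tool, it can help to check that it contains only the 20 standard amino acid letters and to format it as FASTA. The sketch below does both. The helper `to_fasta` is hypothetical and not part of `bayes_design`, and the header and sequence are simply the example values from the ESMFold link above; substitute the sequence printed by `design.py`.

```python
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def to_fasta(header, sequence, line_width=60):
    """Validate a protein sequence and return it as FASTA-formatted text."""
    sequence = sequence.strip().upper()
    invalid = sorted(set(sequence) - VALID_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"non-standard residue letters found: {invalid}")
    # Wrap the sequence into fixed-width lines, as is conventional for FASTA.
    lines = [sequence[i:i + line_width] for i in range(0, len(sequence), line_width)]
    return f">{header}\n" + "\n".join(lines) + "\n"


# Example values taken from the ESMFold link; replace with your own design.
designed_sequence = "KTPEWWWPIINKWTMETMYYNTGTNEVTKEKPIG"
fasta_text = to_fasta("cd1a", designed_sequence)
print(fasta_text)
```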