## BayesDesign

Protein design for stability and conformational specificity, maximizing the p(structure|sequence) objective.
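The objective follows from Bayes' rule: for a fixed target backbone, p(structure | sequence) ∝ p(sequence | structure) / p(sequence), because p(structure) is the same for every candidate sequence. The sketch below ranks candidates in log space. The helper `bayes_log_score` and all log-probability values are hypothetical and are not part of this repository. In practice the first term would come from an inverse-folding model and the second from a protein language model (an assumption suggested by the `protein_mpnn` and `xlnet` model options further down).

```python
def bayes_log_score(log_p_seq_given_struct: float, log_p_seq: float) -> float:
    """log p(structure | sequence), up to an additive constant.

    Bayes' rule: p(struct | seq) = p(seq | struct) * p(struct) / p(seq).
    For a fixed target backbone, p(struct) is identical for every candidate,
    so it drops out when comparing sequences.
    """
    return log_p_seq_given_struct - log_p_seq


# Hypothetical (log p(seq | struct), log p(seq)) pairs for two candidate sequences
candidates = {
    "A": (-50.0, -60.0),  # fits the backbone and is unusual as a generic protein sequence
    "B": (-45.0, -40.0),  # fits the backbone slightly better but is very probable in general
}

scores = {name: bayes_log_score(*lps) for name, lps in candidates.items()}
best_bayes = max(scores, key=scores.get)
best_inverse_folding = max(candidates, key=lambda name: candidates[name][0])
print(scores)                            # {'A': 10.0, 'B': -5.0}
print(best_bayes, best_inverse_folding)  # A B
```

Ranking by p(sequence | structure) alone prefers B, while dividing by p(sequence) prefers A, the candidate whose likelihood is explained by this particular backbone rather than by being a generically probable sequence.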


<figure>
<img src="https://github.com/dellacortelab/bayes_design/blob/master/data/figs/bayes_design.png?raw=true" width="700">
</figure>



[Stern J., Free T., Stern K., Gardiner S., Dalley N., Bundy B., Price J., Wingate D., Della Corte D. A probabilistic view of protein stability, conformational specificity, and design](https://www.biorxiv.org/content/10.1101/2022.12.28.521825v1?rss=1)

### Installation

!git clone https://github.com/dellacortelab/bayes_design.git
!pip install transformers==4.20.1 tokenizers==0.12.1 sentencepiece==0.1.96



### Set `backbone_type` to `pdb_id` and enter a PDB ID (or choose `custom` to upload a file), then click `Runtime` -> `Run all`


from google.colab import files
import os

backbone_type = 'pdb_id' #@param ["pdb_id", "custom"]
#@markdown - Either provide a valid Protein Data Bank (PDB) ID, or upload a custom .pdb file when prompted below (after clicking `Runtime` -> `Run all`)

pdb_id = '6MRR' #@param {type:"string"}
#@markdown - The `pdb_id` argument is only used if `backbone_type == 'pdb_id'`

if backbone_type == "pdb_id":
    pass  # nothing to do: the pdb_id from the form is passed to design.py
elif backbone_type == "custom":
    # Move the uploaded file into bayes_design/data and use its file name
    # (without extension) as the protein ID
    custom_pdb_path = "bayes_design/data"
    os.makedirs(custom_pdb_path, exist_ok=True)
    uploaded = files.upload()
    pdb_file = list(uploaded.keys())[0]
    os.rename(pdb_file, os.path.join(custom_pdb_path, pdb_file))
    pdb_id = os.path.splitext(pdb_file)[0]

model_name = "bayes_design" #@param ["bayes_design", "protein_mpnn", "xlnet"]
#@markdown - "bayes_design" = maximize p(structure|sequence); "protein_mpnn" = ProteinMPNN, p(sequence|structure); "xlnet" = protein language model, p(sequence)
decode_order = "n_to_c" #@param ["n_to_c", "proximity", "reverse_proximity"]
#@markdown - "n_to_c" = decode from N-terminus to C-terminus
#@markdown - "proximity" = decode amino acids near fixed amino acids first
#@markdown - "reverse_proximity" = decode amino acids far from fixed amino acids first
decode_algorithm = "beam" #@param ["beam", "greedy", "random", "sample"]
#@markdown - "beam" = beam search
#@markdown - "greedy" = greedy search
#@markdown - "sample" = sample decoded tokens according to probability
#@markdown - "random" = random decoding
n_beams = 128 #@param {type:"integer"}
#@markdown - Number of beams (only used when `decode_algorithm == "beam"`)
redesign = False #@param {type:"boolean"}
#@markdown - If `redesign == True`, the sequence in the provided PDB file is used as bidirectional context for each predicted amino acid. Otherwise (the default), the sequence is predicted from the backbone (and any fixed amino acids) alone.
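The decoding options above can be illustrated with a toy sketch. It rests on two assumptions that may not match the repository: "proximity" is measured as the Cα distance from each free residue to its nearest fixed residue, and the model is replaced by a hypothetical `log_probs(assigned, pos)` callback returning per-amino-acid log-probabilities. Residue indices here are 0-based. Greedy search is simply beam search with `n_beams = 1`.

```python
from typing import Callable, Dict, List, Sequence, Tuple

import numpy as np

# Hypothetical model interface: (already decoded residues, position) -> {amino acid: log p}
LogProbFn = Callable[[Dict[int, str], int], Dict[str, float]]


def decoding_order(ca_coords: Sequence[Sequence[float]], fixed: Sequence[int],
                   mode: str = "n_to_c") -> List[int]:
    """Order in which the free (non-fixed) positions are decoded."""
    fixed_set = set(fixed)
    free = [i for i in range(len(ca_coords)) if i not in fixed_set]
    if mode == "n_to_c" or not fixed_set:
        return free
    coords = np.asarray(ca_coords, dtype=float)
    fixed_xyz = coords[sorted(fixed_set)]
    # Assumed definition: Cα distance to the nearest fixed residue
    dist = {i: float(np.linalg.norm(fixed_xyz - coords[i], axis=1).min()) for i in free}
    return sorted(free, key=dist.__getitem__, reverse=(mode == "reverse_proximity"))


def beam_search(order: List[int], log_probs: LogProbFn,
                n_beams: int) -> Tuple[Dict[int, str], float]:
    """Keep the n_beams highest-scoring partial sequences after each position."""
    beams: List[Tuple[Dict[int, str], float]] = [({}, 0.0)]
    for pos in order:
        candidates = [
            ({**assigned, pos: aa}, score + lp)
            for assigned, score in beams
            for aa, lp in log_probs(assigned, pos).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:n_beams]
    return beams[0]


# Toy two-position "model": the best residue at position 1 depends on position 0
def toy_log_probs(assigned: Dict[int, str], pos: int) -> Dict[str, float]:
    if pos == 0:
        return {"A": float(np.log(0.6)), "B": float(np.log(0.4))}
    if assigned.get(0) == "A":
        return {"C": float(np.log(0.5)), "D": float(np.log(0.5))}
    return {"C": float(np.log(0.9)), "D": float(np.log(0.1))}


greedy = beam_search([0, 1], toy_log_probs, n_beams=1)
beam = beam_search([0, 1], toy_log_probs, n_beams=2)
print(greedy[0], beam[0])  # {0: 'A', 1: 'C'} {0: 'B', 1: 'C'}

# Five residues on a line, with the last one fixed
ca = [[float(x), 0.0, 0.0] for x in range(5)]
print(decoding_order(ca, fixed=[4], mode="proximity"))  # [3, 2, 1, 0]
```

Greedy decoding commits to "A" at position 0 (joint probability 0.3), while a beam of width 2 keeps "B" alive and finds the better joint sequence "BC" (0.4 × 0.9 = 0.36).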

!cd ./bayes_design && python3 design.py --model_name {model_name} --protein_id {pdb_id} --decode_order {decode_order} --decode_algorithm {decode_algorithm} --n_beams {n_beams} --fixed_positions 67 68

Number of edges: 48
Training noise level: 0.3
Model loaded
j: 0
j: 1
...
j: 67
Original sequence
GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE
Masked sequence (tokens to predict are indicated by a dash)
------------------------------------------------------------------IE
Designed sequence
HMDPELEAEKNKLEKFLKKENITNVKISLKDGCLPINVPGCNEDCKNYFCNLCKRLQSKGYRCEIKIE
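To inspect the result, the sketch below rebuilds the masked sequence shown above from the original sequence and checks that the fixed residues survive in the design. It assumes `--fixed_positions` uses 1-based residue numbers, which is consistent with positions 67 and 68 being the last two residues (I, E) of this 68-residue chain. The helpers `masked_sequence` and `identity` are hypothetical and not part of the repository.

```python
from typing import Sequence


def masked_sequence(seq: str, fixed_positions: Sequence[int]) -> str:
    """Replace every non-fixed residue with '-'; fixed_positions are 1-based (assumed)."""
    keep = {p - 1 for p in fixed_positions}
    return "".join(aa if i in keep else "-" for i, aa in enumerate(seq))


def identity(a: str, b: str) -> float:
    """Fraction of positions with the same residue (sequences must have equal length)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


original = "GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE"
designed = "HMDPELEAEKNKLEKFLKKENITNVKISLKDGCLPINVPGCNEDCKNYFCNLCKRLQSKGYRCEIKIE"

mask = masked_sequence(original, [67, 68])
print(mask)  # 66 dashes followed by "IE", as in the output above
# Fixed residues must be identical in the designed sequence
assert all(d == m for d, m in zip(designed, mask) if m != "-")
print(f"Sequence identity to the original: {identity(original, designed):.2f}")
```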


To check your designed sequence, predict its structure with [AlphaFold](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) or [ESMFold](https://esmatlas.com/resources/fold/result?fasta_header=%3Ecd1a&sequence=KTPEWWWPIINKWTMETMYYNTGTNEVTKEKPIG) and compare the prediction to the input backbone. The ESMFold link opens with an example sequence; replace it with your design.
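One common way to compare a predicted structure with the input backbone is the Cα RMSD after optimal rigid superposition (the Kabsch algorithm). The NumPy sketch below assumes you have already extracted matching Cα coordinates from both structures as `(N, 3)` arrays. PDB parsing is not shown, and the toy coordinates are random.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or P.shape != Q.shape or P.shape[1] != 3:
        raise ValueError("expected two arrays of the same shape (N, 3)")
    # Remove translation by centering both sets on their centroids
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # flip one axis if needed to avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Sanity check: a rotated and shifted copy of the same points should give RMSD close to 0
rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 3.0])
print(kabsch_rmsd(P, Q))  # a value very close to 0
```

Lower RMSD means the predicted fold of the designed sequence is closer to the target backbone.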