<a href="https://colab.research.google.com/github/patrickbryant1/RareFold/blob/main/rarefold.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**RareFold** predicts single-chain protein structures containing rare noncanonical amino acids and enables the design of novel peptide binders through the **EvoBindRare** framework contained within this notebook.

[Read more here](https://www.biorxiv.org/content/10.1101/2025.05.19.654846v1)


RareFold is available under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).  \
The RareFold parameters for prediction are made available under the terms of the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/legalcode). \
The design protocol EvoBindRare and the parameters for design are made available under the terms of the [CC BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/).

**You may not use these files except in compliance with the licenses.**

## Local installation
For local installation of RareFold see: https://github.com/patrickbryant1/RareFold

If you like RareFold - **please star the repo!**

## Citation
If you use RareFold, please cite:

Li Q, Daumiller D, Bryant P. RareFold: Structure prediction and design of proteins with noncanonical amino acids. bioRxiv. 2025. p. 2025.05.19.654846. doi:10.1101/2025.05.19.654846
[Link to preprint](https://www.biorxiv.org/content/10.1101/2025.05.19.654846v1)

In [None]:
#@title Install dependencies
#@markdown Make sure your runtime uses a GPU.
#@markdown In the menu above, select Runtime > Change runtime type > Hardware accelerator and choose a GPU.

#@markdown **Press play.**

#@markdown Simply press play on each cell below and follow the instructions.
#@markdown The installation takes a few minutes.

#@markdown When it finishes (the spinner on the play button stops), select Runtime > Restart session in the menu above.

!pip install -q --no-warn-conflicts dm-haiku==0.0.11
!pip install -q --no-warn-conflicts biopython==1.81
!pip install -q --no-warn-conflicts chex==0.1.5
!pip install -q --no-warn-conflicts dm-tree==0.1.8
!pip install -q --no-warn-conflicts immutabledict==2.0.0
!pip install -q --no-warn-conflicts scipy==1.7.3
!pip install -q --no-warn-conflicts tensorflow==2.11.0
!pip install -q --no-warn-conflicts rdkit-pypi
!pip install -q --no-warn-conflicts py3Dmol
!pip install -q --no-warn-conflicts ml-collections
!pip install -q --no-warn-conflicts numpy==1.26.4
!pip uninstall -y jax jaxlib
!pip install -q --no-warn-conflicts jaxlib==0.4.35
!pip install -q --no-warn-conflicts 'jax[cuda12_pip]'==0.4.35 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
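After the restart, an optional sanity check is to confirm that the pinned packages resolved to the requested versions, because Colab preinstalls its own versions and `--no-warn-conflicts` suppresses pip's warnings about broken dependencies. The sketch below uses only the standard library's `importlib.metadata`. The `PINNED` dictionary copies the pins from the install cell above, and `check_pins` is a hypothetical helper, not part of RareFold.

```python
from importlib import metadata

# Copy of the version pins from the install cell above (distribution name -> version).
PINNED = {
    "dm-haiku": "0.0.11",
    "biopython": "1.81",
    "chex": "0.1.5",
    "dm-tree": "0.1.8",
    "immutabledict": "2.0.0",
    "scipy": "1.7.3",
    "tensorflow": "2.11.0",
    "numpy": "1.26.4",
    "jax": "0.4.35",
    "jaxlib": "0.4.35",
}


def check_pins(pinned):
    """Return {package: installed_version} for every package whose installed
    version differs from its pin; None means the package is not installed."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    if problems:
        for name, found in problems.items():
            print(f"{name}: wanted {PINNED[name]}, found {found}")
    else:
        print("All pinned packages match.")
```

An empty report does not guarantee that JAX can see the GPU; it only shows that pip installed what the cell asked for.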

In [None]:
#@title Clone the RareFold github repo
import shutil
#Remove any previous clone so the repo is fetched fresh
shutil.rmtree('/content/RareFold', ignore_errors=True)

!git clone https://github.com/patrickbryant1/RareFold.git

In [None]:
#@title #EvoBindRare
#@markdown Follow all steps outlined below to design a binder.
#@markdown To try the **test case** [1ssc](https://www.rcsb.org/3d-view/1SSC), press the play button to the left. \
#@markdown If you don't want to run the test case, **change the input parameters**.

#@markdown ##Parameters
#@markdown - *TARGET_ID* - ID of the target structure
#@markdown - *TARGET_SEQUENCE* - amino acid sequence of the target protein chain
#@markdown - *BINDER_LENGTH* - number of residues in the peptide binder
#@markdown - *NCAA* - which rare (noncanonical) amino acids the design may use, separated by commas.
#@markdown Choose from: MSE, TPO, MLY, CME, PTR, SEP, SAH, CSO, PCA, KCX, CAS, CSD, MLZ, OCS, ALY, CSS, CSX, HIC, HYP, YCM, YOF, M3L, PFF, CGU, FTR, LLP, CAF, CMH, MHO
#@markdown - *NITER* - number of design iterations to run (default=300)
#@markdown - **Optional:** *CYCLIC_OFFSET* - design a cyclic peptide binder.
#@markdown - **Target MSA** - no MSA search is currently available in this notebook, so you have to provide your own MSA in a3m format and upload it here. \
#@markdown There are two ways to generate it: \
#@markdown 1. Search Uniclust30 locally with HHblits \
#@markdown 2. Go to https://toolkit.tuebingen.mpg.de/tools/hhblits \
#@markdown Paste the receptor sequence into the search field in FASTA format and click Submit. \
#@markdown When the search has finished, open the "Query MSA" tab and click "Download Full A3M". \
#@markdown - Upload the MSA here: \
#@markdown Click the folder icon (Files) on the left, then the upload icon, and upload the .a3m file.
#@markdown Make sure the file name matches *TARGET_MSA* (by default **TARGET_ID**_receptor.a3m). The notebook looks for it in /content/, except for the 1ssc test case, which uses the MSA shipped with the repository.
import sys, os
from google.colab import files
import pandas as pd
import numpy as np
import urllib.request
import py3Dmol
import matplotlib.pyplot as plt
import glob
sys.path.insert(0,'/content/RareFold/src')
sys.path.insert(0,'/content/RareFold/src/colab')
TARGET_ID = "1ssc" #@param {type:"string"}
TARGET_SEQUENCE = "KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEG" #@param {type:"string"}
BINDER_LENGTH = 10 #@param {type:"integer"}
NCAA = "MSE,MLY,PTR,SEP,TPO,MLZ,ALY,HIC,HYP,M3L,PFF,MHO" #@param {type:"string"}
NITER = 300 #@param {type:"integer"}
CYCLIC_OFFSET = True # @param {type:"boolean"}
TARGET_MSA = "1ssc_receptor.a3m" #@param {type:"string"}
OUTDIR="/content/"+TARGET_ID+'/'
#Make outdir
if not os.path.exists(OUTDIR):
  os.mkdir(OUTDIR)

#Check the MSA
if TARGET_ID=='1ssc':
  TARGET_MSA = '/content/RareFold/data/design_test_case/'+TARGET_ID+'/'+TARGET_MSA
else:
  TARGET_MSA ='/content/'+TARGET_MSA
#Stop here if the MSA is missing instead of failing later in process_a3m
if not os.path.exists(TARGET_MSA):
  raise FileNotFoundError("Can't find MSA: "+TARGET_MSA)
print('Using MSA:',TARGET_MSA)

from check_msa_colab import process_a3m
#Strip only the file extension (split('.') would break on paths containing other dots)
PROCESSED_MSA=os.path.splitext(TARGET_MSA)[0]+'_processed.a3m'
process_a3m(TARGET_MSA, TARGET_SEQUENCE, PROCESSED_MSA)
TARGET_MSA=PROCESSED_MSA
#Write fasta
TARGET_FASTA=OUTDIR+TARGET_ID+'_receptor.fasta'
with open(TARGET_FASTA, 'w') as file:
  file.write('>'+TARGET_ID+'\n')
  file.write(TARGET_SEQUENCE+'\n')


NCAA = NCAA.split(',')
NCAA = [x.strip() for x in NCAA]
print('Using rare amino acids', NCAA)

Using MSA: /content/RareFold/data/design_test_case/1ssc/1ssc_receptor.a3m
Using rare amino acids ['MSE', 'MLY', 'PTR', 'SEP', 'TPO', 'MLZ', 'ALY', 'HIC', 'HYP', 'M3L', 'PFF', 'MHO']
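The parameter cell only splits *NCAA* on commas and strips whitespace, so a lowercase or misspelled code is passed on unchecked. A stricter parser could validate the input against the choices listed in the parameter description before a long design run starts. The sketch below is hypothetical (`parse_ncaa` and `ALLOWED_NCAA` are not RareFold names), and it assumes RareFold expects the upper-case codes exactly as listed.

```python
# The 29 codes listed as choices for NCAA in the parameter cell above.
ALLOWED_NCAA = frozenset(
    "MSE TPO MLY CME PTR SEP SAH CSO PCA KCX CAS CSD MLZ OCS ALY CSS CSX "
    "HIC HYP YCM YOF M3L PFF CGU FTR LLP CAF CMH MHO".split()
)


def parse_ncaa(text, allowed=ALLOWED_NCAA):
    """Split a comma-separated NCAA string, upper-case each code, drop empty
    entries and duplicates (keeping first-seen order), and reject unknown codes."""
    codes = []
    for raw in text.split(","):
        code = raw.strip().upper()
        if code and code not in codes:
            codes.append(code)
    unknown = [c for c in codes if c not in allowed]
    if unknown:
        raise ValueError(f"Unsupported rare amino acid code(s): {unknown}")
    return codes


print(parse_ncaa("MSE, mly,PTR,,MSE"))  # ['MSE', 'MLY', 'PTR']
```

Failing early here is cheap compared with discovering a bad code after the parameters have been downloaded and the design loop has started.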

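The notebook delegates MSA handling to `process_a3m` from `check_msa_colab`, whose behaviour is not shown here. Independently of that function, a quick check of the uploaded file can catch the most common mistake: an MSA built for a different sequence. The sketch relies on two properties of the A3M format. The first record is the query, and lowercase letters are insertions relative to the query, so removing them leaves every aligned sequence with the query's length. `read_a3m`, `match_columns` and `check_query` are hypothetical helpers.

```python
def read_a3m(text):
    """Parse A3M text into (header, sequence) pairs; sequences may span lines."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/metadata lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def match_columns(seq):
    """Drop insertion states (lowercase letters and '.') so only match columns remain."""
    return "".join(c for c in seq if not c.islower() and c != ".")


def check_query(a3m_text, target_sequence):
    """Verify the first record equals the target and return
    (number of sequences, headers whose aligned length differs from the query)."""
    records = read_a3m(a3m_text)
    if not records:
        raise ValueError("MSA is empty")
    query = match_columns(records[0][1]).replace("-", "")
    if query != target_sequence:
        raise ValueError("First MSA sequence does not match TARGET_SEQUENCE")
    length = len(match_columns(records[0][1]))
    bad = [h for h, s in records if len(match_columns(s)) != length]
    return len(records), bad


example = """>query
ACDEF
>hit1
AC-dEF
>hit2
ACDE
"""
print(check_query(example, "ACDEF"))  # (3, ['hit2'])
```

On the uploaded file this would be called as `check_query(open(path).read(), TARGET_SEQUENCE)`. A non-empty second value lists sequences that do not align column-for-column with the query.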

In [None]:
#@markdown #Run the design

#@markdown Click play to design a binder.

#@markdown The whole process will take approximately **7 hours** (for 300 iterations). Relax and wait for your binder.
#@markdown If the run is interrupted, it resumes from where it left off when you run this cell again.

#@markdown The RareFold design parameters are downloaded here unless they are already present.
sys.path.insert(0,'/content/RareFold/src/colab')
sys.path.insert(0,'/content/RareFold/src')
import shutil
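import collections.abc
import pickle
import numpy as np
#Compatibility shim: older dependencies still use collections.Iterable, which was removed in Python 3.10
collections.Iterable = collections.abc.Iterable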
from rarefold.model import config
#Update config
config.CONFIG.model.embeddings_and_evoformer['cyclic_offset'] = CYCLIC_OFFSET

PARAMS="/content/rf_params/design_params.npy"
if not os.path.exists(PARAMS):
  if not os.path.exists('/content/rf_params'):
    os.mkdir('/content/rf_params')
  !wget https://zenodo.org/records/14892196/files/finetuned_params25000.npy
  !mv /content/finetuned_params25000.npy $PARAMS
else:
  print('Design parameters already exist.')


#Make MSA feats
from colab.make_msa_seq_feats import process
#Get feats
feature_dict = process(TARGET_FASTA, [TARGET_MSA])

#Write out features as a pickled dictionary.
features_output_path = os.path.join(OUTDIR, 'msa_features.pkl')
with open(features_output_path, 'wb') as f:
    pickle.dump(feature_dict, f, protocol=4)
print('Saved MSA features to',features_output_path)

#Design
print('Starting design...')
print('Using Rare Amino Acids', NCAA)
from colab.mc_design_improved import design_binder
MSA_feats = np.load(features_output_path, allow_pickle=True)
design_binder(config.CONFIG,
              TARGET_ID,
              MSA_feats,
              num_recycles=3,
              binder_length=BINDER_LENGTH,
              num_iterations=NITER,
              resample_every_n=100,
              batch_size=1,
              params=PARAMS,
              rare_AAs=NCAA,
              save_best_only=True,
              outdir=OUTDIR)
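Setting *CYCLIC_OFFSET* switches `cyclic_offset` on in the Evoformer config. A common way for AlphaFold-style models to handle head-to-tail cyclic peptides is to replace the linear relative-position offset i - j with a cyclic one whose magnitude is the shortest distance around the ring, so the first and last residues become sequence neighbours. The function below illustrates that idea only. How RareFold actually builds its offsets is defined in the repository and may differ, and `cyclic_offset` here is a hypothetical standalone helper.

```python
def cyclic_offset(length):
    """Relative-position matrix for a head-to-tail cyclic chain of `length`
    residues: the magnitude is the shortest distance around the ring and the
    sign follows the linear offset i - j."""
    offsets = []
    for i in range(length):
        row = []
        for j in range(length):
            linear = i - j
            ring = min(abs(linear), length - abs(linear))
            sign = (linear > 0) - (linear < 0)
            row.append(sign * ring)
        offsets.append(row)
    return offsets


for row in cyclic_offset(5):
    print(row)
```

Row 0 for a 5-residue ring is `[0, -1, -2, -2, -1]`: residue 1 is one step from residue 5 rather than four. In a receptor-binder complex such a matrix would presumably apply only within the binder chain, with the receptor keeping linear offsets; this is an assumption, not something the notebook states.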

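`design_binder` (from `colab.mc_design_improved`) runs the optimisation, but its internals are not shown in this notebook. The module name suggests a Monte Carlo search, so, purely as an illustration, the loop below sketches a generic Metropolis search over binder sequences: mutate one random position to a residue from the allowed alphabet (canonical plus the selected NCAAs), score the candidate, and accept it if the loss does not increase (or, when the temperature is above zero, with probability exp(-delta / T)). At temperature 0 this reduces to greedy acceptance. `mc_design` and `toy_loss` are hypothetical; in the real pipeline the loss comes from RareFold structure predictions, and the notebook's `resample_every_n` and `batch_size` settings are not modelled here.

```python
import math
import random

CANONICAL = ["ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
             "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"]


def mc_design(loss_fn, length, alphabet, n_iter, temperature=0.0, seed=0):
    """Metropolis search over sequences of `length` residues from `alphabet`.
    Returns (best loss, best sequence, best loss after each step)."""
    rng = random.Random(seed)
    current = [rng.choice(alphabet) for _ in range(length)]
    current_loss = loss_fn(current)
    best, best_loss = list(current), current_loss
    history = [best_loss]
    for _ in range(n_iter):
        proposal = list(current)
        proposal[rng.randrange(length)] = rng.choice(alphabet)  # point mutation
        new_loss = loss_fn(proposal)
        delta = new_loss - current_loss
        if delta <= 0 or (temperature > 0 and rng.random() < math.exp(-delta / temperature)):
            current, current_loss = proposal, new_loss
            if current_loss < best_loss:
                best, best_loss = list(current), current_loss
        history.append(best_loss)
    return best_loss, best, history


def toy_loss(seq):
    """Hypothetical stand-in for a structure-based loss: the fraction of
    positions that are not MLY. Lower is better."""
    return sum(res != "MLY" for res in seq) / len(seq)


alphabet = CANONICAL + ["MSE", "MLY", "PTR"]
loss, seq, history = mc_design(toy_loss, length=10, alphabet=alphabet, n_iter=300)
print(round(loss, 2), "-".join(seq))
```

Tracking the best sequence separately from the current one mirrors the notebook's `save_best_only=True` option, which keeps only the best design rather than every accepted step.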
In [None]:
#@markdown #Analyse the results
#@markdown Only the best model is visualised. As a rule of thumb, a **pLDDT value above 85** indicates a reliable binder.

#@markdown Tick the DOWNLOAD box and run this cell to download the best model (PDB file).

RECEPTOR_STYLE = "cartoon" #@param ["cartoon", "sphere", "stick"]
BINDER_STYLE = "stick" #@param ["cartoon", "sphere", "stick"]
DOWNLOAD = False #@param {type:"boolean"}
metrics = pd.read_csv(OUTDIR+'metrics.csv')

#Convert
from ast import literal_eval
metrics_conv = {}
for col in ['if_dist_binder', 'plddt', 'inter_clash_frac','intra_clash_frac', 'loss', 'sequence', 'int_seq']:
  list_col = [literal_eval(x)[0] for x in metrics[col]]
  metrics_conv[col] = list_col
metrics_conv['iteration'] = metrics.iteration.values

metrics_conv = pd.DataFrame.from_dict(metrics_conv)
metrics_conv = metrics_conv.sort_values(by='loss').reset_index()
#Print
print('The best iteration, sequence, loss and plDDT value is:')
print(metrics_conv.loc[0].iteration, metrics_conv.loc[0]['sequence'], metrics_conv.loc[0]['loss'],  metrics_conv.loc[0]['plddt'])


#Vis
view = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js',)
top_model = str(metrics_conv.loc[0].iteration)  #iteration may be read as int or str
if top_model=='init':
  model_path = OUTDIR+top_model+'_0.pdb'
else:
  model_path = OUTDIR+'unrelaxed_'+top_model+'_0.pdb'
with open(model_path,'r') as fh:
  view.addModel(fh.read(),'pdb')
view.setStyle({'chain':'A'},{RECEPTOR_STYLE: {'color':'green'}})
view.setStyle({'chain':'B'},{BINDER_STYLE: {'color':'cyan'}})
view.setStyle({'chain':'C'},{BINDER_STYLE: {'color':'magenta'}})

view.zoomTo()
view.show()

#@title Download the results
import shutil
if not os.path.exists(OUTDIR+'best_models'):
  os.mkdir(OUTDIR+'best_models')

#Download
if DOWNLOAD:
  rank=1
  shutil.copy(model_path, OUTDIR+'best_models/rank_'+str(rank)+'.pdb')

  for file in glob.glob(OUTDIR+'best_models/rank_*.pdb'):
    files.download(file)

The best iteration, sequence, loss and plDDT value is:
3 VAL-LEU-M3L-MLY-HIC-MLY-CYS-ARG-TYR-TPO 0.3911007946009666 15.415902
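The analysis cell unwraps each list-valued column of `metrics.csv` with `literal_eval(x)[0]` and sorts by loss. The same logic can be written with the standard library alone, which also makes it easy to apply the pLDDT rule of thumb from the cell description as a filter, and to keep `iteration` as text so that both `init` and numbered iterations map to a model file name. This is a sketch: the column names and file-name convention come from the analysis cell, while `EXAMPLE_CSV` holds made-up toy rows and the helper names are hypothetical.

```python
import csv
import io
from ast import literal_eval

LIST_COLUMNS = ["if_dist_binder", "plddt", "inter_clash_frac",
                "intra_clash_frac", "loss", "sequence", "int_seq"]


def load_metrics(handle):
    """Read metrics rows, taking the first element of each list-valued column
    (the same literal_eval(x)[0] convention as the analysis cell)."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {col: literal_eval(raw[col])[0] for col in LIST_COLUMNS if col in raw}
        row["iteration"] = raw["iteration"]  # 'init' or a number, kept as text
        rows.append(row)
    return rows


def rank_designs(rows, min_plddt=None):
    """Sort by ascending loss, optionally keeping only rows with plddt >= min_plddt."""
    if min_plddt is not None:
        rows = [r for r in rows if r["plddt"] >= min_plddt]
    return sorted(rows, key=lambda r: r["loss"])


def model_filename(iteration):
    """Model file name convention used in the analysis cell."""
    it = str(iteration)
    return it + "_0.pdb" if it == "init" else "unrelaxed_" + it + "_0.pdb"


# Toy rows for illustration only; not real RareFold results.
EXAMPLE_CSV = """iteration,loss,plddt,sequence
init,[0.9],[40.0],['GLY-GLY']
3,[0.39],[86.5],['VAL-TPO']
"""

rows = load_metrics(io.StringIO(EXAMPLE_CSV))
best = rank_designs(rows)[0]
print(best["iteration"], best["sequence"], best["loss"], best["plddt"],
      model_filename(best["iteration"]))  # 3 VAL-TPO 0.39 86.5 unrelaxed_3_0.pdb
```

With the real file, replace the toy input with `open(OUTDIR+'metrics.csv', newline='')`.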

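The best sequence is printed as hyphen-separated three-letter residue codes, as in the example output above. Splitting it and flagging every code outside the 20 canonical amino acids shows which rare residues the design actually used and at which positions. `split_design` is a hypothetical helper.

```python
# The 20 canonical amino acids as three-letter codes.
CANONICAL = {"ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
             "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"}


def split_design(sequence):
    """Split a hyphen-separated three-letter sequence and return the residues
    plus (1-based position, code) for every noncanonical residue."""
    residues = sequence.split("-")
    rare = [(i + 1, res) for i, res in enumerate(residues) if res not in CANONICAL]
    return residues, rare


residues, rare = split_design("VAL-LEU-M3L-MLY-HIC-MLY-CYS-ARG-TYR-TPO")
print(len(residues), rare)
# 10 [(3, 'M3L'), (4, 'MLY'), (5, 'HIC'), (6, 'MLY'), (10, 'TPO')]
```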