
---

<div align="center">    
 
# ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins

</div>

---

Immune receptor proteins play a key role in the immune system and have shown great promise as biotherapeutics. The structure of these proteins is critical for understanding their antigen-binding properties. Here, we present ImmuneBuilder, a set of deep learning models trained to accurately predict the structures of antibodies (ABodyBuilder2), nanobodies (NanoBodyBuilder2) and T-cell receptors (TCRBuilder2). We show that ImmuneBuilder generates structures with state-of-the-art accuracy while being far faster than AlphaFold2. For example, on a benchmark of 34 recently solved antibodies, ABodyBuilder2 predicts CDR-H3 loops with an RMSD of 2.81Å, a 0.09Å improvement over AlphaFold-Multimer, while being over a hundred times faster. Similar results are achieved for nanobodies (NanoBodyBuilder2 predicts CDR-H3 loops with an average RMSD of 2.89Å, a 0.55Å improvement over AlphaFold2) and for TCRs. Because it predicts an ensemble of structures, ImmuneBuilder also provides an error estimate for every residue in its final prediction.
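
The abstract reports accuracy as RMSD and derives a per-residue error from disagreement within an ensemble of predicted structures. The sketch below makes both quantities concrete under stated assumptions: coordinates are plain `(x, y, z)` tuples (one point per residue, e.g. its Cα atom) that have already been superposed into a common frame, and the per-residue error is taken as the RMS distance of each residue from the ensemble-mean position. That spread measure is an illustration, not necessarily the exact formula ImmuneBuilder uses, and `rmsd` and `per_residue_spread` are hypothetical helper names.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumption: both structures are already superposed (no fitting is done here).
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def per_residue_spread(models):
    """RMS distance of each residue from its ensemble-mean position.

    models: K models, each a list of N (x, y, z) points (one per residue),
    all superposed in a common frame. This is an illustrative error measure,
    not necessarily ImmuneBuilder's own definition.
    """
    if not models or any(len(m) != len(models[0]) for m in models):
        raise ValueError("need at least one model, all with the same number of residues")
    n_models = len(models)
    spread = []
    for i in range(len(models[0])):
        points = [model[i] for model in models]
        mean = [sum(p[d] for p in points) / n_models for d in range(3)]
        msd = sum(sum((p[d] - mean[d]) ** 2 for d in range(3)) for p in points) / n_models
        spread.append(math.sqrt(msd))
    return spread
```

Residues on which the ensemble members agree get a spread near zero; residues placed differently by different models, typically in flexible loops such as CDR-H3, get larger values.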


In [1]:
#@title Input chain sequence(s), then hit `Runtime` -> `Run all`
import sys
python_version = f"{sys.version_info.major}.{sys.version_info.minor}"

#@markdown Select what type of immune protein you are modelling

protein_type = "Antibody" #@param ["Antibody", "Nanobody","TCR"]

#@markdown Insert the variable-domain sequence of each chain. If modelling a nanobody, fill in only one of the fields and leave the other empty

sequence_1 = 'VKLLEQSGAEVKKPGASVKVSCKASGYSFTSYGLHWVRQAPGQRLEWMGWISAGTGNTKYSQKFRGRVTFTRDTSATTAYMGLSSLRPEDTAVYYCARDQAGYTGGKSEFDYWGQGTLVTVSS' #@param {type:"string"}
sequence_2 = 'ELVMTQSPSSLSASVGDRVNIACRASQGISSALAWYQQKPGKAPRLLIYDASNLESGVPSRFSGSGSGTDFTLTISSLQPEDFAIYYCQQFNSYPLTFGGGTKVEIKRTV' #@param {type:"string"}

# Remove any whitespace (spaces, tabs, newlines) pasted into the sequences
sequence_1 = "".join(sequence_1.split())
sequence_2 = "".join(sequence_2.split())

#@markdown Insert the output file name

filename = 'ImmuneBuilder_model.pdb' #@param {type:"string"}
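
The input cell only strips whitespace before the sequences reach ANARCI. A stricter check could also catch stray characters (digits, FASTA headers, gap symbols) early. The sketch below uses a hypothetical helper, `clean_sequence`, and assumes that only the 20 standard one-letter amino-acid codes should be accepted; upper-casing is a precaution, and ANARCI still performs its own validation afterwards.

```python
# Assumption: only the 20 standard one-letter amino-acid codes are acceptable input
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(sequence):
    """Remove whitespace, upper-case, and reject non-standard residue letters."""
    cleaned = "".join(sequence.split()).upper()
    unexpected = sorted(set(cleaned) - STANDARD_AMINO_ACIDS)
    if unexpected:
        raise ValueError(f"Unexpected character(s) in sequence: {''.join(unexpected)}")
    return cleaned
```

In the notebook this would replace the two whitespace-stripping lines, e.g. `sequence_1 = clean_sequence(sequence_1)`. An empty field passes unchanged, which keeps the nanobody case (one field left blank) working.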




In [2]:
#@title Install dependencies
%%capture
%%bash -s $python_version

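#@markdown This cell downloads and installs ImmuneBuilder, py3Dmol, Miniconda, OpenMM (with PDBFixer) and ANARCI. Marker files such as `CODE_READY` let it skip steps that have already completed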

PYTHON_VERSION=$1
set -e

if [ ! -f CODE_READY ]; then
  # install dependencies
  pip install ImmuneBuilder 2>&1 1>/dev/null
  pip install py3Dmol 2>&1 1>/dev/null
  touch CODE_READY
fi

# setup conda
if [ ! -f CONDA_READY ]; then
  wget -qnc https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  bash Miniconda3-latest-Linux-x86_64.sh -bfp /usr/local 2>&1 1>/dev/null
  rm Miniconda3-latest-Linux-x86_64.sh
  touch CONDA_READY
fi

# setup openmm for amber refinement
if [ ! -f AMBER_READY ]; then
  conda install -y -q -c conda-forge openmm=7.7.0 python="${PYTHON_VERSION}" pdbfixer 2>&1 1>/dev/null
  touch AMBER_READY
fi

# setup anarci
if [ ! -f ANARCI_READY ]; then
  conda install -y -q -c bioconda anarci python="${PYTHON_VERSION}" 2>&1 1>/dev/null
  touch ANARCI_READY
fi
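
The install cell makes each step idempotent with a marker file (`CODE_READY`, `CONDA_READY`, `AMBER_READY`, `ANARCI_READY`): a step runs only if its marker is absent, and because `set -e` aborts the script on the first failing command, the marker is only written after the step succeeds. The same pattern in Python might look like the sketch below; `run_once` is a hypothetical helper, not part of ImmuneBuilder.

```python
from pathlib import Path


def run_once(marker, step):
    """Run step() unless the marker file exists; create the marker only on success.

    Returns True if step() ran and False if it was skipped.
    """
    marker_path = Path(marker)
    if marker_path.exists():
        return False  # already done in an earlier run
    step()  # if this raises, no marker is written, so the next call retries
    marker_path.touch()
    return True
```

Writing the marker last is the important design choice: a half-finished installation is retried on the next run instead of being silently skipped.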

In [3]:
#@title Download the model weights
%%capture

#@markdown This will take a few seconds the first time

if f"/usr/local/lib/python{python_version}/site-packages/" not in sys.path:
    sys.path.insert(0, f"/usr/local/lib/python{python_version}/site-packages/")

from ImmuneBuilder import ABodyBuilder2, NanoBodyBuilder2, TCRBuilder2

if protein_type == "Antibody":
    predictor = ABodyBuilder2()
elif protein_type == "Nanobody":
    predictor = NanoBodyBuilder2()
elif protein_type == "TCR":
    predictor = TCRBuilder2()
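
The `if`/`elif` chain above leaves `predictor` undefined if `protein_type` ever holds an unexpected value; the Colab form currently prevents this, but a dictionary lookup is a compact alternative that fails loudly. In the sketch below, `make_predictor` and `PREDICTOR_CLASS_NAMES` are hypothetical names, and the module is passed in as an argument so that the helper can be tried without ImmuneBuilder installed. The three class names are the ones the cell above imports from `ImmuneBuilder`.

```python
# Maps the Colab form choice to the ImmuneBuilder class name imported above
PREDICTOR_CLASS_NAMES = {
    "Antibody": "ABodyBuilder2",
    "Nanobody": "NanoBodyBuilder2",
    "TCR": "TCRBuilder2",
}


def make_predictor(protein_type, module):
    """Instantiate the predictor class in `module` that matches protein_type."""
    try:
        class_name = PREDICTOR_CLASS_NAMES[protein_type]
    except KeyError:
        raise ValueError(
            f"Unknown protein_type {protein_type!r}; expected one of {sorted(PREDICTOR_CLASS_NAMES)}"
        ) from None
    return getattr(module, class_name)()
```

Usage in the notebook would be `import ImmuneBuilder` followed by `predictor = make_predictor(protein_type, ImmuneBuilder)`.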


In [4]:
#@title Predict the structure

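from anarci import number

# Use ANARCI to identify the chain type of each sequence (e.g. H or L for antibodies, A or B for TCRs).
# Sequences ANARCI cannot number (such as an empty field) give a falsy chain type and are skipped.
_, chain1 = number(sequence_1)
_, chain2 = number(sequence_2)

# Map chain type -> sequence (named `sequences` to avoid shadowing the built-in `input`)
sequences = dict()
if chain1:
    sequences[chain1] = sequence_1
if chain2:
    sequences[chain2] = sequence_2

try:
    # Predict the structure and write it to the chosen PDB file
    predictor.predict(sequences).save(filename)
except KeyError as e:
    print(f"ERROR: Missing sequence for chain {e}")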
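
The prediction cell relies on ANARCI's `number` to label each sequence, and only reports a missing chain after `predict` fails with a `KeyError`; if both sequences receive the same label, one silently overwrites the other. A stricter variant could check the chains up front. The sketch below is hypothetical: `assign_chains` and `REQUIRED_CHAINS` are illustrative names, and the chain IDs each protein type needs are an assumption inferred from ANARCI's labels rather than taken from ImmuneBuilder's documentation. The classifier is passed in so that `anarci.number` can be swapped for a stub when testing.

```python
# Assumption: chain IDs each ImmuneBuilder predictor expects (inferred from ANARCI labels)
REQUIRED_CHAINS = {
    "Antibody": {"H", "L"},
    "Nanobody": {"H"},
    "TCR": {"A", "B"},
}


def assign_chains(sequences, classify, protein_type):
    """Map sequences to chain IDs and keep only the chains protein_type needs.

    classify(seq) must return a (numbering, chain_type) pair like anarci.number,
    with a falsy chain_type when the sequence is not recognised.
    Empty strings (unused form fields) are skipped.
    """
    chains = {}
    for seq in sequences:
        if not seq:
            continue
        _, chain_type = classify(seq)
        if not chain_type:
            raise ValueError(f"No variable domain recognised in sequence starting {seq[:10]!r}")
        if chain_type in chains:
            raise ValueError(f"More than one sequence was identified as chain {chain_type}")
        chains[chain_type] = seq
    required = REQUIRED_CHAINS[protein_type]
    missing = required - set(chains)
    if missing:
        raise ValueError(f"Missing sequence for chain(s): {', '.join(sorted(missing))}")
    return {chain_id: chains[chain_id] for chain_id in sorted(required)}
```

In the notebook this could be called as `assign_chains([sequence_1, sequence_2], number, protein_type)` and the result passed to `predictor.predict(...)`. Returning only the required chains means a leftover light chain in nanobody mode is simply ignored.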

In [5]:
#@title Visualise the prediction

import py3Dmol

#@markdown Choose visualization settings (rerun this cell to update):

colour_by = "predicted_error" #@param ["predicted_error", "chain", "rainbow"]

show_sidechains = True #@param {type:"boolean"}
show_mainchains = False #@param {type:"boolean"}


# Create the viewer and load the predicted PDB file
view = py3Dmol.view()
with open(filename, 'r') as pdb_file:
    view.addModel(pdb_file.read(), 'pdb')
view.setBackgroundColor('white')

if colour_by == "chain":
    # Colour each chain separately: H/L for antibodies (nanobodies only have H), A/B for TCRs
    chain_colours = {'H': 'purple', 'L': 'green', 'A': 'purple', 'B': 'green'}
    for chain_id, colour in chain_colours.items():
        view.setStyle({'chain': chain_id}, {'cartoon': {'color': colour}})
elif colour_by == "rainbow":
    view.setStyle({'cartoon': {'color': 'spectrum'}})
elif colour_by == "predicted_error":
    # The predicted error is stored in the B-factor column: blue for low error, red for high error
    print("The error is calculated by comparing how much different models agree or disagree on the placement of each residue")
    view.setStyle({'cartoon': {'colorscheme': {'prop': 'b', 'gradient': 'roygb', 'min': 5, 'max': 0}}})

if show_sidechains:
    # Side chains as sticks from CA outward (backbone C, O and N hidden),
    # glycine as a CA sphere, and proline with N kept so its ring closes
    BB = ['C', 'O', 'N']
    view.addStyle({'and': [{'resn': ["GLY", "PRO"], 'invert': True}, {'atom': BB, 'invert': True}]},
                  {'stick': {'colorscheme': "WhiteCarbon", 'radius': 0.3}})
    view.addStyle({'and': [{'resn': "GLY"}, {'atom': 'CA'}]},
                  {'sphere': {'colorscheme': "WhiteCarbon", 'radius': 0.3}})
    view.addStyle({'and': [{'resn': "PRO"}, {'atom': ['C', 'O'], 'invert': True}]},
                  {'stick': {'colorscheme': "WhiteCarbon", 'radius': 0.3}})
if show_mainchains:
    # Backbone atoms N, CA, C and O as sticks
    BB = ['C', 'O', 'N', 'CA']
    view.addStyle({'atom': BB}, {'stick': {'colorscheme': "WhiteCarbon", 'radius': 0.3}})

# Centre the camera on the structure and display it
view.zoomTo()
view.show()
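
The `predicted_error` colouring reads the per-residue error from the B-factor column of the saved PDB file. To inspect those values directly, a small parser over the fixed-column PDB `ATOM` records is enough. This is a sketch: `per_residue_error` and `flag_uncertain` are hypothetical helpers, one value is read per residue from its CA atom, and the threshold in the usage note is an arbitrary example.

```python
def per_residue_error(pdb_text, atom_name="CA"):
    """Return {(chain_id, residue_id): B-factor}, reading one atom per residue.

    Uses the fixed PDB columns: atom name 13-16, chain ID 22,
    residue number 23-26, insertion code 27, B-factor 61-66.
    """
    errors = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK, END, ...
        if line[12:16].strip() != atom_name:
            continue
        chain_id = line[21]
        residue_id = line[22:27].strip()  # number plus optional insertion code, e.g. "100A"
        errors[(chain_id, residue_id)] = float(line[60:66])
    return errors


def flag_uncertain(errors, threshold):
    """List (residue, error) pairs whose predicted error exceeds threshold, worst first."""
    flagged = [(key, value) for key, value in errors.items() if value > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

For example, `errors = per_residue_error(open(filename).read())` followed by `flag_uncertain(errors, threshold=2.0)` lists the residues the models disagree on most, which usually correspond to the red regions of the cartoon.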




In [6]:
#@title Download predicted structure
#@markdown If the download does not start, try disabling your ad blocker and running this cell again. If that still fails, click the folder icon on the left, navigate to the file, right-click it and select "Download".

from google.colab import files

files.download(filename)


# Instructions <a name="Instructions"></a>
**Quick start**
1. Paste the sequence(s) of your antibody, nanobody or TCR in the input field.
2. Select what type of protein it is.
3. Press "Runtime" -> "Run all".
4. The pipeline consists of 5 steps. The currently running step is indicated by a circle with a stop sign next to it.

**Troubleshooting**
* Check your input sequences. They should be antibody, nanobody or TCR sequences; ImmuneBuilder cannot predict the structure of general proteins.
* Check that the runtime type is set to GPU at "Runtime" -> "Change runtime type".
* Try restarting the runtime via "Runtime" -> "Factory reset runtime".

**Acknowledgements**
* This colab notebook was heavily inspired by [ColabFold](https://github.com/sokrypton/ColabFold).