<a href="https://colab.research.google.com/github/Jingqiqi777/Jingqiqi777.github.io/blob/master/tcrdock_colab_pipeline_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# TCRdock TCR:pMHC Structure Prediction Colab

This Colab notebook is based on the [AlphaFold Colab notebook](https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb). Many thanks to the AlphaFold developers for creating and sharing their code and related content.



## Setup

Start by running the four cells below, in order, to set up TCRdock and all required software.

In [5]:
# Set environment variables before running any other code.
import os
os.environ['TF_FORCE_UNIFIED_MEMORY'] = '1'
os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION'] = '4.0'


from IPython.utils import io
import subprocess
import tqdm.notebook

TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]'

try:
  with tqdm.notebook.tqdm(total=100, bar_format=TQDM_BAR_FORMAT) as pbar:
    with io.capture_output() as captured:
      # Uninstall default Colab version of TF.
      %shell pip uninstall -y tensorflow
      pbar.update(6)

      # Install py3dmol.
      %shell pip install py3dmol
      pbar.update(2)

      # Install OpenMM and pdbfixer.
      %shell rm -rf /opt/conda
      %shell wget -q -P /tmp \
        https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
          && bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda \
          && rm /tmp/Miniconda3-latest-Linux-x86_64.sh
      pbar.update(12)

      PATH=%env PATH
      %env PATH=/opt/conda/bin:{PATH}
      %shell conda install -qy -c conda-forge \
            python=3.10
# for some reason-- this conda installation (from alphafold colab notebook) hangs:
#      %shell conda install -qy conda==23.5.2 \
#          && conda install -qy -c conda-forge \
#            python=3.10
      pbar.update(80)

except subprocess.CalledProcessError:
  print(captured)
  raise

#print(captured)


  0%|          | 0/100 [elapsed: 00:00 remaining: ?]

In [2]:
GIT_REPO = 'https://github.com/phbradley/TCRdock'

PARAMS_URLS = ['https://www.dropbox.com/s/e3uz9mwxkmmv35z/params_model_2_ptm.npz',
               'https://www.dropbox.com/s/jph8v1mfni1q4y8/tcrpmhc_run4_af_mhc_params_891.pkl']

PARAMS_DIR = './alphafold_params/params'

try:
  with tqdm.notebook.tqdm(total=100, bar_format=TQDM_BAR_FORMAT) as pbar:
    with io.capture_output() as captured:
      %shell rm -rf TCRdock
      %shell git clone --branch main {GIT_REPO} TCRdock
      pbar.update(20)
      # Install the required versions of all dependencies.
      %shell pip3 install -r ./TCRdock/requirements_colab_af232.txt
      pbar.update(60)

      # Load parameters
      %shell mkdir --parents "{PARAMS_DIR}"
      for URL in PARAMS_URLS:
        PARAMS_PATH = os.path.join(PARAMS_DIR, os.path.basename(URL))
        %shell wget -O "{PARAMS_PATH}" "{URL}"
      pbar.update(20)

except subprocess.CalledProcessError:
  print(captured)
  raise

#print(captured)

import jax
if jax.local_devices()[0].platform == 'tpu':
  raise RuntimeError('Colab TPU runtime not supported. Change it to GPU via Runtime -> Change Runtime Type -> Hardware accelerator -> GPU.')
elif jax.local_devices()[0].platform == 'cpu':
  raise RuntimeError('Colab CPU runtime not supported. Change it to GPU via Runtime -> Change Runtime Type -> Hardware accelerator -> GPU.')
else:
  print(f'Running with {jax.local_devices()[0].device_kind} GPU')

# Make sure everything we need is on the path.
import sys
sys.path.append('/opt/conda/lib/python3.10/site-packages')




  0%|          | 0/100 [elapsed: 00:00 remaining: ?]

Running with Tesla T4 GPU


In [3]:
cd TCRdock/

/content/TCRdock


In [4]:
%shell python download_blast.py

python3: can't open file '/content/download_blast.py': [Errno 2] No such file or directory


CalledProcessError: Command 'python download_blast.py' returned non-zero exit status 2.

## Enter info on the modeling targets

You can use the form in the next block, which will create a file `user_targets.tsv` with the supplied information to prepare for modeling a single target.

Or, to run multiple targets, upload a TSV (tab-separated values) file to this running Colab session using the upload button in the upper left corner. If you save it in `/content/TCRdock/` with the filename `user_targets.tsv`, you can skip the next block with the form and go directly to the `setup_for_alphafold.py` command. Alternatively, give the file any name you like and pass its location to `setup_for_alphafold.py` with the `--targets_tsvfile` flag.
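For the multi-target route, here is a minimal sketch of writing such a file with Python's standard `csv` module. The column names are copied from the form cell in this notebook. The helper name `write_targets_tsv` is hypothetical, and both rows are placeholders that simply repeat the form's example target, so replace them with your own data.

```python
import csv

# Column names copied from the form cell in this notebook.
TARGET_COLUMNS = ['organism', 'mhc_class', 'mhc', 'peptide',
                  'va', 'ja', 'cdr3a', 'vb', 'jb', 'cdr3b']


def write_targets_tsv(rows, filename='user_targets.tsv'):
    """Write one tab-separated line per target; fail early on missing columns."""
    for i, row in enumerate(rows):
        missing = [c for c in TARGET_COLUMNS if c not in row]
        if missing:
            raise ValueError(f'target {i} is missing columns: {missing}')
    with open(filename, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=TARGET_COLUMNS, delimiter='\t',
                                lineterminator='\n', extrasaction='ignore')
        writer.writeheader()
        writer.writerows(rows)
    return filename


# Placeholder rows: the example target from the form, repeated twice.
example = dict(organism='human', mhc_class=1, mhc='A*02:01',
               peptide='GSMNRRPILTG', va='TRAV19*01', ja='TRAJ56*01',
               cdr3a='CALSDPPYGANSKLTF', vb='TRBV20-3*01', jb='TRBJ1-1*01',
               cdr3b='CSARDPGQGNTEAFF')
print('made:', write_targets_tsv([example, dict(example)]))
```

Run from `/content/TCRdock/`, this produces a `user_targets.tsv` with one header line and one line per target.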

In [1]:
#@title Enter the information on the TCR:pMHC complex to be modeled. When you are finished, press the play button (triangle inside circle) on the left.

#@markdown If there are any problems with the data, like unrecognized V/J gene names, there will be errors when the next cell is run.


# Reference for the Colab form syntax used below:
# https://colab.research.google.com/notebooks/forms.ipynb
import pandas as pd

organism = 'human' #@param ["human", "mouse"]

mhc_class = 1 #@param [1,2] {type:"raw"}

mhc = 'A*02:01'  #@param {type:"string"}

#@markdown For class II, the peptide should be 11 amino acids long (9 residue core plus 1 residue on either side)
peptide = 'GSMNRRPILTG' #@param {type:"string"}

#@markdown The gene names should include allele information (i.e., they should end in an allele suffix such as "*01")
va = 'TRAV19*01' #@param {type:"string"}
ja = 'TRAJ56*01' #@param {type:"string"}
#@markdown The CDR3 sequence starts with the conserved C and ends with the F/Y/W that comes before the GXG in the J region.
#@markdown The CDR3 sequences should be at least 6 residues long.
cdr3a = 'CALSDPPYGANSKLTF' #@param {type:"string"}
vb = 'TRBV20-3*01' #@param {type:"string"}
jb = 'TRBJ1-1*01' #@param {type:"string"}
cdr3b = 'CSARDPGQGNTEAFF' #@param {type:"string"}

targets = pd.DataFrame([
    dict(organism=organism, mhc_class=mhc_class, mhc=mhc, peptide=peptide,
         va=va, ja=ja, cdr3a=cdr3a, vb=vb, jb=jb, cdr3b=cdr3b,
    )])

targets_filename = 'user_targets.tsv'
targets.to_csv(targets_filename, sep='\t', index=False)
print('made:', targets_filename)




made: user_targets.tsv
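The notes in the form above (class II peptide length, allele suffix on gene names, CDR3 boundaries and minimum length) can be checked before `setup_for_alphafold.py` is run. The sketch below is only a rough pre-check, not TCRdock's own validation: the function name `check_target` is hypothetical, and the allele pattern (`*` followed by two digits) is an assumption based on the "*01" example. It does not check whether a gene name is actually recognized.

```python
import re

# Assumption: an allele suffix is '*' followed by two digits, e.g. 'TRAV19*01'.
ALLELE_SUFFIX = re.compile(r'\*\d{2}$')


def check_target(mhc_class, peptide, va, ja, cdr3a, vb, jb, cdr3b):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    # Class II peptides: 9-residue core plus 1 residue on either side.
    if mhc_class == 2 and len(peptide) != 11:
        problems.append(f'class II peptide has {len(peptide)} residues, expected 11')
    for name, gene in [('va', va), ('ja', ja), ('vb', vb), ('jb', jb)]:
        if not ALLELE_SUFFIX.search(gene):
            problems.append(f'{name}={gene!r} has no allele suffix such as "*01"')
    for name, cdr3 in [('cdr3a', cdr3a), ('cdr3b', cdr3b)]:
        if len(cdr3) < 6:
            problems.append(f'{name} is shorter than 6 residues')
        if not cdr3.startswith('C'):
            problems.append(f'{name} does not start with the conserved C')
        if cdr3[-1:] not in ('F', 'Y', 'W'):
            problems.append(f'{name} does not end in F, Y, or W')
    return problems


# The example target from the form passes all of these checks.
print(check_target(1, 'GSMNRRPILTG', 'TRAV19*01', 'TRAJ56*01',
                   'CALSDPPYGANSKLTF', 'TRBV20-3*01', 'TRBJ1-1*01',
                   'CSARDPGQGNTEAFF'))  # prints []
```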


In [2]:
%shell which python

/usr/local/bin/python




## Generate the inputs for AlphaFold modeling

In [3]:
%shell python setup_for_alphafold.py --targets_tsvfile user_targets.tsv --output_dir user_output --new_docking


python3: can't open file '/content/setup_for_alphafold.py': [Errno 2] No such file or directory


CalledProcessError: Command 'python setup_for_alphafold.py --targets_tsvfile user_targets.tsv --output_dir user_output --new_docking' returned non-zero exit status 2.

## Run AlphaFold with the generated inputs

The next Python command builds TCRdock models for the targets listed in the file `user_output/targets.tsv`. The first target takes longer because the neural network model has to be compiled. After that, the remaining targets run much (~5x) faster.
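As a rough back-of-the-envelope illustration of that speedup, the sketch below estimates the total run time for a batch. The function name `estimate_total_seconds` is hypothetical, and the 260-second figure is simply the time logged for the single target in this notebook's run (Tesla T4 GPU, 411 residues); your timings will differ with hardware and target size.

```python
def estimate_total_seconds(num_targets, first_target_seconds, speedup=5.0):
    """Estimate total time: one slow (compiling) target, the rest ~speedup x faster."""
    if num_targets <= 0:
        return 0.0
    later_target_seconds = first_target_seconds / speedup
    return first_target_seconds + (num_targets - 1) * later_target_seconds


# 10 targets, assuming the first one takes about 260 s as in the log of this run:
# 260 + 9 * (260 / 5) = 728 seconds.
print(estimate_total_seconds(10, 260.0))  # prints 728.0
```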

In [29]:
%shell python run_prediction.py --verbose \
    --targets user_output/targets.tsv \
    --outfile_prefix user_output \
    --model_names model_2_ptm_ft4 \
    --data_dir /content/alphafold_params/ \
    --model_params_files /content/alphafold_params/params/tcrpmhc_run4_af_mhc_params_891.pkl


# This command computes the PAE between the pMHC and the TCR.
%shell python add_pmhc_tcr_pae_to_tsvfile.py --infile user_output_final.tsv \
    --outfile user_output_w_pae.tsv


imported alphafold.model from <module 'alphafold.model.model' from '/content/TCRdock/alphafold/model/model.py'>
cmd: run_prediction.py --verbose --targets user_output/targets.tsv --outfile_prefix user_output --model_names model_2_ptm_ft4 --data_dir /content/alphafold_params/ --model_params_files /content/alphafold_params/params/tcrpmhc_run4_af_mhc_params_891.pkl
local_device: gpu hostname: 2f59101f6f40 num_targets: 1 max_len= 411
config: model_2_ptm_ft4
load_model_runners:: small_msas==True setting small max_extra_msa and max_msa_clusters
loading model_2_ptm_ft4 params from file: /content/alphafold_params/params/tcrpmhc_run4_af_mhc_params_891.pkl
ignoring other_params: {}
START: 0 of 1
running model_2_ptm_ft4
model_2_ptm_ft4 pLDDT: 88.99404306573709 Time: 260.07888473599996
model_1 88.99404306573709
made: user_output_final.tsv
Calculating pmhc_tcr_pae for model: model_2_ptm_ft4
made: user_output_w_pae.tsv




## Look at the TCRdock output

The next cell should display a table with the pMHC-TCR PAE values in the `pmhc_tcr_pae` column. Models with PAE values below about 6.5 to 7 are higher confidence; models with PAE values above about 7.5 to 8 are low confidence.
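To make that rule of thumb concrete, here is a small sketch that labels a PAE value. The function name `pae_confidence`, the exact cutoffs (7.0 and 7.5, picked from within the ranges quoted above), and the "intermediate" label for values in between are assumptions for illustration, not part of TCRdock.

```python
def pae_confidence(pmhc_tcr_pae, higher_below=7.0, low_above=7.5):
    """Label a pMHC-TCR PAE value using the approximate cutoffs quoted above."""
    if pmhc_tcr_pae < higher_below:
        return 'higher confidence'
    if pmhc_tcr_pae > low_above:
        return 'low confidence'
    # Values between the two cutoffs are not clearly in either group.
    return 'intermediate'


# The PAE reported for the example model in this notebook's output table:
print(pae_confidence(9.549142))  # prints: low confidence
```

With a results table loaded in pandas, the same function could be applied with something like `results['pmhc_tcr_pae'].map(pae_confidence)`.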

In [30]:
# look at the output
import pandas as pd
results = pd.read_table('user_output_w_pae.tsv')

cols = 'pmhc_tcr_pae mhc peptide va cdr3a vb cdr3b model_pdbfile'.split()
results[cols]


|   | pmhc_tcr_pae | mhc | peptide | va | cdr3a | vb | cdr3b | model_pdbfile |
|---|---|---|---|---|---|---|---|---|
| 0 | 9.549142 | A*02:01 | GSMNRRPILTG | TRAV22*01 | CAVETSYDKVIF | TRBV11-3*01 | CASSLDLLGQGYNEQFF | user_output_T00000_A0201_GSMNRRPILTG_0_model_1... |


In [31]:
# show the output PDB files
!ls *.pdb

user_output_T00000_A0201_GSMNRRPILTG_0_model_1_model_2_ptm_ft4.pdb


In [32]:
from google.colab import files
from matplotlib import gridspec
import matplotlib.pyplot as plt
import numpy as np
import py3Dmol
from glob import glob

from IPython import display
from ipywidgets import GridspecLayout
from ipywidgets import Output

# The name of the file we want to visualize.
# Use `pdb_files` rather than `files` to avoid shadowing google.colab's `files` module.
pdb_files = glob('user_output_T00000_*_model_2_ptm_ft4.pdb')
fname = pdb_files[0]
print('loading:', fname)

with open(fname,'r') as f:
    to_visualize_pdb = f.read()


show_sidechains = True #False

view = py3Dmol.view(width=800, height=600)
view.addModelsAsFrames(to_visualize_pdb)
style = {'cartoon': {}} #{'colorscheme': {'prop': 'b', 'map': color_map}}}
if show_sidechains:
  style['stick'] = {}
view.setStyle({'model': -1}, style)
view.zoomTo()

grid = GridspecLayout(1, 2)
out = Output()
with out:
  view.show()
grid[0, 0] = out

out = Output()
#with out:
#  plot_plddt_legend().show()
grid[0, 1] = out

display.display(grid)


loading: user_output_T00000_A0201_GSMNRRPILTG_0_model_1_model_2_ptm_ft4.pdb



In [33]:
%shell mkdir -p tcrdock_prediction/
%shell cp user_output*pdb user_output_w_pae.tsv tcrdock_prediction/
%shell tar -czvf tcrdock_prediction.tgz tcrdock_prediction/
from google.colab import files
files.download('tcrdock_prediction.tgz')

tcrdock_prediction/
tcrdock_prediction/user_output_T00000_A0201_GSMNRRPILTG_0_model_1_model_2_ptm_ft4.pdb
tcrdock_prediction/user_output_w_pae.tsv



In [34]:
ls

add_pmhc_tcr_pae_to_tsvfile.py
algorithms_from_the_paper.py
alphafold/
changes_to_alphafold.txt
compute_docking_rmsds.py
compute_tcrdists.py
datasets_from_the_paper/
docker/
download_blast.py
examples/
_images/
LICENSE
ncbi-blast-2.11.0+/
ncbi-blast-2.11.0+-x64-linux.tar.gz
original_alphafold_LICENSE
parse_tcr_pmhc_pdbfile.py
predict_utils.py
__pycache__/
README.md
requirements_colab_af232.txt
requirements_colab_python310.txt
requirements_colab_python38.txt
requirements.txt
run_prediction.py
setup_for_alphafold.py
tcrdock/
tcrdock_colab_pipeline_v1.ipynb
tcrdock_prediction/
tcrdock_prediction.tgz
user_output/
user_output_final.tsv
user_output_T00000_A0201_GSMNRRPILTG_0_model_1_model_2_ptm_ft4.pdb
user_output_T00000_A0201_GSMNRRPILTG_0_model_1_model_2_ptm_ft4_plddt.npy
user_output_T00000_A0201_GSMNRRPILTG_0_model_1_model_2_ptm_ft4_predicted_aligned_error.npy
user_o

## Some potentially useful commands for debugging

In [None]:
# Check which CUDA version is installed.
! nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0


In [None]:
# Check which cuDNN version is installed.
!cat /usr/include/x86_64-linux-gnu/cudnn_v*.h | grep CUDNN_MAJOR -A 2


#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 6
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

/* cannot use constexpr here since this is a C-only file */


In [None]:
%shell echo $PATH

/opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin




In [None]:
%shell which python

/opt/conda/bin/python




In [None]:
%shell which pip3

/opt/conda/bin/pip3


