<a href="https://colab.research.google.com/github/drchristhorpe/esmfold-hla-class-i/blob/main/Creating_ESMfold_molecules_of_HLA_Class_I_alpha_chains_%5Balpha%5D.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Creating ESMfold molecules of HLA Class I alpha chains

**Note:** these structure predictions have not yet been benchmarked against crystal structures (benchmarking is planned very soon).

This notebook is a proof of concept that uses the [ESMFold API](https://esmatlas.com/about#api), released on 1 November 2022.

Many thanks to the Meta AI team, not just for [open-sourcing the models and the code](https://github.com/facebookresearch/esm/tree/main/esm), but also for providing an API.

To try out ESMFold in Google Colab, you can use a [notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/ESMFold.ipynb) authored by [Sergey Ovchinnikov](https://twitter.com/sokrypton).

The ESMFold paper is available [here](https://www.biorxiv.org/content/10.1101/2022.07.20.500902v2):


**Evolutionary-scale prediction of atomic level protein structure with a language model**

Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, Alexander Rives







In [35]:
#@title 1. Import libraries

#@markdown Running this cell will install the following libraries into the notebook:

#@markdown - py3Dmol

!pip install py3Dmol


import py3Dmol
import requests
from google.colab import files
import os


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [36]:
#@title 2. Load HLA class I allele sequences

#@markdown Running this cell retrieves a subset of allele sequences derived from the [IPD-IMGT/HLA dataset](https://www.ebi.ac.uk/ipd/imgt/hla/). 

#@markdown Each sequence covers amino acids 1-275 of the protein without the signal peptide (the residues found in most MHC Class I crystal structures).

#@markdown Only the lowest-numbered unique alleles are included in this set.

loci = ['a','b','c','e','f','g']

alleles = {}

locus_base_url = 'https://raw.githubusercontent.com/drchristhorpe/esmfold-hla-class-i/main/hla_loci/'

for locus in loci:
  locus_url = f'{locus_base_url}hla_{locus}.json'
  locus_slug = f'hla_{locus}'
  locus_name = f'HLA-{locus.upper()}'
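  r = requests.get(locus_url)
  print (locus_name)
  if r.status_code == 200:
    alleles[locus_slug] = r.json()
    print (f'{len(alleles[locus_slug])} alleles loaded')
  else:
    # Report the failure instead of silently skipping the locus
    print (f'Could not load alleles (HTTP status {r.status_code})')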

HLA-A
2389 alleles loaded
HLA-B
2919 alleles loaded
HLA-C
2315 alleles loaded
HLA-E
119 alleles loaded
HLA-F
11 alleles loaded
HLA-G
24 alleles loaded
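
Because each locus holds thousands of alleles, it can help to see which alleles are actually available before choosing one in the next step. The sketch below is not part of the original notebook. It assumes the loaded dictionary is keyed by slugs of the form `hla_a_02_01`, which is how step 3 builds its lookup key. The helper name `alleles_in_group` and the stand-in data (placeholder sequences and identifiers, not real IPD-IMGT/HLA entries) are hypothetical.

```python
def alleles_in_group(loaded_alleles, locus_letter, allele_group):
    """Return the sorted allele slugs loaded for one locus and allele group.

    Assumes keys look like 'hla_a_02_01' (locus, allele group, specific allele).
    """
    locus_slug = f'hla_{locus_letter.lower()}'
    prefix = f'{locus_slug}_{allele_group}_'
    # A locus that failed to load simply yields an empty list
    locus_alleles = loaded_alleles.get(locus_slug, {})
    return sorted(slug for slug in locus_alleles if slug.startswith(prefix))


# Stand-in data with placeholder values, shaped like the `alleles` dictionary
example_alleles = {
    'hla_a': {
        'hla_a_02_06': {'sequence': 'GSHS', 'identifier': 'placeholder-2'},
        'hla_a_02_01': {'sequence': 'GSHS', 'identifier': 'placeholder-1'},
        'hla_a_24_02': {'sequence': 'GSHS', 'identifier': 'placeholder-3'},
    }
}

print(alleles_in_group(example_alleles, 'A', '02'))
```

In the notebook itself, the call would be `alleles_in_group(alleles, 'A', '02')` once the loading cell has run.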


In [37]:
#@title 3. Select allele

locus_letter = 'A' #@param ['A', 'B', 'C', 'E', 'F', 'G'] {type:"string"}

allele_group = "02" #@param

specific_allele = "01" #@param

#@markdown e.g. for HLA-A*02:01:

#@markdown - locus_letter 'A'
#@markdown - allele_group '02'
#@markdown - specific_allele '01'


# Reset the selection so stale values from a previous run are never reused
allele = None
sequence = None
filename = None

locus_slug = f'hla_{locus_letter.lower()}'

allele_name = f'HLA-{locus_letter.upper()}*{allele_group}:{specific_allele}'.upper()
allele_slug = f'{locus_slug}_{allele_group}_{specific_allele}'

# Look the allele up; a missing locus or allele leaves the values above as None
if allele_slug in alleles.get(locus_slug, {}):
  allele = alleles[locus_slug][allele_slug]
  sequence = allele['sequence']
  filename = f'{allele_slug}.pdb'

if sequence:
  print (f'IPD-IMGT/HLA identifier : {allele["identifier"]}\n')
  print (f'>{allele_name}')
  print (sequence)
else:
  print ('Allele not in collection, please try again')


IPD-IMGT/HLA identifier : ipd-imgt:HLA34423

>HLA-B*35:01
GSHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRAPWIEQEGPEYWDRNTQIFKTNTQTYRESLRNLRGYYNQSEAGSHIIQRMYGCDLGPDGRLLRGHDQSAYDGKDYIALNEDLSSWTAADTAAQITQRKWEAARVAEQLRAYLEGLCVEWLRRYLENGKETLQRADPPKTHVTHHPVSDHEATLRCWALGFYPAEITLTWQRDGEDQTQDTELVETRPAGDRTFQKWAAVVVPSGEEQRYTCHVQHEGLPKPLTLRWE
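
Before sending a sequence for prediction, a quick sanity check can catch truncated or malformed entries. The sketch below is not part of the original notebook. It assumes that a well-formed entry uses only the 20 standard amino acid letters and, following the description in step 2, is 275 residues long. The helper name `check_sequence` is hypothetical, and a length mismatch should be read as a prompt to look closer rather than as proof of an error.

```python
STANDARD_AMINO_ACIDS = set('ACDEFGHIKLMNPQRSTVWY')


def check_sequence(sequence, expected_length=275):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    # Any letter outside the 20 standard amino acids is flagged
    unexpected = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if unexpected:
        problems.append(f"unexpected characters: {''.join(unexpected)}")
    # Step 2 describes the sequences as covering residues 1-275
    if len(sequence) != expected_length:
        problems.append(f'length is {len(sequence)}, expected {expected_length}')
    return problems
```

In the notebook, `check_sequence(sequence)` could be called right after step 3, printing any problems before step 4 is run.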


In [38]:
#@title 4. Run the prediction

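# Start from a known state so later cells can test whether a prediction exists
structure = None

if sequence:

  esmfold_api_url = 'https://api.esmatlas.com/foldSequence/v1/pdb/'

  # The amino acid sequence is sent as the body of the POST request
  r = requests.post(esmfold_api_url, data=sequence)

  if r.status_code == 200:
    structure = r.text
    print (f'Prediction for {allele_name} is now complete and ready for download')
  else:
    print (f'An error has occurred (HTTP status {r.status_code}). Please wait a little and try again, as the ESMFold API may be having problems')
else:
  print ('No sequence selected, please run step 3 first')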


Prediction for HLA-B*35:01 is now complete and ready for download
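
Since the API may be temporarily unavailable, one option is to retry the request a few times before giving up. The sketch below is not part of the original notebook. It uses the standard library's `urllib` rather than `requests` so that it has no third-party dependencies, and it reuses the endpoint URL from step 4. The function names, the number of attempts, the wait time, and the timeout are all illustrative assumptions, not values recommended by the API's maintainers.

```python
import time
import urllib.request

ESMFOLD_API_URL = 'https://api.esmatlas.com/foldSequence/v1/pdb/'


def post_sequence(sequence, url=ESMFOLD_API_URL, timeout=60):
    """POST the raw sequence and return the response body as text."""
    request = urllib.request.Request(url, data=sequence.encode('ascii'), method='POST')
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode('utf-8')


def fold_with_retries(sequence, post=post_sequence, attempts=3, wait_seconds=5):
    """Try the request up to `attempts` times; return None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return post(sequence)
        except OSError as error:
            # URLError, HTTPError and timeouts are all subclasses of OSError
            print(f'Attempt {attempt} of {attempts} failed: {error}')
            if attempt < attempts:
                time.sleep(wait_seconds)
    return None
```

The `post` parameter makes the function easy to try out with a stand-in callable, without contacting the API. In the notebook, `structure = fold_with_retries(sequence)` would take the place of the single request.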


In [39]:
#@title 5. Display the prediction

print (allele_name)

#@markdown Residues in the visualisation are coloured by the confidence of the prediction at that residue: red is low confidence, dark blue is high confidence.

pymol_color_list = ["#33ff33","#00ffff","#ff33cc","#ffff00","#ff9999","#e5e5e5","#7f7fff","#ff7f00",
                    "#7fff7f","#199999","#ff007f","#ffdd5e","#8c3f99","#b2b2b2","#007fff","#c4b200",
                    "#8cb266","#00bfbf","#b27f7f","#fcd1a5","#ff7f7f","#ffbfdd","#7fffff","#ffff7f",
                    "#00ff7f","#337fcc","#d8337f","#bfff3f","#ff7fff","#d8d8ff","#3fffbf","#b78c4c",
                    "#339933","#66b2b2","#ba8c84","#84bf00","#b24c66","#7f7f7f","#3f3fa5","#a5512b"]

view = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js', width=800, height=400)
view.addModel(structure,'pdb')
view.setStyle({'cartoon': {'colorscheme': {'prop':'b','gradient': 'roygb','min':0.5,'max':0.9}}})
view.zoomTo()


HLA-B*35:01


<py3Dmol.view at 0x7f3e4726fcd0>
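
The colouring above reads the per-residue confidence from the B-factor field of the PDB file (`'prop':'b'`), so the same values can be pulled out as numbers. The sketch below is not part of the original notebook. It parses the fixed-width `ATOM` records of the PDB format and keeps one value per residue by looking only at alpha-carbon (`CA`) atoms. The helper names are hypothetical. The default threshold of 0.5 mirrors the `min` value of the colour gradient and assumes the confidence is on a 0 to 1 scale, which has not been verified here.

```python
def ca_confidences(pdb_text):
    """Return (residue_number, b_factor) for every alpha-carbon ATOM record."""
    confidences = []
    for line in pdb_text.splitlines():
        # PDB columns are fixed width: atom name 13-16, residue number 23-26,
        # B-factor 61-66 (1-indexed), hence the slices below
        if line.startswith('ATOM') and line[12:16].strip() == 'CA':
            residue_number = int(line[22:26])
            b_factor = float(line[60:66])
            confidences.append((residue_number, b_factor))
    return confidences


def low_confidence_residues(pdb_text, threshold=0.5):
    """Return the residue numbers whose confidence falls below `threshold`."""
    return [number for number, value in ca_confidences(pdb_text) if value < threshold]
```

In the notebook, `low_confidence_residues(structure)` would list the residues at or beyond the red end of the colour scale.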

In [40]:
#@title 6. Download the prediction


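if structure and filename:
  # Write the prediction to a PDB file, then zip it for download
  with open(filename, 'w') as pdb_file:
    pdb_file.write(structure)

  os.system(f"zip {allele_slug}.zip {filename}")
  files.download(f'{allele_slug}.zip')
else:
  print ('No prediction available to download, please run step 4 first')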


In [41]:
print (filename)

hla_b_35_01.pdb
