# MHC-Fine Colab

Change log:
- March 7:
  - added upgrade for gdown to deal with model loading issue
  - updated mhc-fine repo to fix issue with np.object and np.int
  - added new MSA generation and fixed the naming issue
- March 18:
  - updated model to fix some minor issues

## Warning

As of May 1, 2025, this notebook only works with the "Fallback runtime version" of Colab. The default NumPy version in Colab was upgraded to 2.0.2, which breaks code that relies on older NumPy versions. More information: https://github.com/googlecolab/colabtools/issues/5115

To use the fallback runtime, open Commands in the top-left corner and select "Use fallback runtime version" before running this notebook on Colab.
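
To confirm which NumPy series a runtime actually provides before running the rest of the notebook, a small check along these lines may help. It is a minimal sketch: the helper names `numpy_major` and `is_supported_numpy` are hypothetical and not part of the mhc-fine repository, and the only rule encoded is the one stated above (NumPy 2.x is not supported).

```python
def numpy_major(version: str) -> int:
    """Return the major component of a version string such as '1.26.3'."""
    return int(version.split(".")[0])


def is_supported_numpy(version: str) -> bool:
    """True when the version predates the NumPy 2.x series."""
    return numpy_major(version) < 2


# Report on the installed NumPy, if any. The helper names above are
# illustrative only; they do not come from the mhc-fine code base.
try:
    import numpy as np
    installed = np.__version__
except ImportError:
    installed = None

if installed is None:
    print("NumPy is not installed in this runtime.")
elif is_supported_numpy(installed):
    print(f"NumPy {installed} is in the 1.x series.")
else:
    print(f"NumPy {installed} is 2.x; switch to the fallback runtime first.")
```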

## Setup

In [1]:
import os
if not os.path.exists('mhc-fine'):
  os.system("git clone https://bitbucket.org/abc-group/mhc-fine.git")

Environment setup

In [2]:
!pip install -q condacolab
import condacolab
condacolab.install()
!conda install -c bioconda kalign3
!pip install numpy==1.26.3 torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
!pip install Bio
!pip install --upgrade --no-cache-dir gdown

⏬ Downloading https://github.com/jaimergp/miniforge/releases/download/24.11.2-1_colab/Miniforge3-colab-24.11.2-1_colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:06
🔁 Restarting kernel...
Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - kalign3
  - bioconda

Current channels:

  - https://conda.anaconda.org/conda-forge

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.


Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting numpy==1.26.3
  Downloading https://download.pytorch.org/whl/numpy-1.26.3-cp311-cp311-manylinux_2_17_x86

Collecting gdown
  Downloading gdown-5.2.0-py3-none-any.whl.metadata (5.8 kB)
Collecting beautifulsoup4 (from gdown)
  Downloading beautifulsoup4-4.13.4-py3-none-any.whl.metadata (3.8 kB)
Collecting soupsieve>1.2 (from beautifulsoup4->gdown)
  Downloading soupsieve-2.7-py3-none-any.whl.metadata (4.6 kB)
Downloading gdown-5.2.0-py3-none-any.whl (18 kB)
Downloading beautifulsoup4-4.13.4-py3-none-any.whl (187 kB)
Downloading soupsieve-2.7-py3-none-any.whl (36 kB)
Installing collected packages: soupsieve, beautifulsoup4, gdown
Successfully installed beautifulsoup4-4.13.4 gdown-5.2.0 soupsieve-2.7


After the libraries are installed, the kernel restarts automatically. Once it has restarted, continue with the cells below.
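
Because the conda log above reports a `PackagesNotFoundError` for `kalign3`, it may be worth checking after the restart that the aligner is actually available. The sketch below uses only the standard library. The executable name `kalign` is an assumption about what the kalign3 package installs, and `missing_tools` is a hypothetical helper, not part of mhc-fine.

```python
import shutil


def missing_tools(names):
    """Return the subset of names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Assumption: the kalign3 package provides an executable called "kalign".
required = ["kalign"]
absent = missing_tools(required)
if absent:
    print("Not found on PATH:", ", ".join(absent))
else:
    print("All required tools are available.")
```

If the tool is reported missing, rerunning the install cell and reading its output is a reasonable first step.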


Libraries

In [1]:
import torch
if not torch.cuda.is_available():
    print("No GPU detected. Check the Colab runtime type (Runtime > Change runtime type).")

In [2]:
cd /content/mhc-fine

/content/mhc-fine


In [3]:
from src import preprocess, model
import pandas as pd
import gdown
import os
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [4]:
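# Download the pretrained weights from Google Drive unless they are already present.
model_path = "data/model/mhc_fine_weights.pt"
if not os.path.exists(model_path):
    # Create the target directory first in case the checkout does not contain it.
    os.makedirs(os.path.dirname(model_path), exist_ok=True)
    file_id = "1gZkMGOhwXAHAmTCpR5Azd7lzkW0s-nlK"
    gdown.download(f"https://drive.google.com/uc?id={file_id}", model_path)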

Downloading...
From (original): https://drive.google.com/uc?id=1gZkMGOhwXAHAmTCpR5Azd7lzkW0s-nlK
From (redirected): https://drive.google.com/uc?id=1gZkMGOhwXAHAmTCpR5Azd7lzkW0s-nlK&confirm=t&uuid=0af234b7-2993-4dd2-85b4-84904746c705
To: /content/mhc-fine/data/model/mhc_fine_weights.pt
100%|██████████| 388M/388M [00:02<00:00, 139MB/s]
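
The download cell only tests whether the file exists, so an interrupted download would leave a truncated file that is never refreshed. A size check like the following can catch that case. This is a sketch under stated assumptions: `looks_complete` is a hypothetical helper, and the 300 MB threshold is a guess derived from the 388M reported in the log above, not a value published by the authors.

```python
import os


def looks_complete(path, min_bytes):
    """True when path is a regular file of at least min_bytes bytes."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes


# Assumption: anything well below the 388M seen in the log is a partial download.
MIN_BYTES = 300 * 1024 * 1024
weights = "data/model/mhc_fine_weights.pt"

if looks_complete(weights, MIN_BYTES):
    print("Weights file looks complete.")
else:
    print("Weights file is missing or too small; delete it and rerun the download cell.")
```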


## Make msa_run executable

In [6]:
!chmod +x a3m_generation/msa_run
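
For readers who prefer to stay in Python, or who want to verify that the permission change took effect, the same step can be sketched with the standard library. `ensure_executable` is a hypothetical helper name; the path is the one used in the cell above.

```python
import os
import stat


def ensure_executable(path):
    """Add execute permission for user, group and others.

    Returns True if the current process can execute the file afterwards.
    """
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.access(path, os.X_OK)


msa_run = "a3m_generation/msa_run"
if os.path.exists(msa_run):
    print("msa_run executable:", ensure_executable(msa_run))
else:
    print("msa_run not found; check the working directory.")
```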

## Input your data, get MSA data, preprocess data, run AlphaFold


In [21]:
# A dummy list of 5 peptides.
# You may prefer to load the peptides from a CSV file saved on your Google Drive.
# Example code for this (adjust the path to where you placed the file):
#from google.colab import drive
#drive.mount('/content/drive')
#import pandas as pd
#df = pd.read_csv('/content/drive/MyDrive/file.csv')
#peptide_list = df['Peptide_sequence'] # Assumes each row holds one peptide sequence in this column.
# For now, the list is defined right here.

peptide_list = ["HMTEVVRHC","HMTEVVRHV","HMTEVVRHK","HMTEVVRHN","HMTEVVRHF"]
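
When peptides come from a user-supplied file, a typo in one sequence may only surface partway through a long batch. The sketch below screens the list before any MSA generation or inference starts. It assumes that only the 20 standard amino acids in upper-case one-letter code are acceptable input, which the source does not state explicitly, and `invalid_peptides` is a hypothetical helper.

```python
# Assumption: only the 20 standard residues, upper-case, are valid input.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_peptides(peptides):
    """Return peptides that are empty or contain non-standard characters."""
    return [p for p in peptides if not p or set(p) - STANDARD_AMINO_ACIDS]


peptide_list = ["HMTEVVRHC", "HMTEVVRHV", "HMTEVVRHK", "HMTEVVRHN", "HMTEVVRHF"]
bad = invalid_peptides(peptide_list)
if bad:
    print("Fix or remove these entries before running inference:", bad)
else:
    print(f"All {len(peptide_list)} peptides passed the alphabet check.")
```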

In [22]:
# MHC protein sequence, identical for every peptide, so it is defined once outside the loop.
protein_sequence = "GSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDGETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQLRAYLEGTCVEWLRRYLENGKETLQRT"

for pep in peptide_list:
  unique_id = f"A_02_01_{pep}" # A unique file name that contains the peptide sequence.
  peptide_sequence = pep
  a3m_path = f"/content/mhc-fine/a3m_generation/{unique_id}.a3m"
  # Generate the MSA (a3m file) for the protein chain.
  preprocess.get_a3m(protein_sequence, a3m_path, unique_id)
  # Build the model input from both chains and the MSA.
  np_sample = preprocess.preprocess_for_inference(protein_sequence, peptide_sequence, a3m_path)
  my_model = model.Model()
  # Predict the structure and write it out under the name unique_id.
  my_model.inference(np_sample, unique_id)
  print(f"Inference done for {unique_id}")


Reading a3m file...
Processing protein chain...
GSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDGETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQLRAYLEGTCVEWLRRYLENGKETLQRT
Processing peptide chain...
HMTEVVRHC
Mering features...
Running inference...
Writing predicted structure:  ./output/A_02_01_HMTEVVRHC.pdb
Inference done for A_02_01_HMTEVVRHC
Reading a3m file...
Processing protein chain...
GSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDGETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQLRAYLEGTCVEWLRRYLENGKETLQRT
Processing peptide chain...
HMTEVVRHV
Mering features...
Running inference...
Writing predicted structure:  ./output/A_02_01_HMTEVVRHV.pdb
Inference done for A_02_01_HMTEVVRHV
Reading a3m file...
Processing protein chain...
GSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDGETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGKDYIALKEDLRSWTAADM