# MHC-Fine Colab

Change log:
- March 7:
  - added upgrade for gdown to deal with model loading issue
  - updated mhc-fine repo to fix issue with np.object and np.int
  - added new MSA generation and fixed the naming issue
- March 18:
  - updated model to fix some minor issues

## Setup

In [2]:
import os
if not os.path.exists('mhc-fine'):
  os.system("git clone https://bitbucket.org/abc-group/mhc-fine.git")

Environment setup

In [3]:
!pip install -q condacolab
import condacolab
condacolab.install()
!conda install -c bioconda kalign3
!pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
!pip install Bio
!pip install --upgrade --no-cache-dir gdown

⏬ Downloading https://github.com/conda-forge/miniforge/releases/download/23.11.0-0/Mambaforge-23.11.0-0-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:12
🔁 Restarting kernel...
Channels:
 - bioconda
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json):

Collecting gdown
  Downloading gdown-5.1.0-py3-none-any.whl.metadata (5.7 kB)
Collecting beautifulsoup4 (from gdown)
  Downloading beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting soupsieve>1.2 (from beautifulsoup4->gdown)
  Downloading soupsieve-2.5-py3-none-any.whl.metadata (4.7 kB)
Downloading gdown-5.1.0-py3-none-any.whl (17 kB)
Downloading beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)
Downloading soupsieve-2.5-py3-none-any.whl (36 kB)
Installing collected packages: soupsieve, beautifulsoup4, gdown
Successfully installed beautifulsoup4-4.12.3 gdown-5.1.0 soupsieve-2.5

After the libraries are installed, the notebook runtime restarts. This is expected; continue running the cells below.


Libraries

In [1]:
import torch
if not torch.cuda.is_available():
    print("No GPU detected. In Colab, select a GPU under Runtime > Change runtime type.")

No GPU detected. In Colab, select a GPU under Runtime > Change runtime type.


In [2]:
cd /content/mhc-fine

/content/mhc-fine


In [3]:
from src import preprocess, model
import pandas as pd
import gdown
import os
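import locale
# Colab workaround: report UTF-8 as the preferred encoding, since shell commands can otherwise fail with a locale error
locale.getpreferredencoding = lambda: "UTF-8"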

In [4]:
# Download the model weights if they are not already present
model_path = "data/model/mhc_fine_weights.pt"
if not os.path.exists(model_path):
    file_id = "1gZkMGOhwXAHAmTCpR5Azd7lzkW0s-nlK"
    gdown.download(f"https://drive.google.com/uc?id={file_id}", model_path)

Downloading...
From (original): https://drive.google.com/uc?id=1gZkMGOhwXAHAmTCpR5Azd7lzkW0s-nlK
From (redirected): https://drive.google.com/uc?id=1gZkMGOhwXAHAmTCpR5Azd7lzkW0s-nlK&confirm=t&uuid=41f304b2-ecc5-443c-9c43-19f4e577b913
To: /content/mhc-fine/data/model/mhc_fine_weights.pt
100%|██████████| 388M/388M [00:03<00:00, 98.2MB/s]
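Before constructing the model, it may be worth confirming that the download actually produced a usable file. The following is a minimal sketch; `weights_ready` is a hypothetical helper (not part of mhc-fine or gdown), and any size threshold you pass is an assumption to be chosen from the size reported in the download log above.

```python
import os


def weights_ready(path: str, min_bytes: int = 1) -> bool:
    """Return True if `path` is a regular file of at least `min_bytes` bytes.

    Hypothetical helper for illustration; it only inspects the file on disk
    and does not validate the contents of the checkpoint.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes
```

In the notebook this could be called as `weights_ready(model_path, min_bytes=100 * 1024**2)`, where the 100 MiB floor is an arbitrary illustrative value that catches empty or truncated downloads.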


## Input your data


In [5]:
unique_id1 = "RA-Wild"
protein_sequence1 = "GSHSMRYFHTSVSRPGRGEPRFITVGYVDDTLFVRFDSDAASPREEPRAPWIEQEGPEYWDRETQICKAKAQTDREDLRTLLRYYNQSEAGSHTLQNMYGCDVGPDGRLLRGYHQDAYDGKDYIALNEDLSSWTAADTAAQITQRKWEAARVAEQLRAYLEGECVEWLRRYLENGKETLQRADPPKTHVTHHPISDHEATLRCWALGFYPAEITLTWQRDGEDQTQDTELVETRPAGDRTFQKWAAVVVPSGEEQRYTCHVQHEGL"
peptide_sequence = "IWMGKMQKK"

In [6]:
unique_id2 = "RA-Mutant"
protein_sequence2 = "GSHSMRYFDTSVSRPGRGEPRFITVGYVDDTLFVRFDSDAASPREEPRAPWIEQEGPEYWDRETQICKAKAQTDREDLRTLLRYYNQSEAGSHTLQNMYGCDVGPDGRLLRGYHQDAYDGKDYIALNEDLSSWTAADTAAQITQRKWEAARVAEQLRAYLEGECVEWLRRYLENGKETLQRADPPKTHVTHHPISDHEATLRCWALGFYPAEITLTWQRDGEDQTQDTELVETRPAGDRTFQKWAAVVVPSGEEQRYTCHVQHEGL"
peptide_sequence = "IWMGKMQKK"
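The wild-type and mutant inputs are long enough that the difference between them is hard to spot by eye. A minimal sketch for listing substitutions between two equal-length sequences follows; `find_substitutions` is a hypothetical helper introduced here for illustration, not a function from mhc-fine.

```python
def find_substitutions(wild_type: str, mutant: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, wild-type residue, mutant residue) per mismatch."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have equal length for a position-wise comparison")
    return [
        (position, wt_residue, mut_residue)
        for position, (wt_residue, mut_residue) in enumerate(zip(wild_type, mutant), start=1)
        if wt_residue != mut_residue
    ]


# Illustrative run on the first 12 residues of the two sequences above.
print(find_substitutions("GSHSMRYFHTSV", "GSHSMRYFDTSV"))  # [(9, 'H', 'D')]
```

Calling `find_substitutions(protein_sequence1, protein_sequence2)` in the notebook gives the same report for the full-length inputs.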

## Get the MSA data

Run this part to query the database and generate the MSA (an `.a3m` file) for each protein sequence.

In [7]:
!chmod +x a3m_generation/msa_run

In [8]:
a3m_path1 = f"/content/mhc-fine/a3m_generation/{unique_id1}.a3m"
preprocess.get_a3m(protein_sequence1, a3m_path1, unique_id1)


In [9]:
a3m_path2 = f"/content/mhc-fine/a3m_generation/{unique_id2}.a3m"
preprocess.get_a3m(protein_sequence2, a3m_path2, unique_id2)
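A quick sanity check on the generated files can catch an empty or mismatched MSA before preprocessing. The sketch below assumes the usual A3M conventions (FASTA-style `>` headers, lowercase letters marking insertions relative to the query, optional `#` comment lines); `read_a3m` and `strip_insertions` are hypothetical helpers, not part of mhc-fine.

```python
def read_a3m(path: str) -> list[tuple[str, str]]:
    """Parse an A3M file into (header, sequence) pairs, skipping '#' comment lines."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def strip_insertions(sequence: str) -> str:
    """Drop lowercase insertion columns so the row aligns with the query."""
    return "".join(residue for residue in sequence if not residue.islower())
```

For example, `records = read_a3m(a3m_path1)` followed by `len(records)` gives the number of aligned sequences, and `strip_insertions(records[0][1]) == protein_sequence1` checks that the first entry is the query, assuming the file lists the query first.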

## Preprocess the data

In [10]:
np_sample1 = preprocess.preprocess_for_inference(protein_sequence1, peptide_sequence, a3m_path1)

Reading a3m file...
Processing protein chain...
GSHSMRYFHTSVSRPGRGEPRFITVGYVDDTLFVRFDSDAASPREEPRAPWIEQEGPEYWDRETQICKAKAQTDREDLRTLLRYYNQSEAGSHTLQNMYGCDVGPDGRLLRGYHQDAYDGKDYIALNEDLSSWTAADTAAQITQRKWEAARVAEQLRAYLEGECVEWLRRYLENGKETLQRADPPKTHVTHHPISDHEATLRCWALGFYPAEITLTWQRDGEDQTQDTELVETRPAGDRTFQKWAAVVVPSGEEQRYTCHVQHEGL
Processing peptide chain...
IWMGKMQKK
Mering features...


In [11]:
np_sample2 = preprocess.preprocess_for_inference(protein_sequence2, peptide_sequence, a3m_path2)

Reading a3m file...
Processing protein chain...
GSHSMRYFDTSVSRPGRGEPRFITVGYVDDTLFVRFDSDAASPREEPRAPWIEQEGPEYWDRETQICKAKAQTDREDLRTLLRYYNQSEAGSHTLQNMYGCDVGPDGRLLRGYHQDAYDGKDYIALNEDLSSWTAADTAAQITQRKWEAARVAEQLRAYLEGECVEWLRRYLENGKETLQRADPPKTHVTHHPISDHEATLRCWALGFYPAEITLTWQRDGEDQTQDTELVETRPAGDRTFQKWAAVVVPSGEEQRYTCHVQHEGL
Processing peptide chain...
IWMGKMQKK
Mering features...


## Run AlphaFold, display metrics and save prediction

In [14]:
my_model = model.Model()

FileNotFoundError: [Errno 2] No such file or directory: 'model_path'

In [14]:
my_model.inference(np_sample1, unique_id1)
my_model.inference(np_sample2, unique_id2)

Running inference...
Writing predicted structure:  ./output/RA-Wild.pdb


{'mean_plddt': 96.37762451171875, 'mean_masked_plddt': 79.95874701605902}
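The reported pLDDT values can be read against the confidence bands used by the AlphaFold Protein Structure Database (above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low). A minimal sketch follows; `plddt_band` is a hypothetical helper, and the notebook does not state which residues the "masked" metric covers, so the labels only describe the numbers themselves.

```python
def plddt_band(score: float) -> str:
    """Map a pLDDT value (0-100) to the AlphaFold DB confidence band."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


# Metrics copied from the inference output above.
metrics = {"mean_plddt": 96.37762451171875, "mean_masked_plddt": 79.95874701605902}
for name, value in metrics.items():
    print(f"{name}: {value:.2f} ({plddt_band(value)})")
# mean_plddt: 96.38 (very high)
# mean_masked_plddt: 79.96 (confident)
```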

In [15]:
!pip install pyrosettacolabsetup
import pyrosettacolabsetup
pyrosettacolabsetup.install_pyrosetta()
import pyrosetta
from pyrosetta import *
from pyrosetta.teaching import *
# Initialize PyRosetta once; a second init() call is redundant
pyrosetta.init()

Collecting pyrosettacolabsetup
  Downloading pyrosettacolabsetup-1.0.9-py3-none-any.whl.metadata (294 bytes)
Downloading pyrosettacolabsetup-1.0.9-py3-none-any.whl (4.9 kB)
Installing collected packages: pyrosettacolabsetup
Successfully installed pyrosettacolabsetup-1.0.9
PyRosetta-4 2023 [Rosetta PyRosetta4.MinSizeRel.python310.ubuntu 2024.01+release.00b79147e63be743438188f93a3f069ca75106d6 2023-12-25T16:35:48] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Rosetta version: PyRosetta4.MinSizeRel.python310.ubuntu r366 2024.01+release.00b79147e63 00b79147e63be743438188f93a3f069ca75106d6 http://www.pyrosetta.org 2023-12-25T16:35:48
core.init: command: PyRosetta -ex1 -ex2aro -database /usr/local/lib/python3.10/dist-packages/pyrosetta/database
basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/u

In [16]:
protein1 = pyrosetta.pose_from_pdb("output/RA-Wild.pdb")
protein2 = pyrosetta.pose_from_pdb("output/RA-Mutant.pdb")
scorefxn = get_fa_scorefxn()
print(scorefxn(protein1))
print(scorefxn(protein2))

core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 985 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 0.928125 seconds.
core.import_pose.import_pose: File 'output/RA-Wild.pdb' automatically determined to be of type PDB
core.conformation.Conformation: Found disulfide between residues 101 164
core.conformation.Conformation: Found disulfide between residues 203 259
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: Finished calculating energy tables.
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv
basic.io.database: Database file opened: scor