# MHCnuggets User Guide
This is a simple Jupyter notebook illustrating how to incorporate MHCnuggets into your workflow.

# Installation
MHCnuggets can be installed with pip:
```bash
pip install mhcnuggets
```
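To confirm the install, a quick check like the one below can be run in the same Python environment. This is a minimal sketch that uses only the standard library's `importlib.metadata` (Python 3.8+). The function name `installed_mhcnuggets_version` is hypothetical and is not part of MHCnuggets.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_mhcnuggets_version() -> Optional[str]:
    """Return the installed mhcnuggets version string, or None if it is not installed."""
    try:
        return version("mhcnuggets")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_mhcnuggets_version()
    print(found if found is not None else "mhcnuggets is not installed in this environment")
```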

# Prediction
MHCnuggets is a pan-allele predictor that can make an IC50 binding affinity prediction for any MHC allele. However, its predictions are more reliable for alleles that are present in the IEDB. For a complete list of these alleles, refer to `alleles_with_trained_models.txt` in the production data folders.
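Before predicting, you may want to check whether an allele appears in `alleles_with_trained_models.txt`. The sketch below assumes that file lists one allele per line in the same style used in this guide (e.g. `HLA-A02:01`). For alleles without a trained model it applies a deliberately naive fallback: it picks a trained allele from the same locus and allele group. That fallback is a hypothetical illustration, **not** the closest-allele search MHCnuggets itself performs, and both helper names are invented here.

```python
import re
from typing import Optional, Set

# Hypothetical helpers. Assumes one allele per line, e.g. "HLA-A02:01".
# The pattern splits "HLA-A02:60" into locus "HLA-A", group "02", protein "60".
_ALLELE_RE = re.compile(r"^(.*?)(\d{2}):(\d+)$")


def parse_trained_alleles(text: str) -> Set[str]:
    """Collect non-empty, stripped allele names from the file's contents."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def naive_fallback(allele: str, trained: Set[str]) -> Optional[str]:
    """Return allele if it has a trained model; otherwise the first (sorted)
    trained allele with the same locus and allele group, or None.

    Illustration only: this is NOT how MHCnuggets finds the closest allele.
    """
    if allele in trained:
        return allele
    query = _ALLELE_RE.match(allele)
    if query is None:
        return None
    candidates = []
    for name in sorted(trained):
        match = _ALLELE_RE.match(name)
        if match and match.group(1, 2) == query.group(1, 2):
            candidates.append(name)
    return candidates[0] if candidates else None
```

To use it, read the text of `alleles_with_trained_models.txt` from your production data folder and pass it to `parse_trained_alleles`.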

For alleles with available mass spectrometry (HLAp) training data, models incorporating both binding affinity (BA) and HLAp data were trained. The data are at https://data.mendeley.com/datasets/8pz43nvvxh/3 (file abelin_peptides.mhcflurry_no_mass_spec.csv.bz2, provided by Tim O'Donnell and the Abelin et al. group). When available, these blended models (BA_to_HLAp) are used by default unless the user explicitly requests a pure BA model (example shown below).
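The default described above can be expressed as a small selection rule. The sketch below is a hypothetical illustration, not MHCnuggets' own model-selection code. It assumes the weight-file naming that appears in this guide's comments (`<allele>_BA_to_HLAp.h5` for blended models, `<allele>_BA.h5` for pure BA models). It also mirrors the `ba_models` flag used in the prediction example.

```python
from typing import Set


def choose_model_file(allele: str, available: Set[str], ba_models: bool = False) -> str:
    """Pick a weights file name for allele, preferring the blended model.

    Hypothetical illustration of the documented default; file naming follows
    the examples in this guide and is an assumption.
    """
    blended = f"{allele}_BA_to_HLAp.h5"
    pure_ba = f"{allele}_BA.h5"
    # Use the blended model unless the caller explicitly asks for pure BA.
    if not ba_models and blended in available:
        return blended
    if pure_ba in available:
        return pure_ba
    raise FileNotFoundError(f"no weights found for {allele}")
```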

```python
# import the predict module
from mhcnuggets.src.predict import predict

# Predict binding for the newline-separated peptides in the peptides_path file
# for the MHC class I allele HLA-A02:01. Because a BA_to_HLAp blended model is
# available, this prediction uses the blended model (HLA-A02:01_BA_to_HLAp.h5)
predict(class_='I',
        peptides_path='data/test/test_peptides.peps',
        mhc='HLA-A02:01')

# This prediction uses a pure BA model (HLA-A02:01_BA.h5)
predict(class_='I',
        peptides_path='data/test/test_peptides.peps',
        mhc='HLA-A02:01', ba_models=True)

# The same prediction for the MHC class II allele HLA-DRB101:01.
# No BA_to_HLAp blended models are available for class II alleles,
# so this prediction uses a pure BA model
predict(class_='II',
        peptides_path='data/test/test_peptides.peps',
        mhc='HLA-DRB101:01')

# Prediction for a rare allele: asking MHCnuggets to predict for HLA-A02:60
# makes it search for the closest allele (HLA-A02:01 in this case) and use
# the corresponding network for prediction
predict(class_='I',
        peptides_path='data/test/test_peptides.peps',
        mhc='HLA-A02:60')
```
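Each `predict` call above reads newline-separated peptides from `peptides_path`. The sketch below prepares such a file from a Python list. It assumes peptides should use only the 20 standard one-letter amino acid codes. That restriction and the helper names are illustrative assumptions, not documented MHCnuggets requirements.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: only the 20 standard one-letter amino acid codes are accepted.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_peptides(peptides: Iterable[str]) -> List[str]:
    """Strip whitespace, upper-case, drop blanks, and reject non-standard residues."""
    cleaned = []
    for raw in peptides:
        peptide = raw.strip().upper()
        if not peptide:
            continue
        unexpected = set(peptide) - STANDARD_AA
        if unexpected:
            raise ValueError(f"{peptide!r} contains unexpected residues: {sorted(unexpected)}")
        cleaned.append(peptide)
    return cleaned


def write_peptides_file(peptides: Iterable[str], path: str) -> List[str]:
    """Write one cleaned peptide per line to path and return the cleaned list."""
    cleaned = clean_peptides(peptides)
    Path(path).write_text("".join(f"{p}\n" for p in cleaned))
    return cleaned
```

For example, `write_peptides_file(["SIINFEKL", "GILGFVFTL"], "my_peptides.peps")` produces a file that can be passed as `peptides_path='my_peptides.peps'`.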


The examples above use the default MHCnuggets models, which are trained on the latest pull from the IEDB. To predict using your own models:

```python
from mhcnuggets.src.predict import predict

# Predict using a user-trained model.
# Replace model_weights_path with the path to your model.
# To check that predictions are working correctly, the output of this command
# should match the output in saves/test/HLA-A01:01_test_model.predictions
predict(class_='I',
        peptides_path='data/test/test_peptides.peps',
        mhc='HLA-A02:01', model_weights_path='saves/test/HLA-A01:01_test_model.h5')
```
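The comparison against `saves/test/HLA-A01:01_test_model.predictions` can also be done programmatically, as sketched below. This assumes your saved output and the reference file are both CSV files with a header row containing `peptide` and `ic50` columns. Check your files first, since that format is an assumption here. The helper names are hypothetical, and the relative tolerance allows for small floating-point differences.

```python
import csv
import math
from typing import Dict, List


def read_predictions(path: str) -> Dict[str, float]:
    """Load peptide -> IC50 from a CSV with 'peptide' and 'ic50' columns (assumed format)."""
    with open(path, newline="") as handle:
        return {row["peptide"]: float(row["ic50"]) for row in csv.DictReader(handle)}


def mismatched_peptides(observed: Dict[str, float], expected: Dict[str, float],
                        rel_tol: float = 1e-4) -> List[str]:
    """Return peptides missing from either side or whose IC50s differ beyond rel_tol."""
    problems = []
    for peptide in sorted(set(observed) | set(expected)):
        if peptide not in observed or peptide not in expected:
            problems.append(peptide)
        elif not math.isclose(observed[peptide], expected[peptide], rel_tol=rel_tol):
            problems.append(peptide)
    return problems
```

An empty result from `mismatched_peptides(read_predictions(your_output), read_predictions(reference))` means the two files agree within the tolerance.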

# Training
MHCnuggets allows users to train their own models on their own datasets using either binding affinity (BA) or mass spectrometry (HLAp) data. The recommended training protocol uses the transfer learning approach described in the publication. Briefly, to train binding affinity models, one first trains models for HLA-A02:01 (class I) and HLA-DRB101:01 (class II) for 200 epochs. Next, one trains every other allele for 100 epochs, using one of these base models as the transfer weights. Finally, one fine-tunes certain alleles (listed in the `mhc_tuning.csv` file in the production data folders) for 25 epochs. Weights are transferred only within the same MHC class; for example, a class II allele cannot be tuned from the weights of a class I allele. The process is demonstrated below for the class I alleles HLA-A02:01, HLA-B08:01, and HLA-B08:02.
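The binding affinity schedule described above can be written down as an ordered plan. In the sketch below, the `tuning` dictionary stands in for `mhc_tuning.csv`, whose column layout is not described in this guide, so the mapping "allele to tune -> allele whose weights it starts from" is an assumption. Tuned alleles skip the 100-epoch step, mirroring the HLA-B08:02 call in the code cell. The save-path naming and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Base alleles from the protocol above, one per MHC class.
BASE_ALLELE = {"I": "HLA-A02:01", "II": "HLA-DRB101:01"}


@dataclass
class TrainStep:
    mhc: str
    n_epoch: int
    transfer_from: Optional[str]  # allele whose weights seed this step, if any


def ba_schedule(class_: str, alleles: List[str], tuning: Dict[str, str]) -> List[TrainStep]:
    """Order BA training steps for one MHC class (all alleles must share that class).

    tuning maps an allele to fine-tune -> the same-class allele it starts from.
    """
    base = BASE_ALLELE[class_]
    steps = [TrainStep(base, 200, None)]  # base model trained from scratch
    trained = {base}
    for allele in alleles:
        if allele == base or allele in tuning:
            continue
        steps.append(TrainStep(allele, 100, base))  # transfer from the base model
        trained.add(allele)
    for allele, source in tuning.items():
        if source not in trained:
            raise ValueError(f"{allele}: source {source} is not trained earlier in the schedule")
        steps.append(TrainStep(allele, 25, source))  # short fine-tuning run
    return steps


def to_train_kwargs(step: TrainStep, class_: str, data: str, save_dir: str) -> dict:
    """Build keyword arguments matching the train() calls shown in this guide."""
    kwargs = dict(class_=class_, data=data, mhc=step.mhc,
                  save_path=f"{save_dir}/{step.mhc}_BA.h5", n_epoch=step.n_epoch)
    if step.transfer_from is not None:
        kwargs["transfer_path"] = f"{save_dir}/{step.transfer_from}_BA.h5"
    return kwargs
```

Looping over `ba_schedule(...)` and calling `train(**to_train_kwargs(step, 'I', data_path, 'saves/test'))` would run the steps in dependency order.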

To train a model incorporating both HLAp and BA data, one first trains a model for HLA-A02:01 for 200 epochs on BA data. Next, one trains the allele of interest for 100 epochs on BA data, using the HLA-A02:01 weights as the transfer base. Finally, one trains the allele of interest for 100 epochs on HLAp data, using that allele's BA model weights as the transfer base. This process is demonstrated below for training a BA_to_HLAp model of HLA-A01:01.
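The three-step BA_to_HLAp chain can be expressed as a list of `train` keyword arguments in which each step's output feeds the next step's `transfer_path`. This is a sketch under assumptions: the helper name and save-path naming are hypothetical, and `hlap_epochs` defaults to the 100 epochs stated in the paragraph above.

```python
from typing import Dict, List


def ba_to_hlap_steps(allele: str, ba_data: str, hlap_data: str, save_dir: str,
                     hlap_epochs: int = 100, base: str = "HLA-A02:01") -> List[Dict]:
    """Return train() keyword arguments for the three BA_to_HLAp steps, in order."""
    base_path = f"{save_dir}/{base}_BA.h5"            # step 1 output
    ba_path = f"{save_dir}/{allele}_BA.h5"            # step 2 output
    hlap_path = f"{save_dir}/{allele}_BA_to_HLAp.h5"  # step 3 output
    return [
        # 1. Base class I model trained from scratch on BA data.
        dict(class_="I", data=ba_data, mhc=base, save_path=base_path, n_epoch=200),
        # 2. Allele of interest on BA data, seeded with the base weights.
        dict(class_="I", data=ba_data, mhc=allele, save_path=ba_path,
             n_epoch=100, transfer_path=base_path),
        # 3. Allele of interest on HLAp data, seeded with its own BA weights.
        dict(class_="I", data=hlap_data, mass_spec=True, mhc=allele,
             save_path=hlap_path, n_epoch=hlap_epochs, transfer_path=ba_path),
    ]
```

Running `for kwargs in ba_to_hlap_steps('HLA-A01:01', ba_csv, hlap_csv, 'saves/test'): train(**kwargs)` would execute the steps in order. The keyword names are the ones used in the `train` calls in this guide.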

```python
# import the train module
from mhcnuggets.src.train import train

# ----------- Training with binding affinity data ----------------
# Train MHC class I allele HLA-A02:01 from scratch using the data in the data file
train(class_='I', data='data/production/mhcI/curated_training_data.csv',
      mhc='HLA-A02:01', save_path='saves/test/test_A02:01_BA.h5', n_epoch=200)

# Train MHC class I allele HLA-B08:01 using transfer weights from class I allele HLA-A02:01
train(class_='I', data='data/production/mhcI/curated_training_data.csv',
      mhc='HLA-B08:01', save_path='saves/test/test_B08:01_BA.h5', n_epoch=100,
      transfer_path='saves/test/test_A02:01_BA.h5')

# Fine-tune MHC class I allele HLA-B08:02 using transfer weights from class I
# allele HLA-B08:01; note that this step runs for only 25 epochs (n_epoch=25)
train(class_='I', data='data/production/mhcI/curated_training_data.csv',
      mhc='HLA-B08:02', save_path='saves/test/test_B08:02_BA.h5', n_epoch=25,
      transfer_path='saves/test/test_B08:01_BA.h5')

# ----------- Training with HLAp and BA data ----------------
# Train MHC class I allele HLA-A02:01 from scratch using the data in the data file (BA training)
train(class_='I', data='data/production/mhcI/curated_training_data.csv',
      mhc='HLA-A02:01', save_path='saves/test/test_A02:01_BA.h5', n_epoch=200)

# Train MHC class I allele HLA-A01:01 using transfer weights from class I allele HLA-A02:01 (BA training)
train(class_='I', data='data/production/mhcI/curated_training_data.csv',
      mhc='HLA-A01:01', save_path='saves/test/test_A01:01_BA.h5', n_epoch=100,
      transfer_path='saves/test/test_A02:01_BA.h5')

# Train MHC class I allele HLA-A01:01 on HLAp data using transfer weights
# from the HLA-A01:01 BA model.
# NOTE: for this command to work, download the HLAp data
# (abelin_peptides.mhcflurry_no_mass_spec.csv.bz2) from
# https://data.mendeley.com/datasets/8pz43nvvxh/3, decompress it, and replace
# the 'data' argument with the path to the decompressed dataset
train(class_='I', data='local path to downloaded mass spec data', mass_spec=True,
      mhc='HLA-A01:01', save_path='saves/test/test_A01:01_BA_to_HLAp.h5', n_epoch=25,
      transfer_path='saves/test/test_A01:01_BA.h5')
```
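The HLAp `train` call above needs the path to the decompressed dataset, while the download is a `.bz2` file. Python's standard `bz2` module can stream-decompress it without loading the whole file into memory. This is a minimal sketch; the helper names are hypothetical.

```python
import bz2
import shutil
from pathlib import Path
from typing import Optional


def default_output_path(src: str) -> str:
    """Drop the trailing .bz2 suffix, e.g. data.csv.bz2 -> data.csv."""
    path = Path(src)
    if path.suffix != ".bz2":
        raise ValueError(f"expected a .bz2 file, got {src}")
    return str(path.with_suffix(""))


def decompress_bz2(src: str, dest: Optional[str] = None) -> str:
    """Stream-decompress src to dest (default: src without .bz2) and return dest."""
    dest = dest or default_output_path(src)
    with bz2.open(src, "rb") as compressed, open(dest, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return dest
```

The path returned by `decompress_bz2("abelin_peptides.mhcflurry_no_mass_spec.csv.bz2")` is what goes in the `data` argument.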


# Evaluation
MHCnuggets lets users evaluate training performance with three metrics: AUC, F1 score, and Kendall's tau. Both user-trained models and the default MHCnuggets models in the `saves` directory can be evaluated.

```python
# import the evaluation module
from mhcnuggets.src.evaluate import test

# Evaluate the training performance of model HLA-A01:01_test_model.h5 on the
# peptides for class I allele HLA-A01:01 in the dataset given by the data path
test(class_='I',
     data='data/production/mhcI/curated_training_data.csv',
     model_path='saves/test/HLA-A01:01_test_model.h5', mhc='HLA-A01:01')
```
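For intuition about what the three evaluation metrics measure, the sketch below computes textbook versions of them from measured and predicted IC50 values. It is not MHCnuggets' `evaluate` code, and several details are assumptions:

- The 500 nM binder threshold is a commonly used cutoff.
- AUC is the probability that a true binder receives a lower predicted IC50 than a non-binder (ties count half).
- Kendall's tau is the tau-b variant, which adjusts for ties.

```python
import math
from typing import List, Sequence

# Assumption: IC50 below 500 nM counts as a binder.
DEFAULT_THRESHOLD_NM = 500.0


def _binders(ic50s: Sequence[float], threshold: float) -> List[bool]:
    return [value < threshold for value in ic50s]


def f1(measured: Sequence[float], predicted: Sequence[float],
       threshold: float = DEFAULT_THRESHOLD_NM) -> float:
    truth, calls = _binders(measured, threshold), _binders(predicted, threshold)
    tp = sum(t and c for t, c in zip(truth, calls))
    fp = sum((not t) and c for t, c in zip(truth, calls))
    fn = sum(t and (not c) for t, c in zip(truth, calls))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(measured: Sequence[float], predicted: Sequence[float],
        threshold: float = DEFAULT_THRESHOLD_NM) -> float:
    truth = _binders(measured, threshold)
    # Lower predicted IC50 means stronger binding, so negate it to get a score.
    pos = [-p for p, t in zip(predicted, truth) if t]
    neg = [-p for p, t in zip(predicted, truth) if not t]
    if not pos or not neg:
        raise ValueError("AUC needs at least one binder and one non-binder")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: excluded from every count
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else 0.0
```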