In [None]:
from peptdeep.model.ms2 import pDeepModel
from peptdeep.model.rt import IRT_PEPTIDE_DF
import numpy as np

### MS2 Prediction

This notebook walks you through the different use cases supported by our pre-trained MS2 model for prediction. One important input to the prediction process is the list of requested charged fragment types to be predicted.
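A charged fragment type combines a fragment series with a fragment charge. For example, `b_z1` denotes singly charged b ions, and `y_modloss_z2` denotes doubly charged y ions carrying a modification-specific neutral loss. The sketch below rebuilds the eight charged fragment types used later in this notebook from four base types and a maximum charge of 2. Both helper functions are hypothetical illustrations of the naming pattern, not functions from peptdeep or alphabase.

```python
# Hypothetical helpers (not part of peptdeep or alphabase) illustrating the
# "<fragment type>_z<charge>" naming used for charged fragment types.

def build_charged_frag_types(frag_types, max_charge):
    """Combine each fragment type with every charge from 1 to max_charge."""
    return [
        f"{frag}_z{charge}"
        for frag in frag_types
        for charge in range(1, max_charge + 1)
    ]


def parse_charged_frag_type(name):
    """Split a name such as 'b_modloss_z2' into ('b_modloss', 2)."""
    frag, charge = name.rsplit("_z", 1)
    return frag, int(charge)


charged = build_charged_frag_types(["b", "y", "b_modloss", "y_modloss"], max_charge=2)
print(charged)
print(parse_charged_frag_type("y_modloss_z1"))
```

The resulting order (all charges of `b`, then `y`, then the modloss series) matches the `REQUESTED_FRAG_TYPES` list used in the first use case.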

Supported use cases with the new format:
| Fragment types use case | Override from weights (*) | Safe to predict |
|---|---|---|
| requested = supported (1) | False | ✅ |
| requested ⊆ supported (2) | False | ✅ |
| requested ⊈ supported (3) | False | ❌ |
| Any | True | ✅ |

(1) The ideal use case: you know, and request, exactly the fragment types supported by the model weights.

(2) You only need to predict a subset of the fragment types supported by the loaded weights.

(3) You request charged fragment types that are not supported by the loaded weights.

(*) `override_from_weights` is a new argument of the MS2 model interface. It allows you to load a model without knowing exactly which fragment types the pretrained weights support: the requested fragment types are overridden, and all fragment types supported by the loaded model are used instead.
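The table can be read as a small decision rule over sets of fragment type names. The function below is a hypothetical sketch of that rule using Python sets; it is not how peptdeep implements the check internally. The supported list is the one reported for the generic model later in this notebook, and `c_z1` is only an example of a type those weights do not include.

```python
# Hypothetical helper (not peptdeep's API) encoding the use-case table above.

def classify_request(requested, supported, override_from_weights=False):
    """Return (use_case, safe_to_predict) for the requested charged frag types."""
    if override_from_weights:
        # The requested list is ignored, so any request is safe.
        return "any", True
    requested, supported = set(requested), set(supported)
    if requested == supported:
        return "requested = supported", True
    if requested <= supported:  # subset check
        return "requested ⊆ supported", True
    return "requested ⊈ supported", False


SUPPORTED = ['b_z1', 'b_z2', 'y_z1', 'y_z2',
             'b_modloss_z1', 'b_modloss_z2', 'y_modloss_z1', 'y_modloss_z2']

print(classify_request(SUPPORTED, SUPPORTED))
print(classify_request(['b_z1', 'y_z1'], SUPPORTED))
print(classify_request(['c_z1'], SUPPORTED))
print(classify_request(['c_z1'], SUPPORTED, override_from_weights=True))
```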

In [None]:
model_path = "../new_pretrained_models/generic/ms2.pth"


To interact with the underlying model, we need to instantiate a model interface that provides the methods for prediction and training. In peptdeep, the default model interface for MS2 models is `pDeepModel`.

In [3]:
model_interface = pDeepModel()
model_interface.load(model_path)

Next, as a user, you have full flexibility to define which charged fragment types the model should predict. However, as described in the supported use cases, the requested charged fragment types have to be a subset of the fragment types supported by the underlying model. If you want to predict unsupported fragment types, check the transfer learning notebook, which shows how to efficiently extend a pretrained model to predict additional fragment types.
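Before creating the model interface, it can help to separate the requested types the weights can predict from those that would first need transfer learning. The sketch below is a hypothetical pre-flight check, not part of peptdeep. With a loaded model, the supported list can be read from `model_interface.model.supported_charged_frag_types` (printed later in this notebook); here it is hard-coded so the example runs on its own, and `c_z1` stands in for an unsupported type.

```python
# Hypothetical pre-flight check (not part of peptdeep).

def split_requested(requested, supported):
    """Return (predictable, unsupported), preserving the requested order."""
    supported_set = set(supported)
    predictable = [ft for ft in requested if ft in supported_set]
    unsupported = [ft for ft in requested if ft not in supported_set]
    return predictable, unsupported


supported = ['b_z1', 'b_z2', 'y_z1', 'y_z2',
             'b_modloss_z1', 'b_modloss_z2', 'y_modloss_z1', 'y_modloss_z2']
requested = ['b_z1', 'y_z1', 'c_z1']

predictable, unsupported = split_requested(requested, supported)
print("predictable:", predictable)  # safe to request from these weights
print("unsupported:", unsupported)  # would require extending the model
```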

Great, let's now go through the different use cases we support.

In [4]:
# Function to create a test dataset to be used for prediction
def get_prediction_dataset():
    df = IRT_PEPTIDE_DF.copy()
    # Unmodified peptides, all with precursor charge 2
    df['charge'] = 2
    df['mods'] = ''
    df['mod_sites'] = ''
    # Sort by peptide length (nAA)
    df = df.sort_values('nAA')
    # Each peptide occupies nAA - 1 consecutive rows in the fragment table
    idxes = np.zeros(len(df) + 1, dtype=np.int64)
    idxes[1:] = np.cumsum(df.nAA.values - 1)
    df['frag_start_idx'] = idxes[:-1]
    df['frag_stop_idx'] = idxes[1:]
    # Normalized collision energy and instrument, used as model inputs
    df['nce'] = 30
    df['instrument'] = "Lumos"
    return df
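In `get_prediction_dataset`, `frag_start_idx` and `frag_stop_idx` give each precursor a contiguous block of rows in the fragment table, one row per backbone cleavage site (`nAA - 1` sites per peptide). The toy example below repeats that cumulative-sum indexing on three made-up peptide lengths, so the arithmetic is easy to follow. The final stop index is the total number of fragment rows the prediction table is expected to contain.

```python
import numpy as np

# Three made-up peptide lengths, already sorted by nAA.
n_aa = np.array([7, 9, 10])

# A peptide with nAA residues has nAA - 1 cleavage sites, i.e. nAA - 1 rows.
idxes = np.zeros(len(n_aa) + 1, dtype=np.int64)
idxes[1:] = np.cumsum(n_aa - 1)

frag_start_idx = idxes[:-1]
frag_stop_idx = idxes[1:]
print(frag_start_idx.tolist())  # [0, 6, 14]
print(frag_stop_idx.tolist())   # [6, 14, 23]
```

Because each stop index equals the next start index, the blocks tile the fragment table without gaps or overlaps.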

1. Predict all supported fragment types

In [None]:
# To see what fragment types are supported in the alpha ecosystem check alphabase.fragment.FRAGMENT_TYPES 
REQUESTED_FRAG_TYPES = ['b_z1', 'b_z2', 'y_z1', 'y_z2', 'b_modloss_z1', 'b_modloss_z2', 'y_modloss_z1', 'y_modloss_z2']
model_interface = pDeepModel(REQUESTED_FRAG_TYPES)
model_interface.load(model_path)

predictions = model_interface.predict(get_prediction_dataset())
predictions.head()

| | b_z1 | b_z2 | y_z1 | y_z2 | b_modloss_z1 | b_modloss_z2 | y_modloss_z1 | y_modloss_z2 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 | 0.004739 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.162034 | 0.0 | 0.360414 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.04666 | 0.0 | 0.10992 | 0.005516 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.018628 | 0.0 | 0.203326 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.01353 | 0.0 | 0.267507 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


2. Predict a subset of the supported fragment types, for example:
- Masking modloss fragment types:

In [6]:
REQUESTED_FRAG_TYPES = ['b_z1', 'b_z2', 'y_z1', 'y_z2']
model_interface = pDeepModel(REQUESTED_FRAG_TYPES)
model_interface.load(model_path)

predictions = model_interface.predict(get_prediction_dataset())
predictions.head()

| | b_z1 | b_z2 | y_z1 | y_z2 |
|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 | 0.004739 |
| 1 | 0.162034 | 0.0 | 0.360414 | 0.0 |
| 2 | 0.04666 | 0.0 | 0.10992 | 0.005516 |
| 3 | 0.018628 | 0.0 | 0.203326 | 0.0 |
| 4 | 0.01353 | 0.0 | 0.267507 | 0.0 |
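In the printed rows, the four requested columns have exactly the same values as the corresponding columns of the full prediction from use case 1, so for these peptides requesting a subset behaves like selecting columns from the full output. The check below uses only the printed `head()` values, copied by hand into plain dictionaries; it does not show how peptdeep masks fragment types internally.

```python
# First five rows of the two outputs above (column name -> values).
full = {
    "b_z1": [0.0, 0.162034, 0.04666, 0.018628, 0.01353],
    "b_z2": [0.0] * 5,
    "y_z1": [1.0, 0.360414, 0.10992, 0.203326, 0.267507],
    "y_z2": [0.004739, 0.0, 0.005516, 0.0, 0.0],
    "b_modloss_z1": [0.0] * 5,
    "b_modloss_z2": [0.0] * 5,
    "y_modloss_z1": [0.0] * 5,
    "y_modloss_z2": [0.0] * 5,
}
subset = {
    "b_z1": [0.0, 0.162034, 0.04666, 0.018628, 0.01353],
    "b_z2": [0.0] * 5,
    "y_z1": [1.0, 0.360414, 0.10992, 0.203326, 0.267507],
    "y_z2": [0.004739, 0.0, 0.005516, 0.0, 0.0],
}

requested = ["b_z1", "b_z2", "y_z1", "y_z2"]
# Selecting the requested columns from the full prediction reproduces the subset.
selected = {ft: full[ft] for ft in requested}
print(selected == subset)  # True
```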


- Masking non-modloss fragment types:

In [38]:
REQUESTED_FRAG_TYPES = ['b_z1', 'b_z2', 'b_modloss_z1', 'b_modloss_z2']
model_interface = pDeepModel(REQUESTED_FRAG_TYPES)
model_interface.load(model_path)
predictions = model_interface.predict(get_prediction_dataset())
predictions.head()

| | b_z1 | b_z2 | b_modloss_z1 | b_modloss_z2 |
|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.162034 | 0.0 | 0.0 | 0.0 |
| 2 | 0.04666 | 0.0 | 0.0 | 0.0 |
| 3 | 0.018628 | 0.0 | 0.0 | 0.0 |
| 4 | 0.01353 | 0.0 | 0.0 | 0.0 |


What if you don't know which fragment types your pretrained model supports? You can strictly follow what the underlying model supports by setting the `override_from_weights` argument to `True` when initializing `pDeepModel`.
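The effect of `override_from_weights` can be summarized as a rule for choosing which fragment types get predicted. The function below is a hypothetical sketch of that rule, not peptdeep's implementation. It assumes that, without the override, an unsupported request should be rejected, modeled here as a `ValueError` (peptdeep may report this differently). The supported list is the one printed for the generic model in the next cell.

```python
# Hypothetical sketch (not peptdeep's code) of how the predicted charged
# fragment types are resolved, depending on override_from_weights.

def effective_frag_types(requested, supported, override_from_weights=False):
    """Return the charged fragment types that would be predicted."""
    if override_from_weights:
        # The requested list is ignored; every type stored in the weights is used.
        return list(supported)
    missing = [ft for ft in requested if ft not in supported]
    if missing:
        # Use case (3): requested types the weights cannot predict.
        raise ValueError(f"Unsupported charged fragment types: {missing}")
    return list(requested)


supported = ['b_z1', 'b_z2', 'y_z1', 'y_z2',
             'b_modloss_z1', 'b_modloss_z2', 'y_modloss_z1', 'y_modloss_z2']

print(effective_frag_types(['b_z1'], supported, override_from_weights=True))
print(effective_frag_types(['b_z1'], supported))
```

With the override, requesting only `['b_z1']` still resolves to all eight supported types, which matches the columns returned in the next cell.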

In [8]:
REQUESTED_FRAG_TYPES = ['b_z1'] 
model_interface = pDeepModel(REQUESTED_FRAG_TYPES, # will be overridden by the model weights
                             override_from_weights=True)
model_interface.load(model_path)

print(f"Supported fragment types: {model_interface.model.supported_charged_frag_types}")

predictions = model_interface.predict(get_prediction_dataset())
predictions.head()

Supported fragment types: ['b_z1', 'b_z2', 'y_z1', 'y_z2', 'b_modloss_z1', 'b_modloss_z2', 'y_modloss_z1', 'y_modloss_z2']


| | b_z1 | b_z2 | y_z1 | y_z2 | b_modloss_z1 | b_modloss_z2 | y_modloss_z1 | y_modloss_z2 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 | 0.004739 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.162034 | 0.0 | 0.360414 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.04666 | 0.0 | 0.10992 | 0.005516 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.018628 | 0.0 | 0.203326 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.01353 | 0.0 | 0.267507 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
