# Using the pretrained models

All pretrained MS2/RT/CCS models are managed in `peptdeep.pretrained_models`, where `class ModelManager` is the main entry point for all models.

In [None]:
%reload_ext autoreload
%autoreload 2

### Predicting RT values

In [None]:
from peptdeep.pretrained_models import ModelManager

# Create the manager and load the installed pretrained MS2/RT/CCS models
model_mgr = ModelManager()
model_mgr.load_installed_models()

We use the iRT peptides as test peptides.

In [None]:
from peptdeep.model.rt import irt_pep
pep_df = irt_pep[['sequence','mods','mod_sites','irt']].copy()

First, we predict RT values with the RT model (`model_mgr.rt_model`) through `model_mgr.predict_rt()`.

In [None]:
model_mgr.predict_rt(pep_df)
pep_df

2022-09-25 16:44:42> Predicting RT ...


100%|██████████| 5/5 [00:00<00:00, 119.76it/s]


| | sequence | mods | mod_sites | irt | nAA | rt_pred | rt_norm_pred |
|---|---|---|---|---|---|---|---|
| 0 | LGGNEQVTR | | | -24.92 | 9 | 0.072804 | 0.072804 |
| 1 | YILAGVENSK | | | 19.79 | 10 | 0.400949 | 0.400949 |
| 2 | TPVISGGPYEYR | | | 28.71 | 12 | 0.438901 | 0.438901 |
| 3 | TPVITGAPYEYR | | | 33.38 | 12 | 0.489774 | 0.489774 |
| 4 | GTFIIDPGGVIR | | | 70.52 | 12 | 0.757164 | 0.757164 |
| 5 | GTFIIDPAAVIR | | | 87.23 | 12 | 0.846791 | 0.846791 |
| 6 | VEATFGVDESNAK | | | 12.39 | 13 | 0.332649 | 0.332649 |
| 7 | DGLDAASYYAPVR | | | 42.26 | 13 | 0.542729 | 0.542729 |
| 8 | ADVTPADFSEWSK | | | 54.62 | 13 | 0.609782 | 0.609782 |
| 9 | GAGSSEPVTGLDAK | | | 0.0 | 14 | 0.271196 | 0.271196 |


We predict the normalized retention time (`rt_pred`, normally ranging from 0 to 1) rather than real RT values. It can be converted to a real RT value by multiplying it by the maximal RT of the LC gradient. We can also convert `rt_pred` into iRT values (`irt_pred`) based on the 11 iRT peptides.
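As a worked example of the first conversion, the snippet below scales `rt_pred` to minutes. The 120-minute gradient length is a hypothetical value chosen only for illustration. Substitute the length of your own LC gradient.

```python
import numpy as np

# Normalized RT predictions (rt_pred) of the first three iRT peptides in the table above
rt_pred = np.array([0.072804, 0.400949, 0.438901])

# Hypothetical gradient length in minutes; replace it with your LC method's value
gradient_minutes = 120.0

# Real RT = normalized RT x maximal RT of the LC gradient
rt_minutes = rt_pred * gradient_minutes
print(np.round(rt_minutes, 2))
```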

In [None]:
model_mgr.rt_model.add_irt_column_to_precursor_df(pep_df)
pep_df

| | sequence | mods | mod_sites | irt | nAA | rt_pred | rt_norm_pred | irt_pred |
|---|---|---|---|---|---|---|---|---|
| 0 | LGGNEQVTR | | | -24.92 | 9 | 0.072804 | 0.072804 | -28.148849 |
| 1 | YILAGVENSK | | | 19.79 | 10 | 0.400949 | 0.400949 | 21.806524 |
| 2 | TPVISGGPYEYR | | | 28.71 | 12 | 0.438901 | 0.438901 | 27.584271 |
| 3 | TPVITGAPYEYR | | | 33.38 | 12 | 0.489774 | 0.489774 | 35.328937 |
| 4 | GTFIIDPGGVIR | | | 70.52 | 12 | 0.757164 | 0.757164 | 76.035216 |
| 5 | GTFIIDPAAVIR | | | 87.23 | 12 | 0.846791 | 0.846791 | 89.679588 |
| 6 | VEATFGVDESNAK | | | 12.39 | 13 | 0.332649 | 0.332649 | 11.408902 |
| 7 | DGLDAASYYAPVR | | | 42.26 | 13 | 0.542729 | 0.542729 | 43.390475 |
| 8 | ADVTPADFSEWSK | | | 54.62 | 13 | 0.609782 | 0.609782 | 53.598396 |
| 9 | GAGSSEPVTGLDAK | | | 0.0 | 14 | 0.271196 | 0.271196 | 2.053492 |
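In the table above, `irt_pred` is a linear function of `rt_pred`. The sketch below shows one way to build such a calibration. It assumes an ordinary least-squares line, fitted with `numpy.polyfit` to the reference iRT values of the peptides listed above. It illustrates the idea and is not necessarily how `add_irt_column_to_precursor_df()` fits its line, so its numbers may differ slightly from `irt_pred`. The helper `to_irt` is hypothetical.

```python
import numpy as np

# rt_pred and reference iRT values of the iRT peptides from the table above
rt_pred = np.array([0.072804, 0.400949, 0.438901, 0.489774, 0.757164,
                    0.846791, 0.332649, 0.542729, 0.609782, 0.271196])
irt = np.array([-24.92, 19.79, 28.71, 33.38, 70.52,
                87.23, 12.39, 42.26, 54.62, 0.0])

# Fit irt ~ slope * rt_pred + intercept by least squares
slope, intercept = np.polyfit(rt_pred, irt, deg=1)

def to_irt(rt_norm):
    """Hypothetical helper: map normalized RT predictions onto the iRT scale."""
    return slope * np.asarray(rt_norm) + intercept

irt_pred = to_irt(rt_pred)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(np.round(irt_pred, 2))
```

Once fitted, the same line can be applied to the `rt_pred` of any other peptide predicted by the same model.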


### Predicting CCS values

After adding a `charge` column to `pep_df`, we can predict the CCS values of the given peptides (precursors) with `model_mgr.predict_mobility()`. The same call also computes `precursor_mz` and converts the CCS values into mobility values (`mobility_pred`). Note that these are Bruker timsTOF mobility values (1/K0).

In [None]:
pep_df['charge'] = 3
model_mgr.predict_mobility(pep_df)
pep_df

2022-09-25 16:44:43> Predicting mobility ...


100%|██████████| 5/5 [00:00<00:00, 146.49it/s]


| | sequence | mods | mod_sites | irt | nAA | rt_pred | rt_norm_pred | irt_pred | charge | ccs_pred | precursor_mz | mobility_pred |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LGGNEQVTR | | | -24.92 | 9 | 0.072804 | 0.072804 | -28.148849 | 3 | 382.416138 | 325.173562 | 0.627622 |
| 1 | YILAGVENSK | | | 19.79 | 10 | 0.400949 | 0.400949 | 21.806524 | 3 | 444.596069 | 365.201118 | 0.73079 |
| 2 | TPVISGGPYEYR | | | 28.71 | 12 | 0.438901 | 0.438901 | 27.584271 | 3 | 451.388611 | 446.894465 | 0.74365 |
| 3 | TPVITGAPYEYR | | | 33.38 | 12 | 0.489774 | 0.489774 | 35.328937 | 3 | 470.053528 | 456.238232 | 0.774563 |
| 4 | GTFIIDPGGVIR | | | 70.52 | 12 | 0.757164 | 0.757164 | 76.035216 | 3 | 448.219818 | 415.571434 | 0.737861 |
| 5 | GTFIIDPAAVIR | | | 87.23 | 12 | 0.846791 | 0.846791 | 89.679588 | 3 | 469.222473 | 424.915201 | 0.772623 |
| 6 | VEATFGVDESNAK | | | 12.39 | 13 | 0.332649 | 0.332649 | 11.408902 | 3 | 479.035492 | 456.221018 | 0.789363 |
| 7 | DGLDAASYYAPVR | | | 42.26 | 13 | 0.542729 | 0.542729 | 43.390475 | 3 | 488.397858 | 466.561374 | 0.804969 |
| 8 | ADVTPADFSEWSK | | | 54.62 | 13 | 0.609782 | 0.609782 | 53.598396 | 3 | 465.785065 | 484.892901 | 0.767984 |
| 9 | GAGSSEPVTGLDAK | | | 0.0 | 14 | 0.271196 | 0.271196 | 2.053492 | 3 | 453.66864 | 430.217496 | 0.747111 |
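The `mobility_pred` values above are consistent with the Mason–Schamp relation 1/K0 = CCS × √μ / (z × C). Here μ is the reduced mass of the ion (taken as `precursor_mz × charge`) and the N2 drift gas (28 Da). The sketch below recomputes the first two rows. The constant C = 1059.62245 and the gas mass are our assumptions: they reproduce the table, but they are not quoted from peptdeep or alphabase documentation. `ccs_to_inv_k0` is a hypothetical helper.

```python
import numpy as np

# Assumed constants: they reproduce the table above; check alphabase for exact values
CCS_IM_COEF = 1059.62245  # folds physical constants and drift-gas temperature together
N2_MASS = 28.0            # mass of the N2 drift gas (Da)

def ccs_to_inv_k0(ccs, charge, precursor_mz):
    """Hypothetical helper: CCS (A^2) -> Bruker timsTOF mobility (1/K0), Mason-Schamp."""
    ccs = np.asarray(ccs, dtype=float)
    charge = np.asarray(charge, dtype=float)
    ion_mass = np.asarray(precursor_mz, dtype=float) * charge
    reduced_mass = ion_mass * N2_MASS / (ion_mass + N2_MASS)
    return ccs * np.sqrt(reduced_mass) / (charge * CCS_IM_COEF)

# First two rows of the table above (charge 3)
inv_k0 = ccs_to_inv_k0([382.416138, 444.596069], [3, 3], [325.173562, 365.201118])
print(np.round(inv_k0, 6))
```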


### Predicting MS2 fragment (b/y) ion intensities

`model_mgr.predict_ms2()` predicts the fragment ion intensities for the peptides in `pep_df`. Fragment prediction needs two additional columns, `nce` (normalized collision energy) and `instrument`. The predicted intensities are returned as a new dataframe.

In [None]:
pep_df['nce'] = 0.3             # normalized collision energy (NCE)
pep_df['instrument'] = 'Lumos'  # instrument type
fragment_intensity_df = model_mgr.predict_ms2(pep_df)
fragment_intensity_df

2022-09-25 16:44:43> Predicting MS2 ...


100%|██████████| 5/5 [00:00<00:00, 71.83it/s]


| | b_z1 | b_z2 | y_z1 | y_z2 | b_modloss_z1 | b_modloss_z2 | y_modloss_z1 | y_modloss_z2 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | 0.000000 | 0.938887 | 0.137162 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.575347 | 0.000000 | 0.389546 | 0.133736 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.200786 | 0.000000 | 0.157263 | 0.091070 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.185773 | 0.000000 | 0.438499 | 0.061092 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.171135 | 0.000000 | 0.676110 | 0.015279 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 118 | 0.000000 | 0.025911 | 0.113351 | 0.007397 | 0.0 | 0.0 | 0.0 | 0.0 |
| 119 | 0.000000 | 0.004030 | 0.336241 | 0.069407 | 0.0 | 0.0 | 0.0 | 0.0 |
| 120 | 0.000000 | 0.000000 | 0.024834 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 121 | 0.000000 | 0.000000 | 0.083723 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |


`model_mgr.predict_ms2()` also appends the columns `frag_start_idx` and `frag_stop_idx` to `pep_df`. They point to the start and end (stop, exclusive) row positions of each peptide's fragments in `fragment_intensity_df`.

In [None]:
pep_df['sequence,mods,mod_sites,frag_start_idx,frag_stop_idx'.split(',')]

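| | sequence | mods | mod_sites | frag_start_idx | frag_stop_idx |
|---|---|---|---|---|---|
| 0 | LGGNEQVTR | | | 0 | 8 |
| 1 | YILAGVENSK | | | 8 | 17 |
| 2 | TPVISGGPYEYR | | | 17 | 28 |
| 3 | TPVITGAPYEYR | | | 28 | 39 |
| 4 | GTFIIDPGGVIR | | | 39 | 50 |
| 5 | GTFIIDPAAVIR | | | 50 | 61 |
| 6 | VEATFGVDESNAK | | | 61 | 73 |
| 7 | DGLDAASYYAPVR | | | 73 | 85 |
| 8 | ADVTPADFSEWSK | | | 85 | 97 |
| 9 | GAGSSEPVTGLDAK | | | 97 | 110 |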
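These indices follow from peptide length. In the table above, a peptide with `nAA` residues occupies `nAA - 1` fragment rows (one per backbone cleavage site), and the blocks are stored back to back. The small NumPy sketch below reproduces both columns from `nAA`, taking the row order of `pep_df` as given. The fragments of peptide *i* are then rows `frag_start_idx[i]:frag_stop_idx[i]`.

```python
import numpy as np

# Peptide lengths (nAA) of the iRT peptides, in the row order of pep_df
nAA = np.array([9, 10, 12, 12, 12, 12, 13, 13, 13, 14])

# One fragment row per backbone cleavage site
n_frag_rows = nAA - 1

# Blocks are consecutive: stop = running total, start = stop - block size
frag_stop_idx = np.cumsum(n_frag_rows)
frag_start_idx = frag_stop_idx - n_frag_rows

print(frag_start_idx)  # [ 0  8 17 28 39 50 61 73 85 97]
print(frag_stop_idx)   # [  8  17  28  39  50  61  73  85  97 110]
```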


We can calculate the fragment m/z values for `pep_df` with `create_fragment_mz_dataframe()`. `pep_df` already contains `frag_start_idx` and `frag_stop_idx`, which point into an existing fragment dataframe (`fragment_intensity_df`). We therefore pass `reference_fragment_df=fragment_intensity_df` so that `fragment_mz_df` and `fragment_intensity_df` share the same row order.

In [None]:
import alphabase.peptide.fragment as fragment
fragment_mz_df = fragment.create_fragment_mz_dataframe(
    pep_df, ['b_z1','b_z2','y_z1','y_z2'], 
    reference_fragment_df=fragment_intensity_df
)
print(pep_df.sequence.values[0])

display(fragment_mz_df.iloc[
    pep_df.frag_start_idx.values[0]
    :pep_df.frag_stop_idx.values[0]
])
display(fragment_intensity_df.iloc[
    pep_df.frag_start_idx.values[0]
    :pep_df.frag_stop_idx.values[0]
])

LGGNEQVTR


| | b_z1 | b_z2 | y_z1 | y_z2 |
|---|---|---|---|---|
| 0 | 114.09134 | 57.549308 | 860.42207 | 430.714673 |
| 1 | 171.112804 | 86.06004 | 803.400606 | 402.203941 |
| 2 | 228.134268 | 114.570772 | 746.379143 | 373.69321 |
| 3 | 342.177195 | 171.592236 | 632.336215 | 316.671746 |
| 4 | 471.219788 | 236.113532 | 503.293622 | 252.150449 |
| 5 | 599.278366 | 300.142821 | 375.235045 | 188.121161 |
| 6 | 698.34678 | 349.677028 | 276.166631 | 138.586954 |
| 7 | 799.394458 | 400.200867 | 175.118952 | 88.063114 |


| | b_z1 | b_z2 | y_z1 | y_z2 | b_modloss_z1 | b_modloss_z2 | y_modloss_z1 | y_modloss_z2 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.938887 | 0.137162 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.575347 | 0.0 | 0.389546 | 0.133736 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.200786 | 0.0 | 0.157263 | 0.09107 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.185773 | 0.0 | 0.438499 | 0.061092 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.171135 | 0.0 | 0.67611 | 0.015279 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.117663 | 0.06102 | 0.86516 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 0.053193 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.008439 | 0.0 | 0.287427 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
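The m/z values of LGGNEQVTR shown above can be checked by hand. Row *i* holds the b ion made of the first *i* + 1 residues together with its complementary y ion. A b ion's neutral mass is the sum of its residue masses. A y ion additionally carries one water. Each charge adds one proton. The sketch below recomputes the table from standard monoisotopic residue masses. It is a simplified illustration for unmodified peptides, and `by_fragment_mz` is a hypothetical helper, not alphabase's implementation.

```python
import numpy as np

# Monoisotopic residue masses (Da) for the residues in LGGNEQVTR
RESIDUE_MASS = {
    "G": 57.021464, "L": 113.084064, "N": 114.042927, "E": 129.042593,
    "Q": 128.058578, "V": 99.068414, "T": 101.047679, "R": 156.101111,
}
PROTON = 1.007276
H2O = 18.010565

def by_fragment_mz(sequence, charge):
    """Hypothetical helper: b and y m/z per cleavage site (row 0 = b1 and y(n-1))."""
    masses = np.array([RESIDUE_MASS[aa] for aa in sequence])
    b_neutral = np.cumsum(masses)[:-1]          # b1 ... b(n-1): N-terminal residue sums
    y_neutral = masses.sum() - b_neutral + H2O  # complementary y ions carry one water
    b_mz = (b_neutral + charge * PROTON) / charge
    y_mz = (y_neutral + charge * PROTON) / charge
    return b_mz, y_mz

b_z1, y_z1 = by_fragment_mz("LGGNEQVTR", 1)
b_z2, y_z2 = by_fragment_mz("LGGNEQVTR", 2)
print(np.round(np.column_stack([b_z1, b_z2, y_z1, y_z2]), 6))
```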


Alternatively, we can call `create_fragment_mz_dataframe()` first and then predict the MS2 intensities, passing the m/z dataframe to `predict_ms2()` as `reference_frag_df`.

In [None]:
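# Drop the fragment indices from the previous predict_ms2() call,
# so that create_fragment_mz_dataframe() builds a new fragment layout
del pep_df['frag_start_idx']
del pep_df['frag_stop_idx']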

fragment_mz_df2 = fragment.create_fragment_mz_dataframe(
    pep_df, ['b_z1','b_z2','y_z1','y_z2'],
)
fragment_intensity_df2 = model_mgr.predict_ms2(pep_df, reference_frag_df=fragment_mz_df2)

import numpy as np
assert np.allclose(fragment_intensity_df.values, fragment_intensity_df2.values)

2022-09-25 16:44:43> Predicting MS2 ...


100%|██████████| 5/5 [00:00<00:00, 86.36it/s]


In [None]:
pep_df

| | sequence | mods | mod_sites | irt | nAA | rt_pred | rt_norm_pred | irt_pred | charge | ccs_pred | precursor_mz | mobility_pred | nce | instrument | frag_start_idx | frag_stop_idx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LGGNEQVTR | | | -24.92 | 9 | 0.072804 | 0.072804 | -28.148849 | 3 | 382.416138 | 325.173562 | 0.627622 | 0.3 | Lumos | 0 | 8 |
| 1 | YILAGVENSK | | | 19.79 | 10 | 0.400949 | 0.400949 | 21.806524 | 3 | 444.596069 | 365.201118 | 0.73079 | 0.3 | Lumos | 8 | 17 |
| 2 | TPVISGGPYEYR | | | 28.71 | 12 | 0.438901 | 0.438901 | 27.584271 | 3 | 451.388611 | 446.894465 | 0.74365 | 0.3 | Lumos | 17 | 28 |
| 3 | TPVITGAPYEYR | | | 33.38 | 12 | 0.489774 | 0.489774 | 35.328937 | 3 | 470.053528 | 456.238232 | 0.774563 | 0.3 | Lumos | 28 | 39 |
| 4 | GTFIIDPGGVIR | | | 70.52 | 12 | 0.757164 | 0.757164 | 76.035216 | 3 | 448.219818 | 415.571434 | 0.737861 | 0.3 | Lumos | 39 | 50 |
| 5 | GTFIIDPAAVIR | | | 87.23 | 12 | 0.846791 | 0.846791 | 89.679588 | 3 | 469.222473 | 424.915201 | 0.772623 | 0.3 | Lumos | 50 | 61 |
| 6 | VEATFGVDESNAK | | | 12.39 | 13 | 0.332649 | 0.332649 | 11.408902 | 3 | 479.035492 | 456.221018 | 0.789363 | 0.3 | Lumos | 61 | 73 |
| 7 | DGLDAASYYAPVR | | | 42.26 | 13 | 0.542729 | 0.542729 | 43.390475 | 3 | 488.397858 | 466.561374 | 0.804969 | 0.3 | Lumos | 73 | 85 |
| 8 | ADVTPADFSEWSK | | | 54.62 | 13 | 0.609782 | 0.609782 | 53.598396 | 3 | 465.785065 | 484.892901 | 0.767984 | 0.3 | Lumos | 85 | 97 |
| 9 | GAGSSEPVTGLDAK | | | 0.0 | 14 | 0.271196 | 0.271196 | 2.053492 | 3 | 453.66864 | 430.217496 | 0.747111 | 0.3 | Lumos | 97 | 110 |


# Using `PredictSpecLib` for DIA

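Given a precursor_df (or peptide_df), we can also predict a spectral library directly with the `PredictSpecLib` class from `peptdeep.spec_lib.predict_lib`. As the log below shows, `predict_all()` first calculates the precursor isotope distributions and then predicts RT, ion mobility, and MS2 intensities in one call.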

In [None]:
from peptdeep.spec_lib.predict_lib import PredictSpecLib

pep_df = irt_pep.copy()
pep_df['charge'] = 2
pep_df['nce'] = 0.3
pep_df['instrument'] = 'Lumos'
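# Spectral library with the charged fragment types to predict
lib = PredictSpecLib(model_mgr, ['b_z1','b_z2','y_z1','y_z2'])
lib.precursor_df = pep_df
# Isotopes, RT, ion mobility, and MS2 intensities in one call
lib.predict_all()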

2022-09-25 16:44:44> Calculating precursor isotope distributions ...
2022-09-25 16:44:44> Predicting RT/IM/MS2 ...
2022-09-25 16:44:44> Predicting RT ...


100%|██████████| 5/5 [00:00<00:00, 131.17it/s]

2022-09-25 16:44:44> Predicting mobility ...



100%|██████████| 5/5 [00:00<00:00, 136.49it/s]

2022-09-25 16:44:44> Predicting MS2 ...



100%|██████████| 5/5 [00:00<00:00, 98.45it/s]

2022-09-25 16:44:44> End Predicting RT/IM/MS2





In [None]:
lib.precursor_df

| | sequence | pep_name | irt | mods | mod_sites | nAA | rt_pred | charge | nce | instrument | ... | isotope_right_most_intensity | isotope_right_most_index | isotope_m1_mz | isotope_apex_mz | isotope_right_most_mz | rt_norm_pred | ccs_pred | mobility_pred | frag_stop_idx | frag_start_idx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LGGNEQVTR | RT-pep a | -24.92 | | | 9 | 0.072804 | 2 | 0.3 | Lumos | ... | 0.485883 | 1 | 487.758355 | 487.256705 | 487.758355 | 0.072804 | 331.279816 | 0.815533 | 8 | 0 |
| 1 | YILAGVENSK | RT-pep d | 19.79 | | | 10 | 0.400949 | 2 | 0.3 | Lumos | ... | 0.20104 | 2 | 547.799689 | 547.298039 | 548.301339 | 0.400949 | 364.828003 | 0.8995 | 17 | 8 |
| 2 | TPVISGGPYEYR | RT-pep e | 28.71 | | | 12 | 0.438901 | 2 | 0.3 | Lumos | ... | 0.300977 | 2 | 670.339709 | 669.838059 | 670.841359 | 0.438901 | 394.317596 | 0.974434 | 28 | 17 |
| 3 | TPVITGAPYEYR | RT-pep f | 33.38 | | | 12 | 0.489774 | 2 | 0.3 | Lumos | ... | 0.317267 | 2 | 684.355359 | 683.853709 | 684.857009 | 0.489774 | 399.848633 | 0.988309 | 39 | 28 |
| 4 | GTFIIDPGGVIR | RT-pep i | 70.52 | | | 12 | 0.757164 | 2 | 0.3 | Lumos | ... | 0.263701 | 2 | 623.355162 | 622.853512 | 623.856812 | 0.757164 | 379.443451 | 0.936954 | 50 | 39 |
| 5 | GTFIIDPAAVIR | RT-pep k | 87.23 | | | 12 | 0.846791 | 2 | 0.3 | Lumos | ... | 0.279015 | 2 | 637.370813 | 636.869163 | 637.872463 | 0.846791 | 387.88678 | 0.958034 | 61 | 50 |
| 6 | VEATFGVDESNAK | RT-pep c | 12.39 | | | 13 | 0.332649 | 2 | 0.3 | Lumos | ... | 0.287225 | 2 | 684.329539 | 683.827889 | 684.831189 | 0.332649 | 394.208893 | 0.974369 | 73 | 61 |
| 7 | DGLDAASYYAPVR | RT-pep g | 42.26 | | | 13 | 0.542729 | 2 | 0.3 | Lumos | ... | 0.316367 | 2 | 699.840073 | 699.338423 | 700.341723 | 0.542729 | 399.736542 | 0.988252 | 85 | 73 |
| 8 | ADVTPADFSEWSK | RT-pep h | 54.62 | | | 13 | 0.609782 | 2 | 0.3 | Lumos | ... | 0.342913 | 2 | 727.337364 | 726.835714 | 727.839014 | 0.609782 | 405.532562 | 1.002953 | 97 | 85 |
| 9 | GAGSSEPVTGLDAK | RT-pep b | 0.0 | | | 14 | 0.271196 | 2 | 0.3 | Lumos | ... | 0.248636 | 2 | 645.324256 | 644.822606 | 645.825906 | 0.271196 | 382.269806 | 0.944286 | 110 | 97 |


In [None]:
print(lib.precursor_df.sequence.values[0])

lib.fragment_mz_df.iloc[
    lib.precursor_df.frag_start_idx.values[0]
    :lib.precursor_df.frag_stop_idx.values[0]
]

LGGNEQVTR


| | b_z1 | b_z2 | y_z1 | y_z2 |
|---|---|---|---|---|
| 0 | 114.09134 | 57.549308 | 860.42207 | 430.714673 |
| 1 | 171.112804 | 86.06004 | 803.400606 | 402.203941 |
| 2 | 228.134268 | 114.570772 | 746.379143 | 373.69321 |
| 3 | 342.177195 | 171.592236 | 632.336215 | 316.671746 |
| 4 | 471.219788 | 236.113532 | 503.293622 | 252.150449 |
| 5 | 599.278366 | 300.142821 | 375.235045 | 188.121161 |
| 6 | 698.34678 | 349.677028 | 276.166631 | 138.586954 |
| 7 | 799.394458 | 400.200867 | 175.118952 | 88.063114 |

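To export the predicted library, one approach is to gather each precursor's fragments into a peak list. This works by slicing the fragment dataframes with `frag_start_idx` and `frag_stop_idx`, as done above. The helper `iter_peak_lists` below is hypothetical and not part of peptdeep. It assumes that the m/z and intensity dataframes share the same row layout and fragment-type columns, and it drops fragments whose predicted intensity is zero. For a self-contained run it uses a tiny synthetic library (tripeptide GAR with hand-computed b/y m/z values and made-up intensities).

```python
import numpy as np
import pandas as pd

def iter_peak_lists(precursor_df, fragment_mz_df, fragment_intensity_df):
    """Hypothetical helper: yield (sequence, charge, mz, intensity) per precursor.

    Assumes both fragment dataframes share the row layout addressed by
    frag_start_idx/frag_stop_idx (stop exclusive).
    """
    frag_types = list(fragment_mz_df.columns)
    for row in precursor_df.itertuples(index=False):
        start, stop = row.frag_start_idx, row.frag_stop_idx
        mz = fragment_mz_df.iloc[start:stop][frag_types].to_numpy().ravel()
        inten = fragment_intensity_df.iloc[start:stop][frag_types].to_numpy().ravel()
        keep = inten > 0                 # drop fragments predicted as absent
        order = np.argsort(mz[keep])     # sort peaks by m/z
        yield row.sequence, row.charge, mz[keep][order], inten[keep][order]

# Tiny synthetic library: peptide GAR has two cleavage sites, hence two fragment rows
precursor_df = pd.DataFrame({"sequence": ["GAR"], "charge": [2],
                             "frag_start_idx": [0], "frag_stop_idx": [2]})
fragment_mz_df = pd.DataFrame({"b_z1": [58.0287, 129.0659], "y_z1": [246.1561, 175.1190]})
fragment_intensity_df = pd.DataFrame({"b_z1": [0.0, 0.4], "y_z1": [1.0, 0.7]})

for seq, charge, mz, inten in iter_peak_lists(precursor_df, fragment_mz_df, fragment_intensity_df):
    print(seq, charge, mz, inten)
```

With the library above, the call would be `iter_peak_lists(lib.precursor_df, lib.fragment_mz_df, lib.fragment_intensity_df)`, assuming the predicted intensities are exposed as `lib.fragment_intensity_df`.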