# 📜 Generate In Silico Spectral Library Guideline

This notebook demonstrates how to generate an in silico spectral library.

What you need:
- Peptide information
- A deep learning (DL) model that predicts the mass spectra of peptides

In [1]:
import warnings

warnings.filterwarnings('ignore')

In [None]:
import os
import sys

# Add the parent directory to the import path so that the `src` package can be found.
sys.path.append(os.path.abspath(".."))

In [None]:
from src.generate_spectral_lib import generate_spectral_lib
import pandas as pd

### 🤖 | Import model

In [None]:
frag_types = ['b_z1', 'b_z2', 'y_z1', 'y_z2', 'b_modloss_z1', 'b_modloss_z2', 'y_modloss_z1', 'y_modloss_z2']
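
The names in `frag_types` appear to follow an `<ion>[_modloss]_z<charge>` pattern. A minimal sketch of building the same list programmatically, assuming that pattern holds; `build_frag_types` is a hypothetical helper for illustration and is not part of `peptdeep` or this repository.

```python
from itertools import product


def build_frag_types(ions=("b", "y"), losses=("", "modloss"), max_charge=2):
    """Return names such as 'b_z1' or 'y_modloss_z2' (hypothetical helper)."""
    names = []
    for loss, ion, charge in product(losses, ions, range(1, max_charge + 1)):
        # Join the non-empty parts: ion series, optional loss, charge state.
        parts = [ion] + ([loss] if loss else []) + [f"z{charge}"]
        names.append("_".join(parts))
    return names


frag_types_sketch = build_frag_types()
print(frag_types_sketch)
```

With the default arguments this reproduces the eight names listed above, in the same order.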

In [None]:
from peptdeep.model.ms2 import ModelMS2Bert
from src.custom_model import CustomModelManager, CustompDeepModel

ms2_model = CustompDeepModel(
    charged_frag_types=frag_types,
    mask_modloss=False,
    model_class=ModelMS2Bert,
    modloss_type=["modloss", "NH3", "H2O"],
)
modelMans = CustomModelManager(mask_modloss=False, ms2_model=ms2_model)

# You can load your own pretrained weights here:
# modelMans.rt_model.load('YOUR_PRETRAINED_MODEL_PATH/rt.pth')
# modelMans.ccs_model.load('YOUR_PRETRAINED_MODEL_PATH/ccs.pth')
# modelMans.ms2_model.load('YOUR_PRETRAINED_MODEL_PATH/ms2.pth')

### 🗂️ | Load data

In [None]:
# Read the modification columns as strings so that single sites are not parsed as floats.
psm_df = pd.read_csv('YOUR_DATASET_PATH/psm_df.csv', dtype={'mods': str, 'mod_sites': str})
# Fill missing values directly; calling astype(str) first would turn NaN into the string 'nan'.
psm_df['mod_sites'] = psm_df['mod_sites'].fillna('')
psm_df['mods'] = psm_df['mods'].fillna('')
psm_df['mods'] = psm_df['mods'].str.replace('Acetyl@Protein N-term', 'Acetyl@Protein_N-term', regex=False)

mz_df = pd.read_csv('YOUR_DATASET_PATH/fragment_mz_df.csv')
intensity_df = pd.read_csv('YOUR_DATASET_PATH/fragment_intensity_df.csv')
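
The cell above cleans the `mods` and `mod_sites` columns. A minimal standard-library sketch of the same cleanup for a single row, assuming both columns hold `;`-separated entries and that missing values arrive as `None`, NaN, `''`, or the string `'nan'`; `clean_mod_fields` is a hypothetical helper, not part of this repository.

```python
def clean_mod_fields(mods, mod_sites):
    """Normalize one row's mods/mod_sites values (hypothetical helper)."""

    def to_text(value):
        # Treat None, float NaN, and the literal 'nan' as "no modification".
        if value is None or value != value:
            return ""
        text = str(value).strip()
        return "" if text.lower() == "nan" else text

    mods, mod_sites = to_text(mods), to_text(mod_sites)
    # Same renaming as in the notebook cell.
    mods = mods.replace("Acetyl@Protein N-term", "Acetyl@Protein_N-term")
    # Each modification should come with exactly one site.
    n_mods = len(mods.split(";")) if mods else 0
    n_sites = len(mod_sites.split(";")) if mod_sites else 0
    if n_mods != n_sites:
        raise ValueError(f"{n_mods} mods but {n_sites} sites: {mods!r} / {mod_sites!r}")
    return mods, mod_sites


print(clean_mod_fields("Acetyl@Protein N-term;Oxidation@M", "0;5"))
print(clean_mod_fields(float("nan"), "nan"))
```

A check like this can catch rows where the two columns disagree before they reach the model.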

### ✏️ | Generate in silico spectral library

##### 🛠️ Function: `generate_spectral_lib(...)`

```python
generate_spectral_lib(
    output_file: str,
    model_mgr: ModelManager,
    psm_df: pd.DataFrame,
    predict_items: List[str],
    frag_types: List[str],
    multiprocessing: bool = False
)
```

**Parameters**:
- `output_file`: Path where the output spectral library is saved, in `.tsv` format.
- `model_mgr`: Model manager used to predict RT, CCS, and MS2.
- `psm_df`: Peptide-spectrum match (PSM) dataframe containing the input peptide information.
- `predict_items`: Items to predict, for example `['rt', 'ccs', 'ms2']` (the call below passes `'mobility'` and `'ms2'`).
- `frag_types`: List of MS2 fragment types to predict.
- `multiprocessing`: Whether to run prediction with multiprocessing.

In [None]:
generate_spectral_lib(
    output_file='YOUR_SPEC_LIB_PATH/SPECTRAL_LIBRARY_NAME.tsv',
    model_mgr=modelMans,
    psm_df=psm_df,
    predict_items=['mobility', 'ms2'],
    frag_types=frag_types,
    multiprocessing=False
)
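
The `output_file` path points into a directory that may not exist yet. A small sketch of preparing the path first, assuming `generate_spectral_lib` does not create missing directories itself (the source does not confirm this either way); `prepare_output_path` is a hypothetical helper.

```python
from pathlib import Path


def prepare_output_path(output_file):
    """Create the parent directory and check the .tsv suffix (hypothetical helper)."""
    path = Path(output_file)
    if path.suffix.lower() != ".tsv":
        raise ValueError(f"expected a .tsv file, got {path.name!r}")
    # Create missing parent directories; do nothing if they already exist.
    path.parent.mkdir(parents=True, exist_ok=True)
    return str(path)
```

The returned string can then be passed as `output_file`.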

#### 💾 Output:

This function creates the spectral library and stores the result at `YOUR_SPEC_LIB_PATH/SPECTRAL_LIBRARY_NAME.tsv`.

The file contains the following columns:
- `ModifiedPeptide`: Peptide sequence with post-translational modification information (modifications are expressed as UniMod IDs)
- `PeptideSequence`: Peptide sequence without post-translational modification information
- `Run`: Name of experiment
- `PrecursorCharge`: Charge of precursor
- `PrecursorMz`: Mass-to-charge ratio of precursor
- `Tr_recalibrated`: Retention time of peptide
- `ProteinName`: Protein name
- `FragmentCharge`: Charge of fragment
- `FragmentType`: Type of fragment
- Other columns: see `README.md` for the full list.
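
To inspect the result, the library can be read back as a tab-separated file. A minimal sketch using only the standard library, assuming the file has a header row with the column names listed above and one row per fragment; the sample rows are made-up placeholders, not real predictions.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for YOUR_SPEC_LIB_PATH/SPECTRAL_LIBRARY_NAME.tsv.
sample_tsv = (
    "ModifiedPeptide\tPeptideSequence\tPrecursorCharge\tFragmentCharge\n"
    "PEPTIDEK\tPEPTIDEK\t2\t1\n"
    "PEPTIDEK\tPEPTIDEK\t2\t2\n"
)


def count_fragments_per_precursor(handle):
    """Count rows (fragments) per (ModifiedPeptide, PrecursorCharge) pair."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(
        (row["ModifiedPeptide"], int(row["PrecursorCharge"])) for row in reader
    )


counts = count_fragments_per_precursor(io.StringIO(sample_tsv))
print(counts)
```

For a real library, open the file with `open(path, newline="")` and pass the handle to the same function.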
