In [None]:
#| default_exp rescore.feature_extractor

# Feature Extractor

Functionality to extract features for ML-based rescoring.

In [None]:
from peptdeep.rescore.feature_extractor import *

### `ScoreFeatureExtractor`
`ScoreFeatureExtractor` and its subclasses are the core components for rescoring.

They define the parameters used to fine-tune the RT/CCS/MS2 models, as well as the feature list that is fed to Percolator.

We recommend using `ScoreFeatureExtractorMP`: it is faster thanks to multiprocessing while retaining the same prediction performance.
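To make the idea of a rescoring feature list more concrete, here is a minimal, standalone sketch of two features that compare predictions with observations for one PSM. The function names (`cosine_similarity`, `psm_features`) and the feature names (`cos`, `abs_delta_rt`) are illustrative assumptions and are not taken from peptdeep; the actual feature list is the one defined by `ScoreFeatureExtractor` and its subclasses.

```python
import math


def cosine_similarity(predicted, matched):
    """Cosine similarity between predicted and matched fragment intensities."""
    dot = sum(p * m for p, m in zip(predicted, matched))
    norm = math.sqrt(sum(p * p for p in predicted)) * math.sqrt(
        sum(m * m for m in matched)
    )
    # Return 0.0 when either vector is all zeros (no matched or predicted signal)
    return dot / norm if norm > 0 else 0.0


def psm_features(predicted, matched, predicted_rt, observed_rt):
    """Hypothetical per-PSM feature dict; the names are for illustration only."""
    return {
        "cos": cosine_similarity(predicted, matched),
        "abs_delta_rt": abs(predicted_rt - observed_rt),
    }
```

Features of this kind, one row per PSM, are what a semi-supervised rescoring tool such as Percolator would then consume.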

### `ScoreFeatureExtractorMP`

`ScoreFeatureExtractorMP` uses multiprocessing to accelerate the feature extraction procedure. The pipeline is:
1. Randomly select some raw files to fine-tune the models. This step has two sub-steps: a. MS2 matching with multiprocessing; b. model tuning with a single thread (or GPU).
2. Match the PSMs of each raw file in `psm_df` against the corresponding MS2 file in `ms2_file_dict` (`{raw_name: ms2_file_path}`) with multiprocessing to get the matched fragment intensities.
3. Predict fragment intensity as well as RT and mobility with a single thread (or GPU). We use a single thread here because GPU memory is the main limitation, and we also enable raw-specific fine-tuning for different raw files.
4. Calculate the feature values with multiprocessing.
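
The control flow of steps 2 to 4 can be sketched as follows. This is not peptdeep's implementation: `match_ms2`, `predict`, `compute_features` and `extract_features` are hypothetical names, and their bodies are placeholders that only stand in for the real matching, prediction and feature calculation. The sketch only shows the pattern of fanning CPU-bound work out to a worker pool while keeping prediction in the main process.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def match_ms2(item):
    """Placeholder for CPU-bound MS2 matching of one raw file."""
    raw_name, ms2_path = item
    # Dummy value: the path length stands in for the matched intensities
    return raw_name, {"n_matched": len(ms2_path)}


def predict(matched):
    """Placeholder for model prediction; only called from the main process."""
    return {"predicted": matched["n_matched"] * 2}


def compute_features(item):
    """Placeholder for CPU-bound feature calculation of one raw file."""
    raw_name, matched, predicted = item
    return raw_name, {"diff": predicted["predicted"] - matched["n_matched"]}


def extract_features(ms2_file_dict, n_workers=2, executor_cls=ProcessPoolExecutor):
    """Run matching and feature calculation in a pool, prediction serially."""
    with executor_cls(max_workers=n_workers) as pool:
        # Step 2: match every raw file in parallel
        matched = dict(pool.map(match_ms2, ms2_file_dict.items()))
        # Step 3: predict one raw file at a time, outside the worker pool,
        # so that a single process owns the model (and the GPU, if any)
        predicted = {raw: predict(m) for raw, m in matched.items()}
        # Step 4: calculate features in parallel again
        tasks = [(raw, matched[raw], predicted[raw]) for raw in matched]
        return dict(pool.map(compute_features, tasks))
```

Calling `extract_features({"raw_a": "raw_a.ms2"})` uses worker processes by default; on platforms that spawn new processes, that call should sit under an `if __name__ == "__main__":` guard. Passing `executor_cls=ThreadPoolExecutor` swaps in threads, which can be convenient for debugging.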

The key design point of `ScoreFeatureExtractorMP` is that the GPU stage runs outside of multiprocessing, which avoids GPU memory conflicts.

Processing is fast even with an ordinary GPU (GTX1080): in our testing, peptdeep rescored 371 HLA raw files within 1 hour.

In [None]:
#| hide

from peptdeep.pretrained_models import ModelManager

In [None]:
#| hide
model_mgr = ModelManager(device='cpu')
model_mgr.load_installed_models()
ScoreFeatureExtractorMP(model_mgr)
ScoreFeatureExtractor(model_mgr)
pass