This repository provides the code and examples for the analyses in the paper:
ModiFinder: Tandem Mass Spectral Alignment Enables Structural Modification Site Localization
Mohammad Reza Zare Shahneh, Michael Strobel, Giovanni Andrea Vitale, Christian Geibel, Yasin El Abiead, Berenike C Wagner, Karl Forchhammer, Neha Garg, Allegra T Aron, Vanessa V Phelan, Daniel Petras, Mingxun Wang
After cloning the repository, set up the project as follows:

- Add the ModiFinder submodule:

  ```
  git submodule update --init --recursive
  ```

- Install the conda environment. We recommend using mamba instead of conda for a faster install (e.g., `mamba env create -f environment.yml`):

  ```
  conda env create -f environment.yml
  ```

- Install `nextflow`.

- Activate the environment:

  ```
  conda activate modi-finder-analysis
  ```
First, set the data directory in the `run_config` file. Then you can either download the data or create it from scratch:
- You can download the files used in this project from Zenodo and put them in the data directory defined earlier. The final layout should look like this:

  ```
  your_data_directory/
  ├── matches/
  ├── helpers/
  ├── SIRIUS/
  └── cfmid_exp/
  ```

  Please note that obtaining the data this way requires requesting information for each individual compound in real time, so it is essential to restrict the number of concurrent processes to avoid exceeding the server's request limits.
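The concurrency caveat above can be handled, for example, by capping the size of a worker pool. A minimal sketch, in which `fetch_compound` and the compound IDs are hypothetical placeholders for the real per-compound requests (not part of this repository):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_compound(compound_id):
    # Placeholder for a real-time request for one compound's information.
    return {"id": compound_id, "status": "ok"}

compound_ids = list(range(10))

# Cap concurrency so the server's request limits are not exceeded.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch_compound, compound_ids))
```

Tuning `max_workers` down trades throughput for staying under the rate limit.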
- Alternatively, you can create the data used in this project from scratch by running `data_prepare_main.py`:

  ```
  conda activate modi-finder-analysis
  python ./data_preparation/data_prepare_main.py
  ```

  Please note that the SIRIUS data must either be downloaded from the link in the previous section or generated by running the workflow on `gnps2`.
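Whichever route you take, a quick sanity check that the result matches the directory tree above can look like this sketch (`your_data_directory` is a placeholder; point it at the directory set in `run_config`):

```python
from pathlib import Path

# Placeholder path; replace with the data directory set in run_config.
data_dir = Path("your_data_directory")
expected = ["matches", "helpers", "SIRIUS", "cfmid_exp"]

# List any expected subdirectories that are not present yet.
missing = [name for name in expected if not (data_dir / name).is_dir()]
print("missing subdirectories:", missing)
```

An empty `missing` list means the layout is complete.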
- You can download the random forest model and then load it using:

  ```python
  import joblib

  trained_model = joblib.load(trained_model_path)
  inputs = trained_model['input']
  model = trained_model['model']
  ```

  given that `scikit-learn==1.3.2` is installed.
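As a rough sketch of how the loaded dictionary can be used: the file stores the expected feature names under `'input'` and the estimator under `'model'`. The tiny stand-in model, feature names, and file path below are made-up placeholders, not the repository's actual model:

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier

# Build a hypothetical stand-in for the downloaded file, mirroring its
# dict layout: 'input' (feature names) and 'model' (the estimator).
clf = RandomForestClassifier(n_estimators=3, bootstrap=False, random_state=0)
clf.fit([[0, 0], [1, 1]], [0, 1])

trained_model_path = os.path.join(tempfile.mkdtemp(), "trained_model.joblib")
joblib.dump({"input": ["feat_a", "feat_b"], "model": clf}, trained_model_path)

# Loading follows the snippet above.
trained_model = joblib.load(trained_model_path)
inputs = trained_model["input"]   # feature names the model expects
model = trained_model["model"]    # the fitted random forest
prediction = model.predict([[1, 1]])
```

The `inputs` list tells you the order in which feature columns must be arranged before calling `model.predict`.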
To run our experiments, use the following commands:

```
conda activate modi-finder-analysis
python ./experiments_runners/experiments_runner.py './experiments_settings/all_experiments_settings.csv'
```
The following notebooks reproduce the paper's figures:

- `paper_figures/performance_results.ipynb`: performance result illustrations.
- `paper_figures/how_much_helpers_help.ipynb`: contribution of helper spectra.
- `paper_figures/evaluation_score_illustration.ipynb`: evaluation score illustration.
- `paper_figures/datasets.ipynb`: dataset statistics.