![Representative examples of DeepLCMS predictions accompanied by their corresponding probability estimates.](../data/experiment_results/exp-5-final_evaluation/prediction_matrix.png){fig-align="center" width=50%}


Welcome to **DeepLCMS**!

DeepLCMS is a project that explores the use of deep learning models to classify mass spectrometry-based pseudo-images without extensive data processing, eliminating the need for peak alignment, data annotation, quantitation, and other time-consuming steps. To showcase the capabilities of pre-trained neural networks for high-resolution LC/MS data, we apply a convolutional neural network (CNN) to categorize substance abuse cases characterized by metabolomics data. Finally, we demonstrate how to gain insight into the network's decision-making process with the TorchCam library. This helps narrow down the compound classes that the network considers crucial for separating the classes, and it identifies areas of interest based on retention time and molecular weight.

# Introduction

While computer vision has gained widespread adoption in various aspects of our lives [@dobson_birth_2023], its application in medical imaging and biosciences has lagged behind, primarily due to limitations in clinical dataset size, accessibility, privacy concerns, experimental complexity, and high acquisition costs. For such applications, transfer learning has emerged as a potential solution [@seddiki_towards_2020]. This technique is particularly effective with small datasets: it requires fewer computational resources than building a model from scratch while still achieving good classification accuracy. Transfer learning is a two-step process. First, a robust data representation is learned by training a model on a large annotated dataset that covers many categories (for example, ImageNet). This representation is then used to build a new model from a smaller annotated dataset with fewer categories. The new model can be trained either by training only the final decision layer(s) or by further fine-tuning the entire model on the reduced category set.
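
The first of these options, training only the final decision layer, can be illustrated with a toy sketch in plain Python. This is not the DeepLCMS training code: the fixed `FROZEN_WEIGHTS` are made-up numbers standing in for a pre-trained backbone, and the function names and the four-sample dataset are hypothetical. The point is only that the feature extractor stays untouched while a small logistic "head" is fitted on top of it.

```python
import math

# Hypothetical stand-in for a pre-trained backbone. These weights are fixed
# and never updated during training, mimicking frozen layers.
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def frozen_features(x):
    """Map a raw input to features using the frozen weights and a ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, lr=0.5, epochs=200):
    """Fit only the final logistic layer on top of the frozen features."""
    feats = [frozen_features(x) for x in samples]
    weights = [0.0] * len(feats[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(bias + sum(w * fi for w, fi in zip(weights, f)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * fi for w, fi in zip(weights, f)]
            bias -= lr * err
    return weights, bias


def predict(x, weights, bias):
    """Return the predicted class (0 or 1) for a raw input."""
    f = frozen_features(x)
    return int(sigmoid(bias + sum(w * fi for w, fi in zip(weights, f))) >= 0.5)


# Tiny illustrative dataset with two classes.
samples = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
labels = [1, 1, 0, 0]

frozen_before = [row[:] for row in FROZEN_WEIGHTS]
weights, bias = train_head(samples, labels)
predictions = [predict(x, weights, bias) for x in samples]
```

Fine-tuning the entire model would instead also update the backbone weights, usually with a smaller learning rate so that the pre-trained representation is not destroyed.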

## Application of Pretrained Neural Networks for Mass Spectrometry Data

The use of pre-trained neural networks for mass spectrometry data analysis is relatively new, with only a handful of publications available to date. These studies have demonstrated that deep learning models can extract meaningful information from raw mass spectrometry data and perform predictive tasks without the extensive data processing that traditional workflows require.

## Previous Research

In 2020, a study used MALDI-TOF images of rat brain samples to assess the ability of three convolutional neural network (CNN) architectures (LeNet, Lecun, and VGG9) to distinguish between types of cancer based on their molecular profiles.

In 2021, Cadow et al. explored the use of pre-trained networks to distinguish tumor from normal prostate biopsies using SWATH-MS data. They examined the potential of deep learning models for analyzing raw mass spectrometry data and performing predictive tasks without the need for protein quantification. The authors used pre-trained neural network models to convert raw MS images into numerical vectors for further processing. They then compared several classifiers, including logistic regression, support vector machines, and random forests, on predicting the phenotype.

In 2022, deepPseudoMSI, a deep learning-based pseudo-mass spectrometry imaging platform, was released to predict gestational age in pregnant women from LC-MS-based metabolomics data. The application consists of two main components, the first of which is the pseudo-MS image converter. This component converts LC-MS data into 2D pseudo-images that deep learning models can process, where each pixel represents the intensity of a specific molecular feature.
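
The binning idea behind such pseudo-images can be sketched in a few lines of plain Python. This is neither the deepPseudoMSI nor the DeepLCMS implementation: the function name `peaks_to_pseudo_image`, the axis orientation (m/z along rows, retention time along columns), and the choice to sum intensities that fall into the same pixel are all assumptions made for illustration.

```python
def peaks_to_pseudo_image(peaks, rt_range, mz_range, height, width):
    """Bin (retention_time, m/z, intensity) peaks into a height x width grid.

    Rows index m/z bins and columns index retention-time bins. Intensities
    that land in the same pixel are summed. Peaks outside the given ranges
    are ignored.
    """
    rt_min, rt_max = rt_range
    mz_min, mz_max = mz_range
    image = [[0.0] * width for _ in range(height)]
    for rt, mz, intensity in peaks:
        if not (rt_min <= rt <= rt_max and mz_min <= mz <= mz_max):
            continue
        # Scale each coordinate to [0, 1], then to a pixel index; clamp the
        # upper edge so the maximum value falls into the last bin.
        col = min(int((rt - rt_min) / (rt_max - rt_min) * width), width - 1)
        row = min(int((mz - mz_min) / (mz_max - mz_min) * height), height - 1)
        image[row][col] += intensity
    return image


# Illustrative peaks: (retention time in minutes, m/z, intensity).
peaks = [
    (0.5, 100.0, 10.0),
    (0.6, 101.0, 5.0),   # lands in the same pixel as the first peak
    (9.9, 499.0, 7.0),
    (20.0, 300.0, 1.0),  # outside the retention-time range, skipped
]
image = peaks_to_pseudo_image(
    peaks, rt_range=(0.0, 10.0), mz_range=(100.0, 500.0), height=4, width=5
)
```

In practice the grid would be far larger and the intensities would typically be log-scaled or normalized before being saved as an image, but the mapping from two coordinates to a pixel stays the same.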


::: {.callout-note}
**DeepLCMS Project**

The DeepLCMS project aims to provide researchers with reproducible source code for applying deep learning to mass spectrometry data analysis. It distinguishes itself from previous studies by:

* Comparing Diverse Architecture Families: Assessing a broader range of architecture families to find the most suitable one, including cutting-edge architectures like vision transformers.
* Hyperparameter Tuning: Using Optuna for basic hyperparameter tuning of the learning rate, optimizer, and learning rate scheduler, which are crucial aspects beyond the architecture itself.
* Image Quality Analysis: Investigating the impact of image quality on validation metrics, examining image sharpness and image augmentation techniques.
* Regularization Techniques: Employing regularization techniques such as randomly tilting images and random erasing during training to improve model generalization.
* Interpreting Pretrained Network Decisions: Analyzing how the pre-trained network makes its decisions using TorchCam.

The DeepLCMS project aims to identify the most suitable architecture family from the timm library for a specific task and determine the optimal model size within that family based on validation metrics such as F1, precision, and recall throughout the training process. This approach aims to provide researchers with a valuable tool for harnessing the power of deep learning in mass spectrometry data analysis.
:::
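
The validation metrics named in the callout above (precision, recall, and F1) can be computed from predicted and true labels as in the following minimal sketch. It uses plain Python rather than the metric utilities of any particular library, and the function name and example labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Illustrative validation labels: 3 true positives, 1 false positive,
# 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Tracking these values per epoch for each candidate model makes it possible to compare architectures of different sizes on the same validation set.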

::: {.callout-tip}
## Project Structure

The DeepLCMS project is divided into two main parts because training the neural networks requires a GPU. The first part focuses on data preprocessing, specifically converting LC/MS data into pseudo-images using pyOpenMS, the Python bindings for the OpenMS library, which is written in C++ and optimized for efficiency. This task can be handled on a CPU, and the corresponding source code is found in the `src/deeplcms_functions` directory.

For training the neural networks, which requires GPU acceleration, the project uses the PyTorch Lightning framework, which provides a structured and organized codebase. The training experiments are conducted in Jupyter Notebooks on Google Colab, a cloud platform that offers GPU access. The training code, in both Python and Jupyter Notebook formats, can be found in the `src/train_google_colab` directory. This folder can be uploaded to Google Colab as is, and its modules can then be imported directly.
:::

![Proposed conditions for evaluating selected image characteristics as hyperparameters.](../data/external/experimental_plan.jpg){#fig-1 fig-align="center" width=50%}


# How to use the App

1. Upload your CP-Seeker output files [here](https://cpseeker-postprocess.streamlit.app/).
2. Optionally adjust the confidence level, which is set to 80% by default.
3. Download the filtered and merged dataset.

# Get in touch

Did the app help with your research? Any ideas for making it better? Get in touch! I would love to hear from you.