![Representative examples of DeepLCMS predictions accompanied by their corresponding probability estimates.](exp-5-prediction_matrix.png){fig-align="center" width=50%}


Welcome to DeepLCMS, a project that combines mass spectrometry analysis with the power of deep learning models!

Unlike traditional methods, DeepLCMS eliminates the need for extensive data processing, including peak alignment, data annotation, quantitation, and other time-consuming steps. Instead, it relies on deep learning to directly classify mass spectrometry-based pseudo-images with high accuracy. To demonstrate the capabilities of pre-trained neural networks for high-resolution LC/MS data, we apply our convolutional neural network (CNN) to categorize substance abuse cases. We use the openly available metabolomics HRMS/LC data from the Golestan Cohort Study to train and evaluate our CNN [@pourshams_cohort_2010; @ghanbari_metabolomics_2021; @li_untargeted_2020]. We also examine the network's decision-making process with the TorchCam library. This tool gives insight into the factors that influence the network's classifications, helping us identify key compound classes that play a crucial role in differentiating between classes. By analyzing retention time and molecular weight, we can pinpoint areas of interest within the data.

DeepLCMS paves the way for a new era of mass spectrometry analysis, offering a faster, more efficient, and more insightful approach to data interpretation. Its ability to directly classify pseudo-images without extensive preprocessing opens up a world of possibilities for researchers and clinicians alike.

::: {.callout-note}
**At a glance**

The DeepLCMS project aims to provide researchers with reproducible source code for applying deep learning to mass spectrometry data analysis. It distinguishes itself from previous studies by:

* Comparing Diverse Architecture Families: Assessing a broader range of architecture families to find the most suitable one, including cutting-edge architectures like vision transformers.
* Hyperparameter Tuning: Conducting basic hyperparameter tuning with Optuna, covering the optimizer, the learning rate, and the learning rate scheduler, which are crucial aspects beyond the architecture itself.
* Image Quality Analysis: Investigating the impact of image quality on validation metrics, examining image sharpness and data augmentation that imitates retention time shift.
* Regularization Techniques: Employing regularization techniques such as randomly tilting images and random erasing during training to improve model generalization.
* Interpreting Pretrained Network Decisions: Analyzing how the pre-trained network makes its decisions using TorchCam.
:::

# Import libraries

In [None]:
import pandas as pd
from PIL import Image

# Introduction

While computer vision has gained widespread adoption in various aspects of our lives [@dobson_birth_2023], its application in medical imaging and biosciences has lagged behind, primarily due to limitations in clinical dataset size, accessibility, privacy concerns, experimental complexity, and high acquisition costs. For such applications, transfer learning has emerged as a potential solution [@seddiki_towards_2020]. This technique is particularly effective with small datasets, requiring fewer computational resources while achieving good classification accuracy compared to models trained from scratch. Transfer learning involves a two-step process. Initially, a robust data representation is learned by training a model on a dataset comprising a vast amount of annotated data encompassing numerous categories (ImageNet, for example). This representation is then used to construct a new model based on a smaller annotated dataset containing fewer categories.
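The two-step process can be sketched in PyTorch as follows. This is a minimal illustration, not the DeepLCMS training code: the tiny `backbone` is a hypothetical stand-in for a network pre-trained on a large dataset (its weights here are random), and the two-class `head`, the batch shape, and the learning rate are assumptions chosen for brevity.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for step 1: a network whose representation was learned elsewhere.
# In practice the weights would come from a model zoo; here they are random,
# which is enough to show the mechanics.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the learned representation so it is not updated on the small dataset.
for param in backbone.parameters():
    param.requires_grad = False

# Step 2: attach a new classification head for the smaller task (2 classes).
head = nn.Linear(8, 2)
model = nn.Sequential(backbone, head)

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a synthetic batch of four 32 x 32 RGB images.
images = torch.rand(4, 3, 32, 32)
labels = torch.tensor([0, 1, 0, 1])

logits = model(images)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Names of the parameters that are still trainable (the head only).
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```

Freezing the backbone is only one variant of transfer learning; fine-tuning some or all of the pre-trained layers with a small learning rate is an equally common choice.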

## Application of Pretrained Neural Networks for Mass Spectrometry Data

The use of pre-trained neural networks for mass spectrometry data analysis is relatively new, with only a handful of publications available to date. These studies have demonstrated the potential of deep learning models to extract meaningful information from raw mass spectrometry data and perform predictive tasks without the need for extensive data processing as required by the traditional workflows.

## Previous Research

* In 2018, @behrmann_deep_2018 used deep learning techniques for tumor classification in Imaging Mass Spectrometry (IMS) data.

* In 2020, @seddiki_towards_2020 utilized MALDI-TOF images of rat brain samples to assess the ability of three different CNN architectures – LeNet, Lecun, and VGG9 – to differentiate between different types of cancers based on their molecular profiles.

* In 2021, @cadow_feasibility_2021 explored the use of pre-trained networks to distinguish tumor from normal prostate biopsies using SWATH-MS data. They examined the potential of deep learning models for analyzing raw mass spectrometry data and performing predictive tasks without the need for protein quantification. The authors used pre-trained neural network models to convert raw MS images into numerical vectors, enabling further processing. They then compared several classifiers, including logistic regression, support vector machines, and random forests, for predicting the phenotype.

* In 2022, @shen_deep_2022 released deepPseudoMSI, a deep learning-based pseudo-mass spectrometry imaging platform designed to predict gestational age in pregnant women from LC-MS-based metabolomics data. The application consists of two components: the Pseudo-MS Image Converter, which converts LC-MS data into pseudo-images, and the deep learning model itself.


::: {.callout-tip}
## Project Structure

To accommodate the high computational demands of neural network training, the DeepLCMS project is divided into two main parts. The first part focuses on data preprocessing, specifically converting LC/MS data into pseudo-images using pyOpenMS, the Python bindings for the OpenMS library, which is written in C++ and optimized for efficiency. This task can be handled on a CPU, and the corresponding source code is found in the `src/deeplcms_functions` directory.

To effectively train the neural networks that demand GPU acceleration, the project employs the PyTorch Lightning framework, a comprehensive solution for building and deploying deep learning models. Training experiments are conducted within Jupyter Notebooks hosted on Google Colab, a cloud platform equipped with free GPU access. The training code and corresponding modules reside within the `src/train_google_colab` directory. This folder seamlessly integrates with Google Colab, allowing for effortless module imports.
:::

# Materials and Methods
## Dataset

To ensure the feasibility of our proof-of-concept demonstration, we selected a suitable dataset from the Metabolomics Workbench. We prioritized studies with distinct groups and a minimum sample size of 200. Additionally, we chose a dataset with a disk requirement of less than 50 GB to minimize computational resource demands. Based on these criteria, we identified the Golestan Cohort Study [@pourshams_cohort_2010]. This study, conducted in northeastern Iran, primarily investigates the risk factors for upper gastrointestinal cancers in this high-risk region. Approximately 50,000 volunteers were enrolled, including opium users, and their mortality outcomes were recorded. Quantitative targeted liquid chromatography mass spectrometric (LC-MS/MS) data were collected at the University of North Carolina at Chapel Hill [@ghanbari_metabolomics_2021; @li_untargeted_2020].

The dataset consisted of 218 opioid users and 80 non-users. After initial data inspection and conversion to mzML format using the ProteoWizard 3.0.22155 software, files were divided into training (n = 214), validation (n = 54), and test (n = 30) sets.

To evaluate the impact of image characteristics and augmentation techniques on classification performance, four datasets were prepared:

* `ST001618_Opium_study_LC_MS_500`: pseudo-images binned into 500 × 500 bins using NumPy's `histogram2d` function.
* `ST001618_Opium_study_LC_MS_500_augmented`: the 500 × 500 dataset with data augmentation applied through the `augment_images` function, generating nine additional images per training set with random offsets of up to 5 units in both the x and y directions.
* `ST001618_Opium_study_LC_MS_1000`: pseudo-images binned into 1000 × 1000 bins using `histogram2d`, giving sharper pseudo-images.
* `ST001618_Opium_study_LC_MS_1000_augmented`: the 1000 × 1000 dataset with the same augmentation applied, again generating nine additional images per training set with random offsets of up to 5 units in both the x and y directions.
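The binning and offset augmentation described above can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the project's implementation: the function names `to_pseudo_image` and `augment_with_offsets` are hypothetical (the latter is only a stand-in for the project's `augment_images`), the input points are synthetic, and shifting with `np.roll` wraps pixels around the image edge, which may differ from how the original function handles borders.

```python
import numpy as np


def to_pseudo_image(rt, mz, intensity, bins=500):
    """Bin LC/MS points into a 2D grid (retention time x m/z), summing intensities."""
    image, _, _ = np.histogram2d(rt, mz, bins=bins, weights=intensity)
    return image


def augment_with_offsets(image, n_copies=9, max_offset=5, seed=0):
    """Return shifted copies of an image with random x/y offsets (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(n_copies):
        dx, dy = (int(v) for v in rng.integers(-max_offset, max_offset + 1, size=2))
        # np.roll wraps around the border; a simplification for this sketch.
        copies.append(np.roll(image, shift=(dx, dy), axis=(0, 1)))
    return copies


# Synthetic stand-in for the points of one LC/MS run.
rng = np.random.default_rng(42)
rt = rng.uniform(0, 20, size=10_000)           # retention time
mz = rng.uniform(50, 1000, size=10_000)        # mass-to-charge ratio
intensity = rng.exponential(1e4, size=10_000)  # signal intensity

image = to_pseudo_image(rt, mz, intensity, bins=500)
augmented = augment_with_offsets(image, n_copies=9, max_offset=5)
```

Passing `bins=1000` instead yields the finer grid used for the sharper pseudo-images, at four times the pixel count.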

## Software

The source code for the DeepLCMS project uses the following software packages and versions: PyTorch Lightning 2.1.3, PyTorch Image Models (timm) 0.9.12, torchinfo 1.8.0, Optuna 3.5.0, TorchCam 0.4.0, pandas 2.1.3, NumPy 1.26, and Matplotlib 3.7.1.



# Results and Discussion
## Selecting a model architecture family

From the model architecture families available in [PyTorch Image Models](https://github.com/huggingface/pytorch-image-models#models), 68 unique models were chosen as representative examples of their class and filtered by parameter count, keeping those with between 10 and 20 million parameters to allow for an unbiased comparison. Of these 68, 32 were subsequently trained and their validation metrics recorded. According to @tbl-exp-1-result (arranged by validation loss), MobileOne S3 emerged as the top performer with an F1 score of 0.95. It was followed by DenseNet (F1 = 0.92), MobileViTV2 (F1 = 0.90), and ConvNeXt Nano (F1 = 0.84). RepViT M3 rounded out the top five with an F1 score of 0.86. To examine the performance of each architecture family in more depth, we evaluated all individual architectures within each family (@tbl-exp-1-best_models).
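The selection by parameter count can be sketched in plain Python. The catalogue below is entirely hypothetical (placeholder family names and parameter counts, not measured values), and choosing the smallest in-range model per family is an assumption about the procedure rather than a documented detail.

```python
# Hypothetical catalogue: (family, model name, parameters in millions).
# The numbers are placeholders, not measured parameter counts.
catalogue = [
    ("family_a", "family_a_small", 6.2),
    ("family_a", "family_a_base", 12.4),
    ("family_a", "family_a_large", 18.9),
    ("family_b", "family_b_base", 25.0),
    ("family_c", "family_c_tiny", 10.5),
]


def pick_representatives(models, low=10.0, high=20.0):
    """Keep, per family, the smallest model whose size lies in [low, high] million."""
    chosen = {}
    for family, name, params in models:
        if not low <= params <= high:
            continue  # outside the comparison window
        if family not in chosen or params < chosen[family][1]:
            chosen[family] = (name, params)
    return chosen


representatives = pick_representatives(catalogue)
```

Here `family_b` drops out because its only model exceeds 20 million parameters, which mirrors how a family without an in-range model would be excluded from the comparison.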

In [None]:
# | label: tbl-exp-1-result
# | tbl-cap: Validation metrics of model architectural families from the PyTorch Image Models library using models with parameter counts ranging from 10 to 20 million.

(
    pd.read_csv("exp-1-result.csv")
    .rename(
        columns=lambda col: col.replace("_", " ").replace("val", "validation").title()
    )
    .rename(columns={"Minimal Param Model Count": "Parameter Count (M)"})
    .round(2)
)

As shown in @tbl-exp-1-best_models, the top four positions based on the validation metrics were occupied by models belonging to the ConvNeXt family. These models, particularly the first two (convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384 with validation loss 0.19, convnext_large_mlp.clip_laion2b_augreg_ft_in1k with validation loss 0.22), were pre-trained on the extensive LAION-2B dataset and fine-tuned on ImageNet-1k, which likely helps them learn complex patterns that generalize well to unseen data. This large-scale pre-training, together with the model structure, probably contributes to their strong performance in this task. The dominance of the ConvNeXt family is noteworthy and suggests that it is effective at handling complex data such as mass spectrometry pseudo-images. Apple's MobileOne also demonstrated remarkable results (validation loss = 0.27), ranking fourth in terms of validation loss. Finally, MobileViT (validation loss = 0.27), a lightweight network, secured the seventh position.  
Nevertheless, it is essential to acknowledge that while the ConvNeXt model achieved the highest validation metrics, it is considerably larger and more parameter-intensive. This implies that it demands more computational resources for training and necessitates greater care to prevent potential overfitting, as opposed to other models that are 10-20 times smaller.

In [None]:
# | label: tbl-exp-1-best_models
# | tbl-cap: Top 10 models based on evaluation metrics of all model architectures in families from the PyTorch Image Models library, regardless of their parameter counts.

(
    pd.read_csv("exp-1-best_models.csv")
    .rename(
        columns=lambda col: col.replace("_", " ").replace("val", "validation").title()
    )
    .round(2)
    .head(10)
)

To further assess the suitability of these models and the consistency of their performance, we trained the three top-performing models, one each from the ConvNeXt, MobileOne, and MobileViTV2 families, five consecutive times and calculated the median and standard deviation of their validation metrics (@fig-exp-2-replicates_result). This approach allowed us to identify the models that performed most consistently across multiple training runs. MobileViTV2 performed similarly to ConvNeXt across all five training runs, except for validation loss, where ConvNeXt achieved the lowest value (0.21). According to the replication study, ConvNeXt emerged as the most effective model, surpassing both MobileOne and MobileViTV2. Despite the similar patterns, ConvNeXt showed a statistically significant advantage over MobileViTV2 in validation loss (p = 0.01) and achieved significantly better validation recall (p < 0.05). Among the three models tested, MobileOne underperformed its counterparts on all performance metrics except validation recall, where it narrowly outperformed MobileViTV2 (p < 0.05).

In [None]:
# | label: fig-exp-2-replicates_result
# | fig-cap: "Median and standard deviation of validation metrics achieved during consecutive trainings"

Image.open("exp-2-replicates_result.png").convert("RGB")
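The replicate summary described above, a median and standard deviation per model across five training runs, can be computed with the standard library alone. The model names and loss values below are hypothetical placeholders, not the measured results, and the sample standard deviation (`statistics.stdev`) is an assumption about which estimator is appropriate.

```python
from statistics import median, stdev

# Hypothetical validation losses from five consecutive training runs per model.
runs = {
    "model_a": [0.20, 0.22, 0.21, 0.19, 0.23],
    "model_b": [0.30, 0.28, 0.35, 0.27, 0.31],
}

# Median and sample standard deviation for each model.
summary = {
    name: {"median": median(values), "std": stdev(values)}
    for name, values in runs.items()
}

# The model with the lowest median validation loss.
best = min(summary, key=lambda name: summary[name]["median"])
```

The median is less sensitive than the mean to a single unlucky run, which matters when only five replicates are available.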

# Get in touch

Did the project help with your research? Any ideas for making it better? Get in touch! I would love to hear from you.