M/P-ratio-Pred

This repository provides the datasets and code associated with the following research article:

Sudharsan Vijayaraghavan, Akshaya Lakshminarayanan, Naman Bhargava, Janani Ravichandran, R.P. Vivek-Ananth*, Areejit Samal*, Machine learning models for prediction of xenobiotic chemicals with high propensity to transfer into human milk, ACS Omega, 9(11):13006-13016, 2024.
(* Corresponding authors)

Schematic Workflow

The schematic diagram summarizes the workflow used to build classification- and regression-based machine learning models that predict xenobiotic chemicals with a high propensity to transfer from maternal plasma to human milk. The figure shows the key steps: data curation, feature generation, data preprocessing, feature selection, and the training and evaluation of the classification- and regression-based models.

Repository Organization

Dataset - This folder contains the training, (internal) test, and external test datasets used in this study.
Models
  ├── Classification - Code used for the classification models
  └── Regression - Code used for the regression models
README.md - Contains the project and dataset description, along with steps to run the code.

Dataset

To build the machine learning models, we used a curated dataset of 375 chemicals with experimentally determined milk-to-plasma (M/P) ratios, compiled from Vasios et al. (PMID: 27573378) and other published literature. For each chemical in this dataset, we obtained the 2D structure, generated the 3D structure, and computed 1875 molecular descriptors using PaDEL. We evaluated the generalizability of our best classification models on an external test dataset comprising 202 chemicals with a high risk of transfer from maternal plasma to human milk.

train.csv - Training data
test.csv - (Internal) test data
external_test_dataset.csv - External test dataset
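
As a rough illustration of how these files might be read, the sketch below parses a CSV with Python's standard `csv` module and splits each row into an identifier, a target value, and numeric descriptor values. The column names (`Name`, `MP_ratio`, `nAtom`, `MW`) and the two sample rows are placeholders invented for the example, not the actual header or contents of `train.csv`; check the real header before adapting it.

```python
import csv
import io


def load_descriptor_table(handle, id_column, target_column):
    """Read a CSV and return (ids, targets, feature_names, feature_rows)."""
    reader = csv.DictReader(handle)
    # Every column other than the identifier and target is treated as a descriptor.
    feature_names = [
        name for name in reader.fieldnames
        if name not in (id_column, target_column)
    ]
    ids, targets, rows = [], [], []
    for record in reader:
        ids.append(record[id_column])
        targets.append(float(record[target_column]))
        rows.append([float(record[name]) for name in feature_names])
    return ids, targets, feature_names, rows


# Placeholder data standing in for a real file; column names are hypothetical.
sample = io.StringIO(
    "Name,MP_ratio,nAtom,MW\n"
    "chem_A,0.5,12,180.2\n"
    "chem_B,1.8,30,410.5\n"
)
ids, targets, feature_names, rows = load_descriptor_table(
    sample, id_column="Name", target_column="MP_ratio"
)
print(feature_names, targets)
```

With a real file, the same function could be given a handle from `open(path, newline="")` instead of the in-memory sample.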

Models

The code in this repository enables reproduction of the results presented in the manuscript on predicting xenobiotic chemicals with a high propensity to transfer from maternal plasma to human milk. The code for each of the five classification models and three regression models performs end-to-end processing of the data, including feature pre-processing, feature selection, hyperparameter tuning, training, and evaluation of the model.

Classification-

svm.py - Python code to train and evaluate Support Vector Machine model.
xg_boost.py - Python code to train and evaluate XGBoost model.
lda.py - Python code to train and evaluate Linear Discriminant Analysis model.
mlp.py - Python code to train and evaluate Multilayer Perceptron model.
randomforest.py - Python code to train and evaluate Random Forest model.
external_set.py - Python code to evaluate the model on the external test dataset after applying the domain of applicability.
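
The exact domain-of-applicability method is not described in this README. As one simple illustration of the idea, the sketch below implements a descriptor-range (bounding-box) check: an external chemical is treated as inside the domain only if each of its descriptor values lies within the minimum and maximum seen in the training set. The function names and toy values are hypothetical, and `external_set.py` may use a different criterion.

```python
def fit_descriptor_ranges(train_rows):
    """Return per-descriptor (min, max) pairs from the training rows."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]


def in_domain(row, ranges):
    """True if every descriptor value falls inside its training range."""
    return all(low <= value <= high for value, (low, high) in zip(row, ranges))


# Toy descriptor values, for illustration only.
train_rows = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
external_rows = [[2.5, 15.0], [4.0, 15.0]]

ranges = fit_descriptor_ranges(train_rows)
inside = [in_domain(row, ranges) for row in external_rows]
print(inside)
```

Chemicals flagged as outside the domain would then be excluded before the model is scored on the external set.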

Regression-

svm.py - Python code to train and evaluate Support Vector Machine model.
xgboost.py - Python code to train and evaluate XGBoost model.
randomforest.py - Python code to train and evaluate Random Forest model.
classification_based_regression.py - Python code to evaluate the classification-based-on-regression model on the (internal) test set.
external_set.py - Python code to evaluate the classification-based-on-regression model on the external test dataset after applying the domain of applicability.
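
One way to read "classification based on regression" is that the regression model's predicted M/P ratio is converted into a class label by comparing it with a cutoff, and the resulting labels are then scored like any classifier's output. The sketch below illustrates that idea; the cutoff of 1.0, the function names, and the toy values are assumptions made for the example, not values taken from the scripts.

```python
def to_high_risk_labels(mp_ratios, cutoff=1.0):
    """Label a chemical 1 (high risk) if its M/P ratio meets the cutoff."""
    return [1 if ratio >= cutoff else 0 for ratio in mp_ratios]


def accuracy(true_labels, predicted_labels):
    """Fraction of chemicals whose predicted label matches the true one."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Toy values: experimental M/P ratios and hypothetical regression predictions.
experimental = [0.2, 1.5, 0.9, 3.0]
predicted = [0.4, 1.1, 1.2, 2.5]

true_labels = to_high_risk_labels(experimental)
predicted_labels = to_high_risk_labels(predicted)
print(predicted_labels, accuracy(true_labels, predicted_labels))
```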

Syntax to run the codes

Use the following command to install all the required dependencies.

   pip3 install -r requirements.txt

Use the following command to run the Python code for the classification and regression tasks.

    python3 <path to python file> <number of top features to be considered>
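
To make the argument concrete, the sketch below shows one way a script could read the number of top features as a positional integer using `argparse`. The argument name `n_top_features`, the validation, and the example value 50 are hypothetical; the actual scripts may parse their arguments differently.

```python
import argparse


def positive_int(text):
    """Convert a command-line string to an integer greater than zero."""
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return value


def parse_args(argv=None):
    """Parse the single positional argument; argv=None reads sys.argv."""
    parser = argparse.ArgumentParser(
        description="Train and evaluate a model on the top-ranked features."
    )
    parser.add_argument(
        "n_top_features",
        type=positive_int,
        help="number of top features to be considered",
    )
    return parser.parse_args(argv)


# Equivalent to running: python3 <path to python file> 50
args = parse_args(["50"])
print(args.n_top_features)
```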

Use the following command to run the external set and classification-based-on-regression scripts.

    python3 <path to python file> <path to result folder>

Citation

If you use the code in this repository, please cite the following research article:

Sudharsan Vijayaraghavan, Akshaya Lakshminarayanan, Naman Bhargava, Janani Ravichandran, R.P. Vivek-Ananth*, Areejit Samal*, Machine learning models for prediction of xenobiotic chemicals with high propensity to transfer into human milk, ACS Omega, 9(11):13006-13016, 2024.
