This repository provides the datasets and codes associated with the following research article:
Sudharsan Vijayaraghavan, Akshaya Lakshminarayanan, Naman Bhargava, Janani Ravichandran, R.P. Vivek-Ananth*, Areejit Samal*,
Machine learning models for prediction of xenobiotic chemicals with high propensity to transfer into human milk,
ACS Omega, 9(11):13006-13016, 2024.
(* Corresponding authors)
Dataset - This folder contains the train, (internal) test, and external test dataset used in this study
Models
├── Classification - Codes used for classification models
├── Regression - Codes used for Regression models
ReadMe.md - Contains project and dataset description, along with steps to run the codes.
To build the machine learning models, we leveraged a curated dataset of 375 chemicals with experimentally determined M/P ratios compiled from Vasios et al. (PMID: 27573378) and other published literature. For each chemical in this dataset, we obtained the 2D structure, generated the 3D structure, and computed 1875 molecular descriptors using PaDEL. We evaluated the generalizability of our best classification models by leveraging an external test dataset, comprising 202 chemicals, with high risk of transfer from maternal plasma to human milk.
- train.csv - Training data
- test.csv - (Internal) test data
- external_test_dataset.csv - External test dataset
The codes in this repository enable the reproduction of the results present in the manuscript to predict xenobiotic chemicals with a high propensity to transfer from maternal plasma to human milk. The code provided for the five models corresponding to the five different classification algorithms and three models corresponding to three different regression algorithms performs end-to-end processing of the data including the feature pre-processing, feature selection, hyperparameter tuning, training, and evaluation of the models.
Classification-
- svm.py - Python code to train and evaluate Support Vector Machine model.
- xg_boost.py - Python code to train and evaluate XGBoost model.
- lda.py - Python code to train and evaluate Linear Discriminant Analysis model.
- mlp.py - Python code to train and evaluate Multi Layer Perceptron model.
- randomforest.py - Python code to train and evaluate Random Forest model.
- external_set.py - Python code for evaluating the model on the external test dataset after applying domain of applicability.
Regression-
- svm.py - Python code to train and evaluate Support Vector Machine model.
- xgboost.py - Python code to train and evaluate Xgboost models.
- randomforest.py - Python code to train and evaluate Random Forest model.
- classification_based_regression.py - Python code to evaluate the classification based on the regression model on the (internal) test set.
- external_set.py - Python code for evaluating the classification based on regression model on the external test dataset after applying domain of applicability.
- Use the following command to download all the required dependencies.
pip3 install -r requirements.txt
- Commands to run python code for classification and regression tasks.
python3 <path to python file> < # of top features to be considered>
- Commands to run external set and classification based on regression.
python3 <path to python file> <path to result folder>
In case you use the codes herein, please cite the following research article:
Sudharsan Vijayaraghavan, Akshaya Lakshminarayanan, Naman Bhargava, Janani Ravichandran, R.P. Vivek-Ananth*, Areejit Samal*, Machine learning models for prediction of xenobiotic chemicals with high propensity to transfer into human milk, ACS Omega, 9(11):13006-13016, 2024.