GitHub - ATOMconsortium/AMPL-Tutorial: AMPL software tutorials

The ATOM Modeling PipeLine (AMPL; https://github.com/ATOMconsortium/AMPL) is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery. For the full list of AMPL parameters, see https://github.com/ATOMconsortium/AMPL/blob/master/atomsci/ddm/docs/PARAMETERS.md

This page contains a collection of AMPL-COLAB tutorial notebooks.

Note: if you have trouble opening any of the following notebooks, go to https://nbviewer.jupyter.org/ and paste the notebook link there to view its contents.
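The nbviewer address can also be built directly from a notebook's GitHub link. The sketch below assumes the URL pattern used by the nbviewer workshop link in the AMPL Workshops section of this page (`nbviewer.jupyter.org/github/<owner>/<repo>/blob/...`); the helper name `to_nbviewer_url` is hypothetical.

```python
from urllib.parse import urlparse

NBVIEWER_PREFIX = "https://nbviewer.jupyter.org/github"


def to_nbviewer_url(github_url: str) -> str:
    """Map a github.com notebook link to the matching nbviewer link."""
    parsed = urlparse(github_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com link: {github_url}")
    # parsed.path looks like /<owner>/<repo>/blob/<branch>/<notebook>.ipynb
    return NBVIEWER_PREFIX + parsed.path


print(to_nbviewer_url(
    "https://github.com/ravichas/AMPL-Tutorial/blob/master/"
    "AMPL_FNL_Workshop_06052021.ipynb"
))
```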

0. Basic Google COLAB Introduction (Works best with Google Chrome)

Tutorial-00: Basic COLAB tutorial. For all the COLAB tutorials, click on the tutorial link, and then click on the "Open in Colab" banner. You can open and run the notebook from the browser. If you want to keep your edits to the notebook, you need to save a copy in your Google Drive. By default, Google COLAB saves notebook files under the "My Drive > Colab Notebooks" folder.

1. Data Collection and creating Machine-Learning ready datasets:

The data that we gather for modeling is small-molecule/drug binding data. The following links will introduce some of the concepts and outcome measures related to this topic:

For the tutorials, we will use small-molecule binding data obtained from one of the following resources: ChEMBL (https://www.ebi.ac.uk/chembl/), Drug Target Commons (DTC; https://drugtargetcommons.fimm.fi/), and ExCAPE-DB (https://solr.ideaconsult.net/search/excape/).

Click here to learn about single-target focussed data.

Explore HTR3A binding data from ExCAPE-DB

Tutorial-01: (Time: ~ 3 minutes) This COLAB notebook will use AMPL for data cleaning, EDA and clustering of HTR3A protein data from ExCAPE-DB (https://solr.ideaconsult.net/search/excape/)
Tutorial-02: (Time: ~ 6 minutes) This COLAB notebook will use AMPL for data curation of HTR3A protein data from ExCAPE-DB (https://solr.ideaconsult.net/search/excape/)

Explore HTR3A binding data from Drug Target Commons database

Tutorial-03: (Time: ~ 4 minutes) This COLAB notebook will use AMPL for data cleaning, EDA and clustering of HTR3A protein data from Drug Target Commons (DTC)
Tutorial-04: (Time: ~ 10 minutes) This COLAB notebook will use AMPL for data curation of HTR3A protein data from Drug Target Commons (DTC)

Curating, merging and visualizing two datasets

Tutorial-05: (Time: ~ 4 minutes) This COLAB notebook will use AMPL to upload datasets (small-molecule activity data from ChEMBL), clean and merge them, and do some basic Exploratory Data Analysis.
Tutorial-06: (Time: ~ 4 minutes) This COLAB notebook will use AMPL to merge HTR3A binding data from two different data sources, DTC and ExCAPE-DB.

Exploratory Data Analysis (EDA) Notebooks

Tutorial-07: (Time: ~ 4 minutes). The notebook uses HTR3A as the protein target. The notebook accomplishes the following tasks:
- Uses AMPL software
- Reads in data from three database sources: ChEMBL, ExCAPE-DB and DTC
- Cleans, standardizes and analyzes the data
- Merges and harmonizes the sources into a single dataset
Tutorial-08: Exploratory Data Analysis-Regression
Tutorial-09: Exploratory Data Analysis-Classification
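To make the merge-and-harmonize step of Tutorial-07 concrete, here is a minimal sketch using only the Python standard library. It is not the notebook's code: the toy records, the pIC50 values and the function name `merge_sources` are hypothetical, and averaging replicate measurements is one common choice rather than necessarily the rule the notebook applies.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (source, SMILES, pIC50).
# Real database exports carry many more columns and need unit conversion first.
chembl = [("ChEMBL", "CCO", 5.0), ("ChEMBL", "c1ccccc1", 6.2)]
excape = [("ExCAPE-DB", "CCO", 5.4)]
dtc = [("DTC", "CCN", 7.1), ("DTC", "c1ccccc1", 6.0)]


def merge_sources(*sources):
    """Pool records from all sources and average replicate values per SMILES."""
    values = defaultdict(list)
    origins = defaultdict(set)
    for records in sources:
        for source, smiles, value in records:
            values[smiles].append(value)
            origins[smiles].add(source)
    return {
        smiles: {
            "pIC50": round(mean(vals), 3),
            "sources": sorted(origins[smiles]),
            "n": len(vals),
        }
        for smiles, vals in values.items()
    }


merged = merge_sources(chembl, excape, dtc)
for smiles, row in sorted(merged.items()):
    print(smiles, row)
```

Matching on the raw SMILES string is a simplification. The same compound can be written as several different SMILES strings, so a real workflow would standardize structures (for example with RDKit) before grouping.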

2. Model training and tuning:

Random Forest modeling to predict solubility

Tutorial-10: (Time: ~ 2 minutes) Simple supervised learning example. AMPL will read the public data (117 chemical compounds), curate it, fit a Random Forest model to predict solubility, and test the model. For additional information on the dataset, please check this publication: https://pubmed.ncbi.nlm.nih.gov/15154768/
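As a rough sketch of what such a run looks like, the block below builds a small AMPL configuration for a Random Forest regression model. The file name, column names and result directory are hypothetical, and the parameter names should be checked against the PARAMETERS.md page linked at the top of this README. The entry points inside `train_with_ampl` (`parse.wrapper`, `ModelPipeline`, `train_model`) follow the pattern used in AMPL example notebooks, but treat them as an assumption to confirm against your installed AMPL version.

```python
import json

# Hypothetical file and column names; replace with those of your curated dataset.
config = {
    "dataset_key": "delaney_curated.csv",
    "id_col": "compound_id",
    "smiles_col": "smiles",
    "response_cols": "log_solubility",
    "featurizer": "ecfp",
    "model_type": "RF",
    "prediction_type": "regression",
    "splitter": "scaffold",
    "result_dir": "./rf_model",
}


def train_with_ampl(cfg):
    """Train a model with AMPL. Requires AMPL to be installed."""
    # Imported lazily so the config can be inspected without AMPL present.
    from atomsci.ddm.pipeline import parameter_parser as parse
    from atomsci.ddm.pipeline import model_pipeline as mp

    params = parse.wrapper(cfg)
    pipeline = mp.ModelPipeline(params)
    pipeline.train_model()
    return pipeline


print(json.dumps(config, indent=2))
# train_with_ampl(config)  # uncomment in an environment with AMPL installed
```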

Graph Convolution modeling to predict SCN5A binding affinities

Tutorial-11: (Mode: AMPL_GPU; Time: ~ 18 minutes) This COLAB notebook will use AMPL to predict binding affinities (pIC50 values) of ligands that could bind to the human sodium channel protein type 5 subunit alpha (Gene: SCN5A) using a Graph Convolutional Network model. The ChEMBL database is the source of the binding affinity data (pIC50).
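For readers new to the pIC50 scale: pIC50 is the negative base-10 logarithm of the IC50 expressed in molar units, so a lower IC50 (a more potent compound) gives a higher pIC50. The short sketch below converts IC50 values reported in nanomolar; the function name is hypothetical and the example values are illustrative, not taken from the notebook.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


for ic50_nm in (1.0, 100.0, 10000.0):
    print(f"IC50 = {ic50_nm:g} nM -> pIC50 = {pic50_from_ic50_nm(ic50_nm):.2f}")
```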

3. Hyper-parameter Optimization (HPO), Uncertainty Quantification (UQ), and using metrics for analyzing model performance.

This notebook also explores AMPL functions for saving and loading prebuilt AMPL models for analysis.

Tutorial-12: Hyper-parameter Optimization (HPO) and Uncertainty Quantification (UQ).
Tutorial-13: Notebook includes HPO Grid Search on three different modeling methods (Random Forest, NN and XGBoost).
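A grid search simply enumerates every combination of the candidate hyperparameter values for each model type. The sketch below shows that enumeration with the standard library only. The hyperparameter names and values are hypothetical placeholders and are not AMPL parameter names; the notebook itself relies on AMPL's own HPO tooling.

```python
from itertools import product

# Hypothetical search spaces; names and values are illustrative only.
search_space = {
    "RF": {"n_estimators": [100, 500], "max_depth": [8, 16]},
    "NN": {
        "learning_rate": [1e-3, 1e-4],
        "layer_sizes": [(256,), (512, 128)],
        "dropout": [0.1, 0.3],
    },
    "XGBoost": {"learning_rate": [0.05, 0.1], "max_depth": [4, 6]},
}


def grid(space):
    """Yield one dict per combination of the values listed in `space`."""
    names = sorted(space)
    for combo in product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


all_configs = [
    {"model_type": model_type, **params}
    for model_type, space in search_space.items()
    for params in grid(space)
]

print(f"{len(all_configs)} configurations to train")
print(all_configs[0])
```

Each dictionary in `all_configs` would correspond to one training run; the grid grows multiplicatively with every added hyperparameter, which is why grid search is usually reserved for small search spaces.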

4. Creating high-quality models

Tutorial-12: This notebook provides a framework for visualizing HPO results and using them to identify the best models.

5. Model Inference:

Tutorial-14: This notebook creates an AMPL Random Forest (RF) model using the BSEP dataset (reference: https://pubmed.ncbi.nlm.nih.gov/33502191/) and makes predictions (inference) on an external sample test dataset.

AMPL Workshops

Workshop date, June 05, 2021: Protein Target-focussed Binding Data Curation, Exploratory Data Analysis and Featurization using AMPL. Please note that the Google Chrome browser works best with the COLAB Jupyter notebooks.

Click here to access the presentation slides.
Click here to open the tutorial Jupyter COLAB notebook. If the notebook link doesn't work for you, please use this one: https://nbviewer.jupyter.org/github/ravichas/AMPL-Tutorial/blob/master/AMPL_FNL_Workshop_06052021.ipynb
For use with GCP or other pre-installed AMPL environments: https://github.com/ravichas/AMPL-Tutorial/blob/master/GCP_AMPL_FNL_Workshop_06052021.ipynb

Supporting links

Similar chemoinformatics, drug-discovery software tools:

DeepChem, https://deepchem.io/
rdkit, https://www.rdkit.org/

Chemoinformatics databases

ChEMBL: https://www.ebi.ac.uk/chembl/
PubChem: https://pubchem.ncbi.nlm.nih.gov/
Drug Target Commons (DTC): https://drugtargetcommons.fimm.fi/
ExCAPE-DB: https://solr.ideaconsult.net/search/excape/
DrugBank: https://go.drugbank.com/

Acknowledgements

Most of the tutorial code chunks came from multiple Jupyter notebooks generously shared by the ATOM team.

Amanda Paulson
Ben Madej
Da Shi
Hiran Ranganathan
Jessica Mauvais
Jonathan Allen
Kevin Mcloughlin
Sarangan Ravichandran
Stewart He
Ya Ju Fan
Contributions from the following student programs:
- The Purdue Data Mine; https://datamine.purdue.edu/
- Butler University
- Columbia University

