Code for Predicting MIEs from Gene Expression and Chemical Target Labels with Machine Learning (MIEML)
This package provides scripts in R. Training models with the R package caret requires only R.
All code in this package has been tested on R 3.6.0, but any version of R 3.x should be sufficient.
Package versions may affect compatibility or results, because functionality and APIs can change between releases.
The following R libraries are also required to use all features of the R code:
- caret - Required for training binary classifiers. Code has been tested with v6.0-83.
- data.table - Required for internal data handling. Code has been tested with v1.12.2.
- dplyr - Required for internal data handling. Code has been tested with v0.8.3.
- rlist - Required for internal data handling. Code has been tested with v0.4.6.1.
- cmapR - Required for importing LINCS L1000 gene expression data internally. Code has been tested with v1.0.1.
- parallel - Required for parallelizing model training. Code has been tested with v3.6.0.
- doParallel - Required for parallelizing model training. Code has been tested with v1.0.15.
- foreach - Required for parallelizing model training. Code has been tested with v1.4.7.
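The dependency list above can be installed and checked from an R session. The sketch below is one possible way to do this, not part of MIEML itself: the helper names install_mieml_deps() and check_mieml_deps() are hypothetical, and the tested version numbers are copied from the list above. cmapR is distributed through Bioconductor (installed here via BiocManager) and parallel ships with base R. Note that install.packages() installs the current releases, which may differ from the tested versions.

```r
# Minimal sketch (assumption): install and check the dependencies listed above.
# install_mieml_deps() and check_mieml_deps() are hypothetical helper names.

# Versions this README lists as tested.
tested_versions <- c(
  caret      = "6.0-83",
  data.table = "1.12.2",
  dplyr      = "0.8.3",
  rlist      = "0.4.6.1",
  cmapR      = "1.0.1",
  parallel   = "3.6.0",
  doParallel = "1.0.15",
  foreach    = "1.4.7"
)

# Install missing packages. parallel is part of base R, and cmapR comes
# from Bioconductor, so neither is requested from CRAN here.
install_mieml_deps <- function() {
  cran_pkgs <- c("caret", "data.table", "dplyr", "rlist", "doParallel", "foreach")
  have <- vapply(cran_pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (any(!have)) install.packages(cran_pkgs[!have])
  if (!requireNamespace("cmapR", quietly = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
    BiocManager::install("cmapR")
  }
  invisible(NULL)
}

# Report, for each package, whether it is installed and whether its
# version equals the tested version.
check_mieml_deps <- function(versions = tested_versions) {
  rows <- lapply(names(versions), function(pkg) {
    # system.file() returns "" when a package is not installed.
    installed <- nzchar(system.file(package = pkg))
    current <- if (installed) utils::packageVersion(pkg) else NULL
    data.frame(
      package   = pkg,
      tested    = versions[[pkg]],
      installed = installed,
      version   = if (installed) as.character(current) else NA_character_,
      # package_version() normalizes "6.0-83" so it compares equal to 6.0.83.
      matches   = installed && current == package_version(versions[[pkg]]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Example usage:
# install_mieml_deps()
# check_mieml_deps(tested_versions)
```

Comparing with package_version() rather than plain strings avoids false mismatches caused by the "-" versus "." separators that R uses interchangeably in version numbers.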
Classifier training relies on several publicly available data sets:
- LINCS L1000 Phase 1 release - GEO entry for LINCS phase 1
- LINCS L1000 Phase 2 release - GEO entry for LINCS phase 2
- RefChemDB Supplemental Information - RefChemDB table; supplemental table 12 should be exported and converted to .csv.
- MSigDB gene sets - required only if training classifiers using GSEA and pathway scoring; this file should be converted to .txt.
Currently, it is recommended to clone the entire repo into a user or analysis directory (e.g. /path/to/analysis) by running the following from within that directory:
git clone https://github.com/USEPA/CompTox-MIEML.git
Users who wish to train classifiers using pathway scores should also clone the CompTox-httrpathway repo into the same user or analysis directory by running:
git clone https://github.com/USEPA/CompTox-httrpathway.git
The vignette is located at mieml/notebooks/ML_functions_vignette.Rmd
Modules are located in mieml/scripts/
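One way to make the modules available in an R session is to source every R file in the scripts directory. The sketch below assumes that the files in mieml/scripts/ are plain .R files that can be sourced in alphabetical order without further setup, and that mieml/ sits at the root of the cloned repo; the helper name source_mieml_scripts() and the placeholder path /path/to/CompTox-MIEML are hypothetical. The vignette remains the authoritative guide to how the functions are meant to be used.

```r
# Minimal sketch (assumption): source all .R files under mieml/scripts/
# in a cloned copy of the repo. source_mieml_scripts() is a hypothetical helper.
source_mieml_scripts <- function(repo_dir, envir = globalenv()) {
  script_dir <- file.path(repo_dir, "mieml", "scripts")
  if (!dir.exists(script_dir)) {
    stop("Script directory not found: ", script_dir)
  }
  # Match .R and .r files, sorted so the load order is deterministic.
  files <- sort(list.files(script_dir, pattern = "\\.[Rr]$", full.names = TRUE))
  for (f in files) {
    # sys.source() evaluates the file's top-level expressions in `envir`.
    sys.source(f, envir = envir)
  }
  invisible(files)
}

# Example usage with a placeholder clone location:
# source_mieml_scripts("/path/to/CompTox-MIEML")
# rmarkdown::render("/path/to/CompTox-MIEML/mieml/notebooks/ML_functions_vignette.Rmd")
```

Passing a dedicated environment (for example new.env()) instead of the global environment keeps the MIEML functions separate from other objects in an analysis session.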
(12/20/21)
- Created a new (current) branch for public release of the MIEML version used in the initial project-associated publication.
(2/17/22)
- Revised vignette and primary MIEML functions for ease of use
(3/2/22)
- Revised README and vignette for public repo