Integrative Machine Learning (IML)
Parvandeh et al., Consensus features nested cross-validation, Bioinformatics, 2020
-
Download the four data types from the GDSC website
-
Annotate mutation data with Evolutionary Action
-
Preprocess the four data types
Rscript 1_Preprocessing_data.R
-
Predict IC50 values for pan-cancer cell lines
Rscript 2_IML_PANCAN_1Dtype.R
Rscript 3_IML_PANCAN_2Dtypes.R
Rscript 4_IML_PANCAN_3Dtypes.R
Rscript 5_IML_PANCAN_4Dtypes.R
-
Predict IC50 values for cancer-specific cell lines
Rscript 6_IML_CASpecific_1Datatype.R
Rscript 7_IML_CASpecific_2Datatypes.R
Rscript 8_IML_CASpecific_3Datatypes.R
Rscript 9_IML_CASpecific_4Datatypes.R
-
Define the sensitivity signature using the genes identified in step 5 for RNA-Seq
Rscript 10_IML_SensitivitySignature.R
-
Visualize the results
Rscript 11_Visualization.R
-
Validate on CCLE data
Rscript 12_Validation_CCLE.R
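
The scripted steps above run in a fixed order, so they can be chained from one shell function. The following is a minimal sketch, assuming all twelve scripts sit in the current directory and take no command-line arguments. The function name `run_pipeline` and the `DRY_RUN` variable are hypothetical and not part of this repository. The download and Evolutionary Action annotation steps are not covered and still have to be done beforehand.

```shell
# Hypothetical helper (not shipped with the repository): run the numbered
# R scripts in order and stop at the first one that fails.
# With DRY_RUN=1 the commands are only printed, not executed.
run_pipeline() {
    for script in \
        1_Preprocessing_data.R \
        2_IML_PANCAN_1Dtype.R \
        3_IML_PANCAN_2Dtypes.R \
        4_IML_PANCAN_3Dtypes.R \
        5_IML_PANCAN_4Dtypes.R \
        6_IML_CASpecific_1Datatype.R \
        7_IML_CASpecific_2Datatypes.R \
        8_IML_CASpecific_3Datatypes.R \
        9_IML_CASpecific_4Datatypes.R \
        10_IML_SensitivitySignature.R \
        11_Visualization.R \
        12_Validation_CCLE.R
    do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command that would be run.
            echo "Rscript $script"
        else
            # Run the step; abort the whole pipeline if it fails.
            Rscript "$script" || {
                echo "Step failed: $script" >&2
                return 1
            }
        fi
    done
}
```

After sourcing the snippet, `DRY_RUN=1 run_pipeline` lists the commands without running anything, and a plain `run_pipeline` executes them. Stopping at the first failure is a deliberate choice here, since later steps presumably depend on the output of earlier ones.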
Required R packages
install.packages(c('CORElearn', 'Rcpp', 'dplyr', 'parallel', 'foreach', 'doParallel', 'glmnet', 'randomForest', 'e1071', 'rpart'))
Other R packages
install.packages(c('ggplot2', 'tidyverse', 'hrbrthemes', 'viridis', 'ggpubr', 'ggbeeswarm', 'forcats', 'cvTools'))
Helper function (included)
Rcpp::sourceCpp("VCF2DM.cpp")