DIVERSE: Bayesian Data IntegratiVE learning for precise drug ResponSE prediction

bguvenc/DIVERSE

This repository contains the implementation of the computational methods in the paper Güvenç Paltun et al., DIVERSE: Bayesian Data IntegratiVE learning for precise drug ResponSE prediction.

Main files:

  • Diverse_out.py Python script for performing out-of-matrix cross-validation experiments.
  • Diverse_in.py Python script for performing in-matrix cross-validation experiments.
  • metrics.py Contains functions for computing the evaluation scores: mean squared error (MSE) and drug-averaged Spearman correlation (Sc).
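As an illustration of the two evaluation scores named above, here is a minimal sketch. It is not the code in metrics.py: the function names `mse` and `drug_averaged_spearman`, the boolean `mask` argument marking held-out entries, and the layout (rows are cell lines, columns are drugs) are all assumptions made for this example.

```python
import numpy as np
from scipy.stats import spearmanr


def mse(y_true, y_pred, mask):
    """Mean squared error over the entries where mask is True (hypothetical helper)."""
    mask = np.asarray(mask, dtype=bool)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true[mask] - y_pred[mask]
    return float(np.mean(diff ** 2))


def drug_averaged_spearman(y_true, y_pred, mask):
    """Spearman correlation computed per drug (column), then averaged.

    Assumes rows are cell lines and columns are drugs. Drugs with fewer
    than two held-out entries, or with an undefined correlation, are skipped.
    """
    mask = np.asarray(mask, dtype=bool)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    scores = []
    for j in range(y_true.shape[1]):
        rows = mask[:, j]
        if rows.sum() < 2:
            continue
        rho = spearmanr(y_true[rows, j], y_pred[rows, j])[0]
        if not np.isnan(rho):
            scores.append(rho)
    return float(np.mean(scores)) if scores else float("nan")
```

Averaging per drug rather than pooling all entries keeps drugs with many measured cell lines from dominating the score.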

Source files [1]: Folder containing the source code for the DIVERSE method.

Requirements

  • Python 2.7
  • numpy
  • scipy

Data

Pre-processed datasets are available in the DIVERSE/data/original_data directory:

  • gdsc_patient_drug_sorted.xlsx : Drug response data (IC50 values, which give the effectiveness of drugs on different cell lines) for 956 cancer cell lines and 265 drugs [2].
  • gdsc_patient_gene_sorted.xlsx : Gene expression data covering 232 genes and 956 cell lines [2].
  • pubchem_drug_similarity_sorted.xlsx : Drug similarity data based on the chemical structural similarity between compounds. Such similarity is commonly used to identify compounds sharing similar biological or chemical activity [3].
  • string_gene_interactions_sorted.csv : Gene interactions, which include physical and functional associations [4].
  • gdsc_chembl_gene_drug_inteaction_sorted.csv : Drug–target interaction data, which include interactions for 255 drugs [5].

Processed datasets are available in the DIVERSE/data directory. After pre-processing, we obtained 255 drugs, 956 cell lines and 232 genes. For consistency between the integrated data sources, all data sets were scaled to the range [0,1].
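The scaling to [0,1] could be done with a min-max transform, sketched below. This is an assumption for illustration only: the README does not say whether scaling was applied per matrix, per row, or per column. The helper name `min_max_scale` is hypothetical, and missing values are assumed to be stored as NaN.

```python
import numpy as np


def min_max_scale(matrix):
    """Rescale a matrix to [0, 1] using its global min and max.

    Hypothetical helper: NaN entries (assumed to mark missing values)
    are ignored when finding the range and stay NaN in the output.
    """
    matrix = np.asarray(matrix, dtype=float)
    lo = np.nanmin(matrix)
    hi = np.nanmax(matrix)
    if hi == lo:
        # Constant matrix: map every observed entry to 0 to avoid dividing by zero.
        return np.where(np.isnan(matrix), np.nan, 0.0)
    return (matrix - lo) / (hi - lo)


# Toy example with one missing entry.
example = np.array([[1.0, np.nan], [3.0, 5.0]])
scaled = min_max_scale(example)
```

Putting every data source on the same range keeps one matrix from dominating a joint factorisation purely because of its units.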

Contact

Betül Güvenç Paltun, betul.guvenc@aalto.fi

Work was done in the Probabilistic Machine Learning research group at Aalto University.

References

[1] Brouwer, T. et al (2017) Bayesian Hybrid Matrix Factorisation for Data Integration, Proceedings of Machine Learning Research, 54, 557–566.

[2] Yang, W. et al (2012) Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells, Nucleic acids research, 41, D955–D961.

[3] Kim, S. et al (2015) PubChem substance and compound databases, Nucleic acids research, 44, D1202–D1213.

[4] Szklarczyk, D. et al (2010) The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored, Nucleic acids research, 39, D561–D568.

[5] Gaulton, A. et al (2017) The ChEMBL database in 2017, Nucleic acids research, 45, D945–D954.
