Comparing different multi-omics data integration tools

Overview

Several methods for multi-omics data integration exist; however, choosing the best method for a given dataset remains a challenge. In a recent paper, Cantini et al. (2021)¹ benchmarked nine data integration methods on TCGA cancer data and found that Multi-Omics Factor Analysis (MOFA)², Multiple Co-Inertia Analysis (MCIA)³, Joint and Individual Variation Explained (JIVE)⁴ and Regularized Generalized Canonical Correlation Analysis (RGCCA)⁵ performed consistently better than the other five methods at finding factors associated with the corresponding clinical/biological annotations. However, this benchmark required samples to be matched across the omics datasets, did not assess performance on discrete data and did not account for the variation introduced by different imputation methods. The aim of this project is therefore to benchmark the first three of these methods (MOFA, MCIA and JIVE) on the chronic lymphocytic leukaemia (CLL) dataset from Dietrich et al. (2018)⁶, which provides an opportunity to address these shortcomings.

Dataset

The CLL dataset from Dietrich et al. (2018) contains the following data types for 200 patients (dimensions given as features x samples):

somatic mutations (69 x 200)
RNA expression data (5000 x 136)
DNA methylation (4248 x 196)
ex vivo drug response (310 x 184)
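
Because the four data types cover different numbers of patients, it can help to check up front how many samples are shared across all of them. The following is a minimal Python sketch, not part of the course material (which uses R). The patient IDs and the `summarize_overlap` helper are hypothetical and serve only as an illustration.

```python
# Hypothetical helper: summarise which samples are present in which view.
# The toy IDs below are invented; the real CLL views have 200, 136, 196
# and 184 samples respectively.

def summarize_overlap(views):
    """views maps a view name to an iterable of sample IDs."""
    sample_sets = {name: set(ids) for name, ids in views.items()}
    all_samples = set().union(*sample_sets.values())
    complete = set.intersection(*sample_sets.values())
    # For each view, list the samples that would have to be dropped or imputed.
    missing = {
        name: sorted(all_samples - ids) for name, ids in sample_sets.items()
    }
    return {
        "n_union": len(all_samples),
        "n_complete": len(complete),
        "complete": sorted(complete),
        "missing": missing,
    }


toy_views = {
    "mutations": ["P1", "P2", "P3", "P4", "P5"],
    "mrna": ["P1", "P2", "P4"],
    "methylation": ["P1", "P2", "P3", "P4"],
    "drugs": ["P1", "P3", "P4", "P5"],
}

summary = summarize_overlap(toy_views)
print(summary["n_union"], "samples in total,",
      summary["n_complete"], "present in every view")
```

For a method that requires matched samples, only the `complete` set can be used directly; the remaining samples have to be dropped or their missing views imputed.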

Benchmarks

For benchmarking purposes, different methods will be assessed on their ability to

identify IGHV status and trisomy of chromosome 12
selectively identify clinical annotations
identify biologically meaningful pathways
selectively identify biological pathways
predict time to next treatment and overall survival

See references 1 and 2 for more details. The corresponding code can be found at momix-notebook and MOFA v2.
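
To make the "selectively identify" criteria concrete, here is a minimal Python sketch of a selectivity score. It assumes the definition we understand reference 1 to use, S = (N_c + N_f) / (2L), where N_c is the number of annotations associated with at least one factor, N_f is the number of factors associated with at least one annotation, and L is the total number of factor-annotation associations. The boolean association matrix and the function name are illustrative assumptions; in practice the matrix would come from thresholding association p-values.

```python
# Sketch of a selectivity score, assuming S = (N_c + N_f) / (2 * L).
# This definition is our reading of reference 1; check it against the paper.

def selectivity(assoc):
    """assoc[i][j] is True when factor i is associated with annotation j."""
    n_links = sum(1 for row in assoc for hit in row if hit)       # L
    if n_links == 0:
        return 0.0  # no associations at all: treated as zero selectivity
    n_factors = sum(1 for row in assoc if any(row))               # N_f
    n_annotations = sum(1 for col in zip(*assoc) if any(col))     # N_c
    return (n_annotations + n_factors) / (2 * n_links)


# Each associated factor maps to exactly one annotation.
one_to_one = [
    [True, False],
    [False, True],
    [False, False],
]

# Two factors are each associated with both annotations.
unselective = [
    [True, True],
    [True, True],
    [False, False],
]

print(selectivity(one_to_one), selectivity(unselective))
```

Under this definition the score is highest when every associated factor is linked to one and only one annotation, and it falls as factors pick up several annotations at once.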

Take-away

At the end of the course the participants will:

get hands-on experience in using three different data-integration methods
learn advantages and limitations of different methods
get an intuition of which method to apply for which kind of dataset
get hands-on experience with handling missing values
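
As a flavour of the missing-value handling mentioned above, the sketch below fills missing entries feature by feature using either the mean or the median of the observed values. This is a simplified Python illustration under our own assumptions: the course itself uses R, the imputation methods actually compared may differ, and `impute_rows` and the toy matrix are hypothetical.

```python
import statistics

# Hypothetical helper: feature-wise mean or median imputation.
# Rows are features and columns are samples; None marks a missing value.

def impute_rows(matrix, strategy="mean"):
    fill = {"mean": statistics.mean, "median": statistics.median}[strategy]
    imputed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        if not observed:
            raise ValueError("feature has no observed values to impute from")
        value = fill(observed)
        imputed.append([value if v is None else v for v in row])
    return imputed


toy = [
    [1.0, None, 3.0, 8.0],
    [0.0, 2.0, None, 4.0],
]

mean_filled = impute_rows(toy, "mean")
median_filled = impute_rows(toy, "median")
print(mean_filled[0][1], median_filled[0][1])
```

Even on this toy matrix the two strategies fill the first feature with different values, which is the kind of variation a benchmark should account for.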

Pre-requisites

Working knowledge of R, a basic understanding of maths/statistics, and familiarity with gene-set enrichment analysis and survival analysis are required.

Schedule

Day 2
15:00-16:00: [1:00] Session 2.1 - Introduction to the methods
16:00-16:15: [0:15] Break
16:15-18:00: [1:45] Session 2.2 - Data exploration and training the models

Day 3
15:30-16:15: [0:45] Session 3.1 - Benchmarking on association with clinical annotations
16:15-16:30: [0:15] Break
16:30-17:30: [1:00] Session 3.2 - Benchmarking on association with biological annotations

Day 4
16:15-17:00: [0:45] Session 4.1 - Benchmarking on prediction of survival outcomes
17:00-17:15: [0:15] Break
17:15-18:00: [0:45] Session 4.2 - Preparing presentation (advantages and limitations of these tools)

References

1. Cantini, L., Zakeri, P., Hernandez, C. et al. Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat Commun 12, 124 (2021). https://doi.org/10.1038/s41467-020-20430-7
2. Argelaguet, Ricard, et al. "Multi‐Omics Factor Analysis—a framework for unsupervised integration of multi‐omics data sets." Molecular systems biology 14.6 (2018).
3. Bady, Pierre, et al. "Multiple co-inertia analysis: a tool for assessing synchrony in the temporal variability of aquatic communities." Comptes rendus biologies 327.1 (2004): 29-36.
4. Lock, Eric F., et al. "Joint and individual variation explained (JIVE) for integrated analysis of multiple data types." The annals of applied statistics 7.1 (2013): 523.
5. Tenenhaus, Arthur, and Michel Tenenhaus. "Regularized generalized canonical correlation analysis." Psychometrika 76.2 (2011): 257.
6. Dietrich, Sascha, et al. "Drug-perturbation-based stratification of blood cancer." The Journal of clinical investigation 128.1 (2018): 427-445.

Recommended watching

If you prefer watching videos over reading papers.
