Multimodal Biomedical Data Fusion Using Sparse Canonical Correlation Analysis and Cooperative Learning: A Cohort Study on COVID-19

Ahmet Gorkem Er¹,²,³, Daisy Yi Ding⁴, Berrin Er⁵, Mertcan Uzun³, Mehmet Cakmak⁶, Christoph Sadee¹, Gamze Durhan⁷, Mustafa Nasuh Ozmen⁷, Mine Durusu Tanriover⁶, Arzu Topeli⁵, Yesim Aydin Son², Robert Tibshirani⁴,⁸, Serhat Unal³, Olivier Gevaert¹,⁴

¹ Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
² Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Türkiye
³ Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
⁴ Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
⁵ Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
⁶ Department of Internal Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
⁷ Department of Radiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
⁸ Department of Statistics, Stanford University, Stanford, CA, 94305, USA

Corresponding authors:

Ahmet Gorkem Er E-mail: ahmetgorkemer@gmail.com

Olivier Gevaert E-mail: ogevaert@stanford.edu

Abstract

Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients: ICU admission. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu₁, Zv₁) = 0.596, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting skewness, kurtosis, and uniformity have the most negative coefficients, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also preserves the phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task, yielding area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
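
The abstract states that a Word2Vec model is used to encode viral genomes. One common way to prepare nucleotide sequences for such a model is to split each genome into overlapping k-mers that play the role of "words". The sketch below illustrates only that tokenization step. The function name `kmer_tokens`, the choice `k=3`, the stride, the handling of ambiguous bases, and the toy sequences are all assumptions made for illustration; they are not taken from `Viral_Word2Vec_Model.py`.

```python
from collections import Counter


def kmer_tokens(sequence, k=3, stride=1):
    """Split a nucleotide sequence into overlapping k-mer 'words'.

    Hypothetical helper: k, stride, and the filtering rule are
    illustrative assumptions, not the study's settings.
    """
    seq = sequence.upper()
    tokens = []
    for start in range(0, len(seq) - k + 1, stride):
        kmer = seq[start:start + k]
        # Skip windows that contain ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            tokens.append(kmer)
    return tokens


# Toy sequences for illustration only; real input would be full genomes.
genomes = {
    "sample_a": "ATGTTTGTTTTTCTTGTT",
    "sample_b": "ATGTTNGTTTTTCTTGTA",
}

# One token list ("sentence") per genome.
corpus = {name: kmer_tokens(seq, k=3) for name, seq in genomes.items()}

# k-mer frequencies across the whole corpus (the model's vocabulary).
vocabulary = Counter(tok for toks in corpus.values() for tok in toks)
```

The resulting token lists could then be passed to a Word2Vec implementation (for example, gensim's `Word2Vec` class accepts an iterable of token lists), and a per-genome vector could be formed by averaging its k-mer embeddings. Whether the study aggregates embeddings this way is not stated in the abstract.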

Repository contents

example_data
Canonical_Correlation_Analysis.R
Cooperative_Learning.R
LICENSE
README.md
Viral_Word2Vec_Model.py

Repository: ahmetgorkemer/multimodal_covid19_study
