brmprnk/jointomicscomp

jointomicscomp

Python Version Conda Install MIT License

Wrapper and implementation for comparing models for multi-omics data integration. Check out our paper "An in-depth comparison of linear and non-linear joint embedding methods for bulk and single-cell multi-omics" here: https://doi.org/10.1101/2023.04.10.535672

Note: the most recent updates are found in the generic-impl branch!

Installation

The recommended installation uses a new Anaconda environment. To ease the process, this project includes an environment file (environment.yml), which can be used to create the environment by following this short tutorial.

Then, to run the project, call the run.py wrapper with the desired config file:

python run.py -c configs/geme.yaml

This default call to run.py runs all implemented models. To run specific models, use one or more of the -poe, -moe, -mofa, -mvib and -cgae flags. To see additional arguments, use argparse's built-in -h flag:

python run.py -h
  • The experiment name is set in the config file, or on the command line using -e experiment_name
  • All results and model outputs will be stored in the project's results directory
  • All metrics are written to a TensorBoard file, also found in the results folder
  • All informational print statements are saved to a log.txt file, using a logger found in the util folder.
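The command-line behaviour described above (a required config file, an optional -e override, per-model flags that default to running everything, and a logger writing to log.txt) can be sketched as follows. This is a hypothetical illustration, not the actual run.py: the function names build_parser, selected_models and setup_logger, the dest names, and the log format are assumptions; only the flags themselves come from this README.

```python
import argparse
import logging
import os

# Model flags as documented for run.py; the naming of this list is an assumption.
MODEL_FLAGS = ["poe", "moe", "mofa", "mvib", "cgae"]


def build_parser():
    """Hypothetical parser mirroring the flags documented for run.py."""
    parser = argparse.ArgumentParser(description="Compare joint embedding models.")
    parser.add_argument("-c", dest="config", required=True,
                        help="path to a YAML config file")
    parser.add_argument("-e", dest="experiment", default=None,
                        help="experiment name (overrides the config file)")
    for name in MODEL_FLAGS:
        # argparse matches exact option strings first, so "-poe" is not split into -p -o -e
        parser.add_argument(f"-{name}", dest=name, action="store_true",
                            help=f"run the {name.upper()} model")
    return parser


def selected_models(args):
    """Return the models to run; when no model flag is given, run all of them."""
    chosen = [name for name in MODEL_FLAGS if getattr(args, name)]
    return chosen or list(MODEL_FLAGS)


def setup_logger(results_dir, name="jointomicscomp"):
    """Send informational messages to the console and to results_dir/log.txt."""
    os.makedirs(results_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Remove handlers from earlier calls so messages are not duplicated
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(),
                    logging.FileHandler(os.path.join(results_dir, "log.txt"))):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

For example, passing build_parser().parse_args(["-c", "configs/geme.yaml", "-mofa"]) to selected_models returns ["mofa"], while omitting every model flag returns all five models.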

General Status

  • For every model, the latent representation Z is saved to the results folder, together with a UMAP representation.
  • Imputation is performed right after model training, so the imputation code is found in the main file of each model.
  • All data- and model-specific options can be set in a config file in the configs folder.

Task 1: Imputation

  • MOFA+, PoE, CGAE and the baseline support imputation.
  • MoE is being reworked to use different logic (see below).
  • MVIB is not suitable for imputation.

Task 2: Survival Time Prediction

  • A script in the data_preprocessing folder creates the data splits used in this task.
  • The implementation will be based on the Momix implementation, using the R survival package.
  • This task is currently under construction.
  • The CGAE and MVAE folders also contain leftover files from cancer stage classification.

Implementation details

Multi-Omics Factor Analysis V2 (MOFA+)

The MOFA+ implementation is the one provided by the MOFA+ authors, which allows the model to be trained in pure Python. The trained model is saved to an .hdf5 file in the appropriate directory. The model's W and Z matrices (see the MOFA+ documentation) then have to be fetched using R. The Python library rpy2 takes care of this, but in case of issues a MOFA_downstream.R script is included. Some notes:

  • The model is trained on the training set + validation set
  • Multiplying new data (Y) from a test set by the Moore-Penrose inverse (pseudoinverse) of the W matrix gives the corresponding Z for that test set.
  • This can be done for both omics; the Z obtained from one omic is then used to impute the other omic's Y matrix.
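A minimal NumPy sketch of the projection and cross-omic imputation described in these notes, assuming the MOFA+ convention Y ≈ Z W^T with each omic's W stored as a features x factors matrix. Under that convention the operator applied to Y is the pseudoinverse of W^T. The matrices below are random stand-ins rather than weights from a trained model, and the helper names project and impute are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K factors, D1/D2 features in the two omics, N test samples
K, D1, D2, N = 5, 40, 30, 10

# Random stand-ins for the per-omic weight matrices (features x factors)
W1 = rng.normal(size=(D1, K))
W2 = rng.normal(size=(D2, K))

# Noise-free test data generated as Y = Z W^T, so the projection can be checked
Z_true = rng.normal(size=(N, K))
Y1_test = Z_true @ W1.T


def project(Y, W):
    """Least-squares estimate of Z for new samples: Z = Y (W^T)^+."""
    return Y @ np.linalg.pinv(W.T)


def impute(Y_src, W_src, W_tgt):
    """Project the source omic onto the factors, then map them to the target omic."""
    Z = project(Y_src, W_src)
    return Z @ W_tgt.T


Y2_hat = impute(Y1_test, W1, W2)  # imputed second omic, shape (N, D2)
```

With real data, W1 and W2 would be the weights fetched from the trained model (through rpy2 or MOFA_downstream.R) and Y1_test the held-out test matrix; because real data are noisy, the recovered Z is a least-squares estimate rather than an exact match.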

Product-of-Experts

The MVAE code was originally adapted from the Product-of-Experts MVAE as developed by Wu and Goodman.

  • Their implementation has remained mostly intact, for example their Product-of-Experts function in model.py and their training/test methods.
  • The VAE architecture was changed to a more standard Vanilla-VAE architecture, based on PyTorch-VAE.
  • The loss function in train.py was rewritten to work more like the one in PyTorch-VAE; the original MVAE loss function uses binary cross-entropy.
  • The library has also been extended to use a Mixture-of-Experts approach, reusing all the same code except for how the Gaussians are combined. BEWARE: this code was written based on what I believed was correct, but it is not fully backed by a specific paper.
  • To use Mixture-of-Experts, see next section.
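To illustrate the Product-of-Experts step that combines the Gaussian posteriors of the individual omics, here is a minimal sketch written with NumPy rather than PyTorch so that it stays self-contained; the function names are hypothetical and this is not the code in model.py. The product of Gaussian experts is again Gaussian: its precision is the sum of the experts' precisions, and its mean is the precision-weighted average of their means. Following the MVAE formulation, a standard-normal prior expert can be included in the product.

```python
import numpy as np


def product_of_experts(mus, logvars, eps=1e-8):
    """Combine Gaussian experts N(mu_i, var_i) into a single Gaussian.

    mus, logvars: arrays of shape (n_experts, batch, latent_dim).
    Returns the mean and log-variance of the product distribution.
    """
    var = np.exp(logvars) + eps          # eps guards against division by zero
    precision = 1.0 / var
    pd_var = 1.0 / precision.sum(axis=0)  # precisions add up
    pd_mu = (mus * precision).sum(axis=0) * pd_var  # precision-weighted mean
    return pd_mu, np.log(pd_var)


def with_prior_expert(mus, logvars):
    """Prepend a standard-normal prior expert N(0, I) to the list of experts."""
    prior_mu = np.zeros_like(mus[:1])
    prior_logvar = np.zeros_like(logvars[:1])  # log(1) = 0
    return (np.concatenate([prior_mu, mus]),
            np.concatenate([prior_logvar, logvars]))
```

For two unit-variance experts with means 1 and 3, the product has mean 2 and variance 0.5; adding the prior expert pulls the mean to 4/3 and shrinks the variance to 1/3. The Mixture-of-Experts variant described above replaces exactly this combination step.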

Mixture-of-Experts (UNDER CONSTRUCTION)

Rather than writing an in-house implementation that could be open to scrutiny, this approach will be adapted from MMVAE. Some work has been done on reusing their logic in the MVAE model.py file, but it is not yet finalized.
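For contrast with the Product-of-Experts step, a Mixture-of-Experts joint posterior is a uniform mixture of the per-omic Gaussians. The NumPy sketch below is a generic illustration under that assumption, not MMVAE's code and not the repository's model.py; the function names are hypothetical. It samples by picking an expert uniformly and then applying the reparameterisation z = mu + sigma * eps, and it evaluates the mixture log-density with the log-sum-exp trick. MMVAE's own training objective instead draws samples from every unimodal posterior (stratified sampling).

```python
import numpy as np


def sample_mixture_of_experts(mus, logvars, n_samples, rng):
    """Draw samples from a uniform mixture of diagonal Gaussian experts.

    mus, logvars: arrays of shape (n_experts, latent_dim) for one data point.
    Returns the samples (n_samples, latent_dim) and the chosen expert indices.
    """
    n_experts, latent_dim = mus.shape
    idx = rng.integers(0, n_experts, size=n_samples)  # uniform mixture weights
    eps = rng.standard_normal((n_samples, latent_dim))
    z = mus[idx] + np.exp(0.5 * logvars[idx]) * eps
    return z, idx


def mixture_log_density(z, mus, logvars):
    """log q(z) = log((1/M) * sum_m N(z; mu_m, diag(var_m))), computed stably."""
    var = np.exp(logvars)                               # (M, D)
    diff = z[:, None, :] - mus[None, :, :]              # (N, M, D)
    log_comp = -0.5 * (np.log(2 * np.pi * var)[None] + diff ** 2 / var[None]).sum(-1)
    m = log_comp.max(axis=1, keepdims=True)             # (N, 1), for log-sum-exp
    return m[:, 0] + np.log(np.exp(log_comp - m).mean(axis=1))
```

Unlike the product, which concentrates where all experts agree, the mixture keeps mass wherever any single omic's posterior places it.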

MVIB

Based on the following paper.

CGAE

Some to-dos concerning the implementation of the MultiOmicsVAE in the nets.py file are listed in the Drive doc. The CGAE model is inspired by this paper.

File Structure

.
├── LICENSE
├── README.md
├── .gitignore
├── configs
│   ├── geme.yaml
│   ├── gegcn.yaml
│   ├── gcnme.yaml
│   ├── brca2_gegcn.yaml
│   └── etc.
├── data/
├── environment.yml
├── results/
├── run.py
└── src
    ├── baseline/
    ├── CGAE/
    ├── data_preprocessing/
    ├── MOFA2/
    ├── MVAE/
    ├── MVIB/
    ├── util/
    ├── nets.py
    └── survival.py

Authors

- Stavros Makrodimitris         S.Makrodimitris@tudelft.nl
- Tamim Abdelaal                T.R.M.Abdelaal-1@tudelft.nl
- Bram Pronk                    I.B.Pronk@student.tudelft.nl
- Marcel Reinders               M.J.T.reinders@tudelft.nl

Citations

  • Argelaguet, R., Velten, B., Arnol, D., Dietrich, S., Zenz, T., Marioni, J. C., Buettner, F., Huber, W. and Stegle, O., Multi-Omics Factor Analysis - a framework for unsupervised integration of multi-omics data sets, Mol Syst Biol, 14.6, 2018.
  • Argelaguet, R., Arnol, D., Bredikhin, D., Deloro, Y., Velten, B., Marioni, J. C. and Stegle, O., MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data, Genome Biology, 21.1, pp. 111, 2020.
  • This section will be expanded in the future.

About

Wrapper and implementation for comparing five models for multi-omics data integration: MOFA+, Mixture-of-Experts MVAE, Product-of-Experts MVAE, Multi-view Information Bottleneck (MVIB), and CGAE.
