jointomicscomp

Wrapper and implementations for comparing models for multi-omics data integration. See our paper, "An in-depth comparison of linear and non-linear joint embedding methods for bulk and single-cell multi-omics", at https://doi.org/10.1101/2023.04.10.535672

MOST RECENT UPDATES ARE FOUND IN THE generic-impl BRANCH!

Installation

The recommended installation uses a new Anaconda environment. To ease the process, this project includes an environment file (environment.yml). The environment can be created from this file in Anaconda by following this short tutorial.

Then to run the project, simply call the run.py wrapper with the desired config file like so:

python run.py -c configs/geme.yaml

This default call to run.py runs all implemented models. To run specific models, pass one or more of the -poe, -moe, -mofa, -mvib and -cgae flags. To see additional arguments, use argparse's built-in -h option:

python run.py -h
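As a rough illustration of how model flags like these can be combined, here is a minimal argparse sketch. It is not the project's actual run.py: the long option names (--config, --experiment), the help strings, and the helper selected_models are assumptions made for this example. Only the short flags are taken from the text above.

```python
import argparse

# Model flags named in this README; everything else here is illustrative.
MODELS = ("poe", "moe", "mofa", "mvib", "cgae")


def build_parser():
    parser = argparse.ArgumentParser(description="Illustrative model-selection CLI")
    # -c and -e appear in the README; the long forms are assumed for readability.
    parser.add_argument("-c", "--config", required=True, help="path to a YAML config file")
    parser.add_argument("-e", "--experiment", default=None, help="experiment name")
    for name in MODELS:
        # Single-dash, multi-character flags such as -poe are valid in argparse.
        parser.add_argument(f"-{name}", action="store_true", help=f"run the {name} model")
    return parser


def selected_models(args):
    """Return the requested models, or all models when no flag was given."""
    chosen = [name for name in MODELS if getattr(args, name)]
    return chosen or list(MODELS)


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(selected_models(args))
```

With this sketch, passing no model flag selects every model, which mirrors the default behaviour described above.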

The experiment name is set in the config file, or on the command line using -e experiment_name.
All results and model outputs are stored in the project's results directory.
All metrics are written to a TensorBoard file, also found in the results folder.
All informational print statements are saved to a log.txt file, using a logger found in the util folder.

General Status

For every model, the latent embedding Z is saved to the results folder, together with a UMAP representation.
Imputation is done right after model training. The imputation code is therefore found in the main file of each model.
All data-specific and model-specific options can be set in a config file found in the configs folder.

Task 1 Imputation:

MOFA+, PoE, CGAE and the baseline have imputation up and running.
MoE is under construction and will use different logic (see below).
MVIB is not suitable for imputation.

Task 2 Survival Time Prediction:

A file exists in the data_preprocessing folder that creates splits of the data to use in this task.
Implementation will be based on the Momix implementation, using the R survival package.
Currently under construction
There are also files left from cancer stage classification in the CGAE and MVAE folders.

Implementation details

Multi-Omics Factor Analysis V2 (MOFA+)

The implementation of MOFA+ is the one provided by the MOFA+ project. The code allows the model to be trained in pure Python. The trained model is saved to a .hdf5 file in the appropriate directory. The model's W and Z matrices (see documentation) then have to be fetched using R. The Python library rpy2 takes care of that, but in case of issues a MOFA_downstream.R file is included. Some notes:

The model is trained on the training set plus the validation set.
Multiplying new data (Y) from a test set by the Moore-Penrose inverse (pseudoinverse) of the W matrix gives the corresponding Z for that test set.
This is done for both omics; each omic's Z is then used to impute the other omic's Y matrix.
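The pseudoinverse step in the notes above can be sketched with NumPy as follows. This is a minimal illustration, not the repository's code: it assumes the convention Y ≈ Z Wᵀ with W of shape (features × factors), centered data, and no intercepts. The function names and the synthetic data are made up for the example.

```python
import numpy as np


def infer_factors(Y, W):
    """Estimate Z (samples x factors) from Y (samples x features).

    Assumes Y ~= Z @ W.T, so Z ~= Y @ pinv(W).T, where W is (features x factors).
    """
    return Y @ np.linalg.pinv(W).T


def impute_other_omic(Y_src, W_src, W_tgt):
    """Impute the target omic from the source omic via the shared factors."""
    Z = infer_factors(Y_src, W_src)
    return Z @ W_tgt.T


# Synthetic, noise-free check: two omics generated from the same factors.
rng = np.random.default_rng(0)
n_samples, n_factors = 5, 3
W1 = rng.normal(size=(20, n_factors))  # loadings of omic 1 (hypothetical)
W2 = rng.normal(size=(15, n_factors))  # loadings of omic 2 (hypothetical)
Z_true = rng.normal(size=(n_samples, n_factors))
Y1 = Z_true @ W1.T
Y2 = Z_true @ W2.T

# Impute omic 2 for the "test" samples using only omic 1.
Y2_hat = impute_other_omic(Y1, W1, W2)
```

In this noise-free setting the imputation is exact. With real data the recovered Z is a least-squares estimate, so the imputed matrix is only an approximation.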

Product-of-Experts

The MVAE code was originally adapted from the Product-of-Experts MVAE as developed by Wu and Goodman.

Their implementation has remained mostly intact, for example their Product-of-Experts function in model.py and their test/training methods.
The VAE architecture was changed to a more standard vanilla VAE architecture, based on Pytorch-VAE.
The loss function in train.py was rewritten to work more like the one in Pytorch-VAE. The original loss function uses binary cross-entropy.
The code has since been extended to also support a Mixture-of-Experts approach, which reuses all the same code except for the actual combining of Gaussians. BEWARE: this code was written according to what I thought was correct, but it is not fully backed by a specific paper.
To use Mixture-of-Experts, see next section.
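For readers unfamiliar with the combining step mentioned above, a product of Gaussian experts can be sketched as a precision-weighted average. The NumPy version below is an illustrative sketch, not the code in model.py: the function name, the array layout, and the optional standard-normal prior expert are assumptions for the example.

```python
import numpy as np


def product_of_experts(mu, logvar, include_prior=True, eps=1e-8):
    """Combine Gaussian experts into one Gaussian.

    mu, logvar: arrays of shape (n_experts, batch, latent_dim).
    Returns the mean and log-variance of the product distribution.
    """
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    if include_prior:
        # Prepend a standard-normal expert (mean 0, variance 1).
        prior = np.zeros((1,) + mu.shape[1:])
        mu = np.concatenate([prior, mu], axis=0)
        logvar = np.concatenate([prior, logvar], axis=0)
    precision = 1.0 / (np.exp(logvar) + eps)   # inverse variances
    joint_var = 1.0 / precision.sum(axis=0)    # precisions add up
    joint_mu = (mu * precision).sum(axis=0) * joint_var
    return joint_mu, np.log(joint_var)


# Two unit-variance experts with means 0 and 2, one sample, one latent dimension.
mu = np.array([[[0.0]], [[2.0]]])
logvar = np.zeros_like(mu)
poe_mu, poe_logvar = product_of_experts(mu, logvar, include_prior=False)
```

A mixture-of-experts variant would instead sample from (or average over) the individual expert distributions rather than multiplying their densities, which is the part that differs between the two approaches.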

Mixture-of-Experts (UNDER CONSTRUCTION)

Rather than relying on an in-house implementation that may not hold up to scrutiny, this approach will be adapted from MMVAE. Some work has been done on reusing their logic in the MVAE model.py file, but it is not yet finalized.

MVIB

Based on the following paper.

CGAE

Some to-do's are listed in the Drive doc, concerning implementation of the MultiOmicsVAE in the nets.py file. The CGAE model is inspired by this paper.

File Structure

.
├── LICENSE
├── README.md
├── .gitignore
├── configs
│       ├── geme.yaml
│       ├── gegcn.yaml
│       ├── gcnme.yaml
│       ├── brca2_gegcn.yaml
│       └── etc.
├── data/
├── environment.yml
├── results/
├── run.py
└── src
    ├── baseline/
    ├── CGAE/
    ├── data_preprocessing/
    ├── MOFA2/
    ├── MVAE/
    ├── MVIB/
    ├── util/
    ├── nets.py
    └── survival.py

Authors

- Stavros Makrodimitris         S.Makrodimitris@tudelft.nl
- Tamim Abdelaal                T.R.M.Abdelaal-1@tudelft.nl
- Bram Pronk                    I.B.Pronk@student.tudelft.nl
- Marcel Reinders               M.J.T.reinders@tudelft.nl

Citations

Argelaguet, R., Velten, B., Arnol, D., Dietrich, S., Zenz, T., Marioni, J. C., Buettner, F., Huber, W., and Stegle, O., Multi-Omics Factor Analysis-a framework for unsupervised integration of multi-omics data sets, Mol Syst Biol, 14(6), 2018.
Argelaguet, R., Arnol, D., Bredikhin, D., Deloro, Y., Velten, B., Marioni, J. C., and Stegle, O., MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data, Genome Biology, 21(1), 111, 2020.
This section will be expanded in the future.
