MOGP (Multi-Omic Multi-output Gaussian Processes) is a model for the integration of multi-omics data. MOGP:
- learns the nonlinear structure of the data by combining a neural network layer with a Gaussian process layer,
- in the single-view version, learns separate latent representations for cells and genes, and
- in the multi-view version, learns a shared representation of cells and separate representations of features for each view in an interpretable manner.
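One common way to combine a neural network layer with a Gaussian process layer is to evaluate the GP kernel on the network's outputs instead of on the raw inputs (often called a deep kernel). The sketch below illustrates that idea in plain Python. It is not the repository's code: the single tanh layer, the RBF kernel, and all weights and inputs are assumptions made up for illustration, and MOGP itself may compose its layers differently.

```python
import math

def feature_map(x, weights, biases):
    """One dense layer with tanh activation: maps an input vector to hidden features."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]

def rbf(u, v, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return variance * math.exp(-0.5 * sq_dist / lengthscale ** 2)

def deep_kernel_matrix(inputs, weights, biases):
    """GP covariance computed on the network's outputs rather than on the raw inputs."""
    feats = [feature_map(x, weights, biases) for x in inputs]
    return [[rbf(fi, fj) for fj in feats] for fi in feats]

# Toy values, made up for illustration only (hypothetical, not from the paper).
WEIGHTS = [[0.5, -1.0], [1.5, 0.25]]
BIASES = [0.0, 0.1]
INPUTS = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

K = deep_kernel_matrix(INPUTS, WEIGHTS, BIASES)
```

Identical inputs (the first and third rows here) receive maximal covariance, while distinct inputs are compared in the learned feature space, which is where the nonlinearity enters.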
The Running_MOGP.ipynb file is the main entry point for loading the data and performing the inference. This file is located in the ./experiments/CITEseq/RNA folder.
In this file, you can see the cell and gene embeddings learned by MOGP on the RNA-seq data of the sampled CITE-seq dataset. Running MOGP on the sampled data takes about 1 hour for 200 iterations.
Then, to see the Gene Relevance Map results, run the Running_GeneRelevanceMAP.ipynb notebook, located in the same ./experiments/CITEseq/RNA folder.
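If you prefer to run the two notebooks without opening them, they could be executed in order from the repository root with jupyter nbconvert. The sketch below only builds the commands by default. It assumes that jupyter and nbconvert are installed in the active environment, which this README does not state, and the helper names nbconvert_command and run_all are hypothetical.

```python
import subprocess
from pathlib import Path

# Folder and notebook names as given in this README.
NOTEBOOK_DIR = Path("experiments/CITEseq/RNA")
NOTEBOOKS = ["Running_MOGP.ipynb", "Running_GeneRelevanceMAP.ipynb"]

def nbconvert_command(notebook):
    """Build a `jupyter nbconvert` call that executes a notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        # Disable the per-cell timeout: a run is reported to take about 1 hour.
        "--ExecutePreprocessor.timeout=-1",
        str(notebook),
    ]

def run_all(dry_run=True):
    """Run the notebooks in order; with dry_run=True only return the commands."""
    commands = [nbconvert_command(NOTEBOOK_DIR / name) for name in NOTEBOOKS]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing notebook
    return commands

COMMANDS = run_all(dry_run=True)
```

Calling run_all(dry_run=False) would actually execute both notebooks, MOGP first and the Gene Relevance Map second.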
We suggest using conda to manage your environments. Follow these steps to get MOGP up and running!
- Create a Python environment in conda:
conda env create -f environment_MOGP.yml
- Activate the freshly created environment:
source activate MOGP-GPFLUX
- Create a second Python environment in conda from seaCell.yml:
conda env create -f seaCell.yml
- Activate the freshly created environment:
source activate seaCell
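After the steps above, a quick way to confirm that both environments exist is to parse the output of conda env list --json. This is a minimal sketch: the environment names are taken from the activate commands above, the helper names are hypothetical, and the sample JSON is made up so the parsing can be tried without conda installed.

```python
import json
import subprocess
from pathlib import Path

# Names taken from the `source activate` commands in this README.
EXPECTED_ENVS = ("MOGP-GPFLUX", "seaCell")

def env_names(conda_json):
    """Extract environment names from the output of `conda env list --json`."""
    return {Path(p).name for p in json.loads(conda_json)["envs"]}

def missing_envs(conda_json, expected=EXPECTED_ENVS):
    """Return the expected environments that conda does not list."""
    names = env_names(conda_json)
    return [name for name in expected if name not in names]

def query_conda():
    """Ask the local conda installation for its environments (needs conda on PATH)."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Made-up sample output, for illustration only.
SAMPLE = json.dumps({"envs": ["/opt/conda", "/opt/conda/envs/MOGP-GPFLUX"]})
MISSING = missing_envs(SAMPLE)
```

On a real machine, missing_envs(query_conda()) should return an empty list once both environments have been created.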
This paper is under review.
All figures presented in this paper are available in the ./experiments folder for both the PBMC and CITE-seq datasets.
- PBMC 10k Dataset: We utilized the PBMC 10k dataset from 10x Genomics, which includes paired single-cell multiome ATAC and gene expression sequencing. This dataset comprises 11,909 cells, 36,601 genes, and 134,726 peaks. The dataset can be accessed from the following link: PBMC 10k.
- 5k PBMC CITE-seq Dataset: We also utilized the 5k PBMC CITE-seq dataset, which provides transcriptome-wide measurements for single cells, including gene expression data and surface protein levels for several dozen proteins. This dataset includes 5,247 cells, 33,538 genes, and 32 proteins. The dataset is available from CITE-seq.
Details about our preprocessing of these datasets can be found in the ./data folder.