MLO-lab/MOMO-GP


MOMO-GP

MOMO-GP (Multi-Omic Multi-output Gaussian Processes) is a model for integrating multi-omics data. It:

  • learns the nonlinear structure of the data by combining a neural network layer with a Gaussian process layer,
  • in the single-view version, learns separate latent representations for cells and genes, and
  • in the multi-view version, learns a shared representation of cells and separate representations of the features for each view, in an interpretable manner.
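
To make the cell/gene factorization of the single-view case concrete, the sketch below builds a multi-output GP covariance as the Kronecker product of a gene kernel and a cell kernel, each evaluated on its own latent coordinates. This is a generic textbook construction in plain Python, not the code in this repository: the RBF kernel, the toy latent values, and the helper names `rbf_kernel` and `kronecker` are assumptions for illustration, and the neural network layer is left out.

```python
import math

def rbf_kernel(points, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix over a list of latent points."""
    def k(a, b):
        sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
        return variance * math.exp(-0.5 * sq_dist / lengthscale ** 2)
    return [[k(a, b) for b in points] for a in points]

def kronecker(A, B):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]

# Toy latent coordinates (hypothetical values): 3 cells and 2 genes in 2-D.
cell_latents = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]]
gene_latents = [[0.0, 1.0], [1.0, 0.0]]

K_cells = rbf_kernel(cell_latents)  # 3 x 3, similarity between cells
K_genes = rbf_kernel(gene_latents)  # 2 x 2, similarity between genes

# Covariance over every (gene, cell) entry of the expression matrix:
# K_full[g * n_cells + c][g2 * n_cells + c2] = K_genes[g][g2] * K_cells[c][c2]
K_full = kronecker(K_genes, K_cells)  # 6 x 6
```

In this construction, two entries of the expression matrix are strongly correlated only when both their cells and their genes lie close together in the respective latent spaces, which is what makes separate cell and gene embeddings readable on their own.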

Basic usage

The Running_MOGP.ipynb notebook, located in the ./experiments/CITEseq/RNA folder, is the main entry point for loading the data and performing inference. It shows the cell and gene embeddings that MOGP learns from the RNA-seq data of the sampled CITE-seq dataset. Running MOGP on the sampled data takes about 1 hour for 200 iterations.

Then, to see the Gene Relevance Map results, run the Running_GeneRelevanceMAP.ipynb notebook, located in the same ./experiments/CITEseq/RNA folder.
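
To run both notebooks without opening the Jupyter interface, one option is `jupyter nbconvert --execute`. The sketch below only builds and prints the commands by default. It assumes that `nbconvert` is installed, that the appropriate conda environment is already active, and that it is run from the repository root; the helper names `nbconvert_command` and `run_all` and the `executed_` output prefix are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List

# Notebook folder and file names as given in this README.
NOTEBOOK_DIR = Path("experiments") / "CITEseq" / "RNA"
NOTEBOOKS = ["Running_MOGP.ipynb", "Running_GeneRelevanceMAP.ipynb"]

def nbconvert_command(notebook: Path) -> List[str]:
    """Build a command that executes a notebook and saves a copy with outputs."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        # Disable the per-cell timeout, since training can take about an hour.
        "--ExecutePreprocessor.timeout=-1",
        "--output", f"executed_{notebook.name}",
        str(notebook),
    ]

def run_all(dry_run: bool = True) -> None:
    """Process the notebooks in order; with dry_run=True only print the commands."""
    for name in NOTEBOOKS:
        cmd = nbconvert_command(NOTEBOOK_DIR / name)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` executes the notebooks for real; `check=True` stops the sequence if the first notebook fails, so the Gene Relevance Map notebook is not run on missing results.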

Installation

We suggest using conda to manage your environments. Follow these steps to get MOGP up and running!

  1. Create the MOGP environment with conda:
conda env create -f environment_MOGP.yml
  2. Activate the freshly created environment:
source activate MOGP-GPFLUX
  3. Create the seaCell environment with conda:
conda env create -f seaCell.yml
  4. Activate the freshly created environment:
source activate seaCell

Citation

The paper describing MOMO-GP is currently under review.

Results in the paper

All figures presented in the paper are available in the ./experiments folder for both the PBMC and CITE-seq datasets.

  • PBMC 10k dataset. We used the PBMC 10k dataset from 10x Genomics, which includes paired single-cell multiome ATAC and gene expression sequencing. It comprises 11,909 cells, 36,601 genes, and 134,726 peaks. The dataset can be accessed from the following link: PBMC 10k.

  • 5k PBMC CITE-seq dataset. We also used the 5k PBMC CITE-seq dataset, which provides transcriptome-wide measurements for single cells, including gene expression data and surface protein levels for several dozen proteins. It includes 5,247 cells, 33,538 genes, and 32 proteins. The dataset is available from CITE-seq.

Details about our preprocessing of these datasets can be found in the ./data folder.
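
After downloading the data, a quick check against the dimensions listed above can catch a wrong file or a transposed matrix. The sketch below is a plain-Python helper, not part of this repository. The cells-by-features orientation, the dictionary layout, and the function name `check_shape` are assumptions, and matrices that were filtered or subsampled during preprocessing will legitimately differ from these raw counts.

```python
# Raw dataset dimensions as listed in this README.
EXPECTED = {
    "PBMC 10k": {"cells": 11909, "genes": 36601, "peaks": 134726},
    "5k PBMC CITE-seq": {"cells": 5247, "genes": 33538, "proteins": 32},
}

def check_shape(dataset, modality, shape):
    """Return True if a (cells, features) shape matches the listed dimensions."""
    dims = EXPECTED[dataset]
    n_cells, n_features = shape
    return n_cells == dims["cells"] and n_features == dims[modality]

# Example: the RNA matrix of the CITE-seq dataset, with cells as rows.
print(check_shape("5k PBMC CITE-seq", "genes", (5247, 33538)))  # True
# A transposed matrix fails the check.
print(check_shape("5k PBMC CITE-seq", "genes", (33538, 5247)))  # False
```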
