Authors: Ted Verhey, Sorana Morrissy
Contributors: Hyojin Song, Aaron Gillmor, Gurveer Gill, Courtney Hall
mosaicMPI is a Python package for mosaic integration of bulk, single-cell, and spatial expression data through program-level integration. Programs are first discovered using consensus non-negative matrix factorization (cNMF) across multiple ranks, then integrated using a flexible network-based approach that groups similar programs together across resolutions and datasets. Program communities are then interpreted using sample/cell metadata and gene set analyses. Integrative program communities enable metadata transfer across datasets.
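To make the idea of program-level integration more concrete, here is a minimal sketch in plain Python. It does not use or reproduce the mosaicMPI API. It assumes that programs are already available as gene-to-weight mappings, compares each pair of programs only on the genes they share, and groups programs whose Pearson correlation exceeds a cutoff by taking connected components of the resulting network. The program names, the toy weights, the `min_shared` and `threshold` parameters, and the use of connected components as the grouping step are all illustrative assumptions; the package's own similarity measure and community detection may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / sqrt(var_x * var_y)


def program_similarity(p, q, min_shared=3):
    """Correlate two programs using only the genes present in both."""
    shared = sorted(set(p) & set(q))
    if len(shared) < min_shared:
        return 0.0  # too little overlap to compare (hypothetical rule)
    return pearson([p[g] for g in shared], [q[g] for g in shared])


def group_programs(programs, threshold=0.8):
    """Group programs into connected components of the similarity network."""
    parent = {name: name for name in programs}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(programs, 2):
        if program_similarity(programs[a], programs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in programs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Toy programs from two hypothetical datasets with mismatched gene sets:
# the "bulk" dataset lacks gene E, and each "sc" program lacks one bulk gene.
programs = {
    "bulk_k2_p1": {"A": 5.0, "B": 4.0, "C": 0.5, "D": 0.1},
    "bulk_k2_p2": {"A": 0.2, "B": 0.1, "C": 4.0, "D": 5.0},
    "sc_k3_p1": {"A": 4.5, "B": 4.2, "C": 0.3, "E": 2.0},
    "sc_k3_p2": {"B": 0.1, "C": 3.8, "D": 4.9, "E": 0.1},
}

communities = group_programs(programs)
print(communities)
```

Because similarity is computed per pair on shared genes only, neither dataset has to be subset to a common feature list up front, which is the property the feature list below refers to as mosaic integration.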
Here are just a few of the things that mosaicMPI does well:
- Identifies interpretable, non-negative programs at multiple resolutions
- Mosaic integration does not require subsetting features/genes to a shared or overdispersed subset
- Multi-omics integration without shared sample IDs
- Ideal for incremental integration (adding datasets one at a time) since deconvolution is performed independently on each dataset
- Integration performs well even when the datasets have mismatched features (e.g., microarray, RNA-Seq, proteomics) or sparsity (e.g., single-cell vs. bulk RNA-Seq and ATAC-Seq)
- Metadata transfer across datasets
- Command-line interface for rapid data exploration and Python interface for extensibility and flexibility
- Compatible with and tested on macOS, Windows, and Linux systems
- Memory usage depends on size and number of datasets
Install the package with conda:
# create an environment called mosaic and install
conda create -n mosaic -c conda-forge mosaicmpi
conda activate mosaic
For ssGSEA analysis, you will also need to install GSEApy into the same environment.
# if you have conda (macOS x86-64 and Linux only)
conda install -c bioconda gseapy
# Windows and macOS ARM64 (M1/M2 chips)
pip install gseapy
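After installing, it can be useful to confirm that both packages are visible from the active environment. The following is a small sketch using only the Python standard library. It assumes the import names are `mosaicmpi` and `gseapy`; the helper `check_modules` is a hypothetical name introduced here and is not part of either package.

```python
from importlib.util import find_spec


def check_modules(names):
    """Return a dict mapping each top-level module name to whether it can be found."""
    return {name: find_spec(name) is not None for name in names}


# Assumed import names; adjust if the installed packages use different ones.
status = check_modules(["mosaicmpi", "gseapy"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

Run this with the `mosaic` environment activated. A module reported as missing usually means the install command was run in a different environment.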
Read the documentation.
If you encounter errors while using mosaicMPI, browse the existing issues in the GitHub "Issues" tab or create a new one.