# Single Cell Multiomics

Modalities:
- scRNA (Cells x Genes)
- scATAC (Cells x Peaks)

Axes:
- Cells
- Genes
- Peaks
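
Both modalities share the Cells axis, while Genes and Peaks each belong to one modality. The snippet below is a minimal sketch of that bookkeeping in plain Python; `MODALITIES` and `modalities_per_axis` are illustrative names, not part of GmGM's API.

```python
# Illustrative only: these names are hypothetical and not part of GmGM.
MODALITIES = {
    "scRNA": ("Cells", "Genes"),
    "scATAC": ("Cells", "Peaks"),
}


def modalities_per_axis(modalities):
    """Map each axis name to the list of modalities that contain it."""
    result = {}
    for name, axes in modalities.items():
        for axis in axes:
            result.setdefault(axis, []).append(name)
    return result


axis_map = modalities_per_axis(MODALITIES)
for axis, names in axis_map.items():
    print(f"{axis}: {', '.join(names)}")
```

An axis that appears in more than one modality (here, Cells) is one whose graph can draw on every modality that contains it.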

## Install Packages

```bash
# Create environment
conda create -n {YOUR ENVIRONMENT NAME} "conda-forge::python>=3.9,<3.12"
conda activate {YOUR ENVIRONMENT NAME}

# Install GmGM
pip install gmgm

# Install this example notebook's dependencies
pip install scikit-misc
conda install conda-forge::scanpy
conda install conda-forge::muon
conda install conda-forge::leidenalg
```

## Download Data

From [10x Genomics](https://www.10xgenomics.com/resources/datasets/10-k-human-pbm-cs-multiome-v-1-0-chromium-x-1-standard-2-0-0), download the "Filtered feature barcode matrix (HDF5)" file for the "10k Human PBMCs, Multiome v1.0, Chromium X" dataset (ID `10k_PBMC_Multiome_nextgem_Chromium_X`). The loading code below expects the file at `./data/10k_PBMC_Multiome_nextgem_Chromium_X_filtered_feature_bc_matrix.h5`.
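
Before running the notebook, it can help to confirm that the download landed where the loading step expects it. The following is a small sketch using only the standard library; `looks_like_hdf5` is a hypothetical helper, and the path is assumed from the loading cell later in this notebook.

```python
from pathlib import Path

# HDF5 format signature, normally found at byte offset 0 of the file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if `path` is a file that starts with the HDF5 signature."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Assumed location, matching the path used when the data is loaded.
DATA_PATH = "./data/10k_PBMC_Multiome_nextgem_Chromium_X_filtered_feature_bc_matrix.h5"
if not looks_like_hdf5(DATA_PATH):
    print(f"Expected an HDF5 file at {DATA_PATH}; check the download location.")
```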

## Setup Code

In [1]:
# ============ Parameters ============
# Number of edges to have in our output graphs (GmGM and KNN)
N_EDGES = 10
# Random state for reproducibility
RANDOM_STATE = 0
# Number of principal components to use when using PCA or approximate GmGM
N_COMPONENTS = 20

In [2]:
# Suppress noisy warnings raised from scanpy, umap, and tqdm
import warnings
warnings.filterwarnings("ignore", module="scanpy")
warnings.filterwarnings("ignore", module="umap")
warnings.filterwarnings("ignore", module="tqdm")

In [3]:
# Import GmGM
from GmGM import GmGM, Dataset

# `scanpy` is not a dependency of our algorithm, but we will use it for this example
import scanpy as sc
import muon as mu

# Other dependencies, already installed by either GmGM or scanpy
import anndata as ad
import mudata as md
import seaborn as sns
import dask.array as da

## Analyze

In [4]:
mdata: md.MuData = \
    mu.read_10x_h5("./data/10k_PBMC_Multiome_nextgem_Chromium_X_filtered_feature_bc_matrix.h5")
mdata

Added `interval` annotation for features from ./data/10k_PBMC_Multiome_nextgem_Chromium_X_filtered_feature_bc_matrix.h5


In [5]:
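# Log-transform the counts in both modalities (in place)
sc.pp.log1p(mdata["rna"])
sc.pp.log1p(mdata["atac"])
# Flag highly variable genes and peaks; GmGM uses these flags below
sc.pp.highly_variable_genes(mdata["rna"])
sc.pp.highly_variable_genes(mdata["atac"])
# Propagate the new per-modality annotations to the MuData container
mdata.update()
mdata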
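
`sc.pp.log1p` replaces each value x with log(1 + x) using the natural logarithm by default, so zeros stay at zero and large counts are compressed. A small illustration on made-up counts, using only the standard library:

```python
import math

# Made-up counts for illustration; not taken from the dataset.
counts = [0, 1, 10, 1000]
transformed = [math.log1p(x) for x in counts]

for raw, value in zip(counts, transformed):
    print(f"{raw:>5} -> {value:.3f}")
```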



In [6]:
# This dataset is very large so we will limit to highly variable genes and peaks
# This takes about a minute.
GmGM(
    mdata,
    centering_method="avg-overall",
    to_keep={
        "obs": N_EDGES,
        "rna-var": N_EDGES,
        "atac-var": N_EDGES,
    },
    threshold_method="rowwise-col-weighted",
    verbose=True,
    n_comps=N_COMPONENTS,
    use_highly_variable=True
)

Centering...
Calculating eigenvectors...
	by calculating left eigenvectors of concatenated matricizations...
Calculating eigenvalues...
@0: 29743623.02099571 (-549.4790042898392 + 29744172.5 + 0) ∆inf
Converged! (@5: 29743623.02099571)
Recomposing sparse precisions...
Converting back to MuData...
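
The `to_keep` argument controls how many edges are retained in each output graph. For intuition only, here is a plain row-wise top-k threshold on a small dense matrix. It is a simplified stand-in, not GmGM's `rowwise-col-weighted` rule, and `keep_top_k_per_row` is a hypothetical helper.

```python
def keep_top_k_per_row(matrix, k):
    """Keep the diagonal and the k largest-magnitude off-diagonal entries per row."""
    result = []
    for i, row in enumerate(matrix):
        # Rank off-diagonal column indices by absolute value, largest first.
        candidates = [j for j in range(len(row)) if j != i]
        candidates.sort(key=lambda j: abs(row[j]), reverse=True)
        kept = set(candidates[:k])
        result.append(
            [row[j] if (j == i or j in kept) else 0.0 for j in range(len(row))]
        )
    return result


# Toy 3x3 symmetric matrix, made up for illustration.
toy = [
    [2.0, -0.9, 0.1],
    [-0.9, 2.0, 0.4],
    [0.1, 0.4, 2.0],
]
sparse = keep_top_k_per_row(toy, k=1)
for row in sparse:
    print(row)
```

Thresholding each row independently can produce an asymmetric result (the third row keeps an edge to the second that the second row drops), which is one reason a real method may weigh rows and columns together.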


