EpiClust

Single cell module detection for ATAC-seq, multiomics, and peak-gene linking

To install

You may need to create a CONDA environment as specified in epiclust.yml.

After dependencies are met, this sequence of commands installs the library.

git clone git@github.com:KellisLab/epiclust.git
cd epiclust && pip install .

Preprocessing

Make sure your peaks are in an AnnData (H5AD) file.

Using a Gene accessiblity matrix (e.g. as estimated by epiclust_gene_estimation.py) helps aggregate nearby peaks better than by alone. This helps even with multiomic ATAC+GEX: use pseudo triple-omic (Peaks, Gene Accessiblity, Gene Expression) So, for combining these:

from sklearn.feature_extraction.text import TfidfTransformer
import anndata
import scanpy as sc
### Use TF-IDF (non-logTF) on peak matrix
peaks.X = TfidfTransformer().fit_transform(peaks.X)
### Normalize peak matrix to 10000 counts per cell to facilitate integration with gene accessibility
sc.pp.normalize_total(peaks, target_sum=10000)
### Normalize gene accessibility to 10000 counts per cell
sc.pp.normalize_total(gacc, target_sum=10000)
### Normalize gene expression (if multi-omic GEX+ATAC) to 10000 counts per cell
sc.pp.normalize_total(gexp, target_sum=10000)
### Concatenate modalities
adata = anndata.concat({
  "Peaks": peaks,
  "Gene Expression": gexp, ### if multi-omic. else omit
  "Gene Accessibility": gacc
}, axis=1, label="feature_types", merge="same")
### Rename accessibility genes if expression genes are already present
adata.var_names_make_unique("-GAcc-")
### Log-scale
sc.pp.log1p(adata)
### Calculate QC metrics
sc.pp.calculate_qc_metrics(adata, inplace=True, percent_top=[])
### PCA
sc.pp.pca(adata, n_comps=100)

multi-omic workflow (RECOMMENDED)

Note that the n_neighbors parameter can be changed, as n_neighbors is the per-batch number of neighbors

import epiclust as ec
import numpy as np
import seaborn as sns
for power in np.linspace(0, 1, 5):
	print("Calculating power:", power)
	ec.fit(adata, power=power, margin="log1p_total_counts", batch="feature_types")
	ec.neighbors(adata, n_neighbors=5)

graphs = ["pow_%.2f" % power for power in np.linspace(0, 1, 5)]
ec.filter_var(adata, graphs, min_cells=3)
ec.leiden(adata, graphs, resolution=1., max_comm_size=2500, min_comm_size=3)
ec.infomap(adata, graphs, min_comm_size=3)
ec.umap(adata, graphs)
ax = sns.scatterplot(x=adata.varm["X_umap"][:, 0], y=adata.varm["X_umap"][:, 1], hue=adata.var["leiden"])

Peak-Gene / Peak-Peak / Gene-Gene linking sample workflow

gtf = ec.gtf.load_gtf("gencode.gtf.gz")
dw = ec.gene_distance.distance_weight_all(ec.gene_distance.peak_names_to_var(adata.var.index.values), gtf)
dw = dw.loc[dw["gene"].isin(adata.var.index.values), :]
links = {}
for power in np.linspace(0, 1, 5):
	print("Calculating power:", power)
	ec.fit(adata, power=power, margin="log1p_total_counts", batch="feature_types",
	       squared_correlation=True,
               covariates=["batch", "pmi", "log1p_total_counts"]) ### technical covariates in .obs
	links[power] = ec.linking(adata, dw["gene"].values, dw["peak"].values)

### TODO combine table

single-omic workflow (DEPRECATED except for RNA-seq)

import epiclust as ec
for power in np.linspace(0, 1, 5):
	print("Calculating power:", power)
	ec.fit(adata, power=power, margin="log1p_total_counts")
	ec.neighbors(adata, key_added="pow_%.2f" % power)

ec.filter_var(adata, ["pow_%.2f" % power for power in np.linspace(0, 1, 5)], min_cells=3)
ec.leiden(adata, ["pow_%.2f" % power for power in np.linspace(0, 1, 5)], resolution=1., max_comm_size=2500, min_comm_size=3)
ec.infomap(adata, ["pow_%.2f" % power for power in np.linspace(0, 1, 5)], min_comm_size=3)

wishlist

typing
documentation
translate to muon (multi-omic) and anndata (single-omic) framework
.umap() without filtering giant matrix
cleanup function that removes varm, uns ???
batched gene_estimation for lower memory footprint
iterative expansion to rest of features
centrality-based module naming
pygreat integration?

Name		Name	Last commit message	Last commit date
Latest commit History 145 Commits
.github/workflows		.github/workflows
epiclust		epiclust
scripts		scripts
.gitignore		.gitignore
.pylintrc		.pylintrc
LICENSE		LICENSE
README.org		README.org
epiclust.yml		epiclust.yml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EpiClust

To install

Preprocessing

multi-omic workflow (RECOMMENDED)

Peak-Gene / Peak-Peak / Gene-Gene linking sample workflow

single-omic workflow (DEPRECATED except for RNA-seq)

wishlist

About

Releases

Packages

Languages

License

KellisLab/epiclust

Folders and files

Latest commit

History

Repository files navigation

EpiClust

To install

Preprocessing

multi-omic workflow (RECOMMENDED)

Peak-Gene / Peak-Peak / Gene-Gene linking sample workflow

single-omic workflow (DEPRECATED except for RNA-seq)

wishlist

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages