**The following tutorial demonstrates how to use EpiCarousel to identify metacells in a single-cell chromatin accessibility dataset of human bone marrow mononuclear cells (BMMC) [(Luecken et al., 2021)](https://openreview.net/forum?id=gN35BGa1Rt).**




Import EpiCarousel.

In [1]:
import epicarousel

Set a random seed to ensure reproducibility.

In [2]:
epicarousel.core.setup_seed(1)
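This tutorial does not show what `epicarousel.core.setup_seed` does internally. Seeding helpers of this kind usually reset every random number generator a pipeline draws from, so that repeated runs give identical results. The sketch below illustrates that pattern under that assumption. `setup_seed_sketch` is a hypothetical name and is not EpiCarousel's implementation.

```python
import random


def setup_seed_sketch(seed: int) -> None:
    """Hypothetical seeding helper; not EpiCarousel's actual setup_seed."""
    # Seed Python's built-in random module.
    random.seed(seed)
    # Also seed NumPy's legacy global RNG when NumPy is installed.
    try:
        import numpy as np
    except ImportError:
        return
    np.random.seed(seed)


setup_seed_sketch(1)
first = [random.random() for _ in range(3)]
setup_seed_sketch(1)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Any other library that keeps its own generator would need its own seeding call inside the same helper.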

In [3]:
# User specified parameters

## Core parameters
result_base = '/home/metacell/data/metacell/EpiCarousel/example'
data_name = 'BMMC'
data_dir = '/home/metacell/data/metacell/source_data/source_data_shuffled_seed_1/%s_shuffled.h5ad'%data_name
if_bi = 1
if_mc_bi = 1
threshold = 0
filter_rate = 0.01
chunk_size = 10000
carousel_resolution = 10
step = 20
threads = 8
mc_mode = 'average'
index = 'cell_type'
carousel = epicarousel.core.Carousel(data_name=data_name,
                                     data_dir=data_dir,
                                     if_bi=if_bi,
                                     if_mc_bi=if_mc_bi,
                                     threshold=threshold,
                                     filter_rate=filter_rate,
                                     chunk_size=chunk_size,
                                     carousel_resolution=carousel_resolution,
                                     base=result_base,
                                     step=step,
                                     threads=threads,
                                     mc_mode=mc_mode,
                                     index=index
                                    )
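The tutorial does not document `if_bi`, `threshold`, or `filter_rate`. One plausible reading of the names, assumed here and not confirmed by the source, is that `if_bi = 1` binarizes the counts (entries above `threshold` become 1) and `filter_rate = 0.01` drops regions accessible in fewer than 1% of cells. The toy example below sketches only that reading; `binarize` and `filter_regions` are hypothetical helpers.

```python
from typing import List, Sequence


def binarize(counts: Sequence[Sequence[float]], threshold: float = 0) -> List[List[int]]:
    """Hypothetical: set each entry to 1 if it exceeds `threshold`, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in counts]


def filter_regions(binary: Sequence[Sequence[int]], filter_rate: float) -> List[int]:
    """Hypothetical: indices of regions (columns) open in at least `filter_rate` of cells."""
    n_cells = len(binary)
    n_regions = len(binary[0]) if binary else 0
    keep = []
    for j in range(n_regions):
        n_open = sum(row[j] for row in binary)
        if n_open / n_cells >= filter_rate:
            keep.append(j)
    return keep


# Toy cell-by-region count matrix: 4 cells x 4 regions.
counts = [
    [0, 2, 0, 1],
    [0, 1, 0, 0],
    [0, 3, 1, 0],
    [0, 0, 0, 0],
]
binary = binarize(counts, threshold=0)
print(binary)  # [[0, 1, 0, 1], [0, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
print(filter_regions(binary, filter_rate=0.3))  # [1]
```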

In [4]:
# Create output directories.
carousel.make_dirs()

Project output directory: /home/metacell/data/metacell/EpiCarousel/example/EpiCarousel_export


In [5]:
# Partition the initial count matrix into chunks of at most chunk_size cells.
carousel.data_split()

Start spliting data!


7it [00:10,  1.47s/it]

Finish spliting data.
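The BMMC dataset has 69,249 cells (the clustering output below indexes cells 0 to 69248). With `chunk_size = 10000`, the log above reports 7 iterations, and the metacell step later prints one chunk of 9,249 cells. A simple contiguous split reproduces these numbers. Whether `data_split` splits contiguously is an assumption; the input file name (`..._shuffled.h5ad`) suggests the cells were shuffled beforehand, in which case contiguous chunks would be random subsets. `chunk_bounds` is a hypothetical helper.

```python
from typing import List, Tuple


def chunk_bounds(n_cells: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split row indices 0..n_cells-1 into contiguous [start, stop) chunks."""
    return [(start, min(start + chunk_size, n_cells))
            for start in range(0, n_cells, chunk_size)]


bounds = chunk_bounds(69249, 10000)
print(len(bounds))                    # 7
print(bounds[-1])                     # (60000, 69249)
print(bounds[-1][1] - bounds[-1][0])  # 9249
```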





In [6]:
# Identify metacells.
carousel.identify_metacells()

/prog/cabins/metacell/metacell/CAROUSEL/reproduction/EpiCarousel/EpiCarousel_modify_pca_float32_int8
/home/metacell/.cache/Python-Eggs/epicarousel-0.0.1-py3.10.egg-tmp/epicarousel
/home/metacell/.cache/Python-Eggs/epicarousel-0.0.1-py3.10.egg-tmp/epicarousel
Start 1!
Start 2!
Start 3!
Start 4!
Start 5!
Start 6!
Start 7!
AnnData object with n_obs × n_vars = 10000 × 66330
    var: 'n_cells'
    uns: 'pca', 'neighbors'
    obsm: 'X_pca'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'
Finish 5!
AnnData object with n_obs × n_vars = 10000 × 65143
    var: 'n_cells'
    uns: 'pca', 'neighbors'
    obsm: 'X_pca'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'
Finish 3!
AnnData object with n_obs × n_vars = 9249 × 65342
    var: 'n_cells'
    uns: 'pca', 'neighbors'
    obsm: 'X_pca'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'
Finish 7!
AnnData object with n_obs × n_vars = 10000 × 65467
    var: 'n_cells'
    uns: 'pca', 'neighbors'
    obsm: 'X_pca'
    varm: 'PCs'
 

In [7]:
# Aggregate metacells from each chunk.
carousel.merge_metacells()

Start aggregating metacells.


100%|█████████████████████████████████████████████████████████████| 6/6 [00:02<00:00,  2.69it/s]


Finish aggregating metacells.


... storing 'cells' as categorical
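The parameter `mc_mode = 'average'` suggests that each metacell's accessibility profile is the mean of its member cells' profiles, and `if_mc_bi = 1` suggests the metacell matrix is then binarized. Both readings are assumptions. The sketch below averages the rows of a toy cell-by-region matrix given a hypothetical metacell assignment; `aggregate_average` is not an EpiCarousel function.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def aggregate_average(rows: Sequence[Sequence[float]],
                      assignment: Sequence[int]) -> Dict[int, List[float]]:
    """Average the rows of a cell-by-region matrix within each metacell."""
    sums: Dict[int, List[float]] = {}
    sizes: Dict[int, int] = defaultdict(int)
    for row, mc in zip(rows, assignment):
        if mc not in sums:
            sums[mc] = [0.0] * len(row)
        for j, value in enumerate(row):
            sums[mc][j] += value
        sizes[mc] += 1
    return {mc: [s / sizes[mc] for s in total] for mc, total in sums.items()}


cells = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
assignment = [0, 0, 1, 1]  # hypothetical metacell label for each cell
profiles = aggregate_average(cells, assignment)
print(profiles)  # {0: [1.0, 0.5, 0.5], 1: [0.0, 0.0, 0.5]}

# Assumed if_mc_bi behaviour: any nonzero average becomes 1.
print({mc: [1 if v > 0 else 0 for v in p] for mc, p in profiles.items()})
# {0: [1, 1, 1], 1: [0, 0, 1]}
```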


In [8]:
# Preprocess metacell-by-region matrix.
carousel.metacell_preprocess()

Start metacell preprocessing.
AnnData object with n_obs × n_vars = 6924 × 92743
    obs: 'which_fold', 'cells'
    var: 'feature_types', 'name', 'n_cells'
    uns: 'pca', 'neighbors'
    obsm: 'X_pca'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'
Finish metacell preprocessing.
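The preprocessed matrix holds 6,924 metacells, roughly one per 10 of the 69,249 input cells, which matches `carousel_resolution = 10`. The tutorial does not state the exact rule. The arithmetic below, with chunk sizes derived from 69,249 cells and `chunk_size = 10000`, shows that rounding down within each chunk reproduces the reported total; treat it as a consistency check, not EpiCarousel's documented formula.

```python
# Six full chunks plus a final chunk of 9,249 cells.
chunk_sizes = [10000] * 6 + [9249]
carousel_resolution = 10

# Assumed rule: one metacell per carousel_resolution cells, rounded down per chunk.
metacells_per_chunk = [n // carousel_resolution for n in chunk_sizes]
print(metacells_per_chunk)       # [1000, 1000, 1000, 1000, 1000, 1000, 924]
print(sum(metacells_per_chunk))  # 6924
```

Rounding the whole dataset at once (69,249 // 10 = 6,924) gives the same total here, so these numbers alone cannot distinguish the two rules.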


In [9]:
# Cluster using Dleiden, Dlouvain, Cleiden and Clouvain clustering strategies.
carousel.metacell_data_clustering()

Start metacell clustering.
      GEX_pct_counts_mt GEX_n_counts GEX_n_genes GEX_size_factors GEX_phase  \
0              0.431034       1624.0        1114         0.699399         S   
1                   0.0       1520.0         987         0.397154       G2M   
2              0.236726       2957.0        1477         0.831015       G2M   
3                   0.0       1252.0         882         0.651727       G2M   
4                   0.0        733.0         576         0.214664       G2M   
...                 ...          ...         ...              ...       ...   
69244          0.336323       1784.0        1252         1.438631         S   
69245          0.120096       2498.0        1646         0.807992       G2M   
69246          0.217297       2301.0        1384         0.636045       G2M   
69247          0.230017       1739.0        1260         1.510654         S   
69248          1.823708        658.0         518         0.220219         S   

      ATAC_nCount_peaks 

In [10]:
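# Score each clustering strategy (AMI, ARI, NMI, homogeneity, completeness, V-measure, FMI) and report matrix sparsity.
carousel.result_comparison()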

AMI:0.690	ARI:0.564	NMI:0.690	Homo:0.693	CS:0.687	V-measure:0.690	FMI:0.602	
AMI:0.666	ARI:0.486	NMI:0.666	Homo:0.711	CS:0.627	V-measure:0.666	FMI:0.532	
AMI:0.687	ARI:0.521	NMI:0.687	Homo:0.700	CS:0.675	V-measure:0.687	FMI:0.562	
AMI:0.676	ARI:0.473	NMI:0.677	Homo:0.694	CS:0.660	V-measure:0.677	FMI:0.519	


... storing 'mc_celltype' as categorical


   AMI_Dlouvain  ARI_Dlouvain  NMI_Dlouvain  Homo_Dlouvain  CS_Dlouvain  \
0      0.689653      0.564017      0.689989       0.693449     0.686564   

   Vms_Dlouvain  FMI_Dlouvain  AMI_Dleiden  ARI_Dleiden  NMI_Dleiden  ...  \
0      0.689989      0.602216     0.665817     0.485746     0.666321  ...   

   ARI_Cleiden  NMI_Cleiden  Homo_Cleiden  CS_Cleiden  Vms_Cleiden  \
0     0.473451     0.676535      0.693656    0.660239     0.676535   

   FMI_Cleiden  origin_sparsity  carousel_sparsity  C-o_sparsity  C/o_sparsity  
0     0.518872         0.030786           0.162832      0.132047       5.28923  

[1 rows x 32 columns]


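In the row above, `C-o_sparsity` equals `carousel_sparsity` minus `origin_sparsity` (0.162832 - 0.030786 ≈ 0.132047) and `C/o_sparsity` is their ratio (≈ 5.29). Because the metacell value is the larger one, these "sparsity" values behave like the fraction of nonzero entries; that reading is an assumption. The sketch below computes that fraction and the adjusted Rand index (ARI) from its standard contingency-table formula. `scikit-learn`'s `adjusted_rand_score` is the usual library implementation; the tutorial does not say which implementation EpiCarousel uses, and both helper names are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def nonzero_fraction(matrix: Sequence[Sequence[float]]) -> float:
    """Fraction of nonzero entries (assumed meaning of the *_sparsity columns)."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for v in row if v != 0)
    return nonzero / total


def adjusted_rand_index(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index (Hubert and Arabie) computed from the contingency table."""
    n = len(labels_true)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


origin = [[1, 0, 0, 0], [0, 0, 0, 0]]  # 1 of 8 entries nonzero
metacell = [[0.5, 0.5, 0, 0]]          # 2 of 4 entries nonzero
o, c = nonzero_fraction(origin), nonzero_fraction(metacell)
print(o, c, c - o, c / o)  # 0.125 0.5 0.375 4.0

print(round(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]), 3))  # -0.5
```

On a SciPy sparse `adata.X` with no explicitly stored zeros, the same fraction is `adata.X.nnz / (adata.n_obs * adata.n_vars)`.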


In [11]:
# Remove intermediate files.
carousel.delete_dirs()

**Metacells identified by EpiCarousel can be seamlessly integrated into widely used scCAS data analysis workflows using the AnnData object stored in `carousel.mc_adata`.**

In [12]:
carousel.mc_adata

AnnData object with n_obs × n_vars = 6924 × 92743
    obs: 'which_fold', 'cells', 'Dleiden', 'Dlouvain', 'leiden', 'Cleiden', 'louvain', 'Clouvain', 'purity', 'mc_celltype'
    var: 'feature_types', 'name', 'n_cells'
    uns: 'pca', 'neighbors', 'leiden', 'louvain'
    obsm: 'X_pca'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'
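`mc_adata.obs` includes `purity` and `mc_celltype` columns. A common definition in metacell studies, assumed here and not confirmed by this tutorial, labels each metacell with the majority cell type of its member cells and sets purity to the fraction of members carrying that type. The sketch below applies that definition to hypothetical member lists; `metacell_purity` is not an EpiCarousel function.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def metacell_purity(cell_types: Sequence[Hashable]) -> Tuple[Hashable, float]:
    """Return the majority cell type of one metacell and the fraction of cells with it."""
    # most_common breaks ties by first occurrence.
    label, count = Counter(cell_types).most_common(1)[0]
    return label, count / len(cell_types)


# Hypothetical metacells of 10 cells each.
members: List[List[str]] = [
    ["CD14+ Mono"] * 9 + ["CD16+ Mono"],
    ["NK"] * 5 + ["CD8+ T"] * 5,
]
for cells in members:
    print(metacell_purity(cells))
# ('CD14+ Mono', 0.9)
# ('NK', 0.5)
```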