**The following tutorial demonstrates how to use ASTER to estimate the number of cell types in a single-cell chromatin accessibility dataset of human hematopoietic cells ([Buenrostro et al., 2018](https://doi.org/10.1016/j.cell.2018.03.074)).**

Import ASTER and its dependencies.

In [1]:
import epiaster as aster
import scanpy as sc
import pandas as pd
import numpy as np

Load the AnnData object holding the `n_obs` × `n_vars` count matrix (rows correspond to cells and columns to features, which for chromatin accessibility data are genomic regions such as peaks), the true number of cell types, and the list of candidate numbers of cell types to search over.

In [2]:
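data_name = 'BoneMarrow'
# AnnData object holding the cell-by-feature count matrix
adata = sc.read(f'./{data_name}.h5ad')
# First row of the CSV: the true number of cell types, then the candidate values of k
k_search = pd.read_csv(f'./{data_name}_search.csv', header=None).iloc[0, :].values
true_k = k_search[0]
search_list = list(k_search[1:])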
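
Judging from how the search file is indexed above, it is assumed to hold a single row: the true number of cell types first, followed by the candidate values. A minimal sketch of that layout, using a made-up in-memory CSV (the values are illustrative, not the actual contents of `BoneMarrow_search.csv`):

```python
import io
import pandas as pd

# Hypothetical file contents: true k first, then the candidate numbers of cell types.
csv_text = "10,2,3,4,5,6,7,8,9,10,11,12\n"

k_search = pd.read_csv(io.StringIO(csv_text), header=None).iloc[0, :].values
true_k = int(k_search[0])                     # first entry: true number of cell types
search_list = [int(k) for k in k_search[1:]]  # remaining entries: candidates to search over

print(true_k)
print(search_list)
```

Converting to plain `int` is optional; it only keeps the printed list free of NumPy scalar wrappers.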

Show the number of cells of each cell type.

In [3]:
adata.obs.cell_type.value_counts()

CMP     502
GMP     402
HSC     347
LMPP    160
MPP     142
pDC     141
MEP     138
CLP      78
mono     64
UNK      60
Name: cell_type, dtype: int64
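
As a quick sanity check, the counts above can be tallied to get the number of annotated labels and the total number of cells. The sketch below re-enters the printed counts by hand instead of reading them from `adata`, so it runs without the dataset:

```python
import pandas as pd

# Counts copied from the output above.
counts = pd.Series(
    {"CMP": 502, "GMP": 402, "HSC": 347, "LMPP": 160, "MPP": 142,
     "pDC": 141, "MEP": 138, "CLP": 78, "mono": 64, "UNK": 60},
    name="cell_type",
)

n_types = len(counts)        # number of distinct labels, including UNK
n_cells = int(counts.sum())  # total number of cells
print(n_types, n_cells)      # prints: 10 2034
```

With the dataset loaded, `adata.obs.cell_type.nunique()` should give the same number of labels.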

Estimate the number of cell types using ASTER.

In [4]:
estimated_k = aster.estimate_k(adata, search_list)

Raw dataset shape:  (2034, 430107)
Dataset shape after preprocessing:  (2034, 100934)
Estimating by sum of squared distances...
100%|██████████| 19/19 [00:56<00:00,  2.97s/it]
Estimating by Davies-Bouldin score...
100%|██████████| 19/19 [00:37<00:00,  1.96s/it]
Estimating by silhouette coefficient...
100%|██████████| 19/19 [00:29<00:00,  1.54s/it]
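
The log above names three criteria: sum of squared distances, Davies-Bouldin score, and silhouette coefficient. The sketch below illustrates what these criteria measure, using scikit-learn's k-means on synthetic data. It is not ASTER's implementation; the data, the candidate range, and the use of `KMeans` are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic stand-in data: four well-separated 2-D clusters (not real scCAS data).
centers = np.array([[0, 0], [10, 10], [-10, 10], [10, -10]])
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=0)

candidates = range(2, 8)  # hypothetical search list
sse, db, sil = {}, {}, {}
for k in candidates:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse[k] = km.inertia_                         # sum of squared distances to centroids
    db[k] = davies_bouldin_score(X, km.labels_)  # lower is better
    sil[k] = silhouette_score(X, km.labels_)     # higher is better

k_by_db = min(db, key=db.get)     # candidate with the lowest Davies-Bouldin score
k_by_sil = max(sil, key=sil.get)  # candidate with the highest silhouette coefficient
print("Davies-Bouldin picks k =", k_by_db)
print("Silhouette picks k =", k_by_sil)
```

The sum of squared distances keeps decreasing as k grows, so it is usually read through an elbow in the curve rather than a minimum. How ASTER combines the three criteria into a single estimate is not shown in this tutorial.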


Evaluate the estimation performance.

In [5]:
est_error = estimated_k - true_k
est_deviation = est_error / true_k
print({'data_name':data_name, 'true_k':true_k, 
       'estimated_k':estimated_k, 'est_error':est_error, 'est_deviation':est_deviation})

{'data_name': 'BoneMarrow', 'true_k': 10, 'estimated_k': 10, 'est_error': 0, 'est_deviation': 0.0}
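
The two metrics above are a signed error and that error relative to the true value. A small sketch wrapping them in a helper (the function name `evaluate_k` and the second example are illustrative, not part of ASTER):

```python
def evaluate_k(estimated_k: int, true_k: int) -> dict:
    """Return the signed estimation error and its deviation relative to true_k."""
    est_error = estimated_k - true_k    # > 0: overestimate, < 0: underestimate
    est_deviation = est_error / true_k  # error as a fraction of the true value
    return {"est_error": est_error, "est_deviation": est_deviation}

print(evaluate_k(10, 10))  # exact match, as in the tutorial: error 0, deviation 0.0
print(evaluate_k(12, 10))  # hypothetical overestimate: error 2, deviation 0.2
```

Keeping the sign makes it possible to tell over- from underestimation when comparing several datasets.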
