## Cellarium Cell Annotation Service (CAS) Quickstart Tutorial

<img src="https://cellarium.ai/wp-content/uploads/2024/07/cellarium-logo-medium.png" alt="drawing" width="96"/>

This Jupyter Notebook is a quickstart tutorial for using Cellarium CAS.

> **Note:**
> - Please populate your API token in the appropriate cell. This notebook will not work without a valid API token.
> - The accuracy of Cellarium CAS has not been formally benchmarked yet. We generally expect accurate results on tissues and cell types that are well-represented (and well-annotated) in the CZI CELLxGENE data repository (approximately 86M cells).
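
One way to avoid pasting the token directly into the notebook is to read it from an environment variable. The sketch below assumes a variable named `CAS_API_TOKEN`; that name and the `load_api_token` helper are hypothetical choices for this example, not something the CAS client looks for on its own.

```python
import os


def load_api_token(env_var: str = "CAS_API_TOKEN") -> str:
    """Return the token stored in env_var, or raise if it is missing or blank.

    CAS_API_TOKEN is a hypothetical variable name used only in this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your API token."
        )
    return token
```

With a helper like this, `api_token = load_api_token()` could supply the value passed to `CASClient`, and a missing token fails early with a readable message.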

### Load an AnnData file

In [None]:
import scanpy as sc
import warnings

# suppressing some of the informational warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)

# set default figure resolution and size
sc.set_figure_params(dpi=80)

In [None]:
# Option 1: peripheral blood mononuclear cells (10x Genomics) [4,000 cells]
adata = sc.read('./resources/pbmc_10x_v3_4k.h5ad')

In [None]:
# Option 2: primary visual cortex (V1), human (Lein et al., 2023) [20,000 nuclei]
# Running this cell replaces the dataset loaded above; run only the cell for the dataset you want.
adata = sc.read('./resources/lein_2023_V1_cortex_10x_v3_20k.h5ad')

In [None]:
adata

In [None]:
adata.obs

In [None]:
sc.pl.umap(adata)

In [None]:
sc.pl.umap(adata, color='cluster_label')

### Submit AnnData to Cellarium CAS

In [None]:
from cellarium.cas.client import CASClient

# replace the value below with your own API token
api_token = "7c888f09-d653-472b-b874-d263856875b2.eaec8dc3-8a80-46d8-8d72-59f5cdd8ad9a"

cas = CASClient(api_token=api_token, api_url="https://cellarium-june-release-cas-api-vi7nxpvk7a-uc.a.run.app")

In [None]:
# select the annotation embedding model
cas_model_name = 'jrb_pca_512_all_genes_log1p_zscore'

# ontology-aware cell type query
cas_ontology_aware_response = cas.annotate_matrix_cell_type_ontology_aware_strategy(
    matrix=adata,
    chunk_size=500,
    feature_ids_column_name='gene_ids',
    feature_names_column_name='index',
    cas_model_name=cas_model_name)
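
The `chunk_size` argument suggests that the matrix is sent to the service in batches of cells. As a rough illustration only (this is not the client's implementation, and `chunk_bounds` is a hypothetical helper), the sketch below shows how a chunk size of 500 would partition the 4,000-cell PBMC dataset.

```python
def chunk_bounds(n_cells: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return half-open (start, stop) row ranges that together cover n_cells rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, n_cells))
        for start in range(0, n_cells, chunk_size)
    ]


bounds = chunk_bounds(n_cells=4000, chunk_size=500)
print(len(bounds))  # 8
print(bounds[-1])   # (3500, 4000)
```

Under the same assumption, the 20,000-nuclei cortex dataset would split into 40 chunks.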

### Explore Cellarium CAS response

To skip submitting data to the CAS API and save time, you can load the precomputed CAS results for the **pbmc_10x_v3_4k** dataset by running the following cell:

In [None]:
import pickle

with open("./resources/pbmc_10x_v3_4k__cas_ontology_aware_response.pkl", "rb") as f:
    loader = pickle.Unpickler(f)
    cas_ontology_aware_response = loader.load()

To skip submitting data to the CAS API and save time, you can load the precomputed CAS results for the **lein_2023_V1_cortex_10x_v3_20k** dataset by running the following cell:

In [None]:
import pickle

with open("./resources/lein_2023_V1_cortex_10x_v3_20k__cas_ontology_aware_response.pkl", "rb") as f:
    loader = pickle.Unpickler(f)
    cas_ontology_aware_response = loader.load()
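
If you do submit data to the CAS API, the response can be cached in the same way for later sessions. The following is a minimal sketch, assuming the response object is picklable (the precomputed `.pkl` files above suggest it is). The `save_response` and `load_response` helpers and the stand-in payload are hypothetical and used only for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def save_response(response, path):
    """Write a response object to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(response, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_response(path):
    """Read a pickled response back from disk."""
    # Only unpickle files you trust: unpickling can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in payload; the real response structure is not described in this tutorial.
demo_response = {"placeholder": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "demo_response.pkl"
    save_response(demo_response, cache_path)
    restored = load_response(cache_path)

print(restored == demo_response)  # True
```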

In [None]:
from cellarium.cas._io import suppress_stderr
from cellarium.cas.visualization import CASCircularTreePlotUMAPDashApp

DASH_SERVER_PORT = 8050

with suppress_stderr():
    CASCircularTreePlotUMAPDashApp(
        adata,
        cas_ontology_aware_response,
        umap_marker_size=3,
        hidden_cl_names_set={"CL_0000117", "CL_0000099", "CL_0000402"},
        cluster_label_obs_column="cluster_label",
    ).run(port=DASH_SERVER_PORT, debug=False, jupyter_width="80%")

### Best cell type label assignment

In [None]:
import cellarium.cas.postprocessing.ontology_aware as pp
from cellarium.cas.postprocessing.cell_ontology import CellOntologyCache

with suppress_stderr():
    cl = CellOntologyCache()

#### Assign cell type calls to individual cells

In [None]:
pp.compute_most_granular_top_k_calls_single(
    adata=adata,
    cl=cl,
    min_acceptable_score=0.1)
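
The parameter name `min_acceptable_score` suggests that candidate cell types scoring below 0.1 are discarded before the top calls are chosen. The sketch below illustrates that idea on made-up data. It is an assumption about the general approach, not the `cellarium.cas` implementation: the scores, the depth values used as a stand-in for granularity, and the `top_k_granular_calls` helper are all hypothetical.

```python
def top_k_granular_calls(scores, depths, min_acceptable_score, top_k=3):
    """Keep labels scoring at least min_acceptable_score, then return up to
    top_k of them, deepest (most granular) first; ties go to the higher score."""
    accepted = [
        label for label, score in scores.items()
        if score >= min_acceptable_score
    ]
    accepted.sort(key=lambda label: (-depths[label], -scores[label]))
    return accepted[:top_k]


# Made-up scores and ontology depths for a single cell
scores = {
    "leukocyte": 0.95,
    "T cell": 0.80,
    "CD4-positive T cell": 0.40,
    "B cell": 0.05,
}
depths = {"leukocyte": 1, "T cell": 2, "CD4-positive T cell": 3, "B cell": 2}

print(top_k_granular_calls(scores, depths, min_acceptable_score=0.1))
# ['CD4-positive T cell', 'T cell', 'leukocyte']
```

In this toy example, "B cell" is dropped because its score falls below the threshold, and the remaining labels are ordered from most to least specific.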

In [None]:
sc.pl.umap(adata, color='cas_cell_type_label_1')
sc.pl.umap(adata, color='cas_cell_type_label_2')
sc.pl.umap(adata, color='cas_cell_type_label_3')

#### Assign cell type calls to predefined cell clusters

In [None]:
pp.compute_most_granular_top_k_calls_cluster(
    adata=adata,
    cl=cl,
    min_acceptable_score=0.1,
    cluster_label_obs_column='cluster_label',
    obs_prefix='cas_cell_type_cluster')

In [None]:
sc.pl.umap(adata, color='cas_cell_type_cluster_label_1')
sc.pl.umap(adata, color='cas_cell_type_cluster_label_2')
sc.pl.umap(adata, color='cas_cell_type_cluster_label_3')