## CAS v1 Client Demo

### Single-nucleus cross-tissue molecular reference maps to decipher disease gene function
https://www.science.org/doi/10.1126/science.abl4290

Understanding the function of genes and their regulation in tissue homeostasis and disease requires knowing the cellular context in which genes are expressed in tissues across the body. Single cell genomics allows the generation of detailed cellular atlases in human tissues, but most efforts are focused on single tissue types. Here, we establish a framework for profiling multiple tissues across the human body at single-cell resolution using single nucleus RNA-Seq (snRNA-seq), and apply it to 8 diverse, archived, frozen tissue types (three donors per tissue). We apply four snRNA-seq methods to each of 25 samples from 16 donors, generating a cross-tissue atlas of 209,126 nuclei profiles, and benchmark them vs. scRNA-seq of comparable fresh tissues. We use a conditional variational autoencoder (cVAE) to integrate an atlas across tissues, donors, and laboratory methods. We highlight shared and tissue-specific features of tissue-resident immune cells, identifying tissue-restricted and non-restricted resident myeloid populations. These include a cross-tissue conserved dichotomy between LYVE1- and HLA class II-expressing macrophages, and the broad presence of LAM-like macrophages across healthy tissues that is also observed in disease. For rare, monogenic muscle diseases, we identify cell types that likely underlie the neuromuscular, metabolic, and immune components of these diseases, and biological processes involved in their pathology. For common complex diseases and traits analyzed by GWAS, we identify the cell types and gene modules that potentially underlie disease mechanisms. The experimental and analytical frameworks we describe will enable the generation of large-scale studies of how cellular and molecular processes vary across individuals and populations.


In [None]:
import os
import sys
import matplotlib.pyplot as plt
import scanpy as sc
import numpy as np
import pandas as pd
import scipy.sparse as sp
import warnings
import anndata
sys.path.append('../src')

from cas_client_helper import (
    validate_adata_for_cas,
    reduce_cas_query_result_by_majority_vote_per_cluster,
)

sc.settings.set_figure_params(dpi=80, facecolor='white')

In [None]:
adata_full = sc.read_h5ad('/home/jupyter/data/casp-cli-demo/a3ffde6c_gtex.h5ad')

In [None]:
ground_truth_cell_type_column = 'cell_type'

## "Ground Truth" cell type labels

In [None]:
sc.pl.umap(adata_full, color=ground_truth_cell_type_column)

In [None]:
# draw a reproducible random subset of cells
rng = np.random.RandomState(42)
n_random_cells = 20_000

# shuffle all row indices and keep the first n_random_cells (or every cell, if there are fewer)
adata_subset = adata_full[rng.permutation(adata_full.shape[0])[:n_random_cells]]

In [None]:
# validate and reformat adata
adata = validate_adata_for_cas(
    adata_subset,
    int_count_matrix='X',
    gene_symbols_column_name='feature_name',
    gene_ids_column_name='__index__',
    missing_features_policy='replace_with_zero',
    extra_features_policy='ignore',
    casp_feature_list_csv_path='../resources/casp_v1_feature_list.csv')
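
The two policy arguments above suggest that the helper aligns the input genes to the CAS feature list. The helper's source is not shown here, so the following is only a minimal sketch of what `'replace_with_zero'` and `'ignore'` plausibly mean. The function `align_to_feature_list` and the toy gene IDs are hypothetical and are not part of `cas_client_helper`.

```python
def align_to_feature_list(counts, gene_ids, reference_gene_ids):
    """Reorder count columns to follow a reference gene list.

    Assumed policies (illustrative only):
      - a reference gene missing from the input becomes an all-zero column
      - an input gene absent from the reference is dropped
    """
    column_of = {gene: j for j, gene in enumerate(gene_ids)}
    aligned = []
    for row in counts:
        aligned.append([
            row[column_of[gene]] if gene in column_of else 0
            for gene in reference_gene_ids
        ])
    return aligned


# Toy data: two cells, three measured genes, a four-gene reference.
counts = [[5, 0, 2],
          [1, 3, 0]]
gene_ids = ['ENSG_A', 'ENSG_B', 'ENSG_X']             # ENSG_X is not in the reference
reference = ['ENSG_B', 'ENSG_A', 'ENSG_C', 'ENSG_D']  # ENSG_C and ENSG_D are not in the input

aligned = align_to_feature_list(counts, gene_ids, reference)
print(aligned)
```

After alignment every cell has one column per reference gene, in reference order, which is the shape a fixed-vocabulary service would need.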

In [None]:
# keep the validated integer counts in .raw so they can be recovered later
adata.raw = adata

In [None]:
sc.pl.umap(adata, color=ground_truth_cell_type_column)

## CAS

In [None]:
!pip uninstall -y cell-annotation-service-client

In [None]:
!pip install git+https://github.com/broadinstitute/cell-annotation-service-client.git@fg-annotate

In [None]:
from casp_cli import service

cli = service.CASPClientService()

In [None]:
# recover the raw AnnData (integer counts, no gene filter)
adata_raw = adata.raw.to_adata().copy()
adata_raw.raw = adata_raw

# silence AnnData's implicit-modification warnings, then annotate in chunks of 2,000 cells
warnings.simplefilter('ignore', anndata.ImplicitModificationWarning)
cas_query_res = cli.annotate_anndata(adata_raw, chunk_size=2000)
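
As a rough illustration of what `chunk_size=2000` implies for the 20,000-cell subset, the sketch below splits a row count into consecutive ranges. It assumes the client walks the cells in order; how the client actually builds and submits its chunks is not shown in this notebook. `iter_chunks` is a hypothetical name.

```python
def iter_chunks(n_cells, chunk_size):
    """Yield (start, stop) row ranges that cover n_cells in order.

    Illustrative only: the real client may chunk or schedule differently.
    """
    for start in range(0, n_cells, chunk_size):
        yield start, min(start + chunk_size, n_cells)


chunks = list(iter_chunks(20_000, 2000))
print(len(chunks), chunks[0], chunks[-1])
```

Under this assumption the subset is sent as ten requests of 2,000 cells each; a cell count that is not a multiple of the chunk size would simply end with a shorter final chunk.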

## Explore CAS output

In [None]:
# reduce annotations per cluster
cluster_detailed_info_dict = reduce_cas_query_result_by_majority_vote_per_cluster(
    adata, cas_query_res, cluster_key=ground_truth_cell_type_column, ignore_set={'native cell'})

# visualize
sc.pl.umap(adata, color='cas_per_cluster_cell_type')
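
The reduction step can be pictured as a vote: each cell contributes its predicted labels, labels in the ignore set are discarded, and the remaining labels are tallied per cluster. The sketch below is a simplified stand-in, not the helper's implementation. It assumes each cell comes with a plain list of predicted labels, which may differ from the real structure of `cas_query_res`. The function name `majority_vote_per_cluster` is hypothetical; the output shape (a list of `(cell_type, frequency)` pairs per cluster, most frequent first) mirrors how `cluster_detailed_info_dict` is consumed in the plotting cell further down.

```python
from collections import Counter, defaultdict


def majority_vote_per_cluster(cluster_labels, predicted_labels, ignore_set=frozenset()):
    """Tally predicted labels per cluster and rank them by relative frequency.

    cluster_labels:   one cluster ID per cell
    predicted_labels: one list of predicted cell-type labels per cell (assumed format)
    ignore_set:       labels that should not take part in the vote
    """
    votes = defaultdict(Counter)
    for cluster, predictions in zip(cluster_labels, predicted_labels):
        for label in predictions:
            if label not in ignore_set:
                votes[cluster][label] += 1

    result = {}
    for cluster, counter in votes.items():
        total = sum(counter.values())
        # most_common() sorts by count, so the first entry is the majority label
        result[cluster] = [(label, n / total) for label, n in counter.most_common()]
    return result


# Toy data: three cells in cluster 'T', one cell in cluster 'B'.
clusters = ['T', 'T', 'T', 'B']
predictions = [
    ['T cell', 'native cell'],
    ['T cell'],
    ['NK cell', 'native cell'],
    ['B cell'],
]
result = majority_vote_per_cluster(clusters, predictions, ignore_set={'native cell'})
print(result['T'][0][0])
```

Dropping a very general label such as `'native cell'` before voting keeps it from outvoting the more specific cell types.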

In [None]:
def highlight_cluster(
        adata,
        cluster_id,
        cluster_info,
        ground_truth_cell_type_column='cell_type',
        top_k=10):
    """Highlight one ground-truth cluster on the UMAP and print its top CAS labels."""
    fig, ax = plt.subplots()
    # all cells in gray as background
    ax.scatter(adata.obsm['X_umap'][:, 0], adata.obsm['X_umap'][:, 1], s=2, edgecolor='none', color='gray', alpha=0.25)
    # cells of the selected cluster in red
    adata_cluster = adata[adata.obs[ground_truth_cell_type_column] == cluster_id]
    ax.scatter(adata_cluster.obsm['X_umap'][:, 0], adata_cluster.obsm['X_umap'][:, 1], s=2, edgecolor='none', color='red', alpha=1.)
    ax.grid(False)
    ax.set_xlabel('UMAP 1')
    ax.set_ylabel('UMAP 2')
    plt.show()
    # print the top_k CAS labels for this cluster with their frequencies
    print(f'GROUND TRUTH CELL TYPE:\n{cluster_id}\n')
    print(f'{"CAS CELL TYPE":100s} {"FREQUENCY"}')
    for cell_type, freq in cluster_info[cluster_id][:top_k]:
        print(f'{cell_type:100s} {freq:.4f}')
    print()

for cluster_id in adata.obs[ground_truth_cell_type_column].cat.categories:
    highlight_cluster(adata, cluster_id, cluster_detailed_info_dict, ground_truth_cell_type_column=ground_truth_cell_type_column)