## CAS v1 Client Demo

### Single-nucleus profiling of human dilated and hypertrophic cardiomyopathy
https://www.nature.com/articles/s41586-022-04817-8

Heart failure encompasses a heterogeneous set of clinical features that converge on impaired cardiac contractile function [1,2] and presents a growing public health concern. Previous work has highlighted changes in both transcription and protein expression in failing hearts [3,4], but may overlook molecular changes in less prevalent cell types. Here we identify extensive molecular alterations in failing hearts at single-cell resolution by performing single-nucleus RNA sequencing of nearly 600,000 nuclei in left ventricle samples from 11 hearts with dilated cardiomyopathy and 15 hearts with hypertrophic cardiomyopathy as well as 16 non-failing hearts. The transcriptional profiles of dilated or hypertrophic cardiomyopathy hearts broadly converged at the tissue and cell-type level. Further, a subset of hearts from patients with cardiomyopathy harbour a unique population of activated fibroblasts that is almost entirely absent from non-failing samples. We performed a CRISPR-knockout screen in primary human cardiac fibroblasts to evaluate this fibrotic cell state transition; knockout of genes associated with fibroblast transition resulted in a reduction of myofibroblast cell-state transition upon TGFβ1 stimulation for a subset of genes. Our results provide insights into the transcriptional diversity of the human heart in health and disease as well as new potential therapeutic targets and biomarkers for heart failure.

In [None]:
import os
import sys
import matplotlib.pyplot as plt
import scanpy as sc
import numpy as np
import pandas as pd
import scipy.sparse as sp
import warnings
import anndata
sys.path.append('../src')

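# import only the helpers this notebook uses instead of a wildcard import
from cas_client_helper import (
    validate_adata_for_cas,
    reduce_cas_query_result_by_majority_vote_per_cluster,
)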

sc.settings.set_figure_params(dpi=80, facecolor='white')

In [None]:
adata_full = sc.read_h5ad('/home/jupyter/data/casp-cli-demo/pcl_hcm_dcm.h5ad')

In [None]:
ground_truth_cell_type_column = 'leiden0.6_name'

In [None]:
adata_full.X = adata_full.layers['cellbender']
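
The validation step later in this notebook is called with `int_count_matrix='X'`, which suggests that `X` should hold integer counts after the layer swap above. A minimal sketch of a sanity check on a toy matrix follows; `is_integer_valued` is a hypothetical helper written for illustration and is not part of `cas_client_helper`.

```python
import numpy as np
import scipy.sparse as sp


def is_integer_valued(matrix) -> bool:
    """Return True if every entry is a non-negative whole number.

    Hypothetical helper for illustration only.
    """
    # For sparse matrices only the explicitly stored values need checking;
    # the implicit zeros are whole numbers by definition.
    values = matrix.data if sp.issparse(matrix) else np.asarray(matrix)
    return bool(np.all(values >= 0) and np.all(np.mod(values, 1) == 0))


# Toy data: integer counts stored as floats, and a normalized matrix.
counts = sp.csr_matrix(np.array([[0, 3, 1], [2, 0, 0]], dtype=np.float32))
normalized = sp.csr_matrix(np.array([[0.0, 1.5], [0.25, 0.0]]))

print(is_integer_valued(counts))      # True
print(is_integer_valued(normalized))  # False
```

On the real data the same call would be `is_integer_valued(adata_full.X)`.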

## "Ground Truth" cell type labels

In [None]:
sc.pl.umap(adata_full, color=ground_truth_cell_type_column)

In [None]:
# subset
rng = np.random.RandomState(42)
n_random_cells = 20_000

# draw a reproducible random subset of at most n_random_cells cells from the full dataset
adata_subset = adata_full[rng.permutation(adata_full.shape[0])[:n_random_cells]]
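
Uniform random subsampling can under-represent rare cell types. The sketch below compares label frequencies before and after subsampling. It runs on synthetic labels, so the cell type names, proportions, and sizes are assumptions chosen for illustration, not values from the dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(42)

# Synthetic stand-in for adata_full.obs[ground_truth_cell_type_column].
labels_full = pd.Series(
    rng.choice(['cardiomyocyte', 'fibroblast', 'endothelial'],
               size=10_000, p=[0.5, 0.3, 0.2]),
    dtype='category')

# Same subsetting pattern as above: shuffle the indices, keep the first n.
idx = rng.permutation(len(labels_full))[:2_000]
labels_subset = labels_full.iloc[idx]

# Side-by-side label frequencies and their absolute difference.
comparison = pd.DataFrame({
    'full': labels_full.value_counts(normalize=True),
    'subset': labels_subset.value_counts(normalize=True),
})
comparison['abs_diff'] = (comparison['full'] - comparison['subset']).abs()
print(comparison)
```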

In [None]:
# validate and reformat adata
adata = validate_adata_for_cas(
    adata_subset,
    int_count_matrix='X',
    gene_symbols_column_name='__index__',
    gene_ids_column_name='gene_ids',
    missing_features_policy='replace_with_zero',
    extra_features_policy='ignore',
    casp_feature_list_csv_path='../resources/casp_v1_feature_list.csv')
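
The policy names `replace_with_zero` and `ignore` suggest that the query matrix is aligned to a reference feature list: reference features missing from the query are filled with zeros, and query features outside the reference are dropped. The sketch below shows one way to do such an alignment on a toy matrix. It is not the helper's actual implementation, and `align_to_reference` and the gene identifiers are hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def align_to_reference(counts, query_ids, reference_ids):
    """Reorder the columns of `counts` to follow `reference_ids`.

    Reference features absent from the query become all-zero columns;
    query features absent from the reference are dropped.
    Hypothetical helper for illustration only.
    """
    query_pos = {gene_id: j for j, gene_id in enumerate(query_ids)}
    rows, cols = [], []
    for ref_j, gene_id in enumerate(reference_ids):
        if gene_id in query_pos:
            rows.append(query_pos[gene_id])
            cols.append(ref_j)
    # Selection matrix of shape (n_query_features, n_reference_features):
    # multiplying by it picks and reorders columns in one sparse product.
    selector = sp.csr_matrix(
        (np.ones(len(rows), dtype=counts.dtype), (rows, cols)),
        shape=(len(query_ids), len(reference_ids)))
    return sp.csr_matrix(counts) @ selector


counts = sp.csr_matrix(np.array([[1, 2, 3], [4, 5, 6]]))
aligned = align_to_reference(
    counts,
    query_ids=['g_a', 'g_b', 'g_x'],
    reference_ids=['g_b', 'g_missing', 'g_a'])
print(aligned.toarray())
# [[2 0 1]
#  [5 0 4]]
```

Using a sparse selection matrix avoids densifying the count matrix, which matters at the scale of tens of thousands of cells.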

In [None]:
adata.raw = adata

In [None]:
sc.pl.umap(adata, color=ground_truth_cell_type_column)

## CAS

In [None]:
!pip uninstall -y cell-annotation-service-client

In [None]:
!pip install git+https://github.com/broadinstitute/cell-annotation-service-client.git@fg-annotate

In [None]:
from casp_cli import service

cli = service.CASPClientService()

In [None]:
# recover the AnnData stored in adata.raw (integer counts, no gene filter)
adata_raw = adata.raw.to_adata().copy()
adata_raw.raw = adata_raw

warnings.simplefilter('ignore', anndata.ImplicitModificationWarning)
cas_query_res = cli.annotate_anndata(adata_raw, chunk_size=2000)
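
The `chunk_size=2000` argument presumably controls how many cells are sent per request. How the client splits the data internally is not shown here, so the following is only a sketch of the general idea, with a hypothetical `iter_chunks` helper.

```python
def iter_chunks(n_obs: int, chunk_size: int):
    """Yield (start, stop) row ranges that cover n_obs rows in order.

    Hypothetical helper for illustration only.
    """
    for start in range(0, n_obs, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, n_obs)


chunks = list(iter_chunks(20_000, 2_000))
print(len(chunks))            # 10
print(chunks[0], chunks[-1])  # (0, 2000) (18000, 20000)
```

With the 20,000-cell subset used in this notebook, this chunk size would correspond to ten row ranges; each range could be sliced as `adata_raw[start:stop]`.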

## Explore CAS output

In [None]:
# reduce annotations per cluster
cluster_detailed_info_dict = reduce_cas_query_result_by_majority_vote_per_cluster(
    adata, cas_query_res, cluster_key=ground_truth_cell_type_column, ignore_set={'native cell'})

# visualize
sc.pl.umap(adata, color='cas_per_cluster_cell_type')
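
The reduction above assigns one label per cluster by majority vote while skipping labels in `ignore_set`. The structure of `cas_query_res` and the helper's internals are not shown in this notebook, so the sketch below only illustrates the voting idea on a toy table. The column names `cluster` and `predicted` and the labels are assumptions.

```python
import pandas as pd

# Toy per-cell predictions; in practice these would come from the CAS response.
obs = pd.DataFrame({
    'cluster': ['c0', 'c0', 'c0', 'c1', 'c1', 'c1', 'c1'],
    'predicted': ['fibroblast', 'fibroblast', 'native cell',
                  'cardiac muscle cell', 'native cell',
                  'cardiac muscle cell', 'endothelial cell'],
})
ignore_set = {'native cell'}

# Drop uninformative labels before voting.
kept = obs[~obs['predicted'].isin(ignore_set)]

# Per cluster: (label, frequency) pairs sorted by decreasing frequency.
detail = {}
for cluster_id, group in kept.groupby('cluster'):
    freq = group['predicted'].value_counts(normalize=True)
    detail[cluster_id] = list(freq.items())

# The majority label is the first pair of each list.
majority = {cluster_id: pairs[0][0] for cluster_id, pairs in detail.items()}
print(majority)
# {'c0': 'fibroblast', 'c1': 'cardiac muscle cell'}
```

Keeping the full list of (label, frequency) pairs per cluster, rather than only the winner, makes it possible to inspect runner-up labels.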

In [None]:
def highlight_cluster(
        adata,
        cluster_id,
        ground_truth_cell_type_column='cell_type',
        cas_cell_type_column='cas_per_cluster_cell_type',
        top_k=10):
    """Highlight one ground-truth cluster on the UMAP and print its top CAS labels.

    Reads the module-level `cluster_detailed_info_dict` computed above.
    `cas_cell_type_column` is currently unused.
    """
    fig, ax = plt.subplots()
    # all cells in gray as background
    ax.scatter(adata.obsm['X_umap'][:, 0], adata.obsm['X_umap'][:, 1], s=2, edgecolor='none', color='gray', alpha=0.25)
    # cells of the selected cluster in red
    adata_subset = adata[adata.obs[ground_truth_cell_type_column] == cluster_id]
    ax.scatter(adata_subset.obsm['X_umap'][:, 0], adata_subset.obsm['X_umap'][:, 1], s=2, edgecolor='none', color='red', alpha=1.)
    ax.grid(False)
    ax.set_xlabel('UMAP 1')
    ax.set_ylabel('UMAP 2')
    plt.show()
    # top-k CAS labels for this cluster with their frequencies
    print(f'GROUND TRUTH CELL TYPE:\n{cluster_id}\n')
    print(f'{"CAS CELL TYPE":100s} {"FREQUENCY"}')
    for cell_type, freq in cluster_detailed_info_dict[cluster_id][:top_k]:
        print(f'{cell_type:100s} {freq:.4f}')
    print()

# one plot and label table per ground-truth cluster
for cluster_id in adata.obs[ground_truth_cell_type_column].cat.categories:
    highlight_cluster(adata, cluster_id, ground_truth_cell_type_column=ground_truth_cell_type_column)

In [None]:
# visualize the per-cluster confidence score
sc.pl.umap(adata, color='cas_per_cluster_cell_type_confidence_score')