## CAS and hierarchical annotation

Cell type annotations commonly form a nested hierarchy of cell sets. We can express that implicit hierarchy as a tree or a graph. Here is an example from the lung cell atlas:

<img src="attachment:f06710b7-3053-4f2d-a7ac-7200a87d22c2.png" alt="Lung hierarchy" style="width:800px;"/>
*The cells annotated as 'Goblet' are a subset of those annotated as 'Secretory', which are a subset of those annotated with 'Epithelial' etc.*
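
The nesting described in the caption can be sketched as a child-to-parent mapping. This is a minimal illustration rather than CAS-tools code: the `PARENT` dictionary and the `ancestors` helper are hypothetical names, and only the three labels mentioned in the caption are included.

```python
# Hypothetical child -> parent mapping, built from the labels in the caption.
PARENT = {
    "Goblet": "Secretory",
    "Secretory": "Epithelial",
}


def ancestors(label, parent=PARENT):
    """Return the broader cell sets that contain `label`, nearest first."""
    chain = []
    while label in parent:
        label = parent[label]
        chain.append(label)
    return chain


print(ancestors("Goblet"))  # ['Secretory', 'Epithelial']
```

Storing only the parent of each label is enough to recover the full nesting, because every cell set in a tree has at most one direct parent.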


### Cell Ontology Annotation Mapping
The CELLxGENE format specifies that every cell must also have a single annotation with a term from the Cell Ontology (CL). We can use these cell type annotations to specify cell sets. Where those cell sets are identical to cell sets defined by author annotations, we can link the two.

CAS-tools associates CL terms with author annotation terms at specific levels in the hierarchy, e.g.

<img src="attachment:1455d7cd-940e-4c8a-afba-cf8948714043.png" alt="CL to Author Annotation mapping" style="width:600px;"/>
*The set of cells annotated as 'Club' is identical to the set of cells annotated with the Cell Ontology term 'club cell'. The graph shows this mapping with the author hierarchy below and the Cell Ontology hierarchy above.*

This link is represented in CAS as:

labelset: L4
cell_label: Club
cell_ontology_term_id: CL:0000158
cell_ontology_term: club cell
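
The linking rule described above (link an author label to a CL term when both define exactly the same set of cells) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CAS-tools implementation: the cell IDs, the toy `cells` table, its `cl_id`/`cl_label` keys and the `link_labels_to_cl` helper are all hypothetical. Only the four output field names and the Club / CL:0000158 pairing come from the example above.

```python
import json
from collections import defaultdict

# Toy per-cell metadata (hypothetical cell IDs and values, for illustration only).
cells = {
    "cell_1": {"L4": "Club", "cl_id": "CL:0000158", "cl_label": "club cell"},
    "cell_2": {"L4": "Club", "cl_id": "CL:0000158", "cl_label": "club cell"},
    "cell_3": {"L4": "Goblet", "cl_id": "CL:0000066", "cl_label": "epithelial cell"},
    "cell_4": {"L4": "Basal", "cl_id": "CL:0000066", "cl_label": "epithelial cell"},
}


def link_labels_to_cl(cells, labelset):
    """Link author labels to CL terms whose cell sets are identical."""
    by_label = defaultdict(set)  # author label -> cell IDs
    by_cl = defaultdict(set)     # (CL ID, CL label) -> cell IDs
    for cell_id, meta in cells.items():
        by_label[meta[labelset]].add(cell_id)
        by_cl[(meta["cl_id"], meta["cl_label"])].add(cell_id)

    records = []
    for label, label_cells in sorted(by_label.items()):
        for (cl_id, cl_label), cl_cells in by_cl.items():
            if label_cells == cl_cells:  # link only when the sets match exactly
                records.append({
                    "labelset": labelset,
                    "cell_label": label,
                    "cell_ontology_term_id": cl_id,
                    "cell_ontology_term": cl_label,
                })
    return records


records = link_labels_to_cl(cells, "L4")
print(json.dumps(records, indent=2))
```

In this toy data only 'Club' is linked. 'Goblet' and 'Basal' each cover only part of the cells annotated as 'epithelial cell', so no link is made for them at this level.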

### BICAN hierarchy

The BICAN project constructs taxonomies (hierarchical annotations) starting from a fixed set of clusters. Typically, a dendrogram is generated based on transcriptomic similarity (e.g. similarity in transcription factor expression profiles). This dendrogram is then used as a guide for manual construction of a fixed-level hierarchy, taking various other pieces of data into account. The hierarchy is typically expressed, along with annotation metadata, in a simple spreadsheet, e.g.

cluster_id | cluster | Group | Subclass | Class | Neighborhood
-- | -- | -- | -- | -- | --
41 | IN_41 | BN LAMP5 LHX6 GABA | CN MGE LAMP5 GABA | CN MGE GABA | Subpallium GABA
42 | IN_42 | BN LAMP5 CXCL14 GABA | CN MGE LAMP5 GABA | CN MGE GABA | Subpallium GABA
... |  |  |  |  | 
71 | IN_71 | STR VIP GABA | CN CGE VIP GABA | CN CGE GABA | Subpallium GABA
213 | IN_213 | SN PVALB GATA3 GABA | SN PVALB GABA | CN MGE GABA | Subpallium GABA
119 | IN_119 | NDB SI LHX8 GABA | CN LHX6 LHX8 GABA | CN MGE GABA | Subpallium GABA
120 | IN_120 | NDB SI LHX8 GABA | CN LHX6 LHX8 GABA | CN MGE GABA | Subpallium GABA

Such spreadsheets typically contain both the hierarchy and additional annotation metadata, and are kept in sync with AnnData matrix files via ad hoc scripting solutions.

CAS-tools supports automated conversion of these spreadsheet hierarchies into CAS format and synchronisation of the resulting CAS files with paired h5ad files. This is illustrated in [allen_spreadsheet_to_cas](allen_spreadsheet_to_cas.ipynb).
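
To make the idea concrete, here is a rough sketch of how parent-child links can be read off a fixed-level spreadsheet like the one above. It does not use or reproduce the CAS-tools API (see the linked notebook for that). The `LEVELS` list, the `parent_links` helper and the assumption that columns run from most specific to most general are illustrative choices; the three rows are copied from the example table.

```python
# Column order from most specific to most general, as in the example table.
LEVELS = ["cluster", "Group", "Subclass", "Class", "Neighborhood"]

# Three rows copied from the example spreadsheet.
rows = [
    {"cluster": "IN_41", "Group": "BN LAMP5 LHX6 GABA",
     "Subclass": "CN MGE LAMP5 GABA", "Class": "CN MGE GABA",
     "Neighborhood": "Subpallium GABA"},
    {"cluster": "IN_42", "Group": "BN LAMP5 CXCL14 GABA",
     "Subclass": "CN MGE LAMP5 GABA", "Class": "CN MGE GABA",
     "Neighborhood": "Subpallium GABA"},
    {"cluster": "IN_71", "Group": "STR VIP GABA",
     "Subclass": "CN CGE VIP GABA", "Class": "CN CGE GABA",
     "Neighborhood": "Subpallium GABA"},
]


def parent_links(rows, levels=LEVELS):
    """Map each (level, label) pair to its (parent level, parent label) pair."""
    links = {}
    for row in rows:
        for child_level, parent_level in zip(levels, levels[1:]):
            child = (child_level, row[child_level])
            parent = (parent_level, row[parent_level])
            # A label with two different parents is not a strict hierarchy.
            if links.setdefault(child, parent) != parent:
                raise ValueError(f"{child} has more than one parent")
    return links


links = parent_links(rows)
for child, parent in links.items():
    print(child, "->", parent)
```

Keying each entry by both level and label keeps identical label text at different levels distinct, and the consistency check flags spreadsheets in which a label has been assigned to two different parents.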

CAS files can be stored as separate JSON files, containing cell IDs, or merged into the header of the paired h5ad file.










