# How to Run the Notebooks

For detailed instructions on setting up and running the notebooks, please refer to the [README.md](https://github.com/cellannotation/cas-tools/blob/main/notebooks/README.md) in the notebooks directory.


In [1]:
import json
import pandas as pd
import anndata as ad

#### Retrieve AnnData File for `CB Glut`

Download the AnnData file for `CB Glut (CS20230722_CLAS_29)`. The original [WMB-10Xv2](https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/index.html#expression_matrices/WMB-10Xv2/20230630/) and [WMB-10Xv3](https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/index.html#expression_matrices/WMB-10Xv3/20230630/) AnnData files are split by dissection region.

These files were merged and then split again into one file per top-level class (34 in total). The file used here is the one for `CB Glut`.

In [2]:
!wget -N http://cellular-semantics.cog.sanger.ac.uk/public/merged_CS20230722_CLAS_29.h5ad

--2025-04-01 11:05:14--  http://cellular-semantics.cog.sanger.ac.uk/public/merged_CS20230722_CLAS_29.h5ad
Resolving cellular-semantics.cog.sanger.ac.uk (cellular-semantics.cog.sanger.ac.uk)... 172.27.51.131, 172.27.51.1, 172.27.51.2, ...
Connecting to cellular-semantics.cog.sanger.ac.uk (cellular-semantics.cog.sanger.ac.uk)|172.27.51.131|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cellular-semantics.cog.sanger.ac.uk/public/merged_CS20230722_CLAS_29.h5ad [following]
--2025-04-01 11:05:14--  https://cellular-semantics.cog.sanger.ac.uk/public/merged_CS20230722_CLAS_29.h5ad
Connecting to cellular-semantics.cog.sanger.ac.uk (cellular-semantics.cog.sanger.ac.uk)|172.27.51.131|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2575377493 (2.4G) [application/x-hdf5]
Saving to: ‘merged_CS20230722_CLAS_29.h5ad’


2025-04-01 11:06:33 (31.1 MB/s) - ‘merged_CS20230722_CLAS_29.h5ad’ saved [2575377493/2575377493]
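Where `wget` is not available, the same download can be approximated from Python. The following is a minimal sketch using only the standard library. The helper name `download_if_missing` is hypothetical, and unlike `wget -N` it only checks whether the target file already exists; it does not compare timestamps.

```python
import os
import shutil
import urllib.request


def download_if_missing(url, dest):
    """Download `url` to `dest` unless `dest` already exists.

    Hypothetical helper: returns True if a download happened, False if skipped.
    It does NOT compare timestamps the way `wget -N` does.
    """
    if os.path.exists(dest):
        return False
    tmp = dest + ".part"
    # Stream into a temporary file so an interrupted download never leaves
    # a truncated file under the final name.
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(tmp, dest)
    return True


# Example call (URL taken from the cell above; the file is about 2.4 GB):
# download_if_missing(
#     "https://cellular-semantics.cog.sanger.ac.uk/public/merged_CS20230722_CLAS_29.h5ad",
#     "merged_CS20230722_CLAS_29.h5ad",
# )
```

Writing to a `.part` file first keeps a later `ad.read_h5ad` call from picking up a half-written file.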


In [3]:
merged_anndata = ad.read_h5ad("merged_CS20230722_CLAS_29.h5ad", backed="r")
merged_anndata.obs[:5]

cell_label,cell_barcode,library_label,tissue,tissue_ontology_term_id,neurotransmitter,class,subclass,supertype,cluster,organism,disease,assay
AAACCCAAGAACAAGG-472_A05,AAACCCAAGAACAAGG,L8TX_201217_01_G07,Cerebellum,UBERON:0002037,Glut,29 CB Glut,314 CB Granule Glut,1155 CB Granule Glut_2,5201 CB Granule Glut_2,Mus musculus,normal,10x 3' v2
AAACCCAAGAATCCCT-473_A06,AAACCCAAGAATCCCT,L8TX_201217_01_A08,Cerebellum,UBERON:0002037,Glut,29 CB Glut,314 CB Granule Glut,1155 CB Granule Glut_2,5201 CB Granule Glut_2,Mus musculus,normal,10x 3' v3
AAACCCAAGACTACCT-225_A01,AAACCCAAGACTACCT,L8TX_200227_01_F10,Medulla,UBERON:0001896,Glut,29 CB Glut,314 CB Granule Glut,1154 CB Granule Glut_1,5197 CB Granule Glut_1,Mus musculus,normal,10x 3' v2
AAACCCAAGAGCTGAC-231.2_B01,AAACCCAAGAGCTGAC,L8TX_200306_01_H12,Medulla,UBERON:0001896,Glut,29 CB Glut,314 CB Granule Glut,1154 CB Granule Glut_1,5197 CB Granule Glut_1,Mus musculus,normal,10x 3' v3
AAACCCAAGAGGACTC-478_A02,AAACCCAAGAGGACTC,L8TX_210107_02_H11,Cerebellum,UBERON:0002037,Glut,29 CB Glut,314 CB Granule Glut,1155 CB Granule Glut_2,5201 CB Granule Glut_2,Mus musculus,normal,10x 3' v2


### Extract CAS from the AnnData header

In [4]:
cas_json_str = merged_anndata.uns["cas"]
cas_json_obj = json.loads(cas_json_str)
print(json.dumps(cas_json_obj, indent=2))

{
  "cellannotation_schema_version": "1.0.0",
  "author_name": "Hongkui Zeng",
  "labelsets": [
    {
      "name": "neurotransmitter",
      "description": "Clusters are assigned based on the average expression of both neurotransmitter transporter genes and key neurotransmitter synthesizing enzyme genes."
    },
    {
      "name": "class",
      "description": "The top level of cell type definition in the mouse whole brain taxonomy. It is primarily determined by broad brain region and neurotransmitter type. All cells within a subclass belong to the same class. Class provides a broader categorization of cell types.",
      "rank": 3
    },
    {
      "name": "subclass",
      "description": "The coarse level of cell type definition in the mouse whole brain taxonomy. All cells within a supertype belong to the same subclass. Subclass groups together related supertypes.",
      "rank": 2
    },
    {
      "name": "supertype",
      "description": "The second finest level of cell type d

In [5]:
merged_anndata.file.close()
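Some of the `labelsets` entries printed above carry a `rank`, with lower values on finer levels (`cluster` is 0, `class` is 3). As a rough sketch, the snippet below orders the ranked labelsets from finest to coarsest. The sample JSON is a hand-abbreviated stand-in for `merged_anndata.uns["cas"]` that contains only ranks visible in this notebook's output, and the helper name `ranked_labelsets` is hypothetical.

```python
import json

# Abbreviated stand-in for the CAS JSON string stored in `uns["cas"]`.
sample_cas_json = """
{
  "labelsets": [
    {"name": "neurotransmitter"},
    {"name": "class", "rank": 3},
    {"name": "subclass", "rank": 2},
    {"name": "cluster", "rank": 0}
  ]
}
"""


def ranked_labelsets(cas):
    """Return the names of labelsets that have a rank, lowest rank first."""
    ranked = [ls for ls in cas["labelsets"] if "rank" in ls]
    return [ls["name"] for ls in sorted(ranked, key=lambda ls: ls["rank"])]


cas = json.loads(sample_cas_json)
print(ranked_labelsets(cas))  # ['cluster', 'subclass', 'class']
```

Labelsets without a `rank`, such as `neurotransmitter`, are left out of the result.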

## Export CAS content to CAP AnnData format

In [6]:
!cas export2cap --help

usage: cas export2cap [-h] [--json JSON] [--anndata ANNDATA] [--output OUTPUT]
                      [--fill-na]

Flattens all content of CAS annotations to an AnnData file.

options:
  -h, --help         show this help message and exit
  --json JSON        Optional input JSON file path. If not provided, the CAS
                     JSON will be extracted from the AnnData file's 'uns'
                     section.
  --anndata ANNDATA  Optional input AnnData file path. If not provided, the
                     AnnData file will be downloaded using the matrix file id
                     from the CAS JSON.
  --output OUTPUT    Output AnnData file name.
  --fill-na          Boolean flag indicating whether to fill missing values in
                     the 'obs' field with pd.NA. If provided, missing values
                     will be replaced with pd.NA; if not provided, they will
                     remain as empty strings.


In [7]:
# Run export2cap using the CAS JSON stored in the AnnData file's 'uns' section (no --json given)
!cas export2cap --anndata merged_CS20230722_CLAS_29.h5ad --output flatten_cas_CS20230722_CLAS_29.h5ad

INFO:root:All labelsets exist in obs.
INFO:root:All labelset members exist in the corresponding obs columns.
INFO:root:Parent-child relationships are consistent between CAS and OBS.
INFO:root:All labelsets exist in obs.
INFO:root:All labelset members exist in the corresponding obs columns.
INFO:root:Parent-child relationships are consistent between CAS and OBS.
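The log line about parent-child relationships suggests a check along the following lines. This is only an illustrative sketch on toy data, not the validation that `cas export2cap` actually performs. The function name `find_orphan_parents` is hypothetical, and the toy hierarchy is assembled from labels and accessions shown elsewhere in this notebook.

```python
def find_orphan_parents(annotations):
    """Return (cell_label, parent_accession) pairs whose parent is not defined."""
    known = {a["cell_set_accession"] for a in annotations}
    return [
        (a["cell_label"], a["parent_cell_set_accession"])
        for a in annotations
        if a.get("parent_cell_set_accession")
        and a["parent_cell_set_accession"] not in known
    ]


# Toy annotations; top-level entries have no parent accession.
toy_annotations = [
    {"labelset": "class", "cell_label": "29 CB Glut",
     "cell_set_accession": "CS20230722_CLAS_29"},
    {"labelset": "subclass", "cell_label": "314 CB Granule Glut",
     "cell_set_accession": "CS20230722_SUBC_314",
     "parent_cell_set_accession": "CS20230722_CLAS_29"},
    {"labelset": "supertype", "cell_label": "1155 CB Granule Glut_2",
     "cell_set_accession": "CS20230722_SUPT_1155",
     "parent_cell_set_accession": "CS20230722_SUBC_314"},
]

print(find_orphan_parents(toy_annotations))  # []
```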


In [8]:
flatten_df = ad.read_h5ad("flatten_cas_CS20230722_CLAS_29.h5ad", backed="r")
flatten_df.obs

cell_label,cell_barcode,library_label,tissue,tissue_ontology_term_id,neurotransmitter,organism,disease,assay,class,class--cell_set_accession,...,supertype--cell_set_accession,supertype--parent_cell_set_accession,supertype--author_annotation_fields,cluster,cluster--cell_set_accession,cluster--parent_cell_set_accession,cluster--author_annotation_fields,cluster--neurotransmitter_accession,cluster--neurotransmitter_rationale,cluster--neurotransmitter_marker_gene_evidence
AAACCCAAGAACAAGG-472_A05,AAACCCAAGAACAAGG,L8TX_201217_01_G07,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
AAACCCAAGAATCCCT-473_A06,AAACCCAAGAATCCCT,L8TX_201217_01_A08,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v3,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
AAACCCAAGACTACCT-225_A01,AAACCCAAGACTACCT,L8TX_200227_01_F10,Medulla,UBERON:0001896,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1154,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5197 CB Granule Glut_1,CS20230722_CLUS_5197,CS20230722_SUPT_1154,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:8.41,Slc17a7
AAACCCAAGAGCTGAC-231.2_B01,AAACCCAAGAGCTGAC,L8TX_200306_01_H12,Medulla,UBERON:0001896,Glut,Mus musculus,normal,10x 3' v3,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1154,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5197 CB Granule Glut_1,CS20230722_CLUS_5197,CS20230722_SUPT_1154,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:8.41,Slc17a7
AAACCCAAGAGGACTC-478_A02,AAACCCAAGAGGACTC,L8TX_210107_02_H11,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TTTGTTGTCTGATTCT-171_A01,TTTGTTGTCTGATTCT,L8TX_191029_01_C08,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v3,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
TTTGTTGTCTTCCACG-471_A04,TTTGTTGTCTTCCACG,L8TX_201217_01_A07,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
TTTGTTGTCTTGATTC-471_A04,TTTGTTGTCTTGATTC,L8TX_201217_01_A07,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v3,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
TTTGTTGTCTTGCAGA-181_B01,TTTGTTGTCTTGCAGA,L8TX_191119_01_F11,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
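The flattened `obs` table above appears to name its extra columns `<labelset>--<field>`. That pattern is inferred from the displayed column names, not from the tool's documentation. Under that assumption, the sketch below groups such columns by labelset. The column list is copied from the truncated display, and the helper name `group_flattened_columns` is hypothetical.

```python
from collections import defaultdict

# Column names copied from the truncated `flatten_df.obs` display above.
columns = [
    "class",
    "class--cell_set_accession",
    "supertype--cell_set_accession",
    "supertype--parent_cell_set_accession",
    "supertype--author_annotation_fields",
    "cluster",
    "cluster--cell_set_accession",
    "cluster--parent_cell_set_accession",
    "cluster--author_annotation_fields",
    "cluster--neurotransmitter_accession",
]


def group_flattened_columns(names):
    """Map each labelset prefix to the fields flattened under it."""
    grouped = defaultdict(list)
    for name in names:
        labelset, sep, field = name.partition("--")
        if sep:  # keep only columns of the form <labelset>--<field>
            grouped[labelset].append(field)
    return dict(grouped)


for labelset, fields in group_flattened_columns(columns).items():
    print(labelset, fields)
```

With the real data, `flatten_df.obs.columns` could be passed in place of the hand-copied list.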


## Edit CAS

In [9]:
flatten_df.obs[flatten_df.obs["supertype"]=="1155 CB Granule Glut_2"]

cell_label,cell_barcode,library_label,tissue,tissue_ontology_term_id,neurotransmitter,organism,disease,assay,class,class--cell_set_accession,...,supertype--cell_set_accession,supertype--parent_cell_set_accession,supertype--author_annotation_fields,cluster,cluster--cell_set_accession,cluster--parent_cell_set_accession,cluster--author_annotation_fields,cluster--neurotransmitter_accession,cluster--neurotransmitter_rationale,cluster--neurotransmitter_marker_gene_evidence
AAACCCAAGAACAAGG-472_A05,AAACCCAAGAACAAGG,L8TX_201217_01_G07,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
AAACCCAAGAATCCCT-473_A06,AAACCCAAGAATCCCT,L8TX_201217_01_A08,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v3,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
AAACCCAAGAGGACTC-478_A02,AAACCCAAGAGGACTC,L8TX_210107_02_H11,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
AAACCCAAGATGCTGG-471_B04,AAACCCAAGATGCTGG,L8TX_201217_01_B07,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v3,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
AAACCCAAGCAGGGAG-213.2_B01,AAACCCAAGCAGGGAG,L8TX_200206_01_A03,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TTTGTTGTCTGATTCT-171_A01,TTTGTTGTCTGATTCT,L8TX_191029_01_C08,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v3,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
TTTGTTGTCTTCCACG-471_A04,TTTGTTGTCTTCCACG,L8TX_201217_01_A07,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
TTTGTTGTCTTGATTC-471_A04,TTTGTTGTCTTGATTC,L8TX_201217_01_A07,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v3,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
TTTGTTGTCTTGCAGA-181_B01,TTTGTTGTCTTGCAGA,L8TX_191119_01_F11,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_314,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7


Move the supertype `1155 CB Granule Glut_2` from subclass `314 CB Granule Glut` to subclass `315 DCO UBC Glut`, and rename it to `Updated_CB Granule Glut_2`.

In [10]:
# Look up the annotations involved in the move: the current parent subclass,
# the target subclass, and the supertype being moved.
CB_Granule_Glut = [annotation for annotation in cas_json_obj["annotations"] if annotation["cell_label"] == "314 CB Granule Glut"][0]
DCO_UBC_Glut = [annotation for annotation in cas_json_obj["annotations"] if annotation["cell_label"] == "315 DCO UBC Glut"][0]
CB_Granule_Glut_2 = [annotation for annotation in cas_json_obj["annotations"] if annotation["cell_label"] == "1155 CB Granule Glut_2"][0]

In [11]:
CB_Granule_Glut_2["cell_label"] = "Updated_CB Granule Glut_2"
CB_Granule_Glut_2["parent_cell_set_accession"] = DCO_UBC_Glut["cell_set_accession"]
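The `[0]` lookups in cell 10 raise a bare `IndexError` if a label is misspelled, and they take the first hit if a label occurs more than once. A slightly more defensive variant might look like the sketch below. It runs on a toy CAS dictionary built from labels and accessions shown in this notebook, and the helper name `find_annotation` is hypothetical.

```python
def find_annotation(cas, labelset, cell_label):
    """Return the single annotation matching labelset and label, or raise."""
    matches = [
        a for a in cas["annotations"]
        if a["labelset"] == labelset and a["cell_label"] == cell_label
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected 1 match for {labelset}/{cell_label!r}, found {len(matches)}"
        )
    return matches[0]


# Toy stand-in for cas_json_obj.
toy_cas = {
    "annotations": [
        {"labelset": "subclass", "cell_label": "314 CB Granule Glut",
         "cell_set_accession": "CS20230722_SUBC_314"},
        {"labelset": "subclass", "cell_label": "315 DCO UBC Glut",
         "cell_set_accession": "CS20230722_SUBC_315"},
        {"labelset": "supertype", "cell_label": "1155 CB Granule Glut_2",
         "cell_set_accession": "CS20230722_SUPT_1155",
         "parent_cell_set_accession": "CS20230722_SUBC_314"},
    ]
}

child = find_annotation(toy_cas, "supertype", "1155 CB Granule Glut_2")
new_parent = find_annotation(toy_cas, "subclass", "315 DCO UBC Glut")

# Same two edits as in the notebook cell: rename, then re-parent.
child["cell_label"] = "Updated_CB Granule Glut_2"
child["parent_cell_set_accession"] = new_parent["cell_set_accession"]
print(child["parent_cell_set_accession"])  # CS20230722_SUBC_315
```

Note that the accession of the moved supertype stays the same; only its label and parent reference change.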

In [12]:
with open("edited_cas.json", "w") as f:
    json.dump(cas_json_obj, f, indent=2)

In [13]:
!cas populate_cells --json edited_cas.json --anndata merged_CS20230722_CLAS_29.h5ad --labelsets cluster supertype subclass class

INFO:root:All labelsets exist in obs.


In [14]:
!cas export2cap --json edited_cas.json --anndata merged_CS20230722_CLAS_29.h5ad --output edited_flatten_CS20230722_CLAS_29.h5ad

In [15]:
edited_flatten_df = ad.read_h5ad("edited_flatten_CS20230722_CLAS_29.h5ad")

In [16]:
edited_flatten_df.obs[edited_flatten_df.obs["supertype"]=="Updated_CB Granule Glut_2"]

cell_label,cell_barcode,library_label,tissue,tissue_ontology_term_id,neurotransmitter,organism,disease,assay,class,class--cell_set_accession,...,supertype--cell_set_accession,supertype--parent_cell_set_accession,supertype--author_annotation_fields,cluster,cluster--cell_set_accession,cluster--parent_cell_set_accession,cluster--author_annotation_fields,cluster--neurotransmitter_accession,cluster--neurotransmitter_rationale,cluster--neurotransmitter_marker_gene_evidence
AAACCCAAGAACAAGG-472_A05,AAACCCAAGAACAAGG,L8TX_201217_01_G07,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_315,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
AAACCCAAGAATCCCT-473_A06,AAACCCAAGAATCCCT,L8TX_201217_01_A08,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v3,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_315,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
AAACCCAAGAGGACTC-478_A02,AAACCCAAGAGGACTC,L8TX_210107_02_H11,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_315,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
AAACCCAAGATGCTGG-471_B04,AAACCCAAGATGCTGG,L8TX_201217_01_B07,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v3,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_315,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
AAACCCAAGCAGGGAG-213.2_B01,AAACCCAAGCAGGGAG,L8TX_200206_01_A03,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_315,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TTTGTTGTCTGATTCT-171_A01,TTTGTTGTCTGATTCT,L8TX_191029_01_C08,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v3,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_315,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
TTTGTTGTCTTCCACG-471_A04,TTTGTTGTCTTCCACG,L8TX_201217_01_A07,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_315,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
TTTGTTGTCTTGATTC-471_A04,TTTGTTGTCTTGATTC,L8TX_201217_01_A07,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v3,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_315,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7
TTTGTTGTCTTGCAGA-181_B01,TTTGTTGTCTTGCAGA,L8TX_191119_01_F11,Cerebellum,UBERON:0002037,Glut,Mus musculus,normal,10x 3' v2,29 CB Glut,CS20230722_CLAS_29,...,CS20230722_SUPT_1155,CS20230722_SUBC_315,{'supertype.markers.combo _within subclass_': ...,5201 CB Granule Glut_2,CS20230722_CLUS_5201,CS20230722_SUPT_1155,"{'neighborhood': 'NN-IMN-GC', 'anatomical_anno...",CS20230722_NEUR_Glut,Slc17a7:7.22,Slc17a7


## Edit OBS

In [17]:
edited_flatten_df.obs["subclass"] = edited_flatten_df.obs["subclass"].replace("315 DCO UBC Glut", "Upgraded DCO UBC Glut")
edited_flatten_df.obs["subclass"] = edited_flatten_df.obs["subclass"].replace("314 CB Granule Glut", "Downgraded CB Granule Glut")
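If the `subclass` column is stored as a pandas categorical, renaming the categories directly is an alternative to `Series.replace`. That dtype is an assumption here, although string-valued `obs` columns read from `.h5ad` files commonly are categorical. The sketch below uses a small stand-in Series rather than the real `obs` table.

```python
import pandas as pd

# Stand-in for edited_flatten_df.obs["subclass"]; labels taken from this notebook.
subclass = pd.Series(
    ["314 CB Granule Glut", "315 DCO UBC Glut", "314 CB Granule Glut"],
    dtype="category",
)

# Rename the category labels in one step; the underlying codes are untouched.
renamed = subclass.cat.rename_categories(
    {
        "315 DCO UBC Glut": "Upgraded DCO UBC Glut",
        "314 CB Granule Glut": "Downgraded CB Granule Glut",
    }
)

print(renamed.tolist())
```

Because only the category labels change, this avoids rewriting every row of a large `obs` table.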



In [18]:
edited_flatten_df.write("new_edited_flatten_cas_CS20230722_CLAS_29.h5ad", compression="gzip")

In [19]:
edited_flatten_df.file.close()

## Unflatten

In [20]:
!cas unflatten --anndata new_edited_flatten_cas_CS20230722_CLAS_29.h5ad --json edited_cas.json --output_anndata unflatten_cas_CS20230722_CLAS_29.h5ad --output_json new_cas.json



In [21]:
with open("new_cas.json", "r") as f:
    new_split_cas = json.load(f)

In [22]:
[anno for anno in new_split_cas["annotations"] if anno["labelset"] == "subclass"]

[{'author_annotation_fields': {'neighborhood': 'NN-IMN-GC',
   'subclass.tf.markers.combo': 'Pax6,Neurod2,Etv1',
   'subclass.markers.combo': 'Gabra6,Ror1',
   'nt_type_label': 'Glut',
   'CTX.cluster_label': 'None',
   'supertype.markers.combo _within subclass_': 'None',
   'supertype.markers.combo': 'None',
   'CCF_acronym.freq': 'None',
   'sex.bias': 'None',
   'cluster.markers.combo _within subclass_': 'None',
   'nt.markers': 'None',
   'CTX.supertype_label': 'None',
   'v2.size': 'None',
   'v3.size': 'None',
   'cluster.TF.markers.combo': 'None',
   'multiome.size': 'None',
   'Dark': 'None',
   'Glial': 'None',
   'Light': 'None',
   'anatomical_annotation': 'None',
   'CTX.supertype_id': 'None',
   'np.markers': 'None',
   'CTX.subclass_id': 'None',
   'Neuronal': 'None',
   'cluster.markers.combo': 'None',
   'CTX.subclass_label': 'None',
   'CCF_broad.freq': 'None',
   'merfish.markers.combo': 'None',
   'CTX.size': 'None',
   'nt_type_combo_label': 'None',
   'CTX.cluster_
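One way to inspect what changed between `edited_cas.json` and `new_cas.json` is to compare labels keyed by `cell_set_accession`. The following sketch uses toy dictionaries in place of the two files. The helper name `changed_labels` is hypothetical, and the toy "after" labels simply mirror the obs edits made earlier; they are not taken from the actual `cas unflatten` output, which is truncated above.

```python
def changed_labels(old_cas, new_cas):
    """Return {accession: (old_label, new_label)} for annotations whose label differs."""
    old = {a["cell_set_accession"]: a["cell_label"] for a in old_cas["annotations"]}
    changes = {}
    for a in new_cas["annotations"]:
        acc = a["cell_set_accession"]
        if acc in old and old[acc] != a["cell_label"]:
            changes[acc] = (old[acc], a["cell_label"])
    return changes


# Toy stand-ins for the CAS before and after unflattening (assumed values).
before = {"annotations": [
    {"cell_set_accession": "CS20230722_SUBC_314", "cell_label": "314 CB Granule Glut"},
    {"cell_set_accession": "CS20230722_SUBC_315", "cell_label": "315 DCO UBC Glut"},
    {"cell_set_accession": "CS20230722_CLAS_29", "cell_label": "29 CB Glut"},
]}
after = {"annotations": [
    {"cell_set_accession": "CS20230722_SUBC_314", "cell_label": "Downgraded CB Granule Glut"},
    {"cell_set_accession": "CS20230722_SUBC_315", "cell_label": "Upgraded DCO UBC Glut"},
    {"cell_set_accession": "CS20230722_CLAS_29", "cell_label": "29 CB Glut"},
]}

for accession, (old_label, new_label) in changed_labels(before, after).items():
    print(accession, old_label, "->", new_label)
```

With the real files, the two dictionaries would come from `json.load` on `edited_cas.json` and `new_cas.json`.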

In [23]:
ub = ad.read_h5ad("unflatten_cas_CS20230722_CLAS_29.h5ad", backed="r")

In [24]:
cas_json_str = ub.uns["cas"]
cas_json_obj = json.loads(cas_json_str)
print(json.dumps(cas_json_obj, indent=2))

{
  "title": "Whole Mouse Brain Taxonomy",
  "description": "Atlas of whole mouse brain.",
  "cellannotation_schema_version": "1.0.0",
  "author_name": "Hongkui Zeng",
  "orcid": "https://orcid.org/0000-0002-9361-5607",
  "labelsets": [
    {
      "name": "class",
      "description": "The top level of cell type definition in the mouse whole brain taxonomy. It is primarily determined by broad brain region and neurotransmitter type. All cells within a subclass belong to the same class. Class provides a broader categorization of cell types.",
      "rank": 3
    },
    {
      "name": "cluster",
      "description": "The finest level of cell type definition in the mouse whole brain taxonomy. Cells within a cluster share similar characteristics and belong to the same supertype.",
      "rank": 0
    },
    {
      "name": "subclass",
      "description": "The coarse level of cell type definition in the mouse whole brain taxonomy. All cells within a supertype belong to the same subclass. 

In [25]:
ub.file.close()