# Cell type assignment

In this notebook, we use a manual annotation approach, based on marker gene expression, to validate the cell type predictions of the automated annotation model. 

In [2]:
import scrabbitpy
import scanpy as sc
import pandas as pd
import numpy as np

run_full = False
run_fast = True


### Load data

In [None]:
# Rabbit expression data (AnnData object)
r_data = sc.read_h5ad("../data/cell_type_annotation/rabbit_data.h5ad")
# Ortholog table and marker gene list used during annotation
orthologs = pd.read_csv("../data/orthologs/orthologs_mouse_ext.tsv", sep="\t")
markers = pd.read_csv("../data/cell_type_annotation/marker_genes.tsv", sep="\t")


### Cluster data

Cell type labels are assigned to clusters identified in the high-dimensional gene expression space. Here we perform Leiden clustering at several resolutions to identify both coarse and fine-grained populations of cells.

In [None]:
# `seed` is used below but is never defined in this notebook;
# 0 is an assumed placeholder, not necessarily the value used originally
seed = 0

# Load precomputed clusters from file
if run_fast:
    clusters = pd.read_csv("../data/cell_type_annotation/clusters.tsv", sep="\t")
    # join aligns on the index, so `clusters` must share the cell index of r_data.obs
    r_data.obs = r_data.obs.join(clusters)

# Compute clusterings from scratch
elif run_full:
    sc.tl.leiden(r_data, resolution=1, key_added="leiden_res1", random_state=seed)
    sc.tl.leiden(r_data, resolution=2, key_added="leiden_res2", random_state=seed)
    sc.tl.leiden(r_data, resolution=1.5, key_added="leiden_res1_5", random_state=seed)
    sc.tl.leiden(r_data, resolution=2.5, key_added="leiden_res2_5", random_state=seed)
    sc.tl.leiden(r_data, resolution=3, key_added="leiden_res3", random_state=seed)
    sc.tl.leiden(r_data, resolution=5, key_added="leiden_res5", random_state=seed)
    sc.tl.leiden(r_data, resolution=10, key_added="leiden_res10", random_state=seed)
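
The resolution sweep above repeats one call with a different resolution and key name each time. A minimal sketch of the same sweep as a loop is shown below. The helper names `resolution_key` and `run_leiden_sweep` are hypothetical and not part of the original notebook, and the seed is passed in explicitly rather than assumed.

```python
def resolution_key(resolution):
    """Build an obs column name such as 'leiden_res1_5' from a resolution value."""
    return "leiden_res" + f"{resolution:g}".replace(".", "_")


def run_leiden_sweep(adata, resolutions, seed):
    """Run Leiden clustering once per resolution, storing each result in adata.obs."""
    import scanpy as sc  # imported lazily so the key helper works without scanpy

    for resolution in resolutions:
        sc.tl.leiden(
            adata,
            resolution=resolution,
            key_added=resolution_key(resolution),
            random_state=seed,
        )


# Resolutions used in this notebook
resolutions = [1, 1.5, 2, 2.5, 3, 5, 10]
keys = [resolution_key(r) for r in resolutions]
print(keys)
```

Calling `run_leiden_sweep(r_data, resolutions, seed)` would then be expected to fill the same `leiden_res*` columns as the explicit calls, provided the neighbourhood graph that `sc.tl.leiden` requires has already been computed.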

### Divide dataset

To simplify the annotation process, the dataset is first divided into broad regions that are annotated independently.

For each broad region, we plot the automated annotation predictions, along with clusterings of different resolutions. In addition to clustering the entire dataset, Leiden clustering is also performed within each region. These results are compared to UMAP plots of marker gene expression of known cell types as well as differentially expressed genes computationally identified in the annotated mouse dataset.
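
Alongside the plots, a cluster-by-prediction table can give a quick numerical view of how well the automated labels agree with the clustering. The sketch below uses a toy stand-in for `r_data.obs`. The column name `predicted_celltype` and the cell type labels are illustrative assumptions, not names taken from the dataset.

```python
import pandas as pd

# Toy stand-in for r_data.obs; "predicted_celltype" is a hypothetical column name
obs = pd.DataFrame(
    {
        "leiden_res1": ["4", "4", "4", "3", "3", "7"],
        "predicted_celltype": [
            "Erythroid",
            "Erythroid",
            "Endothelium",
            "Neural tube",
            "Neural tube",
            "Neural tube",
        ],
    }
)

# Rows: clusters; columns: predicted labels; values: fraction of the cluster's cells
composition = pd.crosstab(
    obs["leiden_res1"], obs["predicted_celltype"], normalize="index"
)

# Most common predicted label per cluster, and the fraction of cells carrying it
summary = pd.DataFrame(
    {
        "majority_label": composition.idxmax(axis=1),
        "fraction": composition.max(axis=1),
    }
)
print(summary)
```

Clusters with a low majority fraction are the ones where the automated predictions disagree most within a cluster, so they are natural candidates for closer inspection against marker genes.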



In [None]:
# Subset the dataset into broad regions by coarse (resolution 1) Leiden cluster
blood = r_data[r_data.obs["leiden_res1"].isin(['4', '19', '20', '8'])]
mesoderm = r_data[r_data.obs["leiden_res1"].isin(['17', '2', '12', '22'])]
neural = r_data[r_data.obs["leiden_res1"].isin(['3', '7'])]
misc = r_data[r_data.obs["leiden_res1"].isin(['6', '23', '21', '18'])]
exe_ectoderm = r_data[r_data.obs["leiden_res1"].isin(['9', '0', '16'])]
exe_mesoderm = r_data[r_data.obs["leiden_res1"].isin(['5', '14', '10', '1', '22'])]
exe_endoderm = r_data[r_data.obs["leiden_res1"].isin(['13', '15', '11'])]
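
Because the regions are defined by hand-written cluster lists, it can be worth checking whether any cluster is assigned to more than one region. The sketch below copies the lists from the cell above into a dictionary and reports shared clusters. `REGION_CLUSTERS` and `find_shared_clusters` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Cluster lists copied from the region definitions in this notebook
REGION_CLUSTERS = {
    "blood": ["4", "19", "20", "8"],
    "mesoderm": ["17", "2", "12", "22"],
    "neural": ["3", "7"],
    "misc": ["6", "23", "21", "18"],
    "exe_ectoderm": ["9", "0", "16"],
    "exe_mesoderm": ["5", "14", "10", "1", "22"],
    "exe_endoderm": ["13", "15", "11"],
}


def find_shared_clusters(region_clusters):
    """Return {cluster: [regions]} for clusters listed under more than one region."""
    regions_by_cluster = defaultdict(list)
    for region, clusters in region_clusters.items():
        for cluster in clusters:
            regions_by_cluster[cluster].append(region)
    return {c: r for c, r in regions_by_cluster.items() if len(r) > 1}


shared = find_shared_clusters(REGION_CLUSTERS)
print(shared)  # {'22': ['mesoderm', 'exe_mesoderm']}
```

Cluster `'22'` appears under both `mesoderm` and `exe_mesoderm`, so its cells are included in both subsets. Whether that overlap is intentional is not stated in the notebook.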

