**This notebook is a tutorial for cell type annotation**

Here we use the "SingleR" package in an R 4.0 environment.

The packages "SingleR", "SingleCellExperiment" and "scater" used in this project are installed from **Bioconductor**.

**Install R 4.0 and related packages**

\# Install R 4.0. If you already have a different version of R installed, create a new conda environment first.

    conda install -c conda-forge r-base=4.0.0

\# Enter R, then install packages:

    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("SingleCellExperiment")
    BiocManager::install("scater")
    BiocManager::install("SingleR")
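
Before moving on, it can help to confirm that the installation succeeded. The snippet below is a minimal sketch using only base R; the variable names "required_pkgs", "installed" and "missing_pkgs" are illustrative and not part of the original notebook.

```r
# Packages this tutorial loads; adjust the vector if your setup differs.
required_pkgs <- c("SingleR", "SingleCellExperiment", "scater")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs to be installed.
missing_pkgs <- required_pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

If a package is reported as missing, rerun the corresponding "BiocManager::install" call and read its output for compilation or dependency errors.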


**Load packages**

In [2]:
library(SingleR)
library(SingleCellExperiment)
library(scater)

**(Optional) save built-in reference dataset**

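Here we use the Human Primary Cell Atlas (HPCA) (Mabbott NA et al., 2013) as the reference dataset.

Uncomment and run the code in this cell the first time to download and save the dataset. For later runs, just load the saved object. Note that in more recent Bioconductor releases the "HumanPrimaryCellAtlasData" function is provided by the "celldex" package.

Remember to change "save_ref_dir" to your own directory.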

In [12]:
# save_ref_dir <- "/data1/ljq/rdata/cell_type/"
# hpca.se <- HumanPrimaryCellAtlasData()
# hpca.se
# saveRDS(hpca.se, file=paste0(save_ref_dir,"HPCA.rds"))
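
The save-once, load-later pattern described above can be sketched with base R alone. This is an illustration under assumptions: "toy_ref" is a stand-in list rather than the real SummarizedExperiment, and the directory lives under "tempdir()" so the snippet runs anywhere.

```r
# Hypothetical cache directory; replace with your own "save_ref_dir".
save_ref_dir <- file.path(tempdir(), "cell_type")
dir.create(save_ref_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in for the reference object (the real one is a SummarizedExperiment).
toy_ref <- list(labels = c("T_cells", "B_cell"), n_genes = 2L)

# file.path() inserts the separator, so no trailing slash is needed.
ref_path <- file.path(save_ref_dir, "HPCA.rds")

# Save only if the file is not there yet, then load it back.
if (!file.exists(ref_path)) {
  saveRDS(toy_ref, file = ref_path)
}
restored <- readRDS(ref_path)
identical(restored, toy_ref)
```

Building the path with "file.path" instead of "paste0" avoids a broken path when the directory string lacks a trailing slash.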

**Load reference dataset**

Remember to change "ref_dir" to the path of your saved "HPCA.rds" file.

In [3]:
ref_dir <- "/data1/ljq/rdata/cell_type/HPCA.rds"
hpca.se <- readRDS(ref_dir)
hpca.se

class: SummarizedExperiment 
dim: 19363 713 
metadata(0):
assays(1): logcounts
rownames(19363): A1BG A1BG-AS1 ... ZZEF1 ZZZ3
rowData names(0):
colnames(713): GSM112490 GSM112491 ... GSM92233 GSM92234
colData names(2): label.main label.fine

**Load query dataset**

In our HCA-d database, datasets are stored as tab-separated (.tsv) files; the file used here is also gzip-compressed (.tsv.gz). Make sure that rows are genes and columns are cells.

Remember to change "query_dir" to the path of your own expression file.

In [4]:
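query_dir <- "/stor/public/hcad/heart_heart_HCLAdultHeart1/heart_heart_HCLAdultHeart1_expression.tsv.gz"
# Read the tab-separated table; the first column (gene names) becomes the row names.
# check.names = FALSE keeps the cell IDs in the header exactly as they appear in the file.
query_data <- read.table(query_dir, header = TRUE, row.names = 1, sep = "\t", check.names = FALSE)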
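
To make the expected file layout concrete, the sketch below writes a tiny gzip-compressed TSV and reads it back, then runs a few sanity checks. The toy table, the file name "toy_expression.tsv.gz" and the "ref_genes" vector are all made up for illustration; with real data you would compare against the row names of the reference object instead.

```r
# Toy count table: 3 genes (rows) x 2 cells (columns).
toy <- data.frame(cell_1 = c(5L, 0L, 2L), cell_2 = c(1L, 3L, 0L),
                  row.names = c("A1BG", "ZZEF1", "ZZZ3"))

# Write it as a gzip-compressed TSV, mimicking the layout described above.
toy_path <- file.path(tempdir(), "toy_expression.tsv.gz")
con <- gzfile(toy_path, "w")
write.table(toy, con, sep = "\t", quote = FALSE, col.names = NA)
close(con)

# read.table() reads gzip-compressed files transparently.
toy_data <- read.table(toy_path, header = TRUE, row.names = 1, sep = "\t")

# Sanity checks: unique gene names and numeric expression columns.
stopifnot(!anyDuplicated(rownames(toy_data)))
stopifnot(all(vapply(toy_data, is.numeric, logical(1))))

# Genes shared with a (hypothetical) reference gene list.
ref_genes <- c("A1BG", "A1BG-AS1", "ZZEF1", "ZZZ3")
shared <- intersect(rownames(toy_data), ref_genes)
length(shared)
```

A very small overlap with the reference genes usually means the table is transposed or uses a different gene identifier type than the reference.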

**Data normalization**

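Here we use the "logNormCounts" function from the "scater" package to normalize the data. With its default settings it scales each cell's counts by a library-size factor and then applies a log2 transformation.

The function operates on a SingleCellExperiment object.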


In [5]:
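# Wrap the counts in a SingleCellExperiment; as.matrix() converts the data frame to a matrix.
query_obj <- SingleCellExperiment(assays = list(counts = as.matrix(query_data)))
# Add a "logcounts" assay with the normalized, log-transformed values.
query_obj <- logNormCounts(query_obj)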
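
For intuition, the default normalization can be reproduced by hand on a small matrix. This is a sketch in base R, assuming the default behaviour (library-size factors centred to a mean of 1, pseudo-count of 1, log2 transform); the toy matrix and all variable names are illustrative.

```r
# Toy count matrix: 3 genes x 3 cells (filled column by column).
counts <- matrix(c(10, 0, 30,
                    5, 5, 10,
                    0, 2,  2),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell_1", "cell_2", "cell_3")))

# Library size per cell, rescaled so the size factors average to 1.
lib_sizes <- colSums(counts)
size_factors <- lib_sizes / mean(lib_sizes)

# Divide each column by its size factor, then log2-transform with a pseudo-count of 1.
normalized <- sweep(counts, 2, size_factors, "/")
logcounts <- log2(normalized + 1)
round(logcounts, 2)
```

After scaling, every cell has the same total count, so differences in sequencing depth no longer dominate the comparison between cells.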

**Annotate cell type**

Run SingleR with HPCA as the reference ("ref") and "query_obj" as the test dataset ("test"). We predict both the "main" cell types and the "fine" cell subtypes.

In [6]:
query_pred_main <- SingleR(test = query_obj, ref = hpca.se, labels = hpca.se$label.main)
query_pred_fine <- SingleR(test = query_obj, ref = hpca.se, labels = hpca.se$label.fine)
table(query_pred_main$labels)
table(query_pred_fine$labels)



             B_cell        Chondrocytes                 CMP                  DC 
                  4                  17                   3                 160 
  Endothelial_cells         Fibroblasts         Hepatocytes           HSC_CD34+ 
                124                  33                   1                  32 
      Keratinocytes          Macrophage            Monocyte             Neurons 
                  1                 355                 209                   3 
        Neutrophils             NK_cell         Osteoblasts    Pre-B_cell_CD34- 
                  5                  47                   1                  11 
   Pro-B_cell_CD34+ Smooth_muscle_cells             T_cells   Tissue_stem_cells 
                  1                  65                  21                 215 


         Astrocyte:Embryonic_stem_cell-derived 
                                             1 
                                  B_cell:Naive 
                                             1 
                            B_cell:Plasma_cell 
                                             3 
                      Chondrocytes:MSC-derived 
                                             5 
                                           CMP 
                                             2 
                           DC:monocyte-derived 
                                             1 
DC:monocyte-derived:A._fumigatus_germ_tubes_6h 
                                            73 
           DC:monocyte-derived:AEC-conditioned 
                                             6 
           DC:monocyte-derived:anti-DC-SIGN_2h 
                                            48 
           DC:monocyte-derived:antiCD40/VAF347 
                                            40 
                DC:monocyte-derived:Gal
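
The idea behind this kind of annotation, assigning each query cell the label of the reference profiles it correlates with best, can be illustrated with a toy example in base R. This is a deliberately simplified sketch and not SingleR's actual procedure, which also restricts the comparison to marker genes, scores each label with a quantile of the correlations, and fine-tunes the result. All matrices and names below are made up.

```r
# Toy log-expression reference: 4 genes x 3 reference samples with known labels.
ref <- matrix(c(9, 8, 1, 0,
                8, 9, 0, 1,
                0, 1, 9, 8),
              nrow = 4,
              dimnames = list(paste0("gene", 1:4), c("ref_1", "ref_2", "ref_3")))
ref_labels <- c("T_cells", "T_cells", "Macrophage")

# Two query cells measured on the same genes.
query <- matrix(c(7, 9, 2, 1,
                  1, 0, 8, 9),
                nrow = 4,
                dimnames = list(paste0("gene", 1:4), c("cell_1", "cell_2")))

# Spearman correlation between every query cell and every reference sample.
rho <- cor(query, ref, method = "spearman")

# Score each label by its best-correlated reference sample (a simplification).
scores <- sapply(unique(ref_labels), function(lab) {
  apply(rho[, ref_labels == lab, drop = FALSE], 1, max)
})

# Assign each cell the label with the highest score.
predicted <- colnames(scores)[max.col(scores, ties.method = "first")]
predicted
```

Rank-based correlation is used here because it is insensitive to monotone differences in scale between the query and the reference.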

In [7]:
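# Combine the main and fine labels into one table with one row per cell.
query_result <- data.frame(main_type = query_pred_main$labels, sub_type = query_pred_fine$labels)
# Use the cell IDs from the query table as row names.
rownames(query_result) <- colnames(query_data)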

**Save annotated results**

Remember to change "save_anno_dir" to your own directory.

In [8]:
save_anno_dir <- "/data1/ljq/rdata/cell_type/"
write.csv(query_result, paste0(save_anno_dir, "anno.csv"))
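
To reuse the annotation later, the CSV can be read back with the cell IDs restored as row names. The sketch below round-trips a small made-up table through a temporary file; "toy_result" and "anno_path" are illustrative names, and the labels are just examples taken from the output shown earlier.

```r
# Toy annotation table shaped like "query_result": one row per cell.
toy_result <- data.frame(
  main_type = c("DC", "B_cell", "B_cell"),
  sub_type  = c("DC:monocyte-derived", "B_cell:Naive", "B_cell:Plasma_cell"),
  row.names = c("cell_1", "cell_2", "cell_3")
)

# Write and re-read the table; row.names = 1 restores the cell IDs.
anno_path <- file.path(tempdir(), "anno.csv")
write.csv(toy_result, anno_path)
anno <- read.csv(anno_path, row.names = 1)

# Count cells per main type.
main_counts <- table(anno$main_type)
main_counts
```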