### Overview
This notebook imports the data extracted from an unannotated scRNA-seq dataset (the query; publicly available dataset [2]) together with a reference set of marker genes, and performs single-cell annotation with SCINA [3].

This annotation method is particularly useful when no annotated scRNA-seq dataset is available as a reference: SCINA assigns cell types using only lists of marker genes (signatures) for each cell type.

**This notebook is written in R.**

In [None]:
# Add a user-specific R library directory to the search path (adjust for your system)
.libPaths('/home/chiacmm/rpackages')

In [None]:
# Libraries and global settings
library(SCINA)
library(stringr)
library(base)  # attached by default; loaded explicitly because the notebook once failed without it in the rpy2 environment
library(tools)

### Steps performed

1. Import the query data (unannotated scRNA-seq)

2. Import the filtered marker genes data (to be used in annotating the query data)

3. Run SCINA

4. Export results

#### Import the query data (unannotated scRNA-seq)

In [3]:
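# Import query data: expression matrix with genes as rows and cells as columns
fullpath_input_geneexpr <- "../../data/demo_public/output/scina_que_exprs.csv"

# Read expression matrix (the first column holds gene names and becomes the row names)
df_geneexprs <- read.csv(fullpath_input_geneexpr, row.names = 1, stringsAsFactors = FALSE)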
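The code above assumes the CSV has gene names in its first column and one column per cell; later, `colnames(df_geneexprs)` supplies the cell IDs. A minimal sketch of that layout on a tiny, made-up table (`toy_csv` and `toy_exprs` are hypothetical names), read with the same `read.csv()` arguments and followed by two quick sanity checks:

```r
# Hypothetical stand-in for scina_que_exprs.csv: the first column holds
# gene names, every other column is one cell (values are made up).
toy_csv <- "gene,cell_1,cell_2,cell_3
Snap25,5.1,0.2,4.8
Gad1,0.0,3.9,0.1
Mbp,0.3,0.1,0.0"

# Same call pattern as the notebook: gene names become row names
toy_exprs <- read.csv(text = toy_csv, row.names = 1, stringsAsFactors = FALSE)

# Sanity checks before handing such a table to SCINA
stopifnot(all(vapply(toy_exprs, is.numeric, logical(1))))  # numeric values only
stopifnot(!anyDuplicated(rownames(toy_exprs)))             # unique gene names

dim(toy_exprs)  # 3 3 (genes x cells)
```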

#### Import the filtered marker genes data

In [4]:
# Import marker genes
fullpath_input_markergenes <- "../../data/demo_public/output/scina_filtered_markergenes.csv"

# Read the marker-gene CSV and convert it with SCINA's preprocess.signatures();
# the result is a named list of marker-gene vectors, one per cell type
df_markergenes <- preprocess.signatures(fullpath_input_markergenes)
df_markergenes
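`preprocess.signatures()` turns a marker-gene CSV into the list form that `SCINA()` takes as its `signatures` argument. The helper below, `signatures_from_csv()`, is hypothetical and only approximates that conversion, assuming one column per cell type with empty cells padding the shorter columns; it is not SCINA's source code.

```r
# Hypothetical marker-gene table: one column per cell type, shorter
# columns padded with empty cells.
toy_markers_csv <- "Astro,Oligo,Neuron
Gja1,Mbp,Snap25
Aqp4,Plp1,Syt1
Slc1a3,,"

# Hypothetical helper: build a named list of marker vectors per cell type
signatures_from_csv <- function(text) {
  tab <- read.csv(text = text, stringsAsFactors = FALSE, check.names = FALSE)
  sigs <- lapply(tab, function(genes) {
    genes <- trimws(genes)
    unique(genes[!is.na(genes) & genes != ""])  # drop padding cells
  })
  sigs[lengths(sigs) > 0]  # drop cell types left without markers
}

toy_signatures <- signatures_from_csv(toy_markers_csv)
str(toy_signatures)
```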

#### Run SCINA

In [5]:
# Run SCINA: rm_overlap drops genes shared between signatures, allow_unknown permits an "unknown" label
system.time(res_scina <- SCINA(df_geneexprs, df_markergenes,
  max_iter = 1000, convergence_n = 10, convergence_rate = 0.999, sensitivity_cutoff = 0.9,
  rm_overlap = TRUE, allow_unknown = TRUE, log_file = 'SCINA.log'))

   user  system elapsed 
 16.144  14.671   2.932 
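With `rm_overlap = TRUE`, SCINA removes genes that are shared between signatures, so each remaining marker points to a single cell type. The base-R sketch below only illustrates that idea on made-up signatures (`sigs`, `shared` and `sigs_no_overlap` are hypothetical names); it is not SCINA's internal implementation.

```r
# Hypothetical signatures: Snap25 is listed for two cell types
sigs <- list(
  Excitatory = c("Slc17a7", "Snap25", "Satb2"),
  Inhibitory = c("Gad1", "Gad2", "Snap25"),
  Oligo      = c("Mbp", "Plp1")
)

# Genes that occur in more than one signature
all_genes <- unlist(sigs, use.names = FALSE)
shared    <- unique(all_genes[duplicated(all_genes)])

# Drop the shared genes from every signature
sigs_no_overlap <- lapply(sigs, setdiff, y = shared)

shared                      # "Snap25"
sigs_no_overlap$Inhibitory  # "Gad1" "Gad2"
```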

#### Export results

In [6]:
# Write data
# Build a per-cell result table: cell ID, SCINA label and probability
l_cellID <- colnames(df_geneexprs)
l_cell_type <- as.character(res_scina$cell_labels)

# res_scina$probabilities has one row per cell type and one column per cell;
# keep each cell's highest probability across cell types.
# unname() keeps cell IDs from being turned into row names.
l_celltype_probability <- unname(apply(res_scina$probabilities, 2, max))

# Create result dataframe
df_res_scina <- data.frame(cellID = l_cellID,
                           annotation = l_cell_type,
                           scina_probability = l_celltype_probability)

#Output data
fullpath_output_scina <- "../../data/demo_public/output/scina_annotation.csv"
system.time(write.csv(df_res_scina, fullpath_output_scina))

   user  system elapsed 
  0.013   0.001   0.016 
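`res_scina$probabilities` is a matrix with one row per cell type and one column per cell, so per-cell summaries are column-wise operations. A small sketch on a made-up matrix (`prob` is a hypothetical name) showing how to get each cell's most probable type with `which.max` and its probability with `max`:

```r
# Hypothetical probability matrix: rows are cell types, columns are cells
prob <- matrix(c(0.90, 0.08, 0.02,
                 0.10, 0.15, 0.75,
                 0.40, 0.35, 0.25),
               nrow = 3,
               dimnames = list(c("Astro", "Oligo", "Neuron"),
                               c("cell_1", "cell_2", "cell_3")))

# Most probable type and its probability for each cell (column)
top_type <- rownames(prob)[apply(prob, 2, which.max)]
top_prob <- unname(apply(prob, 2, max))

data.frame(cellID = colnames(prob), top_type, top_prob)
```

Applied to the exported annotations, `table()` on the label column would then give the number of cells per assigned type.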

In [7]:
sessionInfo()

R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS/LAPACK: /home/chiacmm/.conda/envs/findsyn/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US       
 [4] LC_COLLATE=en_US     LC_MONETARY=en_US    LC_MESSAGES=en_US   
 [7] LC_PAPER=en_US       LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C 

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] stringr_1.5.0 SCINA_1.2.0   gplots_3.1.3  MASS_7.3-58.1

loaded via a namespace (and not attached):
 [1] magrittr_2.0.3     uuid_1.1-0         rlang_1.0.6        fastmap_1.1.0     
 [5] fansi_1.0.3        caTools_1.18.2     KernSmooth_2.23-20 utf8_1.2.2        
 [9] cli_3.5.0          htmltools_0.5.4    gtools_3.9.4       digest_0.6.31     
[13] lifecycle_1.

##### References
1. Chia, C. M., Roig Adam, A., & Moro, A. (2022). *In silico* multiple single-subject neural tissue screening using deconvolution on pseudo-bulk RNA-seq - a prototype. Bioinformatics and Systems Biology joint degree program. Vrije Universiteit Amsterdam and University of Amsterdam. 

2. Allen Institute for Brain Science (2004). Allen Mouse Brain Atlas, Mouse Whole Cortex and Hippocampus 10x. Available from mouse.brain-map.org. Allen Institute for Brain Science (2011).

3. Zhang, Z., Luo, D., Zhong, X., Choi, J. H., Ma, Y., Wang, S., Mahrt, E., Guo, W., Stawiski, E. W., Modrusan, Z., Seshagiri, S., Kapur, P., Hon, G. C., Brugarolas, J., & Wang, T. (2019). SCINA: A Semi-Supervised Subtyping Algorithm of Single Cells and Bulk Samples. Genes, 10(7), 531. https://doi.org/10.3390/genes10070531