# Analysing scRNA-seq Data



## Benefits

- Measure expression in individual cells rather than the average expression across a bulk population of cells

### Applications

**Immunology**
- Immunophenotyping (identifying cell types)
- Cell activation
- Rare cell type discovery - Unlike traditional methods that identify cells using a few cell-surface markers, measuring many proteins allows rare cells (<1%) to be identified. More recently, cells have been considered to exist on a continuum rather than as discrete cell types [2].

**Cancer Biology**
- Minimal residual disease - the ability to measure millions of cells per sample allows minimal residual disease to be detected with a limit of five tumour cells [3]

**Other**
- Identify cell-type specific changes in disease e.g. Alzheimer's disease [4]
- Developmental biology

Single-cell epigenomics - analyse patterns of open chromatin

Single-cell genomics - detect de novo germline mutations, somatic mutations and copy number alterations

## Study Design

**Protocols**
- **PCR plate-based** - These isolate individual cells into the wells of a plate using a cell sorter or microfluidics. They are very sensitive because they offer high read depth per cell [14], which makes them suitable for discriminating subpopulations of cells with subtle differences [15].
    - Smart-seq2 [12]
    - CEL-seq [7]
    - MARS-seq [13]
- **Droplet-based** - These can capture millions of cells, but with low sequencing depth per cell [14]. Their high throughput makes them useful for detecting rare cell types [18].
    - inDrop [16]
    - Drop-seq [9]
    - Chromium (10x Genomics) [17]
- Others
    - Quartz-seq [6]
    - RamDa-seq [8]
    - sci-RNA-seq [10]
    - Smart-seq [5]

(Epigenomic protocols: single-cell ChIP-seq [11] and scATAC-seq)
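The depth-versus-throughput trade-off between plate-based and droplet-based protocols can be made concrete with some back-of-the-envelope arithmetic. The sketch below uses hypothetical numbers only (a fixed budget of 400 million reads, a 384-well plate versus 10,000 droplet-captured cells, a rare population at 0.5% frequency, and 10 cells as the minimum needed to see it as a group) and assumes cells are sampled independently, so the number of rare cells captured follows a binomial distribution.

```r
# Hypothetical sequencing budget shared by all cells in one run
total_reads <- 400e6
n_cells <- c(plate = 384, droplet = 10000)

# Depth per cell falls as the same reads are spread over more cells
reads_per_cell <- total_reads / n_cells
print(reads_per_cell)

# Hypothetical rare population making up 0.5% of the sample
rare_freq <- 0.005
min_rare_cells <- 10  # hypothetical: cells needed to detect the population

# P(at least min_rare_cells rare cells captured), assuming binomial sampling
p_capture <- pbinom(min_rare_cells - 1, size = n_cells, prob = rare_freq,
                    lower.tail = FALSE)
print(p_capture)
```

With these assumptions the plate gives roughly a million reads per cell but almost no chance of capturing ten rare cells, while the droplet run gives about 40,000 reads per cell and almost certainly captures them.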

## Quality Control and Normalisation

Quality control (QC) and normalisation are needed to reduce the technical noise introduced during library preparation and sequencing. Single-cell analyses start from far less RNA than bulk RNA analyses, which produces noisier data [19, 20]. The low abundance of RNA also leads to 'dropout' events, in which a transcript present in the cell is 'missed', for example during the reverse-transcription step. The resulting data therefore contain many zero counts, some of which reflect dropout rather than a true absence of expression [21]. The QC step mitigates this by retaining only high-quality data for analysis.
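A toy simulation can illustrate both points: low capture efficiency turns many expressed transcripts into zero counts, and cells with unusually small library sizes can be flagged as low quality. This is a base R sketch with hypothetical parameters (gene means drawn from an exponential distribution, 10% capture efficiency, five "damaged" cells with 1% capture, and a rule flagging cells more than 3 median absolute deviations below the median log library size, which is a common but not universal choice). It illustrates the idea and is not a substitute for dedicated QC tools.

```r
set.seed(42)
n_genes <- 200
n_cells <- 100

# Hypothetical "true" number of mRNA molecules per gene (rows) and cell (columns)
gene_means <- rexp(n_genes, rate = 1 / 5)
true_counts <- matrix(rpois(n_genes * n_cells, lambda = gene_means), nrow = n_genes)

# Each molecule is captured with low probability (hypothetical 10%);
# cells 1-5 are "damaged" and capture only 1% of their molecules
capture_eff <- rep(0.1, n_cells)
capture_eff[1:5] <- 0.01
observed <- matrix(
  rbinom(length(true_counts), size = true_counts,
         prob = rep(capture_eff, each = n_genes)),
  nrow = n_genes
)

# Dropout: many expressed transcripts become zero counts
zero_true <- mean(true_counts == 0)
zero_observed <- mean(observed == 0)
cat("Fraction of zeros before capture:", round(zero_true, 2), "\n")
cat("Fraction of zeros after capture: ", round(zero_observed, 2), "\n")

# Per-cell QC metrics: library size and number of detected genes
total_counts <- colSums(observed)
detected_genes <- colSums(observed > 0)
print(summary(detected_genes))

# Flag cells whose log library size is more than 3 MADs below the median
log_total <- log1p(total_counts)
low_quality <- log_total < median(log_total) - 3 * mad(log_total)
filtered <- observed[, !low_quality]
cat("Cells flagged as low quality:", which(low_quality), "\n")
```

A log scale and a robust spread (MAD) are used so that the few damaged cells do not distort the threshold they are judged against.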

DropletUtils is installed from Bioconductor with `BiocManager::install('DropletUtils')` (this analysis used Bioconductor 3.15, BiocManager 1.30.18 and R 4.2.0).

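```r
# Load packages quietly; DropletUtils also attaches SingleCellExperiment,
# which provides counts()
suppressMessages(library(Seurat))
suppressMessages(library(SeuratData))
suppressMessages(library(patchwork))
suppressMessages(library(DropletUtils))
# LoadData('ifnb')
```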

```r
pbmc_file_path <- 'pbmc3k_filtered_gene_bc_matrices/filtered_gene_bc_matrices/hg19/'

# Read the filtered output of the 10x Genomics Cell Ranger pipeline
pbmc.data <- Read10X(data.dir = pbmc_file_path)

# This returns a unique molecular identifier (UMI) count matrix in which rows are
# genes and columns are individual cells (identified by their cell barcodes)
pbmc.data[1:5, 1:5]
```

```
5 x 5 sparse Matrix of class "dgCMatrix"
             AAACATACAACCAC-1 AAACATTGAGCTAC-1 AAACATTGATCAGC-1
MIR1302-10                  .                .                .
FAM138A                     .                .                .
OR4F5                       .                .                .
RP11-34P13.7                .                .                .
RP11-34P13.8                .                .                .
             AAACCGTGCTTCCG-1 AAACCGTGTATGCG-1
MIR1302-10                  .                .
FAM138A                     .                .
OR4F5                       .                .
RP11-34P13.7                .                .
RP11-34P13.8                .                .
```

Zero counts are printed as `.` because the matrix is stored in a sparse format.
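Because most entries are zero, the counts are held in a sparse `dgCMatrix` from the Matrix package, as the printed class shows. The sketch below builds a tiny hypothetical matrix in the same orientation (genes as rows, cells as columns), computes two standard per-cell quantities (total UMIs and number of detected genes), and applies library-size log-normalisation: each count is divided by its cell's total, multiplied by a scale factor of 10,000 and transformed with `log1p`. This follows the documented behaviour of Seurat's default `LogNormalize` method, but it is an illustration rather than Seurat's own code.

```r
library(Matrix)

# Tiny hypothetical UMI count matrix: rows are genes, columns are cells
counts <- sparseMatrix(
  i = c(1, 3, 2, 3, 4, 1),   # gene (row) index of each non-zero entry
  j = c(1, 1, 2, 2, 2, 3),   # cell (column) index of each non-zero entry
  x = c(2, 1, 5, 3, 2, 4),   # UMI counts
  dims = c(4, 3),
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Per-cell quantities used in QC
n_count <- colSums(counts)        # total UMIs per cell
n_feature <- colSums(counts > 0)  # number of genes detected per cell

# Library-size log-normalisation: scale each column by scale_factor / total,
# then log1p (zeros stay zero, so the result remains sparse)
scale_factor <- 1e4
norm <- log1p(counts %*% Diagonal(x = scale_factor / n_count))
dimnames(norm) <- dimnames(counts)
print(norm)
```

Multiplying by a diagonal matrix rescales every column without converting the matrix to a dense representation, which matters for real data sets with tens of thousands of genes.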

### Filter empty droplets

Use the DropletUtils package to distinguish droplets containing cells from empty droplets, which contain only ambient RNA.

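```r
# emptyDrops() needs the raw (unfiltered) barcode matrix: the filtered matrix
# contains no low-count barcodes from which to estimate the ambient profile.
# Hypothetical path: replace with the raw Cell Ranger output for this sample
raw_file_path <- 'path/to/raw_gene_bc_matrices/hg19/'

# Create a SingleCellExperiment from the Cell Ranger output
droplet_raw <- read10xCounts(raw_file_path, col.names = TRUE)

# Count the barcodes with at least one UMI
non_zero <- sum(colSums(counts(droplet_raw)) > 0)

# Test each barcode against the ambient profile estimated from barcodes with
# <= 100 UMIs; p-values are computed by Monte Carlo simulation, so set a seed
set.seed(100)
empty <- emptyDrops(counts(droplet_raw), lower = 100)

# Keep barcodes that differ significantly from the ambient profile
# (FDR <= 0.001 is a commonly used threshold; barcodes at or below lower have FDR = NA)
is_cell <- which(empty$FDR <= 0.001)
droplet_cells <- droplet_raw[, is_cell]
```

Applied to the filtered matrix in `pbmc_file_path`, `emptyDrops()` fails with `Error in .compute_ambient_stats(mat, totals, lower = lower): no counts available to estimate the ambient profile`. Cell Ranger has already removed empty barcodes from the filtered matrix, so no barcode has a total count at or below `lower` and there is nothing from which to estimate the ambient RNA profile.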
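The statistical idea behind `emptyDrops()` can be sketched in base R: estimate an ambient RNA profile from low-count barcodes, then ask whether each higher-count barcode looks like a random draw from that profile. This is a deliberately simplified illustration, not DropletUtils' implementation: it uses a plain multinomial model, a pseudo-count of 1 instead of the Good-Turing estimate used by DropletUtils, no overdispersion and no multiple-testing correction. The expression profiles and droplet sizes are hypothetical.

```r
set.seed(1)
n_genes <- 50

# Hypothetical profiles: ambient RNA is spread across genes,
# whereas the simulated cells express 10 genes highly
ambient_prob <- prop.table(runif(n_genes))
cell_prob <- prop.table(c(rep(5, 10), rep(0.1, n_genes - 10)))

# 200 empty droplets with few UMIs and 5 cell-containing droplets
empties <- sapply(1:200, function(i) rmultinom(1, size = sample(10:100, 1), prob = ambient_prob))
cells <- sapply(1:5, function(i) rmultinom(1, size = 1000, prob = cell_prob))
counts <- cbind(empties, cells)
colnames(counts) <- c(paste0("empty", 1:200), paste0("cell", 1:5))
totals <- colSums(counts)

# Estimate the ambient profile from barcodes at or below `lower`
# (pseudo-count of 1 so that no gene has zero probability)
lower <- 100
ambient_counts <- rowSums(counts[, totals <= lower])
ambient_est <- (ambient_counts + 1) / sum(ambient_counts + 1)

# Monte Carlo p-value: how often does a barcode simulated from the ambient
# profile, with the same total, have a log-likelihood as low as the observed one?
mc_pvalue <- function(x, prob, n_iter = 999) {
  obs <- dmultinom(x, prob = prob, log = TRUE)
  sims <- rmultinom(n_iter, size = sum(x), prob = prob)
  sim_ll <- apply(sims, 2, dmultinom, prob = prob, log = TRUE)
  (sum(sim_ll <= obs) + 1) / (n_iter + 1)
}

# Only barcodes above `lower` are tested
tested <- names(totals)[totals > lower]
p_values <- sapply(tested, function(b) mc_pvalue(counts[, b], ambient_est))
print(p_values)
```

Barcodes whose composition is far from the ambient profile receive the smallest possible Monte Carlo p-value (1/1000 here), which is why low p-values are read as evidence that a droplet contains a cell.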


**References**

1.	Tinnevelt GH, Wouters K, Postma GJ, Folcarelli R, Jansen JJ. High-throughput single cell data analysis - A tutorial. Anal Chim Acta [Internet]. 2021;1185(338872):338872. Available from: http://dx.doi.org/10.1016/j.aca.2021.338872
2.	Bendall SC, Simonds EF, Qiu P, Amir E-AD, Krutzik PO, Finck R, et al. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science [Internet]. 2011;332(6030):687–96. Available from: http://dx.doi.org/10.1126/science.1198704
3.	Flores-Montero J, Sanoja-Flores L, Paiva B, Puig N, García-Sánchez O, Böttcher S, et al. Next Generation Flow for highly sensitive and standardized detection of minimal residual disease in multiple myeloma. Leukemia [Internet]. 2017;31(10):2094–103. Available from: http://dx.doi.org/10.1038/leu.2017.29
4.	Wang M, Song W-M, Ming C, Wang Q, Zhou X, Xu P, et al. Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer’s disease: review, recommendation, implementation and application. Mol Neurodegener [Internet]. 2022;17(1):17. Available from: http://dx.doi.org/10.1186/s13024-022-00517-z
5.	Ramsköld D, Luo S, Wang Y-C, Li R, Deng Q, Faridani OR, et al. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol [Internet]. 2012;30(8):777–82. Available from: http://dx.doi.org/10.1038/nbt.2282
6.	Sasagawa Y, Nikaido I, Hayashi T, Danno H, Uno KD, Imai T. Quartz-Seq: a highly reproducible and sensitive single-cell RNA sequencing method, reveals non-genetic gene-expression heterogeneity. Genome Biol. 2013;14.
7.	Hashimshony T, Wagner F, Sher N, Yanai I. CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep [Internet]. 2012;2(3):666–73. Available from: http://dx.doi.org/10.1016/j.celrep.2012.08.003
8.	Hayashi T, Ozaki H, Sasagawa Y, Umeda M, Danno H, Nikaido I. Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs. Nat Commun. 2018;9.
9.	Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell [Internet]. 2015;161(5):1202–14. Available from: http://dx.doi.org/10.1016/j.cell.2015.05.002
10.	Cao J, Packer JS, Ramani V, Cusanovich DA, Huynh C, Daza R, et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science [Internet]. 2017;357(6352):661–7. Available from: http://dx.doi.org/10.1126/science.aam8940
11.	Rotem A, Ram O, Shoresh N, Sperling RA, Goren A, Weitz DA, et al. Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state. Nat Biotechnol [Internet]. 2015;33(11):1165–72. Available from: http://dx.doi.org/10.1038/nbt.3383
12.	Picelli S, Faridani OR, Björklund AK, Winberg G, Sagasser S, Sandberg R. Full-length RNA-seq from single cells using Smart-seq2. Nat Protoc [Internet]. 2014;9(1):171–81. Available from: http://dx.doi.org/10.1038/nprot.2014.006
13.	Jaitin DA, Kenigsberg E, Keren-Shaul H, Elefant N, Paul F, Zaretsky I, et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science [Internet]. 2014;343(6172):776–9. Available from: http://dx.doi.org/10.1126/science.1247651
14.	Andrews TS, Hemberg M. Identifying cell populations with scRNASeq. Mol Aspects Med [Internet]. 2018;59:114–22. Available from: http://dx.doi.org/10.1016/j.mam.2017.07.002
15.	Kolodziejczyk AA, Kim JK, Tsang JCH, Ilicic T, Henriksson J, Natarajan KN, et al. Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell [Internet]. 2015;17(4):471–85. Available from: http://dx.doi.org/10.1016/j.stem.2015.09.011
16.	Klein AM, Mazutis L, Akartuna I, Tallapragada N, Veres A, Li V, et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell [Internet]. 2015;161(5):1187–201. Available from: http://dx.doi.org/10.1016/j.cell.2015.04.044
17.	Zheng GXY, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun [Internet]. 2017;8(1):14049. Available from: http://dx.doi.org/10.1038/ncomms14049
18.	Campbell JN, Macosko EZ, Fenselau H, Pers TH, Lyubetskaya A, Tenen D, et al. A molecular census of arcuate hypothalamus and median eminence cell types. Nat Neurosci [Internet]. 2017;20(3):484–96. Available from: http://dx.doi.org/10.1038/nn.4495
19.	Brennecke P, Anders S, Kim JK, Kołodziejczyk AA, Zhang X, Proserpio V, et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat Methods [Internet]. 2013;10(11):1093–5. Available from: http://dx.doi.org/10.1038/nmeth.2645
20.	Marinov GK, Williams BA, McCue K, Schroth GP, Gertz J, Myers RM, et al. From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing. Genome Res [Internet]. 2014;24(3):496–510. Available from: http://dx.doi.org/10.1101/gr.161034.113
21.	Kharchenko PV, Silberstein L, Scadden DT. Bayesian approach to single-cell differential expression analysis. Nat Methods. 2014;11:740–2.
