## Tools for scRNA-Seq

-   scanpy
-   Monocle
-   Bioconductor
-   Seurat

# Differential Expression

## Methods

-   Preprocessing: data import, QC, quantification
-   Normalization: dropout, depth, batch effects
-   Dimensionality Reduction: PCA, UMAP
-   Clustering: cell type clusters
-   Differential Gene Expression
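
The steps above map roughly onto a standard Seurat workflow. The following is a minimal sketch, assuming `counts` is a genes x cells count matrix (a hypothetical input); the QC cutoff, number of dimensions, and clustering resolution are illustrative values, not recommendations. Batch-effect correction is not shown.

```r
# Sketch only: `counts`, `dims`, and all thresholds are assumptions.
run_basic_workflow <- function(counts, dims = 1:10) {
  # Preprocessing: import counts and apply a simple QC filter
  obj <- Seurat::CreateSeuratObject(counts = counts, min.cells = 3)
  obj <- subset(obj, subset = nFeature_RNA > 200)

  # Normalization: correct for sequencing depth, then scale
  obj <- Seurat::NormalizeData(obj)
  obj <- Seurat::FindVariableFeatures(obj)
  obj <- Seurat::ScaleData(obj)

  # Dimensionality reduction: PCA
  obj <- Seurat::RunPCA(obj, verbose = FALSE)

  # Clustering: graph-based clusters on the first PCs
  obj <- Seurat::FindNeighbors(obj, dims = dims)
  obj <- Seurat::FindClusters(obj, resolution = 0.5)

  # Dimensionality reduction for visualization: UMAP
  obj <- Seurat::RunUMAP(obj, dims = dims)

  # Differential gene expression: markers for every cluster
  markers <- Seurat::FindAllMarkers(obj, only.pos = TRUE)

  list(object = obj, markers = markers)
}
```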

## Seurat

::: columns
::: {.column width="50%"}
-   Most used tool

-   Good documentation

-   Several tutorials

-   Many methods

-   Extendable

-   Gene Expression Dynamics
:::

::: {.column width="50%"}
![](umap.png)
:::
:::

## Statistical Analysis of Differences Between Clusters

-   Different types of hits
    -   Quantitative: expression differs significantly between clusters
    -   Qualitative: expression is predictive of cluster membership
-   Different types of markers
    -   Global: Distinguish one cluster from all other cells in the data
    -   Local: Distinguish one cluster from another defined set of clusters
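
In Seurat, the global/local distinction comes down to what `ident.2` is set to in `FindMarkers()`. A minimal sketch, assuming `obj` is an already clustered Seurat object and that clusters named `"1"`, `"2"`, and `"3"` exist (both are hypothetical):

```r
# Sketch only: `obj` and the cluster labels "1", "2", "3" are assumptions.
find_global_and_local <- function(obj) {
  list(
    # Global: cluster 1 against all remaining cells (ident.2 left unset)
    global = Seurat::FindMarkers(obj, ident.1 = "1"),
    # Local: cluster 1 against a chosen set of clusters only
    local = Seurat::FindMarkers(obj, ident.1 = "1", ident.2 = c("2", "3"))
  )
}
```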

## Statistical Analysis of Differences Between Clusters

-   Genes are often filtered before testing, based on coverage (how many cells express them) or on the size of the groups
-   Several methods are available to identify marker genes
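
The filtering idea can be illustrated in base R on simulated data. This is a sketch under stated assumptions: the count matrix is a toy example, and the thresholds `min_pct` and `min_cells_group` are hypothetical values chosen for illustration. Seurat's `FindMarkers()` exposes similar filters through its `min.pct` and `min.cells.group` arguments.

```r
set.seed(42)

# Toy count matrix: 6 genes x 40 cells (simulated, for illustration only)
counts <- matrix(
  rpois(6 * 40, lambda = 0.3),
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:40))
)
counts["gene1", ] <- 0  # a gene that is never detected
cluster <- rep(c("g1", "g2"), times = c(25, 15))

min_pct <- 0.10         # hypothetical coverage threshold
min_cells_group <- 3    # hypothetical minimum group size

# Group size check: both groups need enough cells to be compared
groups_ok <- all(table(cluster) >= min_cells_group)

# Coverage: fraction of cells in each group with at least one count
pct_g1 <- rowMeans(counts[, cluster == "g1", drop = FALSE] > 0)
pct_g2 <- rowMeans(counts[, cluster == "g2", drop = FALSE] > 0)

# Keep genes detected in enough cells of at least one group
keep <- pmax(pct_g1, pct_g2) >= min_pct
tested_genes <- rownames(counts)[keep]
```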

## Methods

-   Non-parametric: Wilcoxon rank sum test
-   Parametric: t-test, negative binomial
-   Classification: ROC
-   Specialized: MAST

```
FindMarkers(data, ident.1 = "g1", ident.2 = "g2",
            group.by = "status", test.use = "roc",
            only.pos = TRUE)
```
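
To make the difference between the test families concrete, here is a base-R sketch for a single gene, using simulated expression values (not real data). It runs a Wilcoxon rank sum test and a t-test, and computes the AUC that a ROC-based approach relies on. The negative binomial and MAST options are not shown.

```r
set.seed(1)

# Simulated normalized expression of one gene in two clusters
g1 <- rnorm(50, mean = 2)
g2 <- rnorm(60, mean = 1)

# Non-parametric: Wilcoxon rank sum test
wt <- wilcox.test(g1, g2)

# Parametric: Welch two-sample t-test
tt <- t.test(g1, g2)

# Classification: AUC, i.e. the probability that a random g1 cell
# has higher expression than a random g2 cell. Without ties this
# equals the Wilcoxon statistic W divided by the number of pairs.
auc <- unname(wt$statistic) / (length(g1) * length(g2))

# The same AUC computed directly from all pairwise comparisons
auc_pairs <- mean(outer(g1, g2, ">"))

c(wilcoxon_p = wt$p.value, t_test_p = tt$p.value, auc = auc)
```

An AUC near 1 or 0 means the gene alone separates the two groups well, while 0.5 means it carries no information. This is why ROC-based results are read as "predictive" rather than as significance tests.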

## 

![](methods.png){fig-align="center"}

# Cell Type Annotation

## Why annotate cell types?

* Interpreting the findings of our analysis is the most difficult task in sc-data analysis
* Understanding the biological state of each cluster is much harder than assigning cells to clusters
* To do this, we need to “connect” our dataset to existing knowledge
* One strategy is to compare the expression profiles in our dataset to those of curated reference datasets
* What tool do we use? **SingleR**

## Cell Type Annotation

* **SingleR** pkg contains the statistical method for assignment
* **celldex** pkg shares several reference (well curated) datasets
* Most references are built from bulk RNA-seq and microarray
* They are good enough for annotating sc-data, provided that the reference contains the cell types expected to be present in the test data
* We'll use a reference built from Blueprint and ENCODE data
* Single-cell references can also be used (e.g. `MuraroPancreasData()` from the **scRNAseq** pkg)

## How to perform annotation?

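```
## Load the reference
library(celldex)
ref <- BlueprintEncodeData()

## Compare expression levels from my.data to the reference
## (my.data: genes x cells expression matrix or a SummarizedExperiment)
library(SingleR)
pred <- SingleR(test = my.data, ref = ref, labels = ref$label.main)

## Number of cells assigned to each cell type
table(pred$labels)
```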
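
Beyond counting labels, it can help to check how many assignments SingleR considered unreliable. The sketch below assumes `pred` is the result of a `SingleR()` call like the one above; the helper name `summarize_annotation` is hypothetical. It relies on the `pruned.labels` column, which is `NA` for cells whose assignment was pruned as low quality.

```r
# Sketch only: `summarize_annotation` is a hypothetical helper.
# `pred` is assumed to have `labels` and `pruned.labels` columns,
# as in the output of SingleR().
summarize_annotation <- function(pred) {
  list(
    # Cells per assigned cell type
    label_counts = table(pred$labels),
    # Cells whose label was pruned (set to NA) as low quality
    n_pruned = sum(is.na(pred$pruned.labels))
  )
}

# Usage, given a `pred` object from SingleR():
# summarize_annotation(pred)
```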


```{css}
code.sourceCode {
  font-size: 1.3em;
  /* or try font-size: xx-large; */
}
```