reduced-dimensions.Rmd

---
bibliography: ref.bib
---

# Plotting reduced dimensions

```{r, echo=FALSE, results='asis'}
rebook::chapterPreamble()
```

## Foreword 

When one thinks of single-cell data analysis, one thinks of $t$-SNEs.
(Or maybe UMAPs, for the younger folks.)
Indeed, hardly a single-cell paper comes out these days without a $t$-SNE or UMAP plot in it somewhere.
I would guess that their popularity stems from the entrancing illusion that a viewer is looking at "raw data",
unsullied by the authors' interpretation.
Indeed, I personally prefer $t$-SNEs as - with the right parameters - they can be made to look like histology images,
while UMAPs look more like the aftermath of blowing your nose.
But hey, to each their own.

## Setting up the data

Here we'll use one of the many PBMC datasets generated by 10X Genomics [@zheng2017massively].
One can only imagine how many times these PBMCs have been re-analyzed,
surely there's nothing we don't know about these cells anymore.

```{r}
library(DropletTestFiles)
fpath <- getTestFile(file.path("tenx-3.0.0-pbmc_10k_protein_v3", 
    "1.0.0/filtered.tar.gz"), prefix=TRUE)
tmp <- tempfile()
untar(fpath, exdir=tmp)

library(DropletUtils)
sce <- read10xCounts(file.path(tmp, "filtered_feature_bc_matrix"))
sce <- splitAltExps(sce, rowData(sce)$Type) # splitting off the ADTs.
sce
```

We slap together a quick-and-dirty analysis with some of the usual packages [@amezquita2020orchestrating].
Normally we would make some more diagnostic plots for each of these steps,
but it would not do to distract from the star of the show in this chapter.

```{r}
library(scater)
library(scran)
library(scuttle)
library(bluster)

# Quality control on the mitochondrial content:
qcstats <- perCellQCMetrics(sce, subsets=list(Mito=grep("^MT-", rowData(sce)$Symbol)))
discard <- isOutlier(qcstats$subsets_Mito_percent, type="higher")
sce <- sce[,!discard]
summary(discard)

# Normalization and feature selection:
sce <- logNormCounts(sce)
dec <- modelGeneVarByPoisson(sce)
hvgs <- getTopHVGs(dec, n=4000)

# PCA and some clustering:
set.seed(117)
sce <- runPCA(sce, subset_row=hvgs)
colLabels(sce) <- clusterRows(reducedDim(sce), NNGraphParam())

# Creating those sweet, sweet t-SNEs.
set.seed(118)
sce <- runTSNE(sce, dimred="PCA")
sce <- runUMAP(sce, dimred="PCA")
sce
```

We can see that the dimensionality reduction results are tucked away in the `reducedDims()` of the `sce` object.
If you want to get in on the action, you can pull out the reduced dimensions yourself for plotting:

```{r}
head(reducedDim(sce, "TSNE"))
```

## Visualizing with `r Biocpkg("scater")`

We'll start with the visualizations from the `r Biocpkg("scater")` package [@mccarthy2017scater].
The `plotTSNE()` function creates `ggplot` objects with various aesthetics, as shown in Figure \@ref(fig:scater-tsne-color-label).
Here, we have coloured each cell by its assigned label, throwing in the label name for good measure.

```{r scater-tsne-color-label, fig.cap="$t$-SNE coloured by the cluster label."}
plotTSNE(sce, colour_by="label", text_by="label")
```

Alternatively, we might color each cell by the expression of a particular gene, 
using the neon 80's-throwback color scheme that is `r CRANpkg("viridis")` (Figure \@ref(fig:scater-tsne-color-cd3)).
We set `swap_rownames=` to allow us to refer to genes using symbols rather than the more obtuse Ensembl identifiers.

```{r scater-tsne-color-cd3, fig.cap="$t$-SNE coloured by the expression of _CD3E_."}
plotTSNE(sce, colour_by="CD3E", swap_rownames="Symbol", 
    text_by="label", text_colour="red")
```

This function (and its sister `plotUMAP()`) use the same `plotReducedDim()` function under the hood.
We can generate plots for any dimensionality reduction result in `sce` by calling `plotReducedDim()` directly,
passing in the name of the result that we wish to visualize (Figure \@ref(fig:scater-umap-color-label)).

```{r scater-umap-color-label, fig.cap="UMAP coloured by the cluster label."}
plotReducedDim(sce, "UMAP", colour_by="label", text_by="label")
```

And of course, this is all `ggplot`-based, so you can just add on your own layers to customize the aesthetics.
Don't like viridis?
(Does anybody? Ho ho ho.)
You can swap it out with a color scale of your choice using the relevant `r CRANpkg("ggplot2")` functions,
as shown in Figure \@ref(fig:scater-umap-color-cd3) with a grey-orange gradient for _CD3E_ expression.

```{r scater-umap-color-cd3, fig.cap="UMAP coloured by the expression of _CD3E_."}
plotUMAP(sce, colour_by="CD3E", swap_rownames="Symbol") +
    scale_color_gradient(low="grey", high="orange")
```

In fact, you can tell `plotTSNE()` and `plotUMAP()` to carry over any `colData()` field from the `SingleCellExperiment`.
These fields are inserted into the `data.frame` and can be used in any `r CRANpkg("ggplot2")`-compatible function.
In Figure \@ref(fig:scater-tsne-color-cd3-facet), we bin each cell based on its size factor to create the `category` field,
and we use those bins for faceting the plot generated by `plotTSNE()`.

```{r scater-tsne-color-cd3-facet, fig.width=10, fig.height=5, fig.cap="$t$-SNE coloured by the cluster identity and faceted by the log-size factor after binning."}
sce$category <- cut(log(sizeFactors(sce)), 3)
plotTSNE(sce, colour_by="label", other_fields="category") +
    facet_wrap('category')
```

While most customization can be performed by adding further `ggplot`-related layers,
we can also tinker with other aesthetic parameters in the `plotTSNE()` call.
For example, we change the point size and axis labels while also removing the legend in Figure \@ref(fig:scater-tsne-other).
Further options can be found by following the breadcrumbs in the documentation for `?plotTSNE`. 

```{r}
plotTSNE(sce, colour_by="label", point_size=0.25, add_legend=FALSE, 
    label_format="%s dimension %i")
```

I could go on, but I won't.

## Session information {-}

```{r, echo=FALSE, results="asis"}
rebook::prettySessionInfo()
```