/
reduced-dimensions.Rmd
146 lines (114 loc) · 5.82 KB
/
reduced-dimensions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
---
bibliography: ref.bib
---
# Plotting reduced dimensions
```{r, echo=FALSE, results='asis'}
rebook::chapterPreamble()
```
## Foreword
When one thinks of single-cell data analysis, one thinks of $t$-SNEs.
(Or maybe UMAPs, for the younger folks.)
Indeed, hardly a single-cell paper comes out these days without a $t$-SNE or UMAP plot in it somewhere.
I would guess that their popularity stems from the entrancing illusion that a viewer is looking at "raw data",
unsullied by the authors' interpretation.
Indeed, I personally prefer $t$-SNEs as - with the right parameters - they can be made to look like histology images,
while UMAPs look more like the aftermath of blowing your nose.
But hey, to each their own.
## Setting up the data
Here we'll use one of the many PBMC datasets generated by 10X Genomics [@zheng2017massively].
One can only imagine how many times these PBMCs have been re-analyzed,
surely there's nothing we don't know about these cells anymore.
```{r}
library(DropletTestFiles)
fpath <- getTestFile(file.path("tenx-3.0.0-pbmc_10k_protein_v3",
"1.0.0/filtered.tar.gz"), prefix=TRUE)
tmp <- tempfile()
untar(fpath, exdir=tmp)
library(DropletUtils)
sce <- read10xCounts(file.path(tmp, "filtered_feature_bc_matrix"))
sce <- splitAltExps(sce, rowData(sce)$Type) # splitting off the ADTs.
sce
```
We slap together a quick-and-dirty analysis with some of the usual packages [@amezquita2020orchestrating].
Normally we would make some more diagnostic plots for each of these steps,
but it would not do to distract from the star of the show in this chapter.
```{r}
library(scater)
library(scran)
library(scuttle)
library(bluster)
# Quality control on the mitochondrial content:
qcstats <- perCellQCMetrics(sce, subsets=list(Mito=grep("^MT-", rowData(sce)$Symbol)))
discard <- isOutlier(qcstats$subsets_Mito_percent, type="higher")
sce <- sce[,!discard]
summary(discard)
# Normalization and feature selection:
sce <- logNormCounts(sce)
dec <- modelGeneVarByPoisson(sce)
hvgs <- getTopHVGs(dec, n=4000)
# PCA and some clustering:
set.seed(117)
sce <- runPCA(sce, subset_row=hvgs)
colLabels(sce) <- clusterRows(reducedDim(sce), NNGraphParam())
# Creating those sweet, sweet t-SNEs.
set.seed(118)
sce <- runTSNE(sce, dimred="PCA")
sce <- runUMAP(sce, dimred="PCA")
sce
```
We can see that the dimensionality reduction results are tucked away in the `reducedDims()` of the `sce` object.
If you want to get in on the action, you can pull out the reduced dimensions yourself for plotting:
```{r}
head(reducedDim(sce, "TSNE"))
```
## Visualizing with `r Biocpkg("scater")`
We'll start with the visualizations from the `r Biocpkg("scater")` package [@mccarthy2017scater].
The `plotTSNE()` function creates `ggplot` objects with various aesthetics, as shown in Figure \@ref(fig:scater-tsne-color-label).
Here, we have coloured each cell by its assigned label, throwing in the label name for good measure.
```{r scater-tsne-color-label, fig.cap="$t$-SNE coloured by the cluster label."}
plotTSNE(sce, colour_by="label", text_by="label")
```
Alternatively, we might color each cell by the expression of a particular gene,
using the neon 80's-throwback color scheme that is `r CRANpkg("viridis")` (Figure \@ref(fig:scater-tsne-color-cd3)).
We set `swap_rownames=` to allow us to refer to genes using symbols rather than the more obtuse Ensembl identifiers.
```{r scater-tsne-color-cd3, fig.cap="$t$-SNE coloured by the expression of _CD3E_."}
plotTSNE(sce, colour_by="CD3E", swap_rownames="Symbol",
text_by="label", text_colour="red")
```
This function (and its sister `plotUMAP()`) use the same `plotReducedDim()` function under the hood.
We can generate plots for any dimensionality reduction result in `sce` by calling `plotReducedDim()` directly,
passing in the name of the result that we wish to visualize (Figure \@ref(fig:scater-umap-color-label)).
```{r scater-umap-color-label, fig.cap="UMAP coloured by the cluster label."}
plotReducedDim(sce, "UMAP", colour_by="label", text_by="label")
```
And of course, this is all `ggplot`-based, so you can just add on your own layers to customize the aesthetics.
Don't like viridis?
(Does anybody? Ho ho ho.)
You can swap it out with a color scale of your choice using the relevant `r CRANpkg("ggplot2")` functions,
as shown in Figure \@ref(fig:scater-umap-color-cd3) with a grey-orange gradient for _CD3E_ expression.
```{r scater-umap-color-cd3, fig.cap="UMAP coloured by the expression of _CD3E_."}
plotUMAP(sce, colour_by="CD3E", swap_rownames="Symbol") +
scale_color_gradient(low="grey", high="orange")
```
In fact, you can tell `plotTSNE()` and `plotUMAP()` to carry over any `colData()` field from the `SingleCellExperiment`.
These fields are inserted into the `data.frame` and can be used in any `r CRANpkg("ggplot2")`-compatible function.
In Figure \@ref(fig:scater-tsne-color-cd3-facet), we bin each cell based on its size factor to create the `category` field,
and we use those bins for faceting the plot generated by `plotTSNE()`.
```{r scater-tsne-color-cd3-facet, fig.width=10, fig.height=5, fig.cap="$t$-SNE coloured by the cluster identity and faceted by the log-size factor after binning."}
sce$category <- cut(log(sizeFactors(sce)), 3)
plotTSNE(sce, colour_by="label", other_fields="category") +
facet_wrap('category')
```
While most customization can be performed by adding further `ggplot`-related layers,
we can also tinker with other aesthetic parameters in the `plotTSNE()` call.
For example, we change the point size and axis labels while also removing the legend in Figure \@ref(fig:scater-tsne-other).
Further options can be found by following the breadcrumbs in the documentation for `?plotTSNE`.
```{r}
plotTSNE(sce, colour_by="label", point_size=0.25, add_legend=FALSE,
label_format="%s dimension %i")
```
I could go on, but I won't.
## Session information {-}
```{r, echo=FALSE, results="asis"}
rebook::prettySessionInfo()
```