-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathDune.Rmd
125 lines (92 loc) · 3.15 KB
/
Dune.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
---
title: "Vignette for Dune: merging clusters to improve replicability through ARI merging"
author: "Hector Roux de Bézieux"
output:
rmarkdown::html_document:
toc: true
toc_depth: 3
vignette: >
%\VignetteIndexEntry{Dune Vignette}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
# Installation
```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("Dune")
```
We use a subset of the Allen Smart-Seq nuclei dataset. Run `?Dune::nuclei` for more details on pre-processing.
```{r setup}
suppressPackageStartupMessages({
library(RColorBrewer)
library(dplyr)
library(ggplot2)
library(tidyr)
library(knitr)
library(purrr)
library(Dune)
})
data("nuclei", package = "Dune")
theme_set(theme_classic())
```
# Initial visualization
We have a dataset of $1744$ cells, with the results from 3 clustering algorithms: Seurat3, Monocle3 and SC3. The Allen Institute also produce hand-picked cluster and subclass labels. Finally, we included the coordinates from a t-SNE representation, for visualization.
```{r}
ggplot(nuclei, aes(x = x, y = y, col = subclass_label)) +
geom_point()
```
We can also see how the three clustering algorithm partitioned the dataset initially:
```{r, fig.hold='hold', out.width="33%", fig.height=9}
walk(c("SC3", "Seurat", "Monocle"), function(clus_algo){
df <- nuclei
df$clus_algo <- nuclei[, clus_algo]
p <- ggplot(df, aes(x = x, y = y, col = as.character(clus_algo))) +
geom_point(size = 1.5) +
# guides(color = FALSE) +
labs(title = clus_algo, col = "clusters") +
theme(legend.position = "bottom")
print(p)
})
```
# Merging with **Dune**
## Initial ARI
The adjusted Rand Index between the three methods can be computed.
```{r}
plotARIs(nuclei %>% select(SC3, Seurat, Monocle))
```
As we can see, the ARI between the three methods is initially quite low.
## Actual merging
We can now try to merge clusters with the `Dune` function. At each step, the algorithm will print which clustering label is merged (by its number, so `1~SC3` and so on), as well as the pair of clusters that get merged.
```{r}
merger <- Dune(clusMat = nuclei %>% select(SC3, Seurat, Monocle), verbose = TRUE)
```
The output from `Dune` is a list with four components:
```{r}
names(merger)
```
`initialMat` is the initial matrix. of cluster labels. `currentMat` is the final matrix of cluster labels. `merges` is a matrix that recapitulates what has been printed above, while `ImpARI` list the ARI improvement over the merges.
## ARI improvement
We can now see how much the ARI has improved:
```{r}
plotARIs(clusMat = merger$currentMat)
```
The methods now look much more similar, as can be expected.
We can also see how the number of clusters got reduced.
```{r}
plotPrePost(merger)
```
For SC3 for example, we can visualize how the clusters got merged:
```{r}
ConfusionPlot(merger$initialMat[, "SC3"], merger$currentMat[, "SC3"]) +
labs(x = "Before merging", y = "After merging")
```
Finally, the __ARIImp__ function tracks mean ARI improvement as pairs of clusters get merged down.
```{r}
ARItrend(merger)
```
# Session
```{r}
sessionInfo()
```