---
title: "clustRviz Quick Start"
author:
  - name: Michael Weylandt
    affiliation: Department of Statistics, Rice University
    email: michael.weylandt@rice.edu
  - name: John Nagorski
    affiliation: Department of Statistics, Rice University
  - name: Genevera I. Allen
    affiliation: |
      | Departments of Statistics, Computer Science, and Electrical and Computer Engineering, Rice University
      | Jan and Dan Duncan Neurological Research Institute, Baylor College of Medicine
    email: gallen@rice.edu
date: "Last Updated: August 19th, 2020"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
bibliography: vignettes.bib
vignette: >
  %\VignetteIndexEntry{clustRviz Quick Start}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = TRUE,
  message = FALSE
)
```
\renewcommand{\vec}[1]{\boldsymbol{#1}}
## Introduction
This vignette provides a brief introduction to the `clustRviz` package,
describing how to use the main entry points `CARP` and `CBASS` and providing
a quick overview of the rich built-in graphics functionality. For more details
on graphics, weight selection, or the computational algorithms used, please
see the other package vignettes.
## Clustering
`clustRviz` implements the *convex* clustering formulation popularized by
Hocking *et al.* [-@Hocking:2011] and uses the path-wise algorithms of
Weylandt, Nagorski, and Allen [-@Weylandt:2019] to support full path
computation and dendrogram construction. This allows convex clustering to
produce `hclust`-style dendrograms while maintaining its statistical and computational
advantages.
The main entry point for clustering is the `CARP` function, which implements
the **Clustering via Algorithmic Regularization Paths** proposed by
Weylandt, Nagorski, and Allen [-@Weylandt:2019]. We can use it on the built-in
`presidential_speech` data set:
```{r}
library(clustRviz)
carp_fit <- CARP(presidential_speech)
print(carp_fit)
```
As can be seen, this provides a full path in only a few seconds. From this,
we can produce a variety of attractive plots, including dendrograms
```{r}
plot(carp_fit, type = "dendrogram")
```
one-way heatmaps
```{r}
plot(carp_fit, type = "heatmap")
```
and regularization paths
```{r}
plot(carp_fit, type = "path")
```
For each plot type, interactive and dynamic versions are also supported: for example,
```{r}
plot(carp_fit, type = "dendrogram", dynamic = TRUE)
```
By default, the entire path is shown, but it is possible to show a specific solution
by passing the `k` or `percent` argument to `plot`.
```{r}
plot(carp_fit, k = 3)
```
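The `percent` argument can be used in the same way. The chunk below is a minimal sketch that assumes `percent` is a number between 0 and 1 giving the position along the regularization path, and that it is supplied instead of `k` rather than together with it; the value `0.25` is purely illustrative.

```{r}
library(clustRviz)

# Reuse the fit from above if it exists; otherwise refit so the chunk stands alone
if (!exists("carp_fit")) carp_fit <- CARP(presidential_speech)

# Show the solution a quarter of the way along the regularization path
# (assumption: percent is a fraction in [0, 1], used in place of k)
plot(carp_fit, percent = 0.25)
```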
To work with the clustering solutions directly, the `get_cluster_labels`, `get_clustered_data`,
or `get_cluster_centroids` functions may be useful.
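As a rough sketch of how these accessors might be used, the chunk below extracts the three-cluster solution. It assumes that each accessor takes the same `k` argument as `plot`, that the centroids are returned as a matrix with one row per cluster, and that the clustered data has the same shape as the input; the variable names are illustrative only.

```{r}
library(clustRviz)

# Reuse the fit from above if it exists; otherwise refit so the chunk stands alone
if (!exists("carp_fit")) carp_fit <- CARP(presidential_speech)

# Cluster membership of each observation in the three-cluster solution
labels_k3 <- get_cluster_labels(carp_fit, k = 3)
table(labels_k3)

# Estimated centroids (assumed: one row per cluster, one column per feature)
centroids_k3 <- get_cluster_centroids(carp_fit, k = 3)
dim(centroids_k3)

# Data with each observation replaced by its cluster centroid
# (assumed: same dimensions as the input data)
clustered_k3 <- get_clustered_data(carp_fit, k = 3)
dim(clustered_k3)
```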
## Bi-Clustering
Chi *et al.* [-@Chi:2017] proposed a convex formulation of *biclustering*, for which
Weylandt [-@Weylandt:2019b] later proposed an efficient ADMM algorithm. This ADMM
algorithm was adapted into **CBASS** (*Convex Biclustering via Algorithmic Regularization
with Small Steps*). `clustRviz` exposes an implementation of this algorithm
via the function of the same name.
```{r}
library(clustRviz)
cbass_fit <- CBASS(presidential_speech)
print(cbass_fit)
```
As can be seen, this provides a full path in only a few seconds. In general, the
bi-clustering problem is a bit slower than the standard clustering problem but still
highly efficient. From this, we can produce a variety of attractive plots,
including row- and column-wise dendrograms
```{r}
plot(cbass_fit, type = "row.dendrogram")
```
```{r}
plot(cbass_fit, type = "col.dendrogram")
```
row- and column-wise regularization paths
```{r}
plot(cbass_fit, type = "row.path")
```
and the traditional two-way cluster heatmap
```{r}
plot(cbass_fit, type = "heatmap")
```
As before, interactive and dynamic versions are also supported: for example,
```{r}
plot(cbass_fit, type = "heatmap", dynamic = TRUE)
```
Because `CBASS` clusters rows and columns simultaneously, when specifying
cluster numbers, it is necessary to distinguish between row and column clusters
```{r}
plot(cbass_fit, k.row = 3)
```
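The same distinction should apply when extracting labels or fixing the number of column clusters. The chunk below is a sketch under two assumptions: that `plot` accepts a `k.col` argument analogous to `k.row`, and that `get_cluster_labels` takes a `type` argument (`"row"` or `"col"`) for `CBASS` fits. Only one of `k.row` and `k.col` is supplied per call.

```{r}
library(clustRviz)

# Reuse the fit from above if it exists; otherwise refit so the chunk stands alone
if (!exists("cbass_fit")) cbass_fit <- CBASS(presidential_speech)

# Heatmap at the point on the path with three column clusters
# (assumption: k.col mirrors k.row)
plot(cbass_fit, k.col = 3)

# Row labels when there are three row clusters
row_labels <- get_cluster_labels(cbass_fit, k.row = 3, type = "row")
table(row_labels)

# Column labels when there are three column clusters
col_labels <- get_cluster_labels(cbass_fit, k.col = 3, type = "col")
table(col_labels)
```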
This is only a brief demonstration of the capabilities of the `clustRviz` package;
see the other vignettes for more!
## References