-
Notifications
You must be signed in to change notification settings - Fork 1
/
seriation.Rmd
177 lines (126 loc) · 4.95 KB
/
seriation.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
---
title: "Ordering Objects using Seriation in R"
date: "`r Sys.Date()`"
output:
workflowr::wflow_html:
toc: true
---
```{r setup, include=FALSE}
library(tidyverse)
library(patchwork)
library(dendextend)
knitr::opts_chunk$set(echo = TRUE)
```
## Introduction
From the [seriation R package](https://github.com/mhahsler/seriation).
>Seriation arranges a set of objects into a linear order given available data with the goal of revealing structural information. This package provides the infrastructure for ordering objects with an implementation of many seriation/sequencing/ordination techniques to reorder data matrices, dissimilarity matrices, correlation matrices, and dendrograms (see below for a complete list). The package provides several visualizations (grid and ggplot2) to reveal structural information, including permuted image plots, reordered heatmaps, Bertin plots, clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).
## Installation
Install stable CRAN version.
```{r install}
if(! "seriation" %in% installed.packages()[, 1]){
install.packages("seriation", repos = c("https://mhahsler.r-universe.dev", "https://cloud.r-project.org/"))
}
library(seriation)
packageVersion("seriation")
```
## Getting started
Use the [example dataset](https://github.com/mhahsler/seriation#usage) `SupremeCourt`, which:
>Contains a (a subset of the) decisions for the stable 8-yr period 1995-2002 of the second Rehnquist Supreme Court. Decisions are aggregated to the joint probability for disagreement between judges.
```{r supreme_court}
data("SupremeCourt")
SupremeCourt
```
Convert to distance matrix.
```{r distance_matrix}
d <- as.dist(SupremeCourt)
d
```
Perform the default seriation method to reorder the objects.
```{r default_seriation}
my_order <- seriate(d)
get_order(my_order)
```
Plot heatmap.
```{r heatmap, message=FALSE, warning=FALSE}
p1 <- ggpimage(d, upper_tri = TRUE) +
ggtitle("Judges (original order)")
p2 <- ggpimage(d, my_order, upper_tri = TRUE) +
ggtitle("Judges (seriation order)")
p1 + p2 & scale_fill_gradientn(colours = c("darkgrey", "skyblue"))
```
Return linear configuration where more similar objects are located closer to each other.
```{r get_config}
sort(get_config(my_order))
```
Plot linear configuration.
```{r plot_config}
plot_config(my_order)
```
Hierarchical cluster with average linkage.
```{r hclust}
plot(hclust(d, method = "average"))
```
## Heatmaps with seriation
The `Wood` dataset consists of:
>A data matrix containing a sample of the normalized gene expression data for 6 locations in the stem of Popla trees published in the study by Herzberg et al (2001). The sample of 136 genes selected by Caraux and Pinloche (2005).
```{r wood}
data("Wood")
dim(Wood)
```
Check out `Wood`.
```{r wood_glimpse}
head(Wood)
```
Methods of interest for heatmaps are dendrogram leaf order-based methods applied to rows and columns. This is done using `method = "heatmap"`. The actual seriation method can be passed on as parameter seriation_method, but it has a suitable default if it is omitted.
```{r wood_order}
wood_hc_complete <- seriate(Wood, method = "Heatmap", seriation_method = "HC_complete")
wood_olo_complete <- seriate(Wood, method = "Heatmap", seriation_method = "OLO_complete")
get_order(wood_olo_complete)
```
Ignore the numbers of the order above; they indicate the index of the gene in `Wood`. `AI165492` and `AI166057` are the most similar to each other. If I use those values in `Wood`, I get their expression data.
```{r index_wood}
Wood[c(106, 128), ]
```
We can clearly see the similar expression patterns for these subset of genes.
```{r plot_eg}
Wood[get_order(wood_olo_complete)[1:6], ] |>
as.data.frame() |>
tibble::rownames_to_column('gene') |>
tidyr::pivot_longer(-gene) |>
ggplot(data = _, aes(name, value, group = gene, colour = gene)) +
geom_line() +
geom_point() +
theme_minimal() +
labs(title = "Genes with similar expression patterns", y = "Normalised expression", x = "location")
```
`wood_olo_complete` is also a `hclust`.
```{r wood_olo_class}
class(wood_olo_complete[[1]])
```
Plot hierarchical clustering result.
```{r fig.height=20, fig.width=14}
as.dendrogram(wood_olo_complete[[1]]) %>%
plot(horiz = TRUE)
```
Gene order of the dendrogram matches the order produced by seriation using OLO complete.
```{r gene_order}
o <- wood_olo_complete[[1]]$order
identical(wood_olo_complete[[1]]$labels[o], names(get_order(wood_olo_complete)))
```
Location ordering.
```{r comp_order}
get_order(wood_hc_complete, 2)
get_order(wood_olo_complete, 2)
```
Heatmap.
```{r wood_heatmap, message=FALSE, warning=FALSE}
p1 <- ggpimage(Wood) +
ggtitle("Wood (no order)") +
theme(legend.position = "none")
p2 <- ggpimage(Wood, wood_hc_complete) +
ggtitle("Wood (HC complete)") +
theme(legend.position = "none")
p3 <- ggpimage(Wood, wood_olo_complete) +
ggtitle("Wood (OLO complete)")
p1 + p2 + p3 & scale_fill_gradientn(colours = c("skyblue", "red"))
```