-
Notifications
You must be signed in to change notification settings - Fork 12
/
quickstart.rmd
245 lines (162 loc) · 13.1 KB
/
quickstart.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
---
title: "Creating and plotting gTracks"
output: html_document
---
`gTrack` is a package that enables easy plotting of data across genomic intervals. This is a short tutorial that explains how gTracks can be created and plotted. For this example, we will use a *pyrgo* locus from the ovarian carcinoma cell line OVCAR-3.
In general, gTracks can be created from `GRanges`, `GRangesList`, and `Matrix` objects. In addition, we have written a some R packages ([gGnome](https://github.com/mskilab/gGnome) and [GxG](https://github.com/mskilab/GxG)) in which `R6` objects (namely `gGraph`, `gWalk`, and `gMatrix`) include a gTrack constructor (usually invoked by calling the `$gt` active field or `$gtrack` method).
## Setting Up Your Environment
Before proceeding with the tutorial you should first set up your environment. `gTrack` requires version 4.0.2 of R. Versions after this will not work. You can download the 4.0.2 `pkg` file (for MacOS) [here](https://cran.r-project.org/bin/macosx/base/), or the executable (for Windows) [here](https://cran.r-project.org/bin/windows/base/old/4.0.2/). You may also wish to install something like [RSwitch](https://rud.is/rswitch/) to easily switch between different R versions you have installed.
You will also need to install the [`gGnome`](https://github.com/mskilab/gGnome) package:
```{r include = TRUE, message = FALSE, warning = FALSE}
## allows dependencies that throw warnings to install
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = TRUE)
devtools::install_github("mskilab/gGnome")
library(gGnome)
```
Finally, you'll need to import `gTrack` AND `gUtils`:
```{r include = TRUE, message = FALSE, warning = FALSE}
project_path <- "~/projects/gTrack" ## this should be the path to your gTrack clone
devtools::load_all(project_path)
library(gTrack)
library(gUtils)
knitr::opts_chunk$set(fig.height = 10)
```
# GRanges
GRanges can be used as the starting point for creating scatter plots, bar plots, and line plots. The `x`-coordinate of each data point in these plots is specified by the genomic position, while the `y`-coordinate is stored in a user-defined metadata column of the `GRanges` object.
In this example, we will create `gTrack`s to plot read depth in 1 Kbp genomic intervals.
## Rectangles
The most basic plot in gTrack will represent each supplied genomic interval as a rectangle. If strand is provided, the vertical edges of the rectangle will be bent in a direction indicating strand (towards the right for `+` stranded intervals, and towards the left for `-` stranded intervals. This `GRanges` contains genomic bins that are all 1 Kbp in width without a strand. However, for the purposes of demonstration, we will also plot some stranded intervals to demonstrate the effect.
```{r include = TRUE, message = FALSE, warning = FALSE}
coverage.gr <- readRDS(system.file("extdata", "ovcar.subgraph.coverage.rds", package = "gTrack"))
## create a plus stranded interval
plus.coverage.gr <- readRDS(system.file("extdata", "ovcar.subgraph.coverage.rds", package = "gTrack"))
strand(plus.coverage.gr) <- "+"
minus.coverage.gr <- readRDS(system.file("extdata", "ovcar.subgraph.coverage.rds", package = "gTrack"))
strand(minus.coverage.gr) <- "-"
## create gTracks
coverage.gt <- gTrack(coverage.gr)
plus.coverage.gt <- gTrack(plus.coverage.gr)
minus.coverage.gt <- gTrack(minus.coverage.gr)
```
To generate a plot, the `plot` method is used, which takes two positional arguments, a `gTrack` and a genomic window over which to generate the plot. The genomic window can be specified as either a `GRanges`, `GRangesList`, or a character vector that can be parsed as `GRanges`. (If `window` is `NULL`, then all genomic regions defined in the supplied gTrack will be plotted).
Here is the original (unstranded) intervals:
```{r include = TRUE, message = FALSE, warning = FALSE, fig.height = 6}
## specify genomic region that will be plotted
fp <- parse.gr("1:6850000-7050000")
plot(coverage.gt, fp)
```
Here is the `+` stranded intervals:
```{r include = TRUE, message = FALSE, warning = FALSE, fig.height = 6}
## specify genomic region that will be plotted
plot(plus.coverage.gt, fp)
```
Here is the `-` stranded intervals:
```{r include = TRUE, message = FALSE, warning = FALSE, fig.height = 6}
## specify genomic region that will be plotted
plot(minus.coverage.gt, fp)
```
## Scatter plot
In this example, we will create a scatter plot. In this `GRanges`, the (normalized) read depth is contained in a metadata column called `cn`. We need to specify this column name in the argument `y.field` when creating our gTrack. To create a scatter plot, we need to set the parameter `circles` to `TRUE`. The parameter `lwd.border` controls the size of the points in the scatter plot, the parameter `y0` controls the starting position on the y axis, and the parameter `y1` controls the ending position on the y axis.
```{r include = TRUE, message = FALSE, warning = FALSE}
coverage.gr <- readRDS(system.file("extdata", "ovcar.subgraph.coverage.rds", package = "gTrack"))
coverage.gt <- gTrack(coverage.gr, y.field = "cn", circles = TRUE, lwd.border = 0.2, y0 = 0, y1 = 12)
```
```{r include = TRUE, message = FALSE, warning = FALSE}
## specify genomic region that will be plotted
fp <- parse.gr("1:6043576-7172800")
plot(coverage.gt, fp + 1e5)
```
## Bar plot
Next, we will plot the same data as a bar plot. To do this, we will set the parameter `bars` to `TRUE`.
```{r include = TRUE, message = FALSE, warning = FALSE}
coverage.bars.gt <- gTrack(coverage.gr, y.field = "cn", bars = TRUE, y0 = 0, y1 = 12)
plot(coverage.bars.gt, fp + 1e5)
```
## Line plot
Finally, we will plot these data as a line plot, by setting `lines` to `TRUE`.
```{r include = TRUE, message = FALSE, warning = FALSE}
coverage.lines.gt <- gTrack(coverage.gr, y.field = "cn", lines = TRUE, y0 = 0, y1 = 12)
plot(coverage.lines.gt, fp + 1e5)
```
## Multiple plots
A useful feature of `gTrack` objects is that multiple tracks can be concatenated to produce stacked subplots, as shown in the following example. The direction of concatenation is from the bottom up (so the first `gTrack` corresponds with the bottom-most subplot and the final `gTrack` corresponds with the top-most subplot).
```{r include = TRUE, message = FALSE, warning = FALSE}
concatenated.gt <- c(coverage.gt, coverage.bars.gt, coverage.lines.gt)
plot(concatenated.gt, fp + 1e5)
```
# GRangesList
A `gTrack` can also be created from a `GRangesList`. This is desirable when plotting ranges that are grouped together in some way, such as alignments deriving from a single read pair.
## Unordered GRangesList (default)
In this example, we will create a gTrack from junction-supporting read pairs. Briefly, these are read pairs that form split, gapped, or discordant alignments, hinting at the existence of a genomic rearrangement.
The alignments associated with each read pair are represented by a single entry in this example's `GRangesList`. In the corresponding gTrack, alignments from the same `GRangesList` entry are linked together by a light gray horizontal line, making it easy for them to be visually associated.
You can see that there are many junction-supporting reads associated with read depth change points in the coverage `gTrack` which makes sense because aberrant adjacencies can produce copy number variants.
```{r include = TRUE, message = FALSE, warning = FALSE}
reads <- readRDS(system.file("extdata", "ovcar.subgraph.reads.rds", package = "gTrack"))
reads.gt <- gTrack(reads)
plot(c(coverage.gt, reads.gt), fp + 1e5)
```
## Ordered GRangesList (draw.paths)
Sometimes it is desirable to preserve the ordering of segments within each GRanges in a GRangesList. For instance, if each GRanges represents a rearranged somatic haplotype, it is helpful to visualize the exact order of a rearranged segment. This can be done by setting the parameter `draw.paths` to `TRUE`. Notice the difference between the plots when this parameter is set!
```{r include = TRUE, message = FALSE, warning = FALSE}
## create GRangesList for plotting
wks.grl <- readRDS(file.path(project_path, "inst/extdata/ovcar.subgraph.walks.rds"))$grl
fp <- parse.gr("1:6043576-7172800")
paths.gt <- gTrack(wks.grl, draw.paths = TRUE, stack.gap = 1e7, name = "paths")
nopaths.gt <- gTrack(wks.grl, draw.paths = FALSE, stack.gap = 1e7, name = "no paths")
plot(c(nopaths.gt, paths.gt), fp + 5e4)
```
# Matrices
## Heatmaps (mdata)
A `gTrack` can be created from a `GRanges` and a corresponding adjacency matrix to plot associations between two genomic ranges. One example of this would be a heatmap, where the color of each cell is proportional to some value defined by its corresponding genomic regions.
In this example, we will create a heatmap of the number of shared read `qname`s between pairs of genomic intervals. We will read a `GRanges` and associated `matrix` and create a `gTrack` from these inputs, which we will plot alongside the coverage.
As you can see, the off-diagonal elements correspond with copy number change points in the coverage!
```{r include = TRUE, message = FALSE, warning = FALSE}
mdata.mat <- readRDS(system.file("extdata", "ovcar.subgraph.mdata.mat.rds", package = "gTrack"))
mdata.gr <- readRDS(system.file("extdata", "ovcar.subgraph.mdata.gr.rds", package = "gTrack"))
heatmap.gt <- gTrack(mdata.gr, mdata = mdata.mat, cmap.max = 10)
plot(c(coverage.gt, heatmap.gt), fp + 1e5)
```
## Connections (edges)
Instead of a heatmap, it is also possible to plot the edges between genomic intervals by supplying an adjacency list. In this next example, we will plot edges associated with the off-diagonal squares in the previous heatmap.
```{r include = TRUE, message = FALSE, warning = FALSE}
edges.dat <- readRDS(system.file("extdata", "ovcar.subgraph.edges.dat.rds", package = "gTrack"))
edges.gr <- readRDS(system.file("extdata", "ovcar.subgraph.edges.gr.rds", package = "gTrack"))
edges.gt <- gTrack(edges.gr, edges = edges.dat)
plot(c(coverage.gt, edges.gt), fp + 1e5)
```
# gGraph
`gGraph`s are genome graphs in which nodes represent (signed) genomic intervals, and edges represent adjacencies between those intervals (see our `gGnome` package [here](https://github.com/mskilab/gGnome)!).
A `gTrack` can be created from a `gGraph` using either the `$gt` active field or `$gtrack` method, as shown below. In this example, we will create a plot of a tumor `gGraph` and the associated coverage profile.
```{r include = TRUE, message = FALSE, warning = FALSE}
gg <- readRDS(system.file("extdata", "ovcar.subgraph.rds", package = "gTrack"))
plot(c(coverage.gt, gg$gt), fp + 1e5)
```
# gWalk
`gWalk`s represent paths through a `gGraph`. Generally, they represent somatic haplotypes that could exist in a genome corresponding with a given `gGraph` (more details in the `gGnome` package!). Like `gGraph`s, `gWalk`s have active field `$gt` and method `$gtrack`, both of which will produce a `gTrack`.
The following example plots a set of `gWalk`s associated with the `gGraph` plotted above.
```{r include = TRUE, message = FALSE, warning = FALSE}
wks <- readRDS(system.file("extdata", "ovcar.subgraph.walks.rds", package = "gTrack"))
plot(c(coverage.gt, gg$gt, wks$gt), fp + 1e5)
```
# gMatrix
`gMatrix` is an object implemented in the package `GxG` that facilitates analysis and visualization of paired genomic intervals. There are many use cases for `gMatrix`, but the one used in this example is Hi-C data, which consists of read counts shared by pairs of genomic bins. We will plot the Hi-C profile associated with this locus.
Again, similar to `gWalk` and `gGraph`, a `gTrack` can be created for a `gMatrix` with the active field `$gt` or method `$gtrack`. Here we will use the `$gtrack` method to set a parameter for coloring the heatmap (`cmap.max`).
```{r include = TRUE, message = FALSE, warning = FALSE}
gm <- readRDS(system.file("extdata", "ovcar.subgraph.hic.rds", package = "gTrack"))
plot(c(coverage.gt, gg$gt, gm$gtrack(cmap.max = 1000)), fp + 1e5)
```
# Karyogram
You can plot karyograms in gTrack. If no file is passed to the `karyogram` method as an argument, it will use the karyogram datafiles (hg18 or hg19, depending on if the `hg19` parameter is set to `FALSE`) included with the package. You can specify if you colored giemsa bands by setting the `bands` parameter to `TRUE` or `FALSE` (enabled by default). You can also return the chromosome arms with different colors, and with centromeres and telemeres marked, by setting `arms = TRUE` (also enabled by default)
Here we are plotting the default assembly, hg18, with the bands and arms.
```{r include = TRUE, message = FALSE, warning = FALSE}
karyogram_fp <- parse.gr("1:1-200000000")
karyogram_gt_hg18 <- karyogram(hg19 = FALSE)
plot(karyogram_gt_hg18, karyogram_fp)
```
# Gencode Tracks
gTrack can also plot a gencode gene track using the `track.gencode` method. This method will download the specified build (indicated by the `build` parameter) from [mskilab.com](mskilab.com). You can filter genes by supplying a character vector to the `grep` parameter, the `grepe` parameter (to exclude genes), or the `genes` paramter (to limit the gTrack to only those genes).
```{r include = TRUE, message = FALSE, warning = FALSE}
gencode_fp <- parse.gr("1:6000000-6000100")
gencode_gt <- track.gencode(grep = "NPH", grepe = "S2")
expect_error(plot(gencode_gt, gencode_fp + 1e5), NA)
```