/
gsva.Rmd
538 lines (396 loc) · 23.2 KB
/
gsva.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
---
title: "GSVA: gene set variation analysis"
date: "`r Sys.Date()`"
output:
workflowr::wflow_html:
toc: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Background
Notes from the paper [GSVA: gene set variation analysis for microarray and RNA-Seq data](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-7).
> Gene Set Enrichment (GSE) analyses begin by obtaining a ranked gene list, typically derived from a microarray experiment that studies gene expression changes between two groups.
>
> The genes are then mapped into predefined gene sets and their gene expression statistic is summarized into a single enrichment score for each gene set.
>
> Many methodological variations of GSE methods have been proposed, including non-parametric enrichment statistics, battery testing, and focused gene set testing.
>
> Battery testing methods aim at identifying gene sets standing out from a large collection of annotated pathways and gene signatures.
>
> Focused gene set testing methods try to carefully evaluate a few gene sets that are relevant to the experiment being analyzed.
>
> An important distinction among many of the GSE methods is the definition of the null hypothesis that is tested. The null hypothesis of a competitive test declares that there are no differences between genes inside and outside the gene set.
>
> A self-contained test defines its null hypothesis only in terms of the genes inside the gene set being tested. More concretely, for a self-contained test on a gene set, the differential expression of just one of its genes allows one to reject the null hypothesis of no differential expression for that gene set. It follows, that self-contained tests provide higher power than competitive tests to detect subtle changes of expression in a gene set. But they may not be useful to single out a few gene sets in a battery testing setting because of the potentially large number of reported results.
>
> Finally, many GSE methods assume two classes (e.g. case/control) and evaluate enrichment within this context.
>
> The limits imposed by this assumption become evident with the rise of large genomic studies, such as The Cancer Genome Atlas project (TCGA - http://cancergenome.nih.gov), an ambitious project with the goal to identify the molecular determinants of multiple cancer types. In contrast to case-control studies with small sample sizes, the TCGA project has large patient cohorts with multiple phenotypes, structured with hierarchical, multi-class, and censored data. Hence, GSE methods are needed that can assess pathway variation across large, heterogeneous populations with complex phenotypic traits.
>
> To address these challenges, we present a non-parametric, unsupervised method called Gene Set Variation Analysis (GSVA).
>
> GSVA calculates sample-wise gene set enrichment scores as a function of genes inside and outside the gene set, analogously to a competitive gene set test. Further, it estimates variation of gene set enrichment over the samples independently of any class label.
>
> Conceptually, this methodology can be understood as a change in coordinate systems for gene expression data, from genes to gene sets. This transformation facilitates post-hoc construction of pathway-centric models, such as differential pathway activity identification or survival prediction. Further, we demonstrate the flexibility of GSVA by applying it to RNA-seq data.
## Implementation
The GSVA method requires two main inputs:
1. A matrix $X = {\{x_{ij}\}}_{p \times n}$ of normalised expression values for $p$ genes and $n$ samples and
2. A collection of gene sets $\Gamma = \{\gamma_1,\ldots,\gamma_m\}$.
![Figure 1](assets/gsva.jpg)
Step 1. Kernel estimation of the cumulative density function (kcdf). The two plots show two simulated expression profiles mimicking 6 samples from microarray and RNA-seq data. The x-axis corresponds to expression values where each gene is lowly expressed in the four samples with lower values and highly expressed in the other two. The scale of the kcdf is on the left y-axis and the scale of the Gaussian and Poisson kernels is on the right y-axis.
> [In statistics](https://en.wikipedia.org/wiki/Kernel_density_estimation), kernel density estimation (KDE) is the application of kernel smoothing for probability density estimation, i.e., a non-parametric method to estimate the probability density function of a random variable based on kernels as weights.
Step 2. The expression-level statistic is rank ordered for each sample. This is to bring distinct expression profiles to a common scale.
Step 3. For every gene set, the Kolmogorov-Smirnov-like rank statistic is calculated. The plot illustrates a gene set consisting of 3 genes out of a total number of 10 with the sample-wise calculation of genes inside and outside of the gene set.
Step 4. The GSVA enrichment score is either the maximum deviation from zero (top) or the difference between the two sums (bottom). The two plots show two simulations of the resulting scores under the null hypothesis of no gene expression change. The output of the algorithm is a matrix containing pathway enrichment scores for each gene set and sample.
## Vignette
Following the [vignette](https://bioconductor.org/packages/release/bioc/vignettes/GSVA/inst/doc/GSVA.html).
>Gene set variation analysis (GSVA) is a particular type of gene set enrichment method that works on single samples and enables pathway-centric analyses of molecular data by performing a conceptually simple but powerful change in the functional unit of analysis, from genes to gene sets. The GSVA package provides the implementation of four single-sample gene set enrichment methods, concretely zscore, plage, ssGSEA and its own called GSVA. While this methodology was initially developed for gene expression data, it can be applied to other types of molecular profiling data. In this vignette we illustrate how to use the GSVA package with bulk microarray and RNA-seq expression data.
>
>Gene set variation analysis (GSVA) provides an estimate of pathway activity by transforming an input gene-by-sample expression data matrix into a corresponding gene-set-by-sample expression data matrix.
>
>This resulting expression data matrix can be then used with classical analytical methods such as differential expression, classification, survival analysis, clustering or correlation analysis in a pathway-centric manner. One can also perform sample-wise comparisons between pathways and other molecular data types such as microRNA expression or binding data, copy-number variation (CNV) data or single nucleotide polymorphisms (SNPs).
## Package
Install GSVA. (Dependencies are listed in the Imports section in the [DESCRIPTION](https://github.com/rcastelo/GSVA/blob/devel/DESCRIPTION) file.)
```{r install_fgsea, message=FALSE, warning=FALSE}
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
if (!require("GSVA", quietly = TRUE))
BiocManager::install("GSVA")
if (!require("GSVAdata", quietly = TRUE))
BiocManager::install("GSVAdata")
```
Load package.
```{r load_gsva, message=FALSE, warning=FALSE}
library(GSVA)
packageVersion("GSVA")
```
## Quick start
Generate example expression matrix with 10,000 genes across 30 samples.
```{r eg_mat}
p <- 10000
n <- 30
set.seed(1984)
X <- matrix(
rnorm(p*n),
nrow=p,
dimnames=list(paste0("g", 1:p), paste0("s", 1:n))
)
X[1:5, 1:5]
```
Generate 100 gene sets that are contain from 10 to up to 100 genes sampled from `1:p`.
```{r eg_gs}
set.seed(1984)
gs <- as.list(sample(10:100, size=100, replace=TRUE))
gs <- lapply(gs, function(n, p){
paste0("g", sample(1:p, size=n, replace=FALSE))
}, p)
names(gs) <- paste0("gs", 1:length(gs))
sapply(gs, length)
```
Calculate GSVA enrichment scores using the `gsva()` function, which does all the work and requires the following two input arguments:
1. A **normalised** gene expression dataset, which can be provided in one of the following containers:
* A matrix of expression values with genes corresponding to rows and samples corresponding to columns.
* An ExpressionSet object; see package Biobase.
* A SummarizedExperiment object, see package SummarizedExperiment.
2. A collection of gene sets; which can be provided in one of the following containers:
* A list object where each element corresponds to a gene set defined by a vector of gene identifiers, and the element names correspond to the names of the gene sets.
* A GeneSetCollection object; see package GSEABase.
>The first argument to the `gsva()` function is the gene expression data matrix and the second the collection of gene sets. The `gsva()` function can take the input expression data and gene sets using different specialized containers that facilitate the access and manipulation of molecular and phenotype data, as well as their associated metadata. Another advanced features include the use of on-disk and parallel backends to enable, respectively, using GSVA on large molecular data sets and speed up computing time.
The `gsva()` function will apply the following filters before the actual calculations take place:
1. Discard genes in the input expression data matrix with constant expression.
2. Discard genes in the input gene sets that do not map to a gene in the input gene expression data matrix.
3. Discard gene sets that, after applying the previous filters, do not meet a minimum and maximum size, which by default is 1 for the minimum size and `Inf` for the maximum size.
When `method="gsva"` is used (the default), the following parameters can be tuned:
* `kcdf`: The first step of the GSVA algorithm brings gene expression profiles to a common scale by calculating an expression statistic through a non-parametric estimation of the CDF across samples. Such a non-parametric estimation employs a kernel function and the `kcdf` parameter allows the user to specify three possible values for that function:
1. "Gaussian", the default value, which is suitable for continuous expression data, such as microarray fluorescent units in logarithmic scale and RNA-seq log-CPMs, log-RPKMs or log-TPMs units of expression;
2. "Poisson", which is suitable for integer counts, such as those derived from RNA-seq alignments;
3. "none", which will enforce a direct estimation of the CDF without a kernel function.
* `mx.diff`: The last step of the GSVA algorithm calculates the gene set enrichment score from two Kolmogorov-Smirnov random walk statistics. This parameter is a logical flag that allows the user to specify two possible ways to do such calculation:
1. `TRUE`, the default value, where the enrichment score is calculated as the magnitude difference between the largest positive and negative random walk deviations;
2. `FALSE`, where the enrichment score is calculated as the maximum distance of the random walk from zero.
* `abs.ranking`: Logical flag used only when `mx.diff=TRUE`. By default, `abs.ranking=FALSE` and it implies that a modified Kuiper statistic is used to calculate enrichment scores, taking the magnitude difference between the largest positive and negative random walk deviations. When `abs.ranking=TRUE` the original Kuiper statistic is used, by which the largest positive and negative random walk deviations are added together. In this case, gene sets with genes enriched on either extreme (high or low) will be regarded as highly activated.
* `tau`: Exponent defining the weight of the tail in the random walk. By default `tau=1`. When `method="ssgsea"`, this parameter is also used and its default value becomes then `tau=0.25` to match the methodology described in (Barbie et al. 2009).
In general, the default values for the previous parameters are suitable for most analysis settings, which usually consist of some kind of normalized continuous expression values.
```{r eg_gsva}
es_gsva <- gsva(gsvaParam(X, gs), verbose=FALSE)
dim(es_gsva)
```
Median enrichment scores.
```{r eg_median_es}
apply(es_gsva, 2, median)
```
>ssgsea (Barbie et al. 2009). Single sample GSEA (ssGSEA) is a non-parametric method that calculates a gene set enrichment score per sample as the normalized difference in empirical cumulative distribution functions (CDFs) of gene expression ranks inside and outside the gene set. By default, the implementation in the GSVA package follows the last step described in (Barbie et al. 2009, online methods, pg. 2) by which pathway scores are normalized, dividing them by the range of calculated values. This normalization step may be switched off using the argument ssgsea.norm in the call to the gsva() function; see below.
```{r eg_ssgsea}
es_ssgsea <- gsva(ssgseaParam(X, gs), verbose=FALSE)
apply(es_ssgsea, 2, median)
```
## Investigating the scores
Function to create testing matrix.
```{r create_mat}
create_mat <- function(n_genes, n_samples, my_seed = 1984, mean = 10, sd = 10){
set.seed(my_seed)
matrix(
rnorm(n = n_genes*n_samples, mean = mean, sd = sd),
nrow=n_genes,
dimnames=list(paste0("g", 1:n_genes), paste0("s", 1:n_samples))
)
}
create_mat(10, 3)
```
### Matrix subset.
Subset and full matrix.
```{r subset_and_full_mat}
subset_mat <- create_mat(10, 5)
full_mat <- create_mat(10, 10)
subset_mat
full_mat
```
Single gene set with three genes.
```{r single_gene_set}
gene_set <- list(
gs1 = paste0("g", 1:3)
)
```
The samples including in the analysis affect the ssGSEA enrichment calculations slightly.
```{r ssgsea_subset, message=FALSE, warning=FALSE}
x <- gsva(ssgseaParam(subset_mat, gene_set), verbose=FALSE)
y <- gsva(ssgseaParam(full_mat, gene_set), verbose=FALSE)
x
y
```
The initial expression estimation for each gene is carried out across all samples for GSVA, so it is expected that different samples will result in different enrichment scores.
```{r gsva_subset, message=FALSE, warning=FALSE}
x <- gsva(gsvaParam(subset_mat, gene_set), verbose=FALSE)
y <- gsva(gsvaParam(full_mat, gene_set), verbose=FALSE)
x
y
```
### Normalisation
Create another test matrix.
```{r create_test_matrix}
X <- create_mat(10000, 2)
X[1:50, 's1'] <- rnorm(n = 50, mean = 50, sd = 55)
X[51:100, 's1'] <- rnorm(n = 50, mean = 2, sd = 2)
X[1:50, 's2'] <- rnorm(n = 50, mean = 100, sd = 5)
X[51:100, 's2'] <- rnorm(n = 50, mean = 2, sd = 0.5)
X[1:5, ]
```
Create testing gene lists with higher and lower expression patterns and check their enrichment scores.
```{r test_gene_set}
gene_set <- list(
gs1 = paste0("g", 1:50),
gs2 = paste0("g", 51:100),
gs3 = paste0("g", 101:150)
)
```
_ssgsea_ (Barbie et al. 2009). Single sample GSEA (ssGSEA) is a non-parametric method that calculates a gene set enrichment score per sample as the normalized difference in empirical cumulative distribution functions (CDFs) of gene expression ranks inside and outside the gene set. By default, the implementation in the GSVA package follows the last step described in (Barbie et al. 2009, online methods, pg. 2) by which pathway scores are normalized, dividing them by the range of calculated values. This normalization step may be switched off using the argument `ssgsea.norm` in the call to the `gsva()` function.
```{r ssgsea_test_matrix}
test_ssgsea <- gsva(ssgseaParam(X, gene_set), verbose=FALSE)
test_ssgsea
```
No normalisation.
```{r ssgsea_no_norm_test_matrix}
test_ssgsea <- gsva(ssgseaParam(X, gene_set, normalize = FALSE), verbose=FALSE)
test_ssgsea
```
gsva (Hanzelmann, Castelo, and Guinney 2013). This is the default method of the package and similarly to ssGSEA, is a non-parametric method that uses the empirical CDFs of gene expression ranks inside and outside the gene set, but it starts by calculating an expression-level statistic that brings gene expression profiles with different dynamic ranges to a common scale.
```{r gsva_test_matrix}
test_gsva <- gsva(gsvaParam(X, gene_set), verbose=FALSE)
test_gsva
```
_zscore_ (Lee et al. 2008). The z-score method standardizes expression profiles over the samples and then, for each gene set, combines the standardized values as follows. Given a gene set $\gamma = \{1, \ldots ,k\}$ with standardized values $\{z_1,\ldots,z_k\}$ for each gene in a specific sample, the combined z-score $Z_\gamma$ for the gene set $\gamma$ is defined as:
$$ Z_\gamma = \frac{\sum^k_{i=1} z_i}{\sqrt{k}}.$$
```{r zscore_test_matrix}
test_zscore <- gsva(zscoreParam(X, gene_set), verbose=FALSE)
test_zscore
```
_plage_ (Tomfohr, Lu, and Kepler 2005). Pathway level analysis of gene expression (PLAGE) standardizes expression profiles over the samples and then, for each gene set, it performs a singular value decomposition (SVD) over its genes. The coefficients of the first right-singular vector are returned as the estimates of pathway activity over the samples. Note that, because of how SVD is calculated, the sign of its singular vectors is arbitrary.
```{r plage_test_matrix}
test_plage <- gsva(plageParam(X, gene_set), verbose=FALSE)
test_plage
```
## Enrichment scores on microarray and RNA-seq data
Gene expression data of lymphoblastoid cell lines (LCL) from HapMap individuals that have been profiled using microarray and RNA-seq.
```{r c2_broad_sets, message=FALSE, warning=FALSE}
library(Biobase)
library(GSVAdata)
data(c2BroadSets)
data(commonPickrellHuang)
stopifnot(
identical(
featureNames(huangArrayRMAnoBatchCommon_eset),
featureNames(pickrellCountsArgonneCQNcommon_eset)
)
)
stopifnot(
identical(
sampleNames(huangArrayRMAnoBatchCommon_eset),
sampleNames(pickrellCountsArgonneCQNcommon_eset)
)
)
pickrellCountsArgonneCQNcommon_eset
```
For the current analysis we use the subset of canonical pathways from the C2 collection of MSigDB Gene Sets. These correspond to the following pathways from KEGG, REACTOME and BIOCARTA.
```{r canonical_c2_broad_sets}
canonicalC2BroadSets <- c2BroadSets[
c(
grep("^KEGG", names(c2BroadSets)),
grep("^REACTOME", names(c2BroadSets)),
grep("^BIOCARTA", names(c2BroadSets))
)
]
canonicalC2BroadSets
```
We calculate the GSVA enrichment scores for these gene sets using first the normalized microarray data and then the normalized RNA-seq integer count data. Note that the only requirement to do the latter is to set the argument `kcdf="Poisson"`, which is `"Gaussian"` by default. However, if the RNA-seq normalized expression levels is continuous, such as log-CPMs, log-RPKMs or log-TPMs, use `"Gaussian"`.
Microarray.
```{r esmicro}
huangPar <- gsvaParam(
exprData = huangArrayRMAnoBatchCommon_eset,
geneSets = canonicalC2BroadSets,
minSize=5,
maxSize=500
)
esmicro <- gsva(huangPar, verbose=FALSE)
exprs(esmicro)[1:6, 1:6]
```
RNA-seq.
```{r esrnaseq}
pickrellPar <- gsvaParam(
exprData = pickrellCountsArgonneCQNcommon_eset,
geneSets = canonicalC2BroadSets,
minSize=5,
maxSize=500,
kcdf="Poisson"
)
system.time(
esrnaseq <- gsva(pickrellPar, verbose=FALSE)
)
exprs(esrnaseq)[1:6, 1:6]
```
In parallel.
```{r esrnaseq_par, warning=FALSE, message=FALSE}
library(BiocParallel)
param <- MulticoreParam(workers = 4L, progressbar = FALSE)
system.time(
esrnaseq_par <- gsva(pickrellPar, verbose=FALSE, BPPARAM = param)
)
identical(exprs(esrnaseq), exprs(esrnaseq_par))
```
Correlation of enrichment scores between the two technologies.
```{r correlation_enrichment_scores, fig.width=12, fig.height=12}
library(corrplot)
corrplot(cor(exprs(esrnaseq), exprs(esmicro)), type="lower")
```
Correlation of enrichment scores between just the RNA-seq samples ordered using hierarchical clustering.
```{r correlation_enrichment_scores_rnaseq, fig.width=12, fig.height=12}
corrplot(cor(exprs(esrnaseq), exprs(esrnaseq)), type="lower", diag = FALSE, order = "hclust")
```
## Molecular signature identification
Verhaak et al. 2010 identified four subtypes of glioblastoma multiforme (GBM) using gene expression patterns:
1. Proneural
2. Classical
3. Neural
4. Mesenchymal
Here we will try to replicate the study using four gene set signatures specific to brain cell types that were derived using mouse models by Cahoy et al. 2008:
1. Astrocytes
2. Oligodendrocytes
3. Neurons
4. Cultured astroglial cells
```{r gbm_eset}
data(gbm_VerhaakEtAl)
gbm_eset
```
Feature names are gene symbols.
```{r gbm_eset_feature_names}
head(featureNames(gbm_eset))
```
Subtypes.
```{r gbm_eset_subtypes}
table(gbm_eset$subtype)
```
Length of the signatures.
```{r brain_tx_db_sets}
data(brainTxDbSets)
lengths(brainTxDbSets)
```
Check out the signatures.
```{r brain_tx_db_sets_head}
lapply(brainTxDbSets, head)
```
GSVA enrichment scores are calculated using the gene sets contained in `brainTxDbSets`; `maxDiff` is set to `FALSE`. Here's a reminder of what this parameter does:
`max.diff`: The last step of the GSVA algorithm calculates the gene set enrichment score from two Kolmogorov-Smirnov random walk statistics. This parameter is a logical flag that allows the user to specify two possible ways to do such calculation:
1. `TRUE`, the default value, where the enrichment score is calculated as the magnitude difference between the largest positive and negative random walk deviations;
2. `FALSE`, where the enrichment score is calculated as the maximum distance of the random walk from zero.
```{r gbm_es}
gbmPar <- gsvaParam(gbm_eset, brainTxDbSets, maxDiff=FALSE)
gbm_es <- gsva(gbmPar, verbose=FALSE)
```
Prepare data frame for plotting.
```{r my_df}
my_df <- data.frame(
sample = colnames(gbm_eset),
subtype = gbm_eset$subtype
)
t(exprs(gbm_es)) |>
as.data.frame() |>
tibble::rownames_to_column('sample') |>
dplyr::inner_join(my_df, by = "sample") |>
dplyr::mutate(sample = factor(sample, levels = sample)) |>
tidyr::pivot_longer(cols = c(-sample, -subtype), names_to = "signature", values_to = "enrichment") -> my_df
head(my_df)
```
Plot.
```{r plot_gbm}
library(ggplot2)
ggplot(my_df, aes(sample, signature, fill = enrichment)) +
geom_tile() +
facet_grid(~subtype, scales = "free") +
theme(
axis.text.x = element_blank(), axis.ticks.x = element_blank()
) +
scale_fill_gradient(low = "skyblue", high = "red")
```
Results using `maxDiff=TRUE`.
```{r gbm_es_max_diff_true}
gbmPar <- gsvaParam(gbm_eset, brainTxDbSets, maxDiff=TRUE)
gbm_es_max <- gsva(gbmPar, verbose=FALSE)
my_df <- data.frame(
sample = colnames(gbm_eset),
subtype = gbm_eset$subtype
)
t(exprs(gbm_es_max)) |>
as.data.frame() |>
tibble::rownames_to_column('sample') |>
dplyr::inner_join(my_df, by = "sample") |>
dplyr::mutate(sample = factor(sample, levels = sample)) |>
tidyr::pivot_longer(cols = c(-sample, -subtype), names_to = "signature", values_to = "enrichment") -> my_df2
ggplot(my_df2, aes(sample, signature, fill = enrichment)) +
geom_tile() +
facet_grid(~subtype, scales = "free") +
theme(
axis.text.x = element_blank(), axis.ticks.x = element_blank()
) +
scale_fill_gradient(low = "skyblue", high = "red")
```
Results using ssgsea.
```{r gbm_es_ssgsea}
gbm_ssgsea <- gsva(ssgseaParam(gbm_eset, brainTxDbSets), verbose=FALSE)
my_df <- data.frame(
sample = colnames(gbm_eset),
subtype = gbm_eset$subtype
)
t(exprs(gbm_ssgsea)) |>
as.data.frame() |>
tibble::rownames_to_column('sample') |>
dplyr::inner_join(my_df, by = "sample") |>
dplyr::mutate(sample = factor(sample, levels = sample)) |>
tidyr::pivot_longer(cols = c(-sample, -subtype), names_to = "signature", values_to = "enrichment") -> my_df3
ggplot(my_df3, aes(sample, signature, fill = enrichment)) +
geom_tile() +
facet_grid(~subtype, scales = "free") +
theme(
axis.text.x = element_blank(), axis.ticks.x = element_blank()
) +
scale_fill_gradient(low = "skyblue", high = "red")
```