04 DSGE Analysis
songif edited this page Jun 15, 2026 · 4 revisions
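`pathway_dsge()` is the core function. It computes the DSGE score for every pathway, generates permutation null distributions grouped by pathway size, fits a generalized Pareto distribution (GPD) to the upper tail of each null, and applies multiple-testing correction to the resulting p-values.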
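To make the size-grouped permutation idea concrete, here is a minimal base-R sketch. It is not the package's implementation: `dsge_perm_sketch()` is a hypothetical name, pathways are simplified to a named list of gene-symbol vectors, one null is drawn per distinct pathway size (the package's size grouping may differ), and converting p-values to z-scores with `qnorm(1 - p)` is an assumption (the output description only states that DSGE is the mean z-score of pathway genes). GPD tail fitting and multiple-testing correction are left out.

```r
# Hypothetical sketch of size-grouped permutation testing (not the package's code).
# Assumptions: pathways are plain character vectors of gene symbols, and
# per-gene z-scores are qnorm(1 - p); p-values of exactly 0 or 1 would give +/-Inf.
dsge_perm_sketch <- function(pathways, pvalue, gene_names, base_mean = NULL,
                             base_mean_cutoff = 0.1, min_size = 5,
                             max_size = 500, n_perm = 1000, seed = 42) {
  set.seed(seed)  # note: resets the global RNG state
  # 1. expression filter (skipped when base_mean is NULL)
  keep <- if (is.null(base_mean)) rep(TRUE, length(pvalue)) else base_mean > base_mean_cutoff
  z <- qnorm(1 - pvalue[keep])
  names(z) <- gene_names[keep]
  # 2. match pathway genes to the filtered gene universe and apply size limits
  matched <- lapply(pathways, function(g) intersect(g, names(z)))
  sizes <- lengths(matched)
  ok <- sizes >= min_size & sizes <= max_size
  matched <- matched[ok]
  sizes <- sizes[ok]
  # 3. observed score: mean z-score of the matched genes
  observed <- vapply(matched, function(g) mean(z[g]), numeric(1))
  # 4. one null per distinct size, shared by every pathway of that size
  size_levels <- sort(unique(sizes))
  nulls <- lapply(size_levels, function(k) replicate(n_perm, mean(sample(z, k))))
  names(nulls) <- size_levels
  # 5. empirical upper-tail p-value with +1 correction (never exactly 0)
  p <- mapply(function(obs, k) {
    null <- nulls[[as.character(k)]]
    (sum(null >= obs) + 1) / (length(null) + 1)
  }, observed, sizes)
  data.frame(pathway = names(matched), n_matched = unname(sizes),
             dsge = unname(observed), p_value = unname(p))
}

# Toy usage with made-up data
set.seed(123)
genes <- paste0("g", 1:200)
pv <- runif(200)
pv[1:10] <- 1e-6                   # g1..g10 are strongly significant
pw_toy <- list(hit = genes[1:10],  # contains all strongly significant genes
               random = genes[101:110],
               tiny = genes[1:3])  # below min_size, so it is dropped
res_toy <- dsge_perm_sketch(pw_toy, pv, genes, n_perm = 500)
res_toy
```

Sharing one null per size group is what keeps the cost manageable: the number of null distributions scales with the number of distinct sizes, not with the number of pathways.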
```r
# pw:  pathway list from get_pathway_genes_db(), get_pathway_genes_kegg(),
#      or get_pathway_genes_reactome()
# res: differential expression results (e.g. from DESeq2) with pvalue,
#      baseMean, and geneName (unique gene symbols) columns
result <- pathway_dsge(
  pathway_genes = pw,
  pvalue = res$pvalue,
  base_mean = res$baseMean,
  gene_names = res$geneName,
  gene_id_col = NULL,  # auto-detected from pathway source
  base_mean_cutoff = 0.1,
  min_size = 5,
  max_size = 500,
  n_perm = 10000,
  seed = 42,
  return_null = TRUE,  # return a list that includes the null distributions
  progress = TRUE,
  heterogeneity = FALSE,
  directional = FALSE,
  direction_vec = NULL,
  use_std = TRUE,
  use_gpd = TRUE,
  gpd_threshold = 0.99,
  gpd_method = "mle",
  safety_margin = 1.6,
  n_cores = 1,
  p_adjust_method = "BY",
  nds_top_frac = 0.25
)
result_tbl <- result$table  # data.frame, sorted by p_adj ascending
```

| Parameter | Default | Description |
|---|---|---|
| `pathway_genes` | (required) | Named list from `get_pathway_genes_db()`, `get_pathway_genes_kegg()`, or `get_pathway_genes_reactome()` |
| `pvalue` | (required) | Vector of per-gene p-values from differential expression analysis |
| `base_mean` | `NULL` | Mean expression vector (e.g., DESeq2 `baseMean`); `NULL` skips expression filtering |
| `gene_names` | (required) | Gene symbols; must be unique |
| `gene_id_col` | `NULL` | Column in the pathway data.frames matched against `gene_names`. When `NULL`, it is auto-detected from the pathway source: `"db_object_symbol"` (GO), `"kegg_gene_symbol"` (KEGG), `"reactome_gene_symbol"` (Reactome) |
| `source` | `NULL` | Overrides pathway source detection. One of `"GO"`, `"KEGG"`, `"REACTOME"`. When `NULL`, it is auto-detected from the S3 class of the pathway list elements |
| `base_mean_cutoff` | `0.1` | Genes with `base_mean` at or below this value are excluded |
| `min_size` | `5` | Minimum number of matched genes per pathway |
| `max_size` | `500` | Maximum number of matched genes per pathway (set to `Inf` to disable) |
| `n_perm` | `10000` | Number of permutations per pathway-size group |
| `seed` | `NULL` | Random seed for reproducibility |
| `return_null` | `FALSE` | If `TRUE`, return a list containing the results table (`$table`) and the null distributions (needed for `plot_dsge()`) |
| `progress` | `TRUE` | Show progress bars during computation |
| `heterogeneity` | `FALSE` | If `TRUE`, also compute the Gini coefficient, coefficient of variation (CV), and heterogeneity p-values |
| `directional` | `FALSE` | If `TRUE`, compute the Normalized Direction Score (NDS) from `direction_vec` |
| `direction_vec` | `NULL` | Numeric vector (e.g., log2FoldChange) of the same length as `pvalue`; required when `directional = TRUE` |
| `use_std` | `TRUE` | If `TRUE`, compute `(observed - mean(null)) / sd(null)` and include it as the `dsge_std` column |
| `use_gpd` | `TRUE` | If `TRUE`, use GPD tail extrapolation with a support-constrained adjustment, which avoids p-values of exactly 0 |
| `gpd_threshold` | `0.99` | Quantile of the null distribution above which the GPD is fitted |
| `gpd_method` | `"mle"` | GPD estimation method passed to `POT::fitgpd()` |
| `safety_margin` | `1.6` | Safety margin for the GPD support-constrained adjustment |
| `n_cores` | `1` | Number of CPU cores for parallel null generation (uses Rcpp; limited benefit on Windows) |
| `p_adjust_method` | `"BY"` | Multiple-testing correction method. The default, BY (Benjamini-Yekutieli), controls the FDR under arbitrary dependence between tests |
| `nds_top_frac` | `0.25` | Fraction of the most-perturbed genes retained for the NDS calculation (used only when `directional = TRUE`) |

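The `use_gpd`, `gpd_threshold`, and `gpd_method` options refer to peaks-over-threshold tail extrapolation with a generalized Pareto distribution. The base-R sketch below illustrates that general idea for a single null distribution. It is an illustration under stated assumptions, not the package's method: `gpd_tail_pvalue()` is a hypothetical name, it swaps in method-of-moments estimates for the MLE fit done via `POT::fitgpd()`, and it does not implement the support-constrained adjustment or `safety_margin`, so with a negative shape estimate it can still return 0 beyond the fitted support.

```r
# Hypothetical sketch of GPD tail extrapolation (not the package's code).
# Uses method-of-moments GPD estimates (base R only) instead of MLE;
# the support-constrained adjustment / safety_margin is NOT modelled.
gpd_tail_pvalue <- function(observed, null, threshold_q = 0.99, min_exceed = 10) {
  u <- unname(quantile(null, threshold_q))  # tail threshold (cf. gpd_threshold)
  exceed <- null[null > u] - u              # exceedances over the threshold
  if (observed <= u || length(exceed) < min_exceed) {
    # below the threshold the empirical p-value (+1 corrected) is used
    return((sum(null >= observed) + 1) / (length(null) + 1))
  }
  p_u <- length(exceed) / length(null)      # P(null > u)
  m <- mean(exceed)
  v <- var(exceed)
  xi <- 0.5 * (1 - m^2 / v)                 # shape (method of moments)
  sigma <- 0.5 * m * (1 + m^2 / v)          # scale (method of moments)
  y <- observed - u
  surv <- if (abs(xi) < 1e-8) {
    exp(-y / sigma)                         # exponential limit when shape ~ 0
  } else {
    base <- 1 + xi * y / sigma
    # for xi < 0 the GPD has a finite upper endpoint; beyond it this returns 0
    if (base <= 0) 0 else base^(-1 / xi)
  }
  p_u * surv                                # P(null > observed) estimate
}

# Usage with a stand-in null distribution
set.seed(7)
null <- rnorm(1e5)
gpd_tail_pvalue(4, null)           # extrapolated tail p-value
(sum(null >= 4) + 1) / (1e5 + 1)   # empirical p-value, for comparison
```

The point of the tail model is resolution: with 10,000 permutations the empirical p-value cannot go below about 1e-4, whereas the fitted tail can still rank pathways whose observed score lies far beyond every permuted value.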
The column names of the result table depend on the pathway source (auto-detected). For GO pathways:

| Column | Description |
|---|---|
| `go_id` | GO term identifier |
| `go_name` | Human-readable GO term name |
| `aspect` | GO ontology: BP (biological process), MF (molecular function), or CC (cellular component) |
| `n_pathway` | Total number of genes in the pathway annotation |
| `n_matched` | Number of pathway genes matched in the expression data |
| `dsge` | Observed DSGE (mean z-score of pathway genes) |
| `dsge_std` | Standardized DSGE (only when `use_std = TRUE`) |
| `nds` | Normalized Direction Score (only when `directional = TRUE`) |
| `p_value` | Permutation p-value |
| `p_adj` | Adjusted p-value (BY correction by default) |

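The `dsge_std` and `p_adj` columns rest on two simple formulas. The toy base-R illustration below uses made-up numbers, not package output. The standardized score follows the formula given for `use_std`; for `p_adj` it is an assumption that `p_adjust_method` corresponds to the method names of `stats::p.adjust()`. It shows why BY is more conservative than BH: it scales the BH-adjusted values by the harmonic sum 1 + 1/2 + ... + 1/m.

```r
# Toy illustration with made-up numbers; not output from pathway_dsge().
set.seed(11)
null <- rnorm(5000, mean = 0.8, sd = 0.3)  # stand-in null for one size group
observed <- 1.9                            # stand-in observed DSGE
# dsge_std: the formula given for use_std = TRUE
dsge_std <- (observed - mean(null)) / sd(null)

# p_adj: assumed to correspond to stats::p.adjust() with method = "BY"
p <- c(0.001, 0.01, 0.02, 0.04, 0.5)       # toy raw p-values
p_bh <- p.adjust(p, method = "BH")
p_by <- p.adjust(p, method = "BY")
# BY equals BH multiplied by 1 + 1/2 + ... + 1/m, capped at 1
harmonic <- sum(1 / seq_along(p))
cbind(p, p_bh, p_by, bh_times_harmonic = pmin(1, p_bh * harmonic))
```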
For KEGG pathways:

| Column | Description |
|---|---|
| `kegg_id` | KEGG pathway ID (e.g., hsa00010) |
| `kegg_name` | Human-readable KEGG pathway name |
| `n_pathway`, `n_matched`, `dsge`, `dsge_std`, `nds`, `p_value`, `p_adj` | Same as for GO |

For Reactome pathways:

| Column | Description |
|---|---|
| `reactome_id` | Reactome pathway ID (e.g., R-HSA-177929) |
| `reactome_name` | Human-readable Reactome pathway name |
| `n_pathway`, `n_matched`, `dsge`, `dsge_std`, `nds`, `p_value`, `p_adj` | Same as for GO |

```r
# How many pathways are significant (p_adj < 0.05)?
sum(result_tbl$p_adj < 0.05, na.rm = TRUE)

# Top pathways: use the line matching your pathway source
# (dsge_std is present only when use_std = TRUE)
# GO
head(result_tbl[, c("go_id", "go_name", "n_matched", "dsge", "dsge_std", "p_adj")])
# KEGG
head(result_tbl[, c("kegg_id", "kegg_name", "n_matched", "dsge", "dsge_std", "p_adj")])
# Reactome
head(result_tbl[, c("reactome_id", "reactome_name", "n_matched", "dsge", "dsge_std", "p_adj")])

# Save
write.csv(result_tbl, "pathway_dsge_results.csv", row.names = FALSE)

# The detected pathway source is stored as an attribute
attr(result_tbl, "pathway_source")  # "GO", "KEGG", or "REACTOME"
```
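Because the ID and name columns change with the source, a small helper can read the `pathway_source` attribute instead of hard-coding one of the three `head()` calls. `top_pathways()` is a hypothetical convenience function, not part of the package; it assumes the column prefixes are the lowercase source names shown in the output tables (`go_`, `kegg_`, `reactome_`).

```r
# Hypothetical helper (not part of the package): picks the id/name columns
# from the "pathway_source" attribute and returns the top significant rows.
top_pathways <- function(tbl, n = 6, alpha = 0.05) {
  src <- attr(tbl, "pathway_source")
  if (is.null(src)) stop("result table has no 'pathway_source' attribute")
  prefix <- tolower(src)  # "GO" -> "go", "KEGG" -> "kegg", "REACTOME" -> "reactome"
  cols <- c(paste0(prefix, c("_id", "_name")), "n_matched", "dsge", "p_adj")
  # dsge_std exists only when use_std = TRUE; place it before p_adj
  if ("dsge_std" %in% names(tbl)) cols <- append(cols, "dsge_std", after = 4)
  sig <- tbl[!is.na(tbl$p_adj) & tbl$p_adj < alpha, cols, drop = FALSE]
  head(sig[order(sig$p_adj), , drop = FALSE], n)
}

# Usage with a real result:
# top_pathways(result_tbl)
```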