-
Notifications
You must be signed in to change notification settings - Fork 0
04 DSGE Analysis
songif edited this page Jun 12, 2026
·
4 revisions
The core function. Computes DSGE for every pathway, generates size-grouped permutation null distributions, fits GPD to the upper tail, and applies multiple testing correction.
result <- pathway_dsge(
pathway_genes = pw,
pvalue = res$pvalue,
base_mean = res$baseMean,
gene_names = res$geneName,
gene_id_col = "db_object_symbol",
base_mean_cutoff = 0.1,
min_size = 5,
max_size = 500,
n_perm = 10000,
seed = 42,
return_null = TRUE,
progress = TRUE,
heterogeneity = FALSE,
directional = FALSE,
direction_vec = NULL,
use_std = TRUE,
use_gpd = TRUE,
gpd_threshold = 0.99,
gpd_method = "mle",
safety_margin = 1.6,
n_cores = 1,
p_adjust_method = "BY",
nds_top_frac = 0.25
)
result_tbl <- result$table # data.frame, sorted by p_adj ascending| Parameter | Default | Description |
|---|---|---|
pathway_genes |
(required) | Named list from get_pathway_genes() or get_pathway_genes_db()
|
pvalue |
(required) | p-value vector from differential expression analysis |
base_mean |
NULL |
Mean expression vector (e.g., DESeq2 baseMean); NULL skips filtering |
gene_names |
(required) | Gene symbols, must be unique |
gene_id_col |
"db_object_symbol" |
Column in pathway data.frames to match gene_names
|
base_mean_cutoff |
0.1 |
Exclude genes with baseMean at or below this value |
min_size |
5 |
Minimum matched genes per pathway |
max_size |
500 |
Maximum matched genes (set Inf to disable) |
n_perm |
10000 |
Permutations per size group |
seed |
NULL |
Random seed for reproducibility |
return_null |
FALSE |
If TRUE, return list with null distributions (needed for plot_dsge) |
progress |
TRUE |
Show progress bars during computation |
heterogeneity |
FALSE |
If TRUE, also compute Gini, CV, and heterogeneity p-values |
directional |
FALSE |
If TRUE, compute Normalized Direction Score (NDS) using direction_vec
|
direction_vec |
NULL |
Numeric vector (e.g. log2FoldChange), same length as pvalue; required when directional = TRUE
|
use_std |
TRUE |
If TRUE, compute (observed - mean(null)) / sd(null) and include dsge_std column |
use_gpd |
TRUE |
If TRUE, use GPD tail extrapolation with support-constrained adjustment (avoids p=0) |
gpd_threshold |
0.99 |
Tail quantile threshold for GPD fitting |
gpd_method |
"mle" |
GPD estimation method passed to POT::fitgpd
|
safety_margin |
1.6 |
Safety margin for GPD support-constrained adjustment |
n_cores |
1 |
Number of CPU cores for parallel null generation (uses Rcpp; limited benefit on Windows) |
p_adjust_method |
"BY" |
Multiple testing correction method. Default BY (Benjamini-Yekutieli) for FDR control under arbitrary dependence |
nds_top_frac |
0.25 |
Fraction of most-perturbed genes retained for NDS calculation (only used when directional = TRUE) |
| Column | Description |
|---|---|
go_id |
GO term identifier |
go_name |
Human-readable GO term name |
aspect |
Ontology: BP (Biological Process), MF (Molecular Function), CC (Cellular Component) |
n_pathway |
Total genes in the pathway annotation |
n_matched |
Genes matched in the expression data |
dsge |
Observed DSGE (mean z-score of pathway genes) |
dsge_std |
Standardised DSGE (only when use_std = TRUE) |
nds |
Normalized Direction Score (only when directional = TRUE) |
gini |
Gini coefficient (only when heterogeneity = TRUE) |
cv |
Coefficient of Variation (only when heterogeneity = TRUE) |
p_value |
Permutation p-value |
p_adj |
Adjusted p-value (BY correction by default) |
het_p_value |
Heterogeneity p-value (only when heterogeneity = TRUE) |
het_p_adj |
Adjusted heterogeneity p-value (only when heterogeneity = TRUE) |
# How many pathways are significant?
sum(result_tbl$p_adj < 0.05)
# Ontology breakdown
table(result_tbl$aspect)
# Top pathways
head(result_tbl[, c("go_id", "go_name", "aspect", "n_matched", "dsge", "dsge_std", "p_adj")])
# Save
write.csv(result_tbl, "pathway_dsge_results.csv", row.names = FALSE)