Skip to content

04 DSGE Analysis

songif edited this page Jun 12, 2026 · 4 revisions

Pathway DSGE Analysis

The core function. Computes DSGE for every pathway, generates size-grouped permutation null distributions, fits GPD to the upper tail, and applies multiple testing correction.

Usage

result <- pathway_dsge(
  pathway_genes    = pw,
  pvalue           = res$pvalue,
  base_mean        = res$baseMean,
  gene_names       = res$geneName,
  gene_id_col      = "db_object_symbol",
  base_mean_cutoff = 0.1,
  min_size         = 5,
  max_size         = 500,
  n_perm           = 10000,
  seed             = 42,
  return_null      = TRUE,
  progress         = TRUE,
  heterogeneity    = FALSE,
  directional      = FALSE,
  direction_vec    = NULL,
  use_std          = TRUE,
  use_gpd          = TRUE,
  gpd_threshold    = 0.99,
  gpd_method       = "mle",
  safety_margin    = 1.6,
  n_cores          = 1,
  p_adjust_method  = "BY",
  nds_top_frac     = 0.25
)

result_tbl <- result$table    # data.frame, sorted by p_adj ascending

Parameters

Parameter Default Description
pathway_genes (required) Named list from get_pathway_genes() or get_pathway_genes_db()
pvalue (required) p-value vector from differential expression analysis
base_mean NULL Mean expression vector (e.g., DESeq2 baseMean); NULL skips filtering
gene_names (required) Gene symbols, must be unique
gene_id_col "db_object_symbol" Column in pathway data.frames to match gene_names
base_mean_cutoff 0.1 Exclude genes with baseMean at or below this value
min_size 5 Minimum matched genes per pathway
max_size 500 Maximum matched genes (set Inf to disable)
n_perm 10000 Permutations per size group
seed NULL Random seed for reproducibility
return_null FALSE If TRUE, return list with null distributions (needed for plot_dsge)
progress TRUE Show progress bars during computation
heterogeneity FALSE If TRUE, also compute Gini, CV, and heterogeneity p-values
directional FALSE If TRUE, compute Normalized Direction Score (NDS) using direction_vec
direction_vec NULL Numeric vector (e.g. log2FoldChange), same length as pvalue; required when directional = TRUE
use_std TRUE If TRUE, compute (observed - mean(null)) / sd(null) and include dsge_std column
use_gpd TRUE If TRUE, use GPD tail extrapolation with support-constrained adjustment (avoids p=0)
gpd_threshold 0.99 Tail quantile threshold for GPD fitting
gpd_method "mle" GPD estimation method passed to POT::fitgpd
safety_margin 1.6 Safety margin for GPD support-constrained adjustment
n_cores 1 Number of CPU cores for parallel null generation (uses Rcpp; limited benefit on Windows)
p_adjust_method "BY" Multiple testing correction method. Default BY (Benjamini-Yekutieli) for FDR control under arbitrary dependence
nds_top_frac 0.25 Fraction of most-perturbed genes retained for NDS calculation (only used when directional = TRUE)

Result Columns

Column Description
go_id GO term identifier
go_name Human-readable GO term name
aspect Ontology: BP (Biological Process), MF (Molecular Function), CC (Cellular Component)
n_pathway Total genes in the pathway annotation
n_matched Genes matched in the expression data
dsge Observed DSGE (mean z-score of pathway genes)
dsge_std Standardised DSGE (only when use_std = TRUE)
nds Normalized Direction Score (only when directional = TRUE)
gini Gini coefficient (only when heterogeneity = TRUE)
cv Coefficient of Variation (only when heterogeneity = TRUE)
p_value Permutation p-value
p_adj Adjusted p-value (BY correction by default)
het_p_value Heterogeneity p-value (only when heterogeneity = TRUE)
het_p_adj Adjusted heterogeneity p-value (only when heterogeneity = TRUE)

Inspecting Results

# How many pathways are significant?
sum(result_tbl$p_adj < 0.05)

# Ontology breakdown
table(result_tbl$aspect)

# Top pathways
head(result_tbl[, c("go_id", "go_name", "aspect", "n_matched", "dsge", "dsge_std", "p_adj")])

# Save
write.csv(result_tbl, "pathway_dsge_results.csv", row.names = FALSE)

Clone this wiki locally