04 DSGE Analysis

Pathway DSGE Analysis

`pathway_dsge()` is the core function. It computes DSGE for every pathway, generates size-grouped permutation null distributions, fits a generalized Pareto distribution (GPD) to the upper tail of each null, and applies multiple-testing correction.
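To make the permutation step concrete, here is a minimal sketch of an empirical upper-tail p-value for a single pathway. It is not the package's implementation. It assumes DSGE is the mean z-score of the pathway's genes (as stated in the result-column descriptions) and, as a further assumption, derives z-scores from p-values with `qnorm(p, lower.tail = FALSE)`. The simulated data and the names `z`, `pathway_idx`, and `null` are illustrative only.

```r
# Minimal sketch under stated assumptions; not the package's internal code.
set.seed(1)
n_genes <- 2000
pvalue  <- runif(n_genes)                 # simulated DE p-values

# Assumption: one-sided z-score, larger means stronger evidence
z <- qnorm(pvalue, lower.tail = FALSE)

pathway_idx <- 1:20                       # hypothetical pathway of 20 matched genes
observed    <- mean(z[pathway_idx])       # observed statistic for this pathway

# Null: mean z-score of random gene sets of the same size
n_perm <- 1000
null <- replicate(
  n_perm,
  mean(z[sample.int(n_genes, length(pathway_idx))])
)

# Empirical upper-tail p-value; the +1 terms keep it strictly above zero
p_emp <- (sum(null >= observed) + 1) / (n_perm + 1)
```

Because this null depends only on the number of matched genes, pathways of the same size could share one null distribution, which is presumably what "size-grouped" refers to.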

Usage

result <- pathway_dsge(
  pathway_genes    = pw,
  pvalue           = res$pvalue,
  base_mean        = res$baseMean,
  gene_names       = res$geneName,
  gene_id_col      = NULL,          # auto-detected from pathway source
  source           = NULL,          # auto-detected from the pathway list's S3 class
  base_mean_cutoff = 0.1,
  min_size         = 5,
  max_size         = 500,
  n_perm           = 10000,
  seed             = 42,
  return_null      = TRUE,
  progress         = TRUE,
  heterogeneity    = FALSE,
  directional      = FALSE,
  direction_vec    = NULL,
  use_std          = TRUE,
  use_gpd          = TRUE,
  gpd_threshold    = 0.99,
  gpd_method       = "mle",
  safety_margin    = 1.6,
  n_cores          = 1,
  p_adjust_method  = "BY",
  nds_top_frac     = 0.25
)

result_tbl <- result$table    # data.frame, sorted by p_adj ascending

Parameters

Parameter	Default	Description
`pathway_genes`	(required)	Named list from `get_pathway_genes_db()`, `get_pathway_genes_kegg()`, or `get_pathway_genes_reactome()`
`pvalue`	(required)	p-value vector from differential expression analysis
`base_mean`	`NULL`	Mean expression vector (e.g., DESeq2 baseMean); `NULL` skips filtering
`gene_names`	(required)	Gene symbols, must be unique
`gene_id_col`	`NULL`	Column in pathway data.frames to match `gene_names`. When `NULL`, auto-detected from the pathway source: `"db_object_symbol"` (GO), `"kegg_gene_symbol"` (KEGG), `"reactome_gene_symbol"` (Reactome)
`source`	`NULL`	Override pathway source detection. One of `"GO"`, `"KEGG"`, `"REACTOME"`. When `NULL`, auto-detected from the S3 class of the pathway list elements
`base_mean_cutoff`	`0.1`	Exclude genes with baseMean at or below this value
`min_size`	`5`	Minimum matched genes per pathway
`max_size`	`500`	Maximum matched genes (set `Inf` to disable)
`n_perm`	`10000`	Permutations per size group
`seed`	`NULL`	Random seed for reproducibility
`return_null`	`FALSE`	If `TRUE`, return list with null distributions (needed for `plot_dsge`)
`progress`	`TRUE`	Show progress bars during computation
`heterogeneity`	`FALSE`	If `TRUE`, also compute Gini, CV, and heterogeneity p-values
`directional`	`FALSE`	If `TRUE`, compute Normalized Direction Score (NDS) using `direction_vec`
`direction_vec`	`NULL`	Numeric vector (e.g. log2FoldChange), same length as `pvalue`; required when `directional = TRUE`
`use_std`	`TRUE`	If `TRUE`, compute `(observed - mean(null)) / sd(null)` and include `dsge_std` column
`use_gpd`	`TRUE`	If `TRUE`, use GPD tail extrapolation with support-constrained adjustment (avoids p=0)
`gpd_threshold`	`0.99`	Tail quantile threshold for GPD fitting
`gpd_method`	`"mle"`	GPD estimation method passed to `POT::fitgpd`
`safety_margin`	`1.6`	Safety margin for GPD support-constrained adjustment
`n_cores`	`1`	Number of CPU cores for parallel null generation (uses Rcpp; limited benefit on Windows)
`p_adjust_method`	`"BY"`	Multiple testing correction method. Default BY (Benjamini-Yekutieli) for FDR control under arbitrary dependence
`nds_top_frac`	`0.25`	Fraction of most-perturbed genes retained for NDS calculation (only used when `directional = TRUE`)
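The idea behind `use_gpd` and `gpd_threshold` can be sketched as follows: fit a GPD to the null values above a high quantile, then multiply the exceedance rate by the fitted tail probability. This is a dependency-free illustration that fits the GPD by maximum likelihood with base R `optim()` rather than `POT::fitgpd`. It does not reproduce the package's support-constrained adjustment or `safety_margin`. The null distribution, the observed value, and the helper names `gpd_nll` and `gpd_surv` are all hypothetical.

```r
# Sketch only: GPD tail p-value from a simulated null (not the package's code).
set.seed(1)
null     <- rnorm(10000)                  # stand-in permutation null
observed <- 4                             # stand-in observed statistic

u      <- quantile(null, 0.99, names = FALSE)  # mirrors gpd_threshold = 0.99
excess <- null[null > u] - u                   # exceedances over the threshold

# Negative log-likelihood of the GPD; scale is optimised on the log scale
gpd_nll <- function(par, x) {
  scale <- exp(par[1])
  shape <- par[2]
  if (abs(shape) < 1e-8) return(length(x) * log(scale) + sum(x) / scale)
  arg <- 1 + shape * x / scale
  if (any(arg <= 0)) return(1e10)         # outside the GPD support
  length(x) * log(scale) + (1 / shape + 1) * sum(log(arg))
}
fit   <- optim(c(log(mean(excess)), 0.1), gpd_nll, x = excess)
scale <- exp(fit$par[1])
shape <- fit$par[2]

# GPD survival function evaluated at an excess y >= 0
gpd_surv <- function(y, scale, shape) {
  if (abs(shape) < 1e-8) return(exp(-y / scale))
  pmax(1 + shape * y / scale, 0)^(-1 / shape)
}

# Tail p-value: P(null > u) * P(excess > observed - u | null > u)
p_tail <- mean(null > u) * gpd_surv(observed - u, scale, shape)
```

When the fitted shape is negative, the GPD has a finite upper endpoint at `u - scale / shape`, and this naive formula returns exactly zero beyond it. That is presumably the case the support-constrained adjustment is designed to handle.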

Result Columns

Output column names depend on the pathway source (auto-detected):

GO source

Column	Description
`go_id`	GO term identifier
`go_name`	Human-readable GO term name
`aspect`	Ontology: BP / MF / CC
`n_pathway`	Total genes in the pathway annotation
`n_matched`	Genes matched in the expression data
`dsge`	Observed DSGE (mean z-score of pathway genes)
`dsge_std`	Standardised DSGE (only when `use_std = TRUE`)
`nds`	Normalized Direction Score (only when `directional = TRUE`)
`p_value`	Permutation p-value
`p_adj`	Adjusted p-value (BY correction by default)

KEGG source

Column	Description
`kegg_id`	KEGG pathway ID (e.g. `hsa00010`)
`kegg_name`	Human-readable KEGG pathway name
`n_pathway`, `n_matched`, `dsge`, `dsge_std`, `nds`, `p_value`, `p_adj`	Same as GO

Reactome source

Column	Description
`reactome_id`	Reactome pathway ID (e.g. `R-HSA-177929`)
`reactome_name`	Human-readable Reactome pathway name
`n_pathway`, `n_matched`, `dsge`, `dsge_std`, `nds`, `p_value`, `p_adj`	Same as GO
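The `dsge_std` and `p_adj` columns correspond to simple base R operations. The sketch below applies the `(observed - mean(null)) / sd(null)` formula from the `use_std` description and compares BY with BH adjustment using `p.adjust()`. All numbers are made-up stand-ins, not output from `pathway_dsge()`.

```r
# Illustrative values only; not produced by pathway_dsge().
set.seed(1)
null     <- rnorm(1000, mean = 0.2, sd = 0.1)  # stand-in null for one size group
observed <- 0.55                               # stand-in observed DSGE

# Standardised DSGE, as described for use_std = TRUE
dsge_std <- (observed - mean(null)) / sd(null)

# Stand-in permutation p-values for five pathways
p_value <- c(0.0001, 0.003, 0.02, 0.2, 0.6)

# BY (the documented default) versus the more common BH
p_adj_by <- p.adjust(p_value, method = "BY")
p_adj_bh <- p.adjust(p_value, method = "BH")

data.frame(p_value, p_adj_bh, p_adj_by)
```

BY-adjusted values are never smaller than their BH counterparts, which is the price of FDR control under arbitrary dependence between pathways.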

Inspecting Results

# How many pathways are significant?
sum(result_tbl$p_adj < 0.05)

# Top pathways (GO)
head(result_tbl[, c("go_id", "go_name", "n_matched", "dsge", "dsge_std", "p_adj")])

# Top pathways (KEGG)
head(result_tbl[, c("kegg_id", "kegg_name", "n_matched", "dsge", "dsge_std", "p_adj")])

# Top pathways (Reactome)
head(result_tbl[, c("reactome_id", "reactome_name", "n_matched", "dsge", "dsge_std", "p_adj")])

# Save
write.csv(result_tbl, "pathway_dsge_results.csv", row.names = FALSE)

# Source-specific attributes
attr(result_tbl, "pathway_source")  # "GO", "KEGG", or "REACTOME"
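Since the ID and name columns follow the pattern `<source>_id` / `<source>_name`, the `pathway_source` attribute can be used to select them without hard-coding the source. The following is a sketch on a mock table. The data frame and its values are invented for illustration and only mimic the documented column layout.

```r
# Mock table standing in for result$table; all numeric values are made up.
result_tbl <- data.frame(
  kegg_id   = c("hsa00010", "hsa00020"),
  kegg_name = c("Glycolysis / Gluconeogenesis", "Citrate cycle (TCA cycle)"),
  n_pathway = c(67, 30),
  n_matched = c(60, 28),
  dsge      = c(1.8, 0.4),
  p_value   = c(0.0002, 0.31),
  p_adj     = c(0.004, 1),
  stringsAsFactors = FALSE
)
attr(result_tbl, "pathway_source") <- "KEGG"

# Derive the source-specific column names: "go", "kegg", or "reactome"
prefix  <- tolower(attr(result_tbl, "pathway_source"))
id_cols <- paste0(prefix, c("_id", "_name"))

# Keep only columns that exist (dsge_std is absent when use_std = FALSE)
wanted <- c(id_cols, "n_matched", "dsge", "dsge_std", "p_adj")
keep   <- intersect(wanted, names(result_tbl))

top <- head(result_tbl[, keep, drop = FALSE])
```

Using `intersect()` keeps the snippet working whether or not optional columns such as `dsge_std` or `nds` are present.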
