07 Key Concepts

Key Concepts

DSGE Formula

The Disruption Score of Gene Expression is built on per-gene z-scores:

z-score conversion:

z_i = |Φ⁻¹(1 - p_i/2)|

where p_i is the nominal two-sided p-value from differential expression testing and Φ⁻¹ is the inverse standard normal CDF (quantile function). Because 1 - p_i/2 lies in [0.5, 1), the quantile is already non-negative; the absolute value makes explicit that z_i is a perturbation magnitude that carries no direction of change.

Pathway DSGE:

DSGE = mean(z_i)  for all genes in the pathway

DSGE is the unweighted mean of the absolute z-scores. Higher values indicate stronger transcriptional perturbation in the gene set.
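The two formulas above can be sketched in a few lines of Python. This is only an illustration, not the package's code: the function names `p_to_z` and `dsge` are hypothetical, and the sketch uses the symmetric form |Φ⁻¹(p/2)|, which equals Φ⁻¹(1 - p/2) but avoids rounding 1 - p/2 to exactly 1 for very small p.

```python
from statistics import NormalDist, fmean

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def p_to_z(p):
    """Convert a two-sided p-value into a non-negative z-score (hypothetical helper)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # By symmetry, |inv_cdf(p / 2)| == inv_cdf(1 - p / 2), with better precision for tiny p.
    return abs(_STD_NORMAL.inv_cdf(p / 2.0))


def dsge(pvalues):
    """Unweighted mean of the per-gene z-scores for one pathway."""
    return fmean([p_to_z(p) for p in pvalues])
```

For example, `p_to_z(0.05)` is about 1.96, and a gene with p = 1 contributes a z-score of 0.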

Size-Grouped Permutation

Pathways sharing the same number of matched genes reuse a single null distribution. This reduces the number of permuted gene sets from K × n_perm to |unique sizes| × n_perm, where K is the number of pathways tested.

For each size group, genes are randomly sampled without replacement from the filtered gene pool, and DSGE is recomputed. This generates an empirical null distribution that accounts for the correlation structure of the gene pool.
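A minimal sketch of the size-grouping idea, assuming the gene pool is already reduced to a list of z-scores. The function name `size_grouped_nulls`, its arguments, and the pathway labels in the usage note are hypothetical and are not taken from the package.

```python
import random
from statistics import fmean


def size_grouped_nulls(pool_z, pathway_sizes, n_perm=1000, seed=0):
    """Build one null DSGE distribution per distinct pathway size.

    pool_z        -- z-scores of all genes in the filtered pool
    pathway_sizes -- dict mapping pathway name -> number of matched genes
    """
    rng = random.Random(seed)
    nulls = {}
    # One null per unique size, however many pathways share that size.
    for size in sorted(set(pathway_sizes.values())):
        # rng.sample draws without replacement from the pool.
        nulls[size] = [fmean(rng.sample(pool_z, size)) for _ in range(n_perm)]
    return nulls
```

With pathways `{"A": 5, "B": 5, "C": 10}`, only two null distributions are generated (sizes 5 and 10), and pathways A and B look up the same one.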

GPD Tail Extrapolation

When use_gpd = TRUE (default), observed DSGE values above the gpd_threshold quantile of the null distribution (default 0.99) get p-values from a fitted Generalized Pareto Distribution (GPD) instead of direct counting. This provides higher resolution for extreme observations beyond the range of the empirical null.

A support-constrained adjustment (Peschel et al. 2025, arXiv:2602.22975) is applied when the fitted GPD would otherwise produce p = 0 due to a finite upper bound. This ensures a valid non-zero p-value while minimally deviating from the MLE.

When use_gpd = FALSE, p-values come from the empirical CDF (ECDF) of the null alone, so they are always >= 1/n_perm.
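To illustrate why a finite upper bound can produce p = 0, here is a sketch of one common GPD tail formula: the tail fraction times the GPD survival function of the excess over the threshold. Whether the package uses exactly this estimator is an assumption, and the name `gpd_tail_pvalue` is hypothetical. The shape and scale are taken as already fitted, and the support-constrained adjustment itself is not implemented here.

```python
import math


def gpd_tail_pvalue(x, u, shape, scale, tail_frac):
    """Tail p-value for an observation x above threshold u.

    shape, scale -- GPD parameters, assumed already fitted to null exceedances over u
    tail_frac    -- fraction of null values above u (0.01 for a 0.99 threshold)
    """
    y = x - u
    if y < 0:
        raise ValueError("x must be at or above the threshold u")
    if abs(shape) < 1e-12:
        surv = math.exp(-y / scale)  # exponential limit as shape -> 0
    else:
        base = 1.0 + shape * y / scale
        # For shape < 0 the support ends at y = -scale / shape; beyond it the survival is 0.
        surv = base ** (-1.0 / shape) if base > 0 else 0.0
    return tail_frac * surv
```

With shape = -0.5 and scale = 1, the fitted support ends 2 units above the threshold, so any observation beyond that point gets p = 0 from this plain formula. That is the case the support-constrained adjustment is described as handling.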

Multiple Testing Correction

Default method: Benjamini-Yekutieli (BY), which controls the False Discovery Rate under arbitrary dependence structures. This is more conservative than Benjamini-Hochberg (BH) but appropriate when GO pathways share genes and are not independent.

Alternative methods supported: "holm", "hochberg", "hommel", "bonferroni", "BH", "fdr", "none".
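For reference, the Benjamini-Yekutieli adjustment can be sketched as the Benjamini-Hochberg step-up procedure with an extra harmonic-sum factor c(m) = 1 + 1/2 + ... + 1/m. The function name `by_adjust` is hypothetical, and this is a standalone illustration rather than the package's code.

```python
def by_adjust(pvals):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    m = len(pvals)
    c_m = sum(1.0 / k for k in range(1, m + 1))  # harmonic penalty for dependence
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted
```

Setting `c_m = 1.0` in this sketch recovers BH, which shows where the extra conservatism of BY comes from.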

Standardised DSGE

dsge_std = (observed - mean(null)) / sd(null)

Standardises the observed DSGE against its size-matched null distribution, producing a z-score-like metric that is comparable across pathways of different sizes.
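A small sketch of the standardisation, assuming the sample standard deviation (n - 1 denominator) is used; the source does not say which variant applies, and the name `standardise_dsge` is hypothetical.

```python
from statistics import fmean, stdev


def standardise_dsge(observed, null):
    """Standardise an observed DSGE against its size-matched null distribution."""
    sd = stdev(null)  # sample sd (n - 1 denominator); an assumption
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - fmean(null)) / sd
```

Means of larger gene sets vary less, so the null for a large pathway is narrower than for a small one. Dividing by the size-matched sd puts both on a common scale.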

Perturbation Heterogeneity

Optionally, a Gini coefficient and a coefficient of variation (CV) are computed for each pathway and assessed with a two-sided permutation test:

Low heterogeneity: uniform perturbation across all genes in the pathway (all genes equally affected)
High heterogeneity: selective targeting of a few key genes (driver vs. passenger distinction)
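The two heterogeneity statistics can be sketched as follows. This assumes they are computed on the pathway's non-negative z-scores and that the CV uses the population standard deviation; both are assumptions, as are the function names. The permutation test is not shown.

```python
from statistics import fmean, pstdev


def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive sum")
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def cv(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return pstdev(values) / fmean(values)
```

Four equally perturbed genes give a Gini of 0, while a pathway in which one of four genes carries all the signal gives 0.75, the maximum for n = 4.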

Normalized Direction Score (NDS)

When a direction vector (e.g. log2FoldChange) is provided with directional = TRUE:

NDS = (U - D) / max(U, D)

Where:

U = sum of z-scores of up-regulated genes (dir > 0)
D = sum of z-scores of down-regulated genes (dir < 0)

By default, only the top 25% most-perturbed genes (ranked by absolute direction value) are used, controlled by the nds_top_frac parameter. If the top subset has fewer than 10 genes, all genes are used as a fallback.

Range: -1 (all down-regulated) to +1 (all up-regulated).

NDS is a descriptive metric; significance is determined by the DSGE p-value.
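A sketch of the NDS calculation as described above. The function name `nds`, the argument `min_genes`, the rounding of the top fraction (ceiling), and the value returned when no gene has a non-zero direction (0.0) are all assumptions made for illustration.

```python
import math


def nds(z, direction, top_frac=0.25, min_genes=10):
    """Normalized Direction Score from z-scores and a direction vector (e.g. log2FoldChange)."""
    pairs = list(zip(z, direction))
    # Rank genes by absolute direction value, largest first.
    ranked = sorted(pairs, key=lambda zd: abs(zd[1]), reverse=True)
    k = math.ceil(top_frac * len(ranked))  # rounding rule is an assumption
    # Fall back to all genes when the top subset is too small.
    subset = ranked[:k] if k >= min_genes else ranked
    up = sum(zi for zi, d in subset if d > 0)    # U
    down = sum(zi for zi, d in subset if d < 0)  # D
    denom = max(up, down)
    return (up - down) / denom if denom > 0 else 0.0  # 0.0 fallback is an assumption
```

With z-scores `[2.0, 1.0]` and directions `[1.0, -1.0]`, U = 2 and D = 1, so NDS = (2 - 1) / 2 = 0.5.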

License

MIT
