07 Key Concepts
The Disruption Score of Gene Expression is built on per-gene z-scores:
z-score conversion:
z_i = |Φ⁻¹(1 - p_i/2)|
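where p_i is the nominal p-value from differential expression testing and Φ⁻¹ is the inverse of the standard normal CDF (the quantile function). Since 1 - p_i/2 lies in [0.5, 1) for any valid p-value, z_i is non-negative: a two-sided p-value is converted into an unsigned perturbation magnitude that carries no information about the direction of change.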
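As an illustration of the conversion above, here is a minimal Python sketch (not the package's own code). The function name `p_to_z` is hypothetical. It uses the symmetry Φ⁻¹(1 - p/2) = -Φ⁻¹(p/2), which gives the same value but avoids rounding 1 - p/2 to exactly 1 for very small p-values.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_to_z(p):
    """Map a two-sided p-value in (0, 1] to an unsigned z-score.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # By symmetry of the normal distribution,
    # inv_cdf(1 - p/2) == -inv_cdf(p/2); the right-hand form stays
    # accurate when p is tiny.
    return abs(_STD_NORMAL.inv_cdf(p / 2.0))


print(round(p_to_z(0.05), 2))  # 1.96
```

A p-value of exactly 0 has no finite z-score, so the sketch rejects it; how the package treats such inputs is not stated here.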
Pathway DSGE:
DSGE = mean(z_i) for all genes in the pathway
An unweighted mean of absolute z-scores. Higher values indicate stronger transcriptional perturbation in the gene set.
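The pathway score can be sketched in a few lines of Python. The function name `dsge` and the dictionary layout are assumptions made for this example; only genes that appear in both the pathway and the z-score table ("matched" genes) contribute.

```python
from statistics import fmean


def dsge(z_by_gene, pathway_genes):
    """Unweighted mean of z-scores over the matched pathway genes.

    Hypothetical helper: z_by_gene maps gene id -> non-negative z-score.
    Returns None when no pathway gene is present in z_by_gene.
    """
    matched = [z_by_gene[g] for g in pathway_genes if g in z_by_gene]
    if not matched:
        return None
    return fmean(matched)


# Toy data: gene "X" is not in the table, so it is ignored.
z_scores = {"A": 2.0, "B": 1.0, "C": 0.0}
print(dsge(z_scores, ["A", "B", "X"]))  # 1.5
```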
Pathways sharing the same number of matched genes reuse a single null distribution. This reduces computation from K × n_perm to |unique sizes| × n_perm permutations, where K is the number of pathways tested.
For each size group, genes are randomly sampled without replacement from the filtered gene pool, and DSGE is recomputed. This generates an empirical null distribution that accounts for the correlation structure of the gene pool.
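The size-grouped sampling described above might look like the following Python sketch. The function name `null_by_size`, its arguments, and the seeding scheme are assumptions for illustration and do not reflect the package's interface.

```python
import random
from statistics import fmean


def null_by_size(pool_z, pathway_sizes, n_perm=1000, seed=0):
    """Build one null DSGE distribution per distinct pathway size.

    pool_z: z-scores of all genes in the filtered gene pool.
    pathway_sizes: number of matched genes for each pathway.
    Returns a dict mapping size -> list of n_perm null DSGE values.
    """
    rng = random.Random(seed)
    nulls = {}
    for size in sorted(set(pathway_sizes)):
        # rng.sample draws without replacement, as in the text.
        nulls[size] = [
            fmean(rng.sample(pool_z, size)) for _ in range(n_perm)
        ]
    return nulls


pool = [0.1 * i for i in range(50)]
nulls = null_by_size(pool, [5, 5, 10, 5], n_perm=200)
# Four pathways but only two distinct sizes, so two nulls are built.
print(sorted(nulls))  # [5, 10]
```

Each pathway then looks up the null for its own size, so the number of simulated distributions depends on the number of distinct sizes rather than the number of pathways.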
When use_gpd = TRUE (default), observed DSGE values above the gpd_threshold percentile of the null distribution (default 0.99) get p-values from a fitted Generalized Pareto Distribution (GPD) instead of direct counting. This provides higher resolution for extreme observations beyond the range of the empirical null.
A support-constrained adjustment (Peschel et al. 2025, arXiv:2602.22975) is applied when the fitted GPD would otherwise produce p = 0 due to a finite upper bound. This ensures a valid non-zero p-value while minimally deviating from the MLE.
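To make the p = 0 problem concrete, the sketch below evaluates a GPD tail p-value from parameters that are assumed to be already fitted; it does not fit the distribution and does not implement the support-constrained adjustment. The function name `gpd_tail_p` and its arguments are hypothetical. With a negative shape parameter the GPD has a finite upper bound at threshold - scale/shape, and any observation beyond it receives a survival probability of exactly zero.

```python
import math


def gpd_tail_p(observed, threshold, shape, scale, exceed_frac=0.01):
    """Tail p-value from a GPD fitted to null values above `threshold`.

    exceed_frac is the fraction of null values above the threshold
    (0.01 for a 0.99 threshold). Illustrative only: shape and scale
    are taken as given.
    """
    excess = observed - threshold
    if excess < 0:
        raise ValueError("observed must be at or above the threshold")
    if abs(shape) < 1e-12:
        # Shape 0 is the exponential limit of the GPD.
        survival = math.exp(-excess / scale)
    else:
        base = 1.0 + shape * excess / scale
        # base <= 0 only happens for shape < 0, beyond the upper bound.
        survival = 0.0 if base <= 0.0 else base ** (-1.0 / shape)
    return exceed_frac * survival


# shape = -0.5, scale = 1: upper bound at threshold + 2.
print(gpd_tail_p(1.0, 0.0, -0.5, 1.0))  # inside the support
print(gpd_tail_p(3.0, 0.0, -0.5, 1.0))  # beyond the bound: 0.0
```

The second call shows the case the adjustment is meant to handle: the unadjusted fit reports p = 0 for an observation past the fitted upper bound.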
When use_gpd = FALSE, p-values come from the empirical CDF of the null distribution alone (p-values are always >= 1/n_perm).
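A counting-based p-value consistent with the stated lower bound can be sketched as follows. Two details are assumptions rather than documented behaviour: the test is taken to be upper-tailed (larger DSGE is more extreme), and the floor of 1/n_perm is obtained by clamping the count to at least 1. The name `empirical_p` is hypothetical.

```python
def empirical_p(observed, null):
    """Upper-tail empirical p-value with a floor of 1/n_perm.

    Illustrative sketch; the package may use a different estimator.
    """
    n_perm = len(null)
    if n_perm == 0:
        raise ValueError("null distribution is empty")
    # Count null values at least as extreme as the observation.
    n_extreme = sum(1 for x in null if x >= observed)
    # Clamp so that the p-value is never reported as zero.
    return max(n_extreme, 1) / n_perm


null = [0.5, 0.7, 0.9, 1.1]
print(empirical_p(1.0, null))  # 0.25
print(empirical_p(5.0, null))  # 0.25 (floor of 1/n_perm)
```

The second call illustrates the resolution limit: every observation beyond the largest null value gets the same p-value, which is the motivation for the GPD tail fit.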
Default method: Benjamini-Yekutieli (BY), which controls the False Discovery Rate under arbitrary dependence structures. This is more conservative than Benjamini-Hochberg (BH) but appropriate when GO pathways share genes and are not independent.
Alternative methods supported: "holm", "hochberg", "hommel", "bonferroni", "BH", "fdr", "none".
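For readers unfamiliar with the BY correction, this Python sketch shows the standard step-up computation: each sorted p-value is scaled by m · c(m) / rank, where c(m) = 1 + 1/2 + ... + 1/m, and monotonicity is enforced from the largest p-value down. It is an independent illustration with a hypothetical function name, not the package's code.

```python
def by_adjust(pvals):
    """Benjamini-Yekutieli adjusted p-values, in the input order."""
    m = len(pvals)
    if m == 0:
        return []
    # Harmonic-sum penalty that distinguishes BY from BH.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    # Visit p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvals[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


print([round(p, 4) for p in by_adjust([0.01, 0.04, 0.03])])
# [0.055, 0.0733, 0.0733]
```

Dropping the `c_m` factor would give the BH adjustment, which is why BY is always at least as conservative.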
dsge_std = (observed - mean(null)) / sd(null)
Standardises the observed DSGE against its size-matched null distribution, producing a z-score-like metric that is comparable across pathways of different sizes.
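The standardisation is a one-liner; the sketch below spells it out in Python. The function name mirrors the formula but is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
from statistics import fmean, stdev


def dsge_std(observed, null):
    """Standardise an observed DSGE against its size-matched null.

    Returns None when the null has zero spread, since the ratio is
    undefined in that case.
    """
    spread = stdev(null)  # sample standard deviation (assumption)
    if spread == 0:
        return None
    return (observed - fmean(null)) / spread


print(dsge_std(4.0, [1.0, 2.0, 3.0]))  # 2.0
```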
Optional Gini coefficient + Coefficient of Variation (CV) with a two-sided permutation test:
- Low heterogeneity: uniform perturbation across all genes in the pathway (all genes equally affected)
- High heterogeneity: selective targeting of a few key genes (driver vs. passenger distinction)
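The two heterogeneity statistics can be computed from the per-gene z-scores of a pathway as in the sketch below. It covers only the statistics themselves; the permutation test is not shown because its details are not specified here. Function names are hypothetical, and the CV uses the sample standard deviation as an assumption.

```python
from statistics import fmean, stdev


def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: sum_i (2i - n - 1) * x_(i) / (n * sum x)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def coeff_var(values):
    """Coefficient of variation: standard deviation over mean."""
    mean = fmean(values)
    if mean == 0:
        return None
    return stdev(values) / mean


uniform = [1.0, 1.0, 1.0, 1.0]    # every gene equally perturbed
selective = [0.0, 0.0, 0.0, 1.0]  # one gene carries the signal
print(gini(uniform), gini(selective))  # 0.0 0.75
```

Both statistics are zero for a uniformly perturbed pathway and grow as the signal concentrates in fewer genes.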
When a direction vector (e.g. log2FoldChange) is provided with directional = TRUE:
NDS = (U - D) / max(U, D)
Where:
- U = sum of z-scores of up-regulated genes (dir > 0)
- D = sum of z-scores of down-regulated genes (dir < 0)
By default, only the top 25% most-perturbed genes (by absolute direction value) are used, controlled by the nds_top_frac parameter. If the top subset has fewer than 10 genes, all genes are used as fallback.
Range: -1 (all down-regulated) to +1 (all up-regulated).
NDS is a descriptive metric; significance is determined by the DSGE p-value.
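The NDS calculation, including the top-fraction filter and its fallback, can be sketched in Python as follows. Several details are assumptions made for the example: the function and argument names, rounding the subset size up with `ceil`, and returning 0.0 when neither direction has any signal.

```python
import math


def nds(z, direction, top_frac=0.25, min_genes=10):
    """NDS = (U - D) / max(U, D) over the most-perturbed genes.

    z: non-negative z-scores; direction: signed values such as
    log2 fold changes, in the same gene order.
    """
    # Rank genes by absolute direction value, largest first.
    pairs = sorted(zip(z, direction), key=lambda zd: abs(zd[1]), reverse=True)
    n_top = math.ceil(top_frac * len(pairs))  # rounding is an assumption
    # Fall back to all genes when the top subset is too small.
    chosen = pairs[:n_top] if n_top >= min_genes else pairs
    up = sum(zi for zi, d in chosen if d > 0)
    down = sum(zi for zi, d in chosen if d < 0)
    denom = max(up, down)
    if denom == 0:
        return 0.0  # no directional signal (assumption)
    return (up - down) / denom


# Three genes: too few for the top-25% filter, so all are used.
print(nds([2.0, 1.0, 1.0], [1.5, -0.8, 0.3]))  # U = 3, D = 1
```

With 40 genes, the default settings keep exactly the 10 genes with the largest absolute direction values, so a handful of strongly up-regulated genes can give NDS = +1 even when most of the pathway is mildly down-regulated.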
MIT