Implement Diebold–Mariano Test, an evolved T-test

In [Brehmer et al. 2024](https://pubs.geoscienceworld.org/ssa/srl/article-abstract/doi/10.1785/0220240209/650395/Enhancing-the-Statistical-Evaluation-of-Earthquake)[^1], we propose the _[Diebold–Mariano](https://www.tandfonline.com/doi/abs/10.1080/07350015.1995.10524599)[^2] Test_ (DM-test) as an evolution of the T-test. It **revises the variance estimation** (used for standardizing the test statistic) by accounting for correlations between forecasts, leading to a different (adequate) null distribution and hence _p_-values. This improvement is relevant when using the information gain per earthquake (IGPE) for statistical inference (the mean IGPE remains the same).

---

Here is a summary of the similarities and differences between the two tests:

**Similarities**
* both are based on the same test statistic (the information gain, i.e., the spatially and temporally aggregated difference in the Poisson score, as shown in Eq. (15) of our paper)
* both are variants of Student's _t_-test: the T-test employs Student's _t_-distribution, the DM test employs the standard normal distribution[^3]
* both rank forecasts the same way (the mean IGPE is the same)

**Differences**
* CSEP T-test: variance derivation has several theoretical deficiencies (explained in detail on page 19, right column in our paper):
	* ignores grid cells and time periods without target earthquakes
	* ignores spatiotemporal dependencies
	* uses _N_ (target earthquakes) instead of _n_ (number of spatiotemporal cells) 
	* → _p_-value not uniform
	* ⇒ too many rejections / too optimistic (type I error, false positives), as evidenced in our simulations
* DM test: derives a variance estimate that is
	* theoretically valid (see Diebold and Mariano 1995[^2])
	* based on sample autocovariance between the forecasts → accounts for temporal dependencies and related effects
	* → uniform _p_-value (desired behavior under null hypothesis)
	* ⇒ adequate/nominal numbers of rejections, as evidenced in our simulations
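
To make the idea of an autocovariance-based variance estimate concrete, here is a minimal Python sketch. It is not the implementation from the paper or the R repository. The function name `diebold_mariano`, the `max_lag` parameter, the untapered sum over autocovariance lags, and the input layout (one score difference per time period, already summed over grid cells) are all assumptions made for illustration.

```python
import math


def diebold_mariano(score_diff, max_lag=0):
    """Two-sided DM-type test on a series of score differences.

    score_diff: sequence of d_t = score(forecast A) - score(forecast B)
                for each time period t (hypothetical input layout,
                assumed to be aggregated over grid cells already).
    max_lag:    number of autocovariance lags in the variance estimate
                (assumption: plain truncated sum, no tapering).
    Returns (test statistic, two-sided p-value under N(0, 1)).
    """
    d = [float(x) for x in score_diff]
    n = len(d)
    d_bar = sum(d) / n
    centered = [x - d_bar for x in d]

    # Sample autocovariances gamma_0, ..., gamma_max_lag.
    gamma = [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / n
        for k in range(max_lag + 1)
    ]

    # Variance estimate: gamma_0 + 2 * (gamma_1 + ... + gamma_max_lag).
    long_run_var = gamma[0] + 2.0 * sum(gamma[1:])
    if long_run_var <= 0.0:
        raise ValueError("non-positive variance estimate; try another max_lag")

    stat = d_bar / math.sqrt(long_run_var / n)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|stat|)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

With `max_lag=0` the autocovariance terms vanish and the statistic reduces to the mean difference divided by its naive standard error. Increasing `max_lag` adds the autocovariance terms that account for temporal dependence. An untapered sum can become non-positive, which is why the sketch raises an error in that case.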

---

Original code in R: [github.com/jbrehmer42/Earthquakes_Italy](https://github.com/jbrehmer42/Earthquakes_Italy)

I will work on a PR soon.

@pabloitu: It may be worthwhile to add it to [EPIC: Roadmap v1.0 #269](https://github.com/SCECcode/pycsep/issues/269).

[^1]: Brehmer, J. R., K. Kraus, T. Gneiting, M. Herrmann, W. Marzocchi (2025). Enhancing the statistical evaluation of earthquake forecasts – An application to Italy. _Seismological Research Letters_. doi: [10.1785/0220240209](https://doi.org/10.1785/0220240209)
[^2]: Diebold, F. X., & Mariano, R. S. (1995). Comparing Predictive Accuracy. _Journal of Business & Economic Statistics, 13_(3), 253–263. doi: [10.1080/07350015.1995.10524599](https://doi.org/10.1080/07350015.1995.10524599)
[^3]: The standard normal distribution arises as the limit of the (suitably normalized) Student _t_-distribution with _n_ degrees of freedom, as _n_ becomes large.
