Implement Diebold–Mariano Test, an evolved T-test #276

@mherrmann3

In Brehmer et al. (2025) [1], we propose the Diebold–Mariano [2] test (DM test) as an evolution of the T-test. It revises the variance estimate (used to standardize the test statistic) by accounting for correlations between forecasts, which leads to a different, more adequate null distribution and hence different p-values. This improvement matters when the information gain per earthquake (IGPE) is used for statistical inference; the mean IGPE itself remains the same.


Here is a summary of the similarities and differences between the two tests:

Similarities

  • based on the same test statistic (the information gain, i.e., the spatially and temporally aggregated difference in the Poisson score as proved by Eq. (15) in our paper)
  • both are variants of Student's t-test: the T-test employs Student's t-distribution, the DM test employs the standard normal distribution [3]
  • both rank forecasts the same way (mean IGPE is the same)

Differences

  • CSEP T-test: variance derivation has several theoretical deficiencies (explained in detail on page 19, right column in our paper):
    • ignores grid cells and time periods without target earthquakes
    • ignores spatiotemporal dependencies
    • uses N (number of target earthquakes) instead of n (number of spatiotemporal cells)
    • p-values are not uniform under the null hypothesis
    • ⇒ too many rejections / too optimistic (type I error, false positives), as evidenced in our simulations
  • DM test: derives a variance estimate that is
    • theoretically valid (see Diebold and Mariano 1995 [2])
    • based on sample autocovariance between the forecasts → accounts for temporal dependencies and related effects
    • → uniform p-value (desired behavior under null hypothesis)
    • ⇒ adequate/nominal numbers of rejections, as evidenced in our simulations
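
To make the variance idea above concrete, here is a minimal sketch of a DM-type statistic in plain Python. It is an illustration only, not the R code from our repository and not an existing pyCSEP function: the names `autocovariance` and `dm_test` are hypothetical, the input is assumed to be one spatially aggregated score difference per time period, and the truncation lag `max_lag` with a rectangular window is an assumption of this sketch.

```python
import math


def autocovariance(diffs, lag):
    """Sample autocovariance of the score differences at the given lag."""
    n = len(diffs)
    mean = sum(diffs) / n
    return sum(
        (diffs[t] - mean) * (diffs[t - lag] - mean) for t in range(lag, n)
    ) / n


def dm_test(score_diffs, max_lag=0):
    """Return (statistic, two-sided p-value) of a DM-type test.

    score_diffs: one score difference per time period (assumed input format).
    max_lag:     highest autocovariance lag included (assumed tuning choice).
    """
    n = len(score_diffs)
    if n < 2:
        raise ValueError("need at least two time periods")
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < n")

    mean_diff = sum(score_diffs) / n

    # Long-run variance: lag-0 variance plus twice the autocovariances
    # up to max_lag (rectangular window).
    long_run_var = autocovariance(score_diffs, 0) + 2.0 * sum(
        autocovariance(score_diffs, k) for k in range(1, max_lag + 1)
    )
    if long_run_var <= 0:
        # A rectangular window does not guarantee a positive estimate.
        raise ValueError("non-positive variance estimate; try another max_lag")

    statistic = mean_diff / math.sqrt(long_run_var / n)

    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(statistic) / math.sqrt(2.0))
    return statistic, p_value
```

With `max_lag=0` the temporal dependence is ignored; for positively autocorrelated differences, a larger `max_lag` increases the variance estimate and so shrinks the statistic, which is the mechanism that prevents overly optimistic rejections.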

Original code in R: github.com/jbrehmer42/Earthquakes_Italy

I will work on a PR soon.

@pabloitu: It may be worthwhile to add it to EPIC: Roadmap v1.0 #269.

Footnotes

  1. Brehmer, J. R., K. Kraus, T. Gneiting, M. Herrmann, W. Marzocchi (2025). Enhancing the statistical evaluation of earthquake forecasts – An application to Italy. Seismological Research Letters. doi: 10.1785/0220240209

  2. Diebold, F. X., & Mariano, R. S. (1995). Comparing Predictive Accuracy. Journal of Business & Economic Statistics, 13(3), 253–263. doi: 10.1080/07350015.1995.10524599

  3. The standard normal distribution arises as the limit of the (suitably normalized) Student t-distribution with n degrees of freedom, as n becomes large.
