Description
In Brehmer et al. 2024 [1], we propose the Diebold–Mariano [2] test (DM test) as an evolution of the T-test. It revises the variance estimate (used to standardize the test statistic) by accounting for correlations between forecasts, which leads to a different (adequate) null distribution and hence different p-values. This improvement matters when the information gain per earthquake (IGPE) is used for statistical inference; the mean IGPE itself remains the same.
Here is a summary of the similarities and differences between the two tests:
Similarities
- both are based on the same test statistic (the information gain, i.e., the spatially and temporally aggregated difference in the Poisson score, as shown by Eq. (15) in our paper)
- both are variants of Student's t-test: the T-test employs Student's t-distribution, the DM test employs the standard normal distribution [3]
- both rank forecasts the same way (mean IGPE is the same)
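For orientation, the shared statistic can be written out as below. The notation is an assumption made for this illustration and may differ from Eq. (15) in the paper: $x_{t,c}$ is the observed number of target earthquakes in time period $t$ and grid cell $c$, $\lambda^{A}_{t,c}$ and $\lambda^{B}_{t,c}$ are the expected counts of forecasts A and B, and $N = \sum_{t,c} x_{t,c}$ is the total number of target earthquakes.

```math
\mathrm{IGPE}(A, B)
  = \frac{1}{N} \sum_{t=1}^{T} \sum_{c=1}^{C}
    \left[ x_{t,c} \left( \log \lambda^{A}_{t,c} - \log \lambda^{B}_{t,c} \right)
           - \left( \lambda^{A}_{t,c} - \lambda^{B}_{t,c} \right) \right]
```

Each bracketed term is the Poisson log score of B minus that of A for one cell (the $\log x_{t,c}!$ terms cancel), so the sum is the same quantity for both tests; they differ only in how it is standardized.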
Differences
- CSEP T-test: the variance derivation has several theoretical deficiencies (explained in detail on page 19, right column in our paper):
  - ignores grid cells and time periods without target earthquakes
  - ignores spatiotemporal dependencies
  - uses N (number of target earthquakes) instead of n (number of spatiotemporal cells)
  - → p-value not uniform
  - ⇒ too many rejections / too optimistic (type I errors, false positives), as evidenced in our simulations
- DM test: derives a variance estimate that is
  - theoretically valid (see Diebold and Mariano 1995 [2])
  - based on the sample autocovariance between the forecasts → accounts for temporal dependencies and related effects
  - → uniform p-value (desired behavior under the null hypothesis)
  - ⇒ adequate/nominal numbers of rejections, as evidenced in our simulations
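To make the variance estimate concrete, here is a minimal Python sketch of a DM-type test on Poisson score differences. It is not the implementation from the paper or the R repository, and it rests on several assumptions: the function names `poisson_score_diff` and `dm_test`, the `max_lag` truncation parameter, and the choice to aggregate over space first and treat the time periods as the sample are all hypothetical choices made for illustration.

```python
import math

import numpy as np


def poisson_score_diff(rate_a, rate_b, counts):
    """Poisson log-score difference (B minus A) per time period.

    All inputs have shape (n_periods, n_cells). Cells without target
    earthquakes still contribute through the rate terms.
    Positive values favour forecast A.
    """
    rate_a = np.asarray(rate_a, dtype=float)
    rate_b = np.asarray(rate_b, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # score = rate - count * log(rate); the log(count!) term cancels.
    diff = (rate_b - rate_a) - counts * (np.log(rate_b) - np.log(rate_a))
    return diff.sum(axis=1)  # aggregate over space (assumption of this sketch)


def dm_test(d, max_lag=0):
    """Two-sided DM-type test for a zero mean score difference.

    The variance of the mean is estimated from a truncated sum of sample
    autocovariances; `max_lag` is a hypothetical tuning parameter.
    """
    d = np.asarray(d, dtype=float)
    n = d.size
    d_bar = d.mean()
    dc = d - d_bar
    # Sample autocovariances gamma_0, ..., gamma_max_lag.
    gamma = [np.dot(dc[k:], dc[: n - k]) / n for k in range(max_lag + 1)]
    long_run_var = gamma[0] + 2.0 * sum(gamma[1:])
    if long_run_var <= 0:
        raise ValueError("non-positive variance estimate; try another max_lag")
    z = d_bar / math.sqrt(long_run_var / n)
    # Two-sided p-value under the standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Usage with synthetic data (all numbers are made up for illustration).
rng = np.random.default_rng(0)
rate_a = rng.uniform(0.01, 0.1, size=(200, 50))
rate_b = rate_a * rng.uniform(0.8, 1.25, size=(200, 50))
counts = rng.poisson(rate_a)

d = poisson_score_diff(rate_a, rate_b, counts)
z, p_value = dm_test(d, max_lag=2)
igpe = d.sum() / counts.sum()  # mean IGPE: unaffected by the choice of test
print(f"IGPE = {igpe:.4f}, z = {z:.3f}, p = {p_value:.3f}")
```

Note that the sample size entering the variance is the number of time periods, not the number of target earthquakes, and that the ranking quantity `igpe` does not depend on how the variance is estimated.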
Original code in R: github.com/jbrehmer42/Earthquakes_Italy
I will work on a PR soon.
@pabloitu: It may be worthwhile to add it to EPIC: Roadmap v1.0 #269.
Footnotes
1. Brehmer, J. R., K. Kraus, T. Gneiting, M. Herrmann, W. Marzocchi (2025). Enhancing the statistical evaluation of earthquake forecasts – An application to Italy. Seismological Research Letters. doi: 10.1785/0220240209
2. Diebold, F. X., & Mariano, R. S. (1995). Comparing Predictive Accuracy. Journal of Business & Economic Statistics, 13(3), 253–263. doi: 10.1080/07350015.1995.10524599
3. The standard normal distribution arises as the limit of the (suitably normalized) Student t-distribution with n degrees of freedom, as n becomes large.