# Assessing the benefit of growing a decision tree to post-process rainfall

## Introduction

Research question no. 1 (RQ1): Are multiple-WT ecPoint-Rainfall forecasts better than single-WT forecasts?

Research question no. 2 (RQ2): If so, is there a limit on the number of WTs that can contribute to a significant improvement of the ecPoint-Rainfall forecasts?

## Methods

Calibration and discrimination are two key attributes of a probabilistic forecast (Murphy, 1991). Calibration concerns the meaning of the forecast probabilities, i.e. how well they agree with the observed frequencies of the event. Discrimination is the ability to distinguish between events and non-events: it appraises the existence of a signal in the forecast when an event materialises and its absence when it does not. In this study, calibration is verified using the reliability component of the Brier score, while discrimination is verified using the ROC curve and the area under the ROC curve. Both scores are computed for the single-WT and multiple-WT ecPoint-Rainfall forecasts and for the raw ECMWF ENS, which is taken as the baseline.
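As an illustration of the calibration measure, the reliability component of the Brier score can be sketched as below. This is a minimal sketch and not the study's implementation: the function name `brier_reliability` and the choice to group forecasts by their distinct probability value (natural for ensemble-derived probabilities, which take at most M+1 values) are assumptions made here.

```python
from collections import defaultdict


def brier_reliability(probs, obs):
    """Reliability component of the Brier score (0 is perfectly reliable).

    probs: forecast probabilities in [0, 1].
    obs:   matching binary outcomes (1 = event, 0 = non-event).
    Assumption: forecasts are grouped by their distinct probability value.
    """
    if len(probs) != len(obs) or len(probs) == 0:
        raise ValueError("probs and obs must be non-empty and of equal length")

    # Collect the observed outcomes for each issued probability value.
    groups = defaultdict(list)
    for p, o in zip(probs, obs):
        groups[p].append(o)

    # Weighted squared distance between issued probability and observed frequency.
    total = 0.0
    for p, outcomes in groups.items():
        observed_freq = sum(outcomes) / len(outcomes)
        total += len(outcomes) * (p - observed_freq) ** 2
    return total / len(probs)


# Forecasts of 0.5 verify half of the time, so this sample is perfectly reliable.
print(brier_reliability([0.0, 0.0, 0.5, 0.5, 1.0, 1.0], [0, 0, 0, 1, 1, 1]))
```

Smaller values indicate better calibration. With continuous probabilities, the grouping step would have to be replaced by binning.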

### ROC and Area under the ROC (AURC)

The ROC curve plots the hit rate (HR) against the false alarm rate (FAR) of an event for increasing decision thresholds. It is defined by the line joining successive ROC points, where each point corresponds to one decision threshold; as the threshold increases, the points move from the top-right to the bottom-left corner of the unit square. The decision variable is the number of members exceeding the event-threshold (interpreted as a raw probability forecast), so the issued forecast takes values in {0, 1/M, 2/M, ..., M/M = 1} for an ensemble of size M. As a consequence, the resulting ROC curve is based on up to (M+1) points. The ROC curve is then completed by adding the points (0,0) and (1,1). The AURC is estimated as the sum of the areas of the trapeziums formed by connecting the (M+1) ROC points, together with the (0,0) and (1,1) points, with straight lines. This estimate of the AURC is therefore known as the "trapezoidal approximation" (T-AURC). For rare events, the points on the ROC curve tend to cluster towards the lower-left corner of the unit square (Casati et al., 2008). When computing T-AURC, a straight line is drawn between the last meaningful point on the ROC curve and the top-right corner to close the curve, which gives the impression that part of the curve is missing. How much of the curve is missing depends on the lowest probability category, defined here by the ensemble size and the base rate of the event.
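The construction of the ROC points and of T-AURC can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names, the input layout (for each forecast, the number of members exceeding the event-threshold and a 0/1 observation) and the decision rule "warn if at least k members exceed the event-threshold" are choices made here.

```python
def roc_points(member_counts, obs, ens_size):
    """Return the (FAR, HR) points for decision thresholds k = 0..M,
    completed with the corner points (0, 0) and (1, 1).

    member_counts: members exceeding the event-threshold, per forecast.
    obs:           matching binary outcomes (1 = event, 0 = non-event).
    ens_size:      ensemble size M.
    """
    n_events = sum(obs)
    n_non_events = len(obs) - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("need at least one event and one non-event")

    points = []
    for k in range(ens_size + 1):
        # Assumed decision rule: forecast "yes" when at least k members exceed.
        hits = sum(1 for c, o in zip(member_counts, obs) if c >= k and o == 1)
        false_alarms = sum(1 for c, o in zip(member_counts, obs) if c >= k and o == 0)
        points.append((false_alarms / n_non_events, hits / n_events))

    # Complete the curve with the two corners of the unit square.
    points.extend([(0.0, 0.0), (1.0, 1.0)])
    return points


def trapezoidal_aurc(points):
    """Area under the ROC curve from straight lines joining the points (T-AURC)."""
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy sample with a 2-member ensemble and six forecast/observation pairs.
counts = [0, 1, 2, 2, 0, 1]
outcomes = [0, 0, 1, 1, 0, 1]
print(trapezoidal_aurc(roc_points(counts, outcomes, ens_size=2)))
```

Duplicate points (for example the corner (1,1), which the k = 0 threshold already produces) contribute zero area, so they do not need to be removed before integrating.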

In order to draw a "full" ROC curve, one can fit the ROC points with one of the models proposed in the literature, e.g. the binormal model (Harvey et al., 1992; Wilson, 2000; Atger, 2004), the more recent two-parameter beta family (Gneiting and Vogel, 2021), or the method that uses the mean of the probabilistic forecasts to complete the ROC curve (Bouallegue and Richardson, 2021). In this study, the binormal model is used. It is based on the assumption that HR and FAR are integrals of a standard normal (Gaussian) distribution, and the AURC (denoted here as Z-AURC) is computed using equations (2) and (3) in Harvey et al. (1992). When applied to ensemble-derived probability forecasts for rare events, this approach effectively amounts to an extrapolation to a hypothetical continuous decision variable, based on the limited set of decision thresholds that can actually be assessed (Bouallegue and Richardson, 2021). Because such a decision variable may not be achievable in practice, Z-AURC is sometimes considered a measure of the potential discrimination ability that could be achieved with an "unlimited ensemble size" (Bowler et al., 2006).
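One common way to obtain a binormal area is sketched below. It is an illustration under stated assumptions and does not reproduce equations (2) and (3) of Harvey et al. (1992): here the (FAR, HR) points are transformed to standard normal deviates, a straight line z(HR) = a + b z(FAR) is fitted by ordinary least squares (an assumed fitting choice), and the area is taken as Φ(a / sqrt(1 + b²)). The function name `binormal_aurc` is hypothetical.

```python
from statistics import NormalDist


def binormal_aurc(points):
    """Area under a binormal ROC curve fitted to (FAR, HR) points.

    Assumption: the line z(HR) = a + b * z(FAR) is fitted by ordinary
    least squares in normal-deviate space.
    """
    std_normal = NormalDist()

    # Points on the edges of the unit square have infinite deviates: drop them.
    interior = sorted({(far, hr) for far, hr in points if 0 < far < 1 and 0 < hr < 1})
    if len(interior) < 2:
        raise ValueError("need at least two distinct interior ROC points")

    zx = [std_normal.inv_cdf(far) for far, _ in interior]
    zy = [std_normal.inv_cdf(hr) for _, hr in interior]

    n = len(interior)
    mean_x = sum(zx) / n
    mean_y = sum(zy) / n
    sxx = sum((x - mean_x) ** 2 for x in zx)
    if sxx == 0:
        raise ValueError("ROC points must have at least two distinct FAR values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(zx, zy))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return std_normal.cdf(intercept / (1.0 + slope ** 2) ** 0.5)


# Synthetic points lying exactly on a binormal curve with a = 1 and b = 1.
nd = NormalDist()
synthetic = [(far, nd.cdf(1.0 + nd.inv_cdf(far))) for far in (0.1, 0.3, 0.5)]
print(binormal_aurc(synthetic))
```

For rare events only a few ROC points may lie strictly inside the unit square, so the fitted line, and hence the area, can rest on very little data; this is the extrapolation referred to above.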

The T-AURC and Z-AURC summary metrics can give very different comparative results. T-AURC is typically smaller for rare events, because the ROC points tend to cluster in the bottom-left corner of the unit square. T-AURC statistics therefore point towards larger predictive skill for probability forecasts of low event-thresholds, with the consequence of suggesting that users would benefit more in practice from using the lower event-thresholds. The choice between the two metrics depends on the research question at hand, and in particular on whether the practical usefulness or the intrinsic information content of the ensemble forecast is the key aspect to be assessed. In this study, both aspects are of interest. Therefore, both T-AURC and Z-AURC are estimated to assess the "real" and the "potential" discrimination ability of the single-WT and multiple-WT ecPoint-Rainfall forecasts.

## References

Atger, F., 2004: Estimation of the reliability of ensemble-based probabilistic forecasts. Quart. J. Roy. Meteor. Soc., 130, 627–646, doi:10.1256/qj.03.23.

Ben Bouallegue, Z., and D. S. Richardson, 2021: On the ROC area of ensemble forecasts for rare events. Preprints.

Bowler, N. E., C. E. Pierce, and A. W. Seed, 2006: STEPS: A probabilistic precipitation forecasting scheme which merges an extrapolation nowcast with downscaled NWP. Quart. J. Roy. Meteor. Soc., 132 (620), 2127–2155, doi:10.1256/qj.04.100.

Casati, B., and Coauthors, 2008: Forecast verification: current status and future directions. Met. Apps, 15 (1), 3–18, doi:10.1002/met.52.

Gneiting, T., and P. Vogel, 2021: Receiver operating characteristic (ROC) curves: equivalences, beta model, and minimum distance estimation. Mach. Learn., 1–13.

Harvey, L. O., J. K. Hammond, C. Lusk, and E. Mross, 1992: The application of signal detection theory to weather forecasting behavior. Mon. Wea. Rev., 120, 863–883, doi:10.1175/1520-0493(1992)120<0863:TAOSDT>2.0.CO;2.

Murphy, A. H., 1991: Forecast verification: its complexity and dimensionality. Mon. Wea. Rev., 119, 1590–1601, doi:10.1175/1520-0493(1991)119<1590:FVICAD>2.0.CO;2.

Wilson, L. J., 2000: Comments on “Probabilistic Predictions of Precipitation Using the ECMWF Ensemble Prediction System”. Wea. Forecasting, 15 (3), 361–364, doi:10.1175/1520-0434(2000)015<0361:COPPOP>2.0.CO;2.