Metrics: make CRPS handle obs outside the fx support (#781)
* tests for CRPS with obs outside forecast support (WIP)

* CRPS: handle obs outside forecast support

Revise the CRPS calculation to handle the case where the observation is
outside the forecast support. For example, Prob(Power <= 10 MW) = 0% but
the observation is 9 MW, or Prob(Power <= 30 MW) = 100% but the
observation is 31 MW. This required some subtle changes to how the
vectorized calculation is performed. Also, this commit switches the
integration from the rectangular rule to a quadrature rule, which seems
to result in more accurate CRPS calculations when the number of forecast
CDF intervals is low (e.g., 10 or less).

This commit also updates the docstring of the CRPS function and the
tests, including comparisons against examples where the forecast is
defined by a continuous parametric distribution that allows calculating
the CRPS analytically.

Note: this branch still needs to validate the CRPS skill score
calculation and related tests. Also, it would be good to include some
"simpler" CRPS calculation examples (e.g., with 3 or 5 CDF intervals),
though that may not be practical. A sketch of the support-extension step
follows below.
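
To make the idea concrete, here is a minimal sketch of the
support-extension step, mirroring the np.hstack logic in the
probabilistic.py diff below (the example arrays are illustrative, not
taken from the test suite):

    import numpy as np

    # Discretized forecast CDF (n = 1 sample, d = 3 thresholds) and an
    # observation that falls above the forecast support.
    fx = np.array([[10.0, 20.0, 30.0]])       # thresholds [MW]
    fx_prob = np.array([[0.0, 50.0, 100.0]])  # CDF values [%]
    obs = np.array([31.0])

    n = fx.shape[0]

    # extend the CDF min: probability 0% at min(obs, first threshold)
    fx_min = np.minimum(obs, fx[:, 0])
    fx = np.hstack([fx_min[:, np.newaxis], fx])
    fx_prob = np.hstack([np.zeros((n, 1)), fx_prob])

    # extend the CDF max: probability 100% at max(obs, last threshold)
    fx_max = np.maximum(obs, fx[:, -1])
    fx = np.hstack([fx, fx_max[:, np.newaxis]])
    fx_prob = np.hstack([fx_prob, np.full((n, 1), 100.0)])

    print(fx)       # [[10. 10. 20. 30. 31.]]
    print(fx_prob)  # [[  0.   0.  50. 100. 100.]]

The revised metric additionally zeroes the indicator at the appended
right endpoint when the observation exceeds the original support (the
o[idx, -1] = 0.0 line in the diff), so the mismatch over
max fx < x < obs contributes to the integral.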

* CRPS: simplify integration using np.trapz

Simplify the integration using the numpy trapezoidal function. The
result is identical to the prior code (since it performs the same
operations under the hood), but np.trapz() makes it clearer which method
is being used and lets users consult the numpy documentation for details
on the integration method. A quick equivalence check is sketched below.
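
As a quick sanity check (my own sketch, not part of the commit), the
manual trapezoid sum and np.trapz agree on an arbitrary integrand:

    import numpy as np

    # hypothetical integrand (f - o)^2 on a padded CDF grid, one sample
    x = np.array([[10.0, 10.0, 20.0, 30.0, 31.0]])
    y = np.array([[0.0, 0.0, 0.25, 1.0, 1.0]])

    # manual trapezoidal rule: sum of 0.5 * (y_left + y_right) * dx
    dx = np.diff(x, axis=1)
    manual = np.sum(0.5 * (y[:, :-1] + y[:, 1:]) * dx, axis=1)

    # numpy's built-in trapezoidal rule
    builtin = np.trapz(y, x=x, axis=1)

    assert np.allclose(manual, builtin)  # same operations under the hood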

* Add tests for CRPS and CRPSS (skill score)

Add additional tests for the CRPS and CRPSS (CRPS skill score)
functions, including "simple" examples with 3 CDF intervals to help show
the logic of the trapezoidal integration; one such case is worked
through below.
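
For instance, here is one hand-checkable 3-interval case (my own
example; the test suite's values may differ):

    import numpy as np
    from solarforecastarbiter.metrics import probabilistic

    # Forecast CDF: Prob(x <= 10) = 0%, Prob(x <= 20) = 50%,
    # Prob(x <= 30) = 100%; observation = 20 MW.
    fx = np.array([[10.0, 20.0, 30.0]])
    fx_prob = np.array([[0.0, 50.0, 100.0]])
    obs = np.array([20.0])

    # By hand, the integrand (F - O)^2 on the padded grid
    # [10, 10, 20, 30, 30] is [0, 0, 0.25, 0, 0], so the two nonzero
    # trapezoids give 0.5*(0 + 0.25)*10 + 0.5*(0.25 + 0)*10 = 2.5.
    crps = probabilistic.continuous_ranked_probability_score(
        obs, fx, fx_prob)
    print(crps)  # 2.5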

* CRPS: allow numpy broadcasting with n >= 2

Modify how the forecast CDF is expanded and how the observation
indicator function is calculated to support numpy broadcasting when the
number of samples is greater than or equal to two (i.e., n >= 2); see
the broadcasting sketch below.
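
A small sketch of the broadcast comparison (array values are
illustrative):

    import numpy as np

    # n = 2 samples, padded to d + 2 = 4 thresholds each
    fx = np.array([[8.0, 10.0, 20.0, 20.0],
                   [10.0, 10.0, 20.0, 20.0]])
    obs = np.array([8.0, 12.0])

    # obs[:, np.newaxis] has shape (2, 1), so numpy broadcasts it
    # against the (2, 4) threshold array row by row
    o = np.where(fx >= obs[:, np.newaxis], 1.0, 0.0)
    print(o)
    # [[1. 1. 1. 1.]
    #  [0. 0. 1. 1.]]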

* tests/probabilistic: use np.array for obs

Revert to assuming observations are provided as numpy arrays, even in
the case where the observation is for a single sample. This matches the
previous logic and helps prevent issues in other parts of the code base;
a one-line illustration follows.
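
In other words (a minimal illustration of the convention, not code from
the commit):

    import numpy as np

    obs = np.array([8.0])  # single sample, still a 1D array
    # obs = 8.0            # a bare scalar would break obs[:, np.newaxis]
    print(obs[:, np.newaxis].shape)  # (1, 1)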

* tests/calculator: update CRPS examples

Note that since the CRPS calculation now uses a more generalized
numerical integration setup, some of the expected CRPS values had to be
adjusted: the "simple" CRPS examples in the calculator tests are not
very realistic (e.g., only 3 forecast CDF intervals). Rather than
complicate the examples in these tests, I instead corrected the expected
CRPS values for the given examples.

Also, this commit corrects a misspelling of the Brier score name in a
code comment.

* Add whatsnew/1.0.13.rst with CRPS revision

* whatsnew: correct name

* Address reviewer comments and suggested revisions.
dplarson committed Feb 11, 2022
1 parent 881fbab commit d4d85a3
Showing 4 changed files with 259 additions and 154 deletions.
18 changes: 18 additions & 0 deletions docs/source/whatsnew/1.0.13.rst
@@ -0,0 +1,18 @@
.. _whatsnew_1013:

.. py:currentmodule:: solarforecastarbiter
1.0.13 (February 10, 2022)
--------------------------

Fixed
~~~~~~~~~~~~
* Revised CRPS metric to handle observations outside the forecast support. (:pull:`781`, :issue:`779`)

Contributors
~~~~~~~~~~~~

* Will Holmgren (:ghuser:`wholmgren`)
* Leland Boeman (:ghuser:`lboeman`)
* David Larson (:ghuser:`dplarson`)
88 changes: 58 additions & 30 deletions solarforecastarbiter/metrics/probabilistic.py
@@ -468,15 +468,19 @@ def sharpness(fx_lower, fx_upper):
def continuous_ranked_probability_score(obs, fx, fx_prob):
"""Continuous Ranked Probability Score (CRPS).
CRPS = 1/n sum_{i=1}^n int (F_i - O_i)^2 dx
where F_i is the CDF of the forecast at time i and O_i is the CDF
associated with the observed value obs_i:
.. math::
O_{i, j} = 1 if obs_i <= fx_{i, j}, else O_{i, j} = 0
\\text{CRPS} = \\frac{1}{n} \\sum_{i=1}^n \\int_{-\\infty}^{\\infty}
(F_i(x) - \\mathbf{1} \\{x \\geq y_i \\})^2 dx
where obs_i is the observation at time i, and fx_{i, j} is the forecast at
time i for CDF interval j.
where :math:`F_i(x)` is the CDF of the forecast at time :math:`i`,
:math:`y_i` is the observation at time :math:`i`, and :math:`\\mathbf{1}`
is the indicator function that transforms the observation into a step
function (1 if :math:`x \\geq y`, 0 if :math:`x < y`). In other words, the
CRPS measures the difference between the forecast CDF and the empirical CDF
of the observation. The CRPS has the same units as the observation. Lower
CRPS values indicate more accurate forecasts, where a CRPS of 0 indicates a
perfect forecast. [1]_ [2]_ [3]_
Parameters
----------
@@ -492,7 +496,8 @@ def continuous_ranked_probability_score(obs, fx, fx_prob):
Returns
-------
crps : float
The Continuous Ranked Probability Score [unitless].
The Continuous Ranked Probability Score, with the same units as the
observation.
Raises
------
@@ -502,22 +507,27 @@ def continuous_ranked_probability_score(obs, fx, fx_prob):
array with d values or b) the forecasts are given as 2D arrays (n,d)
but do not contain at least 2 CDF intervals (i.e. d < 2).
Examples
--------
Forecast probabilities of <= 10 MW and <= 20 MW:
>>> fx = np.array([[10, 20], [10, 20]])
>>> fx_prob = np.array([[30, 50], [65, 100]])
>>> obs = np.array([8, 12])
>>> continuous_ranked_probability_score(obs, fx, fx_prob)
4.5625
Notes
-----
The CRPS can be calculated analytically when the forecast CDF is of a
continuous parametric distribution, e.g., Gaussian distribution. However,
since the Solar Forecast Arbiter makes no assumptions regarding how a
probabilistic forecast was generated, the CRPS is instead calculated using
numerical integration of the discretized forecast CDF. Therefore, the
accuracy of the CRPS calculation is limited by the precision of the
forecast CDF. In practice, this means the forecast CDF should 1) consist of
at least 10 intervals and 2) cover probabilities from 0% to 100%.
Forecast thresholds for constant probabilities (25%, 75%):
>>> fx = np.array([[5, 15], [8, 14]])
>>> fx_prob = np.array([[25, 75], [25, 75]])
>>> obs = np.array([8, 10])
>>> continuous_ranked_probability_score(obs, fx, fx_prob)
0.5
References
----------
.. [1] Matheson and Winkler (1976) "Scoring rules for continuous
probability distributions." Management Science, vol. 22, pp.
1087-1096. doi: 10.1287/mnsc.22.10.1087
.. [2] Hersbach (2000) "Decomposition of the continuous ranked probability
score for ensemble prediction systems." Weather Forecast, vol. 15,
pp. 559-570. doi: 10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2
.. [3] Wilks (2019) "Statistical Methods in the Atmospheric Sciences", 4th
ed. Oxford; Waltham, MA; Academic Press.
"""

@@ -528,19 +538,37 @@ def continuous_ranked_probability_score(obs, fx, fx_prob):
elif np.shape(fx)[1] < 2:
raise ValueError("forecasts must have d >= 2 CDF intervals "
f"(expected >= 2, got {np.shape(fx)[1]})")
else:
obs = np.tile(obs, (fx.shape[1], 1)).T

# event: 0=did not happen, 1=did happen
o = np.where(obs <= fx, 1.0, 0.0)
n = len(fx)

# extend CDF min to ensure obs within forecast support
# fx.shape = (n, d) ==> (n, d + 1)
fx_min = np.minimum(obs, fx[:, 0])
fx = np.hstack([fx_min[:, np.newaxis], fx])
fx_prob = np.hstack([np.zeros([n, 1]), fx_prob])

# extend CDF max to ensure obs within forecast support
# fx.shape = (n, d + 1) ==> (n, d + 2)
idx = (fx[:, -1] < obs)
fx_max = np.maximum(obs, fx[:, -1])
fx = np.hstack([fx, fx_max[:, np.newaxis]])
fx_prob = np.hstack([fx_prob, np.full([n, 1], 100)])

# indicator function:
# - left of the obs is 0.0
# - obs and right of the obs is 1.0
o = np.where(fx >= obs[:, np.newaxis], 1.0, 0.0)

# correct behavior when obs > max fx:
# - should be 0 over range: max fx < x < obs
o[idx, -1] = 0.0

# forecast probabilities [unitless]
f = fx_prob / 100.0

# integrate along each sample, then average all samples
integrand = (f - o) ** 2
dx = np.diff(fx, axis=1)
crps = np.mean(np.sum(integrand[:, :-1] * dx, axis=1))
crps = np.mean(np.trapz((f - o) ** 2, x=fx, axis=1))

return crps


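As context for the revised metric above: the commit message mentions
comparing against a forecast from a continuous parametric distribution,
for which the CRPS has a closed form. Here is a sketch of such a check
for a Gaussian forecast (my own example; the helper crps_gaussian is not
part of the library), using the well-known closed form
CRPS = sigma * (z * (2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi)) with
z = (y - mu) / sigma:

    import numpy as np
    from scipy import stats
    from solarforecastarbiter.metrics import probabilistic

    def crps_gaussian(y, mu, sigma):
        # analytic CRPS of a Gaussian forecast (Gneiting et al. 2005)
        z = (y - mu) / sigma
        return sigma * (z * (2 * stats.norm.cdf(z) - 1)
                        + 2 * stats.norm.pdf(z) - 1 / np.sqrt(np.pi))

    mu, sigma, y = 20.0, 5.0, 18.0

    # discretize the Gaussian CDF on a fine threshold grid
    fx = np.linspace(mu - 5 * sigma, mu + 5 * sigma, 1001)[np.newaxis, :]
    fx_prob = 100 * stats.norm.cdf(fx, loc=mu, scale=sigma)
    obs = np.array([y])

    numeric = probabilistic.continuous_ranked_probability_score(
        obs, fx, fx_prob)
    print(numeric, crps_gaussian(y, mu, sigma))  # should agree closely
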
18 changes: 9 additions & 9 deletions solarforecastarbiter/metrics/tests/test_calculator.py
@@ -361,12 +361,12 @@ def test_calculate_metrics_with_probablistic(single_observation,
probabilistic._REQ_DIST)

expected = {
0: ('total', 'crps', '0', 17.247),
2: ('year', 'crps', '2019', 17.247),
4: ('season', 'crps', 'JJA', 17.247),
6: ('month', 'crps', 'Aug', 17.247),
8: ('hour', 'crps', '0', 19.801000000000002),
9: ('hour', 'crps', '1', 19.405)
0: ('total', 'crps', '0', 21.41819),
2: ('year', 'crps', '2019', 21.41819),
4: ('season', 'crps', 'JJA', 21.41819),
6: ('month', 'crps', 'Aug', 21.41819),
8: ('hour', 'crps', '0', 28.103),
9: ('hour', 'crps', '1', 26.634375),
}
attr_order = ('category', 'metric', 'index', 'value')
for k, expected_attrs in expected.items():
@@ -1006,7 +1006,7 @@ def test_apply_deterministic_bad_metric_func():
('bs', [], [], [], None, None, np.NaN),
('bs', [1, 1, 1], [100, 100, 100], [1, 1, 1], None, None, 0.),
# Briar Skill Score with no reference
# Brier Skill Score with no reference
('bss', [1, 1, 1], [100, 100, 100], [0, 0, 0],
None, None, 1.),
('bss', [1, 1, 1], [100, 100, 100], [1, 1, 1],
Expand All @@ -1017,11 +1017,11 @@ def test_apply_deterministic_bad_metric_func():
('unc', [1, 1, 1], [100, 100, 100], [1, 1, 1], None, None, 0.),
# CRPS single forecast
('crps', [[1, 1]], [[100, 100]], [[0, 0]], None, None, 0.),
('crps', [[1, 1]], [[100, 100]], [0], None, None, 0.5),
# CRPS multiple forecasts
('crps', [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
[[100, 100, 100], [100, 100, 100], [100, 100, 100]],
[0, 0, 0], None, None, 0.)
[0, 0, 0], None, None, 1.)
])
def test_apply_probabilistic_metric_func(metric, fx, fx_prob, obs,
ref_fx, ref_fx_prob, expect,
