Metrics: make CRPS handle obs outside the fx support (#781)
* tests for CRPS with obs outside forecast support (WIP)

* CRPS: handle obs outside forecast support

Revise the CRPS calculation to handle the case where the observation is
outside the forecast support. For example, Prob(Power <= 10 MW) = 0% but
the observation is 9 MW, or Prob(Power <= 30 MW) = 100% but the
observation is 31 MW. This required some subtle changes to how the
vectorized calculation is performed. Also, this commit switches the
integration from the rectangular rule to a quadrature rule, which seems
to result in more accurate CRPS calculations when the number of forecast
CDF intervals is low (e.g., 10 or less).

This commit also updates the docstring of the CRPS function and the
tests, including comparisons against examples where the forecast is
defined by a continuous parametric distribution that allows calculating
the CRPS analytically.

Note: this branch still needs to validate the CRPS skill score
calculation and related tests. Also, it would be good to include some
"simpler" CRPS calculation examples (e.g., with 3 or 5 CDF intervals),
though that may not be practical. A sketch of the support-extension step
follows below.
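
To make the idea concrete, here is a minimal sketch of the
support-extension step, mirroring the np.hstack logic in the
probabilistic.py diff below (the example arrays are illustrative, not
taken from the test suite):

    import numpy as np

    # Discretized forecast CDF (n = 1 sample, d = 3 thresholds) and an
    # observation that falls above the forecast support.
    fx = np.array([[10.0, 20.0, 30.0]])       # thresholds [MW]
    fx_prob = np.array([[0.0, 50.0, 100.0]])  # CDF values [%]
    obs = np.array([31.0])

    n = fx.shape[0]

    # extend the CDF min: probability 0% at min(obs, first threshold)
    fx_min = np.minimum(obs, fx[:, 0])
    fx = np.hstack([fx_min[:, np.newaxis], fx])
    fx_prob = np.hstack([np.zeros((n, 1)), fx_prob])

    # extend the CDF max: probability 100% at max(obs, last threshold)
    fx_max = np.maximum(obs, fx[:, -1])
    fx = np.hstack([fx, fx_max[:, np.newaxis]])
    fx_prob = np.hstack([fx_prob, np.full((n, 1), 100.0)])

    print(fx)       # [[10. 10. 20. 30. 31.]]
    print(fx_prob)  # [[  0.   0.  50. 100. 100.]]

The revised metric additionally zeroes the indicator at the appended
right endpoint when the observation exceeds the original support (the
o[idx, -1] = 0.0 line in the diff), so the mismatch over
max fx < x < obs contributes to the integral.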

* CRPS: simplify integration using np.trapz

Simplify the integration using the numpy trapezoidal function. The
result is identical to the prior code (since it performs the same
operations under the hood), but np.trapz() makes it clearer which method
is being used and lets users consult the numpy documentation for details
on the integration method. A quick equivalence check is sketched below.
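
As a quick sanity check (my own sketch, not part of the commit), the
manual trapezoid sum and np.trapz agree on an arbitrary integrand:

    import numpy as np

    # hypothetical integrand (f - o)^2 on a padded CDF grid, one sample
    x = np.array([[10.0, 10.0, 20.0, 30.0, 31.0]])
    y = np.array([[0.0, 0.0, 0.25, 1.0, 1.0]])

    # manual trapezoidal rule: sum of 0.5 * (y_left + y_right) * dx
    dx = np.diff(x, axis=1)
    manual = np.sum(0.5 * (y[:, :-1] + y[:, 1:]) * dx, axis=1)

    # numpy's built-in trapezoidal rule
    builtin = np.trapz(y, x=x, axis=1)

    assert np.allclose(manual, builtin)  # same operations under the hood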

* Add tests for CRPS and CRPSS (skill score)

Add additional tests for the CRPS and CRPSS (CRPS skill score)
functions, including "simple" examples with 3 CDF intervals to help show
the logic of the trapezoidal integration; one such case is worked
through below.
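
For instance, here is one hand-checkable 3-interval case (my own
example; the test suite's values may differ):

    import numpy as np
    from solarforecastarbiter.metrics import probabilistic

    # Forecast CDF: Prob(x <= 10) = 0%, Prob(x <= 20) = 50%,
    # Prob(x <= 30) = 100%; observation = 20 MW.
    fx = np.array([[10.0, 20.0, 30.0]])
    fx_prob = np.array([[0.0, 50.0, 100.0]])
    obs = np.array([20.0])

    # By hand, the integrand (F - O)^2 on the padded grid
    # [10, 10, 20, 30, 30] is [0, 0, 0.25, 0, 0], so the two nonzero
    # trapezoids give 0.5*(0 + 0.25)*10 + 0.5*(0.25 + 0)*10 = 2.5.
    crps = probabilistic.continuous_ranked_probability_score(
        obs, fx, fx_prob)
    print(crps)  # 2.5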

* CRPS: allow numpy broadcasting with n >= 2

Modify how the forecast CDF is expanded and how the observation
indicator function is calculated to support numpy broadcasting when the
number of samples is greater than or equal to two (i.e., n >= 2); see
the broadcasting sketch below.
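
A small sketch of the broadcast comparison (array values are
illustrative):

    import numpy as np

    # n = 2 samples, padded to d + 2 = 4 thresholds each
    fx = np.array([[8.0, 10.0, 20.0, 20.0],
                   [10.0, 10.0, 20.0, 20.0]])
    obs = np.array([8.0, 12.0])

    # obs[:, np.newaxis] has shape (2, 1), so numpy broadcasts it
    # against the (2, 4) threshold array row by row
    o = np.where(fx >= obs[:, np.newaxis], 1.0, 0.0)
    print(o)
    # [[1. 1. 1. 1.]
    #  [0. 0. 1. 1.]]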

* tests/probabilistic: use np.array for obs

Revert to assuming observations are provided as numpy arrays, even in
the case where the observation is for a single sample. This matches the
previous logic and helps prevent issues in other parts of the code base;
a one-line illustration follows.
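
In other words (a minimal illustration of the convention, not code from
the commit):

    import numpy as np

    obs = np.array([8.0])  # single sample, still a 1D array
    # obs = 8.0            # a bare scalar would break obs[:, np.newaxis]
    print(obs[:, np.newaxis].shape)  # (1, 1)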

* tests/calculator: update CRPS examples

Note that since the CRPS calculation now uses a more generalized
numerical integration setup, some of the expected CRPS values had to be
adjusted: the "simple" CRPS examples in the calculator tests are not
very realistic (e.g., only 3 forecast CDF intervals). Rather than
complicate the examples in these tests, I instead corrected the expected
CRPS values for the given examples.

Also, this commit corrects a misspelling of the Brier score name in a
code comment.

* Add whatsnew/1.0.13.rst with CRPS revision

* whatsnew: correct name

* Address reviewer comments and suggested revisions.
dplarson committed Feb 11, 2022
1 parent 881fbab commit d4d85a3
Showing 4 changed files with 259 additions and 154 deletions.
18 changes: 18 additions & 0 deletions docs/source/whatsnew/1.0.13.rst
@@ -0,0 +1,18 @@
.. _whatsnew_1013:

.. py:currentmodule:: solarforecastarbiter
1.0.13 (February 10, 2022)
--------------------------

Fixed
~~~~~~~~~~~~
* Revised CRPS metric to handle observations outside the forecast support. (:pull:`781`, :issue:`779`)

Contributors
~~~~~~~~~~~~

* Will Holmgren (:ghuser:`wholmgren`)
* Leland Boeman (:ghuser:`lboeman`)
* David Larson (:ghuser:`dplarson`)
88 changes: 58 additions & 30 deletions solarforecastarbiter/metrics/probabilistic.py
@@ -468,15 +468,19 @@ def sharpness(fx_lower, fx_upper):
def continuous_ranked_probability_score(obs, fx, fx_prob):
"""Continuous Ranked Probability Score (CRPS).
CRPS = 1/n sum_{i=1}^n int (F_i - O_i)^2 dx
where F_i is the CDF of the forecast at time i and O_i is the CDF
associated with the observed value obs_i:
.. math::
O_{i, j} = 1 if obs_i <= fx_{i, j}, else O_{i, j} = 0
\\text{CRPS} = \\frac{1}{n} \\sum_{i=1}^n \\int_{-\\infty}^{\\infty}
(F_i(x) - \\mathbf{1} \\{x \\geq y_i \\})^2 dx
where obs_i is the observation at time i, and fx_{i, j} is the forecast at
time i for CDF interval j.
where :math:`F_i(x)` is the CDF of the forecast at time :math:`i`,
:math:`y_i` is the observation at time :math:`i`, and :math:`\\mathbf{1}`
is the indicator function that transforms the observation into a step
function (1 if :math:`x \\geq y`, 0 if :math:`x < y`). In other words, the
CRPS measures the difference between the forecast CDF and the empirical CDF
of the observation. The CRPS has the same units as the observation. Lower
CRPS values indicate more accurate forecasts, where a CRPS of 0 indicates a
perfect forecast. [1]_ [2]_ [3]_
Parameters
----------
@@ -492,7 +496,8 @@ def continuous_ranked_probability_score(obs, fx, fx_prob):
Returns
-------
crps : float
The Continuous Ranked Probability Score [unitless].
The Continuous Ranked Probability Score, with the same units as the
observation.
Raises
------
@@ -502,22 +507,27 @@ def continuous_ranked_probability_score(obs, fx, fx_prob):
array with d values or b) the forecasts are given as 2D arrays (n,d)
but do not contain at least 2 CDF intervals (i.e. d < 2).
Examples
--------
Forecast probabilities of <= 10 MW and <= 20 MW:
>>> fx = np.array([[10, 20], [10, 20]])
>>> fx_prob = np.array([[30, 50], [65, 100]])
>>> obs = np.array([8, 12])
>>> continuous_ranked_probability_score(obs, fx, fx_prob)
4.5625
Notes
-----
The CRPS can be calculated analytically when the forecast CDF is of a
continuous parametric distribution, e.g., Gaussian distribution. However,
since the Solar Forecast Arbiter makes no assumptions regarding how a
probabilistic forecast was generated, the CRPS is instead calculated using
numerical integration of the discretized forecast CDF. Therefore, the
accuracy of the CRPS calculation is limited by the precision of the
forecast CDF. In practice, this means the forecast CDF should 1) consist of
at least 10 intervals and 2) cover probabilities from 0% to 100%.
Forecast thresholds for constant probabilities (25%, 75%):
>>> fx = np.array([[5, 15], [8, 14]])
>>> fx_prob = np.array([[25, 75], [25, 75]])
>>> obs = np.array([8, 10])
>>> continuous_ranked_probability_score(obs, fx, fx_prob)
0.5
References
----------
.. [1] Matheson and Winkler (1976) "Scoring rules for continuous
probability distributions." Management Science, vol. 22, pp.
1087-1096. doi: 10.1287/mnsc.22.10.1087
.. [2] Hersbach (2000) "Decomposition of the continuous ranked probability
score for ensemble prediction systems." Weather Forecast, vol. 15,
pp. 559-570. doi: 10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2
.. [3] Wilks (2019) "Statistical Methods in the Atmospheric Sciences", 4th
ed. Oxford; Waltham, MA; Academic Press.
"""

@@ -528,19 +538,37 @@ def continuous_ranked_probability_score(obs, fx, fx_prob):
elif np.shape(fx)[1] < 2:
raise ValueError("forecasts must have d >= 2 CDF intervals "
f"(expected >= 2, got {np.shape(fx)[1]})")
else:
obs = np.tile(obs, (fx.shape[1], 1)).T

# event: 0=did not happen, 1=did happen
o = np.where(obs <= fx, 1.0, 0.0)
n = len(fx)

# extend CDF min to ensure obs within forecast support
# fx.shape = (n, d) ==> (n, d + 1)
fx_min = np.minimum(obs, fx[:, 0])
fx = np.hstack([fx_min[:, np.newaxis], fx])
fx_prob = np.hstack([np.zeros([n, 1]), fx_prob])

# extend CDF max to ensure obs within forecast support
# fx.shape = (n, d + 1) ==> (n, d + 2)
idx = (fx[:, -1] < obs)
fx_max = np.maximum(obs, fx[:, -1])
fx = np.hstack([fx, fx_max[:, np.newaxis]])
fx_prob = np.hstack([fx_prob, np.full([n, 1], 100)])

# indicator function:
# - left of the obs is 0.0
# - obs and right of the obs is 1.0
o = np.where(fx >= obs[:, np.newaxis], 1.0, 0.0)

# correct behavior when obs > max fx:
# - should be 0 over range: max fx < x < obs
o[idx, -1] = 0.0

# forecast probabilities [unitless]
f = fx_prob / 100.0

# integrate along each sample, then average all samples
integrand = (f - o) ** 2
dx = np.diff(fx, axis=1)
crps = np.mean(np.sum(integrand[:, :-1] * dx, axis=1))
crps = np.mean(np.trapz((f - o) ** 2, x=fx, axis=1))

return crps


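As context for the revised metric above: the commit message mentions
comparing against a forecast from a continuous parametric distribution,
for which the CRPS has a closed form. Here is a sketch of such a check
for a Gaussian forecast (my own example; the helper crps_gaussian is not
part of the library), using the well-known closed form
CRPS = sigma * (z * (2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi)) with
z = (y - mu) / sigma:

    import numpy as np
    from scipy import stats
    from solarforecastarbiter.metrics import probabilistic

    def crps_gaussian(y, mu, sigma):
        # analytic CRPS of a Gaussian forecast (Gneiting et al. 2005)
        z = (y - mu) / sigma
        return sigma * (z * (2 * stats.norm.cdf(z) - 1)
                        + 2 * stats.norm.pdf(z) - 1 / np.sqrt(np.pi))

    mu, sigma, y = 20.0, 5.0, 18.0

    # discretize the Gaussian CDF on a fine threshold grid
    fx = np.linspace(mu - 5 * sigma, mu + 5 * sigma, 1001)[np.newaxis, :]
    fx_prob = 100 * stats.norm.cdf(fx, loc=mu, scale=sigma)
    obs = np.array([y])

    numeric = probabilistic.continuous_ranked_probability_score(
        obs, fx, fx_prob)
    print(numeric, crps_gaussian(y, mu, sigma))  # should agree closely
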
18 changes: 9 additions & 9 deletions solarforecastarbiter/metrics/tests/test_calculator.py
@@ -361,12 +361,12 @@ def test_calculate_metrics_with_probablistic(single_observation,
probabilistic._REQ_DIST)

expected = {
0: ('total', 'crps', '0', 17.247),
2: ('year', 'crps', '2019', 17.247),
4: ('season', 'crps', 'JJA', 17.247),
6: ('month', 'crps', 'Aug', 17.247),
8: ('hour', 'crps', '0', 19.801000000000002),
9: ('hour', 'crps', '1', 19.405)
0: ('total', 'crps', '0', 21.41819),
2: ('year', 'crps', '2019', 21.41819),
4: ('season', 'crps', 'JJA', 21.41819),
6: ('month', 'crps', 'Aug', 21.41819),
8: ('hour', 'crps', '0', 28.103),
9: ('hour', 'crps', '1', 26.634375),
}
attr_order = ('category', 'metric', 'index', 'value')
for k, expected_attrs in expected.items():
@@ -1006,7 +1006,7 @@ def test_apply_deterministic_bad_metric_func():
('bs', [], [], [], None, None, np.NaN),
('bs', [1, 1, 1], [100, 100, 100], [1, 1, 1], None, None, 0.),
# Briar Skill Score with no reference
# Brier Skill Score with no reference
('bss', [1, 1, 1], [100, 100, 100], [0, 0, 0],
None, None, 1.),
('bss', [1, 1, 1], [100, 100, 100], [1, 1, 1],
Expand All @@ -1017,11 +1017,11 @@ def test_apply_deterministic_bad_metric_func():
('unc', [1, 1, 1], [100, 100, 100], [1, 1, 1], None, None, 0.),
# CRPS single forecast
('crps', [[1, 1]], [[100, 100]], [[0, 0]], None, None, 0.),
('crps', [[1, 1]], [[100, 100]], [0], None, None, 0.5),
# CRPS multiple forecasts
('crps', [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
[[100, 100, 100], [100, 100, 100], [100, 100, 100]],
[0, 0, 0], None, None, 0.)
[0, 0, 0], None, None, 1.)
])
def test_apply_probabilistic_metric_func(metric, fx, fx_prob, obs,
ref_fx, ref_fx_prob, expect,
