Proper use of `loc` parameter in `gamma`,`fisk` dists (SPI/SPEI) #1720

coxipi · 2024-04-18T20:35:44Z

Pull Request Checklist:

- This PR addresses an already opened issue (for bug fixes / features)
  - This PR fixes #xyz
- Tests for the changes have been added (for bug fixes / features)
  - (If applicable) Documentation has been added / updated (for bug fixes / features)
- CHANGES.rst has been updated (with summary of main changes)
  - Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

SPEI used to rely on an offset to ensure that values are positive when using distributions such as gamma and fisk. Now we instead make proper use of the loc parameter, which generalizes these distributions and allows negative values. For instance, the gamma distribution is a 2-parameter distribution, but in scipy it is generalized to 3 parameters with a loc parameter. This extends the support of gamma from $x \in [0,\infty)$ to $x \in [\text{loc}, \infty)$.

The code was already compatible with this idea; I just had to change _fit_start so that it can compute parameter estimates when the data contain negative values.
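
As a minimal sketch of why loc can take over the role of the offset (synthetic data and a made-up `offset` value, not the one used in xclim or climate_indices): fitting a gamma with `floc=0` on offset data should give the same shape and scale as fitting the raw data with `floc=-offset`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic "water budget" sample containing negative values (illustrative only).
wb = stats.gamma.rvs(a=2.0, loc=-3.0, scale=1.5, size=500, random_state=rng)

offset = 5.0  # hypothetical offset, large enough that wb + offset > 0
assert (wb + offset).min() > 0

# Offset approach: shift the data, then fit a gamma with loc fixed at 0.
a_off, loc_off, scale_off = stats.gamma.fit(wb + offset, floc=0)

# loc approach: leave the data untouched and fix loc at -offset instead.
a_loc, loc_loc, scale_loc = stats.gamma.fit(wb, floc=-offset)

# The fitted density is zero below loc, i.e. the support is [loc, inf).
pdf_below = stats.gamma.pdf(-offset - 1.0, a_loc, loc=loc_loc, scale=scale_loc)

print(a_off, a_loc)
print(scale_off, scale_loc)
```

With `floc` left out of the call, scipy also estimates loc from the data, which is where a starting estimate for loc comes into play.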

Does this PR introduce a breaking change?

Removing the offset from SPEI. Also, _fit_start can now compute an estimate of loc, which can affect the values obtained when optimizing 3-parameter distributions. Even when loc is fixed to 0 with floc=0 in fitkwargs, the estimate for fisk was modified. The previous formulation loosely followed the idea of the method of moments; the new approximation is more rigorous.
The case of zero-inflated distributions has changed: now it is the calibration data that determines the probability of zero, as it should.
Method APP can now only be used if fitkwargs['floc'] is given as input by the user.
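
To illustrate the zero-inflated point, here is a rough sketch, not the xclim implementation: the helper names `fit_zero_inflated_gamma` and `zero_inflated_cdf` are hypothetical, the data are synthetic, and the mixture formula is the usual textbook one. The probability of zero is taken from the calibration sample only, and the gamma is fitted on the strictly positive part with `floc=0`.

```python
import numpy as np
from scipy import stats


def fit_zero_inflated_gamma(cal):
    """Return (prob_of_zero, gamma_params) estimated from calibration data only."""
    cal = np.asarray(cal, dtype=float)
    prob_of_zero = np.mean(cal == 0)
    # Fit the gamma on strictly positive values, with loc fixed at 0.
    params = stats.gamma.fit(cal[cal > 0], floc=0)
    return prob_of_zero, params


def zero_inflated_cdf(x, prob_of_zero, params):
    """Mixture CDF: point mass at zero plus a gamma for positive values."""
    a, loc, scale = params
    x = np.asarray(x, dtype=float)
    return prob_of_zero + (1 - prob_of_zero) * stats.gamma.cdf(x, a, loc=loc, scale=scale)


rng = np.random.default_rng(42)
cal = stats.gamma.rvs(a=2.0, scale=3.0, size=300, random_state=rng)
cal[:60] = 0.0  # 60 of 300 calibration values are dry

p0, params = fit_zero_inflated_gamma(cal)
probs = zero_inflated_cdf([0.0, 5.0, 50.0], p0, params)
print(p0, probs)
```

Values outside the calibration period would be passed through `zero_inflated_cdf` with the same `p0` and `params`, so their own share of zeros does not change the result.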

Other information:

coxipi · 2024-04-18T20:37:08Z

I started a new branch spi_loc2 because there were many errors hard to track in spi_loc. Opening a new PR, this is the old one: #1653

xclim/indices/_agro.py

coxipi · 2024-04-22T19:02:55Z

I was just trying to tie up loose ends, and was exploring the difference between fitting with zero-inflated distributions vs. regular (non-inflated) distributions.

I'm considering the case where the dataset is inflated with some small value that I inject, without any special treatment for this value. Things work well unless the inflated value is 0:

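import numpy as np
import scipy as sc
import scipy.stats  # make sure sc.stats is loaded
from xclim.testing import open_dataset

pr0 = open_dataset("sdba/CanESM2_1950-2100.nc")
# Take the first 30 July 1st values (one per year)
pr0 = pr0.isel(location=1).isel(time=slice(179, 365 * 30, 365)).pr.values
# Replace non-positive values with a small positive value
pr0 = np.where(pr0 > 0, pr0, 1e-6)
min_prs = [-1e-10, -1e-20, 0, 1e-20, 1e-18, 1e-14, 1e-10]
print(["min_pr", "Pr(X>=min_pr) [fraction]"])
for min_pr in min_prs:
    pr = pr0.copy()
    # Inflate the sample: set the first 11 of the 30 values to min_pr
    pr[0:11] = min_pr
    # Fit a 3-parameter gamma (shape, loc, scale)
    a, loc, scale = sc.stats.gamma.fit(pr)
    # gamma.cdf evaluates Pr(X <= min_pr) under the fitted distribution
    print([min_pr, sc.stats.gamma.cdf(min_pr, a, loc=loc, scale=scale)])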

yields:

['min_pr', 'Pr(X>=min_pr) [fraction]']
[-1e-10, 0.07865019433968551]
[-1e-20, 0.07258213369500696]
[0, 5.552055328806407e-20]
[1e-20, 0.07787936999249587]
[1e-18, 0.07488806373878135]
[1e-14, 0.08371030972954428]
[1e-10, 0.08298754087251842]

I inflated the datasets, so we could expect 'Pr(X>=min_pr)' to be roughly a third (11 of the 30 values were injected). The fits are approximate, so 7-8% is acceptable. If I raise the bound in sc.stats.gamma.cdf(min_pr,a,loc=loc,scale=scale), say by adding 1e-15, which is still well below the values that were not injected, then I do get ~30%, except in the case of 0 precipitation.

And we can clearly see that something is off with the 0-inflated case if we plot the PDFs:

Using a form of jitter_under_thresh, i.e. inflating with precipitation values of 0 and then jittering them between 10^(-21) and 10^(-20), I get something more in line with the other results.

If I inflate fewer values, 10 or fewer, the problem seems to go away (I also added a small value in the cdf computation, but that doesn't explain the change in behaviour as far as I could tell):

import numpy as np
import scipy as sc
import scipy.stats  # make sure sc.stats is loaded
import matplotlib.pyplot as plt
from xclim.testing import open_dataset

pr0 = open_dataset("sdba/CanESM2_1950-2100.nc")
# Take the first 30 July 1st values (one per year)
pr0 = pr0.isel(location=1).isel(time=slice(179, 365 * 30, 365)).pr.values
# Replace non-positive values with a small positive value
pr0 = np.where(pr0 > 0, pr0, 1e-6)
min_prs = [
    0,
    1e-20,
    1e-17,
    1e-16,
]
print(["min_pr", "Pr(X>=min_pr)"])
x = np.linspace(0, 1e-15, 1000)
for min_pr in min_prs:
    pr = pr0.copy()
    # Inflate only 10 of the 30 values this time
    pr[0:10] = min_pr
    a, loc, scale = sc.stats.gamma.fit(pr)
    # Evaluate the cdf slightly above min_pr
    print([min_pr, sc.stats.gamma.cdf(min_pr + 1e-21, a, loc=loc, scale=scale)])
    plt.plot(x, sc.stats.gamma.pdf(x, a, loc=loc, scale=scale), label=min_pr)
plt.yscale("log")
plt.xlabel("x")
plt.ylabel("pdf")
plt.legend(title="min_pr")

Once around 11/30 values are inflated, more cases start to look like the problematic pr == 0 case.

Anyway, it's worth keeping this in mind. I think the lesson is to be careful when using numbers where the precision is too low, and 0 values are afflicted by this. If we use the appropriate zero_inflated option or apply jitter_under_thresh with a very small value, this should not be an issue in either case.
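
For reference, the jitter idea can be sketched in plain numpy as below. `jitter_zeros` is a hypothetical helper written for this comment, not the xclim function, and the sample values are made up; the bounds are the 10^(-21) to 10^(-20) range mentioned above.

```python
import numpy as np


def jitter_zeros(values, lower, upper, rng):
    """Replace exact zeros by uniform random draws in [lower, upper)."""
    out = np.asarray(values, dtype=float).copy()
    mask = out == 0
    out[mask] = rng.uniform(lower, upper, size=mask.sum())
    return out


rng = np.random.default_rng(1)
pr = np.array([0.0, 0.0, 0.0, 2.5e-5, 1.0e-4, 0.0, 3.2e-5])  # made-up precipitation
pr_jit = jitter_zeros(pr, lower=1e-21, upper=1e-20, rng=rng)
print(pr_jit)
```

The jittered values stay many orders of magnitude below the real precipitation values, but they are no longer all identical, which is what seemed to matter in the experiments above.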

CHANGES.rst

coxipi · 2024-04-23T13:24:37Z

Apparently the PR was never in draft mode? Just wanted to say it's ready for your eyes. ⚠️

Co-authored-by: Pascal Bourgault <bourgault.pascal@ouranos.ca>

xclim/indices/stats.py

for more information, see https://pre-commit.ci

xclim/indices/_agro.py

aulemahal

I like this! thanks a lot !

aulemahal · 2024-05-02T18:20:36Z

tests/test_indices.py

        if method == "APP":
-            fitkwargs["floc"] = 0
+            # same offset as in climate indices
+            offset = convert_units_to("1 mm/d", wb, context="hydro")


Suggested change

-            offset = convert_units_to("1 mm/d", wb, context="hydro")
+            offset = convert_units_to("1000 mm/month", wb, context="hydro")

Isn't this the hardcoded offset of climate_indices ?
https://github.com/monocongo/climate_indices/blob/d108eee982abae06a415e888319b8078af868558/src/climate_indices/indices.py#L280C62-L280C69

Yes, climate_indices and xclim do match, but with the datasets currently used in our tests, we still have negative values with the climate_indices offset. I will wait for a future PR, where we show comparisons with several other libraries, to sort this out.

But I agree, I didn't have the correct offset here.
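
A quick way to see whether a given offset is enough is to count the values that remain non-positive after adding it. This is only a sketch: the helper name `count_nonpositive_after_offset`, the water budget values and the two offsets are all made up for illustration.

```python
import numpy as np


def count_nonpositive_after_offset(wb, offset):
    """Number of values that are still <= 0 once the offset is added."""
    wb = np.asarray(wb, dtype=float)
    return int(np.sum(wb + offset <= 0))


wb = np.array([-2.5, -0.4, 0.0, 1.2, 3.0])  # made-up water budget values
n_small = count_nonpositive_after_offset(wb, offset=1.0)
n_large = count_nonpositive_after_offset(wb, offset=5.0)
print(n_small, n_large)
```

Any non-zero count means that a fit with loc fixed at 0 would see values outside the support of the distribution.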

xclim/indices/stats.py

Co-authored-by: Pascal Bourgault <bourgault.pascal@ouranos.ca>

### Pull Request Checklist:

- [x] This PR addresses an already opened issue (for bug fixes / features)
  - This PR fixes #xyz
- [x] (If applicable) Documentation has been added / updated (for bug fixes / features).
- [x] (If applicable) Tests have been added.
- [x] CHANGES.rst has been updated (with summary of main changes).
  - [x] Link to issue (:issue:`number`) and pull request (:pull:`number`) has been added.

### What kind of change does this PR introduce?

* Tests for the GIS module were too strict.
* Changes were made in `xclim v0.49.0` with how the `gamma` and `fisk` distributions were initiated in `xclim`, which changed the fitting results and parametric quantiles. It just so happened that we arbitrarily used the `gamma` for our tests. To be backwards-compatible, those tests were changed to the `gumbel_r`.

### Does this PR introduce a breaking change?

- No.

### Other information:

Changes to the `gamma` distribution: Ouranosinc/xclim#1477 Ouranosinc/xclim#1720

coxipi added 8 commits March 26, 2024 10:28

refactor SPI/SPEI and add loc to _fit_start

0bfc7f3

debugging

cff755e

adjust test to method changes

74b7180

refactor SPI/SPEI

394e95a

fix SPEI offset, simpler loc estimation in gamma, adjust tests

838c362

update doc/warnings

52d2af6

add 'zeroinflated' option, adjust test results

ffb765c

Merge branch 'main' of github.com:Ouranosinc/xclim into spi_loc2

2125d17

github-actions bot added the indicators Climate indices and indicators label Apr 18, 2024

updates CHANGES

eb00202

VascoSch92 reviewed Apr 18, 2024

xclim/indices/_agro.py (outdated, resolved)

better default, more documentation, typos

5d780a8

github-actions bot added the docs Improvements to documenation label Apr 18, 2024

coxipi added 5 commits April 19, 2024 08:07

add inputKind dict & update tests

4e4c449

update tests (no cal field)

6eef30f

let standardized_index use params only (other inputs=None)

5c6fa34

update doc

6bec8ea

Trigger Build

7ee0aa9

coxipi added 2 commits April 22, 2024 16:51

Merge branch 'main' of github.com:Ouranosinc/xclim into spi_loc2

adddfaf

fix zero_inflated option and add test

ff6a4f0

coxipi commented Apr 23, 2024

CHANGES.rst (outdated, resolved)

update CHANGES

1c6169b

coxipi marked this pull request as draft April 23, 2024 13:23

Zeitsperre marked this pull request as ready for review April 23, 2024 13:32

updated SPI doc

b79bc69

Zeitsperre requested review from huard and aulemahal April 25, 2024 14:03

coxipi and others added 2 commits April 29, 2024 12:21

update version depr. warn & reformulation

95289ee

Co-authored-by: Pascal Bourgault <bourgault.pascal@ouranos.ca>

more doc on zero-inflated dists

0c36e88

aulemahal reviewed May 1, 2024

xclim/indices/stats.py (outdated, resolved)

Zeitsperre and others added 4 commits May 1, 2024 16:40

Merge branch 'main' into spi_loc2

7b7662a

Merge branch 'main' into spi_loc2

2e984ee

[pre-commit.ci] auto fixes from pre-commit.com hooks

b6727db

for more information, see https://pre-commit.ci

update test for new behaviour

e9c0a99

Zeitsperre mentioned this pull request May 2, 2024

Prepare v0.49.0 #1741

Merged

5 tasks

coxipi added 2 commits May 2, 2024 11:16

leave the offset option in standardized_index_fit_params

8fe6547

Merge branch 'spi_loc2' of github.com:Ouranosinc/xclim into spi_loc2

e527533

aulemahal reviewed May 2, 2024

xclim/indices/_agro.py (outdated, resolved)

reimplement offset values in SPEI

c0e80f4

Zeitsperre added the priority Immediate priority label May 2, 2024

coxipi added 2 commits May 2, 2024 13:57

fitkwargs['floc'] to replace offset`

fcbc77c

fix missing arguments in standardized_index

3e8eb62

coxipi commented May 2, 2024

xclim/indices/_agro.py (resolved)

coxipi commented May 2, 2024

xclim/indices/_agro.py (outdated, resolved)

add missing fitkwargs in tests

cf2270e

aulemahal approved these changes May 2, 2024

github-actions bot added the approved Approved for additional tests label May 2, 2024

coxipi and others added 6 commits May 2, 2024 14:46

update CHANGES

0fa7849

clearer deprecation warning

659d67f

Co-authored-by: Pascal Bourgault <bourgault.pascal@ouranos.ca>

Merge branch 'main' into spi_loc2

24aef6d

remove mentions of SPEI/climate_indices for now

275afce

Merge branch 'spi_loc2' of github.com:Ouranosinc/xclim into spi_loc2

df6b55a

revert unwanted changes in expected test output

1cc8372

coxipi merged commit 4f2e633 into main May 2, 2024
19 checks passed

coxipi deleted the spi_loc2 branch May 2, 2024 20:42

RondeauG mentioned this pull request May 14, 2024

Fix tests hydrologie/xhydro#145

Merged

5 tasks
