Skip to content

r.quantile/r.stats.quantile/libstats: fix quantile algorithm#2108

Merged
metzm merged 4 commits intoOSGeo:mainfrom
metzm:quantiles
Jan 21, 2022
Merged

r.quantile/r.stats.quantile/libstats: fix quantile algorithm#2108
metzm merged 4 commits intoOSGeo:mainfrom
metzm:quantiles

Conversation

@metzm
Copy link
Contributor

@metzm metzm commented Jan 19, 2022

Hyndman and Fan (1996) (https://doi.org/10.2307/2684934) list 9 different algorithms to the corresponding rank of a sorted list for a given quantile. The algorithm used in GRASS is not listed. Therefore I decided to use the algorithm type 7 also used by R and numpy. In R, see ?quantile for a description of the different algorithms.

There was an independent bug in r.quantile and r.stats.quantile: if the corresponding rank belonged to the last entry of a slot, the result was wrong because the first value of the next slot needs to be used to calculate the correct value for the given quantile.

@metzm metzm added bug Something isn't working backport_needed C Related code is in C labels Jan 19, 2022
@metzm metzm added this to the 8.0.1 milestone Jan 19, 2022
@metzm metzm requested review from marisn and veroandreo January 19, 2022 17:35
@veroandreo
Copy link
Contributor

I tested using the example in r.quantile manual:

# grass8.1dev
r.quantile input=elevation percentiles=0.1,1,10,25,50,75,90,99,99.9
Computing quantiles
0:0.100000:59.870384
1:1.000000:68.065155
2:10.000000:84.523346
3:25.000000:94.789986
4:50.000000:108.879921
5:75.000000:126.791985
6:90.000000:138.660385
7:99.000000:149.824585
8:99.900000:153.375854

# mm-branch
Computing quantiles
0:0.100000:59.870381
1:1.000000:68.065153
2:10.000000:84.523341
3:25.000000:94.789986
4:50.000000:108.879906
5:75.000000:126.791973
6:90.000000:138.660303
7:99.000000:149.824419
8:99.900000:153.375565

Changes appear in the 5th and 6th decimal positions. Is that ok, @metzm ?

After exporting the raster (as we still don't have rgrass for grass8) and testing in R, I get:

quantile(elev, probs=c(0.1,1,10,25,50,75,90,99,99.9)/100, type=7)
     0.1%        1%       10%       25%       50%       75%       90%       99% 
 59.87038  68.06515  84.52334  94.78999 108.87991 126.79197 138.66030 149.82442 
    99.9% 
153.37556 

but it's rounded to the 5th decimal by default. In any case, I'm in favor of consistency with other software packages, so +1 for this change.

@metzm
Copy link
Contributor Author

metzm commented Jan 20, 2022

Yes, changes are expected, particularly for low and high quantiles, and particularly for small samples. Importantly, the 9 different algorithms listed in literature and software implementations produce different results in these cases. The 50% percentile should be identical, however. That was a real bug in lib/stats/c_percentile.c.

 * adjust test for r.neighbors
 * sync r.stats quantile to r.quantile
@metzm
Copy link
Contributor Author

metzm commented Jan 20, 2022

A simple example with a list of 10 sorted values and their indices (zero-based ranks):

value:  1  2  3  4  5  6  7  8  9  10
index:  0  1  2  3  4  5  6  7  8   9

The 50% percentile would produce a split between the 5 lowest values 1, 2, 3, 4, 5 and the 5 highest values 6, 7, 8, 9, 10, the correct result is 5 * 0.5 + 6 * 0.5 = 5.5.
The previous code with n * quant, thus the index 10 * 0.5 = 5 would return the value 6 for the 50% percentile.
The new code with quant * (n - 1) correctly uses the index 4.5 to return the value 5.5 for the 50% percentile.

@neteler
Copy link
Member

neteler commented Jan 20, 2022

One test is failing, fixed with attached diff (I don't know how to commit against your PR?):
test_r_neighbors.zip

Running ./raster/r.neighbors/testsuite/test_r_neighbors.py...
========================================================================
...F...
======================================================================
FAIL: test_standard_options_quantile (__main__.TestNeighbors)
Test output with standard options (size=3, quadratic moving window,
----------------------------------------------------------------------
Traceback (most recent call last):
  File "raster/r.neighbors/testsuite/test_r_neighbors.py", line 480, in test_standard_options_quantile
    self.check_univar(test_case)
  File "raster/r.neighbors/testsuite/test_r_neighbors.py", line 429, in check_univar
    precision=1e-5,
  File "etc/python/grass/gunittest/case.py", line 308, in assertRasterFitsUnivar
    precision=precision,
  File "etc/python/grass/gunittest/case.py", line 282, in assertModuleKeyValue
    self.fail(self._formatMessage(msg, stdMsg))
AssertionError: r.univar map=test_standard_options_quantile_raster_0.03 percentile=90.0 separator== -g difference:
mismatch values (key, reference, actual): [('coeff_var', 18.5370455239192, 18.5383334276472), ('max', 156.208636016846, 156.20787109375), ('mean', 109.552850010781, 109.544974331288), ('mean_of_abs', 109.552850010781, 109.544974331288), ('min', 55.600062789917, 55.5965177536011), ('range', 100.608573226929, 100.611353340149), ('stddev', 20.3078616792495, 20.3078125947647), ('sum', 221844521.271832, 221828573.020859), ('variance', 412.409245983529, 412.407252384084)]
command: r.univar map=test_standard_options_quantile_raster_0.03 percentile=90.0 separator== -g {'map': 'test_standard_options_quantile_raster_0.03', 'separator': '=', 'flags': 'g'}

----------------------------------------------------------------------
Ran 7 tests in 24.097s
FAILED (failures=1)
========================================================================
FAILED ./raster/r.neighbors/testsuite/test_r_neighbors.py (1 test failed)

@metzm
Copy link
Contributor Author

metzm commented Jan 20, 2022

I am busy updating the tests for r.neighbors, WIP.

 * update test results
@metzm metzm merged commit c700a8a into OSGeo:main Jan 21, 2022
@metzm metzm deleted the quantiles branch January 21, 2022 18:44
@metzm
Copy link
Contributor Author

metzm commented Jan 21, 2022

TODO: backport c700a8a to G 8.0.1

neteler pushed a commit that referenced this pull request Jan 29, 2022
* use type 7 algorithm of Hyndman and Fan (1996) for quantiles, as is the default in R and numpy 
 * update manuals for `r.quantile`, `r.stats.quantile`
 * sync `r.stats quantile` to `r.quantile`
 * update test results for `r.neighbors`
@neteler
Copy link
Member

neteler commented Jan 29, 2022

TODO: backport c700a8a to G 8.0.1

Backport done.

@ecodiv
Copy link
Contributor

ecodiv commented Apr 24, 2022

@metzm , did this perhaps cause the bug #2259 ?

ninsbl pushed a commit to ninsbl/grass that referenced this pull request Oct 26, 2022
…o#2108)

* use type 7 algorithm of Hyndman and Fan (1996) for quantiles, as is the default in R and numpy 
 * update manuals for `r.quantile`, `r.stats.quantile`
 * sync `r.stats quantile` to `r.quantile`
 * update test results for `r.neighbors`
ninsbl pushed a commit to ninsbl/grass that referenced this pull request Feb 17, 2023
…o#2108)

* use type 7 algorithm of Hyndman and Fan (1996) for quantiles, as is the default in R and numpy 
 * update manuals for `r.quantile`, `r.stats.quantile`
 * sync `r.stats quantile` to `r.quantile`
 * update test results for `r.neighbors`
neteler pushed a commit to nilason/grass that referenced this pull request Nov 7, 2023
…o#2108)

* use type 7 algorithm of Hyndman and Fan (1996) for quantiles, as is the default in R and numpy 
 * update manuals for `r.quantile`, `r.stats.quantile`
 * sync `r.stats quantile` to `r.quantile`
 * update test results for `r.neighbors`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working C Related code is in C

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants