r.quantile/r.stats.quantile/libstats: fix quantile algorithm #2108
metzm merged 4 commits into OSGeo:main from
I tested using the example in … Changes appear in the 5th and 6th decimal places. Is that OK, @metzm? After exporting the raster (as we still don't have rgrass for GRASS 8) and testing in R, I get: … but R's output is rounded to the 5th decimal by default. In any case, I'm in favor of consistency with other software packages, so +1 for this change.
Yes, changes are expected, particularly for low and high quantiles and for small samples. Importantly, the 9 different algorithms listed in the literature and in software implementations produce different results in these cases. The 50th percentile (the median) should be identical, however. That was a real bug in
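To illustrate how much the choice of algorithm matters for a small sample, here is a minimal Python sketch of the continuous sample-quantile types 4 to 9 from Hyndman and Fan (1996). It uses their general form Q(p) = (1 - γ) x_(j) + γ x_(j+1), with j = floor(np + m) and γ = np + m - j, where the offset m depends on the type. The names `hf_quantile` and `M_OFFSET` are illustrative only, not GRASS code, and the discontinuous types 1 to 3 are left out.

```python
import math

# Offset m(p) for the continuous sample-quantile types 4-9 of Hyndman and Fan (1996).
M_OFFSET = {
    4: lambda p: 0.0,
    5: lambda p: 0.5,
    6: lambda p: p,
    7: lambda p: 1.0 - p,
    8: lambda p: (p + 1.0) / 3.0,
    9: lambda p: p / 4.0 + 3.0 / 8.0,
}


def hf_quantile(sorted_x, p, qtype):
    """Sample quantile of the ascending list `sorted_x` using Hyndman-Fan type 4..9.

    Works with 1-based order statistics x_(j); indices below 1 or above n are
    clamped to x_(1) and x_(n), respectively.
    """
    n = len(sorted_x)
    h = n * p + M_OFFSET[qtype](p)
    j = math.floor(h)
    gamma = h - j
    # Clamp the 1-based indices into [1, n], then shift to 0-based Python indexing.
    lower = sorted_x[min(max(j, 1), n) - 1]
    upper = sorted_x[min(max(j + 1, 1), n) - 1]
    return (1.0 - gamma) * lower + gamma * upper


data = list(range(1, 11))
for qtype in range(4, 10):
    print(qtype, hf_quantile(data, 0.1, qtype), hf_quantile(data, 0.5, qtype))
```

For the values 1 to 10, types 5 to 9 all return 5.5 for p = 0.5, while type 4 returns 5.0. At p = 0.1 the six types give six different values (1.0, 1.5, 1.1, 1.9, about 1.367 and 1.4 for types 4 to 9), which is the kind of low-quantile, small-sample divergence described above.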
A simple example uses a list of 10 sorted values, 1 to 10, with indices 0 to 9 (zero-based ranks). The 50th percentile splits this list between the 5 lowest values (1, 2, 3, 4, 5) and the 5 highest values (6, 7, 8, 9, 10), so the correct result is 5.5, the midpoint between 5 and 6.
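The same example can be computed with type 7 directly in zero-based ranks: the fractional rank is h = (n - 1)p, and the result interpolates linearly between the values at ranks floor(h) and floor(h) + 1. This is a minimal Python sketch, not the GRASS implementation, and `quantile_type7` is a hypothetical helper name. As a cross-check it uses the standard library's `statistics.quantiles` (Python 3.8+) with `method="inclusive"`, which, as far as I know, applies the same interpolation between neighbouring order statistics.

```python
import math
import statistics


def quantile_type7(sorted_x, p):
    """Type 7 quantile of an ascending list, using zero-based ranks.

    The fractional rank is h = (n - 1) * p; the result interpolates linearly
    between the values at ranks floor(h) and floor(h) + 1.
    """
    n = len(sorted_x)
    h = (n - 1) * p
    j = math.floor(h)
    if j + 1 >= n:  # p == 1: the top rank has no upper neighbour to interpolate with
        return sorted_x[n - 1]
    return sorted_x[j] + (h - j) * (sorted_x[j + 1] - sorted_x[j])


values = list(range(1, 11))  # ranks 0..9 hold the values 1..10
print(quantile_type7(values, 0.5))  # h = 4.5, halfway between ranks 4 and 5 (values 5 and 6)

# Decile cut points from the standard library, for comparison at p = 0.1, ..., 0.9.
print(statistics.quantiles(values, n=10, method="inclusive"))
```

Because h = 4.5 falls exactly halfway between ranks 4 and 5, the median is 5 + 0.5 * (6 - 5) = 5.5, matching the split described above.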
One test is failing; I fixed it with the attached diff (I don't know how to commit to your PR):
I am busy updating the tests for
TODO: backport c700a8a to GRASS GIS 8.0.1
* use type 7 algorithm of Hyndman and Fan (1996) for quantiles, as is the default in R and numpy
* update manuals for `r.quantile`, `r.stats.quantile`
* sync `r.stats.quantile` to `r.quantile`
* update test results for `r.neighbors`
Backport done.
Hyndman and Fan (1996) (https://doi.org/10.2307/2684934) list 9 different algorithms for mapping a given quantile to the corresponding rank in a sorted list. The algorithm used in GRASS is not among them. Therefore I decided to use type 7, which is also used by R and numpy (as their default). In R, see `?quantile` for a description of the different algorithms.

There was an independent bug in `r.quantile` and `r.stats.quantile`: if the corresponding rank belonged to the last entry of a slot, the result was wrong, because the first value of the next slot is needed to calculate the correct value for the given quantile.
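The slot bug can be illustrated with a hypothetical sketch. It assumes, for illustration only, that values are first distributed into equal-width slots and that only the slot holding a wanted rank is sorted; the helpers `fill_slots`, `value_at_rank` and `slotted_quantile_type7` are invented names and do not mirror the actual `r.quantile` code. The point is that type 7 needs the values at ranks j and j + 1, and these can sit in different slots.

```python
import math


def fill_slots(values, lo, hi, nslots):
    """Hypothetical binning step: distribute values into equal-width slots."""
    width = (hi - lo) / nslots
    slots = [[] for _ in range(nslots)]
    for v in values:
        k = min(int((v - lo) / width), nslots - 1)
        slots[k].append(v)
    return slots


def value_at_rank(slots, rank):
    """Return the value with zero-based `rank`, sorting only the slot that holds it."""
    start = 0
    for slot in slots:
        if rank < start + len(slot):
            return sorted(slot)[rank - start]
        start += len(slot)
    raise IndexError("rank out of range")


def slotted_quantile_type7(slots, p):
    """Type 7 quantile computed from slotted values."""
    n = sum(len(slot) for slot in slots)
    h = (n - 1) * p
    j = math.floor(h)
    lower = value_at_rank(slots, j)
    if j + 1 >= n:
        return lower
    # Rank j + 1 is looked up on its own: if rank j is the last entry of its
    # slot, the upper neighbour is the first value of the next non-empty slot.
    upper = value_at_rank(slots, j + 1)
    return lower + (h - j) * (upper - lower)


# Slots [1..5] and [6..10]: for p = 0.5, rank 4 (value 5) ends the first slot,
# so the interpolation must take value 6 from the second slot.
slots = fill_slots(range(1, 11), 0.5, 10.5, 2)
print(slots)
print(slotted_quantile_type7(slots, 0.5))
```

Here a lookup that stayed inside the first slot would have no upper neighbour for rank 4; looking up rank 5 separately yields 6 and the correct median 5.5.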