This uses the t-digest [1] method to estimate quantiles of the distribution of the result. [1] https://raw.githubusercontent.com/tdunning/t-digest/master/docs/t-digest-paper/histo.pdf
This provides the most flexibility and reusability (e.g. as a map function in a stream pipeline). Also change the "quantiles of a List of values" method to work on all kinds of iterables.
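To illustrate the idea of a quantile method that accepts any iterable and can be reused as a map function, here is a minimal sketch. It is not the code from this PR: the class `ExactQuantiles` and its method `of` are hypothetical names, it assumes the result is handed back as a `DoubleUnaryOperator`, and it computes exact quantiles by sorting and linear interpolation instead of using t-digest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.DoubleUnaryOperator;
import java.util.stream.DoubleStream;

/** Hypothetical helper (not part of this PR): exact quantiles of any Iterable. */
final class ExactQuantiles {

  /**
   * Returns a function q -> value for q in [0, 1]. The value is linearly
   * interpolated between the two nearest order statistics. This is an exact
   * computation on a sorted copy of the input, not a t-digest estimate.
   */
  static DoubleUnaryOperator of(Iterable<? extends Number> values) {
    List<Double> sorted = new ArrayList<>();
    for (Number v : values) {
      sorted.add(v.doubleValue());
    }
    if (sorted.isEmpty()) {
      throw new IllegalArgumentException("cannot compute quantiles of no values");
    }
    Collections.sort(sorted);
    return q -> {
      // Written this way so that NaN is rejected as well.
      if (!(q >= 0.0 && q <= 1.0)) {
        throw new IllegalArgumentException("q must be in [0, 1], got " + q);
      }
      double pos = q * (sorted.size() - 1);
      int lower = (int) Math.floor(pos);
      int upper = (int) Math.ceil(pos);
      double fraction = pos - lower;
      return sorted.get(lower) + fraction * (sorted.get(upper) - sorted.get(lower));
    };
  }

  public static void main(String[] args) {
    // Any Iterable works here: List, Set, Deque, ...
    DoubleUnaryOperator quantile = ExactQuantiles.of(Arrays.asList(4, 1, 3, 2, 5));
    // The returned function can be used directly as a map function in a stream.
    double[] quartiles = DoubleStream.of(0.25, 0.5, 0.75).map(quantile).toArray();
    System.out.println(Arrays.toString(quartiles)); // prints [2.0, 3.0, 4.0]
  }
}
```

Returning a function instead of a fixed list of values means the sorted data is prepared once and can then be queried for any number of quantiles. An estimating variant could keep the same shape and only swap the exact lookup for a digest query.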
This feature is actually quite cool. I did a quick analysis of the distribution of the lengths of OSM highways. It's also quite interesting to see that the road lengths almost perfectly follow a log-normal distribution (as can be seen in the QQ-plot).
@rabidllama suggested naming these methods more precisely, i.e. so that they reflect that they don't return the exact quantiles or median, but only a statistical estimate. That is actually a good point, as it avoids confusing users who might otherwise expect precise results. Maybe
Yes, avoiding this confusion is important.
sfendrich
left a comment
Sorry, I'm currently very busy. I put this review on my to-do list, but I cannot promise a date.
sfendrich
left a comment
Rename methods such as quantile to estimatedQuantile or something similar in order to avoid wrong expectations.
Done in 741b6c8.


This uses the t-digest method (and code) by Ted Dunning and Otmar Ertl to estimate quantiles of the distribution of the result set. A short description of the method can be found here.
Todo:
compression)?MapAggregator