quantiles #34

Merged: tyrasd merged 9 commits into master from quantiles on Nov 22, 2018

@tyrasd (Member) commented Oct 11, 2018

This uses the t-digest method (and code) by Ted Dunning and Otmar Ertl to estimate quantiles of the distribution of the result set. A short description of the method can be found here.

Todo:

  • tweak t-digest parameters (compression)?
  • implement for MapAggregator
  • unit tests

This uses the t-digest [1] method to estimate quantiles of the distribution of the result.

[1] https://raw.githubusercontent.com/tdunning/t-digest/master/docs/t-digest-paper/histo.pdf
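To make the idea behind the t-digest more concrete, here is a simplified, self-contained sketch of its merging variant in Java. It is not the Dunning/Ertl code used in this PR: the class name `SimpleTDigest`, its methods, the buffer size and the exact form of the size bound `4·n·q(1−q)/δ` (with `δ` the compression parameter) are assumptions chosen for illustration. Centroids near the tails stay small, so extreme quantiles remain accurate, while centroids near the median may absorb many points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified merging t-digest (hypothetical class, not the library used in the PR).
 * Each centroid is a (mean, weight) pair; the assumed size bound
 * 4 * n * q * (1 - q) / compression keeps centroids small near q = 0 and q = 1.
 */
final class SimpleTDigest {
  private final double compression; // larger values keep more, smaller centroids
  private final List<Double> buffer = new ArrayList<>();
  private double[] means = new double[0];
  private double[] weights = new double[0];
  private double totalWeight = 0;

  SimpleTDigest(double compression) {
    this.compression = compression;
  }

  void add(double x) {
    buffer.add(x);
    totalWeight += 1;
    if (buffer.size() >= 10 * compression) {
      compress();
    }
  }

  /** Sorts buffered points together with existing centroids and merges them greedily. */
  private void compress() {
    if (buffer.isEmpty()) {
      return;
    }
    int m = means.length + buffer.size();
    double[][] items = new double[m][];
    for (int i = 0; i < means.length; i++) {
      items[i] = new double[] {means[i], weights[i]};
    }
    for (int i = 0; i < buffer.size(); i++) {
      items[means.length + i] = new double[] {buffer.get(i), 1};
    }
    buffer.clear();
    Arrays.sort(items, (a, b) -> Double.compare(a[0], b[0]));

    List<double[]> merged = new ArrayList<>();
    double[] current = items[0].clone();
    double weightBefore = 0; // total weight of centroids already emitted
    for (int i = 1; i < m; i++) {
      double proposed = current[1] + items[i][1];
      double q = (weightBefore + proposed / 2) / totalWeight; // quantile of merged centroid
      if (proposed <= 4 * totalWeight * q * (1 - q) / compression) {
        // absorb items[i] into the current centroid (incremental weighted mean)
        current[0] += (items[i][0] - current[0]) * items[i][1] / proposed;
        current[1] = proposed;
      } else {
        merged.add(current);
        weightBefore += current[1];
        current = items[i].clone();
      }
    }
    merged.add(current);
    means = merged.stream().mapToDouble(c -> c[0]).toArray();
    weights = merged.stream().mapToDouble(c -> c[1]).toArray();
  }

  int centroidCount() {
    compress();
    return means.length;
  }

  /** Estimates the q-quantile by interpolating linearly between centroid centres. */
  double quantile(double q) {
    compress();
    if (means.length == 0) {
      throw new IllegalStateException("empty digest");
    }
    double target = q * totalWeight;
    double centre = weights[0] / 2; // cumulative weight at the centre of centroid 0
    if (target <= centre) {
      return means[0];
    }
    for (int i = 0; i + 1 < means.length; i++) {
      double nextCentre = centre + (weights[i] + weights[i + 1]) / 2;
      if (target <= nextCentre) {
        double t = (target - centre) / (nextCentre - centre);
        return means[i] + t * (means[i + 1] - means[i]);
      }
      centre = nextCentre;
    }
    return means[means.length - 1];
  }
}
```

Raising `compression` keeps more centroids and improves accuracy at the cost of memory, which is the trade-off behind the "tweak compression" to-do item; the returned values are estimates, not exact quantiles.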
tyrasd added the "enhancement" (New feature or request) label Oct 11, 2018
tyrasd self-assigned this Oct 11, 2018
tyrasd changed the title from "[WIP] quantiles" to "quantiles" Oct 19, 2018
tyrasd requested review from rtroilo and sfendrich October 19, 2018 08:37
@tyrasd (Member, Author) commented Oct 24, 2018

This feature is actually quite cool. I did a quick analysis of the distribution of the lengths of OSM highways:

[Figure: highway-length-quantiles]

It's also interesting to see that the road lengths follow a log-normal distribution almost perfectly, as this QQ-plot shows:

[Figure: highway-length-qqplot]
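The comment does not say how the QQ-plot was produced. A minimal sketch, assuming the usual approach: if lengths are log-normal, their logarithms are normal, so the sorted log-lengths plotted against standard-normal quantiles at plotting positions `(i + 0.5) / n` should fall on a straight line. The class `LogNormalQQ`, its methods, the correlation-based `straightness` summary and the synthetic sample generator are all hypothetical; because the Java standard library has no normal CDF, `erf` uses the Abramowitz & Stegun 7.1.26 approximation and the inverse CDF is found by bisection.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper: QQ-plot points of log-lengths against standard-normal quantiles. */
final class LogNormalQQ {
  /** erf via Abramowitz & Stegun 7.1.26 (absolute error below 1.5e-7). */
  static double erf(double x) {
    double sign = Math.signum(x);
    double a = Math.abs(x);
    double t = 1 / (1 + 0.3275911 * a);
    double poly = t * (0.254829592 + t * (-0.284496736
        + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    return sign * (1 - poly * Math.exp(-a * a));
  }

  static double normalCdf(double z) {
    return 0.5 * (1 + erf(z / Math.sqrt(2)));
  }

  /** Inverts the standard-normal CDF by bisection; precise enough for plotting. */
  static double normalQuantile(double p) {
    double lo = -10;
    double hi = 10;
    for (int i = 0; i < 100; i++) {
      double mid = (lo + hi) / 2;
      if (normalCdf(mid) < p) {
        lo = mid;
      } else {
        hi = mid;
      }
    }
    return (lo + hi) / 2;
  }

  /** Returns {theoretical N(0,1) quantile, sorted log-length} pairs. */
  static double[][] qqPoints(double[] lengths) {
    int n = lengths.length;
    double[] logs = Arrays.stream(lengths).map(Math::log).sorted().toArray();
    double[][] points = new double[n][];
    for (int i = 0; i < n; i++) {
      double p = (i + 0.5) / n; // plotting position of the i-th order statistic
      points[i] = new double[] {normalQuantile(p), logs[i]};
    }
    return points;
  }

  /** Pearson correlation of the QQ points; values close to 1 mean "nearly straight". */
  static double straightness(double[][] points) {
    int n = points.length;
    double mx = 0;
    double my = 0;
    for (double[] pt : points) {
      mx += pt[0] / n;
      my += pt[1] / n;
    }
    double sxy = 0;
    double sxx = 0;
    double syy = 0;
    for (double[] pt : points) {
      double dx = pt[0] - mx;
      double dy = pt[1] - my;
      sxy += dx * dy;
      sxx += dx * dx;
      syy += dy * dy;
    }
    return sxy / Math.sqrt(sxx * syy);
  }

  /** Synthetic log-normal sample (test data only, not OSM data). */
  static double[] sampleLogNormal(int n, double mu, double sigma, long seed) {
    Random rnd = new Random(seed);
    double[] xs = new double[n];
    for (int i = 0; i < n; i++) {
      xs[i] = Math.exp(mu + sigma * rnd.nextGaussian());
    }
    return xs;
  }
}
```

For real data, the `lengths` array would hold the highway lengths; plotting the returned pairs gives the QQ-plot, and the correlation is only a rough numeric stand-in for judging the straight line by eye.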

@tyrasd (Member, Author) commented Oct 25, 2018

@rabidllama suggested naming these methods more precisely, i.e. reflecting that they don't return the exact quantiles or median but only a statistical estimate. That is a good point, as it avoids confusing users who might otherwise expect precise results.

Maybe .estimatedQuantile could work… what do you think?

@sfendrich (Contributor) commented:

Yes, avoiding this confusion is important. .estimatedQuantile is fine.

@sfendrich (Contributor) left a comment:

Sorry, I'm currently very busy. I put this review on my to-do list, but I cannot promise a date.

@sfendrich (Contributor) left a comment:

Rename methods such as quantile to estimatedQuantile or something similar in order to avoid wrong expectations.

@tyrasd (Member, Author) commented Nov 5, 2018

@sfendrich

Rename methods such as quantile to estimatedQuantile or something similar in order to avoid wrong expectations.

Done in 741b6c8.

tyrasd merged commit eb79045 into master Nov 22, 2018
tyrasd deleted the quantiles branch November 22, 2018 09:17