complex aggregator based on http://datasketches.github.io #1897
2cf7712 to d4ddc5e
## DataSketches aggregator
Druid aggregators based on [datasketches]()http://datasketches.github.io/) library. You would ingest one or more metric columns using either `sketchBuild` or `sketchMerge` aggregators. Then, at query time, you can use `sketchMerge` with appropriate post aggregators described below. Note that sketch algorithms are approxiate, see details in the [datasketches doc](http://datasketches.github.io/docs/theChallenge.html).
What algorithm is actually being run behind the covers? `sketchMerge` is a bit confusing.
I think a theta sketch, which is a more general version of KMV, is being used. Is that true?
If so, can we call the aggregators `thetaIngest` and `theta`?
The names here have some historical significance: they have been in use since the inception of this module, and many people depend on them.
That said, I think it will be possible to introduce new names while still supporting the old ones, so that most of our client code does not break.
But I believe the names should have `build` or `merge` in them so that it is clear whether they build a fresh sketch or just merge existing sketches. For example, `sketchMerge` at ingestion time is used when the user has already produced sketches as part of their batch pipeline, so the input to Druid already contains sketches. Having `ingest` in the name might also be misleading at times, since the `sketchMerge` aggregator is used both at ingestion time and at query time.
Yes, the algorithm used is a theta sketch, a variant of KMV.
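To make the build/merge distinction above concrete, here is a minimal KMV-style sketch in Python. It is an illustration only, not the DataSketches library's implementation and not Druid's API: the function names `build`, `merge`, and `estimate`, the SHA-256 hash, and the default `k` are all assumptions chosen for this example.

```python
import hashlib

def _hash01(value: str) -> float:
    """Map a value to a pseudo-uniform float in [0, 1) (illustrative hash choice)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def build(values, k=256):
    """Build a fresh sketch from raw values: the k smallest distinct hashes."""
    return sorted({_hash01(v) for v in values})[:k]

def merge(sketches, k=256):
    """Merge existing sketches (a union): keep the k smallest hashes overall."""
    return sorted(set().union(*sketches))[:k]

def estimate(sketch, k=256):
    """KMV estimate: exact below k entries, otherwise (k - 1) / k-th smallest hash."""
    if len(sketch) < k:
        return float(len(sketch))
    return (k - 1) / sketch[k - 1]

# Two "segments" built independently from raw values, then merged later.
part_a = build(f"user-{i}" for i in range(0, 10_000))
part_b = build(f"user-{i}" for i in range(5_000, 15_000))
combined = merge([part_a, part_b])
print(round(estimate(combined)))  # approximate; the true distinct count is 15000
```

Merging two sketches gives the same result as building one sketch over the combined raw data, which is why the same merge step can serve both pre-sketched input at ingestion time and roll-up at query time.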
you have an extra ) here
approxiate => approximate
This looks cool overall but the test coverage looks really sparse at first glance.
7905c2b to 9201e44
## DataSketches aggregator
Druid aggregators based on [datasketches]()http://datasketches.github.io/) library. Note that sketch algorithms are approxiate, see details in the [datasketches doc](http://datasketches.github.io/docs/theChallenge.html).
typo: approximate
the URL also stopped working
I think a high-level description of when to use the aggregators and post aggregators is required.
please also provide an example of how to ingest data with theta sketch
👍 after comments around documenting usage are fixed
9201e44 to 4823b12
👍
4823b12 to b1768c0
@fjy updated the doc with more explanation and examples. I believe this is ready to merge now.
b1768c0 to 0262961
old names are still valid though so as to be backwards compatible for now
0262961 to 7788f7c
will merge after travis
complex aggregator based on http://datasketches.github.io
these aggregators are similar to hyperUnique in terms of functionality, but also provide arbitrary set operations on underlying sketches via a post aggregator.
We will formally announce it with a blog post some time in November.
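As a rough illustration of what set operations on sketches mean, the following Python sketch models a theta-style sketch as a threshold plus the retained hashes below it, and derives intersection and difference from two such sketches. Everything here is an assumption made for the example (the names `build`, `intersect`, `difference`, `estimate`, the SHA-256 hash, and `k`); it is not the DataSketches implementation or the Druid post aggregator itself.

```python
import hashlib

def hash01(value: str) -> float:
    """Map a value to a pseudo-uniform float in [0, 1) (illustrative hash choice)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def build(values, k=1024):
    """Return (theta, hashes): every distinct hash below the threshold theta."""
    hashes = sorted({hash01(v) for v in values})
    if len(hashes) <= k:
        return 1.0, set(hashes)       # nothing dropped, so the sketch is exact
    return hashes[k], set(hashes[:k])  # (k+1)-th smallest hash becomes theta

def intersect(a, b):
    """A AND B: hashes present in both sketches, below the smaller theta."""
    theta = min(a[0], b[0])
    return theta, {h for h in a[1] & b[1] if h < theta}

def difference(a, b):
    """A NOT B: hashes in A but not in B, below the smaller theta."""
    theta = min(a[0], b[0])
    return theta, {h for h in a[1] - b[1] if h < theta}

def estimate(sketch):
    """Distinct-count estimate: retained hashes scaled up by 1 / theta."""
    theta, hashes = sketch
    return len(hashes) / theta

a = build(f"user-{i}" for i in range(0, 20_000))
b = build(f"user-{i}" for i in range(10_000, 30_000))
print(round(estimate(intersect(a, b))))   # approximate; the true overlap is 10000
print(round(estimate(difference(a, b))))  # approximate; the true difference is 10000
```

Because the result of each operation is itself a (theta, hashes) pair, operations can be chained, which is the property that makes arbitrary set expressions possible.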