complex aggregator based on http://datasketches.github.io #1897

Merged
merged 6 commits into from
Nov 12, 2015

himanshug
Contributor

These aggregators are similar to hyperUnique in terms of functionality, but they also provide arbitrary set operations on the underlying sketches via a post aggregator.

We will formally announce it with a blog post sometime in November.
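
To make the "arbitrary set operations" claim concrete, here is a minimal toy sketch of how union, intersection, and difference can be estimated from theta-style sketches. It is not the DataSketches implementation or the code in this PR: the hash function, the `(theta, retained)` tuple layout, and all function names are assumptions chosen for illustration.

```python
import hashlib


def hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) (hash choice is illustrative)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def sketch(values, k):
    """Return (theta, retained), where retained holds every hash strictly below theta."""
    hashes = sorted({hash01(v) for v in values})
    if len(hashes) <= k:
        return 1.0, set(hashes)      # nothing dropped, so the sketch is exact
    theta = hashes[k]                # the (k+1)-th smallest hash becomes the threshold
    return theta, set(hashes[:k])


def union(a, b):
    theta = min(a[0], b[0])
    return theta, {h for h in a[1] | b[1] if h < theta}


def intersect(a, b):
    theta = min(a[0], b[0])
    return theta, {h for h in a[1] & b[1] if h < theta}


def a_not_b(a, b):
    theta = min(a[0], b[0])
    return theta, {h for h in a[1] - b[1] if h < theta}


def estimate(s):
    """Estimated distinct count: retained hashes scaled up by the sampling threshold."""
    theta, retained = s
    return len(retained) / theta


if __name__ == "__main__":
    a = sketch(range(0, 100_000), k=1024)
    b = sketch(range(50_000, 150_000), k=1024)
    print("union     ~", round(estimate(union(a, b))))
    print("intersect ~", round(estimate(intersect(a, b))))
    print("a not b   ~", round(estimate(a_not_b(a, b))))
```

Every set operation lowers theta to the smaller of the two inputs, so both sides are compared over the same sampled range of the hash space.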

---

## DataSketches aggregator
Druid aggregators based on [datasketches]()http://datasketches.github.io/) library. You would ingest one or more metric columns using either `sketchBuild` or `sketchMerge` aggregators. Then, at query time, you can use `sketchMerge` with appropriate post aggregators described below. Note that sketch algorithms are approxiate, see details in the [datasketches doc](http://datasketches.github.io/docs/theChallenge.html).
Contributor

What algorithm is actually being run behind the covers? sketchMerge is a bit confusing.

Contributor

I think a theta sketch, which is a more general version of KMV, is being used. Is that true?

If so, can we call the aggregators thetaIngest and theta?

Contributor Author

The names here have some historical significance: they have been in use since the inception of this module, and many people rely on them.
That said, I think it will be possible to introduce new names while still supporting the old ones, so that most of our client code does not break.
But I believe the names should contain build or merge so that it is clear whether they build a fresh sketch or just merge existing sketches (e.g. sketchMerge is used at ingestion time when users have already produced sketches as part of their batch pipeline, so the input to Druid already contains sketches). Also, having ingest in the name might be misleading at times, e.g. the sketchMerge aggregator is used at both ingestion time and query time.

Yes, the algorithm used is the theta sketch, a variant of KMV.
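
For readers unfamiliar with KMV, the build-versus-merge distinction can be illustrated with a toy K-Minimum-Values sketch. This is only a rough sketch under stated assumptions, not the DataSketches theta sketch or the aggregator code in this PR: the class name `ToyThetaSketch`, the SHA-256 based hash, and the default `k` are all hypothetical.

```python
import hashlib


def hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) (hash choice is illustrative)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class ToyThetaSketch:
    """Keeps the k smallest hash values seen; a teaching aid, not the real library."""

    def __init__(self, k=1024, hashes=()):
        self.k = k
        self.hashes = set(sorted(set(hashes))[:k])

    @classmethod
    def build(cls, values, k=1024):
        """'Build' path: create a fresh sketch from raw column values."""
        return cls(k, (hash01(v) for v in values))

    def merge(self, other):
        """'Merge' path: combine two existing sketches; no raw values are needed."""
        return ToyThetaSketch(min(self.k, other.k), self.hashes | other.hashes)

    def estimate(self):
        if len(self.hashes) < self.k:
            return float(len(self.hashes))       # exact while under capacity
        return (self.k - 1) / max(self.hashes)   # classic KMV estimator


if __name__ == "__main__":
    # Two sketches built separately (e.g. in a batch pipeline), then merged.
    monday = ToyThetaSketch.build(range(0, 60_000), k=256)
    tuesday = ToyThetaSketch.build(range(40_000, 100_000), k=256)
    print("merged estimate ~", round(monday.merge(tuesday).estimate()))
```

Merging two sketches of the same size yields the same retained hashes as building one sketch over the combined input, which is why pre-built sketches can be merged at ingestion time and again at query time.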

Contributor

you have an extra ) here

Contributor

approxiate => approximate

@drcrallen
Contributor

This looks cool overall but the test coverage looks really sparse at first glance.

@himanshug himanshug force-pushed the new_sketch_aggregation branch 2 times, most recently from 7905c2b to 9201e44 Compare November 10, 2015 08:06
---

## DataSketches aggregator
Druid aggregators based on [datasketches]()http://datasketches.github.io/) library. Note that sketch algorithms are approxiate, see details in the [datasketches doc](http://datasketches.github.io/docs/theChallenge.html).
Contributor

typo: approximate

Contributor

the URL also stopped working

Contributor

I think a high-level description of when to use the aggregators and post aggregators is required.

Contributor

please also provide an example of how to ingest data with theta sketch
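
As a rough idea of what such an ingestion example might look like, the snippet below assembles the two aggregator entries as JSON. The aggregator types `sketchBuild` and `sketchMerge` come from this PR's doc text; the `name`/`fieldName` keys follow the usual Druid aggregator layout and are assumed here rather than confirmed against the merged documentation, and the column names `user_id` and `user_id_sketch` are hypothetical.

```python
import json

# Ingestion time: build a sketch from a raw column.
# "user_id" and "user_id_sketch" are hypothetical column names.
ingest_metrics_spec = [
    {"type": "sketchBuild", "name": "user_id_sketch", "fieldName": "user_id"},
]

# Query time: merge the sketches that were stored at ingestion.
# The key layout is an assumption; check the merged docs before relying on it.
query_aggregations = [
    {"type": "sketchMerge", "name": "unique_users", "fieldName": "user_id_sketch"},
]

print(json.dumps({"metricsSpec": ingest_metrics_spec}, indent=2))
print(json.dumps({"aggregations": query_aggregations}, indent=2))
```

If the input already contains sketches produced upstream, the ingestion entry would presumably use `sketchMerge` instead of `sketchBuild`, as described earlier in this thread.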

@fjy
Contributor

fjy commented Nov 10, 2015

👍 after comments around documenting usage are fixed

@cheddar
Contributor

cheddar commented Nov 12, 2015

👍

@himanshug
Contributor Author

@fjy updated the doc with more explanation and examples. I believe this is ready to merge now.

@fjy
Contributor

fjy commented Nov 12, 2015

will merge after travis

@fjy fjy closed this Nov 12, 2015
@fjy fjy reopened this Nov 12, 2015
fjy added a commit that referenced this pull request Nov 12, 2015
@fjy fjy merged commit 148153b into apache:master Nov 12, 2015
@gianm gianm added this to the 0.8.3 milestone Dec 1, 2015
This was referenced Dec 1, 2015
@himanshug himanshug deleted the new_sketch_aggregation branch December 5, 2015 17:16

6 participants