average aggregator in both ingestion phase and query phase #3859

kaijianding · 2017-01-18T06:51:26Z

This PR is inspired by #2525.
The avg aggregator can be used in both the ingestion phase and the query phase.
The pre-averaged column can then be used again in doubleSum and similar aggregators. This is useful if a user only wants to average during rollup but treat the result as a normal double metric at query time.

gianm · 2017-02-16T21:23:13Z

@kaijianding, a couple of questions:

Does this provide any benefit over using two aggregators (count, sum) and a post-aggregator to divide them?
Are you using this in production?

kaijianding · 2017-02-17T03:00:47Z

The benefits are:

a simpler query
it can be used in rollup and combined with doubleSum/Max/Min/etc.

In our scenario, we want to average during rollup (averaging only rows whose timeAndDims are the same), but then sum/max/min the averaged column as if it were a float-type GenericColumn. In this case, sum1/count1+sum2/count2+sum3/count3 != (sum1+sum2+sum3)/(count1+count2+count3).
For max(__avg) and min(__avg), I can't even figure out how to write the Druid query using a postAgg.

We are using this in pre-production for now and will go to production when the whole product is finished.
@gianm
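For illustration, here is a minimal sketch of the arithmetic behind that inequality. The class name, method names, and sample numbers are hypothetical and not taken from the PR; the sketch only contrasts aggregating a pre-averaged column with computing one global average from sums and counts.

```java
// Hypothetical illustration; none of these names come from the PR.
final class AvgRollupSketch {
  // Sum of per-row averages: roughly what a doubleSum over a
  // pre-averaged column would produce.
  static double sumOfAverages(double[] sums, long[] counts) {
    double total = 0;
    for (int i = 0; i < sums.length; i++) {
      total += sums[i] / counts[i];
    }
    return total;
  }

  // One global average: what a (sum / count) post-aggregator produces.
  static double globalAverage(double[] sums, long[] counts) {
    double sum = 0;
    long count = 0;
    for (int i = 0; i < sums.length; i++) {
      sum += sums[i];
      count += counts[i];
    }
    return sum / count;
  }

  // Max of per-row averages: needs the per-row average to exist as a value.
  static double maxOfAverages(double[] sums, long[] counts) {
    double max = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < sums.length; i++) {
      max = Math.max(max, sums[i] / counts[i]);
    }
    return max;
  }

  public static void main(String[] args) {
    double[] sums = {10, 30, 60};
    long[] counts = {2, 3, 4};
    System.out.println(sumOfAverages(sums, counts)); // per-row averages are 5, 10, 15
    System.out.println(globalAverage(sums, counts)); // 100 / 9
    System.out.println(maxOfAverages(sums, counts));
  }
}
```

With these sample rows the per-row averages are 5, 10 and 15, so summing them gives 30 while the global average is 100/9, which is the difference the comment describes.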

gianm · 2017-02-23T17:17:41Z

thanks for the explanation @kaijianding.

b-slim · 2017-02-23T17:52:23Z

@kaijianding I think this might confuse Druid users, for the reason you highlighted:

sum1/count1+sum2/count2+sum3/count3 != (sum1+sum2+sum3)/(count1+count2+count3).

I think it would be better to have it as a module, maybe? Plus more documentation about how it differs from the conventional avg done via sum/count.

jon-wei

Reviewed this partially, had a few comments on specific parts of the code, will continue review later

I also agree with @b-slim, I think it would be good to see a bit of documentation on what use cases this enables beyond the existing (sum / count) method for calculating averages

jon-wei · 2017-03-01T02:44:32Z

processing/src/main/java/io/druid/query/aggregation/avg/AvgAggregatorFactory.java

+      return Aggregators.noopBufferAggregator();
+    }
+
+    if ("float".equalsIgnoreCase(inputType)) {


Suggest making inputType an enum with a fromString() method for deserialization (see ValueType for an example); this block can then be a switch.
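A minimal sketch of what that suggestion could look like follows. The enum name `AvgInputType`, its `LONG` constant, and the `describe` helper are assumptions made for illustration; only the `"float"` value appears in the diff above. In the real factory the `fromString()` method would presumably also carry a Jackson creator annotation, which is left out here to keep the sketch self-contained.

```java
import java.util.Locale;

// Hypothetical enum; only "float" is confirmed by the diff, LONG is assumed.
enum AvgInputType {
  FLOAT,
  LONG;

  // Case-insensitive parsing, mirroring the equalsIgnoreCase check in the diff.
  static AvgInputType fromString(String name) {
    if (name == null) {
      throw new IllegalArgumentException("inputType must not be null");
    }
    // valueOf throws IllegalArgumentException for unknown names.
    return valueOf(name.toUpperCase(Locale.ROOT));
  }
}

final class InputTypeSwitchSketch {
  // Stand-in for the factory method that picks an aggregator per input type.
  static String describe(String inputType) {
    switch (AvgInputType.fromString(inputType)) {
      case FLOAT:
        return "float-based aggregator";
      case LONG:
        return "long-based aggregator";
      default:
        throw new IllegalArgumentException("unsupported inputType: " + inputType);
    }
  }
}
```

Parsing once into an enum means an unknown type fails early with a clear exception instead of silently falling through a chain of string comparisons.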

jon-wei · 2017-03-01T02:53:26Z

processing/src/test/java/io/druid/query/aggregation/avg/AvgGroupByQueryTest.java

+    return GroupByQueryRunnerTest.constructorFeeder();
+  }
+
+  public AvgGroupByQueryTest(


Suggest moving these and other "avg agg with query" tests into the main query test suites instead of keeping them in separate files. I think it would be more convenient and easier to validate code changes if the tests for a query type are all together; otherwise a dev would have to remember to find and run all the separate 'does feature X work with query type Y' test suites.

jon-wei · 2017-03-01T03:01:01Z

processing/src/main/java/io/druid/query/aggregation/avg/AvgAggregatorCollector.java

+      return holder1;
+    }
+    if (holder1.count == 0) {
+      holder1.count = holder2.count;


Could this just return holder2 instead of copying its values into holder1?
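As a sketch of the question above, the combine step could return `holder2` directly when `holder1` is empty. The `Holder` class below is a hypothetical stand-in for the PR's collector, and the first branch (`holder2.count == 0`) is inferred from the truncated diff rather than confirmed by it.

```java
// Hypothetical stand-in for the PR's collector; not the actual class.
final class CombineSketch {
  static final class Holder {
    long count;
    double sum;

    Holder(long count, double sum) {
      this.count = count;
      this.sum = sum;
    }
  }

  static Holder combine(Holder holder1, Holder holder2) {
    if (holder2.count == 0) {
      return holder1; // assumed shape of the branch cut off in the diff
    }
    if (holder1.count == 0) {
      return holder2; // return the other holder instead of copying its fields
    }
    // Both sides have data: fold holder2 into holder1.
    holder1.count += holder2.count;
    holder1.sum += holder2.sum;
    return holder1;
  }
}
```

Returning `holder2` is only safe if callers do not keep mutating it afterwards, since the result then aliases the argument.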

jon-wei

Had some more comments, will do another pass of review

jon-wei · 2017-03-02T22:40:50Z

processing/src/main/java/io/druid/query/aggregation/avg/AvgAggregatorFactory.java

+  @Override
+  public byte[] getCacheKey()
+  {
+    byte[] fieldNameBytes = com.metamx.common.StringUtils.toUtf8(fieldName);


This should drop the com.metamx.common. prefix and use the StringUtils within Druid.

jon-wei · 2017-03-02T22:42:09Z

processing/src/main/java/io/druid/query/aggregation/avg/AvgAggregatorFactory.java

+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import com.metamx.common.IAE;


This should use io.druid.java.util.common.IAE instead

jon-wei · 2017-03-02T22:52:47Z

processing/src/main/java/io/druid/query/aggregation/avg/AvgAggregatorCollector.java

+  public double compute()
+  {
+    if (count == 0) {
+      throw new IllegalStateException("should not be empty holder");


I think this should return NaN or 0 instead of throwing the exception. It's possible for an aggregator to not receive any values (suppose the AvgAggregator is wrapped in a FilteredAggregator that doesn't match any rows in a particular segment).

Can you also add a test for this case?
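A minimal sketch of the NaN variant of that suggestion, using a hypothetical `Collector` class rather than the PR's `AvgAggregatorCollector`. Whether NaN or 0 is the better empty value is left open in the review; NaN is chosen here only as one of the two options mentioned.

```java
// Hypothetical collector; field names follow the diff, the rest is assumed.
final class EmptyAvgSketch {
  static final class Collector {
    long count;  // number of elements
    double sum;  // running total

    void add(double value) {
      count++;
      sum += value;
    }

    // Returns NaN for an empty collector instead of throwing.
    double compute() {
      if (count == 0) {
        return Double.NaN;
      }
      return sum / count;
    }
  }
}
```

A test for the empty case could then assert `Double.isNaN(new EmptyAvgSketch.Collector().compute())`, which corresponds to an aggregator whose filter matched no rows.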

jon-wei · 2017-03-02T23:09:18Z

processing/src/main/java/io/druid/query/aggregation/avg/AvgAggregatorCollector.java

+    return Longs.BYTES + Doubles.BYTES;
+  }
+
+  long count; // number of elements


let's have these use getters/setters

jon-wei · 2017-03-02T23:15:19Z

processing/src/main/java/io/druid/query/aggregation/avg/AvgBufferAggregator.java

+  {
+    long count = buf.getLong(position + COUNT_OFFSET);
+    double sum = buf.getDouble(position + SUM_OFFSET);
+    return (float) sum / count;


this should check for count == 0

jon-wei · 2017-03-02T23:15:25Z

processing/src/main/java/io/druid/query/aggregation/avg/AvgBufferAggregator.java

+  {
+    long count = buf.getLong(position + COUNT_OFFSET);
+    double sum = buf.getDouble(position + SUM_OFFSET);
+    return (long) sum / count;


this should check for count == 0
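The two buffer reads above could be guarded roughly as follows. This is a sketch under assumptions: the offset values and the choice of NaN and 0 as empty results are not given in the diff. Note that in the original lines the cast binds tighter than the division, so `(long) sum / count` is a long division that throws ArithmeticException when count is 0, while `(float) sum / count` yields NaN or infinity.

```java
import java.nio.ByteBuffer;

// Hypothetical layout: count (long) followed by sum (double). Offsets are assumed.
final class AvgBufferReadSketch {
  static final int COUNT_OFFSET = 0;
  static final int SUM_OFFSET = Long.BYTES;

  static float getFloat(ByteBuffer buf, int position) {
    long count = buf.getLong(position + COUNT_OFFSET);
    if (count == 0) {
      return Float.NaN; // assumed empty value; the review leaves NaN vs 0 open
    }
    double sum = buf.getDouble(position + SUM_OFFSET);
    // Divide in double precision first, then narrow the result.
    return (float) (sum / count);
  }

  static long getLong(ByteBuffer buf, int position) {
    long count = buf.getLong(position + COUNT_OFFSET);
    if (count == 0) {
      return 0L; // a long cannot represent NaN, so 0 is used here
    }
    double sum = buf.getDouble(position + SUM_OFFSET);
    return (long) (sum / count);
  }
}
```

Parenthesizing the division also avoids truncating the sum before dividing, which the original `getLong` line would do.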

jon-wei

@kaijianding Thanks for the PR, these are all the comments I have for now

jon-wei · 2017-03-02T23:58:47Z

processing/src/main/java/io/druid/query/aggregation/avg/AvgAggregatorCollector.java

+    @Override
+    public int compare(AvgAggregatorCollector o1, AvgAggregatorCollector o2)
+    {
+      int compare = Longs.compare(o1.count, o2.count);


This comparator should compare the computed averages from o1 and o2 instead of comparing the count/sum separately, otherwise anything sorting by an AvgAggregator could get results in an incorrect order, e.g.:

AvgAgg(count=5, sum=100) should be less than AvgAgg(count=1, sum=500000)

Can you add a test for this behavior where an AvgAggregator determines the sorting order?
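A sketch of a comparator that orders by the computed average, using the two sample values from the comment above. The `Holder` class and the `BY_AVERAGE` name are hypothetical; the PR's collector may compute its average differently.

```java
import java.util.Comparator;

// Hypothetical holder; only count and sum are taken from the diff.
final class AvgComparatorSketch {
  static final class Holder {
    final long count;
    final double sum;

    Holder(long count, double sum) {
      this.count = count;
      this.sum = sum;
    }

    double compute() {
      return count == 0 ? Double.NaN : sum / count;
    }
  }

  // Orders holders by their average rather than by count first.
  // Double.compare places NaN (empty holders) after every other value.
  static final Comparator<Holder> BY_AVERAGE =
      Comparator.comparingDouble(Holder::compute);

  public static void main(String[] args) {
    Holder small = new Holder(5, 100);    // average 20
    Holder large = new Holder(1, 500000); // average 500000
    System.out.println(BY_AVERAGE.compare(small, large) < 0);
    // Comparing counts first, as in the diff, would order them the other way.
    System.out.println(Long.compare(small.count, large.count) > 0);
  }
}
```

A sorting test along these lines would fail against the count-first comparator, since 5 is greater than 1 even though the first average is far smaller.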

shahsahil · 2019-01-30T08:01:20Z

@kaijianding any particular reason you didn't follow up on this?
I was looking for this functionality.

stale · 2019-03-31T08:04:33Z

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

stale · 2019-04-07T08:09:14Z

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

kaijianding force-pushed the avg branch from 5a5b019 to 95181f5 Compare January 18, 2017 11:33

kaijianding force-pushed the avg branch 2 times, most recently from 609c036 to f12a728 Compare February 3, 2017 14:03

average aggregator in both ingestion phase and query phase

88cd2aa

kaijianding force-pushed the avg branch from f12a728 to 88cd2aa Compare February 16, 2017 13:29

gianm requested review from gianm and jon-wei February 23, 2017 17:07

gianm added the Feature label Feb 23, 2017

jon-wei reviewed Mar 1, 2017


jon-wei reviewed Mar 2, 2017


jon-wei requested changes Mar 3, 2017


stale bot added the stale label Mar 31, 2019

stale bot closed this Apr 7, 2019
