-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure unique names for aggregators / post-aggregators + optimize groupBy post-aggregators #1044
Conversation
xvrl
commented
Jan 17, 2015
- Ensure names for aggregators and post-aggregators are unique
- Fixes GroupBy with complex metrics fails if there is a post-aggregator with the same name #1045 by not allowing the same name for both post-aggregators and aggregators
- Optimizes post-aggregator calculation for groupBy
726d272
to
743316d
Compare
743316d
to
60e065d
Compare
8b0ec82
to
d05032b
Compare
Fixes #1045 via metamx@60e065d |
+1 |
@xvrl : can this be a 0.7.1 fix? |
yup, added it to 0.7.1 On Mon, Feb 23, 2015 at 8:53 AM, Charles Allen notifications@github.com
|
Can I propose a simpler fix of just throwing a meaningful exception when agg/postagg match? |
@fjy : That is possible, but be aware that the current proposed fix overwrites the prior values with new ones. This means that if someone has complex aggregations and some other post-aggregations, the post aggs will overwrite the aggregators in the results set, this saves on ser/de when sending back to the client. |
@cheddar @himanshug @xvrl Any thoughts on this one? In particular with my last set of comments. |
I think I agree with @fjy in keeping it simple and just not allowing aggregator and post-aggregator of same names. Semantics are then simpler to understand/explain. @drcrallen savings on serde, if required, can be attained by fixing #688 |
60e065d
to
4393876
Compare
@cheddar / @fjy / @himanshug / @xvrl |
@@ -214,4 +214,64 @@ public void testGroupByWithApproximateHistogramAgg() | |||
Iterable<Row> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query); | |||
TestHelper.assertExpectedObjects(expectedResults, results, "approx-histo"); | |||
} | |||
|
|||
@Test(expected = java.lang.IllegalArgumentException.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit, but java.lang isn't really required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, and in 1 other spot.
LGTM besides the minor comment. |
4393876
to
e92e664
Compare
To be clear, the "fix" for the issue is still in the PR, so the underlying logic can support group by queries where the aggregate and post aggregate are named the same but now they simply refuse to. |
@@ -64,6 +68,40 @@ public String apply(AggregatorFactory input) | |||
); | |||
combinedAggNames.add(postAgg.getName()); | |||
} | |||
|
|||
final Collection<String> aggAndPostAgg = Sets.intersection( | |||
Sets.<String>newHashSet( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this the same as combinedAggNames?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the initial value I mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I had originally taken this as a separate group of code to separate functionality, but it actually doesn't look too bad when I put them in the same logic area.
Coverage decreased (-0.02%) to 52.11% when pulling 4393876087cf55b0fb64fae4a44bab5cc3040704 on metamx:groupby-postagg-naming-bug into c7ec8ba on druid-io:master. |
e92e664
to
92de15d
Compare
Modified this a little to also do the same check for aggregators (that aggregator names must be unique) |
Coverage increased (+0.01%) to 52.13% when pulling 92de15d6b2fa7ae3a738e8515046bdc68f8a7883 on metamx:groupby-postagg-naming-bug into 9f242ed on druid-io:master. |
Coverage increased (+0.06%) to 52.18% when pulling e92e664800b4e7d9b37940e1e71c6e30dbd6fb3c on metamx:groupby-postagg-naming-bug into 9f242ed on druid-io:master. |
final MapBasedRow inputRow = (MapBasedRow) input; | ||
final Map<String, Object> values = Maps.newHashMap(inputRow.getEvent()); | ||
for (AggregatorFactory agg : query.getAggregatorSpecs()) { | ||
if (postNames.contains(agg.getName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove this since we don't allow same names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaulted to pre-PR behavior for this.
92de15d
to
89af08c
Compare
query.getDimensions(), | ||
query.getAggregatorSpecs(), | ||
// Don't do post aggs until the end of this method. ISSUE #1045 | ||
ImmutableList.<PostAggregator>of(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is probably misleading now, since this is more of an optimization rather than fixing a bug now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
89af08c
to
f677865
Compare
Coverage decreased (-0.04%) to 52.08% when pulling f6778651a0bf42525800bc25e7ea31a76376eaf2 on metamx:groupby-postagg-naming-bug into 9f242ed on druid-io:master. |
Coverage increased (+0.02%) to 52.14% when pulling f6778651a0bf42525800bc25e7ea31a76376eaf2 on metamx:groupby-postagg-naming-bug into 9f242ed on druid-io:master. |
@@ -1152,6 +1191,62 @@ public void testGroupByWithOrderLimit4() | |||
} | |||
|
|||
@Test | |||
public void testPostAggHavingSpec() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this only tests post-aggregators in the context of merging results.
Can we add a separate one that does it without merging, i.e. that checks the results of GroupByQueryRunnerTestHelper.runQuery directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
Coverage decreased (-0.02%) to 52.1% when pulling 1ee7e4898a7b246e71298ca6f5513904e84b7cd1 on metamx:groupby-postagg-naming-bug into 9f242ed on druid-io:master. |
* add test for same-name groupBy hyperUniques post-agg * add test for same-name post-agg in groupby with approx histogram * Fixes apache#1045 * Throws an error if post aggs and aggs do not have unique names * Add more groupBy tests for Having filters
1fd6b33
to
217e674
Compare
Ensure unique names for aggregators / post-aggregators + optimize groupBy post-aggregators