Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure unique names for aggregators / post-aggregators + optimize groupBy post-aggregators #1044

Merged
merged 1 commit into from
Mar 11, 2015

Conversation

xvrl
Copy link
Member

@xvrl xvrl commented Jan 17, 2015

@drcrallen
Copy link
Contributor

Fixes #1045 via metamx@60e065d

@nishantmonu51
Copy link
Member

+1

@drcrallen
Copy link
Contributor

@xvrl : can this be a 0.7.1 fix?

@xvrl xvrl added this to the 0.7.1 milestone Feb 23, 2015
@xvrl xvrl changed the title Tests for groupBy with post-aggregators on complex metrics with the same name Tests + fix for groupBy with post-aggregators on complex metrics with the same name Feb 23, 2015
@xvrl
Copy link
Member Author

xvrl commented Feb 23, 2015

yup, added it to 0.7.1

On Mon, Feb 23, 2015 at 8:53 AM, Charles Allen notifications@github.com
wrote:

@xvrl https://github.com/xvrl : can this be a 0.7.1 fix?


Reply to this email directly or view it on GitHub
#1044 (comment).

@fjy
Copy link
Contributor

fjy commented Feb 25, 2015

Can I propose a simpler fix of just throwing a meaningful exception when agg/postagg match?

@drcrallen
Copy link
Contributor

@fjy : That is possible, but be aware that the current proposed fix overwrites the prior values with new ones. This means that if someone has complex aggregations and some other post-aggregations, the post aggs will overwrite the aggregators in the results set, this saves on ser/de when sending back to the client.

@fjy
Copy link
Contributor

fjy commented Mar 2, 2015

@cheddar @himanshug @xvrl Any thoughts on this one? In particular with my last set of comments.

@himanshug
Copy link
Contributor

I think I agree with @fjy in keeping it simple and just not allowing aggregator and post-aggregator of same names. Semantics are then simpler to understand/explain.

@drcrallen savings on serde, if required, can be attained by fixing #688

@drcrallen
Copy link
Contributor

@cheddar / @fjy / @himanshug / @xvrl
This changes behavior of the query engine and could break existing queries, but now implements the behavior as suggested by @fjy / @himanshug

@@ -214,4 +214,64 @@ public void testGroupByWithApproximateHistogramAgg()
Iterable<Row> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
TestHelper.assertExpectedObjects(expectedResults, results, "approx-histo");
}

@Test(expected = java.lang.IllegalArgumentException.class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit, but java.lang isn't really required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, and in 1 other spot.

@himanshug
Copy link
Contributor

LGTM besides the minor comment.

@drcrallen
Copy link
Contributor

To be clear, the "fix" for the issue is still in the PR, so the underlying logic can support group by queries where the aggregate and post aggregate are named the same but now they simply refuse to.

@@ -64,6 +68,40 @@ public String apply(AggregatorFactory input)
);
combinedAggNames.add(postAgg.getName());
}

final Collection<String> aggAndPostAgg = Sets.intersection(
Sets.<String>newHashSet(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this the same as combinedAggNames?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the initial value I mean

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had originally taken this as a separate group of code to separate functionality, but it actually doesn't look too bad when I put them in the same logic area.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 52.11% when pulling 4393876087cf55b0fb64fae4a44bab5cc3040704 on metamx:groupby-postagg-naming-bug into c7ec8ba on druid-io:master.

@drcrallen
Copy link
Contributor

Modified this a little to also do the same check for aggregators (that aggregator names must be unique)

@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) to 52.13% when pulling 92de15d6b2fa7ae3a738e8515046bdc68f8a7883 on metamx:groupby-postagg-naming-bug into 9f242ed on druid-io:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.06%) to 52.18% when pulling e92e664800b4e7d9b37940e1e71c6e30dbd6fb3c on metamx:groupby-postagg-naming-bug into 9f242ed on druid-io:master.

final MapBasedRow inputRow = (MapBasedRow) input;
final Map<String, Object> values = Maps.newHashMap(inputRow.getEvent());
for (AggregatorFactory agg : query.getAggregatorSpecs()) {
if (postNames.contains(agg.getName())) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove this since we don't allow same names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaulted to pre-PR behavior for this.

query.getDimensions(),
query.getAggregatorSpecs(),
// Don't do post aggs until the end of this method. ISSUE #1045
ImmutableList.<PostAggregator>of(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is probably misleading now, since this is more of an optimization rather than fixing a bug now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.04%) to 52.08% when pulling f6778651a0bf42525800bc25e7ea31a76376eaf2 on metamx:groupby-postagg-naming-bug into 9f242ed on druid-io:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 52.14% when pulling f6778651a0bf42525800bc25e7ea31a76376eaf2 on metamx:groupby-postagg-naming-bug into 9f242ed on druid-io:master.

@xvrl xvrl changed the title Tests + fix for groupBy with post-aggregators on complex metrics with the same name Ensure unique names for aggregators / post-aggregators + optimize groupBy post-aggregators Mar 10, 2015
@@ -1152,6 +1191,62 @@ public void testGroupByWithOrderLimit4()
}

@Test
public void testPostAggHavingSpec()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this only tests post-aggregators in the context of merging results.
Can we add a separate one that does it without merging, i.e. that checks the results of GroupByQueryRunnerTestHelper.runQuery directly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 52.1% when pulling 1ee7e4898a7b246e71298ca6f5513904e84b7cd1 on metamx:groupby-postagg-naming-bug into 9f242ed on druid-io:master.

* add test for same-name groupBy hyperUniques post-agg
* add test for same-name post-agg in groupby with approx histogram
* Fixes apache#1045
* Throws an error if post aggs and aggs do not have unique names
* Add more groupBy tests for Having filters
xvrl added a commit that referenced this pull request Mar 11, 2015
Ensure unique names for aggregators / post-aggregators + optimize groupBy post-aggregators
@xvrl xvrl merged commit c97fbdf into apache:master Mar 11, 2015
@xvrl xvrl deleted the groupby-postagg-naming-bug branch March 11, 2015 03:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GroupBy with complex metrics fails if there is a post-aggregator with the same name
6 participants