Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use GroupBy V2 as default #3953

Merged
merged 3 commits into from Feb 18, 2017
Merged

Use GroupBy V2 as default #3953

merged 3 commits into from Feb 18, 2017

Conversation

jon-wei
Copy link
Contributor

@jon-wei jon-wei commented Feb 17, 2017

Makes GroupBy V2 the default and updates some unit tests to work with V2

@jon-wei jon-wei added this to the 0.10.0 milestone Feb 17, 2017
@jon-wei jon-wei requested a review from gianm February 17, 2017 22:25
@@ -318,6 +318,7 @@ private static ValueExtractFunction makeValueExtractFunction(
)
{
final boolean includeTimestamp = GroupByStrategyV2.getUniversalTimestamp(query) == null;
//final boolean includeTimestamp = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, deleted this

@jon-wei jon-wei closed this Feb 17, 2017
@jon-wei jon-wei reopened this Feb 17, 2017
)
);
} catch (Exception e) {
Assert.fail(e.getMessage());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert.fail inside a callback like this seems kinda weird -- usually asserts run inside the main test code. Would it work if this is just a re-throw? Like throw Throwables.propagate(e)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to Throwables.propagate(e)

@@ -112,6 +133,45 @@ public GroupByTimeseriesQueryRunnerTest(QueryRunner runner)
super(runner, false);
}

// GroupBy handles timestamps differently when granularity is ALL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this override needed? Do groupBy v1 and v2 behave differently here?

Copy link
Contributor Author

@jon-wei jon-wei Feb 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GroupBy v1 and v2 behave the same in this case, when the query granularity is set to ALL, the result timestamps will contain the earliest timestamp in the query interval, which differs from Timeseries

Prior to this patch, testFullOnTimeseriesMaxMin() was not executing mergeResults() on GroupBy V1, so the timestamp wasn't truncated to the ALL granularity and the results matched the Timeseries results exactly.

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@b-slim b-slim merged commit bc33b68 into apache:master Feb 18, 2017
@himanshug
Copy link
Contributor

himanshug commented Feb 22, 2017

@gianm @jon-wei
somewhat related to this PR....

after this PR our builds for hadoop indexer test failed to run the groupBy query and failed with exception message "Grouping resources exhausted" from https://github.com/druid-io/druid/blob/master/processing/src/main/java/io/druid/query/groupby/epinephelinae/GroupByMergingQueryRunnerV2.java#L301 with no indication of what the problem was. I added some disk space by druid.query.groupBy.maxOnDiskStorage and the problem disappeared.
that said, it might not be so simple for a regular druid user to understand.
should we change https://github.com/druid-io/druid/blob/master/processing/src/main/java/io/druid/query/groupby/epinephelinae/SpillingGrouper.java#L119 to

throw new ISE(ex, "no space to spill results, please increase druid.query.groupBy.maxOnDiskStorage or druid.processing.buffer.sizeBytes");

@gianm
Copy link
Contributor

gianm commented Feb 22, 2017

Yeah that message could be better. I think nothing else other than temporary-storage-full could cause that, so might as well add that to the error message.

@gianm
Copy link
Contributor

gianm commented Feb 22, 2017

There's a similar message in GroupByRowProcessor that could be changed too

@himanshug
Copy link
Contributor

@gianm wondering why would we modify the message at both places instead of throwing exception with correct message in SpillingGrouper ... is there a case when throwing exception there is not desirable?

@gianm
Copy link
Contributor

gianm commented Feb 23, 2017

@himanshug I guess I avoided exceptions because grouper.aggregate returns false normally and I didn't want to have the overhead of exceptions in that code path. For example that's how BufferGrouper tells SpillingGrouper that it needs to be spilled. Usually I try to avoid exceptions for control flow in query code like that.

How about changing the return type from boolean to something that includes a message or a reason?

gianm added a commit to gianm/druid that referenced this pull request Mar 5, 2017
…ods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since apache#3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.
himanshug pushed a commit that referenced this pull request Mar 6, 2017
…ods. (#4004)

* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.

* Remove unused import.

* Restore buffer size.
gianm added a commit to gianm/druid that referenced this pull request Mar 6, 2017
…ods. (apache#4004)

* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since apache#3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.

* Remove unused import.

* Restore buffer size.
fjy pushed a commit that referenced this pull request Mar 6, 2017
…ods. (#4004) (#4015)

* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.

* Remove unused import.

* Restore buffer size.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants