Skip to content

Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.#4004

Merged
himanshug merged 3 commits intoapache:masterfrom
gianm:fix-chunkperiod
Mar 6, 2017
Merged

Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.#4004
himanshug merged 3 commits intoapache:masterfrom
gianm:fix-chunkperiod

Conversation

@gianm
Copy link
Contributor

@gianm gianm commented Mar 4, 2017

Includes two fixes:

  • groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
    returns a lazy sequence) and it generates incorrect results.
  • Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:

  • groupBy v1 was no longer being tested by GroupByQueryRunnerTest since Use GroupBy V2 as default #3953, now it
    is once again.
  • chunkPeriod documentation was misleading due to its checkered past. Updated it to
    be more accurate.
  • added section on "differences between groupBy v1 and v2"

@gianm gianm added the Bug label Mar 4, 2017
@gianm gianm added this to the 0.10.0 milestone Mar 4, 2017
@gianm gianm requested a review from himanshug March 4, 2017 01:42
@gianm gianm requested a review from fjy March 4, 2017 01:45
@gianm
Copy link
Contributor Author

gianm commented Mar 4, 2017

Added some docs and a new test describing behavior with subqueries + chunkPeriod.

@gianm gianm mentioned this pull request Mar 4, 2017
@fjy
Copy link
Contributor

fjy commented Mar 4, 2017

👍

@fjy
Copy link
Contributor

fjy commented Mar 4, 2017

@gianm tests are failing

@drcrallen
Copy link
Contributor

The interval chunking query decorator is called in the preMergeQueryDecoration methods. Now you're doing merges in the pre-merge. This seems like a violation of the expected query execution ordering.

@gianm
Copy link
Contributor Author

gianm commented Mar 4, 2017

The interval chunking decorator did merges before (it does toolChest.mergeResults) so at least I didn't make it worse! But this patch is no good for other reasons, which I am in the middle of writing up. chunkPeriod has an interesting history…

@gianm
Copy link
Contributor Author

gianm commented Mar 4, 2017

@himanshug @drcrallen @fjy after looking into this more I have learned some facts about chunkPeriod.

  • In the olden days (initial OSS Druid), chunkPeriod was implemented as Sequences.concat(lazyIterableOfChunkSequences) and so queries were executed sequentially. The sequences for later chunks weren't created until earlier chunks had finished, and this had the effect of reducing load on the historicals (since they'd only see part of each query at once).
  • But, TWIST: Druid 0.6.71 included commit 1d1ec1d, which effectively disabled chunkPeriod since it bypasses chunking if period.getMillis() == 0, which is true for any period that doesn't have a millisecond component. To get a nonzero millis component you'd have to use a period like "PT60.001S" which I doubt anyone was doing.
  • Later on, Druid 0.7.1 included interval chunk query runner now processes individual chunk in a threadpool #1150, where chunkPeriod was re-implemented as Sequences.concat(Lists.newArrayList(lazyIterableOfChunkSequences)). Doing Lists.newArrayList creates the sequences eagerly, which sends queries to the cluster in parallel. Additionally, the sequences are created in a thread pool, and the the chunks are all wrapped in toolChest.mergeResults(...) before sequences are created, which for groupBy v1 means each chunk is merged in its own thread in parallel. This patch also fixed the period.getMillis() == 0 check, so chunking started doing something again.
  • So in particular, interval chunk query runner now processes individual chunk in a threadpool #1150 changed chunkPeriod from something that was meant to decrease resource impact on historical nodes (by breaking up long queries and running them sequentially), into something that is meant to break up long queries and run them in parallel, which would either leave resource impact unchanged or increase it, especially on the broker.
  • The docs still say that chunkPeriod breaks queries into "shorter interval queries, reducing the impact on resources". I think the docs are wrong now in that regard.
  • The old behavior was useful at reducing resource impact for any query type, but the current behavior is only useful for parallelizing queries where toolChest.mergeResults(...).run(...) does merging eagerly, which is only groupBy v1. Other types like timeseries, topN, groupBy v2, all do merging lazily, so parallelizing the call to toolChest.mergeResults(...).run(...) isn't useful.

My question for you all: is this analysis correct and if so what should we do about it? Druid 0.6.71 and 0.7.1 are pretty old, so we've had the current behavior for quite a long time, and it seems to have gone un-noticed by anyone that might have been using chunkPeriod for its old purpose.

My vote for 0.10.0 would be to update the docs to reflect reality (chunkPeriod makes queries more resource intensive, not less, by parallelizing merging on the broker, and is only useful for groupBy v1). I'd also suggest including code to ignore chunkPeriod for groupBy v2 queries since it makes the results wrong, and even if that were fixed, it wouldn't really do anything useful anyway.

@fjy
Copy link
Contributor

fjy commented Mar 5, 2017

👍 on updating docs for 0.10.0 to reflect reality

@gianm gianm force-pushed the fix-chunkperiod branch from fb97daf to 3d7e276 Compare March 5, 2017 18:12
@gianm gianm changed the title Fix IntervalChunkingQueryRunner for groupBy v2. Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. Mar 5, 2017
@gianm
Copy link
Contributor Author

gianm commented Mar 5, 2017

@himanshug @drcrallen @fjy Updated PR description and top comment with a new fix, disabling chunkPeriod for groupBy v2. It wouldn't help much anyway (see #4004 (comment)) and doesn't mesh well with how groupBy v2 merges results.

…ods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since apache#3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.
@gianm gianm force-pushed the fix-chunkperiod branch from 3d7e276 to c4d7820 Compare March 5, 2017 18:18
@fjy
Copy link
Contributor

fjy commented Mar 5, 2017

👍

@himanshug
Copy link
Contributor

@gianm thanks for looking it up.

I believe we should also remove it from [Search|Select|TopN|Timeseries]QueryToolChest.preMergeDecoration(..) . I have not seen it being used and helpful anywhere except GroupBy-v1 queries. (we can fix it too but I don't see that worth it)
so effectively, chunkPeriod in its current form would retire with GroupBy-v1.

@gianm
Copy link
Contributor Author

gianm commented Mar 6, 2017

@himanshug it seems to work fine for other query types, it's not accomplishing much but it seems harmless. I want to include this patch in 0.10.0 (since chunkPeriod + groupBy v2 gives wrong results now) so would prefer to keep the changes minimal.

I am down to disable it for other query types in 0.10.1 though.

What do you think?

@himanshug himanshug merged commit 4ca5270 into apache:master Mar 6, 2017
gianm added a commit to gianm/druid that referenced this pull request Mar 6, 2017
…ods. (apache#4004)

* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since apache#3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.

* Remove unused import.

* Restore buffer size.
@gianm gianm deleted the fix-chunkperiod branch March 6, 2017 18:28
fjy pushed a commit that referenced this pull request Mar 6, 2017
…ods. (#4004) (#4015)

* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.

* Remove unused import.

* Restore buffer size.
@yuhuali1989
Copy link

yuhuali1989 commented Jan 8, 2020

@gianm Hi, we are using thetasketch with 1M size . Interesting to find query time grow linearly with intervals using timeseries. I think chunkPeriod may help. And also I guess chunkPeriod child query cache make senses .

we have tried 0.17 forkjoin broker merge, still not good enough.
Ideally, P1d ->10s, P10d ->15s. But reality is P1d ->10s, P10d ->100s.

And we encounter #4826, Would you like to give some suggestion?

gianm added a commit to gianm/druid that referenced this pull request Jan 18, 2020
@gianm
Copy link
Contributor Author

gianm commented Jan 18, 2020

@yuhuali1989 I'm curious why would you expect query time to grow sub-linearly? I guess, to me, it makes sense for a query on 10x more data to take 10x longer. Or are you saying you are running into a single-thread bottleneck somewhere?

gianm added a commit that referenced this pull request Jan 20, 2020
* Remove the deprecated interval-chunking stuff.

See #6591, #4004 (comment) for details.

* Remove unused import.

* Remove chunkInterval too.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants