Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.#4004
himanshug merged 3 commits into apache:master
Added some docs and a new test describing behavior with subqueries + chunkPeriod.
👍
@gianm tests are failing
The interval chunking query decorator is called in the
The interval chunking decorator did merges before (it does toolChest.mergeResults), so at least I didn't make it worse! But this patch is no good for other reasons, which I am in the middle of writing up. chunkPeriod has an interesting history…
@himanshug @drcrallen @fjy after looking into this more I have learned some facts about chunkPeriod.
My question for you all: is this analysis correct, and if so, what should we do about it? Druid 0.6.71 and 0.7.1 are pretty old, so we've had the current behavior for quite a long time, and it seems to have gone unnoticed by anyone who might have been using chunkPeriod for its old purpose. My vote for 0.10.0 would be to update the docs to reflect reality (chunkPeriod makes queries more resource-intensive, not less, by parallelizing merging on the broker, and is only useful for groupBy v1). I'd also suggest including code to ignore chunkPeriod for groupBy v2 queries, since it makes the results wrong, and even if that were fixed, it wouldn't really do anything useful anyway.
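To make the "more resource-intensive, not less" point concrete, here is a hypothetical, simplified Java model of broker-side interval chunking. It is not Druid's code: `ChunkedRunner`, `runChunked`, and the `runAndMergeChunk` callback are invented for illustration. Each chunk becomes its own task in a shared pool and merges its own results, so a query split into N chunks can hold up to N pool threads at once instead of one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical, simplified model of broker-side interval chunking; not Druid code.
public final class ChunkedRunner {

  // Submits one task per chunk; each task runs its sub-query and merges that
  // chunk's results. Outputs are concatenated in chunk (time) order.
  public static <C, R> List<R> runChunked(
      List<C> chunks, Function<C, List<R>> runAndMergeChunk, ExecutorService pool)
      throws InterruptedException, ExecutionException {
    List<Future<List<R>>> futures = new ArrayList<>();
    for (C chunk : chunks) {
      // Every chunk can occupy a pool thread at the same time as its siblings.
      futures.add(pool.submit(() -> runAndMergeChunk.apply(chunk)));
    }
    List<R> results = new ArrayList<>();
    for (Future<List<R>> future : futures) {
      // Wait for chunks in submission order so output stays time-ordered.
      results.addAll(future.get());
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<String> out = runChunked(
          List.of("2017-01", "2017-02", "2017-03"),
          month -> List.of(month + " merged by " + Thread.currentThread().getName()),
          pool);
      out.forEach(System.out::println);
    } finally {
      pool.shutdown();
    }
  }
}
```

Because results are collected in submission order, the combined output stays in chunk order no matter which chunk finishes first.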
👍 on updating docs for 0.10.0 to reflect reality
@himanshug @drcrallen @fjy Updated PR description and top comment with a new fix, disabling chunkPeriod for groupBy v2. It wouldn't help much anyway (see #4004 (comment)) and doesn't mesh well with how groupBy v2 merges results.
…ods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since apache#3953; now it is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to be more accurate.
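Why periods like "P1M" or "P1Y" need special handling is easiest to see with an example. The following is a hypothetical Java sketch, not the patch itself: it uses `java.time` rather than the Joda-Time types Druid uses, and `PeriodChunker` and `chunk` are invented names. It splits a UTC interval into calendar-aligned chunks; stepping by `P1M` over January through March 2017 yields chunks of 31, 28, and 31 days, so the boundaries cannot come from a single fixed millisecond duration. Note that `java.time.Period` models only years, months, and days; a time-based period such as `PT1H` would need `Duration` instead.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.Period;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of calendar-aware interval chunking; not Druid's implementation.
public final class PeriodChunker {

  /** A half-open [start, end) chunk of the original interval. */
  public static final class Chunk {
    public final Instant start;
    public final Instant end;

    Chunk(Instant start, Instant end) {
      this.start = start;
      this.end = end;
    }

    public Duration length() {
      return Duration.between(start, end);
    }
  }

  // Splits [start, end) into chunks of one calendar period each (in UTC);
  // the final chunk is clipped to the interval end.
  public static List<Chunk> chunk(Instant start, Instant end, Period period) {
    if (period.isZero() || period.isNegative()) {
      throw new IllegalArgumentException("period must be positive: " + period);
    }
    ZonedDateTime origin = start.atZone(ZoneOffset.UTC);
    List<Chunk> chunks = new ArrayList<>();
    Instant chunkStart = start;
    for (int i = 1; chunkStart.isBefore(end); i++) {
      // Compute origin + i * period (not previousBoundary + period) so that
      // month-end origins such as Jan 31 do not drift to the 28th.
      Instant boundary = origin.plus(period.multipliedBy(i)).toInstant();
      Instant chunkEnd = boundary.isAfter(end) ? end : boundary;
      chunks.add(new Chunk(chunkStart, chunkEnd));
      chunkStart = chunkEnd;
    }
    return chunks;
  }

  public static void main(String[] args) {
    List<Chunk> chunks = chunk(
        Instant.parse("2017-01-01T00:00:00Z"),
        Instant.parse("2017-04-01T00:00:00Z"),
        Period.parse("P1M"));
    for (Chunk c : chunks) {
      System.out.println(c.start + "/" + c.end + " (" + c.length().toDays() + " days)");
    }
  }
}
```

Computing each boundary from the origin means a sequence that starts on January 31 continues to February 28, March 31, and April 30, rather than sticking to the 28th after February.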
|
👍 |
|
@gianm thanks for looking into it. I believe we should also remove it from [Search|Select|TopN|Timeseries]QueryToolChest.preMergeDecoration(..). I have not seen it being used or being helpful anywhere except groupBy v1 queries. (We could fix it too, but I don't think that's worth it.)
@himanshug it seems to work fine for other query types; it's not accomplishing much, but it seems harmless. I want to include this patch in 0.10.0 (since chunkPeriod + groupBy v2 gives wrong results now), so I would prefer to keep the changes minimal. I am down to disable it for other query types in 0.10.1, though. What do you think?
…ods. (#4004) (#4015)

* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953; now it is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to be more accurate.

* Remove unused import.

* Restore buffer size.
|
@gianm Hi, we are using thetaSketch with a size of 1M. Interestingly, we find that query time grows linearly with the query interval when using timeseries queries. I think chunkPeriod may help, and I also think caching the chunkPeriod child queries would make sense. We have tried the ForkJoin broker merge in 0.17, but it is still not good enough, and we also ran into #4826. Could you give us some suggestions?
@yuhuali1989 I'm curious why you would expect query time to grow sub-linearly. To me, it makes sense for a query on 10x more data to take 10x longer. Or are you saying you are running into a single-thread bottleneck somewhere?
* Remove the deprecated interval-chunking stuff.

See #6591, #4004 (comment) for details.

* Remove unused import.

* Remove chunkInterval too.