
Druid Broker Result Level Cache Not Working Properly When Different Query Intervals Cover the Same Set of Segments #7302

Closed
lxqfy opened this issue Mar 20, 2019 · 4 comments

Comments

@lxqfy
Contributor

lxqfy commented Mar 20, 2019


Affected Version

0.13.0-incubating

Description

We have a segment with a segment granularity of DAY and a query granularity of HOUR.
When we send two queries that are identical except for the query interval, the result-level cache returns the wrong result.

Example: QueryA and QueryB have the same query body.
QueryA interval: 2019-03-15T14:00:00/2019-03-16T14:00:00
QueryB interval: 2019-03-15T18:00:00/2019-03-16T18:00:00

Involved Segments:
QueryA: segment2019-03-15_2019-03-16
QueryB: segment2019-03-15_2019-03-16

Response:
QueryA gets the correct result.
QueryB incorrectly receives QueryA's cached result.

Some investigation:
The result-level cache uses the query with its interval stripped as the cache key. When the keys match, the ETag is checked to decide whether the cached result can be returned. However, the ETag is computed only from the identifiers of the involved segments.
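The collision described above can be sketched in a few lines. This is an illustrative model, not Druid's actual implementation: `etag` is a hypothetical helper that hashes segment identifiers the way the broker's ETag calculation does, with an optional parameter showing the proposed fix of mixing in the queried intervals.

```python
import hashlib

def etag(segment_ids, query_intervals=None):
    """Illustrative ETag: hash the involved segment identifiers,
    optionally mixing in the queried intervals (the proposed fix)."""
    h = hashlib.sha1()
    for seg_id in sorted(segment_ids):
        h.update(seg_id.encode())
    for iv in sorted(query_intervals or []):
        h.update(iv.encode())
    return h.hexdigest()

segments = ["segment2019-03-15_2019-03-16"]

# Current behavior: only segment identifiers are hashed, so QueryA and
# QueryB produce the same ETag and QueryB hits QueryA's cache entry.
assert etag(segments) == etag(segments)

# Proposed fix: mix the per-segment query interval into the hash,
# so the two queries get distinct ETags and no false cache hit occurs.
etag_a = etag(segments, ["2019-03-15T14:00:00/2019-03-16T14:00:00"])
etag_b = etag(segments, ["2019-03-15T18:00:00/2019-03-16T18:00:00"])
assert etag_a != etag_b
```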

@gianm
Contributor

gianm commented Mar 20, 2019

I looked into this a bit and believe the ETag calculation needs to be re-thought. It has the bug you mentioned: it is calculated from segment identifiers (which include the entire segment interval) and does not incorporate the queried sub-interval within a segment. This causes the behavior you are seeing, and could be fixed by mixing the SegmentDescriptor interval into the ETag.

But a potentially larger bug is that it leverages QueryToolChest.computeCacheKey, which is designed for segment-level caching and, importantly, includes only the query parameters that might affect the segment-level results. This is done to promote higher cache hit rates. For example, it doesn't include things like:

  • grandTotal or postAggregations for timeseries
  • any non-sorting postAggregations for topN
  • having, subtotalsSpec, postAggregations, or limitSpec for groupBy (although the lack of limitSpec is a bug, I think, since groupBy supports limit push down in some cases now)

I think this means we need a new method in the QueryToolChest for computing result-level cache keys.
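The distinction between the two kinds of cache keys can be sketched as follows. This is a hypothetical model under my own names (`segment_level_key`, `result_level_key`), not Druid's API: the segment-level key deliberately omits merge-time parameters for cache reuse, while a result-level key must fold them in, since parameters like postAggregations change the final merged result.

```python
import hashlib
import json

def segment_level_key(query):
    """Hypothetical segment-level cache key: includes only the
    parameters that affect per-segment results."""
    parts = {k: query.get(k) for k in ("queryType", "dataSource", "aggregations")}
    return hashlib.sha1(json.dumps(parts, sort_keys=True).encode()).hexdigest()

def result_level_key(query):
    """Hypothetical result-level key: the segment-level key plus the
    parameters applied during merging (postAggregations, having, limitSpec)."""
    extra = {k: query.get(k) for k in ("postAggregations", "having", "limitSpec")}
    blob = segment_level_key(query) + json.dumps(extra, sort_keys=True)
    return hashlib.sha1(blob.encode()).hexdigest()

q1 = {"queryType": "groupBy", "dataSource": "wiki",
      "aggregations": [{"type": "count", "name": "rows"}],
      "postAggregations": []}
q2 = dict(q1, postAggregations=[{"type": "arithmetic", "name": "x"}])

# Segment-level keys collide by design (higher hit rates)...
assert segment_level_key(q1) == segment_level_key(q2)
# ...but result-level keys must differ, since postAggregations
# change the final merged result.
assert result_level_key(q1) != result_level_key(q2)
```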

@gianm
Contributor

gianm commented Mar 20, 2019

the lack of limitSpec is a bug, I think, since groupBy supports limit push down in some cases now

Ah, maybe it's not. I don't remember if limit push down goes all the way to the per-segment results, or if it only affects the merge buffer. If it's the latter, it doesn't need to be in the per-segment cache key.

@surekhasaharan

@lxqfy thanks for reporting, I'm looking into it.

@jihoonson
Contributor

Fixed in #7325.
