
Fix result-level cache for queries #7325

Merged: 15 commits merged from fix-etag-calculation into apache:master on Apr 18, 2019

Conversation

@surekhasaharan (Author) commented Mar 22, 2019

Addresses #7302

- Added the SegmentDescriptor's Interval to the Etag encoding, so the same query issued with different intervals produces distinct Etags and no longer returns incorrectly cached results.
- Added a new method, computeResultLevelCacheKey, to CacheStrategy for computing result-level cache keys.
- Made HavingSpec implement Cacheable and implemented getCacheKey in its subclasses.
- Added unit tests for computeResultLevelCacheKey.
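The interval part of the fix can be illustrated with a minimal sketch (this is hypothetical illustration code, not Druid's actual Etag implementation; the class and method names and the use of SHA-1 are assumptions): hashing the query interval alongside the segment id guarantees that the same query over different intervals gets different Etags.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class EtagSketch {
    // Hypothetical sketch of the idea behind the fix: include the *query*
    // interval in the hash, not just the segment id.
    static String computeEtag(String segmentId, String queryInterval) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update(segmentId.getBytes(StandardCharsets.UTF_8));
            // Before the fix, only segment-derived data was hashed, so two
            // queries differing only in interval could share an Etag.
            md.update(queryInterval.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(md.digest());
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String a = computeEtag("seg-1", "2019-01-01/2019-01-02");
        String b = computeEtag("seg-1", "2019-01-01/2019-01-03");
        System.out.println(a.equals(b)); // false: different intervals, different Etags
    }
}
```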

@surekhasaharan changed the title from "Add SegmentDescriptor interval in the hash while calculating Etag" to "Fix result-level cache for queries" on Mar 27, 2019
public byte[] getCacheKey()
{
try {
final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Contributor:

Would CacheKeyBuilder work for this (and similar method)?

Author:

Yeah, I can use CacheKeyBuilder, wonder why I didn't earlier.

@gianm added this to the 0.15.0 milestone on Apr 10, 2019
.appendString(aggregationName)
.appendByteArray(StringUtils.toUtf8(String.valueOf(value)))
.appendStrings(aggregators.keySet())
.appendCacheables(aggregators.values())
Author:

Not sure of the best way to add the aggregators map to CacheKeyBuilder; I am adding the keys and values separately. Is that okay?

Contributor:

You don't need to add them to the cache key -- only the aggregation and value matter. The idea is that anything that changes the meaning or effect of the operator should be part of the cache key, but not other stuff. aggregators is just some extra information that the HavingSpec uses to do its job.
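The principle in this review comment (only fields that change the operator's semantics belong in the key) can be made concrete with a toy sketch. KeyBuilder here is a hypothetical stand-in for Druid's CacheKeyBuilder, and the type id and separator byte are assumptions:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical miniature of CacheKeyBuilder, for illustration only.
class KeyBuilder {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    KeyBuilder(byte typeId) {
        out.write(typeId); // distinguishes this operator type from others
    }

    KeyBuilder appendString(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
        out.write(0xFF); // separator so "ab"+"c" cannot collide with "a"+"bc"
        return this;
    }

    byte[] build() {
        return out.toByteArray();
    }
}

public class HavingKeySketch {
    public static void main(String[] args) {
        // Only aggregationName and value are encoded; the aggregators map is
        // helper state that does not change the predicate's meaning.
        byte[] gt10 = new KeyBuilder((byte) 0x01).appendString("count").appendString("10").build();
        byte[] gt20 = new KeyBuilder((byte) 0x01).appendString("count").appendString("20").build();
        System.out.println(Arrays.equals(gt10, gt20)); // false: 'value' changes the semantics
    }
}
```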

@gianm (Contributor) commented Apr 17, 2019

Labeled "Incompatible" due to adding Cacheable interface to HavingSpec.

@Override
public byte[] computeResultLevelCacheKey(SegmentMetadataQuery query)
{
return computeCacheKey(query);
Contributor:

This also needs to include "merge" and "lenientAggregatorMerge". They're not part of the segment-level cache key, but they should be part of the result-level key.

Author:

OK, added those to the result-level cache key.
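The distinction raised here can be sketched as follows (hypothetical illustration, not Druid's actual code; the byte layout is an assumption): parameters like "merge" and "lenientAggregatorMerge" only affect how the broker merges results, so they belong in the result-level key even though they are absent from the segment-level key.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class ResultLevelKeySketch {
    static byte[] segmentLevelKey() {
        return new byte[]{0x04}; // stand-in for the real segment-level cache key
    }

    // Result-level key = segment-level key plus broker-only parameters that
    // change the merged result but not any single segment's result.
    static byte[] resultLevelKey(boolean merge, boolean lenientAggregatorMerge) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] base = segmentLevelKey();
        out.write(base, 0, base.length);
        out.write(merge ? 1 : 0);
        out.write(lenientAggregatorMerge ? 1 : 0);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Same segments, different merge setting: result-level keys must differ.
        System.out.println(Arrays.equals(resultLevelKey(true, false), resultLevelKey(false, false))); // false
    }
}
```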

.appendCacheable(query.getVirtualColumns())
.appendCacheables(query.getPostAggregatorSpecs())
.appendInt(query.getLimit());
if (query.isGrandTotal()) {
Contributor:

This might as well just be .appendBoolean(query.isGrandTotal()), and is probably better, since there's less of a need to make sure that things we add after it will mesh well.

Author:

done
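The reviewer's preference for an unconditional appendBoolean can be shown with a toy example (hypothetical code, not Druid's; the flags and byte values are assumptions): conditionally appending bytes lets distinct flag combinations collide, while always appending one byte per flag keeps keys unambiguous.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class GrandTotalKeySketch {
    // The risky pattern: each flag appends a byte only when set.
    static byte[] conditionalKey(boolean grandTotal, boolean otherFlag) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x10); // type id
        if (grandTotal) out.write(1);
        if (otherFlag) out.write(1);
        return out.toByteArray();
    }

    // The safe pattern: every flag always contributes exactly one byte,
    // like appendBoolean.
    static byte[] unconditionalKey(boolean grandTotal, boolean otherFlag) {
        return new byte[]{0x10, (byte) (grandTotal ? 1 : 0), (byte) (otherFlag ? 1 : 0)};
    }

    public static void main(String[] args) {
        // Conditional appends collide: (true,false) and (false,true) both
        // produce [0x10, 1].
        System.out.println(Arrays.equals(conditionalKey(true, false), conditionalKey(false, true))); // true
        // Unconditional appends keep the combinations distinct.
        System.out.println(Arrays.equals(unconditionalKey(true, false), unconditionalKey(false, true))); // false
    }
}
```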

.appendCacheable(query.getHavingSpec())
.appendCacheable(query.getLimitSpec());

if (!query.getPostAggregatorSpecs().isEmpty()) {
Contributor:

Might as well add the post-aggregator specs unconditionally.

Author:

done

.appendCacheable(query.getVirtualColumns());

final List<PostAggregator> postAggregators = query.getPostAggregatorSpecs();
if (!postAggregators.isEmpty()) {
Contributor:

Might as well add the post-aggregator specs unconditionally here, too.

Author:

done

@@ -385,6 +385,7 @@ private String computeCurrentEtag(final Set<ServerToSegment> segments, @Nullable
break;
}
hasher.putString(p.getServer().getSegment().getId().toString(), StandardCharsets.UTF_8);
hasher.putString(p.rhs.getInterval().toString(), StandardCharsets.UTF_8);
Contributor:

A comment here would be nice, explaining that this interval is the query interval, not the segment interval, and that it is important for it to be part of the cache key.

Author:

Sure, added a comment.

@gianm (Contributor) left a review:
LGTM after CI 👍

@gianm merged commit c2a42e0 into apache:master on Apr 18, 2019
gianm pushed a commit to implydata/druid-public that referenced this pull request Apr 20, 2019
* Add SegmentDescriptor interval in the hash while calculating Etag

* Add computeResultLevelCacheKey to CacheStrategy

Make HavingSpec cacheable and implement getCacheKey for subclasses
Add unit tests for computeResultLevelCacheKey

* Add more tests

* Use CacheKeyBuilder for HavingSpec's getCacheKey

* Initialize aggregators map to avoid NPE

* adjust cachekey builder for HavingSpec to ignore aggregators

* unused import

* PR comments
clintropolis pushed a commit that referenced this pull request Apr 24, 2019
@clintropolis modified the milestones: 0.15.0, 0.14.1 on Apr 24, 2019
@surekhasaharan deleted the fix-etag-calculation branch on Apr 24, 2019