enrich expression cache key information to support expressions which depend on external state #11358

Merged 4 commits on Jun 15, 2021

Conversation

clintropolis
Member

Description

This PR fixes an issue where expressions that depend on external state, such as a lookup, can produce an incorrect cache key because the key was built only from the string form of the expression. External state of this kind is normally encoded into the cache key. Because it was not, the key stayed the same when the underlying lookup changed, so cached results could be incorrect even though the expression would evaluate to a different value with the newer lookup.

This is resolved by making Expr extend Cacheable and providing a default getCacheKey implementation that uses the stringified form. The lookup expression overrides it to form a composite key from the stringified expression and the lookup extraction function's cache key.
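As a rough illustration of the idea (not the code in this PR), the sketch below uses simplified stand-ins: `SketchCacheable`, `SketchKeys`, `SketchExpr`, and `SketchLookupExpr` are hypothetical names, and `SketchKeys.build` only approximates what a key builder does (an ID byte followed by length-prefixed parts). It shows two lookup expressions with the same string form getting different cache keys once the state behind the lookup is included.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the Cacheable interface.
interface SketchCacheable
{
  byte[] getCacheKey();
}

// Hypothetical helper: an ID byte followed by length-prefixed parts.
final class SketchKeys
{
  static final byte EXPR_CACHE_KEY = 0x00;
  static final byte LOOKUP_EXPR_CACHE_KEY = 0x01;

  static byte[] build(byte id, byte[]... parts)
  {
    int size = 1;
    for (byte[] part : parts) {
      size += Integer.BYTES + part.length;
    }
    ByteBuffer buf = ByteBuffer.allocate(size).put(id);
    for (byte[] part : parts) {
      buf.putInt(part.length).put(part);
    }
    return buf.array();
  }
}

// Simplified stand-in for Expr: by default the key covers only the string form.
interface SketchExpr extends SketchCacheable
{
  String stringify();

  @Override
  default byte[] getCacheKey()
  {
    return SketchKeys.build(SketchKeys.EXPR_CACHE_KEY, stringify().getBytes(StandardCharsets.UTF_8));
  }
}

// Simplified stand-in for a lookup expression: the key also covers the lookup's own key.
final class SketchLookupExpr implements SketchExpr
{
  private final String lookupName;
  private final SketchCacheable extractionFn; // assumed to change when the lookup changes

  SketchLookupExpr(String lookupName, SketchCacheable extractionFn)
  {
    this.lookupName = lookupName;
    this.extractionFn = extractionFn;
  }

  @Override
  public String stringify()
  {
    return "lookup(x, '" + lookupName + "')";
  }

  @Override
  public byte[] getCacheKey()
  {
    return SketchKeys.build(
        SketchKeys.LOOKUP_EXPR_CACHE_KEY,
        stringify().getBytes(StandardCharsets.UTF_8),
        extractionFn.getCacheKey()
    );
  }
}

final class CompositeKeyDemo
{
  public static void main(String[] args)
  {
    SketchLookupExpr before = new SketchLookupExpr("country", () -> new byte[]{1});
    SketchLookupExpr after = new SketchLookupExpr("country", () -> new byte[]{2});
    System.out.println(before.stringify().equals(after.stringify()));             // true
    System.out.println(Arrays.equals(before.getCacheKey(), after.getCacheKey())); // false
  }
}
```

With only the default implementation, both expressions would share one key, which is the stale-result case described above.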

I moved CacheKeyBuilder, which is marked with @PublicApi, from druid-processing to druid-core, so I am tagging this PR for release notes (Cacheable was already defined in druid-core).

This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@Override
default byte[] getCacheKey()
{
  return new CacheKeyBuilder(EXPR_CACHE_KEY).appendString(stringify()).build();

Contributor

stringify is somewhat expensive and this will be called once per segment. Can we cache it, like we cache the original and parsed expressions? Best way I can think of right now is to implement that caching in ExpressionFilter, ExpressionVirtualColumn, etc.

Btw, that caching should be lazy, for two reasons:

  • otherwise we'll waste time computing it when it'll never be needed
  • sometimes the cache key will be incomputable (lookup doesn't exist, or error retrieving it) and we don't want that to totally prevent instantiating the VirtualColumn or DimFilter objects
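A minimal sketch of the lazy caching suggested here, assuming a hand-rolled memoizing supplier rather than any particular library. `LazyValue` and `SketchExpressionColumn` are hypothetical names, not the classes touched by this PR. The key computation runs only on the first `getCacheKey()` call, so a computation that fails does not prevent the object from being constructed.

```java
import java.util.function.Supplier;

// Hypothetical memoizing supplier: computes the value at most once, on first use.
final class LazyValue<T> implements Supplier<T>
{
  private final Supplier<T> delegate;
  private T value;
  private boolean computed;

  LazyValue(Supplier<T> delegate)
  {
    this.delegate = delegate;
  }

  @Override
  public synchronized T get()
  {
    if (!computed) {
      // If the delegate throws, computed stays false and the next call retries.
      value = delegate.get();
      computed = true;
    }
    return value;
  }
}

// Hypothetical stand-in for an expression-backed column or filter.
final class SketchExpressionColumn
{
  private final String expression;
  private final Supplier<byte[]> cacheKey;

  SketchExpressionColumn(String expression, Supplier<byte[]> keyComputation)
  {
    this.expression = expression;
    // Nothing is computed here; construction cannot fail because of the key.
    this.cacheKey = new LazyValue<>(keyComputation);
  }

  String getExpression()
  {
    return expression;
  }

  byte[] getCacheKey()
  {
    return cacheKey.get();
  }
}

final class LazyKeyDemo
{
  public static void main(String[] args)
  {
    // Constructing with a failing key computation still succeeds.
    SketchExpressionColumn column = new SketchExpressionColumn(
        "lookup(x, 'missing')",
        () -> {
          throw new IllegalStateException("lookup not found");
        }
    );
    System.out.println(column.getExpression());
  }
}
```

The failure surfaces only if and when something asks for the cache key, which addresses both bullet points above.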

Member Author

Added caching, similar to how the Expr objects themselves are cached.

@@ -36,6 +37,7 @@

public class LookupExprMacro implements ExprMacroTable.ExprMacro
{
  private static final byte LOOKUP_EXPR_KEY = 0x01;

Contributor

Can we centralize these? Otherwise I worry about accidentally reusing IDs.

Member Author

I was worried about that too, but hesitated since Expr is in core and this is in processing, and some others, such as the DimensionSpec, have the same problem. But yeah, I agree, so I fixed it by moving both of them to Exprs. I wasn't sure where else to put them, and they seemed sort of funny on Expr.

Contributor

@gianm gianm left a comment

LGTM, other than naming nit.

@@ -29,6 +29,9 @@

public class Exprs
{
  public static final byte EXPR_CACHE_KEY = 0x00;
  public static final byte LOOKUP_EXPR_KEY = 0x01;

Contributor

Naming? LOOKUP_EXPR_KEY should be something like LOOKUP_CACHE_KEY, I think. For symmetry.

Member Author

Oops, I meant to rename it to LOOKUP_EXPR_CACHE_KEY when I moved it here. Fixed.

@@ -360,4 +364,25 @@ public static VectorValueSelector makeVectorValueSelector(
    }
    return columnSelectorFactory.makeValueSelector(fieldName);
  }

  public static Supplier<byte[]> getSimpleAggregatorCacheKeySupplier(

Contributor

Nice.

clintropolis merged commit 920aa41 into apache:master on Jun 15, 2021
clintropolis deleted the expr-cache-key branch on June 15, 2021, 00:26
clintropolis added this to the 0.22.0 milestone on Aug 12, 2021