Add cache_key_wrapper to Jinja template processor #7816

villebro · 2019-07-03T19:26:05Z

SUMMARY

Currently, using dynamic filters that change based on the logged-in user can have unintended consequences when caching is enabled. For example, if a where-clause references the current_username() function to filter only rows for the currently logged-in user, the result will be cached under the same key that another user would get, even though the two users get different results from current_username(). This is because the rendered result of calling current_username() is never stored in the query_obj that is the basis for the cache key.

This PR adds a new function cache_key_wrapper to the Jinja context. It can be wrapped around any function call and stores the returned values in a list, extra_cache_keys, which is added to the cache_dict prior to hashing. This ensures that the two users get distinct cache keys when referencing the same datasource. In practice this is done by "compiling" the query before the cache key is calculated and storing all values that have been passed to cache_key_wrapper; these are then taken into account when calling the cache_key function in viz.py (legacy) and query_context.py/query_object.py (future). This adds some overhead, as the full SQLAlchemy selectable has to be generated and compiled once prior to cache key calculation, and again if there isn't a cache hit. The selectable could easily be stored and reused if there isn't a cache hit, but since the overhead is barely noticeable, I decided against it in favor of code readability.
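To make the mechanism above concrete, here is a minimal sketch using only the standard library. It is not the code from this PR: the class `ExtraCacheKeys`, the helper `key_for_user`, the sample `query_obj` contents, and the choice of SHA-256 over JSON are all assumptions made for illustration. Only the names cache_key_wrapper, extra_cache_keys, cache_dict and query_obj come from the description.

```python
import hashlib
import json
from typing import Any, Dict, List


class ExtraCacheKeys:
    """Hypothetical stand-in for the template processor's bookkeeping."""

    def __init__(self) -> None:
        self.extra_cache_keys: List[Any] = []

    def cache_key_wrapper(self, value: Any) -> Any:
        # Record the rendered value, then return it unchanged so the
        # rendered SQL is the same with or without the wrapper.
        self.extra_cache_keys.append(value)
        return value


def cache_key(query_obj: Dict[str, Any], extra_cache_keys: List[Any]) -> str:
    # Copy the query object and add the collected values before hashing.
    cache_dict = dict(query_obj)
    cache_dict["extra_cache_keys"] = extra_cache_keys
    payload = json.dumps(cache_dict, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def key_for_user(username: str) -> str:
    ctx = ExtraCacheKeys()
    # Simulates rendering {{ cache_key_wrapper(current_username()) }}.
    ctx.cache_key_wrapper(username)
    # The unrendered query object is identical for every user.
    query_obj = {"datasource": "example_table", "granularity": "day"}
    return cache_key(query_obj, ctx.extra_cache_keys)
```

In this sketch, `key_for_user("alice")` and `key_for_user("bob")` hash to different keys even though `query_obj` is the same for both, while repeated calls for the same user yield the same key.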

SCREENSHOT OF NEW DOCS

TEST PLAN

Tested locally + CI

ADDITIONAL INFORMATION

Has associated issue: Fixes Queries using current_username() method do not run properly when caching is enabled #7580
Introduces new feature or API

REVIEWERS

@mistercrunch @betodealmeida @john-bodley @duffar12

villebro · 2019-07-08T21:59:55Z

@duffar12 I would appreciate your thoughts on this PR, as I believe it fixes your immediate problem and could be useful to others, too.

villebro · 2019-07-18T06:07:58Z

It seems feedback from the original issue poster isn't forthcoming, but I would like to get this processed anyway. @mistercrunch @john-bodley do you have any thoughts on this?

john-bodley · 2019-07-18T17:34:53Z

@villebro I think this approach seems valid. My only question: cache_key takes a dictionary of extra cache key arguments, whereas your implementation uses a single predefined key, extra_cache_keys, holding a list of values. I can understand why this is the case from your implementation, but I was wondering whether it's viable to use a dictionary.

villebro · 2019-07-18T18:31:34Z

@john-bodley The idea is basically to gather up all the objects that have been passed to cache_key_wrapper and make sure they appear in the order in which they were called. Having them under one key, extra_cache_keys, therefore seemed to me the most valid approach, as adding a key to each value probably wouldn't be of much utility and might confuse or interfere with the other keys in cache_obj.

john-bodley · 2019-07-18T19:19:51Z

@villebro agreed.

villebro · 2019-07-19T12:56:37Z

Ok to merge this?

duffar12 · 2019-07-22T10:01:11Z

@villebro, sorry I missed all of this. Looks good though. Thanks for getting this in

etr2460 · 2019-08-27T17:28:57Z

@villebro: We’re running into an issue where charts and dashboards are significantly slower after your change. This seems to be because we’re “compiling” the query every time before checking the cache for it, which results in several requests to our Presto/Hive backend (show partitions, show columns, etc.) to resolve the templates. When we load a dashboard with 20 charts, all with a use of latest partition, it significantly slows down and reduces the usefulness of caching.

I had a couple possible solutions to this:

1. Only compile the Jinja templates that relate to the current user. Your fix was aimed at uses of current_username; maybe we can just special-case this here?
2. Cache the results of the queries that compile the Jinja templates. That way we wouldn't need to repeatedly ping Presto/Hive for the partitions/columns in the table.
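The second option could look roughly like the sketch below: a small time-to-live memoizer around a metadata lookup. Everything here is hypothetical and for illustration only. `memoize_with_ttl`, the `latest_partition` stand-in, its placeholder return value, and the 60-second TTL are assumptions, not existing Superset code.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def memoize_with_ttl(
    ttl_seconds: float, clock: Callable[[], float] = time.monotonic
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Cache results per positional-argument tuple for ttl_seconds."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        store: Dict[Tuple[Any, ...], Tuple[float, Any]] = {}

        def wrapper(*args: Any) -> Any:
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # still fresh: skip the backend round trip
            value = func(*args)
            store[args] = (now, value)
            return value

        return wrapper

    return decorator


backend_calls: List[str] = []


@memoize_with_ttl(ttl_seconds=60.0)
def latest_partition(table_name: str) -> str:
    # Hypothetical stand-in for a "show partitions" round trip to Presto/Hive.
    backend_calls.append(table_name)
    return f"{table_name}:latest"
```

With this in place, 20 charts that resolve the latest partition of the same table within the TTL would trigger a single backend call in the sketch, at the cost of partition metadata that can be up to one TTL stale.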

What do you think?
cc @graceguo-supercat @michellethomas

villebro · 2019-08-27T17:39:05Z

I propose introducing a method uses_jinja_templates() which checks the query and filters for {{.*cache_key_wrapper(.*}} or similar, and if none are present, doesn't do the double compilation.
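A minimal sketch of such a check, assuming the SQL and filter fragments are available as plain strings. The function signature and the exact pattern are assumptions, not code from this PR, and the pattern only looks at `{{ ... }}` expressions, not `{% ... %}` blocks.

```python
import re
from typing import Iterable, Optional

# Matches a Jinja expression such as {{ cache_key_wrapper(current_username()) }}.
CACHE_KEY_WRAPPER_PATTERN = re.compile(
    r"\{\{.*?cache_key_wrapper\(.*?\}\}", re.DOTALL
)


def uses_jinja_templates(fragments: Iterable[Optional[str]]) -> bool:
    """Return True if any fragment wraps a call in cache_key_wrapper."""
    return any(
        CACHE_KEY_WRAPPER_PATTERN.search(fragment) is not None
        for fragment in fragments
        if fragment
    )
```

A caller would then run the extra compilation pass only when this returns True, so queries that use other macros, or no templating at all, go straight to the cache lookup.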

Add cache_key_wrapper to Jinja template processor

09e7c91

pull-request-size bot added the size/L label Jul 3, 2019

mistercrunch approved these changes Jul 20, 2019

mistercrunch merged commit 4568b2a into apache:master Jul 20, 2019

alex-mark pushed a commit to alex-mark/incubator-superset that referenced this pull request Jul 29, 2019

Add cache_key_wrapper to Jinja template processor (apache#7816)

4ae60de

villebro deleted the jinja_cache branch August 21, 2019 15:37

etr2460 mentioned this pull request Aug 23, 2019

Fix sqla query cache keys function #8105

Merged

12 tasks

villebro mentioned this pull request Feb 25, 2021

fix: always recalc cache key when SQL_QUERY_MUTATOR is defined #13307

Closed

6 tasks

mistercrunch added the 🏷️ bot and 🚢 0.34.0 labels Feb 28, 2024
