
Fetch charts with GET to benefit from browser cache and conditional requests #7032

Merged: 91 commits into apache:lyftga on Apr 3, 2019

Conversation

@betodealmeida (Member) commented Mar 14, 2019

This is a small PR that does a lot. It changes the initial request for charts (in explore or dashboards) to be done through a GET request, greatly improving the loading speed of dashboards. It also moves the caching to the HTTP layer, allowing us to benefit from Expires and ETag headers for conditional requests.

The problem

This diagram compares the current flow ("before") with the one implemented by this PR ("after"):

[diagram: before/after caching flow]

Before

Let's assume Superset is configured with a 1 hour cache, and that the data changes on a longer cadence (e.g., daily):

  1. User "A" requests a chart from Superset doing a POST request with the payload.
  2. Superset computes the query and sends it to the DB.
  3. DB returns a dataframe.
  4. Superset caches the dataframe.
  5. Superset serializes the payload and sends it back to user "A".
  6. User "A" refreshes the dashboard.
  7. Superset finds the dataframe cached.
  8. Superset serializes the payload and sends it back to user "A".
  9. Superset cache expires after 1 hour.
  10. User "A" refreshes the dashboard.
  11. Superset computes the query and sends it to the DB.
  12. DB returns the exact same dataframe.
  13. Superset caches the dataframe again.
  14. Superset serializes the payload and sends it back to user "A".

There are a few inefficiencies here:

  • The browser cache is never used, because it's doing POST requests.
  • Superset needs to serialize the payload even on a cache hit.
  • Data is transferred to the browser even if it hasn't changed.

After

  1. User "A" requests a chart from Superset doing a GET request with the chart id.
  2. Superset computes the query and sends it to the DB.
  3. DB returns a dataframe.
  4. Superset serializes the dataframe and caches the HTTP response.
  5. Superset sends the payload to user "A", with an Expires header of 1 hour, and an ETag header which is a hash of the payload.
  6. The browser stores the response in its native cache, and SupersetClient caches it also in the Cache interface.
  7. The user refreshes the dashboard.
  8. Because of the Expires header and the use of GET, the data is read directly from the native browser cache.
  9. Superset cache expires after 1 hour.
  10. User "A" refreshes the dashboard. The native cache is not used, since Expires is now in the past. SupersetClient looks for a cached response in the Cache interface, and if one is found, extracts its ETag.
  11. The browser requests the chart with an If-None-Match header, containing the hash of the cached response (its ETag).
  12. Superset computes the query and sends it to the DB.
  13. DB returns the exact same dataframe.
  14. Superset serializes the dataframe and caches the HTTP response.
  15. Superset sees that the ETag matches the If-None-Match header, returning a 304 Not Modified response.
  16. Browser fetches the cached response from the Cache interface.
  17. Browser uses the response.
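
To make the conditional-request handshake in steps 10–17 concrete, here is a minimal sketch from the client's point of view, using Python's requests library against a hypothetical URL (illustrative only; Superset's actual client logic lives in SupersetClient):

```python
import requests

# Hypothetical chart endpoint; the URL and values are illustrative.
url = "https://superset.example.com/superset/explore_json/?form_data=%7B%22slice_id%22%3A78%7D"

# Cold request: the server returns 200 with the payload, an Expires
# header, and an ETag that is a hash of the serialized payload.
first = requests.get(url)
etag = first.headers.get("ETag")
cached_body = first.content

# After Expires has passed: replay the stored ETag in If-None-Match.
# The server recomputes the payload, hashes it, and answers
# 304 Not Modified (empty body) when the hash matches.
second = requests.get(url, headers={"If-None-Match": etag})
if second.status_code == 304:
    body = cached_body             # data unchanged, reuse local cache
else:
    body = second.content          # data changed, refresh local cache
    etag = second.headers.get("ETag")
```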

Notes

  • The GET request is done only the first time the chart is mounted. Forcing a refresh on dashboards and clicking "Run Query" in the Explore view perform POST requests, which bypass the cache and cache the new response. I tested the Explore view and dashboards with filters, and all further interactions are done with POSTs.

  • Since we're caching the HTTP response, we need to verify that the user has permission to read the cached response. This is done by passing a check_perms function to the decorator that caches the responses (see the sketch after these notes).

  • The fetch API has no support for conditional responses with ETags. We need to add explicit support in SupersetClient. I have a separate PR for that (see feat: add support for conditional requests apache-superset/superset-ui#119).

  • There is one small downside to this approach. While Expires is still valid, the browser will not perform any requests for cached charts unless the user explicitly refreshes a dashboard or clicks "Run Query" in the Explore view. If the data is bad, they will see bad data until it expires or they purposefully refresh the chart. In the current workflow we can, in theory, purge the cache in this case, since it lives only on the server side. This is a hypothetical scenario, and we could work around it by sending a notification to dashboards that one or more charts have bad data and should be refreshed.
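
To make the check_perms note above concrete, a decorator along these lines could cache whole responses and answer conditional requests. This is a sketch assuming Flask and Flask-Caching; the names etag_cache and check_perms come from this PR's description, but the body is illustrative, not the actual implementation:

```python
import functools
import hashlib

from flask import request
from flask_caching import Cache  # assumed backend, e.g. Redis

cache = Cache()

def etag_cache(max_age, check_perms):
    """Cache the full HTTP response and honor If-None-Match."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            # Verify access *before* touching the shared response cache.
            check_perms(*args, **kwargs)

            cache_key = f"etag:{request.full_path}"
            response = cache.get(cache_key)
            if response is None:
                response = f(*args, **kwargs)  # assumed to return a Response
                response.cache_control.max_age = max_age
                # The ETag is a hash of the serialized payload.
                response.set_etag(hashlib.md5(response.get_data()).hexdigest())
                cache.set(cache_key, response, timeout=max_age)

            # Rewrites the response to 304 Not Modified on an ETag match.
            return response.make_conditional(request)
        return wrapper
    return decorator
```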

khtruong and others added 4 commits on March 5, 2019:

* Exclude venv for python linter to ignore
* Fix NaN error
* This PR sets the background-color css property on `.ace_scroller` instead of `.ace_content` to prevent the white background shown during resizing of the SQL editor before drag ends.
@betodealmeida (Member, Author) commented Mar 14, 2019

👀 @DiggidyDave

@graceguo-supercat commented Mar 14, 2019

Are you sure a GET request can handle this.props.formData? At Airbnb many, many charts' formData is longer than 4k chars :)

Why not use Redis to cache query results? At Airbnb the round trip to fetch data from Redis is only 600ms~800ms.

I think we should use ETags for dashboard metadata (like the dashboard layout, a huge JSON blob...).

@betodealmeida (Member, Author)

@graceguo-supercat:

Are you sure a GET request can handle this.props.formData? At Airbnb many, many charts' formData is longer than 4k chars :)

Thanks, I'm aware of that problem. Here the GET request has only the chart ID, and the form data is read from the saved chart (the params column in the slices table).

Why not use Redis to cache query results? At Airbnb the round trip to fetch data from Redis is only 600ms~800ms.

The decorator is using Redis for the server-side caching, but it caches the HTTP response instead of the dataframe (saving the time spent in serialization). But I'm also using the native browser cache (through the Expires header) and the new "Cache interface API" (in a separate PR that I'm finishing).

I think we should use ETags for dashboard metadata (like the dashboard layout, a huge JSON blob...).

Everywhere! :)

@graceguo-supercat commented Mar 14, 2019

Here the GET request has only the chart ID, and the form data is read from the saved chart (the params column in the slices table).

What about dashboards with filters, where each chart's query is not the saved params? The actual query is the formData overwriting the saved chart's params.

Also, you still have formData in the GET request parameters, right (you can see the request parameters in the browser location bar)? That's the blocker causing the issue.

@betodealmeida (Member, Author)

What about dashboards with filters, where each chart's query is not the saved params? The actual query is the formData overwriting the saved chart's params.

Ah, you're right. The filter box works when changed, since it does a POST. But in the initial load it's not taken into consideration. Let me see how I can fix that.

Also, you still have formData in the GET request parameters, right (you can see the request parameters in the browser location bar)? That's the blocker causing the issue.

Yes, but it has only the slice_id in it right now, e.g. form_data: {"slice_id":78}. I'll see if I can append any additional parameters that are set by filters; this should keep it small.

@betodealmeida (Member, Author)

@graceguo-supercat I tested the interaction with filter boxes and it's not working. I'll work on fixing it.

@williaster (Contributor)

@betodealmeida this seems pretty complicated on top of data requests that are already complicated and error prone 🙉

My first question, to gauge whether it's worthwhile, is: what are the speedup times you are seeing for the existing approach vs. your new approach? (ideally for multiple dashboards of varying size)

@betodealmeida (Member, Author)

@williaster the speedup will greatly depend on how often the data changes, how big the payload is (bignum vs. deck.gl varies significantly), the duration of the cache, and how slow the network is. I can (and have) run tests against the example dashboards, but I don't think they would be significant, since they don't cover all the real-life use cases.

I think the question we should ask here is: "given that this is clearly an improvement, how can we make it bug free?"

@john-bodley (Member)

@betodealmeida just to clarify step 4 (and per your diagram) you have:

  • Before: Superset caches the dataframe.
  • After: Superset serializes the dataframe and caches the HTTP response.

yet in the code it still seems like we're caching the result set from the database, so I wonder if the diagram and the After phase should mention that there's an additional entry in the server cache, i.e., after step (4) the server-side cache would contain:

  • The cached result set (as a dataframe).
  • The cached superset/explore_json HTTP response which includes server-side Python visualization specific mutations.

Note I'm not saying this is wrong, as I strongly believe that the database response should be cached given that it represents the bulk of the compute; I just wanted to get clarity on the logic.

@williaster (Contributor)

@betodealmeida this is a big change that has the potential to impact many many users. If you're unable to provide numbers that indicate that this is strictly an improvement (or at a minimum no regressions), I'm a bit reluctant to introduce this additional complexity.

I think it's part of the expected work of a feature like this to demonstrate the effects with real-life examples. It seems like you should be able to use real Lyft dashboards, dashboards from the "example datasets", etc., and you can throttle network speed using dev tools.

Another concern I have is the impact on the number of requests. We've needed to introduce domain sharding because of the large number of simultaneous requests made by larger dashboards, and this potentially DOUBLES that number, so I would want to see perf numbers that demonstrate no regressions for that case as well.

@mistercrunch (Member)

Not directly related, but on the topic of optimization around caching, it would be an extra win to make the caching call async. Maybe a first use case for the async/await syntax in Superset.

@williaster (Contributor)

@mistercrunch fetch (SupersetClient) is async? do you just mean the async/await syntax?

@mistercrunch (Member) commented Mar 15, 2019

@williaster I'm thinking about something else: on the server side, in Python, making the cache.set(...) call async, meaning that while a thread is pushing to the caching backend, the web server can start streaming the response to the client at the same time. There's no need to wait on the caching before starting to send the response back...
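
A minimal sketch of that idea, assuming a plain thread pool wrapped around a Flask-Caching-style cache object (illustrative, not code from this PR):

```python
from concurrent.futures import ThreadPoolExecutor

# One background worker for cache writes: the request thread hands off
# the write and can start streaming the response immediately.
cache_executor = ThreadPoolExecutor(max_workers=1)

def cache_set_async(cache, key, value, timeout=None):
    """Fire-and-forget cache.set(); never block the response on the
    round trip to the caching backend (e.g. Redis)."""
    cache_executor.submit(cache.set, key, value, timeout=timeout)
```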

@betodealmeida (Member, Author) commented Mar 15, 2019

@williaster:

@betodealmeida this is a big change that has the potential to impact many many users. If you're unable to provide numbers that indicate that this is strictly an improvement (or at a minimum no regressions), I'm a bit reluctant to introduce this additional complexity.

I think it's part of the expected work of a feature like this to demonstrate the effects with real-life examples. It seems like you should be able to use real Lyft dashboards, dashboards from the "example datasets", etc., and you can throttle network speed using dev tools.

Sure, I will give the numbers of the example dashboards and some of the Lyft dashboards. My point was that, considering that this is a strict improvement, defining a threshold to accept the changes seems arbitrary to me.

Of course I agree that if this causes a regression or no significant improvement we should not do it. Maybe it's not clear, but with this PR the client will always issue a smaller number of requests, and a percentage of those requests will receive body-less responses. Combined with the fact that we move the server cache closer to the user, there shouldn't be any regressions with this PR, unless I'm doing something stupid (which I've done in the past).

Also, keep in mind that this PR is against the lyftga branch, so we expect to test it in production before merging into master.

Another concern I have is the impact on the number of requests. We've needed to introduce domain sharding because of the large number of simultaneous requests made by larger dashboards, and this potentially DOUBLES that number, so I would want to see perf numbers that demonstrate no regressions for that case as well.

Sorry, why do you think this would double the number of requests? The number of requests should be strictly equal or smaller: resources within the lifetime of the cache are no longer requested because the browser reads directly from its cache, and conditional requests are still a single request. The server will either return a normal response (200 OK) or a 304 Not Modified without body. There's no pre-flight request.

@betodealmeida (Member, Author)

@john-bodley not sure if I understand your question. Currently we cache the dataframe in viz.BaseViz.get_df_payload. I haven't touched that code, but I added an additional cache storing the Response object instead. This has a few benefits:

  • It skips unpickling the dataframe and serializing it to json.
  • We don't need to recompute the ETag every time we read from the cache, since it's stored in the cached response.

Eventually we can remove the dataframe caching, but I'd rather do that in a separate PR.
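
To make the two layers concrete, here is a small self-contained sketch of the entries that would coexist in the server-side cache after a cold request (the keys, helper, and data are hypothetical):

```python
import hashlib
import json
import pickle

cache = {}  # stand-in for the shared server-side cache (e.g. Redis)

def warm_both_caches(df_key, response_key, rows):
    # Layer 1 (pre-existing): viz.BaseViz.get_df_payload pickles and
    # caches the result set from the database.
    cache[df_key] = pickle.dumps(rows)

    # Layer 2 (this PR): the serialized payload stored together with its
    # ETag, so a warm hit skips unpickling, re-serializing, and re-hashing.
    body = json.dumps(rows)
    etag = hashlib.md5(body.encode("utf-8")).hexdigest()
    cache[response_key] = (body, etag)

warm_both_caches("df:78", "resp:78", [{"state": "CA", "sum__num": 123}])
```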

@betodealmeida (Member, Author)

@graceguo-supercat I changed the code so that in the GET request we pass:

  1. the slice_id
  2. any extra_filters

This way the GET request is still relatively small (and we can switch to Rison for a smaller URL if needed), and the caching mechanism works with dashboards that have filters saved.
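
For illustration, the GET URL for a filtered chart could be built like this (the endpoint path, filter shape, and values are assumptions, shown only to convey how small the query string stays):

```python
import json
from urllib.parse import urlencode

# Hypothetical example: only the slice id and the filter-box state travel
# in the URL; everything else is read from the saved slice's params.
form_data = {
    "slice_id": 78,
    "extra_filters": [{"col": "state", "op": "in", "val": ["CA", "NY"]}],
}
url = "/superset/explore_json/?" + urlencode({"form_data": json.dumps(form_data)})
print(url)  # /superset/explore_json/?form_data=%7B%22slice_id%22%3A+78...
```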

conglei and others added 3 commits March 15, 2019 09:11
* added more functionalities for query context and object.

* fixed cache logic

* added default value for groupby

* updated comments and removed print

(cherry picked from commit d5b9795)
@john-bodley (Member)

@betodealmeida my point was that I felt the picture and description don't accurately reflect what your code is actually doing, i.e., that there are actually two server-side cache keys associated with each chart. They're both using the same underlying cache, but the diagram and steps don't show this.

I agree with your logic but thought maybe the steps should be more explicit, e.g.,

After

  1. DB returns a dataframe.
  2. Superset caches the dataframe.
  3. Superset serializes the payload.
  4. Superset caches the HTTP response associated with the payload.
  5. Superset sends the payload to user "A", with an Expires header of 1 hour, and an ETag header which is a hash of the payload.

@kristw added the enhancement:request and risk:breaking-change labels on Mar 15, 2019
@betodealmeida (Member, Author)

@john-bodley you're right. In "after" I ended up describing the workflow as it will be once we remove the dataframe cache.

@williaster (Contributor)

thanks for the benchmarks!

will let @john-bodley sign off since he had the last requested change.

@betodealmeida merged commit 538776b into apache:lyftga on Apr 3, 2019
@mistercrunch (Member)

I was tracking down a tricky bug and ended up here as its cause. The issue is around the fact that the formData may need to get sanitized by frontend-related logic.

The bug or symptom is the "Genders by State" example started showing an extra (3rd) metric (sum__num) on top of Boys and Girls.

There's a lot going on related to this, but a key point is that we have control-related logic (defaults and other control-panel machinery) that sanitizes formData. In this particular case, the example's form data is malformed, with both metrics and metric. In the explore view, or in a normal POST request, the formData gets sanitized:

  • missing keys get filled with the control's default
  • extra keys get deleted
  • more stuff

Now another thing that compounds here is that viz.py blindly looks at all METRIC_KEYS to help craft the query_obj. So the combination of these things creates the issue:

  • example is malformed, it should have only metrics and NO metric key (easy to fix)
  • in GET mode, the formData does not get "sanitized" by the frontend
  • viz.py could be more prescriptive about the metrics it builds into the query object

Now it's pretty clear that addressing any of these 3 things would fix my symptom, but point 2 on its own is worrisome. It can lead to intricate issues over time. Say I save a chart and later add a new control to that vizType; in the context of the GET, it won't get the right default.

Good news is I'm working on a refactor #7350 to help clean up all of the control/formData processing logic, which grew out of control over time. The assumption there is that all requests would make it through this logic, and I'm realizing that's not the case, at least since this PR.

Ideas?

@wraps(f)
def wrapper(*args, **kwargs):
    # check if the user can access the resource
    check_perms(*args, **kwargs)
Review comment (Member):

I was digging around trying to figure out where the datasource access permission check is done nowadays, and found it here in the etag_cache decorator. I feel like it's not the right place for it.

I understand that this needs to happen prior to reading from the cache, but maybe it should be done as a prior decorator, or maybe both of these routines should be done inside a method instead of decorators, to avoid calling get_viz twice.
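
One way to express this suggestion is to stack the permission check as its own decorator outside the caching one. A sketch, with illustrative names:

```python
import functools

def with_perms_check(check_perms):
    """Run the permission check before anything else, including any
    cache lookup performed by inner decorators."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            check_perms(*args, **kwargs)  # raises/aborts on failure
            return f(*args, **kwargs)
        return wrapper
    return decorator

# Stacking order matters: the outer decorator runs first, so the check
# happens before etag_cache ever reads from the shared cache:
#
# @with_perms_check(check_datasource_perms)
# @etag_cache(max_age=3600)
# def explore_json(...): ...
```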

form_data, slc = get_form_data(slice_id, use_slice_data=True)
datasource_type = slc.datasource.type
datasource_id = slc.datasource.id
viz_obj = get_viz(
Review comment (Member):

I think get_viz now gets called at least twice (here and in the view itself).

@mistercrunch (Member)

Found some other issues here that I wanted to raise ^^^

Also I noticed that the big "merge" on master of this and much more got actually done in a single commit instead of a proper merge: 538776b

In the future, lyftga branches and the like should be merged, not squashed and merged, as we lose tons of history. For instance, if we wanted to revert this PR, there's no single commit in master we can address; we'd have to revert all of 538776b or get really creative...

@DiggidyDave (Contributor) commented May 29, 2019

@mistercrunch agreed, and after that lyftga branch all of our commits were merged individually (from lyft-release-sp8). We have now switched to working on and continuously deploying from master, so this should no longer even be able to happen. ;-)
