feat: Synchronously return cached charts #15157
Conversation
Codecov Report
```diff
@@           Coverage Diff            @@
##           master   #15157    +/-   ##
==========================================
+ Coverage   77.09%   77.24%   +0.14%
==========================================
  Files         971      971
  Lines       50236    50343    +107
  Branches     6494     6148    -346
==========================================
+ Hits        38729    38887    +158
+ Misses      11302    11251     -51
  Partials      205      205
```
```python
# If the chart query has already been cached, return it immediately.
if already_cached_result:
    return self.send_chart_response(result)
```
can you add a test for this code path?
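A test for this path might look roughly like the sketch below. Note this is illustrative only: `FakeChartHandler`, `get_data`, and `enqueue_async_job` are hypothetical stand-ins, not Superset's actual view or test API; the sketch only shows the shape of the assertion (cached key short-circuits to a synchronous response, uncached key falls through to the async job).

```python
# Hypothetical sketch of a test for the sync cached-result code path.
# None of these names are Superset's real API; they are stand-ins that
# mirror the branch being reviewed above.

class FakeChartHandler:
    """Stand-in for the view handling chart data requests."""

    def __init__(self, cache):
        self.cache = cache

    def send_chart_response(self, result):
        # In the real code this serializes a chart payload; here we
        # just tag the result so the test can tell which path ran.
        return {"status": 200, "data": result}

    def enqueue_async_job(self, query):
        # Async path: return job metadata instead of chart data.
        return {"status": 202, "job_id": "fake-job", "channel_id": "fake-chan"}

    def get_data(self, cache_key, query):
        already_cached_result = self.cache.get(cache_key)
        if already_cached_result:
            # Cached: return synchronously, skipping the async queue.
            return self.send_chart_response(already_cached_result)
        return self.enqueue_async_job(query)


def test_cached_result_returned_synchronously():
    handler = FakeChartHandler(cache={"key1": [{"x": 1}]})
    response = handler.get_data("key1", query=None)
    assert response["status"] == 200
    assert response["data"] == [{"x": 1}]


def test_uncached_result_goes_async():
    handler = FakeChartHandler(cache={})
    response = handler.get_data("key1", query=None)
    assert response["status"] == 202
    assert "job_id" in response
```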
@benjreinhart I believe the alternative of returning already-cached results synchronously was discussed earlier in the GAQ development process, but it was decided to always use the async route. What is the motivation for returning already-cached results synchronously? (I don't oppose the decision, just curious to understand if there are perf or other reasons to do this.)
@villebro yeah, I think from an interface design PoV it's definitely cleaner to have consistent behavior / response structure, so I think you're right to question this change. The reasoning thus far has been performance related. We've noticed in our testing that there are times when the celery queue gets congested quickly and latency increases. A number of those tasks are to run queries in the background, and a subset of those tasks will end up returning the cached result. For the subset that is cached, this change will help take some load off the celery queues and also reduce the chances that users are stuck waiting for celery queues to process a task before they can see their chart render. @robdiciuccio may have some additional reasoning.

That being said, after typing that out, it does feel like the best solution is to get celery into a more predictable state with separate queues and resource allocation. I know @robdiciuccio has been working hard on that, so maybe there's an update there?
```diff
@@ -395,7 +395,16 @@ export function exploreJSON(
     if (isFeatureEnabled(FeatureFlag.GLOBAL_ASYNC_QUERIES)) {
       // deal with getChartDataRequest transforming the response data
       const result = 'result' in response ? response.result[0] : response;
-      return waitForAsyncData(result);
+      const awaitingChartData =
+        'job_id' in result && 'channel_id' in result;
```
Can we switch on HTTP status code instead here?
Yep, good suggestion!
We had this debate early on, whether pre-cached data should be returned synchronously. There are pros and cons here, as @benjreinhart pointed out: we're realizing potential performance gains while making the API slightly more complex.

That said, the responses from the API endpoints use different HTTP status codes for different use cases (200 for cached data, 202 for async job metadata). Personally, I feel it's acceptable for an API endpoint to have different status code responses for different use cases, but this quickly gets pretty subjective and philosophical.

The problem we're trying to solve with this change is avoiding chart/dashboard rendering delays due to (unnecessary) async job workflows for pre-cached data. Running Celery at scale without delays is *cough* difficult, and we're optimizing for user experience. Happy to continue the discussion and get additional viewpoints. I believe @ktmud and @etr2460 had comments on this in the initial implementation.
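As a rough sketch of the status-code convention described in this thread (200 for cached data, 202 for async job metadata): the function below returns plain `(status, body)` tuples so the branching is easy to see. The names, the payload shapes, and the `force_refresh` parameter are illustrative assumptions, not Superset's actual implementation.

```python
# Illustrative sketch only. Returns (status_code, body) tuples rather
# than real HTTP responses, to highlight the 200-vs-202 branching.

def chart_data_response(cache, cache_key, force_refresh=False):
    """Return cached chart data with 200, or async job metadata with 202."""
    if not force_refresh:
        cached = cache.get(cache_key)
        if cached is not None:
            # Pre-cached data: respond synchronously with the result.
            return 200, {"result": cached}
    # Not cached (or a forced refresh): hand off to the async workflow
    # and return the metadata a client needs to await the job.
    return 202, {"result": [{"job_id": "abc123", "channel_id": "chan-1"}]}
```

A client can then branch purely on the status code: 200 means the payload is chart data, 202 means it carries job metadata to poll or subscribe on.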
@benjreinhart @robdiciuccio thanks for the context. Like I said, I don't have a strong opinion here, and I agree that deferring an already-cached result to the async flow seems like an unnecessary hoop to jump through. Having said that, it could be nice to have the option to configure this behavior with a new config key.
These are the existing states we have:
There are also the cache retrieval endpoints, and a configurable transport that implies either more endpoints (polling) or a new service. In other words, there's already quite a bit going on here. It may be pretty small to make the change you're describing, @villebro, but my preference would be to keep things simpler unless it's truly needed, and I don't know that making this behavior configurable adds much value. Happy to consider making the change if you feel strongly or others feel differently.
@benjreinhart no, I don't feel strongly about this, so no problem with restricting the options and keeping configuration simple (it's easy to add this later if the need ever comes up). Btw I will try to actually look at the code tomorrow unless it gets merged by then 😄
/testenv up
@robdiciuccio Ephemeral environment spinning up at http://54.212.107.240:8080.
Code LGTM. I'm going to do some more testing locally with different configurations.
Tested sync and async query operation. LGTM with latest fix for forced cache refresh.
Ephemeral environment shut down and build artifacts deleted.
* feat: Synchronously return cached charts
* Fix lint issue
* Fix python lint error
* Change getChartDataRequest to return response
* Fix lint errors
* Add test
* explore_json: skip cached data check for forced refresh

Co-authored-by: Rob DiCiuccio <rob.diciuccio@gmail.com>
SUMMARY
This is for the global async queries feature. We'd like to return the query results immediately (synchronously) if they are cached, otherwise we'll kick off the background job to run the query and later notify clients.
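The client-side flow this implies can be sketched as follows. This is a hypothetical sketch, not Superset's frontend code: `fetch` and `wait_for_async_data` are stand-ins for the chart data request and the async-result wait, and the payload shapes are assumed from the discussion above (200 carries chart data, 202 carries job metadata).

```python
# Hypothetical client-side flow: render immediately on a cached (200)
# response, otherwise wait for the async job to complete (202).

def load_chart(fetch, wait_for_async_data):
    status, body = fetch()
    if status == 200:
        # Cached result was returned synchronously; use it directly.
        return body["result"]
    # 202: body carries async job metadata; block until the job finishes.
    return wait_for_async_data(body["result"][0])
```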
TESTING INSTRUCTIONS
Manual
ADDITIONAL INFORMATION
cc @robdiciuccio