refactor: remove unnecessary dataset queries from dashboard requests #16110

Merged


@graceguo-supercat graceguo-supercat commented Aug 6, 2021

SUMMARY

At Airbnb we investigated server-side processing time for dashboard requests, to find perf bottlenecks and improve dashboard performance, especially for large dashboards with multiple datasets and tabs.

Currently, when a user opens a dashboard, we need 3 different APIs to fetch data:

  • GET /api/v1/dashboard/<id_or_slug>: to get dashboard metadata
  • GET /api/v1/dashboard/<id_or_slug>/charts: to get charts metadata for a given dashboard id
  • GET /api/v1/dashboard/<id_or_slug>/datasets: to get dataset metadata for a given dashboard id

In an earlier PR we found that queries to the tables table are a perf bottleneck, so we already made the datasets request non-blocking for dashboard render. And given that we already split the big blob of dashboard data into 3 different APIs, it seems unnecessary for each request to query tables. This PR removes the unnecessary dataset queries from the other 2 requests (dashboard and dashboard/charts).
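
The split into three requests, with the datasets request kept off the render path, can be sketched as follows. This is a minimal Python illustration, not Superset's frontend code: `fetch` is a hypothetical stand-in that sleeps instead of issuing an HTTP GET, the delays are made up, and only the three endpoint paths come from the description above.

```python
import asyncio


async def fetch(path: str, delay: float) -> dict:
    """Hypothetical stand-in for an HTTP GET; sleeps instead of calling a server."""
    await asyncio.sleep(delay)
    return {"path": path}


async def load_dashboard(id_or_slug: str) -> dict:
    base = f"/api/v1/dashboard/{id_or_slug}"

    # Kick off the datasets request, but do not wait for it before rendering.
    datasets_task = asyncio.create_task(fetch(f"{base}/datasets", delay=0.05))

    # Rendering only waits for dashboard metadata and chart metadata.
    dashboard, charts = await asyncio.gather(
        fetch(base, delay=0.01),
        fetch(f"{base}/charts", delay=0.01),
    )
    events = ["render"]

    # Dataset metadata is attached whenever it arrives.
    datasets = await datasets_task
    events.append("datasets ready")

    return {
        "dashboard": dashboard,
        "charts": charts,
        "datasets": datasets,
        "events": events,
    }


result = asyncio.run(load_dashboard("42"))
print(result["events"])
```

Because the render step no longer depends on dataset metadata, the dashboard and dashboard/charts responses do not need to carry it either, which is the motivation for removing the dataset queries from those two requests.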

We ran some profiling for the /dashboard/id and /dashboard/id/charts requests in Datadog. I used a heavy Airbnb dashboard as an example, which has 80 datasets and about 300 charts.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before:
Screen Shot 2021-08-06 at 5 53 10 PM

Screen Shot 2021-08-06 at 6 42 02 PM

You can see that even though each query to the tables table takes only 1.2 ~ 1.9 ms, the query was called a few hundred times, which brought the overall response time to ~1 second (for both the dashboard and dashboard/charts requests).
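
As a rough back-of-the-envelope check on these numbers, the sketch below multiplies the per-query latency by an assumed query count. The count of 300 is an assumption (the description only says "a few hundred", and the example dashboard has about 300 charts), and the function name is hypothetical.

```python
def total_query_time_ms(n_queries: int, per_query_ms: float) -> float:
    """Total time spent if n_queries run one after another."""
    return n_queries * per_query_ms


N_QUERIES = 300  # assumption: "a few hundred" calls, roughly one per chart

low = total_query_time_ms(N_QUERIES, 1.2)
high = total_query_time_ms(N_QUERIES, 1.9)
print(f"{low:.0f} ms to {high:.0f} ms spent in queries alone")
```

Under that assumption, query execution alone accounts for several hundred milliseconds of the ~1 second response. The description does not break down where the remainder goes.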

After:
Screen Shot 2021-08-06 at 6 47 18 PM
Screen Shot 2021-08-06 at 6 47 36 PM

After removing the unnecessary queries to datasets, the pymysql.query operations are greatly reduced.

TESTING INSTRUCTIONS

CI and manual test.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API


codecov bot commented Aug 6, 2021

Codecov Report

Merging #16110 (0b05c27) into master (423ff50) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #16110   +/-   ##
=======================================
  Coverage   76.83%   76.84%           
=======================================
  Files         995      995           
  Lines       52884    52881    -3     
  Branches     6721     6721           
=======================================
  Hits        40636    40636           
+ Misses      12023    12020    -3     
  Partials      225      225           
Flag        Coverage Δ
hive        81.33% <100.00%> (+0.03%) ⬆️
javascript  71.23% <ø> (+0.01%) ⬆️
mysql       81.59% <100.00%> (-0.01%) ⬇️
postgres    81.61% <100.00%> (-0.01%) ⬇️
presto      81.40% <100.00%> (-0.01%) ⬇️
python      82.12% <100.00%> (-0.01%) ⬇️
sqlite      81.25% <100.00%> (+0.03%) ⬆️


Impacted Files                                         Coverage Δ
superset/charts/schemas.py                             100.00% <ø> (ø)
superset/dashboards/schemas.py                         99.37% <ø> (-0.01%) ⬇️
superset/models/dashboard.py                           76.69% <ø> (-0.30%) ⬇️
superset/views/dashboard/mixin.py                      95.00% <100.00%> (ø)
...rontend/src/components/Select/DeprecatedSelect.tsx  85.71% <0.00%> (-1.03%) ⬇️
.../src/dashboard/components/RefreshIntervalModal.tsx  89.47% <0.00%> (ø)
...ontrols/DndColumnSelectControl/DndColumnSelect.tsx  46.42% <0.00%> (ø)
...ntend/src/dashboard/components/CssEditor/index.jsx  96.42% <0.00%> (+0.59%) ⬆️
...uperset-frontend/src/components/Menu/MenuRight.tsx  92.15% <0.00%> (+1.59%) ⬆️
...ols/DndColumnSelectControl/utils/optionSelector.ts  46.15% <0.00%> (+11.53%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@graceguo-supercat graceguo-supercat changed the title from "[wip]refactor: remove unnecessary dataset queries from dashboard requests" to "refactor: remove unnecessary dataset queries from dashboard requests" Aug 6, 2021

ktmud commented Aug 6, 2021

table_names is still used here (it looks like it is for the FAB CRUD views). Should we clean it up as well?


ktmud commented Aug 6, 2021

I think it's also safe to remove

    @property
    def table_names(self) -> str:
        # pylint: disable=no-member
        return ", ".join(str(s.datasource.full_name) for s in self.slices)

as that method is not particularly efficient (it should have used a batch query instead of looping through all slices).
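
To illustrate the difference between looping and a batch query, here is a minimal sketch using the standard-library sqlite3 module. The two-table schema, the table and column names, and the helper functions are all hypothetical; this is not Superset's model code or its SQLAlchemy queries.

```python
import sqlite3


def make_db(n_slices: int, n_tables: int) -> sqlite3.Connection:
    """Build a hypothetical in-memory schema: slices point at rows in tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tables (id INTEGER PRIMARY KEY, table_name TEXT)")
    conn.execute("CREATE TABLE slices (id INTEGER PRIMARY KEY, datasource_id INTEGER)")
    conn.executemany(
        "INSERT INTO tables VALUES (?, ?)",
        [(i, f"table_{i}") for i in range(n_tables)],
    )
    conn.executemany(
        "INSERT INTO slices VALUES (?, ?)",
        [(i, i % n_tables) for i in range(n_slices)],
    )
    return conn


def names_by_looping(conn: sqlite3.Connection, datasource_ids: list) -> tuple:
    """One query per slice: the pattern the comment above calls inefficient."""
    names, queries = [], 0
    for ds_id in datasource_ids:
        row = conn.execute(
            "SELECT table_name FROM tables WHERE id = ?", (ds_id,)
        ).fetchone()
        queries += 1
        names.append(row[0])
    return names, queries


def names_by_batch(conn: sqlite3.Connection, datasource_ids: list) -> tuple:
    """One query for all distinct ids, then an in-memory lookup per slice."""
    unique_ids = sorted(set(datasource_ids))
    if not unique_ids:
        return [], 0
    placeholders = ", ".join("?" for _ in unique_ids)
    rows = conn.execute(
        f"SELECT id, table_name FROM tables WHERE id IN ({placeholders})",
        unique_ids,
    ).fetchall()
    by_id = dict(rows)
    return [by_id[ds_id] for ds_id in datasource_ids], 1


conn = make_db(n_slices=300, n_tables=80)
ids = [row[0] for row in conn.execute("SELECT datasource_id FROM slices ORDER BY id")]
loop_names, loop_queries = names_by_looping(conn, ids)
batch_names, batch_queries = names_by_batch(conn, ids)
print(loop_queries, batch_queries)
```

Both helpers return the same names in the same order; the difference is that the looping version issues one round trip per slice, while the batch version issues a single query regardless of how many slices the dashboard has.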

@pull-request-size pull-request-size bot added size/S and removed size/XS labels Aug 6, 2021

@suddjian suddjian left a comment


Looks familiar :)

@graceguo-supercat graceguo-supercat merged commit 85329c3 into apache:master Aug 7, 2021
opus-42 pushed a commit to opus-42/incubator-superset that referenced this pull request Nov 14, 2021
…pache#16110)

* refactor: remove unnecessary dataset queries from dashboard requests

* fix comments
QAlexBall pushed a commit to QAlexBall/superset that referenced this pull request Dec 28, 2021
…pache#16110)

* refactor: remove unnecessary dataset queries from dashboard requests

* fix comments
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.4.0 labels Mar 13, 2024