Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improve performance on reports log queries #26416

Merged
merged 4 commits into from
Jan 15, 2024

Conversation

dpgaspar
Copy link
Member

@dpgaspar dpgaspar commented Jan 5, 2024

SUMMARY

Add indexes to alerts and reports logs to improve REST API performance

Using test data with 100 reports with 10K report logs each spread out by 100 days. total 1M logs. Globally on 100 reports we have 10K logs per day.

Current PG explain for:

explain analyse SELECT report_execution_log.id AS report_execution_log_id, report_execution_log.uuid AS report_execution_log_uuid, report_execution_log.scheduled_dttm AS report_execution_log_scheduled_dttm, report_execution_log.start_dttm AS report_execution_log_start_dttm, report_execution_log.end_dttm AS report_execution_log_end_dttm, report_execution_log.value AS report_execution_log_value, report_execution_log.value_row_json AS report_execution_log_value_row_json, report_execution_log.state AS report_execution_log_state, report_execution_log.error_message AS report_execution_log_error_message
FROM report_execution_log
WHERE report_execution_log.start_dttm > '2023-04-01T00:00:00.000000'::timestamp AND report_execution_log.state != 'Working' AND 13 = report_execution_log.report_schedule_id ORDER BY report_execution_log.start_dttm DESC
 LIMIT 100 OFFSET 100
Limit  (cost=17645.78..17657.45 rows=100 width=124) (actual time=286.918..303.029 rows=100 loops=1)
  ->  Gather Merge  (cost=17634.11..17736.55 rows=878 width=124) (actual time=286.798..302.946 rows=200 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=16634.09..16635.18 rows=439 width=124) (actual time=250.493..250.548 rows=67 loops=3)
              Sort Key: start_dttm DESC
              Sort Method: top-N heapsort  Memory: 50kB
              Worker 0:  Sort Method: quicksort  Memory: 25kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              ->  Parallel Seq Scan on report_execution_log  (cost=0.00..16615.11 rows=439 width=124) (actual time=154.762..248.396 rows=333 loops=3)
                    Filter: ((start_dttm > '2023-04-01 00:00:00'::timestamp without time zone) AND ((state)::text <> 'Working'::text) AND (13 = report_schedule_id))
                    Rows Removed by Filter: 333000
Planning Time: 0.654 ms
Execution Time: 303.230 ms

Other simpler example:

explain analyse SELECT report_execution_log.id AS report_execution_log_id, report_execution_log.uuid AS report_execution_log_uuid, report_execution_log.scheduled_dttm AS report_execution_log_scheduled_dttm, report_execution_log.start_dttm AS report_execution_log_start_dttm, report_execution_log.end_dttm AS report_execution_log_end_dttm, report_execution_log.value AS report_execution_log_value, report_execution_log.value_row_json AS report_execution_log_value_row_json, report_execution_log.state AS report_execution_log_state, report_execution_log.error_message AS report_execution_log_error_message
FROM report_execution_log 
WHERE report_execution_log.report_schedule_id = 13
 LIMIT 100 OFFSET 100
Limit  (cost=202.54..405.08 rows=100 width=124) (actual time=1.658..1.836 rows=100 loops=1)
  ->  Seq Scan on report_execution_log  (cost=0.00..21807.34 rows=10767 width=124) (actual time=1.488..1.727 rows=200 loops=1)
        Filter: (report_schedule_id = 13)
        Rows Removed by Filter: 1440
Planning Time: 1.900 ms
Execution Time: 2.560 ms

After creating the indexes

explain analyse SELECT report_execution_log.id AS report_execution_log_id, report_execution_log.uuid AS report_execution_log_uuid, report_execution_log.scheduled_dttm AS report_execution_log_scheduled_dttm, report_execution_log.start_dttm AS report_execution_log_start_dttm, report_execution_log.end_dttm AS report_execution_log_end_dttm, report_execution_log.value AS report_execution_log_value, report_execution_log.value_row_json AS report_execution_log_value_row_json, report_execution_log.state AS report_execution_log_state, report_execution_log.error_message AS report_execution_log_error_message
FROM report_execution_log
WHERE report_execution_log.start_dttm > '2023-04-01T00:00:00.000000'::timestamp AND report_execution_log.state != 'Working' AND 13 = report_execution_log.report_schedule_id ORDER BY report_execution_log.start_dttm DESC
 LIMIT 100 OFFSET 100
Limit  (cost=433.36..433.61 rows=100 width=124) (actual time=17.852..17.937 rows=100 loops=1)
  ->  Sort  (cost=433.11..435.75 rows=1057 width=124) (actual time=17.790..17.857 rows=200 loops=1)
        Sort Key: start_dttm DESC
        Sort Method: top-N heapsort  Memory: 50kB
        ->  Index Scan using ix_report_execution_log_report_schedule_id on report_execution_log  (cost=0.42..387.43 rows=1057 width=124) (actual time=0.238..15.534 rows=1000 loops=1)
              Index Cond: (report_schedule_id = 13)
              Filter: ((start_dttm > '2023-04-01 00:00:00'::timestamp without time zone) AND ((state)::text <> 'Working'::text))
              Rows Removed by Filter: 9000
Planning Time: 1.083 ms
Execution Time: 18.130 ms

On the simpler example:

explain analyse SELECT report_execution_log.id AS report_execution_log_id, report_execution_log.uuid AS report_execution_log_uuid, report_execution_log.scheduled_dttm AS report_execution_log_scheduled_dttm, report_execution_log.start_dttm AS report_execution_log_start_dttm, report_execution_log.end_dttm AS report_execution_log_end_dttm, report_execution_log.value AS report_execution_log_value, report_execution_log.value_row_json AS report_execution_log_value_row_json, report_execution_log.state AS report_execution_log_state, report_execution_log.error_message AS report_execution_log_error_message
FROM report_execution_log 
WHERE report_execution_log.report_schedule_id = 13
 LIMIT 100 OFFSET 100
Limit  (cost=3.51..6.59 rows=100 width=124) (actual time=0.352..0.606 rows=100 loops=1)
  ->  Index Scan using ix_report_execution_log_report_schedule_id on report_execution_log  (cost=0.42..333.43 rows=10800 width=124) (actual time=0.123..0.505 rows=200 loops=1)
        Index Cond: (report_schedule_id = 13)
Planning Time: 0.480 ms
Execution Time: 0.761 ms

Observed improvement:

  • Using start_dttm and report_schedule_id filters: ~16x
  • Using just a report_schedule_id filter: ~3x

Locally upgrading took:

flask db upgrade  2.46s user 2.26s system 29% cpu 16.261 total

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@dpgaspar dpgaspar requested a review from a team as a code owner January 5, 2024 16:25
Copy link

codecov bot commented Jan 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (3a7d76c) 69.15% compared to head (1ab8165) 67.04%.
Report is 16 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #26416      +/-   ##
==========================================
- Coverage   69.15%   67.04%   -2.12%     
==========================================
  Files        1947     1947              
  Lines       76010    76032      +22     
  Branches     8481     8481              
==========================================
- Hits        52565    50974    -1591     
- Misses      21264    22877    +1613     
  Partials     2181     2181              
Flag Coverage Δ
hive 53.68% <100.00%> (+<0.01%) ⬆️
mysql 78.05% <100.00%> (-0.03%) ⬇️
postgres 78.17% <100.00%> (-0.01%) ⬇️
presto ?
python 78.45% <100.00%> (-4.41%) ⬇️
sqlite 77.76% <100.00%> (+0.93%) ⬆️
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

__table_args__ = (
Index("ix_report_execution_log_report_schedule_id", report_schedule_id),
Index("ix_report_execution_log_start_dttm", start_dttm),
Index("ix_report_execution_log_state", state),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This idx might have the opposite effect as the optimizer can potentially leverage it first when filtering on several fields, which will basically return all rows, since the status col has low cardinality

@john-bodley
Copy link
Member

@dpgaspar do you have a sense how long this migration will take? Is there merit in adding a line item in UPDATING.md indicating that down time might be necessary?

@john-bodley john-bodley added the review:checkpoint Last PR reviewed during the daily review standup label Jan 5, 2024
@john-bodley john-bodley removed the review:checkpoint Last PR reviewed during the daily review standup label Jan 8, 2024
@dpgaspar
Copy link
Member Author

dpgaspar commented Jan 9, 2024

@dpgaspar do you have a sense how long this migration will take? Is there merit in adding a line item in UPDATING.md indicating that down time might be necessary?

Added details to the description and a line to UPDATING.md

Copy link
Member

@craig-rueda craig-rueda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dpgaspar dpgaspar added the risk:db-migration PRs that require a DB migration label Jan 12, 2024
@dpgaspar dpgaspar merged commit 2f3bb5b into apache:master Jan 15, 2024
33 checks passed
@dpgaspar dpgaspar deleted the fix/improve-reports-performance branch January 15, 2024 19:04
sadpandajoe pushed a commit to preset-io/superset that referenced this pull request Feb 14, 2024
@sadpandajoe
Copy link
Contributor

🏷️ preset:2024.8

sfirke pushed a commit to sfirke/superset that referenced this pull request Mar 22, 2024
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 4.0.0 labels Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels risk:db-migration PRs that require a DB migration size/M 🚢 4.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants