fix: improve performance on reports log queries #26416

dpgaspar · 2024-01-05T16:25:18Z

SUMMARY

Add indexes to alerts and reports logs to improve REST API performance

Using test data with 100 reports with 10K report logs each spread out by 100 days. total 1M logs. Globally on 100 reports we have 10K logs per day.

Current PG explain for:

explain analyse SELECT report_execution_log.id AS report_execution_log_id, report_execution_log.uuid AS report_execution_log_uuid, report_execution_log.scheduled_dttm AS report_execution_log_scheduled_dttm, report_execution_log.start_dttm AS report_execution_log_start_dttm, report_execution_log.end_dttm AS report_execution_log_end_dttm, report_execution_log.value AS report_execution_log_value, report_execution_log.value_row_json AS report_execution_log_value_row_json, report_execution_log.state AS report_execution_log_state, report_execution_log.error_message AS report_execution_log_error_message
FROM report_execution_log
WHERE report_execution_log.start_dttm > '2023-04-01T00:00:00.000000'::timestamp AND report_execution_log.state != 'Working' AND 13 = report_execution_log.report_schedule_id ORDER BY report_execution_log.start_dttm DESC
 LIMIT 100 OFFSET 100

Limit  (cost=17645.78..17657.45 rows=100 width=124) (actual time=286.918..303.029 rows=100 loops=1)
  ->  Gather Merge  (cost=17634.11..17736.55 rows=878 width=124) (actual time=286.798..302.946 rows=200 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=16634.09..16635.18 rows=439 width=124) (actual time=250.493..250.548 rows=67 loops=3)
              Sort Key: start_dttm DESC
              Sort Method: top-N heapsort  Memory: 50kB
              Worker 0:  Sort Method: quicksort  Memory: 25kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              ->  Parallel Seq Scan on report_execution_log  (cost=0.00..16615.11 rows=439 width=124) (actual time=154.762..248.396 rows=333 loops=3)
                    Filter: ((start_dttm > '2023-04-01 00:00:00'::timestamp without time zone) AND ((state)::text <> 'Working'::text) AND (13 = report_schedule_id))
                    Rows Removed by Filter: 333000
Planning Time: 0.654 ms
Execution Time: 303.230 ms

Other simpler example:

explain analyse SELECT report_execution_log.id AS report_execution_log_id, report_execution_log.uuid AS report_execution_log_uuid, report_execution_log.scheduled_dttm AS report_execution_log_scheduled_dttm, report_execution_log.start_dttm AS report_execution_log_start_dttm, report_execution_log.end_dttm AS report_execution_log_end_dttm, report_execution_log.value AS report_execution_log_value, report_execution_log.value_row_json AS report_execution_log_value_row_json, report_execution_log.state AS report_execution_log_state, report_execution_log.error_message AS report_execution_log_error_message
FROM report_execution_log 
WHERE report_execution_log.report_schedule_id = 13
 LIMIT 100 OFFSET 100

Limit  (cost=202.54..405.08 rows=100 width=124) (actual time=1.658..1.836 rows=100 loops=1)
  ->  Seq Scan on report_execution_log  (cost=0.00..21807.34 rows=10767 width=124) (actual time=1.488..1.727 rows=200 loops=1)
        Filter: (report_schedule_id = 13)
        Rows Removed by Filter: 1440
Planning Time: 1.900 ms
Execution Time: 2.560 ms

After creating the indexes

explain analyse SELECT report_execution_log.id AS report_execution_log_id, report_execution_log.uuid AS report_execution_log_uuid, report_execution_log.scheduled_dttm AS report_execution_log_scheduled_dttm, report_execution_log.start_dttm AS report_execution_log_start_dttm, report_execution_log.end_dttm AS report_execution_log_end_dttm, report_execution_log.value AS report_execution_log_value, report_execution_log.value_row_json AS report_execution_log_value_row_json, report_execution_log.state AS report_execution_log_state, report_execution_log.error_message AS report_execution_log_error_message
FROM report_execution_log
WHERE report_execution_log.start_dttm > '2023-04-01T00:00:00.000000'::timestamp AND report_execution_log.state != 'Working' AND 13 = report_execution_log.report_schedule_id ORDER BY report_execution_log.start_dttm DESC
 LIMIT 100 OFFSET 100

Limit  (cost=433.36..433.61 rows=100 width=124) (actual time=17.852..17.937 rows=100 loops=1)
  ->  Sort  (cost=433.11..435.75 rows=1057 width=124) (actual time=17.790..17.857 rows=200 loops=1)
        Sort Key: start_dttm DESC
        Sort Method: top-N heapsort  Memory: 50kB
        ->  Index Scan using ix_report_execution_log_report_schedule_id on report_execution_log  (cost=0.42..387.43 rows=1057 width=124) (actual time=0.238..15.534 rows=1000 loops=1)
              Index Cond: (report_schedule_id = 13)
              Filter: ((start_dttm > '2023-04-01 00:00:00'::timestamp without time zone) AND ((state)::text <> 'Working'::text))
              Rows Removed by Filter: 9000
Planning Time: 1.083 ms
Execution Time: 18.130 ms

On the simpler example:

explain analyse SELECT report_execution_log.id AS report_execution_log_id, report_execution_log.uuid AS report_execution_log_uuid, report_execution_log.scheduled_dttm AS report_execution_log_scheduled_dttm, report_execution_log.start_dttm AS report_execution_log_start_dttm, report_execution_log.end_dttm AS report_execution_log_end_dttm, report_execution_log.value AS report_execution_log_value, report_execution_log.value_row_json AS report_execution_log_value_row_json, report_execution_log.state AS report_execution_log_state, report_execution_log.error_message AS report_execution_log_error_message
FROM report_execution_log 
WHERE report_execution_log.report_schedule_id = 13
 LIMIT 100 OFFSET 100

Limit  (cost=3.51..6.59 rows=100 width=124) (actual time=0.352..0.606 rows=100 loops=1)
  ->  Index Scan using ix_report_execution_log_report_schedule_id on report_execution_log  (cost=0.42..333.43 rows=10800 width=124) (actual time=0.123..0.505 rows=200 loops=1)
        Index Cond: (report_schedule_id = 13)
Planning Time: 0.480 ms
Execution Time: 0.761 ms

Observed improvement:

Using start_dttm and report_schedule_id filters: ~16x
Using just a report_schedule_id filter: ~3x

Locally upgrading took:

flask db upgrade  2.46s user 2.26s system 29% cpu 16.261 total

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

codecov · 2024-01-05T16:30:59Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (3a7d76c) 69.15% compared to head (1ab8165) 67.04%.
Report is 16 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #26416      +/-   ##
==========================================
- Coverage   69.15%   67.04%   -2.12%     
==========================================
  Files        1947     1947              
  Lines       76010    76032      +22     
  Branches     8481     8481              
==========================================
- Hits        52565    50974    -1591     
- Misses      21264    22877    +1613     
  Partials     2181     2181

Flag	Coverage Δ
hive	`53.68% <100.00%> (+<0.01%)`	⬆️
mysql	`78.05% <100.00%> (-0.03%)`	⬇️
postgres	`78.17% <100.00%> (-0.01%)`	⬇️
presto	`?`
python	`78.45% <100.00%> (-4.41%)`	⬇️
sqlite	`77.76% <100.00%> (+0.93%)`	⬆️
unit	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

craig-rueda · 2024-01-05T16:39:38Z

superset/reports/models.py

+    __table_args__ = (
+        Index("ix_report_execution_log_report_schedule_id", report_schedule_id),
+        Index("ix_report_execution_log_start_dttm", start_dttm),
+        Index("ix_report_execution_log_state", state),


This idx might have the opposite effect as the optimizer can potentially leverage it first when filtering on several fields, which will basically return all rows, since the status col has low cardinality

john-bodley · 2024-01-05T18:23:55Z

@dpgaspar do you have a sense how long this migration will take? Is there merit in adding a line item in UPDATING.md indicating that down time might be necessary?

dpgaspar · 2024-01-09T16:42:23Z

@dpgaspar do you have a sense how long this migration will take? Is there merit in adding a line item in UPDATING.md indicating that down time might be necessary?

Added details to the description and a line to UPDATING.md

craig-rueda

LGTM

(cherry picked from commit 2f3bb5b)

sadpandajoe · 2024-02-14T19:09:56Z

🏷️ preset:2024.8

dpgaspar added 2 commits January 5, 2024 16:23

fix: improve performance on reports log queries

1c30c43

remove EOL

ec3365c

dpgaspar requested a review from a team as a code owner January 5, 2024 16:25

pull-request-size bot added the size/M label Jan 5, 2024

geido approved these changes Jan 5, 2024

View reviewed changes

craig-rueda reviewed Jan 5, 2024

View reviewed changes

removing state index

41a5f31

john-bodley added the review:checkpoint Last PR reviewed during the daily review standup label Jan 5, 2024

dpgaspar requested a review from craig-rueda January 6, 2024 18:01

john-bodley removed the review:checkpoint Last PR reviewed during the daily review standup label Jan 8, 2024

add updating

1ab8165

dpgaspar requested a review from john-bodley January 9, 2024 16:41

craig-rueda approved these changes Jan 10, 2024

View reviewed changes

dpgaspar added the risk:db-migration PRs that require a DB migration label Jan 12, 2024

dpgaspar merged commit 2f3bb5b into apache:master Jan 15, 2024
33 checks passed

dpgaspar deleted the fix/improve-reports-performance branch January 15, 2024 19:04

sadpandajoe pushed a commit to preset-io/superset that referenced this pull request Feb 14, 2024

fix: improve performance on reports log queries (apache#26416)

cbd4d85

(cherry picked from commit 2f3bb5b)

sfirke pushed a commit to sfirke/superset that referenced this pull request Mar 22, 2024

fix: improve performance on reports log queries (apache#26416)

7077931

mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 4.0.0 labels Apr 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: improve performance on reports log queries #26416

fix: improve performance on reports log queries #26416

dpgaspar commented Jan 5, 2024 •

edited

Loading

codecov bot commented Jan 5, 2024 •

edited

Loading

craig-rueda Jan 5, 2024

john-bodley commented Jan 5, 2024

dpgaspar commented Jan 9, 2024

craig-rueda left a comment

sadpandajoe commented Feb 14, 2024

fix: improve performance on reports log queries #26416

fix: improve performance on reports log queries #26416

Conversation

dpgaspar commented Jan 5, 2024 • edited Loading

SUMMARY

After creating the indexes

Locally upgrading took:

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

codecov bot commented Jan 5, 2024 • edited Loading

Codecov Report

craig-rueda Jan 5, 2024

Choose a reason for hiding this comment

john-bodley commented Jan 5, 2024

dpgaspar commented Jan 9, 2024

craig-rueda left a comment

Choose a reason for hiding this comment

sadpandajoe commented Feb 14, 2024

dpgaspar commented Jan 5, 2024 •

edited

Loading

codecov bot commented Jan 5, 2024 •

edited

Loading