Member

@smudge smudge commented Dec 2, 2025

This PR is the first of several query optimizations, and can largely be read in two parts:

  1. A pair of heavily-annotated migrations that add the new index(es) as safely as I could manage in a database-generic way (w/ timeouts and a lock retry loop). I also updated the README w/ some expanded instructions and ⚠️ safety warnings. ⚠️
  2. The resulting changes to all of the EXPLAIN snapshots, which seem to improve numerous plans without regressing any. (I haven't yet removed the old index, of course, but that's the eventual goal, once I make a few more query changes.)
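
For readers unfamiliar with the "lock retry loop" idea from item 1, here is a minimal plain-Ruby sketch. It is not the code from the migrations in this PR: `LockTimeout`, `with_lock_retries`, and the `sleeper` parameter are hypothetical names chosen for illustration, and the linear backoff is an assumption.

```ruby
# Hypothetical error standing in for whatever the database adapter raises
# when a lock cannot be acquired within the configured timeout.
class LockTimeout < StandardError; end

# Runs the block, retrying when it fails with LockTimeout.
# Sleeps a little longer before each retry so that competing
# transactions get a chance to finish. Re-raises after max_attempts.
def with_lock_retries(max_attempts: 5, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue LockTimeout
    raise if attempts >= max_attempts

    sleeper.call(base_delay * attempts) # linear backoff (an assumption)
    retry
  end
end
```

In a real migration, the block would presumably set a short lock timeout and then attempt the index change, so that a blocked DDL statement gives up quickly instead of queueing behind long-running transactions. The `sleeper` argument only exists so the delay can be stubbed out when exercising the helper.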

(Non-AI) Summary of EXPLAIN changes:

  • worker
    • postgres: failed_at IS NULL is no longer necessary in the Filter:, and when a single queue is specified, it gets promoted to an Index Cond:
    • sqlite: SCAN ... USING INDEX either way (doesn't reveal more than that)
    • mysql: Table scan becomes Index lookup 💪 with both failed_at and queue in the index conditions! (A Sort is also completely avoided as it's able to use the pre-sorted index now.)
  • monitor
    • postgres:
      • again, failed_at is no longer necessary in Filter: clauses.
      • failed_count goes from Seq Scan to Index Only Scan 💯
      • max_lock_age and working_count go from Seq Scan to Index Scan 👍
    • sqlite:
      • for several (but not all) metrics, SCAN delayed_jobs ... becomes SCAN delayed_jobs USING INDEX ...
    • mysql:
      • count and future_count go from Table scan to Covering index scan
      • failed_count goes from Table scan to Covering index range scan
      • max_lock_age, max_age, working_count, workable_count, and alert_age_percent go from Table scan to Index lookup w/ index conditions

/no-platform

@smudge smudge requested review from effron and samandmoore December 2, 2025 00:31
@smudge smudge force-pushed the query-optimization/3 branch from a93631a to b8a76dc Compare December 2, 2025 00:39
Contributor

@effron effron left a comment

One question about the migration! Generally, is there a chance this impacts write throughput? Should we prioritize removing the old indices if they are no longer needed?

If partial indexes are supported, this adds a second
(smaller) index for the dead letter queue, and
filters failed jobs out of the primary index.

Otherwise (for MySQL), `failed_at` is included in
the primary index.

(Also: Insert more rows to see if it fixes mysql in CI)
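
The branching described in the commit message above can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not the PR's migration: the index names, the column lists, the position of `failed_at` in the non-partial index, and the helper names `job_index_definitions` and `create_index_sql` are all hypothetical.

```ruby
# Returns index definitions (as plain hashes) for a jobs table.
# All names and column choices here are illustrative assumptions.
def job_index_definitions(supports_partial_index:)
  live_columns = %w[queue priority run_at]

  if supports_partial_index
    [
      # Primary index: failed jobs are filtered out entirely.
      { name: "idx_live_jobs", columns: live_columns, where: "failed_at IS NULL" },
      # Second, smaller index covering only the dead letter queue.
      { name: "idx_failed_jobs", columns: %w[failed_at], where: "failed_at IS NOT NULL" },
    ]
  else
    # No partial index support: failed_at becomes part of the primary index.
    [{ name: "idx_live_jobs", columns: %w[failed_at] + live_columns, where: nil }]
  end
end

# Renders one definition as a CREATE INDEX statement.
def create_index_sql(table, index)
  sql = "CREATE INDEX #{index[:name]} ON #{table} (#{index[:columns].join(', ')})"
  sql += " WHERE #{index[:where]}" if index[:where]
  sql
end
```

Calling `job_index_definitions(supports_partial_index: false)` yields a single definition with no `WHERE` clause, which corresponds to the MySQL case in the commit message.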
@smudge smudge force-pushed the query-optimization/3 branch from b8a76dc to f38e933 Compare December 2, 2025 18:19
@smudge smudge requested a review from effron December 2, 2025 18:20
Contributor

@effron effron left a comment

domainLGTM!

@smudge smudge merged commit 17be23f into Betterment:main Dec 2, 2025
25 checks passed
@smudge smudge deleted the query-optimization/3 branch December 2, 2025 18:53