
perf(api): revert TrendingJob to bulk DELETE+INSERT score rewrite#894

Merged
raymondjacobson merged 1 commit into main from api/trending-bulk-rewrite on Jun 2, 2026

@raymondjacobson
Member

Summary

  • #890 (perf(api): stop TrendingJob from continuously rewriting score tables) swapped the trending score-table rewrite for a temp-table stage + INSERT ... ON CONFLICT DO UPDATE (skip-unchanged) + anti-join prune, to avoid rewriting the ~16GB *_trending_scores tables every run.
  • In prod that upsert does one unique-index probe plus one IS DISTINCT FROM comparison per row over ~4M rows. It ran for 40+ minutes and never finished inside the hourly window, so trending scores went stale (observed 12–18h old) and isRunning blocked every subsequent tick.
  • This reverts computeTrendingTracks / computeTrendingPlaylists to the strategy that discovery's Python code ran for years: within the single transaction already opened, DELETE the (type, version) slice, then repopulate it with plain bulk INSERTs.

Why this is correct

  • No empty-table window: the DELETE and INSERTs are in one transaction, so the DELETE isn't visible to readers until COMMIT — exactly how Python's update_track_score_query behaved.
  • Scores unchanged: the score expressions are byte-for-byte identical to before (and to apps' pnagD/AnlGe strategies); only bind-parameter numbering changed because type/version/created_at moved back into the SELECT.
  • The concurrent matview refresh from #890 is kept; that part was a genuine improvement over Python (which refreshed non-concurrently).

Tradeoff

The bulk rewrite produces more dead tuples per run than the skip-unchanged upsert. Autovacuum handled this under Python for years. The alternative — a job that never completes — is strictly worse.

Evidence

  • Live prod query showed the upsert INSERT INTO track_trending_scores active for 2650s (44 min) on IO/DataFileRead, not blocked, still on step 1 of 4.
  • track_trending_scores max created_at frozen 12–18h while the pod had been up >1h and the matview refreshes succeeded.
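
The frozen timestamps are what a run guard would produce when one run outlasts the tick interval. The sketch below assumes the guard is an atomic flag and uses hypothetical names (`job`, `tick`); the actual isRunning check in TrendingJob may be implemented differently. A tick that fires while a run is in flight is skipped, so one 44-minute statement starves every later cycle until it finishes.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// job is a hypothetical scheduler target guarded by an isRunning flag.
type job struct {
	isRunning atomic.Bool
	runs      int // ticks that actually executed work
	skipped   int // ticks dropped because a run was in flight
}

// tick runs work unless a previous run is still in flight.
// It reports whether work was executed.
func (j *job) tick(work func()) bool {
	if !j.isRunning.CompareAndSwap(false, true) {
		j.skipped++
		return false
	}
	defer j.isRunning.Store(false)
	j.runs++
	work()
	return true
}

func main() {
	j := &job{}
	j.tick(func() {
		// Simulate the next scheduled tick firing while this run is still busy.
		fmt.Println("tick during run executed:", j.tick(func() {}))
	})
	// Once the first run has returned, the guard is released again.
	fmt.Println("tick after run executed:", j.tick(func() {}))
	fmt.Println("runs:", j.runs, "skipped:", j.skipped)
}
```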

Test plan

  • go build ./jobs/ clean
  • gofmt clean
  • go test ./jobs/ -run Trending passes, incl. TestTrendingJob_PrunesStaleRows (deleted tracks still drop out, now via the full DELETE)
  • After deploy: confirm max(created_at) in track_trending_scores advances each cycle and a full cycle completes in seconds

🤖 Generated with Claude Code

#890 replaced the score-table rewrite with a temp-table stage +
INSERT ... ON CONFLICT DO UPDATE (skip-unchanged) + anti-join prune,
to avoid rewriting the ~16GB *_trending_scores tables every run. In
prod that upsert does one unique-index probe + IS DISTINCT FROM compare
per row over ~4M rows and took 40+ minutes — it never finished inside
the hourly window, so trending scores went stale (observed 12-18h old)
and isRunning blocked subsequent ticks.

Revert computeTrendingTracks / computeTrendingPlaylists to the strategy
discovery's Python ran for years: within the single transaction already
opened, DELETE the (type, version) slice then repopulate with plain bulk
INSERTs. Readers never see an empty table (DELETE isn't visible until
COMMIT). Score expressions are byte-for-byte unchanged; only bind params
were renumbered. The concurrent matview refresh from #890 is kept.

Tradeoff: more dead tuples per run (autovacuum copes, as under Python)
in exchange for a job that completes in seconds instead of never.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@raymondjacobson raymondjacobson merged commit 63eb819 into main Jun 2, 2026
5 checks passed
@raymondjacobson raymondjacobson deleted the api/trending-bulk-rewrite branch June 2, 2026 02:03