[Perf] Feed query: LATERAL + OFFSET 0 fence + per-followee cap #798
Merged
raymondjacobson merged 1 commit into main on May 9, 2026
The feed query had a planner cliff for some users: identical SQL took 125ms for one user with 1752 follows but 9-18s (roughly 70-140x slower) for another user with 1816 follows. Cause: stale n_distinct stats on follows.follower_user_id make Postgres estimate follow_set at ~17,290 rows when the actual count is <2,000. For the unlucky users, that flips the plan from a sane nested loop to "materialize all 2M reposts of the past year, merge-join, then hash-join 1.4M tracks."

Three changes hold the planner to nested-loop semantics:

1. The follow_set CTE is marked MATERIALIZED, so its row count is fixed downstream rather than re-estimated through inlining.
2. Each branch joins follow_set via CROSS JOIN LATERAL with an OFFSET 0 fence inside the lateral subquery. This is the well-known optimization barrier that prevents Postgres from flattening the lateral back into a merge join.
3. A per-followee LIMIT 100 (50 for owned playlists) caps the cost for users whose followees are very active. The outer query takes only the top @limit rows by created_at, so reposts/tracks past a followee's top 100 can never reach the response anyway.

Verified end-to-end against the prod read replica via a local server:

- user 20 (1752 follows): 500-750ms -> 280-300ms warm
- user 222 (1820 follows): ~4.5s -> 1.3-1.5s (3x)
- user 755516 (1816 follows): 9-18s -> 640-700ms (~20x)

The existing TestUsersFeed regression test covers the entity-type branches and the no-followees empty case; the full ./api/... suite is green.
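The correctness argument behind change 3 can be checked outside the database. Below is a minimal Go sketch that models the plan shape in memory. A fixed follow set stands in for the MATERIALIZED CTE, a per-followee newest-N cap stands in for the LATERAL subquery's LIMIT, and a final global top-@limit mirrors the outer query. The `Item` type, its fields, and the random data generator are hypothetical. The sketch does not touch Postgres and does not reproduce the real query.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// Item is a hypothetical stand-in for one feed row (repost, track, or
// playlist) owned by a followee. Field names are illustrative only.
type Item struct {
	ID        int
	OwnerID   int
	CreatedAt int64 // larger means newer
}

// sortNewestFirst orders by CreatedAt DESC with ID DESC as a tie-breaker,
// so the order is total and both feeds below agree on ties.
func sortNewestFirst(items []Item) {
	sort.Slice(items, func(i, j int) bool {
		if items[i].CreatedAt != items[j].CreatedAt {
			return items[i].CreatedAt > items[j].CreatedAt
		}
		return items[i].ID > items[j].ID
	})
}

// topN sorts items newest-first in place and keeps at most n of them.
func topN(items []Item, n int) []Item {
	sortNewestFirst(items)
	if len(items) > n {
		items = items[:n]
	}
	return items
}

// cappedFeed mimics the nested-loop shape: iterate a fixed follow set, take
// each followee's newest perFollowee items, then keep the global top limit.
func cappedFeed(followSet []int, byOwner map[int][]Item, perFollowee, limit int) []Item {
	var candidates []Item
	for _, f := range followSet {
		own := append([]Item(nil), byOwner[f]...) // copy so sorting never mutates the input
		candidates = append(candidates, topN(own, perFollowee)...)
	}
	return topN(candidates, limit)
}

// uncappedFeed is the reference answer: every item from every followee.
func uncappedFeed(followSet []int, byOwner map[int][]Item, limit int) []Item {
	var all []Item
	for _, f := range followSet {
		all = append(all, byOwner[f]...)
	}
	return topN(all, limit)
}

// sameIDs reports whether two feeds hold the same items in the same order.
func sameIDs(a, b []Item) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i].ID != b[i].ID {
			return false
		}
	}
	return true
}

// randomData builds followees 1..followees with 0..maxItems items each; the
// narrow timestamp range deliberately produces ties.
func randomData(seed int64, followees, maxItems int) ([]int, map[int][]Item) {
	r := rand.New(rand.NewSource(seed))
	byOwner := make(map[int][]Item)
	followSet := make([]int, 0, followees)
	nextID := 0
	for owner := 1; owner <= followees; owner++ {
		followSet = append(followSet, owner)
		for k := r.Intn(maxItems + 1); k > 0; k-- {
			nextID++
			byOwner[owner] = append(byOwner[owner], Item{ID: nextID, OwnerID: owner, CreatedAt: r.Int63n(1000)})
		}
	}
	return followSet, byOwner
}

func main() {
	const perFollowee, limit = 100, 100
	mismatches := 0
	for seed := int64(1); seed <= 20; seed++ {
		followSet, byOwner := randomData(seed, 50, 300)
		if !sameIDs(cappedFeed(followSet, byOwner, perFollowee, limit), uncappedFeed(followSet, byOwner, limit)) {
			mismatches++
		}
	}
	fmt.Println("mismatches:", mismatches) // prints "mismatches: 0" because perFollowee >= limit
}
```

Why this holds: the order is total (created_at, then ID). Any row in the global top `limit` therefore has fewer than `limit` newer rows overall. That also means it has fewer than `perFollowee` newer rows from its own followee whenever `perFollowee >= limit`. So the 100 cap cannot change a response with `@limit` ≤ 100.

A cap below the limit, as in the owned-playlists `LIMIT 50` case, is only equivalent when no single followee contributes more than 50 of the newest rows. That is the publishing-rate assumption the PR relies on.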
raymondjacobson added a commit that referenced this pull request on May 9, 2026
…800)

Two independent fixes for the API's two hottest queries (GetUsers, ~287M calls, and GetTracks, ~268M calls, in `pg_stat_statements`).

## 1. GetUsers: rewrite `current_user_followee_follow_count`

Signed-in `GetUsers` was **700× slower than unsigned** (2-3 s vs 4 ms for 20 users). Profiling each personalization subquery in isolation:

| Subquery (myId=20, 20 users) | Mean |
|---|---|
| `does_current_user_follow` | 0.3 ms |
| `does_follow_current_user` | 2.2 ms |
| `does_current_user_subscribe` | 2.2 ms |
| `artist_coin_badge` | 0.8 ms |
| **`current_user_followee_follow_count`** | **2,246 ms** ← entire delta |

The old shape let Postgres pick a Merge Join that walked the **full follower list of the target user** (492k-1.9M rows for popular users like @audius) just to intersect it with my ~1,752 followees. The rewrite drives the loop from "my followees" (always small, at most a few thousand) and probes whether each one follows the target. The `LIMIT 1 OFFSET 0` inside the `EXISTS` is the same optimization fence used by #798 (feed): it pins the planner to nested-loop semantics so the plan never flips back to a merge join.

### Verified on prod read replica

Full `GetUsers`, 20 popular target users, three warm runs each:

| Scenario | Before | After | Δ |
|---|---|---|---|
| myId=0 (unsigned) | 4 ms | 2 ms | sanity / unchanged |
| myId=20 (1752 follows) | 2-3 s | **127-155 ms** | **~15-20×** |
| myId=755516 (1816 follows) | 2.5 s | **142-157 ms** | **~15-18×** |

End-to-end through a local server: `/v1/full/users/handle/audius?user_id=Wem1e` (target has 1.95M followers) → **60-85 ms warm**. Response shape is unchanged; `current_user_followee_follow_count` returns the same count as before.

## 2. GetTracks: partial index for `album_backlink`

The `album_backlink` subquery does ~200 random `playlists_pkey` lookups per popular track to filter for `is_album = true AND is_delete = false AND is_current = true`.
**~99.98% of those lookups get rejected by the filter**: for 50 popular tracks, that's 10,115 heap probes returning 1-2 actual matches. The partial index covers only published-album playlists, so non-album lookups never touch the heap. The index scan finds no matching entry and moves on without fetching the page.

```sql
create index concurrently if not exists idx_playlists_albums_published
  on playlists (playlist_id)
  where is_album = true and is_delete = false and is_current = true;
```

- Size: ~55,671 album rows × ~12 bytes ≈ **700 KB**.
- Built `CONCURRENTLY`, so writes to `playlists` are not blocked while it builds (a plain `CREATE INDEX` holds a `SHARE` lock for the whole build). This follows the pattern from #196 (the migration whose comment explains the prior 0195 outage).
- Expected: the `album_backlink` portion of GetTracks drops from ~38 ms (50 popular tracks, warm) to ~10-15 ms. That portion is most of GetTracks's "always-on" cost.

## Risk

- **The GetUsers rewrite is semantically identical.** It is the same `count(*)` over the same intersection, just with the join driven from the small side. Existing user tests (TestV1UsersRelated, TestUsersFeed, etc.) pass; the full `./api/...` suite is green.
- **The partial index is additive.** The planner can use it only for queries whose `WHERE` clause implies the index predicate (as the `album_backlink` filter does). Queries that don't match the predicate keep their current plans.

## Test plan

- [x] `go test -count=1 ./api/...` (full suite, all green)
- [x] EXPLAIN ANALYZE on prod read replica across three myId regimes (unsigned, mid follows, heavy follows)
- [x] Local server smoke test: `/v1/full/users/handle/audius?user_id=Wem1e` returns identical response shape, ~70 ms warm

🤖 Generated with [Claude Code](https://claude.com/claude-code)
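The GetUsers rewrite in #800 changes which side drives the join, not what is counted. This minimal Go sketch models `follows` as an in-memory edge set plus per-user follower and followee lists, a hypothetical stand-in for the table and its indexes. It counts how many of my followees follow a target in both shapes and records the rows each shape touches. The sizes are illustrative: 1,752 followees matches myId=20 above, while the 200,000-follower target is invented to keep the example light.

```go
package main

import "fmt"

// edge is a hypothetical (follower, followee) pair, standing in for one row
// of follows reachable by a point lookup on both columns.
type edge struct{ follower, followee int }

// graph is an in-memory stand-in for the follows table and its indexes.
type graph struct {
	edges       map[edge]bool
	followeesOf map[int][]int
	followersOf map[int][]int
}

func newGraph() *graph {
	return &graph{
		edges:       make(map[edge]bool),
		followeesOf: make(map[int][]int),
		followersOf: make(map[int][]int),
	}
}

// follow records follower -> followee once; duplicates are ignored so the
// counts below stay set-based.
func (g *graph) follow(follower, followee int) {
	e := edge{follower, followee}
	if g.edges[e] {
		return
	}
	g.edges[e] = true
	g.followeesOf[follower] = append(g.followeesOf[follower], followee)
	g.followersOf[followee] = append(g.followersOf[followee], follower)
}

// countFromTarget mirrors the old shape: walk the target's whole follower
// list and keep those that I follow. Returns the count and rows touched.
func countFromTarget(g *graph, me, target int) (count, touched int) {
	for _, f := range g.followersOf[target] {
		touched++
		if g.edges[edge{me, f}] {
			count++
		}
	}
	return count, touched
}

// countFromMe mirrors the rewrite: loop over my followees and do one point
// probe per followee (the EXISTS ... LIMIT 1) for an edge to the target.
func countFromMe(g *graph, me, target int) (count, probes int) {
	for _, f := range g.followeesOf[me] {
		probes++
		if g.edges[edge{f, target}] {
			count++
		}
	}
	return count, probes
}

func main() {
	const me, target = 1, 2
	g := newGraph()
	for u := 100; u < 100+1752; u++ {
		g.follow(me, u) // 1,752 followees
	}
	for u := 100; u < 100+1752; u += 10 {
		g.follow(u, target) // 176 of them follow the target
	}
	for u := 10000; u < 10000+200000; u++ {
		g.follow(u, target) // plus 200,000 followers I do not follow
	}
	c1, touched := countFromTarget(g, me, target)
	c2, probes := countFromMe(g, me, target)
	fmt.Printf("old shape: count=%d rows touched=%d\n", c1, touched) // count=176 rows touched=200176
	fmt.Printf("rewrite:   count=%d probes=%d\n", c2, probes)       // count=176 probes=1752
}
```

Both functions return the same count; only the work differs. The old shape scales with the target's follower count, which is unbounded for popular users. The rewrite scales with the signed-in user's followee count, and the `LIMIT 1 OFFSET 0` fence is what keeps the planner on that path.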
Summary
Fix the planner cliff in `/v1/users/:userId/feed` that makes the same query take 125 ms for one user and 9-18 s for another with nearly identical follow counts.

Three changes pin the planner to nested-loop semantics:
1. `follow_set` CTE → `MATERIALIZED` (row count fixed for downstream planning).
2. `CROSS JOIN LATERAL` with an `OFFSET 0` optimization fence inside the lateral subquery, which prevents Postgres from flattening the lateral back into a merge join.
3. `LIMIT 100` per followee (50 for owned playlists) caps cost for users whose followees are very active. The outer query takes only the top `@limit` rows by `created_at`, so older entries past a followee's top 100 can never reach the response.

Why
`/v1/users/:userId/feed` is the worst signed-in endpoint by p95 in Axiom. Per the post-merge histogram (13 hours, 5,831 calls), 60% of requests took >2 s, 23% took >5 s, and 137 took >10 s.

EXPLAIN on the prod replica showed the smoking gun. Three users with similar follow counts produce completely different plans:
`follow_set` is estimated at 17,290 rows but actually returns 1,816. That bad cardinality estimate flips the join strategy.

Impact
End-to-end on local server pointed at prod replica:
EXPLAIN-only DB time on the same users (warm cache):
Risk
- The outer query sorts by `created_at DESC` and applies `LIMIT @limit` (max 100), so older reposts past the per-followee top 100 cannot reach the rendered response. The same holds exactly for owned tracks (`LIMIT 100`). For owned playlists (`LIMIT 50`), it holds as long as an artist publishes no more than 50 per year, which virtually none do.
- `MATERIALIZED` and `OFFSET 0` are planner directives, not logic changes. The shape of `history` and the outer aggregation/sort/limit are unchanged.
- `TestUsersFeed` covers the four entity branches plus the no-followees empty case. The full `./api/...` suite is green.

Test plan
- [x] `go test -count=1 ./api/...` (full suite, all green)

🤖 Generated with Claude Code
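The cause named in the Why section is a stale `n_distinct` on `follows.follower_user_id` inflating the `follow_set` estimate. That follows from how Postgres estimates an equality filter on a value outside the column's most-common-values (MCV) list. The planner spreads the remaining non-NULL, non-MCV fraction of the table evenly across the other distinct values, so understating `n_distinct` inflates the per-user estimate proportionally.

The sketch below is a simplified Go transcription of that rule. It ignores the other `follow_set` filters, extended statistics, and several planner edge cases. The table size and `n_distinct` values are hypothetical, not the real `follows` statistics.

```go
package main

import "fmt"

// colStats holds pg_stats-style inputs for one column. The values used in
// main are hypothetical, not the real follows statistics.
type colStats struct {
	relTuples float64   // estimated table rows (pg_class.reltuples)
	nullFrac  float64   // fraction of rows where the column is NULL
	nDistinct float64   // >0: distinct count; <0: negated fraction of rows
	mcvFreqs  []float64 // most-common-value frequencies, sorted descending
}

// estimateEqRows approximates the row estimate for `col = $1` when $1 is not
// in the MCV list: the non-NULL, non-MCV share of the table is spread evenly
// over the remaining distinct values, then capped at the least common MCV's
// frequency. Simplified sketch only.
func estimateEqRows(s colStats) float64 {
	nDistinct := s.nDistinct
	if nDistinct < 0 {
		nDistinct = -nDistinct * s.relTuples // negative means a fraction of rows
	}
	mcvSum := 0.0
	for _, f := range s.mcvFreqs {
		mcvSum += f
	}
	sel := 1 - mcvSum - s.nullFrac
	if other := nDistinct - float64(len(s.mcvFreqs)); other > 1 {
		sel /= other
	}
	if n := len(s.mcvFreqs); n > 0 && sel > s.mcvFreqs[n-1] {
		sel = s.mcvFreqs[n-1]
	}
	return sel * s.relTuples
}

func main() {
	// Same hypothetical 100M-row table; only the n_distinct estimate differs.
	accurate := colStats{relTuples: 100e6, nDistinct: 60000}
	stale := colStats{relTuples: 100e6, nDistinct: 10000}
	fmt.Printf("accurate n_distinct: ~%.0f rows\n", estimateEqRows(accurate)) // ~1667 rows
	fmt.Printf("stale n_distinct:    ~%.0f rows\n", estimateEqRows(stale))    // ~10000 rows
}
```

A six-fold understatement of `n_distinct` produces a six-fold row overestimate for every follower outside the MCV list. That is the kind of error that can tip the planner from a nested loop to a merge or hash join.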