
[Perf] Feed query: LATERAL + OFFSET 0 fence + per-followee cap #798

Merged
raymondjacobson merged 1 commit into main from ray/perf-feed-lateral-fence on May 9, 2026

Conversation

@raymondjacobson
Member

Summary

Fix the planner cliff in /v1/users/:userId/feed that makes the same query take 125 ms for one user and 9-18 s for another with nearly identical follow counts.

Three changes pin the planner to nested-loop semantics (a sketch of the resulting branch shape follows the list):

  1. follow_set CTE → MATERIALIZED (row count fixed for downstream planning).
  2. Each UNION branch uses CROSS JOIN LATERAL with an OFFSET 0 optimization fence inside the lateral subquery — prevents Postgres from flattening the lateral back into a merge-join.
  3. Per-followee LIMIT 100 (50 for owned playlists) caps cost for users whose followees are very active. The outer query takes only the top-@limit by created_at, so older entries past the per-followee top-100 can never reach the response.
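
A minimal sketch of one branch under this shape, with illustrative table and column names (the real query has one branch per entity type, more columns, and the per-year window as implemented in the diff):

```sql
with follow_set as materialized (              -- (1) row count pinned for downstream planning
    select followee_user_id
    from follows
    where follower_user_id = @user_id
)
select f.item_id, f.created_at
from follow_set fs
cross join lateral (                           -- (2) one indexed probe per followee
    select r.repost_item_id as item_id,
           r.created_at
    from reposts r
    where r.user_id = fs.followee_user_id
      and r.created_at > now() - interval '1 year'
    order by r.created_at desc
    limit 100                                  -- (3) per-followee cap
    offset 0                                   -- fence: stops Postgres from flattening the
                                               -- lateral back into a merge join
) f
order by f.created_at desc
limit @limit;
```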

Why

/v1/users/:userId/feed is the worst signed-in endpoint by p95 in Axiom. Per the post-merge histogram (13 hours, 5,831 calls), 60 % of requests took >2 s, 23 % took >5 s, and 137 requests took >10 s.

EXPLAIN on prod replica showed the smoking gun — three users with similar follow counts produce completely different plans:

| User | Follows | Plan | Time |
|---|---|---|---|
| 20 (Phuture) | 1752 | nested-loop | 125 ms |
| 222 | 1820 | nested-loop, lots of data | 4.5 s |
| 755516 | 1816 | merge join, materialize all 2M reposts of last year + hash-join 1.4M tracks | 9-18 s |

follow_set is estimated at 17,290 rows but actually returns 1,816. That bad cardinality estimate flips the join strategy.
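
For reference, the statistic behind the misestimate can be inspected directly (a standard catalog query, not part of this PR):

```sql
-- A stale / too-low n_distinct inflates the per-follower row estimate
-- (roughly reltuples / n_distinct when the value is not in the MCV list).
select attname, n_distinct, null_frac
from pg_stats
where tablename = 'follows'
  and attname = 'follower_user_id';

-- ANALYZE follows would refresh this statistic; the PR pins the plan instead
-- so it cannot flip again between users.
```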

Impact

End-to-end on local server pointed at prod replica:

| User | Before this PR | After |
|---|---|---|
| 20 (1752 follows) | 500-750 ms | 280-300 ms |
| 222 (1820 follows) | ~4.5 s | 1.3-1.5 s (3×) |
| 755516 (1816 follows) | 9-18 s | 640-700 ms (~20×) |

EXPLAIN-only DB time on the same users (warm cache):

| User | Before | After |
|---|---|---|
| 20 | 113 ms | 105 ms |
| 222 | 4.4 s | 1.5 s |
| 755516 | 9-18 s | 700 ms |

Risk

  • Per-followee LIMIT 100 trims the input set. A followee who reposts >100 things in the past year contributes only their top-100 most recent. The outer query orders by created_at DESC LIMIT @limit (max 100), so older reposts past the per-followee top-100 cannot reach the rendered response. Equivalent for owned tracks (LIMIT 100) and owned playlists (LIMIT 50) — virtually no artist publishes more than that per year.
  • MATERIALIZED and OFFSET 0 are planner directives, not logic changes. The shape of history and the outer aggregation/sort/limit are unchanged.
  • Existing TestUsersFeed covers the four entity branches plus the no-followees empty case. Full ./api/... suite is green.

Test plan

  • go test -count=1 ./api/... (full suite, all green)
  • EXPLAIN ANALYZE on prod read replica across four users (1545-1820 follows) confirms ~3-25× speedup on the slow cohort with no regression on the lucky cohort
  • Local server end-to-end timings confirm the same shape as the EXPLAIN data

🤖 Generated with Claude Code

The feed query had a ~10x planner-cliff for some users: identical
SQL took 125ms for one user with 1752 follows but 9-18s for another
user with 1816 follows. Cause: stale n_distinct stats on
follows.follower_user_id make Postgres estimate follow_set at
~17,290 rows when actual is <2,000 — for the unlucky users it
flips from a sane nested-loop plan to "materialize all 2M reposts
of the past year, merge-join, then hash-join 1.4M tracks."

Three changes hold the planner to nested-loop semantics:

1. follow_set CTE marked MATERIALIZED so its row count is fixed
   downstream rather than re-estimated through inlining.
2. Each branch joins follow_set via CROSS JOIN LATERAL with an
   OFFSET 0 fence inside the lateral subquery — this is the well-
   known optimization barrier that prevents Postgres from flattening
   the lateral back into a merge-join.
3. Per-followee LIMIT 100 (50 for owned playlists) caps the cost
   for users whose followees are very active. The outer query takes
   only the top-@limit by created_at, so reposts/tracks past the
   per-followee top-100 can never reach the response anyway.

Verified end-to-end against the prod read replica via local server:

  user 20  (1752 follows)  500-750ms -> 280-300ms warm
  user 222 (1820 follows)  ~4.5s     -> 1.3-1.5s     (3x)
  user 755516 (1816 follows)  9-18s  -> 640-700ms    (~20x)

Existing TestUsersFeed regression covers the entity-type branches
and the no-followees empty case; full ./api/... suite is green.
raymondjacobson merged commit 3f4fc78 into main on May 9, 2026
5 checks passed
raymondjacobson deleted the ray/perf-feed-lateral-fence branch on May 9, 2026 00:03
raymondjacobson added a commit that referenced this pull request May 9, 2026
…800)

Two independent fixes for the API's two hottest queries (GetUsers ~287M
calls and GetTracks ~268M calls in `pg_stat_statements`).
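
Call counts like these can be read straight from `pg_stat_statements`; something like the following surfaces the hottest statements (illustrative query, not part of the PR, requires the extension and PG 13+ for `mean_exec_time`):

```sql
select queryid,
       calls,
       round(mean_exec_time::numeric, 2) as mean_ms,
       left(query, 60)                   as query_head
from pg_stat_statements
order by calls desc
limit 10;
```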

## 1. GetUsers — rewrite `current_user_followee_follow_count`

Signed-in `GetUsers` was **700× slower than unsigned** (2-3 s vs 4 ms
for 20 users). Profiling each personalization subquery in isolation:

| Subquery (myId=20, 20 users) | Mean |
|---|---|
| `does_current_user_follow` | 0.3 ms |
| `does_follow_current_user` | 2.2 ms |
| `does_current_user_subscribe` | 2.2 ms |
| `artist_coin_badge` | 0.8 ms |
| **`current_user_followee_follow_count`** | **2,246 ms** ← entire delta |

The old shape let Postgres pick a Merge Join that walked the **full
follower list of the target user** — 492k-1.9M rows for popular
users like @audius — just to intersect with my ~1,752 followees.

The rewrite drives the loop from "my followees" (always small — at most
a few thousand) and probes whether each follows the target. The `LIMIT 1
OFFSET 0` inside the `EXISTS` is the same optimization fence used by
#798 (feed): it pins the planner to nested-loop semantics so the plan
never flips back to merge join.
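
A minimal sketch of that shape, with illustrative parameter and column names rather than the exact subquery from the diff:

```sql
-- Count how many of my followees follow the target user, driving the loop
-- from the small side ("my followees") and probing the other side per row.
select count(*)
from follows mine
where mine.follower_user_id = @my_id            -- my followees: at most a few thousand rows
  and mine.is_delete = false
  and exists (
      select 1
      from follows f
      where f.follower_user_id = mine.followee_user_id
        and f.followee_user_id = @target_user_id
        and f.is_delete = false
      limit 1 offset 0                          -- fence: keep this a nested-loop index probe
  );
```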

### Verified on prod read replica

Full `GetUsers`, 20 popular target users, three warm runs each:

| Scenario | Before | After | Δ |
|---|---|---|---|
| myId=0 (unsigned) | 4 ms | 2 ms | sanity / unchanged |
| myId=20 (1752 follows) | 2-3 s | **127-155 ms** | **~15-20×** |
| myId=755516 (1816 follows) | 2.5 s | **142-157 ms** | **~15-18×** |

End-to-end through local server:
`/v1/full/users/handle/audius?user_id=Wem1e` (target has 1.95 M
followers) → **60-85 ms warm**. Response shape unchanged;
`current_user_followee_follow_count` returns the same count as before.

## 2. GetTracks — partial index for `album_backlink`

The `album_backlink` subquery does ~200 random `playlists_pkey` lookups
per popular track to filter for `is_album = true AND is_delete = false
AND is_current = true`. **~99.98 % of those lookups get rejected by the
filter** — for 50 popular tracks that's 10,115 heap probes returning 1-2
actual matches.

The partial index covers only published-album playlists, so non-album
lookups skip the heap entirely — the planner sees no row at the index
level and moves on without fetching the page.

```sql
create index concurrently if not exists idx_playlists_albums_published
    on playlists (playlist_id)
    where is_album = true and is_delete = false and is_current = true;
```

- Size: ~55,671 album rows × ~12 bytes ≈ **700 KB**.
- Built `CONCURRENTLY` so no `ACCESS EXCLUSIVE` lock — follows the
pattern from #196 (the migration whose comment explains the prior 0195
outage).
- Expected: GetTracks `album_backlink` portion drops from ~38 ms (50
popular tracks, warm) to ~10-15 ms — most of GetTracks's "always-on"
cost.
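
One way to sanity-check that the planner actually prefers the partial index for an album lookup (a hypothetical spot check, not part of the migration):

```sql
explain (analyze, buffers)
select playlist_id
from playlists
where playlist_id = 12345            -- any known album id
  and is_album = true
  and is_delete = false
  and is_current = true;
-- Expect a scan on idx_playlists_albums_published rather than a
-- playlists_pkey lookup followed by a filter on the album flags.
```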

## Risk

- **GetUsers rewrite is semantically identical.** Same `count(*)` over
the same intersection, just with the join driven from the small side.
Existing user tests (TestV1UsersRelated, TestUsersFeed, etc.) pass; full
`./api/...` suite is green.
- **Partial index is additive.** No existing query plan can regress —
the planner picks it over `playlists_pkey` only when its `WHERE` clause
is satisfied (i.e. the lookup is for an album row).

## Test plan

- [x] `go test -count=1 ./api/...` (full suite, all green)
- [x] EXPLAIN ANALYZE on prod read replica across three myId regimes
(unsigned, mid follows, heavy follows)
- [x] Local server smoke test:
`/v1/full/users/handle/audius?user_id=Wem1e` returns identical response
shape, ~70 ms warm

🤖 Generated with [Claude Code](https://claude.com/claude-code)