Stream Mattermost channels & members instead of paginating #776
Conversation
The /api/mattermost/channels and /api/mattermost/channels/unreads route handlers fanned out 5+ paginated upstream calls per request (channels, members, categories, preferences, threads, plus an admin lookup), each with up to 3 pages × 200 items. Under traffic spikes this fan-out chewed through the Node.js event loop and competed with SSR for CPU, contributing to in-band /api/healthcheck timeouts that triggered Swarm to kill replicas.

Switch to the streaming forms the official Mattermost webapp uses:

- GET /users/me/channels (no page params): the server streams the full channel list in a single response. Replaces fetchAllChannelPages's 3-page loop.
- GET /users/me/channel_members?page=-1: NDJSON streaming mode that returns every member row in one response. Replaces fetchAllChannelMemberPages's per-team paginated loop.

Adds an mmFetchNdjson helper in server/mattermost.ts to parse line-delimited JSON with the same timeout/abort semantics as mmFetch. Threads still paginate (max 2 pages) since the threads endpoint has no streaming form and thread counts are bounded by user activity.

The truncated flag in the /unreads response narrows in meaning: it now flips to true only when a user has more than 400 active threads, never from channel/member pagination caps. The websocket optimistic-update path in mattermost-websocket.ts already handles a missing target channel gracefully, so this is backwards compatible.

Verified against chat.ecency.com 10.11 with @Good-Karma's PAT:

- /channels: byte-for-byte identical response (23 channels, 17,934 b)
- /unreads: identical (totalUnread=65, totalMentions=1, totalDMs=0, totalThreads=0, truncated=false)

Side effect: the previous paginated form's ?page=N&per_page=200 against /users/me/channels produced cumulative duplicate rows for cross-team admin accounts, doubling totalUnread. The streaming form returns each membership exactly once, so totals are now correct for those edge cases. Regular single-team users (the typical case) see no change.
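The line-delimited parsing the new helper performs can be sketched as a pure function. This is an illustrative sketch, not the exact `mmFetchNdjson` body (the name `parseNdjson` is made up for this example); it shows the tolerance for a malformed final line that a partial chunk at end-of-stream can leave behind:

```typescript
// Sketch only: parse NDJSON text into typed records, skipping blank lines
// and dropping a truncated trailing line rather than throwing.
function parseNdjson<T>(text: string): T[] {
  const results: T[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue; // skip blank lines between records
    try {
      results.push(JSON.parse(trimmed) as T);
    } catch {
      // Tolerate a malformed final line (partial chunk at end of stream).
    }
  }
  return results;
}

// Hypothetical member rows; the last line simulates a cut-off chunk.
const body =
  '{"channel_id":"a","msg_count":3}\n{"channel_id":"b","msg_count":1}\n{"chan';
const rows = parseNdjson<{ channel_id: string; msg_count: number }>(body);
// rows holds the two complete records; the truncated trailing line is dropped
```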
📝 Walkthrough

Consolidates Mattermost channel and member fetching by adding mmFetchRaw/mmFetchNdjson utilities, switching channel/member helpers to single-request and NDJSON streaming endpoints, updating routes, tightening thread-unread pagination reporting, and adding timeouts to several upstream fetches.

Changes: Mattermost channel & member fetch refactoring
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Route
    participant fetchHelpers
    participant MattermostAPI
    Client->>Route: request unreads / channels
    Route->>fetchHelpers: fetchAllChannelPages(), fetchAllChannelMemberPages(token)
    fetchHelpers->>MattermostAPI: /users/me/channels (single request)
    fetchHelpers->>MattermostAPI: /users/me/channel_members?page=-1 (NDJSON)
    MattermostAPI-->>fetchHelpers: JSON / NDJSON responses
    fetchHelpers-->>Route: combined results
    Route-->>Client: aggregated response
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
🧹 Nitpick comments (1)

apps/web/src/server/mattermost.ts (1)

420-479: ⚡ Quick win: extract a shared `mmFetchRaw` primitive to avoid duplicating the timeout/abort/error-handling boilerplate.

`mmFetchNdjson` repeats ~25 lines from `mmFetch` verbatim — base-URL resolution, `AbortSignal.any` composition, the `fetch(...)` call, the non-OK error block, and the `TimeoutError` rethrow. The only real difference is how the response body is parsed. A single internal helper that returns `string | null` lets both callers focus on their parsing logic, and ensures any future change (tracing, metric instrumentation, error mapping) only needs to happen once.

♻️ Proposed refactor

```diff
+async function mmFetchRaw(path: string, init?: RequestInit): Promise<string | null> {
+  const base = requireEnv(MATTERMOST_BASE_URL, "MATTERMOST_BASE_URL");
+  const timeoutSignal = AbortSignal.timeout(MM_FETCH_TIMEOUT_MS);
+  const signal = init?.signal
+    ? AbortSignal.any([init.signal, timeoutSignal])
+    : timeoutSignal;
+  try {
+    const res = await fetch(`${base}${path}`, {
+      ...init,
+      headers: { ...(init?.headers || {}), Accept: "application/json" },
+      signal
+    });
+    if (!res.ok) {
+      const text = await res.text();
+      throw new MattermostError(`Mattermost request failed (${res.status}): ${text}`, res.status);
+    }
+    const text = await res.text();
+    return text || null;
+  } catch (err) {
+    if (err instanceof Error && err.name === "TimeoutError") {
+      throw new MattermostError(
+        `Mattermost request timed out after ${MM_FETCH_TIMEOUT_MS}ms (${path})`,
+        504
+      );
+    }
+    throw err;
+  }
+}
+
 async function mmFetch<T>(path: string, init?: RequestInit): Promise<T> {
-  const base = requireEnv(MATTERMOST_BASE_URL, "MATTERMOST_BASE_URL");
-  const timeoutSignal = AbortSignal.timeout(MM_FETCH_TIMEOUT_MS);
-  const signal = init?.signal
-    ? AbortSignal.any([init.signal, timeoutSignal])
-    : timeoutSignal;
-  try {
-    const res = await fetch(`${base}${path}`, { ...init, headers: { ...(init?.headers || {}), Accept: "application/json" }, signal });
-    if (!res.ok) {
-      const text = await res.text();
-      throw new MattermostError(`Mattermost request failed (${res.status}): ${text}`, res.status);
-    }
-    const text = await res.text();
-    if (!text) return undefined as T;
-    return JSON.parse(text) as T;
-  } catch (err) {
-    if (err instanceof Error && err.name === "TimeoutError") {
-      throw new MattermostError(`Mattermost request timed out after ${MM_FETCH_TIMEOUT_MS}ms (${path})`, 504);
-    }
-    throw err;
-  }
+  const text = await mmFetchRaw(path, init);
+  if (text === null) return undefined as T;
+  return JSON.parse(text) as T;
 }

 async function mmFetchNdjson<T>(path: string, init?: RequestInit): Promise<T[]> {
-  const base = requireEnv(MATTERMOST_BASE_URL, "MATTERMOST_BASE_URL");
-  const timeoutSignal = AbortSignal.timeout(MM_FETCH_TIMEOUT_MS);
-  const signal = init?.signal
-    ? AbortSignal.any([init.signal, timeoutSignal])
-    : timeoutSignal;
-  try {
-    const res = await fetch(`${base}${path}`, { ...init, headers: { ...(init?.headers || {}), Accept: "application/json" }, signal });
-    if (!res.ok) {
-      const text = await res.text();
-      throw new MattermostError(`Mattermost request failed (${res.status}): ${text}`, res.status);
-    }
-    const text = await res.text();
-    if (!text) return [];
+  const text = await mmFetchRaw(path, init);
+  if (!text) return [];
   const results: T[] = [];
   for (const line of text.split("\n")) {
     const trimmed = line.trim();
     if (!trimmed) continue;
     try {
       results.push(JSON.parse(trimmed) as T);
     } catch {
       // Tolerate a malformed final line (partial chunk at EOS).
     }
   }
   return results;
-  } catch (err) {
-    if (err instanceof Error && err.name === "TimeoutError") {
-      throw new MattermostError(`Mattermost request timed out after ${MM_FETCH_TIMEOUT_MS}ms (${path})`, 504);
-    }
-    throw err;
-  }
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/web/src/server/mattermost.ts` around lines 420 - 479, Extract a shared helper (e.g., mmFetchRaw) that centralizes base URL resolution (requireEnv(MATTERMOST_BASE_URL,...)), AbortSignal.any composition with MM_FETCH_TIMEOUT_MS, the fetch(...) invocation, non-OK response handling that throws MattermostError, and the TimeoutError mapping, and have mmFetchNdjson and mmUserFetchNdjson call this helper and only perform response parsing; replace duplicated logic in mmFetchNdjson with a call to mmFetchRaw(path, init) which returns the response text or null, and update mmUserFetchNdjson to pass the Authorization header into mmFetchRaw so both functions keep only parsing-specific behavior while all timeout/abort/error boilerplate lives in mmFetchRaw.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2dec7ebf-0068-49ba-9671-b0d3fa584485
📒 Files selected for processing (4)
- apps/web/src/app/api/mattermost/channels/helpers.ts
- apps/web/src/app/api/mattermost/channels/route.ts
- apps/web/src/app/api/mattermost/channels/unreads/route.ts
- apps/web/src/server/mattermost.ts
mmFetch and mmFetchNdjson both did the same setup (base URL, timeout signal merge with caller AbortSignal, fetch, non-OK -> MattermostError, TimeoutError -> 504 mapping) and only differed in how they parsed the response body. Extract that boilerplate into a private mmFetchRaw that returns the response body text; have both wrappers call it and only implement parsing. Behavior is unchanged. Verified against chat.ecency.com 10.11 with @Good-Karma's PAT — same top-level keys, same channel-item keys, same 23 channels with matching IDs, same totalUnread on both /channels and /channels/unreads.
@codex review this PR/branch changes
Codex Review: Didn't find any major issues. Delightful!
Several server-side proxy routes used bare fetch() with no timeout. If
the upstream hangs, the request stays in vision_web's active list
indefinitely and stacks up under load — the same shape of bug that
caused vision_web replicas to be killed when pl.ecency.com degraded
(the /pl/api/event Next.js external rewrite has no timeout either; that
piece is being addressed separately at the nginx layer).
Add AbortSignal.timeout to:
- /api/mattermost/users/[userId]/image (8s) — avatar proxy via
chat.ecency.com. Wraps the fetch in try/catch and returns 504 on
timeout, 502 on other transport errors.
- /api/threespeak/upload-token (10s) — 3Speak admin API.
- /api/threespeak/thumbnail (10s) — 3Speak admin API.
- /api/stats (8s) — Plausible /api/v2/query, the same Plausible host
whose recent intermittent slowness we already saw stack up requests
on the events endpoint. Wraps fetch in try/catch and returns 504 on
timeout, 502 on other errors.
8s matches the CF Worker primary timeout. Anything slower than that has
already been cut off at the edge, so there's no value in waiting longer.
Pre-existing issue noted, NOT fixed in this commit:
/api/mattermost/users/[userId]/image still uses Next.js 14-style sync
`params` in its handler signature; Next.js 15 expects
`Promise<{ userId: string }>`. Likely a runtime bug (params.userId
returns undefined on a Promise). Worth a follow-up PR — out of scope
here.
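The timeout pattern this commit describes can be sketched in miniature. This is an illustrative sketch, not the actual route handlers; `upstreamErrorStatus` and `proxyWithTimeout` are hypothetical names, and the 8s budget mirrors the CF Worker primary timeout mentioned above:

```typescript
// Illustrative sketch: classify a caught error the way the commit describes.
// Timeout/abort -> 504 (cut off by AbortSignal.timeout, before or during the
// body read); anything else -> 502 (Node's fetch surfaces DNS/TLS/connect
// failures as TypeError("fetch failed")).
function upstreamErrorStatus(err: unknown): number {
  if (err instanceof Error && (err.name === "TimeoutError" || err.name === "AbortError")) {
    return 504;
  }
  return 502;
}

// Hypothetical proxy shape: both the fetch AND the body read sit inside the
// try, since an abort mid-stream throws from the body read too.
async function proxyWithTimeout(url: string): Promise<{ status: number; body?: string }> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(8_000) });
    if (!res.ok) return { status: 502 };
    return { status: 200, body: await res.text() };
  } catch (err) {
    return { status: upstreamErrorStatus(err) };
  }
}
```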
Review feedback on the previous commit (c6e32dd):

- /api/mattermost/users/[userId]/image: the try/catch only wrapped the fetch() call; body reads (res.text() / res.arrayBuffer()) lived outside it, so an abort during streaming would escape uncaught. Move all response handling inside the try block, and treat both TimeoutError (initial-fetch timeout) and AbortError (mid-stream abort) as the timeout case (504); other errors stay as 502.
- /api/stats: response.json() runs under the same AbortSignal as the fetch, so a timeout during body read currently routes through the parse-error path and returns 400. Split the second try/catch: TimeoutError/AbortError -> 504, anything else -> 502 (was 400).
- /api/threespeak/thumbnail: the catch returned 500 for everything. TimeoutError/AbortError -> 504, TypeError (Node fetch wraps DNS/TLS/connect failures as TypeError('fetch failed')) -> 502, other -> 500 (unchanged).

The sibling /api/threespeak/upload-token has the same catch shape but wasn't called out in review; left alone for minimal scope. Worth mirroring in a follow-up if/when that file is touched again.
Summary

Switch the /api/mattermost/channels and /api/mattermost/channels/unreads route handlers from a paginated fan-out (5+ upstream calls per request, up to 3 pages each) to the streaming endpoints the official Mattermost webapp uses:

- GET /users/me/channels (no page/per_page) — server streams the full channel list in one response.
- GET /users/me/channel_members?page=-1 — NDJSON streaming mode for member rows.

A small mmFetchNdjson helper in server/mattermost.ts parses line-delimited JSON with the same timeout/abort semantics as mmFetch.

Why

Under traffic spikes the prior fan-out (channels + members + categories + preferences + threads + a /users/ids POST, each potentially paginating) competed with SSR rendering for the Node.js event loop. The in-band /api/healthcheck queued behind the work and Swarm killed replicas with docker exec: unhealthy container. Reducing per-request upstream calls cuts CPU pressure on the proxy path. This won't fully eliminate the kill cascade by itself — SSR-side load is still the dominant pressure — but it removes one of the contributors and fixes a latent counting bug along the way.

Backwards compatibility

Response shapes are preserved exactly. Verified against chat.ecency.com 10.11 with @good-karma's PAT:

- /channels: byte-for-byte identical (23 channels, 17,934 b, all keys preserved including directUser on DMs)
- /unreads: identical totals (totalUnread=65, totalMentions=1, totalDMs=0, totalThreads=0, truncated=false)

The truncated flag in /unreads narrows in meaning: it now only flips to true if a user has >400 active threads, never from channel/member pagination caps. The websocket optimistic-update path in mattermost-websocket.ts already short-circuits gracefully when channels are missing, so this is backwards compatible for the mobile app and other clients.

Side effect (admin-only)

The previous paginated form against /users/me/channels?page=N&per_page=200 produced cumulative duplicate rows for cross-team admin accounts, doubling totalUnread. The streaming form returns each membership once, so admin totals are now correct. Regular single-team users (the typical case) see no numeric change.

Test plan
- Build (pnpm --filter @ecency/web build) succeeds
- Verified live against chat.ecency.com with an admin PAT (@good-karma)
- Watch for "Mattermost thread unreads truncated" events
- /api/mattermost/* p99 latency unchanged or better; no rise in 504s

Summary by CodeRabbit

- Performance Improvements
- Reliability & Error Handling
- Bug Fixes
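The admin-only side effect above is easy to see in miniature. This sketch uses hypothetical data and an illustrative `dedupeMemberships` helper (not code from this PR) to show how duplicate membership rows from cumulative pagination double an unread total, while keeping each channel_id once restores it:

```typescript
interface Membership {
  channel_id: string;
  msg_count: number;
}

// Illustrative helper: keep the first row seen per channel_id.
function dedupeMemberships(rows: Membership[]): Membership[] {
  const seen = new Map<string, Membership>();
  for (const row of rows) {
    if (!seen.has(row.channel_id)) seen.set(row.channel_id, row);
  }
  return [...seen.values()];
}

const totalUnread = (rows: Membership[]) =>
  rows.reduce((sum, r) => sum + r.msg_count, 0);

// Hypothetical cross-team admin case: the paginated form returned each
// membership under both teams, so every row appears twice.
const paginated: Membership[] = [
  { channel_id: "c1", msg_count: 5 },
  { channel_id: "c2", msg_count: 3 },
  { channel_id: "c1", msg_count: 5 },
  { channel_id: "c2", msg_count: 3 },
];

console.log(totalUnread(paginated));                    // 16 — doubled
console.log(totalUnread(dedupeMemberships(paginated))); // 8 — correct
```

The streaming endpoint makes the dedupe step unnecessary by returning each membership exactly once, which is why the PR switches endpoints rather than filtering duplicates client-side.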