Stream Mattermost channels & members instead of paginating by feruzm · Pull Request #776 · ecency/vision-next

feruzm · 2026-05-08T11:32:07Z

Summary

Switch the /api/mattermost/channels and /api/mattermost/channels/unreads route handlers from a paginated fan-out (5+ upstream calls per request, up to 3 pages each) to the streaming endpoints the official Mattermost webapp uses:

- GET /users/me/channels (no page/per_page): the server streams the full channel list in one response.
- GET /users/me/channel_members?page=-1: NDJSON streaming mode for member rows.
- Threads still paginate (no streaming form is available); the cap is unchanged at 2 pages × 200.

A small mmFetchNdjson helper in server/mattermost.ts parses line-delimited JSON with the same timeout/abort semantics as mmFetch.
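As a standalone illustration of the line-delimited parsing step, here is a minimal sketch. `parseNdjson` is a hypothetical name, not the PR's exported API, and the timeout/abort handling is deliberately left out so the parsing logic can be read on its own. Blank lines are skipped and a line that fails to parse (for example a partial chunk at end of stream) is dropped instead of failing the whole call.

```typescript
// Hypothetical helper: parse NDJSON text (one JSON value per line).
// Blank lines are ignored; malformed lines are skipped rather than thrown.
function parseNdjson<T>(text: string): T[] {
  const results: T[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim(); // also strips a trailing "\r" from CRLF input
    if (!trimmed) continue;
    try {
      results.push(JSON.parse(trimmed) as T);
    } catch {
      // Tolerate a malformed line (e.g. a truncated final chunk).
    }
  }
  return results;
}

// Illustrative rows: two complete member objects plus a truncated tail.
const rows = parseNdjson<{ channel_id: string }>(
  '{"channel_id":"a"}\n{"channel_id":"b"}\n{"channel_'
);
console.log(rows.length); // 2
```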

Why

Under traffic spikes the prior fan-out (channels + members + categories + preferences + threads + a /users/ids POST, each potentially paginating) competed with SSR rendering for the Node.js event loop. The in-band /api/healthcheck queued behind that work, and Swarm killed replicas with "dockerexec: unhealthy container". Reducing per-request upstream calls cuts CPU pressure on the proxy path. This won't fully eliminate the kill cascade by itself (SSR-side load is still the dominant pressure), but it removes one of the contributors and fixes a latent counting bug along the way.

Backwards compatibility

Response shapes are preserved exactly. Verified against chat.ecency.com 10.11 with @good-karma's PAT:

- /channels: byte-for-byte identical (23 channels, 17,934 bytes, all keys preserved including directUser on DMs)
- /unreads: identical totals (totalUnread=65, totalMentions=1, totalDMs=0, totalThreads=0, truncated=false)

The truncated flag in /unreads narrows in meaning: it now only flips to true if a user has >400 active threads, never from channel/member pagination caps. The websocket optimistic-update path in mattermost-websocket.ts already short-circuits gracefully when channels are missing, so this is backwards compatible for the mobile app and other clients.
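The narrowed `truncated` semantics can be sketched as a bounded pagination loop over threads, using the 2 pages × 200 cap described above. This is a hypothetical sketch: the constant names, the `Thread` shape, and the `fetchPage` callback are assumptions standing in for the route's real upstream call.

```typescript
// Hypothetical constants mirroring the "2 pages x 200" thread cap.
const THREAD_PAGE_SIZE = 200;
const THREAD_MAX_PAGES = 2;

interface Thread {
  id: string;
}

// Stand-in for the real upstream threads request (assumption).
type FetchThreadPage = (page: number, perPage: number) => Promise<Thread[]>;

async function fetchThreadsCapped(
  fetchPage: FetchThreadPage
): Promise<{ threads: Thread[]; truncated: boolean }> {
  const threads: Thread[] = [];
  for (let page = 0; page < THREAD_MAX_PAGES; page++) {
    const batch = await fetchPage(page, THREAD_PAGE_SIZE);
    threads.push(...batch);
    // A short page means the upstream ran out of rows: nothing was cut off.
    if (batch.length < THREAD_PAGE_SIZE) {
      return { threads, truncated: false };
    }
  }
  // Every page came back full, so more threads may exist beyond the cap.
  // In this sketch exactly 400 threads also reports truncated, because
  // ruling that out would cost one more request.
  return { threads, truncated: true };
}

// Fake upstream holding `total` threads, for local experimentation.
function fakeUpstream(total: number): FetchThreadPage {
  return async (page, perPage) => {
    const start = page * perPage;
    const count = Math.max(0, Math.min(perPage, total - start));
    return Array.from({ length: count }, (_, i) => ({ id: `t${start + i}` }));
  };
}
```

Channel and member fetches never feed into `truncated` here, which matches the narrower meaning: only the thread cap can flip it.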

Side effect (admin-only)

The previous paginated form against /users/me/channels?page=N&per_page=200 produced cumulative duplicate rows for cross-team admin accounts, doubling totalUnread. The streaming form returns each membership once, so admin totals are now correct. Regular single-team users (the typical case) see no numeric change.
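To make the counting bug concrete, the illustrative sketch below shows how a membership row repeated across pages doubles a summed unread total, and how keying rows by channel id counts each channel once. The numbers and the `ChannelUnread` shape are made up for illustration; the PR fixes the bug by switching to the streaming endpoint, not by deduplicating.

```typescript
// Illustrative row shape (assumption): one entry per channel membership.
interface ChannelUnread {
  id: string;
  unread: number;
}

function sumUnread(rows: ChannelUnread[]): number {
  return rows.reduce((acc, r) => acc + r.unread, 0);
}

// Count each channel once by keying on its id (last row per id wins).
function sumUnreadDeduped(rows: ChannelUnread[]): number {
  const byId = new Map<string, ChannelUnread>();
  for (const r of rows) byId.set(r.id, r);
  return sumUnread(Array.from(byId.values()));
}

// Made-up rows where a later page repeats the earlier rows.
const pageRows: ChannelUnread[] = [
  { id: "town-square", unread: 30 },
  { id: "dev", unread: 15 },
  { id: "town-square", unread: 30 },
  { id: "dev", unread: 15 },
];
console.log(sumUnread(pageRows)); // 90
console.log(sumUnreadDeduped(pageRows)); // 45
```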

Test plan

- Type-check (no new errors in changed files)
- Production build (pnpm --filter @ecency/web build) succeeds
- Endpoint shape parity with current prod (admin and @good-karma)
- Local container test against live chat.ecency.com
- Smoke after rollout: chat panel mount loads channels, badge count matches, mark-as-read still updates badge instantly via WS
- Sentry: no rise in "Mattermost thread unreads truncated" events
- Nginx logs: /api/mattermost/* p99 latency unchanged or better; no rise in 504s

Summary by CodeRabbit

Performance Improvements
- Faster Mattermost channel/member loading via single-request and streaming fetches, improving responsiveness.
Reliability & Error Handling
- Network calls (avatars, stats, thumbnails, uploads) now have timeouts and clearer error responses to avoid hangs.
Bug Fixes
- Unread counts/truncation reporting adjusted to reflect thread-specific pagination and avoid misleading truncation warnings.

@Good-Karma

The /api/mattermost/channels and /api/mattermost/channels/unreads route handlers fanned out 5+ paginated upstream calls per request (channels, members, categories, preferences, threads, plus an admin lookup), each with up to 3 pages × 200 items. Under traffic spikes this fan-out chewed through the Node.js event loop and competed with SSR for CPU, contributing to in-band /api/healthcheck timeouts that triggered Swarm to kill replicas.

Switch to the streaming forms the official Mattermost webapp uses:

- GET /users/me/channels (no page params): the server streams the full channel list in a single response. Replaces fetchAllChannelPages's 3-page loop.
- GET /users/me/channel_members?page=-1: NDJSON streaming mode that returns every member row in one response. Replaces fetchAllChannelMemberPages's per-team paginated loop.

Adds an mmFetchNdjson helper in server/mattermost.ts to parse line-delimited JSON with the same timeout/abort semantics as mmFetch.

Threads still paginate (max 2 pages), since the threads endpoint has no streaming form and threads are bounded by user activity.

The truncated flag in the /unreads response narrows in meaning: it now only flips to true when a user has more than 400 active threads, never from channel/member pagination caps. The websocket optimistic-update path in mattermost-websocket.ts already handles a missing target channel gracefully, so this is backwards compatible.

Verified against chat.ecency.com 10.11 with @Good-Karma's PAT:

- /channels: byte-for-byte identical response (23 channels, 17,934 bytes)
- /unreads: identical (totalUnread=65, totalMentions=1, totalDMs=0, totalThreads=0, truncated=false)

Side effect: the previous paginated form's ?page=N&per_page=200 against /users/me/channels exhibited cumulative duplicate rows for cross-team admin accounts, doubling totalUnread. The streaming form returns each membership exactly once, so totals are now correct for those edge cases. Regular single-team users (the typical case) see no change.

coderabbitai · 2026-05-08T11:38:32Z

Warning

Rate limit exceeded

@feruzm has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 27 minutes and 50 seconds before requesting another review.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bad9b4ec-8be0-4a43-ae8d-752a094cca49

📥 Commits

Reviewing files that changed from the base of the PR and between c6e32dd and 298c45c.

📒 Files selected for processing (3)

apps/web/src/app/api/mattermost/users/[userId]/image/route.ts
apps/web/src/app/api/stats/route.ts
apps/web/src/app/api/threespeak/thumbnail/route.ts

📝 Walkthrough

Consolidates Mattermost channel and member fetching by adding mmFetchRaw/mmFetchNdjson utilities, switching channel/member helpers to single-request and NDJSON streaming endpoints, updating routes, tightening thread-unread pagination reporting, and adding timeouts to several upstream fetches.

Changes

Mattermost Channel & Member Fetch Refactoring

| Layer / File(s) | Summary |
| --- | --- |
| NDJSON Fetch Utility `apps/web/src/server/mattermost.ts` | Add `mmFetchRaw`, adapt `mmFetch` to use it, and add `mmFetchNdjson` + exported `mmUserFetchNdjson` for NDJSON parsing. |
| Channel & Member Helpers `apps/web/src/app/api/mattermost/channels/helpers.ts` | `fetchAllChannelPages` becomes a single `/users/me/channels` call; `fetchAllChannelMemberPages` streams `/users/me/channel_members?page=-1` via NDJSON and drops `teamId` and pagination constants. |
| Route Integration `apps/web/src/app/api/mattermost/channels/route.ts`, `apps/web/src/app/api/mattermost/channels/unreads/route.ts` | Routes/imports updated to call new helper signatures; unreads endpoint uses streaming helpers for channels/members. |
| Unreads Pagination & Reporting `apps/web/src/app/api/mattermost/channels/unreads/route.ts` | Introduce thread-specific `THREAD_PAGE_SIZE`/limits; thread pagination drives `truncated` and Sentry payloads. |
| Avatar Timeout Handling `apps/web/src/app/api/mattermost/users/[userId]/image/route.ts` | Wrap upstream avatar fetch in try/catch with `AbortSignal.timeout(8000)`; map timeouts to 504 and other fetch errors to 502. |
| Plausible Stats Timeout `apps/web/src/app/api/stats/route.ts` | Wrap Plausible fetch in try/catch with `AbortSignal.timeout(8000)`; map timeouts to 504 and other fetch failures to 502; JSON parse errors return 400. |
| 3Speak Timeouts & Origins `apps/web/src/app/api/threespeak/thumbnail/route.ts`, `apps/web/src/app/api/threespeak/upload-token/route.ts` | Add 10s outbound fetch timeouts; upload-token sets `allowed_origins` only when the request includes an `Origin` (ecency domains), otherwise empty. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Client
  participant Route
  participant fetchHelpers
  participant MattermostAPI
  Client->>Route: request unreads / channels
  Route->>fetchHelpers: fetchAllChannelPages(), fetchAllChannelMemberPages(token)
  fetchHelpers->>MattermostAPI: /users/me/channels (single request)
  fetchHelpers->>MattermostAPI: /users/me/channel_members?page=-1 (NDJSON)
  MattermostAPI-->>fetchHelpers: JSON / NDJSON responses
  fetchHelpers-->>Route: combined results
  Route-->>Client: aggregated response
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

- ecency/vision-next#630: Overlaps with replacing pagination-based helpers with single-request/NDJSON flows.
- ecency/vision-next#747: Modifies Mattermost fetch utilities and AbortSignal handling; related to mmFetch/mmFetchRaw changes.
- ecency/vision-next#676: Also touches Mattermost HTTP helpers and exported fetch functions.

Suggested labels

patch

Poem

🐰 I hopped through streams, no pages more,
One call, one flow, I bounce to the core.
Timeouts set so nets don't drag,
NDJSON hums — a tidy tag.
Cheers from a rabbit, light on the floor!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: replacing pagination with streaming for Mattermost channels and members fetching. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

🧹 Nitpick comments (1)

apps/web/src/server/mattermost.ts (1)

420-479: ⚡ Quick win

Extract a shared mmFetchRaw primitive to avoid duplicating the timeout/abort/error-handling boilerplate.

mmFetchNdjson repeats ~25 lines from mmFetch verbatim — base-URL resolution, AbortSignal.any composition, the fetch(...) call, the non-OK error block, and the TimeoutError rethrow. The only real difference is how the response body is parsed. A single internal helper that returns string | null lets both callers focus on their parsing logic, and ensures any future change (tracing, metric instrumentation, error mapping) only needs to happen once.

♻️ Proposed refactor

```diff
+async function mmFetchRaw(path: string, init?: RequestInit): Promise<string | null> {
+  const base = requireEnv(MATTERMOST_BASE_URL, "MATTERMOST_BASE_URL");
+  const timeoutSignal = AbortSignal.timeout(MM_FETCH_TIMEOUT_MS);
+  const signal = init?.signal
+    ? AbortSignal.any([init.signal, timeoutSignal])
+    : timeoutSignal;
+  try {
+    const res = await fetch(`${base}${path}`, {
+      ...init,
+      headers: { ...(init?.headers || {}), Accept: "application/json" },
+      signal
+    });
+    if (!res.ok) {
+      const text = await res.text();
+      throw new MattermostError(`Mattermost request failed (${res.status}): ${text}`, res.status);
+    }
+    const text = await res.text();
+    return text || null;
+  } catch (err) {
+    if (err instanceof Error && err.name === "TimeoutError") {
+      throw new MattermostError(
+        `Mattermost request timed out after ${MM_FETCH_TIMEOUT_MS}ms (${path})`,
+        504
+      );
+    }
+    throw err;
+  }
+}

 async function mmFetch<T>(path: string, init?: RequestInit): Promise<T> {
-  const base = requireEnv(MATTERMOST_BASE_URL, "MATTERMOST_BASE_URL");
-  const timeoutSignal = AbortSignal.timeout(MM_FETCH_TIMEOUT_MS);
-  const signal = init?.signal
-    ? AbortSignal.any([init.signal, timeoutSignal])
-    : timeoutSignal;
-  try {
-    const res = await fetch(`${base}${path}`, { ...init, headers: { ...(init?.headers || {}), Accept: "application/json" }, signal });
-    if (!res.ok) {
-      const text = await res.text();
-      throw new MattermostError(`Mattermost request failed (${res.status}): ${text}`, res.status);
-    }
-    const text = await res.text();
-    if (!text) return undefined as T;
-    return JSON.parse(text) as T;
-  } catch (err) {
-    if (err instanceof Error && err.name === "TimeoutError") {
-      throw new MattermostError(`Mattermost request timed out after ${MM_FETCH_TIMEOUT_MS}ms (${path})`, 504);
-    }
-    throw err;
-  }
+  const text = await mmFetchRaw(path, init);
+  if (text === null) return undefined as T;
+  return JSON.parse(text) as T;
 }

 async function mmFetchNdjson<T>(path: string, init?: RequestInit): Promise<T[]> {
-  const base = requireEnv(MATTERMOST_BASE_URL, "MATTERMOST_BASE_URL");
-  const timeoutSignal = AbortSignal.timeout(MM_FETCH_TIMEOUT_MS);
-  const signal = init?.signal
-    ? AbortSignal.any([init.signal, timeoutSignal])
-    : timeoutSignal;
-  try {
-    const res = await fetch(`${base}${path}`, { ...init, headers: { ...(init?.headers || {}), Accept: "application/json" }, signal });
-    if (!res.ok) {
-      const text = await res.text();
-      throw new MattermostError(`Mattermost request failed (${res.status}): ${text}`, res.status);
-    }
-    const text = await res.text();
-    if (!text) return [];
+  const text = await mmFetchRaw(path, init);
+  if (!text) return [];
     const results: T[] = [];
     for (const line of text.split("\n")) {
       const trimmed = line.trim();
       if (!trimmed) continue;
       try {
         results.push(JSON.parse(trimmed) as T);
       } catch {
         // Tolerate a malformed final line (partial chunk at EOS).
       }
     }
     return results;
-  } catch (err) {
-    if (err instanceof Error && err.name === "TimeoutError") {
-      throw new MattermostError(`Mattermost request timed out after ${MM_FETCH_TIMEOUT_MS}ms (${path})`, 504);
-    }
-    throw err;
-  }
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/web/src/server/mattermost.ts` around lines 420 - 479, Extract a shared
helper (e.g., mmFetchRaw) that centralizes base URL resolution
(requireEnv(MATTERMOST_BASE_URL,...)), AbortSignal.any composition with
MM_FETCH_TIMEOUT_MS, the fetch(...) invocation, non-OK response handling that
throws MattermostError, and the TimeoutError mapping, and have mmFetchNdjson and
mmUserFetchNdjson call this helper and only perform response parsing; replace
duplicated logic in mmFetchNdjson with a call to mmFetchRaw(path, init) which
returns the response text or null, and update mmUserFetchNdjson to pass the
Authorization header into mmFetchRaw so both functions keep only
parsing-specific behavior while all timeout/abort/error boilerplate lives in
mmFetchRaw.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2dec7ebf-0068-49ba-9671-b0d3fa584485

📥 Commits

Reviewing files that changed from the base of the PR and between a56e8f2 and 892b893.

📒 Files selected for processing (4)

apps/web/src/app/api/mattermost/channels/helpers.ts
apps/web/src/app/api/mattermost/channels/route.ts
apps/web/src/app/api/mattermost/channels/unreads/route.ts
apps/web/src/server/mattermost.ts

@Good-Karma

mmFetch and mmFetchNdjson both did the same setup (base URL, timeout signal merge with caller AbortSignal, fetch, non-OK -> MattermostError, TimeoutError -> 504 mapping) and only differed in how they parsed the response body. Extract that boilerplate into a private mmFetchRaw that returns the response body text; have both wrappers call it and only implement parsing. Behavior is unchanged. Verified against chat.ecency.com 10.11 with @Good-Karma's PAT — same top-level keys, same channel-item keys, same 23 channels with matching IDs, same totalUnread on both /channels and /channels/unreads.
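The parity checks listed in this commit (same top-level keys, same channel-item keys, same channel IDs, same totalUnread) could be scripted roughly as follows. This is a hypothetical sketch, not the PR's test code: `shapeDiffs` is an invented name, and the `channels`/`totalUnread` response layout is an assumption.

```typescript
// Assumed response layout: a channel array plus a totalUnread counter.
interface ChannelsResponse {
  channels: Array<Record<string, unknown> & { id: string }>;
  totalUnread: number;
  [key: string]: unknown;
}

function sortedKeys(o: object): string {
  return Object.keys(o).sort().join(",");
}

// Returns a list of human-readable differences; empty means parity.
function shapeDiffs(a: ChannelsResponse, b: ChannelsResponse): string[] {
  const diffs: string[] = [];
  if (sortedKeys(a) !== sortedKeys(b)) diffs.push("top-level keys differ");

  const ids = (r: ChannelsResponse) => r.channels.map((c) => c.id).sort().join(",");
  if (ids(a) !== ids(b)) diffs.push("channel ids differ");

  // Compare the set of distinct key layouts used by channel items.
  const itemKeys = (r: ChannelsResponse) =>
    Array.from(new Set(r.channels.map((c) => sortedKeys(c)))).sort().join("|");
  if (itemKeys(a) !== itemKeys(b)) diffs.push("channel item keys differ");

  if (a.totalUnread !== b.totalUnread) diffs.push("totalUnread differs");
  return diffs;
}
```

Sorting ids and keys keeps the comparison independent of the order in which the upstream returns channels.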

feruzm · 2026-05-08T12:26:02Z

@codex review this PR/branch changes

chatgpt-codex-connector · 2026-05-08T12:30:47Z

Codex Review: Didn't find any major issues. Delightful!

Several server-side proxy routes used bare fetch() with no timeout. If the upstream hangs, the request stays in vision_web's active list indefinitely and stacks up under load. This is the same shape of bug that caused vision_web replicas to be killed when pl.ecency.com degraded (the /pl/api/event Next.js external rewrite has no timeout either; that piece is being addressed separately at the nginx layer).

Add AbortSignal.timeout to:

- /api/mattermost/users/[userId]/image (8s): avatar proxy via chat.ecency.com. Wraps the fetch in try/catch and returns 504 on timeout, 502 on other transport errors.
- /api/threespeak/upload-token (10s): 3Speak admin API.
- /api/threespeak/thumbnail (10s): 3Speak admin API.
- /api/stats (8s): Plausible /api/v2/query, the same Plausible host whose recent intermittent slowness we already saw stack up requests on the events endpoint. Wraps fetch in try/catch and returns 504 on timeout, 502 on other errors.

8s matches the CF Worker primary timeout. Anything slower than that has already been cut off at the edge, so there's no value in waiting longer.

Pre-existing issue noted, NOT fixed in this commit: /api/mattermost/users/[userId]/image still uses Next.js 14-style sync `params` in its handler signature; Next.js 15 expects `Promise<{ userId: string }>`. Likely a runtime bug (params.userId returns undefined on a Promise). Worth a follow-up PR; out of scope here.
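A minimal sketch of a timed proxy call in this style, assuming a runtime with global `fetch`, `Response`, and `AbortSignal.timeout` (Node 18+). `proxyUpstream` and the injectable `fetchImpl` parameter are hypothetical; the injection exists only so the sketch can run without a network, while the routes in this PR call `fetch` directly. The body read sits inside the same `try`, so an abort while streaming the body is also caught.

```typescript
// Minimal fetch signature so a fake can be injected (assumption).
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

interface ProxyResult {
  status: number;
  body: string;
}

async function proxyUpstream(
  url: string,
  timeoutMs: number,
  fetchImpl: FetchLike = fetch
): Promise<ProxyResult> {
  try {
    // The timeout signal covers the whole exchange, including the body read.
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    const body = await res.text();
    return { status: res.status, body };
  } catch (err) {
    const name = err instanceof Error ? err.name : "";
    if (name === "TimeoutError" || name === "AbortError") {
      return { status: 504, body: "upstream timed out" };
    }
    // Anything else (DNS, TLS, connection reset, ...) is treated as a bad gateway.
    return { status: 502, body: "upstream request failed" };
  }
}
```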

Review feedback on the previous commit (c6e32dd):

- /api/mattermost/users/[userId]/image: the try/catch only wrapped the fetch() call; body reads (res.text() / res.arrayBuffer()) lived outside it, so an abort during streaming would escape uncaught. Move all response handling inside the try block, and treat both TimeoutError (initial-fetch timeout) and AbortError (mid-stream abort) as the timeout case (504); other errors stay as 502.
- /api/stats: response.json() runs under the same AbortSignal as the fetch, so a timeout during body read currently routes through the parse-error path and returns 400. Split the second try/catch: TimeoutError/AbortError -> 504, anything else -> 502 (was 400).
- /api/threespeak/thumbnail: catch returned 500 for everything. Now TimeoutError/AbortError -> 504, TypeError (Node fetch wraps DNS/TLS/connect failures as TypeError('fetch failed')) -> 502, other -> 500 (unchanged).

The sibling /api/threespeak/upload-token has the same catch shape but wasn't called out in review; left alone for minimal scope. Worth mirroring in a follow-up if/when that file is touched again.
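The error-to-status mapping described for /api/threespeak/thumbnail fits in a small pure function. This is a sketch with a hypothetical name (`statusForUpstreamError`); it follows the classification stated above, where both TimeoutError and AbortError count as timeouts and Node's fetch reports network-level failures as `TypeError("fetch failed")`.

```typescript
// Hypothetical helper: classify an upstream failure into an HTTP status.
function statusForUpstreamError(err: unknown): 504 | 502 | 500 {
  if (err instanceof Error) {
    // TimeoutError: AbortSignal.timeout fired; AbortError: aborted mid-stream.
    if (err.name === "TimeoutError" || err.name === "AbortError") return 504;
    // Node's fetch wraps DNS/TLS/connect failures as TypeError("fetch failed").
    if (err instanceof TypeError) return 502;
  }
  // Everything else keeps the previous generic server-error behavior.
  return 500;
}
```

Keeping the mapping in one function means a route's `catch` block reduces to returning `statusForUpstreamError(err)`, and the same rules can be reused if upload-token is aligned later.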

coderabbitai Bot reviewed May 8, 2026

This comment was marked as resolved.

feruzm merged commit e13e2fb into develop May 8, 2026
1 check passed

feruzm deleted the perf/mattermost-streaming-proxy branch May 8, 2026 17:25

feruzm mentioned this pull request May 8, 2026

Update mattermost-channels spec for streaming fetchers #777

Merged

feruzm merged 4 commits into develop from perf/mattermost-streaming-proxy (#776)
