Skip to content

Stream Mattermost channels & members instead of paginating#776

Merged
feruzm merged 4 commits into
developfrom
perf/mattermost-streaming-proxy
May 8, 2026
Merged

Stream Mattermost channels & members instead of paginating#776
feruzm merged 4 commits into
developfrom
perf/mattermost-streaming-proxy

Conversation

@feruzm
Copy link
Copy Markdown
Member

@feruzm feruzm commented May 8, 2026

Summary

Switch the /api/mattermost/channels and /api/mattermost/channels/unreads route handlers from a paginated fan-out (5+ upstream calls per request, up to 3 pages each) to the streaming endpoints the official Mattermost webapp uses:

  • GET /users/me/channels (no page/per_page) — server streams the full channel list in one response.
  • GET /users/me/channel_members?page=-1 — NDJSON streaming mode for member rows.
  • Threads still paginate (no streaming form available); cap unchanged at 2 pages × 200.

A small mmFetchNdjson helper in server/mattermost.ts parses line-delimited JSON with the same timeout/abort semantics as mmFetch.

Why

Under traffic spikes the prior fan-out (channels + members + categories + preferences + threads + a /users/ids POST, each potentially paginating) competed with SSR rendering for the Node.js event loop. The in-band /api/healthcheck queued behind the work and Swarm killed replicas with dockerexec: unhealthy container. Reducing per-request upstream calls cuts CPU pressure on the proxy path. This won't fully eliminate the kill cascade by itself — SSR-side load is still the dominant pressure — but it removes one of the contributors and fixes a latent counting bug along the way.

Backwards compatibility

Response shapes are preserved exactly. Verified against chat.ecency.com 10.11 with @good-karma's PAT:

  • /channels: byte-for-byte identical (23 channels, 17,934 b, all keys preserved including directUser on DMs)
  • /unreads: identical totals (totalUnread=65, totalMentions=1, totalDMs=0, totalThreads=0, truncated=false)

The truncated flag in /unreads narrows in meaning: it now only flips to true if a user has >400 active threads, never from channel/member pagination caps. The websocket optimistic-update path in mattermost-websocket.ts already short-circuits gracefully when channels are missing, so this is backwards compatible for the mobile app and other clients.

Side effect (admin-only)

The previous paginated form against /users/me/channels?page=N&per_page=200 produced cumulative duplicate rows for cross-team admin accounts, doubling totalUnread. The streaming form returns each membership once, so admin totals are now correct. Regular single-team users (the typical case) see no numeric change.

Test plan

  • Type-check (no new errors in changed files)
  • Production build (pnpm --filter @ecency/web build) succeeds
  • Endpoint shape parity with current prod (admin and @good-karma)
  • Local container test against live chat.ecency.com
  • Smoke after rollout: chat panel mount loads channels, badge count matches, mark-as-read still updates badge instantly via WS
  • Sentry: no rise in Mattermost thread unreads truncated events
  • Nginx logs: /api/mattermost/* p99 latency unchanged or better; no rise in 504s

Summary by CodeRabbit

  • Performance Improvements

    • Faster Mattermost channel/member loading via single-request and streaming fetches, improving responsiveness.
  • Reliability & Error Handling

    • Network calls (avatars, stats, thumbnails, uploads) now have timeouts and clearer error responses to avoid hangs.
  • Bug Fixes

    • Unread counts/truncation reporting adjusted to reflect thread-specific pagination and avoid misleading truncation warnings.

The /api/mattermost/channels and /api/mattermost/channels/unreads route
handlers fanned out 5+ paginated upstream calls per request (channels,
members, categories, preferences, threads, plus an admin lookup), each
with up to 3 pages × 200 items. Under traffic spikes this fan-out chewed
through the Node.js event loop and competed with SSR for CPU, contributing
to in-band /api/healthcheck timeouts that triggered Swarm to kill replicas.

Switch to the streaming forms the official Mattermost webapp uses:

- GET /users/me/channels (no page params): server streams the full
  channel list in a single response. Replaces fetchAllChannelPages's
  3-page loop.
- GET /users/me/channel_members?page=-1: NDJSON streaming mode that
  returns every member row in one response. Replaces
  fetchAllChannelMemberPages's per-team paginated loop. Adds an
  mmFetchNdjson helper in server/mattermost.ts to parse line-delimited
  JSON with the same timeout/abort semantics as mmFetch.

Threads still paginate (max 2 pages) since the threads endpoint has no
streaming form and threads are bounded by user activity.

The truncated flag in the /unreads response narrows in meaning: it now
only flips to true when a user has more than 400 active threads, never
from channel/member pagination caps. The websocket optimistic-update path
in mattermost-websocket.ts already handles a missing target channel
gracefully, so this is backwards compatible.

Verified against chat.ecency.com 10.11 with @Good-Karma's PAT:
- /channels: byte-for-byte identical response (23 channels, 17,934 b)
- /unreads: identical (totalUnread=65, totalMentions=1, totalDMs=0,
  totalThreads=0, truncated=false)

Side effect: the previous paginated form's ?page=N&per_page=200 against
/users/me/channels exhibited cumulative duplicate rows for cross-team
admin accounts, doubling totalUnread. The streaming form returns each
membership exactly once, so totals are now correct for those edge cases.
Regular single-team users (the typical case) see no change.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 8, 2026

Review Change Stack

Warning

Rate limit exceeded

@feruzm has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 27 minutes and 50 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bad9b4ec-8be0-4a43-ae8d-752a094cca49

📥 Commits

Reviewing files that changed from the base of the PR and between c6e32dd and 298c45c.

📒 Files selected for processing (3)
  • apps/web/src/app/api/mattermost/users/[userId]/image/route.ts
  • apps/web/src/app/api/stats/route.ts
  • apps/web/src/app/api/threespeak/thumbnail/route.ts
📝 Walkthrough

Walkthrough

Consolidates Mattermost channel and member fetching by adding mmFetchRaw/mmFetchNdjson utilities, switching channel/member helpers to single-request and NDJSON streaming endpoints, updating routes, tightening thread-unread pagination reporting, and adding timeouts to several upstream fetches.

Changes

Mattermost Channel & Member Fetch Refactoring

Layer / File(s) Summary
NDJSON Fetch Utility
apps/web/src/server/mattermost.ts
Add mmFetchRaw, adapt mmFetch to use it, and add mmFetchNdjson + exported mmUserFetchNdjson for NDJSON parsing.
Channel & Member Helpers
apps/web/src/app/api/mattermost/channels/helpers.ts
fetchAllChannelPages becomes a single /users/me/channels call; fetchAllChannelMemberPages streams /users/me/channel_members?page=-1 via NDJSON and drops teamId and pagination constants.
Route Integration
apps/web/src/app/api/mattermost/channels/route.ts, apps/web/src/app/api/mattermost/channels/unreads/route.ts
Routes/imports updated to call new helper signatures; unreads endpoint uses streaming helpers for channels/members.
Unreads Pagination & Reporting
apps/web/src/app/api/mattermost/channels/unreads/route.ts
Introduce thread-specific THREAD_PAGE_SIZE/limits; thread pagination drives truncated and Sentry payloads.
Avatar Timeout Handling
apps/web/src/app/api/mattermost/users/[userId]/image/route.ts
Wrap upstream avatar fetch in try/catch with AbortSignal.timeout(8000); map timeouts to 504 and other fetch errors to 502.
Plausible Stats Timeout
apps/web/src/app/api/stats/route.ts
Wrap Plausible fetch in try/catch with AbortSignal.timeout(8000); map timeouts to 504 and other fetch failures to 502; JSON parse errors return 400.
3Speak Timeouts & Origins
apps/web/src/app/api/threespeak/thumbnail/route.ts, apps/web/src/app/api/threespeak/upload-token/route.ts
Add 10s outbound fetch timeouts; upload-token sets allowed_origins only when request includes an Origin (ecency domains), otherwise empty.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Route
  participant fetchHelpers
  participant MattermostAPI
  Client->>Route: request unreads / channels
  Route->>fetchHelpers: fetchAllChannelPages(), fetchAllChannelMemberPages(token)
  fetchHelpers->>MattermostAPI: /users/me/channels (single request)
  fetchHelpers->>MattermostAPI: /users/me/channel_members?page=-1 (NDJSON)
  MattermostAPI-->>fetchHelpers: JSON / NDJSON responses
  fetchHelpers-->>Route: combined results
  Route-->>Client: aggregated response
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

patch

Poem

🐰 I hopped through streams, no pages more,
One call, one flow, I bounce to the core.
Timeouts set so nets don't drag,
NDJSON hums — a tidy tag.
Cheers from a rabbit, light on the floor!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately summarizes the main change: replacing pagination with streaming for Mattermost channels and members fetching.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/mattermost-streaming-proxy

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
apps/web/src/server/mattermost.ts (1)

420-479: ⚡ Quick win

Extract a shared mmFetchRaw primitive to avoid duplicating the timeout/abort/error-handling boilerplate.

mmFetchNdjson repeats ~25 lines from mmFetch verbatim — base-URL resolution, AbortSignal.any composition, the fetch(...) call, the non-OK error block, and the TimeoutError rethrow. The only real difference is how the response body is parsed. A single internal helper that returns string | null lets both callers focus on their parsing logic, and ensures any future change (tracing, metric instrumentation, error mapping) only needs to happen once.

♻️ Proposed refactor
+async function mmFetchRaw(path: string, init?: RequestInit): Promise<string | null> {
+  const base = requireEnv(MATTERMOST_BASE_URL, "MATTERMOST_BASE_URL");
+  const timeoutSignal = AbortSignal.timeout(MM_FETCH_TIMEOUT_MS);
+  const signal = init?.signal
+    ? AbortSignal.any([init.signal, timeoutSignal])
+    : timeoutSignal;
+  try {
+    const res = await fetch(`${base}${path}`, {
+      ...init,
+      headers: { ...(init?.headers || {}), Accept: "application/json" },
+      signal
+    });
+    if (!res.ok) {
+      const text = await res.text();
+      throw new MattermostError(`Mattermost request failed (${res.status}): ${text}`, res.status);
+    }
+    const text = await res.text();
+    return text || null;
+  } catch (err) {
+    if (err instanceof Error && err.name === "TimeoutError") {
+      throw new MattermostError(
+        `Mattermost request timed out after ${MM_FETCH_TIMEOUT_MS}ms (${path})`,
+        504
+      );
+    }
+    throw err;
+  }
+}

 async function mmFetch<T>(path: string, init?: RequestInit): Promise<T> {
-  const base = requireEnv(MATTERMOST_BASE_URL, "MATTERMOST_BASE_URL");
-  const timeoutSignal = AbortSignal.timeout(MM_FETCH_TIMEOUT_MS);
-  const signal = init?.signal
-    ? AbortSignal.any([init.signal, timeoutSignal])
-    : timeoutSignal;
-  try {
-    const res = await fetch(`${base}${path}`, { ...init, headers: { ...(init?.headers || {}), Accept: "application/json" }, signal });
-    if (!res.ok) {
-      const text = await res.text();
-      throw new MattermostError(`Mattermost request failed (${res.status}): ${text}`, res.status);
-    }
-    const text = await res.text();
-    if (!text) return undefined as T;
-    return JSON.parse(text) as T;
-  } catch (err) {
-    if (err instanceof Error && err.name === "TimeoutError") {
-      throw new MattermostError(`Mattermost request timed out after ${MM_FETCH_TIMEOUT_MS}ms (${path})`, 504);
-    }
-    throw err;
-  }
+  const text = await mmFetchRaw(path, init);
+  if (text === null) return undefined as T;
+  return JSON.parse(text) as T;
 }

 async function mmFetchNdjson<T>(path: string, init?: RequestInit): Promise<T[]> {
-  const base = requireEnv(MATTERMOST_BASE_URL, "MATTERMOST_BASE_URL");
-  const timeoutSignal = AbortSignal.timeout(MM_FETCH_TIMEOUT_MS);
-  const signal = init?.signal
-    ? AbortSignal.any([init.signal, timeoutSignal])
-    : timeoutSignal;
-  try {
-    const res = await fetch(`${base}${path}`, { ...init, headers: { ...(init?.headers || {}), Accept: "application/json" }, signal });
-    if (!res.ok) {
-      const text = await res.text();
-      throw new MattermostError(`Mattermost request failed (${res.status}): ${text}`, res.status);
-    }
-    const text = await res.text();
-    if (!text) return [];
+  const text = await mmFetchRaw(path, init);
+  if (!text) return [];
     const results: T[] = [];
     for (const line of text.split("\n")) {
       const trimmed = line.trim();
       if (!trimmed) continue;
       try {
         results.push(JSON.parse(trimmed) as T);
       } catch {
         // Tolerate a malformed final line (partial chunk at EOS).
       }
     }
     return results;
-  } catch (err) {
-    if (err instanceof Error && err.name === "TimeoutError") {
-      throw new MattermostError(`Mattermost request timed out after ${MM_FETCH_TIMEOUT_MS}ms (${path})`, 504);
-    }
-    throw err;
-  }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/web/src/server/mattermost.ts` around lines 420 - 479, Extract a shared
helper (e.g., mmFetchRaw) that centralizes base URL resolution
(requireEnv(MATTERMOST_BASE_URL,...)), AbortSignal.any composition with
MM_FETCH_TIMEOUT_MS, the fetch(...) invocation, non-OK response handling that
throws MattermostError, and the TimeoutError mapping, and have mmFetchNdjson and
mmUserFetchNdjson call this helper and only perform response parsing; replace
duplicated logic in mmFetchNdjson with a call to mmFetchRaw(path, init) which
returns the response text or null, and update mmUserFetchNdjson to pass the
Authorization header into mmFetchRaw so both functions keep only
parsing-specific behavior while all timeout/abort/error boilerplate lives in
mmFetchRaw.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@apps/web/src/server/mattermost.ts`:
- Around line 420-479: Extract a shared helper (e.g., mmFetchRaw) that
centralizes base URL resolution (requireEnv(MATTERMOST_BASE_URL,...)),
AbortSignal.any composition with MM_FETCH_TIMEOUT_MS, the fetch(...) invocation,
non-OK response handling that throws MattermostError, and the TimeoutError
mapping, and have mmFetchNdjson and mmUserFetchNdjson call this helper and only
perform response parsing; replace duplicated logic in mmFetchNdjson with a call
to mmFetchRaw(path, init) which returns the response text or null, and update
mmUserFetchNdjson to pass the Authorization header into mmFetchRaw so both
functions keep only parsing-specific behavior while all timeout/abort/error
boilerplate lives in mmFetchRaw.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2dec7ebf-0068-49ba-9671-b0d3fa584485

📥 Commits

Reviewing files that changed from the base of the PR and between a56e8f2 and 892b893.

📒 Files selected for processing (4)
  • apps/web/src/app/api/mattermost/channels/helpers.ts
  • apps/web/src/app/api/mattermost/channels/route.ts
  • apps/web/src/app/api/mattermost/channels/unreads/route.ts
  • apps/web/src/server/mattermost.ts

mmFetch and mmFetchNdjson both did the same setup (base URL, timeout
signal merge with caller AbortSignal, fetch, non-OK -> MattermostError,
TimeoutError -> 504 mapping) and only differed in how they parsed the
response body. Extract that boilerplate into a private mmFetchRaw that
returns the response body text; have both wrappers call it and only
implement parsing.

Behavior is unchanged. Verified against chat.ecency.com 10.11 with
@Good-Karma's PAT — same top-level keys, same channel-item keys, same
23 channels with matching IDs, same totalUnread on both /channels and
/channels/unreads.
@feruzm
Copy link
Copy Markdown
Member Author

feruzm commented May 8, 2026

@codex review this PR/branch changes

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Delightful!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Several server-side proxy routes used bare fetch() with no timeout. If
the upstream hangs, the request stays in vision_web's active list
indefinitely and stacks up under load — the same shape of bug that
caused vision_web replicas to be killed when pl.ecency.com degraded
(the /pl/api/event Next.js external rewrite has no timeout either; that
piece is being addressed separately at the nginx layer).

Add AbortSignal.timeout to:

- /api/mattermost/users/[userId]/image (8s) — avatar proxy via
  chat.ecency.com. Wraps the fetch in try/catch and returns 504 on
  timeout, 502 on other transport errors.
- /api/threespeak/upload-token (10s) — 3Speak admin API.
- /api/threespeak/thumbnail (10s) — 3Speak admin API.
- /api/stats (8s) — Plausible /api/v2/query, the same Plausible host
  whose recent intermittent slowness we already saw stack up requests
  on the events endpoint. Wraps fetch in try/catch and returns 504 on
  timeout, 502 on other errors.

8s matches the CF Worker primary timeout. Anything slower than that has
already been cut off at the edge, so there's no value in waiting longer.

Pre-existing issue noted, NOT fixed in this commit:
/api/mattermost/users/[userId]/image still uses Next.js 14-style sync
`params` in its handler signature; Next.js 15 expects
`Promise<{ userId: string }>`. Likely a runtime bug (params.userId
returns undefined on a Promise). Worth a follow-up PR — out of scope
here.
coderabbitai[bot]

This comment was marked as resolved.

Review feedback on the previous commit (c6e32dd):

- /api/mattermost/users/[userId]/image: the try/catch only wrapped the
  fetch() call; body reads (res.text() / res.arrayBuffer()) lived
  outside it, so an abort during streaming would escape uncaught. Move
  all response handling inside the try block, and treat both
  TimeoutError (initial-fetch timeout) and AbortError (mid-stream
  abort) as the timeout case (504); other errors stay as 502.

- /api/stats: response.json() runs under the same AbortSignal as the
  fetch, so a timeout during body read currently routes through the
  parse-error path and returns 400. Split the second try/catch:
  TimeoutError/AbortError -> 504, anything else -> 502 (was 400).

- /api/threespeak/thumbnail: catch returned 500 for everything.
  TimeoutError/AbortError -> 504, TypeError (Node fetch wraps DNS/TLS/
  connect failures as TypeError('fetch failed')) -> 502, other -> 500
  (unchanged).

The sibling /api/threespeak/upload-token has the same catch shape but
wasn't called out in review; left alone for minimal scope. Worth
mirroring in a follow-up if/when that file is touched again.
@feruzm feruzm merged commit e13e2fb into develop May 8, 2026
1 check passed
@feruzm feruzm deleted the perf/mattermost-streaming-proxy branch May 8, 2026 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant