fix(account): defensive against malformed KV user records (FAULT 17 — Sytze hotfix) #49
Merged
… 1 diagnosis The Chrome dashboard edit at ~08:30 UTC overwrote user:spmosselaar@gmail.com with a partial JSON value missing the apiKey, email, and createdAt fields. Both /api/auth/me (line 51) and /account/page.tsx (line 291) do user.apiKey.slice(-4) → TypeError → 500 → "server-side exception" page. Confirmed Sytze-specific: production deploy SHA matches main HEAD from 2026-05-17 (no new deploy), only 3 Sentry events (all from one user/geo), anonymous traffic clean. No rollback needed — deploy is innocent. Phase 3 fix follows: defensive guards in /api/auth/me, /account, and getUser shape validation, plus Sentry capture on malformed records so the next dashboard edit doesn't silently break a paying customer.
Hotfix for the prod 500 on /account + /api/auth/me caused by the
~08:30 UTC Chrome dashboard edit (FAULT 16 manual recovery) which
replaced the JSON at user:spmosselaar@gmail.com instead of merging
into it. Post-edit the record was missing the apiKey, email, and
createdAt fields. Both /api/auth/me:51 and /account/page.tsx:291 did
user.apiKey.slice(-4) and threw TypeError — Next.js rendered the
generic server-side-exception page.
- /api/auth/me/route.ts: key_last4 nullish on missing apiKey; email
falls back to the session email; Sentry.captureMessage on a missing
apiKey ("malformed_user_record" warning level).
- /account/page.tsx: API key chip renders "not available — contact
support" when apiKey is missing; "needs attention" banner with a
mailto:contact link explains the customer's subscription is intact;
Sentry.captureMessage on either missing field.
- usage lookups gated on apiKey presence so the page renders even
when the record is partially malformed.
FAULT 17 added to FAULT-HISTORY-AND-PREVENTION.md. STATE.md +
CHANGELOG.md + lib/changelog-data.ts mirror the incident per the
FAULT 5 release-hygiene rule. Production release SHA was unchanged
through the incident (e9749a8 from 2026-05-17) — this is a data-shape
fix, not a code-regression rollback.
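A minimal sketch of the kind of guard the commit message describes for /api/auth/me, not the code from this PR. The names StoredUser, MeResponse, ReportFn, and buildMeResponse are hypothetical, the choice of null for a missing key is an assumption, and report stands in for Sentry.captureMessage so the example has no dependencies.

```typescript
// Hypothetical record shape: any field may be absent after a bad manual edit.
interface StoredUser {
  email?: string;
  apiKey?: string;
  plan?: string;
  stripeCustomerId?: string;
  createdAt?: string;
}

// Hypothetical response shape for the route.
interface MeResponse {
  email: string;
  plan: string | null;
  key_last4: string | null;
}

// Stand-in for Sentry.captureMessage(message, { level }) (assumption).
type ReportFn = (message: string, level: "warning") => void;

function buildMeResponse(
  user: StoredUser,
  sessionEmail: string,
  report: ReportFn,
): MeResponse {
  const hasKey = typeof user.apiKey === "string";
  if (!hasKey) {
    // Surface the malformed record instead of throwing a TypeError.
    report("malformed_user_record", "warning");
  }
  return {
    // Fall back to the session email when the record lost its own.
    email: user.email ?? sessionEmail,
    plan: user.plan ?? null,
    // Only slice when the key exists; otherwise return null.
    key_last4: hasKey ? (user.apiKey as string).slice(-4) : null,
  };
}
```

Passing the reporter in as a parameter keeps the guard testable without a Sentry client; a real route would more likely call the SDK directly.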
Summary — production incident hotfix
Sytze (first paying Pro customer) saw "Application error: a server-side exception has occurred while loading www.freightutils.com (Digest: 3327402320)" at ~09:14 UTC.
Root cause is NOT a deploy regression. Production SHA is e9749a8e55115a54872cd8ef3a415243278ec6c9 from 2026-05-17 and was unchanged through the incident (verified against the Vercel target: "production" deployment, the Sentry release tag, and the runtime logs). PR #47 (admin-signup-notifications) is unmerged, base SHA = current main HEAD; its preview never reached production.
Root cause is a data-shape regression from the ~08:30 UTC Chrome dashboard edit (FAULT 16 manual recovery): the Upstash JSON edit was a full-string replace, not a merge, so user:spmosselaar@gmail.com was left as { plan: "pro", stripeCustomerId: "cus_UYLSdNQnCwt5Tf" }, missing the apiKey, email, and createdAt fields the application code expects. Both /api/auth/me:51 and /account/page.tsx:291 then did user.apiKey.slice(-4) and threw TypeError: Cannot read properties of undefined (reading 'slice').
Sentry caught it (FREIGHTUTILS-3 on /api/auth/me at 09:13:08 UTC, FREIGHTUTILS-4 on /account at 09:36:14 UTC, 3 events total, all from one geo), but no alert rule was wired, so Soap didn't see it until Sytze emailed. Anonymous traffic and every other authenticated user were unaffected through the entire incident.
No rollback: there is no broken deploy to roll back; the deploy is innocent and rolling back wouldn't repair Sytze's KV record.
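The data-shape problem described in this summary can be checked mechanically. The sketch below is illustrative only: missingUserFields is a hypothetical helper, not the getUser validation from this PR, and it assumes the three required fields are stored as strings.

```typescript
// Fields the application code reads, taken from the incident description.
const REQUIRED_FIELDS = ["apiKey", "email", "createdAt"] as const;

// Hypothetical helper: returns the names of required fields that are absent
// or not strings. An empty array means the record has the expected shape.
function missingUserFields(record: unknown): string[] {
  if (typeof record !== "object" || record === null) {
    return [...REQUIRED_FIELDS];
  }
  const fields = record as Record<string, unknown>;
  return REQUIRED_FIELDS.filter((field) => typeof fields[field] !== "string");
}

// The record as it looked after the dashboard edit.
const afterEdit: unknown = JSON.parse(
  '{ "plan": "pro", "stripeCustomerId": "cus_UYLSdNQnCwt5Tf" }',
);

// Reports apiKey, email, and createdAt as missing.
console.log(missingUserFields(afterEdit));
```

A check like this at read time turns the later user.apiKey.slice(-4) TypeError into an explicit, reportable condition.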
Phase 1 diagnosis
Committed first as docs/incidents/2026-05-21-prod-homepage-500.md (commit 0a945b7) per the hard rule that the audit/incident doc lands before any code or destructive action. Contains the full evidence trail: prod deployment ID, Sentry event details, Vercel runtime logs, file:line references for both throw sites, and the timeline.
Phase 3 code fix (this PR)
- app/api/auth/me/route.ts: key_last4 falls back to a nullish value when user.apiKey is missing; email falls back to the session email; Sentry.captureMessage('malformed_user_record', { level: 'warning' }) on missing apiKey.
- app/account/page.tsx: API key chip renders a "not available" fallback with a contact-support prompt when user.apiKey is missing; new "needs attention" banner with mailto:contact@freightutils.com; usage lookups gated on apiKey presence; Sentry.captureMessage('malformed_user_record', { level: 'warning' }) on either missing field.
- docs/incidents/2026-05-21-prod-homepage-500.md: the Phase 1 diagnosis (commit 0a945b7).
- docs/FAULT-HISTORY-AND-PREVENTION.md: FAULT 17 added.
- STATE.md, CHANGELOG.md + lib/changelog-data.ts: mirror the incident per the FAULT 5 release-hygiene rule.
Two surfaces stop 500'ing on malformed records. The Sentry capture means the next dashboard edit that drops a field pages Soap before the customer notices (assuming the alert rule below is wired).
Soap's checklist before merge
1. Sytze KV record repair — Chrome (cannot do from this sandbox)
The dashboard edit dropped 3 fields from user:spmosselaar@gmail.com. Need to restore them. The apiKey value is recoverable from the mirror key: every key:fu_live_* whose JSON value contains "email": "spmosselaar@gmail.com" has that apiKey as its key-name suffix.
Target post-restore JSON (preserve every existing field byte-identical, only overwrite the three dropped ones):
{
  "email": "spmosselaar@gmail.com",
  "plan": "pro",
  "apiKey": "fu_live_<recover from mirror key name>",
  "stripeCustomerId": "cus_UYLSdNQnCwt5Tf",
  "createdAt": "2026-05-20T18:21:00.000Z"
}
createdAt is approximated from the Stripe customer's created timestamp; this field is display-only on /account and doesn't gate any logic, so a sensible approximation is fine.
The rule going forward (and PLEASE pass to any future Chrome dashboard-edit agent): start by copy-pasting the current JSON into a scratchpad VERBATIM, modify only the targeted fields, paste the full result back. The Upstash UX is a string replace, not a merge.
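The merge-not-replace rule can be sketched in code, for cases where the edit is scripted rather than done by hand in the dashboard. Both helpers are hypothetical, and the mirror-key layout (key: followed by the full fu_live_ key) is my reading of the description above, not something confirmed by the codebase. The fu_live_EXAMPLE value is a placeholder, not a real key.

```typescript
type JsonObject = Record<string, unknown>;

// Hypothetical helper: read-modify-write. Existing fields survive unless the
// patch names them. Values are preserved, but JSON.stringify may change
// whitespace, so the result is value-identical rather than byte-identical.
function mergeRecord(currentJson: string, patch: JsonObject): string {
  const existing = JSON.parse(currentJson) as JsonObject;
  return JSON.stringify({ ...existing, ...patch });
}

// Hypothetical helper: recover the API key from a mirror key name,
// assuming the name is "key:" followed by the full "fu_live_..." key.
function apiKeyFromMirrorKey(mirrorKeyName: string): string | null {
  const prefix = "key:";
  if (!mirrorKeyName.startsWith(`${prefix}fu_live_`)) {
    return null;
  }
  return mirrorKeyName.slice(prefix.length);
}

// The record as the dashboard edit left it.
const before = '{"plan":"pro","stripeCustomerId":"cus_UYLSdNQnCwt5Tf"}';

// Restore only the three dropped fields; plan and stripeCustomerId carry over.
const restored = mergeRecord(before, {
  email: "spmosselaar@gmail.com",
  apiKey: apiKeyFromMirrorKey("key:fu_live_EXAMPLE"), // placeholder key name
  createdAt: "2026-05-20T18:21:00.000Z",
});

console.log(restored);
```

The object spread puts the patch last, so a patch can still overwrite a field on purpose; what it cannot do is silently drop one.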
2. Sentry alert rule (one-time, manual)
Sentry → Alerts → Create Alert Rule → Issue Alert:
message:"malformed_user_record" AND level:warning
This is the early warning for the next FAULT 17 incident. Without it the only signal was Sytze's email; the 3 Sentry events sat unread in the issue list for ~25 minutes.
3. PR #48 (FAULT 16 fix) interaction
PR #48 is still open and untouched in this fix. Both PRs touch STATE.md, CHANGELOG.md, lib/changelog-data.ts, and docs/FAULT-HISTORY-AND-PREVENTION.md. Whichever PR merges second will have a small mechanical conflict on those files (both PRs append; resolution is "keep both"). The PR #48 runbook docs/runbooks/customer-tier-sync.md should be amended post-merge with the dashboard-edit-rule paragraph.
4. Smoke verification post-merge
- curl https://www.freightutils.com/ → 200 (anonymous, already verified via Vercel proxy fetch)
- curl https://www.freightutils.com/api/auth/whoami -H "X-API-Key: <Sytze's key>" → tier: "pro"
- /account no longer 500s: it renders the "needs attention" banner with the support email; Sytze can verify his tier is Pro on the page.
Customer follow-up email (draft, copy/paste)
FAULT 5 checklist
Most items N/A — no new pages, no new endpoints, no new MCP tools, no displayed-number changes. Items that apply:
- CHANGELOG.md entry added (2026-05-21)
- /changelog page (lib/changelog-data.ts) renders the new entry
- Incident doc committed first (0a945b7)
- npm run build passes with zero errors
- FAULT 17 added to docs/FAULT-HISTORY-AND-PREVENTION.md
- Not applicable: siteStats.ts, app/sitemap.ts, public/openapi.json, /api-docs page, nav dropdown, homepage tool grid, MCP registration, footer links, freightutils-mcp README, npm bump, Postman, tool-page word count, withAuditRest (no new routes; /api/auth/me already excluded), generateMetadata (no new pages), indexnow-submit (no new URLs)
Test plan
- npx tsc --noEmit clean.
- npm run lint:audit passes.
- npm run lint:seo-titles passes.
- npm run build succeeds.
- Commit order: 0a945b7 then 58553d0.
- Production SHA verified (Vercel target: "production" lookup; release SHA unchanged through incident).
- /account with malformed-record fixture, then with valid record.
- /api/auth/whoami returns tier: "pro" for Sytze post-restoration.
- Sentry alert rule for malformed_user_record created.
- PR #48 runbook (docs/runbooks/customer-tier-sync.md) amended with the dashboard-edit-rule paragraph.
https://claude.ai/code/session_019A4f9SxA6vyzdoC67JLmTZ
Generated by Claude Code