How Brain Platform protects your data, the auth modes, the voucher gate, and the zero-error iteration loop we run before any release.
Brain Platform is a multi-tenant knowledge substrate. The data that must not leak:
- Raw session events — include full coding prompts that may carry pasted secrets, internal URLs, customer data.
- Extracted Knowledge rows — include business rules, internal framework choices, personnel-specific preferences.
- MCP tokens — grant full read/write access to a user's Brain.
- Audit log — by design contains what admins did, when, and to whom. Leaking it leaks privileged workflow.
The adversaries we defend against:
- An anonymous internet visitor reaching the webapp URL (highest likelihood).
- An authenticated user of tenant A reading tenant B's data (scope leakage).
- A compromised LLM provider key (rotate, rate-limit, cap cost).
- A disgruntled former admin (revocation must be immediate; audit of admin actions is append-only).
Out of scope (for now): nation-state APT, physical VM compromise, side-channel attacks on embedding inference.
The v0.11.2 audit sweep (catalog #103, closed 2026-05-07) materially improved the posture in five places. If you read older copies of this doc, here's what changed:
- MCP HTTP session is bound to the bootstrap token (audit C1, PR #124). Previously a leaked `Mcp-Session-Id` plus any valid Bearer = act-as-victim. Now the transport does a timing-safe comparison (`timingSafeEquals`) of the request's Bearer against the session's stored token; a mismatch returns `401` / `-32001 "Session-token mismatch"`. Session UUIDs are also redacted from `mcp.session.open`/`close` log lines (W4).
- Voucher codes are CSPRNG-generated (audit C2, PR #116). Previously `Math.random()`; now `node:crypto.randomInt`. V8 PRNG state recovery from observed codes no longer applies.
- Cross-user IDOR cluster closed (audit C3-C6, PR #117). All MCP tools that take a caller-supplied `sessionId`/`projectId` (`brain_log_event`, `brain_report_session_outcome`, `brain_teach_knowledge`, `brain_start_session`) and the web `POST /api/knowledge` validate ownership against `auth.userId` and `userCanAccessProject(...)` before mutating. Knowledge counter bumps in `brain_report_session_outcome` are scoped to `ownerUserId`, so a foreign Knowledge ID is silently skipped instead of flipping its counter.
- Credentials sign-in timing flattened (audits W1+W2, PR #137). Both the no-`UserCredential` and no-`User` branches now run a dummy bcrypt comparison before returning `null`, so email enumeration via response-time delta no longer works.
- Token-route audit writes are awaited (audit W6, PR #142). `token.create`/`token.revoke` no longer use `void writeAudit(...)`, so a process restart between the response and the async insert can no longer drop the audit row.
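A minimal sketch of the C1 session binding, assuming the transport records a digest of the Bearer that opened each session. `McpSession`, `openSession`, and `checkSessionBinding` are hypothetical names; only `createHash` and `timingSafeEqual` are real `node:crypto` APIs.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical per-session record: only a digest of the bootstrap Bearer is kept.
interface McpSession {
  id: string; // Mcp-Session-Id
  bootstrapTokenDigest: Buffer;
}

function sha256(value: string): Buffer {
  return createHash("sha256").update(value).digest();
}

function openSession(id: string, bearer: string): McpSession {
  return { id, bootstrapTokenDigest: sha256(bearer) };
}

type BindingResult =
  | { ok: true }
  | { ok: false; httpStatus: 401; rpcCode: -32001; message: "Session-token mismatch" };

// A valid token belonging to another user is still rejected: the Bearer on
// every request must equal the one that opened this session.
function checkSessionBinding(session: McpSession, authorization: string | undefined): BindingResult {
  const prefix = "Bearer ";
  const presented =
    authorization !== undefined && authorization.startsWith(prefix) ? authorization.slice(prefix.length) : "";
  // Both digests are 32 bytes, so timingSafeEqual never throws on a length mismatch.
  if (presented !== "" && timingSafeEqual(sha256(presented), session.bootstrapTokenDigest)) {
    return { ok: true };
  }
  return { ok: false, httpStatus: 401, rpcCode: -32001, message: "Session-token mismatch" };
}
```

Hashing both sides first does two things. It gives `timingSafeEqual` equal-length inputs, which it requires. It also means the raw Bearer never has to sit in the session table.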
The full audit catalog with closed/open status lives in #103 (BrainPlatform, private).
Three mutually exclusive modes, chosen by env config:

| Mode | When | Required env | Who can reach the app |
|---|---|---|---|
| OAuth (pilot / production) | Always set for public deployments | `AUTH_GITHUB_ID`, `AUTH_GITHUB_SECRET`, `AUTH_SECRET` | Only users who have successfully signed in via GitHub and hold a valid JWT. New signups require a voucher code by default. |
| Dev shim (local dev only) | Single-tenant demo, no OAuth app yet | `ALLOW_DEV_AUTH=true` | Whoever can reach the server; everyone shares one User row. Refused when `NODE_ENV=production` unless `ALLOW_DEV_AUTH_IN_PRODUCTION=true` is also set. |
| Unconfigured (default) | Neither of the above | (none) | Nobody. Every request returns `503 auth_not_configured`. A freshly deployed VM with neither `AUTH_*` nor `ALLOW_DEV_AUTH` set is locked shut until the operator picks a mode. This is the secure-by-default posture. |
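The mode choice can be sketched as a single resolver that throws rather than guessing. This assumes OAuth mode means all three `AUTH_*` values are non-empty. `resolveAuthMode` is a hypothetical function; folding the production refusal and the "both modes set" error into one place is an illustration, not how `auth.ts` is organized.

```typescript
type AuthMode = "oauth" | "dev-shim" | "unconfigured";
type Env = Record<string, string | undefined>;

// Hypothetical resolver for the three modes in the table above. Throwing
// (rather than falling back) keeps a misconfiguration from serving data.
function resolveAuthMode(env: Env): AuthMode {
  const has = (key: string): boolean => (env[key] ?? "") !== "";
  const oauth = has("AUTH_GITHUB_ID") && has("AUTH_GITHUB_SECRET") && has("AUTH_SECRET");
  const devShim = env.ALLOW_DEV_AUTH === "true";

  if (oauth && devShim) {
    throw new Error("ALLOW_DEV_AUTH=true together with OAuth env vars is a configuration error");
  }
  if (oauth) return "oauth";
  if (devShim) {
    if (env.NODE_ENV === "production" && env.ALLOW_DEV_AUTH_IN_PRODUCTION !== "true") {
      throw new Error("dev shim refused: NODE_ENV=production without ALLOW_DEV_AUTH_IN_PRODUCTION=true");
    }
    return "dev-shim";
  }
  // Secure default: the request layer maps this to 503 auth_not_configured.
  return "unconfigured";
}
```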
Before 2026-04-23, an unconfigured deployment silently fell through to the dev shim and served everyone as the first User row in the DB. That meant any scanning IP could read the seeded Alex persona's Knowledge. The fix (auth.ts::getCurrentUserId throwing auth_not_configured) closes that hole at the default setting, so the operator's mistake is "the app doesn't work" rather than "the app works for anyone who finds it."
ADMIN_EMAILS is a comma-separated env var (one entry or many). On each sign-in, if the signing-in user's email matches (case-insensitive), their User.role is set to admin — either on creation (new user) or on update (existing non-admin). The list is read at sign-in only; removing an email from ADMIN_EMAILS does NOT demote the user.
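A minimal sketch of the promote-only rule. It assumes the non-admin role value is `"user"` (hypothetical). `parseAdminEmails` and `roleAfterSignIn` are illustrative names, not functions from `apps/web/auth.ts`.

```typescript
// Parse the comma-separated ADMIN_EMAILS value into a normalized set.
function parseAdminEmails(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((e) => e.trim().toLowerCase())
      .filter((e) => e !== ""),
  );
}

// Role to write at sign-in. Promotion only: an email that is no longer
// listed leaves an existing admin untouched.
function roleAfterSignIn(
  email: string,
  currentRole: "admin" | "user" | null, // null = brand-new user
  adminEmails: Set<string>,
): "admin" | "user" {
  if (adminEmails.has(email.trim().toLowerCase())) return "admin";
  return currentRole ?? "user";
}
```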
/admin/users provides per-user Promote to admin / Demote buttons backed by PATCH /api/admin/users/[id]/role. Every change writes an admin.role_change audit row with {email, from, to, selfChange} payload. A soft guard refuses to demote the last remaining admin (409 last_admin_cannot_be_demoted) so the deployment never becomes UI-unrecoverable — the operator must promote someone else first. ADMIN_EMAILS remains the chicken-and-egg bootstrap for a fresh deployment.
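The last-admin guard reduces to a count check before the update. This sketch assumes the caller has already counted current admins. `assertRoleChangeAllowed`, `HttpError`, and the `"user"` role value are hypothetical.

```typescript
type Role = "admin" | "user"; // "user" is a hypothetical non-admin role name

class HttpError extends Error {
  readonly status: number;
  readonly code: string;
  constructor(status: number, code: string) {
    super(code);
    this.status = status;
    this.code = code;
  }
}

// Refuse a demotion that would leave the deployment with zero admins.
function assertRoleChangeAllowed(from: Role, to: Role, adminCount: number): void {
  if (from === "admin" && to !== "admin" && adminCount <= 1) {
    throw new HttpError(409, "last_admin_cannot_be_demoted");
  }
}

// Payload shape for the admin.role_change audit row.
function roleChangeAuditPayload(email: string, from: Role, to: Role, actorId: string, targetId: string) {
  return { email, from, to, selfChange: actorId === targetId };
}
```

In a real handler, the count and the role update should run in one transaction. Otherwise two admins demoting each other at the same moment could each see a count of 2, and both demotions would succeed.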
Admins listed in ADMIN_EMAILS bypass the voucher gate for their own first sign-in. This is the chicken-and-egg escape hatch — the first admin can always bootstrap themselves without needing another admin to hand them a voucher.
- `refuseDevShimInProduction()` refuses the dev shim when `NODE_ENV=production` unless `ALLOW_DEV_AUTH_IN_PRODUCTION=true`.
- `deploy/docker-compose.prod.yml` does not set `ALLOW_DEV_AUTH`, which forces OAuth mode.
- NextAuth's `/api/auth/*` route is exempt from the in-process rate limiter (`proxy.ts`), so the OAuth callback can't return 429 on a cold deploy.
Two tables: VoucherCode and VoucherRedemption.
```
VoucherCode
  id                 cuid
  code               unique TEXT (uppercase, normalized)
  kind               "personal" | "organization"
  organizationLabel  TEXT?        -- display name for org codes
  maxUses            INT (>=1)
  usedCount          INT
  expiresAt          TIMESTAMP?
  disabled           BOOLEAN
  note               TEXT?        -- admin-facing free text
  createdByUserId    User.id?
  createdAt/updatedAt
  redemptions        VoucherRedemption[]

VoucherRedemption
  id                 cuid
  voucherId          -> VoucherCode
  userId             -> User (unique: one redemption per user, ever)
  redeemedAt         TIMESTAMP
```
1. A new user lands on `/signin` (`REGISTRATION_REQUIRES_VOUCHER=true` is the default).
2. The user types their voucher code and clicks "Continue with GitHub".
3. A server action sets a 10-minute httpOnly cookie `bp_voucher=<CODE>`, then initiates OAuth.
4. GitHub returns to `/api/auth/callback/github` with a verified email.
5. The NextAuth `signIn` callback runs:
   - If the email already has a User row → refresh profile, apply admin promotion if applicable, continue.
   - If new user AND `ADMIN_EMAILS` contains the email → create with `role='admin'`, bypass voucher.
   - If new user AND no `bp_voucher` cookie → redirect to `/signin?error=voucher_required`.
   - If new user AND voucher cookie → call `claimVoucher({code, email, name, image})`.
6. `claimVoucher()` runs a Postgres transaction with `SELECT ... FOR UPDATE` on the voucher row. It checks `disabled` / `expiresAt` / `usedCount < maxUses`, creates the User row, increments `usedCount`, and writes the `VoucherRedemption` row. All four steps commit atomically, so two concurrent claims on the last seat of a multi-use code cannot both succeed.
7. On failure, it redirects to `/signin?error=voucher_<reason>` with one of: `invalid | disabled | expired | exhausted`.
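The row checks in step 6 can be sketched as a pure function that maps a locked voucher row to one of the documented failure reasons. The check order, the trim-and-uppercase normalization, and the `VoucherRow` shape are assumptions; the real logic lives in `claimVoucher()`.

```typescript
type VoucherRejection = "invalid" | "disabled" | "expired" | "exhausted";

// Hypothetical subset of the VoucherCode row, as read under SELECT ... FOR UPDATE.
interface VoucherRow {
  code: string;
  maxUses: number;
  usedCount: number;
  expiresAt: Date | null;
  disabled: boolean;
}

// Assumed normalization: codes are stored uppercase, so compare uppercase.
function normalizeCode(input: string): string {
  return input.trim().toUpperCase();
}

// Returns null when a seat is available and the claim may proceed.
function voucherRejection(row: VoucherRow | null, now: Date): VoucherRejection | null {
  if (row === null) return "invalid";
  if (row.disabled) return "disabled";
  if (row.expiresAt !== null && row.expiresAt.getTime() <= now.getTime()) return "expired";
  if (row.usedCount >= row.maxUses) return "exhausted";
  return null;
}

function failureRedirect(reason: VoucherRejection): string {
  return `/signin?error=voucher_${reason}`;
}
```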
- Personal codes default to `maxUses=1`, meant for a single pilot invitee.
- Organization codes carry `organizationLabel` (e.g. "Acme Inc.") and typically `maxUses=5..50`. Every redemption is its own User row; the label is purely for admin-facing grouping. (A full teams surface is a later phase.)
- `expiresAt` is any future timestamp, or `null` for "never expires".
- The admin UI exposes a "days from today" input (0 = never) that converts to the absolute timestamp.
- Expired codes validate as `expired` and refuse new redemptions, but existing redemptions keep working: an expired code does not sign users out.
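A sketch of the "days from today" conversion. It assumes a day is a fixed 24 hours added to the current instant; the real UI may round to a calendar day instead. `expiresAtFromDays` is a hypothetical name.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// "Days from today" (0 = never) → stored expiresAt. Adds whole 24-hour
// periods to the current instant; 0 maps to null ("never expires").
function expiresAtFromDays(days: number, now: Date = new Date()): Date | null {
  if (!Number.isInteger(days) || days < 0) {
    throw new RangeError("days must be a non-negative integer");
  }
  return days === 0 ? null : new Date(now.getTime() + days * MS_PER_DAY);
}
```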
All through /api/admin/vouchers/* (role-gated by requireAdmin()) or the /admin/vouchers page:
- Create: custom or auto-generated code; specify kind / label / maxUses / TTL / note.
- List: with status chips (active / disabled / expired / exhausted).
- Toggle disabled: reversible pause without deleting.
- Update `maxUses`: refused if below the current `usedCount` (`400 maxUses_below_usedCount`).
- Delete: cascades to redemptions. The audit log retains the action record by contract.
Every mutation writes an AuditLog row via writeAudit() with action: "voucher.create|update|delete". Recursive secret-redaction applies.
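A minimal sketch of recursive redaction over a JSON audit payload. The key pattern is a hypothetical stand-in for the rules in `packages/core/src/audit.ts`. It deliberately leaves `tokenId` alone, so that payloads like the `token.test` action's `{ tokenId }` stay useful.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical secret-key pattern. `token(?!id)` redacts `token` and
// `rawToken` but keeps `tokenId`, which is a non-secret row ID.
const SECRET_KEY = /(secret|password|api_?key|authorization|cookie|token(?!id))/i;

// Replace the value of any secret-looking key at any depth, including inside arrays.
function redact(value: Json): Json {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SECRET_KEY.test(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}
```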
checkVoucherRateLimit(clientIp) in apps/web/lib/brain/vouchers.ts gates the /signin server action. Each X-Forwarded-For source IP gets 10 voucher submissions per rolling hour; over the cap returns /signin?error=voucher_rate_limited. Backed by the same async Store interface as proxy.ts (Redis in production, in-memory otherwise), so a multi-replica deployment shares the counter — an attacker hopping replicas cannot reset their window.
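A minimal in-memory rolling-window limiter that shows the 10-per-hour semantics. This is a sketch, not the shared `Store`. `RollingWindowLimiter` is hypothetical and keeps its state inside one process. A multi-replica deployment needs the Redis-backed store described above so that replicas share one counter.

```typescript
interface RateDecision {
  allowed: boolean;
  retryAfterMs: number;
}

// Rolling window: each key keeps the timestamps of its accepted hits
// from the last windowMs milliseconds.
class RollingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(key: string, nowMs: number): RateDecision {
    // Drop timestamps that have aged out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > nowMs - this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      // The oldest hit is the next to age out; that is when a slot frees up.
      return { allowed: false, retryAfterMs: recent[0] + this.windowMs - nowMs };
    }
    recent.push(nowMs);
    this.hits.set(key, recent);
    return { allowed: true, retryAfterMs: 0 };
  }
}

// 10 voucher submissions per client IP per rolling hour.
const voucherLimiter = new RollingWindowLimiter(10, 60 * 60 * 1000);
```

Keying on `X-Forwarded-For` is only sound when the reverse proxy overwrites that header. If clients can supply the value themselves, an attacker can pick a fresh key for every request.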
The 8-char alphanumeric code space (~10^10 per prefix) plus the per-IP limit means a realistic brute-force needs >10^9 hours per IP. Honest mistypes still get 10 tries per hour before being asked to wait or contact an admin.
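A sketch of CSPRNG code generation with `node:crypto.randomInt`, the call the C2 fix switched to. The 36-symbol uppercase-alphanumeric alphabet is an assumption. With that alphabet, 8 characters give 36^8 ≈ 2.8 × 10^12 codes, comfortably above the conservative ~10^10 used in the estimate.

```typescript
import { randomInt } from "node:crypto";

// Hypothetical alphabet: uppercase letters and digits (36 symbols).
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// randomInt(max) draws uniformly from [0, max) from the CSPRNG and avoids
// modulo bias, unlike Math.floor(Math.random() * max) on V8's PRNG.
function generateVoucherCode(length = 8): string {
  let code = "";
  for (let i = 0; i < length; i++) {
    code += ALPHABET[randomInt(ALPHABET.length)];
  }
  return code;
}

// Hours for one IP to enumerate a code space at the per-IP cap.
function hoursToExhaust(space: number, attemptsPerHour: number): number {
  return space / attemptsPerHour;
}
```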
Gated by requireAdmin() at the layout + API layer. Non-admin users get redirected to / — the page doesn't render a 403 because "admin page exists here" is itself a leak.
| Route | Purpose |
|---|---|
| `/admin` | Overview: user count, active vouchers, redemptions, Knowledge/session counts, audit entries, Oracle spend |
| `/admin/vouchers` | Issue + manage invite codes |
| `/admin/users` | Roster: email, role, knowledge/session counts, join date |
| `/admin/audit` | 200 most recent audit rows: when / actor / action / target / IP |
Future items (tracked in KNOWN_ISSUES): per-user role change UI, GDPR erase UI (endpoint exists; button doesn't yet), teams + tenant management.
scripts/verify-lockdown.sh runs the full network-level audit against a live stack. It reads the auth mode from .env and expects different behaviour for each:
```bash
# Against the local dev stack:
./scripts/verify-lockdown.sh

# Against a remote/public deployment:
BASE_URL=https://brain.example.com MCP_URL=https://mcp.brain.example.com \
  ./scripts/verify-lockdown.sh
```

Exit codes: `0` locked correctly; `1` LEAK, fix before release; `2` stack unreachable.
Both scripts/deploy.sh and scripts/deploy-prod.sh now run it automatically at the end. deploy-prod.sh refuses the deploy if the audit fails. deploy.sh warns but proceeds (dev usage is allowed to be in dev-shim mode).
The script also enforces the one-way combination rule that caused the original VM leak: ALLOW_DEV_AUTH=true together with OAuth envs is a configuration error, and deploy-prod.sh dies on that combination before bringing anything up.
The phase-1 pilot uses username + bcrypt-hashed password stored in .env — no OAuth App required. The flow:
1. Generate the password hash:

   ```bash
   # Prepend a space to keep the plaintext out of shell history
   # (requires HISTCONTROL=ignorespace in your shell).
    pnpm hash-admin-password 'PickASensibleStrongPassword'
   # outputs: $2b$12$...
   ```

2. Paste it into `.env.local`. Use single quotes, because bcrypt hashes contain `$`, which bash would expand:

   ```bash
   ADMIN_USERNAME="admin"
   ADMIN_PASSWORD_HASH='$2b$12$...'
   ADMIN_EMAIL="admin@brain-platform.local"   # optional; synthesized otherwise
   ALLOW_DEV_AUTH="false"
   ```

3. Recreate the containers so they pick up the new env. A plain `restart` keeps the old environment:

   ```bash
   docker compose -f deploy/docker-compose.yml --env-file .env up -d --force-recreate web mcp-server
   ```

4. Verify the gate is live:

   ```bash
   curl -s -o /dev/null -w "%{http_code}\n" http://<host>:3000/api/me
   # → 401 means Credentials mode is correctly enforcing auth
   # → 200 means dev-shim is still active somewhere; see troubleshooting below
   ./scripts/verify-lockdown.sh   # should report Mode: CREDENTIALS
   ```
Security notes.

- `.env.local` must be mode 600. Anyone with read access to the hash can brute-force it offline at ~200 ms/guess; anyone with write access owns the admin account. `stat -c '%a' .env.local` must show `600`; if it shows `644` or worse, run `chmod 600 .env.local`. The `.env` symlink pattern is documented in `docs/RUNBOOK.md`. `chmod` follows the symlink, so `chmod 600 .env` does the right thing.
- Bcrypt cost 12 = ~200 ms/guess, which caps the offline brute-force rate at roughly 5 guesses/sec per CPU core. A 12-character random password takes trillions of years at that rate; a dictionary word takes minutes. The helper script refuses passwords shorter than 12 characters.
- Rotation. To rotate, generate a new hash and replace `ADMIN_PASSWORD_HASH` in `.env.local`. Then recreate the containers with `docker compose -f deploy/docker-compose.yml --env-file .env up -d --force-recreate web mcp-server`; a plain `restart` keeps the old env. Existing sessions survive, because the JWT cookie is signed by `AUTH_SECRET`, not the password hash. Rotating the password therefore doesn't kick the operator out; rotate `AUTH_SECRET` to invalidate existing sessions.
- What credentials mode does NOT give you: per-user accounts, voucher-gated signup, team scopes, or a user-visible audit of who signed in when (the admin account always shows as `admin@...` in the audit log). Those are the reasons OAuth mode exists for a real pilot with multiple users. Credentials mode is the "one operator on one box" posture.
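Here is a back-of-envelope check of the bcrypt figures, at one core's 5 guesses/s. It rests on two assumptions, neither of which is one of the helper script's rules: the random password is drawn uniformly from the 62 ASCII letters and digits, and the dictionary holds 10,000 words.

$$
\frac{62^{12}}{5\ \text{s}^{-1}} \approx \frac{3.2\times10^{21}}{5\ \text{s}^{-1}} \approx 6.4\times10^{20}\ \text{s} \approx 2\times10^{13}\ \text{years},
\qquad
\frac{10^{4}}{5\ \text{s}^{-1}} = 2000\ \text{s} \approx 33\ \text{min}.
$$

These are worst-case times to exhaust each space; the expected time to a hit is half.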
Observed before the phase-1 credentials pivot: .env.local had AUTH_GITHUB_ID="" and AUTH_GITHUB_SECRET="" (empty-string values), with ALLOW_DEV_AUTH="true" also set. Because authConfigured() in apps/web/auth.ts checks !!process.env.AUTH_GITHUB_ID (falsy on empty string), the server took the dev-shim path and served the first User row to every anonymous caller — while the operator thought they were in OAuth mode because the key names were present in the file.
Symptom: "after sign out I can still see the app." Because there's no real session in dev-shim mode, there's nothing to sign out from — the app kept serving Alex to every visitor.
Diagnose:
```bash
docker compose -f deploy/docker-compose.yml --env-file .env exec -T web printenv \
  | grep -E '^(ADMIN_USERNAME|ADMIN_PASSWORD_HASH|AUTH_GITHUB_ID|AUTH_GITHUB_SECRET|AUTH_SECRET|ALLOW_DEV_AUTH)=' \
  | awk -F= '{if ($1 ~ /SECRET|ID|HASH/ && length($2)>0) print $1"=<set>"; else print $0}'
curl -s -o /dev/null -w "/api/me unauth: HTTP %{http_code}\n" http://localhost:3000/api/me
```

For phase 1, this trap is closed by defaulting to Credentials mode. Operators populate `ADMIN_USERNAME` + `ADMIN_PASSWORD_HASH` at setup and set `ALLOW_DEV_AUTH=false`, so the empty-GitHub-envs case stops being a live hazard. When OAuth is re-enabled later, a boot-time refusal for this trap should still be added (tracked in KNOWN_ISSUES). It would turn `AUTH_GITHUB_ID=""` + `ALLOW_DEV_AUTH=true` into an explicit error instead of silently picking the dev shim.
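A sketch of that boot-time refusal, assuming it runs once at startup against `process.env`. `blankOAuthKeys` and `assertNoDevShimTrap` are hypothetical. The point is to tell apart a key that is absent from one that is present but empty, which `!!process.env.AUTH_GITHUB_ID` cannot do.

```typescript
type Env = Record<string, string | undefined>;

const OAUTH_KEYS = ["AUTH_GITHUB_ID", "AUTH_GITHUB_SECRET", "AUTH_SECRET"] as const;

// Keys written into .env but left blank: present in the environment, yet
// falsy, so a truthiness check silently treats OAuth as off.
function blankOAuthKeys(env: Env): string[] {
  return OAUTH_KEYS.filter((key) => key in env && (env[key] ?? "").trim() === "");
}

// Blank OAuth keys plus ALLOW_DEV_AUTH=true suggests the operator believes
// OAuth is on. Fail loudly instead of falling through to the dev shim.
function assertNoDevShimTrap(env: Env): void {
  const blank = blankOAuthKeys(env);
  if (blank.length > 0 && env.ALLOW_DEV_AUTH === "true") {
    throw new Error(
      `Refusing to start: ${blank.join(", ")} set but empty while ALLOW_DEV_AUTH=true; ` +
        "fill in the OAuth values or remove the keys.",
    );
  }
}
```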
Before marking any security change as "done," run through this checklist. Every step produces an artifact so the result is reproducible — no "I ran it mentally."
1. Typecheck the whole workspace.

   ```bash
   pnpm turbo run typecheck
   ```

   Must end with `N successful, N total`. Zero errors.

2. Unit tests.

   ```bash
   pnpm --filter @brain/core test
   ```

   All green (71 tests as of 2026-04-23).

3. Auth-guard audit on every API route. Every file under `apps/web/app/api/**/route.ts` must either:
   - call `getCurrentUserId()` or `requireAdmin()` before any DB read, OR
   - be a NextAuth handler (`/api/auth/[...nextauth]`), OR
   - be an explicit public probe (`/api/healthz`, `/api/readyz`).

   One-liner to catch new routes that forget:

   ```bash
   find apps/web/app/api -name route.ts | while read -r f; do
     if ! grep -qE "getCurrentUserId|requireAdmin|handlers|authErrorResponse" "$f" \
        && ! echo "$f" | grep -qE "(healthz|readyz|auth/\[)"; then
       echo "NEEDS AUTH REVIEW: $f"
     fi
   done
   ```

   Expected output: empty.

4. Unauth lockdown probe. With `ALLOW_DEV_AUTH=false` and no OAuth envs, the app must serve 503 on `/api/*` and 307 → `/signin` on `/`:

   ```bash
   curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/
   curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/api/knowledge
   ```

   Expected: `307`, then `503`.

5. E2E full suite.

   ```bash
   E2E_BASE_URL=http://localhost:3000 pnpm --filter @brain/web e2e
   ```

   Expected: zero failures. Skips are OK if every skip has a `test.skip(true, "<reason>")` string.

6. Security spec specifically.

   ```bash
   E2E_BASE_URL=http://localhost:3000 pnpm --filter @brain/web e2e e2e/security.spec.ts
   ```

   All green.

7. MCP bearer token fail-closed. Hit `:3100/mcp` with no `Authorization` header and with a bogus one; both must return a status ≥ 400.

8. Audit log append-only spot-check. Any new admin action introduced in the change must write an `AuditLog` row. Open `/admin/audit` after exercising the new surface; the action should appear.

Or, in one command that runs steps 3, 4, and 7 end-to-end against the running stack:

```bash
./scripts/verify-lockdown.sh
```

Iterate: any red step fails the release gate. Re-run the whole list after every fix, not just the step that failed; auth changes have a nasty habit of regressing unrelated paths.
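Step 7 has no canned command. Below is a minimal TypeScript probe. It assumes Node 18+ (global `fetch`/`Response`) and an MCP endpoint that speaks Streamable HTTP. `probeMcpFailClosed` and the `initialize` payload are illustrative; they are not part of `verify-lockdown.sh`. A well-formed request matters here: a malformed one could fail with 400 or 406 for reasons unrelated to auth and hide a leak.

```typescript
type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

interface ProbeResult {
  noAuth: number;
  bogus: number;
  ok: boolean;
}

// Sends the same well-formed MCP `initialize` call twice: once without
// Authorization, once with a bogus Bearer. Both must be refused (>= 400).
async function probeMcpFailClosed(mcpUrl: string, fetchImpl: FetchLike = fetch): Promise<ProbeResult> {
  const request = (extra: Record<string, string>): RequestInit => ({
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
      ...extra,
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2025-03-26",
        capabilities: {},
        clientInfo: { name: "lockdown-probe", version: "0.0.0" },
      },
    }),
  });

  const noAuth = (await fetchImpl(mcpUrl, request({}))).status;
  const bogus = (await fetchImpl(mcpUrl, request({ Authorization: "Bearer bp_bogus_probe_value" }))).status;
  return { noAuth, bogus, ok: noAuth >= 400 && bogus >= 400 };
}
```

Against a live stack: `probeMcpFailClosed("http://localhost:3100/mcp").then((r) => console.log(r))`. A 401 on both requests is the healthy result.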
NextAuth v5 shows the same generic server-configuration error for three distinct misconfigurations:

| Cause | Fix |
|---|---|
| `AUTH_SECRET` is unset or empty | `openssl rand -base64 32` → set in `.env` → `./scripts/reload.sh web` |
| `AUTH_TRUST_HOST` unset on a non-localhost deployment | Set `AUTH_TRUST_HOST="true"` in `.env`. `auth.ts` now also sets `trustHost: true` in code as belt-and-braces, but the env var is what NextAuth reads at module load. |
| `AUTH_URL` mismatch | Must exactly match the URL the user types: protocol, host, port, no trailing slash. `http://178.104.238.220:3000` ≠ `http://178.104.238.220:3000/` ≠ `https://...`. |
A quick diagnostic:

```bash
docker compose -f deploy/docker-compose.yml --env-file .env exec -T web env \
  | grep -E "^AUTH_(SECRET|URL|TRUST_HOST|GITHUB_)" | sort
```

All four (`AUTH_SECRET`, `AUTH_URL`, `AUTH_TRUST_HOST`, `AUTH_GITHUB_ID`) should be populated in OAuth mode. After editing `.env`, the fix is always `./scripts/reload.sh web`. It force-recreates the container, because a plain restart keeps the old env.
Resolved 2026-04-23. The old user menu posted to /api/auth/signout without a CSRF token; NextAuth v5 refuses that and returns the generic server-config error. Now the user menu links to /signout, which renders a server-action form that calls signOut() via NextAuth's own action (CSRF handled internally).
If you still see the error, it's the AUTH_TRUST_HOST / AUTH_URL issue above — the /signout page renders, but the underlying signOut() server action still needs a trusted host.
Resolved 2026-04-23. NextAuth v5's signOut({ redirectTo: "..." }) in a server action can redirect to the literal string "/undefined" when a callbackUrl form field is absent — the redirect-construction code path reads the form value and stringifies undefined into the URL. This produced the 404 reported on the pilot VM.
Fix: /signout now uses a two-step pattern: signOut({ redirect: false }) clears the session cookie without redirecting, then redirect("/signin") from next/navigation does the redirect explicitly to a path we control. Regression test: apps/web/e2e/signout.spec.ts asserts no response in the signout chain redirects to /undefined.
auth.ts hardcodes trustHost: true in the NextAuth config. That tells NextAuth to trust the incoming Host header without verifying it against AUTH_URL. Correct for 99% of deployments — behind Caddy (deploy/PRODUCTION.md) or Cloudflare, the reverse proxy sets Host reliably, and requiring an exact AUTH_URL match instead blocks legitimate callback paths when the proxy strips or rewrites headers.
When to override it to false: the Brain container is directly internet-exposed with no reverse proxy, AND the hostname/port must match AUTH_URL exactly. In that setup an attacker who can spoof the Host header could theoretically trick NextAuth's callback redirect logic. Production deployments that hit this edge case should set AUTH_TRUST_HOST="false" in the env AND pin AUTH_URL to the canonical URL — though if you're on a real internet IP without a proxy, you probably have bigger problems than the Host header.
We cannot safely override /api/auth/signout with our own handler. NextAuth's catch-all [...nextauth] route captures it, and shadowing it would break the underlying signout mechanism that our own /signout page relies on. A user who has bookmarked the old URL will still land on NextAuth's default confirmation page. That page returns a 500 in the misconfigured case but works fine once AUTH_TRUST_HOST / AUTH_SECRET are set. Tell them to use /signout instead, or put a link to it in any operator-facing dashboard.
When a user creates or changes a token at /settings/tokens, the raw bp_… value is returned by the API exactly once. The post-mint wizard renders per-client/OS install snippets from that value entirely client-side — the token is interpolated into the snippet in the browser and never sent back outbound in an HTTP request. This is an implementation detail of the operator-facing UX, not a change to the underlying threat model: the raw token is still shown once at mint and lives only in the user's clipboard or config file after that.
The "Test connection" button in the wizard calls POST /api/tokens/test with the tokenId (a non-secret DB row ID) rather than the raw token. The endpoint validates token state — active, revoked, expired, or scheduled-revoke — from the DB row and returns the result without ever receiving or logging the raw value. Audit: token.test action with { tokenId }.
POST /api/tokens/:id/change replaces the hash in-place on the same MCPToken row — same id, name, scope, expiresAt, createdAt — and takes effect immediately with zero grace. Use this for routine refresh or after a suspected leak. Audit action: token.change.
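A sketch of the in-place change. It assumes the row stores a hex SHA-256 digest of the raw value, and that raw tokens are `bp_` plus 32 random bytes in base64url. `McpTokenRow`, `tokenHash`, and `changeToken` are hypothetical names.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical subset of the MCPToken row.
interface McpTokenRow {
  id: string;
  name: string;
  projectId: string | null;
  organizationId: string | null;
  expiresAt: Date | null;
  createdAt: Date;
  tokenHash: string;
}

// Assumption: the row stores the hex SHA-256 of the raw value.
function hashToken(raw: string): string {
  return createHash("sha256").update(raw).digest("hex");
}

// New secret, same row identity and metadata. The raw value goes back to
// the caller once and is never persisted.
function changeToken(row: McpTokenRow): { row: McpTokenRow; raw: string } {
  const raw = `bp_${randomBytes(32).toString("base64url")}`;
  return { row: { ...row, tokenHash: hashToken(raw) }, raw };
}
```

The old hash is overwritten, so the old raw value stops matching on the very next lookup. That is the zero-grace behaviour.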
Operational guidance:
| Situation | Recommended action |
|---|---|
| Routine refresh | POST /api/tokens/:id/change — swap the secret, copy the new raw value, update client configs. |
| Suspected or confirmed leak | DELETE /api/tokens/:id — hard revoke, instant. Then create a new token. |
| No-disruption migration (keep old token alive while updating clients) | Create a new token, update clients to it, then revoke the old token manually once confirmed. |
Schema note. The scheduledRevokeAt and rotatedFromId columns remain on the MCPToken table and the auth gate in apps/mcp-server/src/auth.ts still rejects tokens where scheduledRevokeAt <= NOW(). No current path sets these columns (rotate-with-grace was removed 2026-04-27), but the gate is kept as defense-in-depth. Re-enabling rotate is a UI + endpoint change — no schema migration required.
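A single function can sketch both the token states reported by `POST /api/tokens/test` and the MCP gate's decision. Two details are assumptions: how a hard revoke is recorded (`revokedAt` here), and mapping an elapsed `scheduledRevokeAt` to `"scheduled-revoke"`.

```typescript
type TokenState = "active" | "revoked" | "expired" | "scheduled-revoke";

// Hypothetical subset of the MCPToken row.
interface TokenGateRow {
  revokedAt: Date | null; // assumption: how a hard revoke is recorded
  expiresAt: Date | null;
  scheduledRevokeAt: Date | null; // not set by any current path; still checked
}

// Only "active" passes the MCP auth gate.
function tokenState(row: TokenGateRow, now: Date): TokenState {
  const t = now.getTime();
  if (row.revokedAt !== null) return "revoked";
  if (row.expiresAt !== null && row.expiresAt.getTime() <= t) return "expired";
  // Defense-in-depth: mirrors the gate's scheduledRevokeAt <= NOW() rejection.
  if (row.scheduledRevokeAt !== null && row.scheduledRevokeAt.getTime() <= t) return "scheduled-revoke";
  return "active";
}
```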
A token may be scoped to a specific project by setting projectId on creation. When scoped:
- The auth gate surfaces `auth.projectId` in every tool call's `AuthContext`.
- Any MCP write that targets a different project is rejected with `BrainError{code:"FORBIDDEN_PROJECT"}`.
- The blast radius of a leaked CI token is limited to a single project: even if the attacker can authenticate, they cannot write Knowledge/Sessions to any other project in the org.
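A sketch of the write-time check. `AuthContext`, `BrainError`, and the `userCanAccess` callback are hypothetical stand-ins for the real types and for `userCanAccessProject(...)`. The error code on the second branch is also an assumption.

```typescript
// Hypothetical stand-ins for the real auth context and error type.
interface AuthContext {
  userId: string;
  projectId: string | null; // set when the MCP token is project-scoped
}

class BrainError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

// A scoped token may only write to its own project; an unscoped token may
// write to any project the user can access.
function assertWritableProject(
  auth: AuthContext,
  targetProjectId: string,
  userCanAccess: (userId: string, projectId: string) => boolean,
): void {
  if (auth.projectId !== null && auth.projectId !== targetProjectId) {
    throw new BrainError("FORBIDDEN_PROJECT", "token is scoped to a different project");
  }
  if (!userCanAccess(auth.userId, targetProjectId)) {
    throw new BrainError("FORBIDDEN_PROJECT", "user cannot access this project");
  }
}
```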
Scoping is opt-in and backwards-compatible. Unscoped tokens (projectId = null) behave exactly as before (any project the user has access to). The solo-user experience is unchanged — the project picker on /settings/tokens is hidden when the user has only one project.
Scope is preserved on change. POST /api/tokens/:id/change is in-place — all metadata including organizationId and projectId is unchanged.
Recommended practice for CI/CD: scope each CI token to the project it deploys. If the token is compromised, the attacker cannot pivot to other projects in the org.
| Role | Can invite | Can manage members | Can revoke invites | Max grantable role |
|---|---|---|---|---|
| owner | yes | yes | yes | owner |
| admin | yes | yes | yes | admin |
| member | no | no | no | — |
Privilege checks are enforced in packages/core/src/org.ts — not at the HTTP layer — so they apply regardless of the API caller shape.
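The role table reduces to a rank comparison. A minimal sketch follows. `canGrantRole` and the numeric ranks are hypothetical illustrations of the "max grantable role" column, not the code in `packages/core/src/org.ts`.

```typescript
type OrgRole = "owner" | "admin" | "member";

// Higher rank = more privilege.
const RANK: Record<OrgRole, number> = { member: 0, admin: 1, owner: 2 };

function canManageMembers(actor: OrgRole): boolean {
  return actor === "owner" || actor === "admin";
}

// Owners and admins may grant roles up to their own level; members cannot grant.
function canGrantRole(actor: OrgRole, granted: OrgRole): boolean {
  return canManageMembers(actor) && RANK[granted] <= RANK[actor];
}
```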
Every organization must have at least one owner. setOrgMemberRole and removeOrgMember count existing owners before proceeding and throw BrainError{code:"LAST_OWNER", status:409} if the operation would leave the org ownerless.
- Tokens are generated with `crypto.randomBytes(32).toString("base64url")`: 256 bits of entropy, not guessable.
- Tokens are stored plaintext in `OrganizationInvite.token`. This is intentional: invite tokens are one-shot, low-value, and short-lived (7 days). They are not bearer credentials, because accepting requires an authenticated session.
- Tokens are single-use (`acceptedAt` is set on first acceptance; re-use returns 409).
- Tokens can be revoked at any time by any org owner/admin.
- No email is sent in Phase 3a; the operator delivers the invite link out-of-band.
| Threat | Mitigation |
|---|---|
| Link intercepted in transit | HTTPS required in production (Caddy TLS) |
| Link forwarded to wrong person | Invitee must have a valid session to accept |
| Expired link replayed | expiresAt checked server-side |
| Revoked link used | revokedAt checked before membership creation |
| Same link accepted twice | acceptedAt check returns 409 on re-use |
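The mitigations in the table map to a short sequence of server-side checks. In this sketch, only the 409 on re-use and the 32-byte base64url token come from the text. The other status codes, the reason strings, and the check order are assumptions.

```typescript
import { randomBytes } from "node:crypto";

interface InviteRow {
  token: string;
  expiresAt: Date;
  revokedAt: Date | null;
  acceptedAt: Date | null;
}

type AcceptResult = { ok: true } | { ok: false; status: number; reason: string };

// One check per row of the threat table; 409 for re-use is documented,
// the other codes are hypothetical.
function checkInviteAcceptance(invite: InviteRow | null, hasSession: boolean, now: Date): AcceptResult {
  if (!hasSession) return { ok: false, status: 401, reason: "session_required" };
  if (invite === null) return { ok: false, status: 404, reason: "invalid_token" };
  if (invite.revokedAt !== null) return { ok: false, status: 410, reason: "revoked" };
  if (invite.expiresAt.getTime() <= now.getTime()) return { ok: false, status: 410, reason: "expired" };
  if (invite.acceptedAt !== null) return { ok: false, status: 409, reason: "already_accepted" };
  return { ok: true };
}

// 32 random bytes as base64url: 43 characters, 256 bits.
function newInviteToken(): string {
  return randomBytes(32).toString("base64url");
}
```

Two requests can pass the `acceptedAt` check at the same time. The final write should therefore be conditional, for example an update that only matches rows where `acceptedAt` is still null.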
The visibility column on Knowledge ("private" / "project" / "org") is a listing and retrieval filter. It controls what appears in the UI, in GET /api/knowledge, and in KRA/Oracle retrieval. It does NOT bypass the token-level project ACL.
A project-scoped MCP token (MCPToken.projectId set) still cannot write to another project regardless of what visibility values exist on Knowledge rows. The hard write-time check is in the MCP tool handlers.
Concretely:
- An "org"-visible row does not grant a token the ability to write knowledge into another project. The token can only read the org row when `accessibleProjectIds` includes that project.
- Promote requires ownership (`ownerUserId === current user`). Fork requires org membership. A stolen token from a project member cannot promote a row they don't own.
Phase-3b extends the single-admin Credentials path to support multiple pilot users who sign up via invite link without needing a GitHub account. The admin path is unchanged — it remains env-based (ADMIN_USERNAME + ADMIN_PASSWORD_HASH).
| | Admin (env-based) | Per-user (invite-based) |
|---|---|---|
| Storage | `.env.local` bcrypt hash | `UserCredential` DB row per user |
| Bootstrap | Operator sets `ADMIN_USERNAME` + `ADMIN_PASSWORD_HASH` | Invite → `/api/invites/signup` |
| Identifier | Username (or email if `ADMIN_EMAIL` set) | Email address (from invite) |
| Password change | Replace env var + restart | `POST /api/me/password` |
| Reset if forgotten | Operator updates env + restart | Self-service `/forgot-password` (when `EMAIL_PROVIDER=resend`); admin deletes the `UserCredential` row + re-invites when email is disabled |
| Auth flow in NextAuth | First check in `authorize()` | Second check (email lookup + `verifyUserCredential`) |
- Minimum 8 characters. No special-character requirements (an anti-pattern).
- Enforced by `validatePasswordPolicy()` in `@brain/core/user-credentials`. It is called before bcrypt hashing, so policy violations are rejected early without CPU cost.
- Operators who want stronger policies can add a zxcvbn check at the UI layer.
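A sketch of the documented length-only rule. `validatePasswordLength` is a hypothetical illustration, not the `@brain/core` function. It counts Unicode code points, so four emoji count as 4 characters rather than 8 UTF-16 units.

```typescript
const MIN_PASSWORD_LENGTH = 8;

class WeakPasswordError extends Error {
  readonly code = "WEAK_PASSWORD";
}

// Length-only policy with no composition rules. Array.from iterates code
// points, so surrogate pairs (e.g. emoji) count once each.
function validatePasswordLength(password: string): void {
  const codePoints = Array.from(password).length;
  if (codePoints < MIN_PASSWORD_LENGTH) {
    throw new WeakPasswordError(`password must be at least ${MIN_PASSWORD_LENGTH} characters`);
  }
}
```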
Cost 12 (same as admin path). Each verify takes ~200ms on commodity hardware — the intentional brute-force rate cap.
`POST /api/me/password` requires the current password before accepting a new one. It fails with:

- `409 NO_CREDENTIAL`: the user signed in via OAuth or the admin env, so there is no credential to change.
- `401 WRONG_PASSWORD`: current password mismatch. No audit row is written, so wrong guesses are not logged.
- `400 WEAK_PASSWORD`: the new password fails policy.

Success writes a `user.password_change` audit row.
POST /api/invites/signup executes a single DB transaction:
1. Create the `User` row (email from invite, lowercased).
2. Create `UserCredential` with `bcrypt(password, 12)`.
3. Create `OrganizationMember` with the invite's role.
4. Mark `OrganizationInvite.acceptedAt`.
If the email already has a User row (OAuth or prior admin creation), steps 2–4 run but step 1 is skipped. The response includes existingUser: true and the client is told to sign in via existing credentials instead of treating the call as a new registration.
Email is opt-in. Set EMAIL_PROVIDER=resend + EMAIL_API_KEY + EMAIL_FROM to enable. When disabled (default), invite links are returned in the API response only (manual handoff), and password reset requires operator intervention. See docs/RUNBOOK.md §"Configuring email".
| Property | Value |
|---|---|
| Entropy | 32 random bytes (crypto.randomBytes) encoded as base64url — ~256 bits |
| Storage | Stored plaintext in PasswordResetToken.token (acceptable for short-lived tokens; same posture as invite tokens) |
| Expiry | 1 hour from creation |
| One-shot | usedAt is set on first use; re-use returns invalid_token |
| Cascade-delete | onDelete: Cascade on userId — user deletion removes all outstanding tokens |
POST /api/auth/forgot-password ALWAYS returns HTTP 200 with a generic message regardless of whether the email exists, whether the user has a credential, or whether email delivery succeeded. This prevents user enumeration via timing or response differences.
POST /api/auth/forgot-password is rate-limited to 3 requests/hour per IP via the same check() + Store infrastructure as the Oracle and voucher flows. Redis-backed in multi-replica deployments; in-memory fallback for single-host.
| Threat | Mitigation |
|---|---|
| Link intercepted in transit | HTTPS required in production (Caddy TLS) |
| Link forwarded to wrong person | Token expires in 1 hour; usedAt prevents re-use |
| Exhaustive token search | 256-bit token space; rate-limit on forgot-password generation |
| Account takeover via enumeration | Always-200 response + generic message; no timing leak |
| Token left in DB after user deletion | ON DELETE CASCADE cleans up |
deploy/rclone.conf holds the storage-provider credentials used by the
backup-replicate sidecar. It is gitignored and must be created on every fresh
VM deployment (use scripts/setup-backup-replicate.sh).
Key hygiene rules:
- File mode 600. `stat -c '%a' deploy/rclone.conf` must show `600`. Anyone with read access to the file can read your storage bucket or, depending on the key's permissions, delete objects. Run `chmod 600 deploy/rclone.conf` after every write.
- Minimum permissions. Create a per-deployment access key with the narrowest scope your provider allows: `PutObject`, `GetObject`, and `ListBucket` on the backup prefix only (e.g. `my-bucket/brain-prod/*`). Never use root/admin keys.
- Rotate via the provider's console. Revoke the old key and issue a new one in the provider dashboard, update `deploy/rclone.conf`, then restart the sidecar: `docker compose ... restart backup-replicate`. The new key takes effect immediately; no restart of the database or webapp is required.
- Back up the config file itself. Because `rclone.conf` is gitignored, store a copy in your password manager (1Password, Bitwarden, etc.) alongside the VM's other secrets. Losing the file means re-running `scripts/setup-backup-replicate.sh` with a fresh access key.
- No self-service admin role UI. Admin promotion is via the `ADMIN_EMAILS` env; demotion is SQL. Tracked in KNOWN_ISSUES.
- Vouchers are not scoped to emails. Anyone who knows a code can redeem it with any email. An email-pinned voucher (`emailAllowlist String[]`) is a natural next step; the current model supports it with a schema addition.
- No rate limit on `/api/auth/callback/*`. NextAuth endpoints are exempt from the proxy limiter. If this becomes an abuse vector, add a separate per-IP limiter inside the callback.
- No CAPTCHA on voucher entry. A brute-forcer that enumerates short codes is plausible. This is mitigated by the 8-char alphanumeric space (≥ 10^10 possibilities per prefix) and the per-IP voucher rate limit (10 submissions per rolling hour).
- No session revocation for rotating compromised JWTs. JWT expiry is the only per-session mechanism today (rotating `AUTH_SECRET` invalidates every session at once). A compromised token works until it expires. A `revokedAt` field on a Session table is the retrofit path.
- `apps/web/auth.ts`: NextAuth config + `signIn` callback.
- `apps/web/lib/brain/auth.ts`: per-request `getCurrentUserId()` and the 503 path.
- `apps/web/lib/brain/vouchers.ts`: `validateVoucher` + `claimVoucher` (transactional).
- `apps/web/lib/brain/admin-auth.ts`: `requireAdmin()`.
- `packages/db/prisma/migrations/20260423_vouchers/`: schema.
- `packages/core/src/audit.ts`: redaction rules + Action union.
- `docs/RUNBOOK.md`: operational recovery.
- `docs/KNOWN_ISSUES.md`: outstanding security gaps.