feat: relayauth core platform — Domains 1-7 (Foundation → SDK)#2
khaliqgant merged 76 commits into main
…gnore
- worker.ts: Add global CORS and request-ID middleware, auth placeholder
- verify.test.ts: Expand constructor tests, add TODO roadmap for verification
- .gitignore: Add Python build artifact patterns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extracted from autofix swarm workflow results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace broken TokenVerifier in scope middleware with inline HS256 verification
- Add safeRegexTest to prevent ReDoS in policy evaluation engine
- Add org isolation checks to all identity endpoints (GET/PATCH/DELETE/suspend/retire/reactivate)
- Fix fail-open auth bypass in audit-webhooks (flip to fail-closed with 403)
- Re-enable PyJWT built-in claim verification with leeway in Python SDK
- Extract shared auth module to lib/auth.ts, deduplicate across routes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical: fail-closed org boundary checks (audit-webhooks), cross-org IDOR guards (identities), ReDoS mitigation with pattern/input limits (policy-eval), scope inheritance cross-tenant fix, mass assignment lockdown, Python JWT leeway, scope info leakage removal.
High: deduplicated scope matching into shared export, SQL-level pagination, scope-based authorization on identity routes.
Medium: CSV injection prevention, SSRF hardening (IPv4-mapped IPv6, octal/decimal IPs, cloud metadata), audit failure counters with sensitive-op enforcement, Go EdDSA support, SDK verify test suite (24 tests), Python scopes aligned to return False.
Low: webhook schema cleanup, no-op hydrateIdentity removed, error message sanitization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
packages/sdk/src/verify.ts
Outdated
    if (claims.nbf !== undefined && claims.nbf > now) {
      throw invalidTokenError();
    }

    if (claims.exp <= now) {
      throw new TokenExpiredError();
🔴 Go middleware lacks clock skew leeway, causing cross-SDK token rejection
The Go middleware in relayauth.go validates nbf and exp claims with zero clock skew tolerance (lines 249-253), while both the TypeScript SDK (packages/sdk/src/verify.ts:205-209) and the Python SDK (packages/python-sdk/relayauth/verifier.py:240-243) use a 30-second leeway. This means a token that is valid according to the TypeScript/Python SDKs can be rejected by the Go middleware when server clocks differ by even 1 second. For nbf, Go uses *claims.Nbf > now (exact) vs TypeScript's claims.nbf > now + 30. For exp, Go uses claims.Exp <= now (exact) vs TypeScript's claims.exp <= now - 30. In a distributed system where relayauth tokens are verified by different services using different SDKs, this inconsistency will cause intermittent 401 errors.
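The comparison described in this comment can be sketched as a small pure function. This is a minimal illustration, not the SDK's code: the `TimeClaims` shape, the `checkTimeClaims` name, and the string results are hypothetical, and only the 30-second value and the two comparisons come from the comment. Whatever the language, every verifier would need to apply the same offsets for tokens to be accepted consistently.

```typescript
// Hypothetical claim shape; the real relayauth claims type carries more fields.
interface TimeClaims {
  exp: number; // expiry, seconds since epoch
  nbf?: number; // not-before, seconds since epoch
}

// 30 seconds, the tolerance the comment attributes to the TypeScript and Python SDKs.
const CLOCK_SKEW_LEEWAY_SECONDS = 30;

type TimeCheck = "ok" | "not_yet_valid" | "expired";

function checkTimeClaims(
  claims: TimeClaims,
  now: number,
  leeway: number = CLOCK_SKEW_LEEWAY_SECONDS,
): TimeCheck {
  // Accept a token whose nbf is up to `leeway` seconds in the future.
  if (claims.nbf !== undefined && claims.nbf > now + leeway) {
    return "not_yet_valid";
  }
  // Accept a token whose exp passed less than `leeway` seconds ago.
  if (claims.exp <= now - leeway) {
    return "expired";
  }
  return "ok";
}
```

Passing `leeway = 0` reproduces the exact comparisons the comment attributes to the Go middleware, which makes the behavioral difference easy to test side by side.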
1. scope-inheritance: fix function signature mismatch (3-param vs 2-param)
   - resolveInheritedScopes and getInheritanceChain now accept (db, identityId)
   - SQL query updated to look up identity by ID alone
   - Fixes runtime crash (identityId.trim() on undefined)
2. audit-webhook-dispatcher: reduce retry delays for CF Workers
   - Changed from [10s, 60s, 300s] to [500ms, 1s, 2s]
   - Previous delays would exceed Worker CPU time budget
3. verify.ts: add 30s leeway to match Python SDK
   - nbf and exp checks now use 30s clock-skew tolerance
   - Consistent behavior across TypeScript and Python SDKs
4. audit-webhooks: replace global mutable tableInitialized with WeakSet
   - Per-db instance tracking via WeakSet<D1Database>
   - Handles isolate recycling correctly in CF Workers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
      "token.issued",
      "token.refreshed",
      "token.revoked",
      "token.validated",
      "identity.created",
      "identity.updated",
      "identity.suspended",
      "identity.retired",
      "scope.checked",
      "scope.denied",
      "role.assigned",
      "role.removed",
      "policy.created",
      "policy.updated",
      "policy.deleted",
      "key.rotated",
    ]);
🔴 Audit query API rejects extended audit actions that are actively written to the database
The AUDIT_ACTIONS set in audit-query.ts only includes the base AuditAction type values and is missing "budget.exceeded", "budget.alert", and "scope.escalation_denied". However, audit-logger.ts:46-64 and policy-evaluation.ts actively write entries with these extended actions to the audit_logs table. When a user tries to filter audit queries by these actions (e.g., GET /v1/audit?action=budget.exceeded), the parseAuditQuery function at line 124 returns a 400 error "invalid action". This also affects the audit export endpoint (audit-export.ts) which reuses the same parseAuditQuery function. As a result, budget breach events and scope escalation denial events are logged but cannot be queried or exported by action filter.
    -   "token.issued",
    -   "token.refreshed",
    -   "token.revoked",
    -   "token.validated",
    -   "identity.created",
    -   "identity.updated",
    -   "identity.suspended",
    -   "identity.retired",
    -   "scope.checked",
    -   "scope.denied",
    -   "role.assigned",
    -   "role.removed",
    -   "policy.created",
    -   "policy.updated",
    -   "policy.deleted",
    -   "key.rotated",
    - ]);
    + export const AUDIT_ACTIONS = new Set<AuditAction | 'budget.exceeded' | 'budget.alert' | 'scope.escalation_denied'>([
    +   "token.issued",
    +   "token.refreshed",
    +   "token.revoked",
    +   "token.validated",
    +   "identity.created",
    +   "identity.updated",
    +   "identity.suspended",
    +   "identity.retired",
    +   "scope.checked",
    +   "scope.denied",
    +   "role.assigned",
    +   "role.removed",
    +   "policy.created",
    +   "policy.updated",
    +   "policy.deleted",
    +   "key.rotated",
    +   "budget.exceeded",
    +   "budget.alert",
    +   "scope.escalation_denied",
    + ]);
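One way to keep the writer and the query validator from drifting apart again is to derive both the type and the validation set from the same lists. The sketch below assumes that approach; `BASE_AUDIT_ACTIONS`, `EXTENDED_AUDIT_ACTIONS`, `isQueryableAuditAction`, and `parseActionFilter` are hypothetical names, and the real `parseAuditQuery` handles more parameters than the single `action` filter shown here.

```typescript
// Base actions, as listed in the AUDIT_ACTIONS set quoted in this comment.
const BASE_AUDIT_ACTIONS = [
  "token.issued", "token.refreshed", "token.revoked", "token.validated",
  "identity.created", "identity.updated", "identity.suspended", "identity.retired",
  "scope.checked", "scope.denied",
  "role.assigned", "role.removed",
  "policy.created", "policy.updated", "policy.deleted",
  "key.rotated",
] as const;

// Extended actions the comment says are written but not queryable.
const EXTENDED_AUDIT_ACTIONS = [
  "budget.exceeded",
  "budget.alert",
  "scope.escalation_denied",
] as const;

// The union type is derived from the lists, so adding an entry updates both.
type QueryableAuditAction =
  | (typeof BASE_AUDIT_ACTIONS)[number]
  | (typeof EXTENDED_AUDIT_ACTIONS)[number];

const AUDIT_ACTIONS: ReadonlySet<string> = new Set<string>([
  ...BASE_AUDIT_ACTIONS,
  ...EXTENDED_AUDIT_ACTIONS,
]);

function isQueryableAuditAction(value: string): value is QueryableAuditAction {
  return AUDIT_ACTIONS.has(value);
}

type ActionFilterResult =
  | { ok: true; action: QueryableAuditAction | undefined }
  | { ok: false; status: 400; error: string };

// Hypothetical stand-in for the action-filter portion of parseAuditQuery.
function parseActionFilter(raw: string | null): ActionFilterResult {
  if (raw === null || raw === "") {
    return { ok: true, action: undefined }; // no filter requested
  }
  if (!isQueryableAuditAction(raw)) {
    return { ok: false, status: 400, error: "invalid action" };
  }
  return { ok: true, action: raw };
}
```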
The Go middleware validated nbf and exp claims with zero tolerance, while TypeScript and Python SDKs use a 30-second leeway. This adds a matching clockSkewLeeway constant and applies it to both checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
budget.exceeded, budget.alert, and scope.escalation_denied are actively written by audit-logger.ts and policy-evaluation.ts but were missing from the query validation set, making them impossible to filter on. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the initializedDbs WeakSet guard from ensureAuditWebhookTable. The DDL already uses CREATE TABLE/INDEX IF NOT EXISTS, making it idempotent and safe to run every time without a mutable flag, which is fragile in Cloudflare Workers environments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    if (
      this.options?.audience &&
      !this.options.audience.some((audience) => claims.aud.includes(audience))
    ) {
      throw invalidTokenError();
🔴 TokenVerifier rejects all tokens when audience is set to an empty array
In the TypeScript SDK TokenVerifier._validateClaims, the audience check at line 210 uses this.options?.audience && .... In JavaScript, an empty array [] is truthy, so new TokenVerifier({ audience: [] }) will always throw invalidTokenError() since [].some(...) returns false. This is a behavioral trap: consumers expect an empty audience to mean "skip audience validation" (which is what the Python SDK does, since [] is falsy in Python). One test (packages/sdk/src/__tests__/verify.test.ts:113) even validates audience: [] is accepted, but the verify path would reject every token.
Comparison with Python SDK
Python verifier at packages/python-sdk/relayauth/verifier.py:247:
    if self.options.audience and not any(aud in claims.aud for aud in self.options.audience):
In Python, [] is falsy, so an empty audience skips validation, which is the correct behavior.
    - if (
    -   this.options?.audience &&
    -   !this.options.audience.some((audience) => claims.aud.includes(audience))
    - ) {
    -   throw invalidTokenError();
    + if (
    +   this.options?.audience &&
    +   this.options.audience.length > 0 &&
    +   !this.options.audience.some((audience) => claims.aud.includes(audience))
    + ) {
    +   throw invalidTokenError();
    + }
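The truthiness difference is easy to isolate in a standalone helper. The following is a minimal sketch with a hypothetical `audienceMatches` function (not part of the SDK) that treats both `undefined` and `[]` as "skip audience validation", matching the Python behavior described in this comment.

```typescript
// Returns true when the token's audience satisfies the configured audience.
// An unset or empty `expected` list means no audience restriction.
function audienceMatches(
  expected: readonly string[] | undefined,
  tokenAud: readonly string[],
): boolean {
  // `[]` is truthy in JavaScript, so the length must be checked explicitly.
  if (expected === undefined || expected.length === 0) {
    return true;
  }
  return expected.some((audience) => tokenAud.includes(audience));
}

// The pitfall in isolation: an empty array passes a truthiness test,
// and `.some` on an empty array is always false.
const emptyIsTruthy = Boolean([] as string[]);
const someOnEmpty = ([] as string[]).some(() => true);
```

Here `emptyIsTruthy` is `true` and `someOnEmpty` is `false`, which is why the unguarded check rejects every token when the configured audience is `[]`.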
    if (this.options?.maxAge !== undefined && claims.iat + this.options.maxAge < now) {
      throw new TokenExpiredError();
🟡 maxAge validation inconsistently applies clock-skew leeway across SDKs
The TypeScript SDK _validateClaims at line 217 checks claims.iat + this.options.maxAge < now without applying the 30-second leeway used for exp and nbf checks. This means a token issued exactly maxAge seconds ago will be rejected due to clock skew, even though exp and nbf both tolerate 30 seconds of drift. The Python SDK at packages/python-sdk/relayauth/verifier.py:249 applies leeway (claims.iat + self.options.max_age < now - leeway), making the two SDKs behave differently for the same token under the same clock conditions. Since maxAge is an expiry-like check, it should be consistent with the exp leeway.
    - if (this.options?.maxAge !== undefined && claims.iat + this.options.maxAge < now) {
    -   throw new TokenExpiredError();
    + if (this.options?.maxAge !== undefined && claims.iat + this.options.maxAge < now - leeway) {
    +   throw new TokenExpiredError();
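The effect of the suggested change can be checked with concrete numbers. This sketch uses a hypothetical `exceedsMaxAge` helper; only the comparison `iat + maxAge < now - leeway` and the 30-second default are taken from the comment.

```typescript
// True when a token issued at `iat` is older than `maxAge` seconds,
// after allowing `leeway` seconds of clock skew.
function exceedsMaxAge(
  iat: number,
  maxAge: number | undefined,
  now: number,
  leeway: number = 30,
): boolean {
  if (maxAge === undefined) {
    return false; // no maxAge configured, nothing to enforce
  }
  return iat + maxAge < now - leeway;
}
```

With `iat = 0` and `maxAge = 100`, the token is still accepted at `now = 130` and is first rejected at `now = 131`. With `leeway = 0`, it is first rejected at `now = 101`.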
Relayauth Core Platform
Complete implementation of the relayauth identity and authorization plane — from project scaffold through SDK clients.
Stats
Domains Completed
✅ Domain 1: Foundation (WF007-010)
Error catalog, test helpers, dev environment (wrangler dev), contract tests
✅ Domain 2: Token System (WF011-020)
JWT signing (Ed25519), JWKS endpoint, token verification, issuance API, refresh, revocation (KV-backed), introspection, key rotation, E2E tests
✅ Domain 3: Identity Lifecycle (WF021-030)
Identity Durable Object, full CRUD (create/get/list/update), suspend, reactivate, retire, delete, lifecycle E2E tests
✅ Domain 4: Scopes & RBAC (WF031-040)
Scope parser + matcher, scope checker SDK, scope middleware, role CRUD, role assignment, policy CRUD, policy evaluation, scope inheritance, RBAC E2E tests
✅ Domain 5: API Routes (WF041-050)
Auth middleware, org CRUD, workspace CRUD, workspace membership, API key management, admin routes, rate limiting, error handling, CORS/headers, routes E2E tests
✅ Domain 6: Audit & Observability (WF051-058)
Audit logger, audit query API, audit export, retention policies, webhooks, identity activity API, dashboard stats API, audit E2E tests
✅ Domain 7: SDK & Verification (WF059-068)
SDK client (identities, tokens, roles, audit), complete verification module, Hono middleware, Express middleware, Go middleware, Python SDK, SDK E2E tests
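To make the Domain 4 scope parser and matcher concrete, here is a minimal sketch of parsing and matching scopes of the form `{plane}:{resource}:{action}:{path?}`, such as `relayfile:fs:write:/src/*`. The matching rules are assumptions for illustration only (a `*` segment matches anything, and a path ending in `/*` matches by prefix); the names `parseScope` and `scopeAllows` are hypothetical and the actual relayauth matcher may behave differently.

```typescript
interface ParsedScope {
  plane: string;
  resource: string;
  action: string;
  path?: string;
}

function parseScope(scope: string): ParsedScope {
  const [plane, resource, action, ...rest] = scope.split(":");
  if (!plane || !resource || !action) {
    throw new Error(`invalid scope: ${scope}`);
  }
  // Re-join so that paths containing ":" survive the split.
  const path = rest.length > 0 ? rest.join(":") : undefined;
  return path === undefined
    ? { plane, resource, action }
    : { plane, resource, action, path };
}

// Assumed rule: "*" matches any value in that segment.
function segmentMatches(granted: string, requested: string): boolean {
  return granted === "*" || granted === requested;
}

// Assumed rule: no granted path means unrestricted; "/dir/*" matches by prefix.
function pathMatches(granted: string | undefined, requested: string | undefined): boolean {
  if (granted === undefined || granted === "*") return true;
  if (requested === undefined) return false;
  if (granted.endsWith("/*")) {
    return requested.startsWith(granted.slice(0, -1)); // keep the trailing "/"
  }
  return granted === requested;
}

function scopeAllows(granted: string, requested: string): boolean {
  const g = parseScope(granted);
  const r = parseScope(requested);
  return (
    segmentMatches(g.plane, r.plane) &&
    segmentMatches(g.resource, r.resource) &&
    segmentMatches(g.action, r.action) &&
    pathMatches(g.path, r.path)
  );
}
```

This sketch does not normalize paths, so a requested path such as `/src/../etc` would pass the prefix check; a real matcher would need to canonicalize paths before comparing.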
Architecture
{plane}:{resource}:{action}:{path?} (e.g. relayfile:fs:write:/src/*)
Try It Locally
What's Next (PR #3: Domains 8-12)
relay auth init, relay wrap, shell hook)