feat: relayauth core platform — Domains 1-7 (Foundation → SDK) #2

Merged
khaliqgant merged 76 commits into main from domain/auto-workflows on Mar 25, 2026

Conversation

@khaliqgant khaliqgant commented Mar 24, 2026

Relayauth Core Platform

Complete implementation of the relayauth identity and authorization plane — from project scaffold through SDK clients.

Stats

  • 132 TypeScript files, ~31,800 lines
  • 39 test files with unit + integration + E2E coverage
  • 9 API route modules (Hono on Cloudflare Workers)
  • 61 workflow commits (sequential, automated)

Domains Completed

✅ Domain 1: Foundation (WF007-010)

Error catalog, test helpers, dev environment (wrangler dev), contract tests

✅ Domain 2: Token System (WF011-020)

JWT signing (Ed25519), JWKS endpoint, token verification, issuance API, refresh, revocation (KV-backed), introspection, key rotation, E2E tests

✅ Domain 3: Identity Lifecycle (WF021-030)

Identity Durable Object, full CRUD (create/get/list/update), suspend, reactivate, retire, delete, lifecycle E2E tests

✅ Domain 4: Scopes & RBAC (WF031-040)

Scope parser + matcher, scope checker SDK, scope middleware, role CRUD, role assignment, policy CRUD, policy evaluation, scope inheritance, RBAC E2E tests

✅ Domain 5: API Routes (WF041-050)

Auth middleware, org CRUD, workspace CRUD, workspace membership, API key management, admin routes, rate limiting, error handling, CORS/headers, routes E2E tests

✅ Domain 6: Audit & Observability (WF051-058)

Audit logger, audit query API, audit export, retention policies, webhooks, identity activity API, dashboard stats API, audit E2E tests

✅ Domain 7: SDK & Verification (WF059-068)

SDK client (identities, tokens, roles, audit), complete verification module, Hono middleware, Express middleware, Go middleware, Python SDK, SDK E2E tests

Architecture

  • Runtime: Cloudflare Workers + Durable Objects + D1 + KV
  • Framework: Hono
  • Auth model: JWT (Ed25519) with scope-based authorization
  • Scope format: {plane}:{resource}:{action}:{path?}
  • Key feature: Path-scoped file access control (relayfile:fs:write:/src/*)
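
The scope format above can be sketched as a small parser and matcher. This is an illustration only, not the PR's scope parser: `parseScope`, `scopeMatches`, and the wildcard rule (a trailing `*` matches any suffix, and a grant without a path matches every path) are assumptions made for the example.

```typescript
// Hypothetical sketch of the {plane}:{resource}:{action}:{path?} scope format.
// Names and wildcard semantics are assumptions, not the relayauth implementation.
interface Scope {
  plane: string;
  resource: string;
  action: string;
  path?: string;
}

function parseScope(raw: string): Scope {
  const parts = raw.split(":");
  if (parts.length < 3 || parts.slice(0, 3).some((p) => p.length === 0)) {
    throw new Error(`invalid scope: ${raw}`);
  }
  const [plane = "", resource = "", action = ""] = parts;
  // Everything after the third colon is the path, which may itself contain ":".
  const path = parts.length > 3 ? parts.slice(3).join(":") : undefined;
  return { plane, resource, action, path };
}

function pathMatches(pattern: string | undefined, requested: string | undefined): boolean {
  if (pattern === undefined) return true;    // grant carries no path restriction
  if (requested === undefined) return false; // grant is path-limited, request is not
  if (pattern.endsWith("*")) {
    return requested.startsWith(pattern.slice(0, -1));
  }
  return pattern === requested;
}

function scopeMatches(granted: string, requested: string): boolean {
  const g = parseScope(granted);
  const r = parseScope(requested);
  return (
    g.plane === r.plane &&
    g.resource === r.resource &&
    g.action === r.action &&
    pathMatches(g.path, r.path)
  );
}

console.log(scopeMatches("relayfile:fs:write:/src/*", "relayfile:fs:write:/src/index.ts")); // true
console.log(scopeMatches("relayfile:fs:write:/src/*", "relayfile:fs:write:/etc/passwd"));   // false
```

A production matcher would also need to normalize paths before comparing, since a plain prefix test accepts `/src/../etc/passwd`.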

Try It Locally

cd ~/Projects/AgentWorkforce/relayauth
npm install
wrangler dev
# → localhost:8787 (keep this running and send the requests below from a second terminal)

curl localhost:8787/health
curl -X POST localhost:8787/v1/identities -H 'Content-Type: application/json' -d '{"name":"test-agent","type":"agent"}'
curl localhost:8787/v1/roles
curl localhost:8787/v1/audit

What's Next (PR #3: Domains 8-12)

  • Domain 8: CLI (relay auth init, relay wrap, shell hook)
  • Domain 9: Integration (relayfile, relaycast, cloud)
  • Domain 10: Hosted server (wrangler deploy, staging/prod)
  • Domain 11: Testing & CI pipelines
  • Domain 12: Docs & landing page

@khaliqgant khaliqgant marked this pull request as ready for review March 25, 2026 10:14
@khaliqgant khaliqgant changed the title from "feat: relayauth implementation (Domains 1-12, WF007-100)" to "feat: relayauth core platform — Domains 1-7 (Foundation → SDK)" on Mar 25, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

khaliqgant and others added 3 commits March 25, 2026 14:32
…gnore

- worker.ts: Add global CORS and request-ID middleware, auth placeholder
- verify.test.ts: Expand constructor tests, add TODO roadmap for verification
- .gitignore: Add Python build artifact patterns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extracted from autofix swarm workflow results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace broken TokenVerifier in scope middleware with inline HS256 verification
- Add safeRegexTest to prevent ReDoS in policy evaluation engine
- Add org isolation checks to all identity endpoints (GET/PATCH/DELETE/suspend/retire/reactivate)
- Fix fail-open auth bypass in audit-webhooks (flip to fail-closed with 403)
- Re-enable PyJWT built-in claim verification with leeway in Python SDK
- Extract shared auth module to lib/auth.ts, deduplicate across routes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
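
The `safeRegexTest` helper named in the commit above is not shown in this thread. One common way to bound regular-expression cost is to cap pattern and input length and to refuse patterns with nested quantifiers before compiling them. The sketch below illustrates that idea only; the limits, the heuristic, and the function body are assumptions, not the code from this PR.

```typescript
// Hypothetical limits; the PR does not state the values it uses.
const MAX_PATTERN_LENGTH = 256;
const MAX_INPUT_LENGTH = 2048;

// Crude heuristic: a group that contains a quantifier and is itself quantified,
// for example (a+)+ or (.*)*.
const NESTED_QUANTIFIER = /\([^)]*[+*][^)]*\)[+*{]/;

function safeRegexTest(pattern: string, input: string): boolean {
  if (pattern.length > MAX_PATTERN_LENGTH || input.length > MAX_INPUT_LENGTH) {
    return false; // fail closed: oversized values never match
  }
  if (NESTED_QUANTIFIER.test(pattern)) {
    return false; // refuse patterns prone to catastrophic backtracking
  }
  let regex: RegExp;
  try {
    regex = new RegExp(pattern);
  } catch {
    return false; // an invalid pattern is treated as a non-match
  }
  return regex.test(input);
}

console.log(safeRegexTest("^/src/.*\\.ts$", "/src/index.ts")); // true
console.log(safeRegexTest("(a+)+$", "aaaa!"));                 // false
```

Returning `false` on every rejected case keeps policy evaluation fail-closed: a pattern the helper refuses to run can never grant access.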
devin-ai-integration[bot]

This comment was marked as resolved.

Critical: fail-closed org boundary checks (audit-webhooks), cross-org IDOR
guards (identities), ReDoS mitigation with pattern/input limits (policy-eval),
scope inheritance cross-tenant fix, mass assignment lockdown, Python JWT
leeway, scope info leakage removal.

High: deduplicated scope matching into shared export, SQL-level pagination,
scope-based authorization on identity routes.

Medium: CSV injection prevention, SSRF hardening (IPv4-mapped IPv6, octal/
decimal IPs, cloud metadata), audit failure counters with sensitive-op
enforcement, Go EdDSA support, SDK verify test suite (24 tests), Python
scopes aligned to return False.

Low: webhook schema cleanup, no-op hydrateIdentity removed, error message
sanitization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
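
For the CSV injection item, a widely used mitigation (the one OWASP describes) is to neutralize cells that begin with a formula trigger character before writing them to the export. A minimal sketch, assuming hypothetical `escapeCsvCell` and `toCsvRow` helpers; the PR's actual export code is not shown here.

```typescript
// Hypothetical helpers: neutralize spreadsheet formula triggers, then apply CSV quoting.
const FORMULA_TRIGGERS = ["=", "+", "-", "@", "\t", "\r"];

function escapeCsvCell(value: string): string {
  let cell = value;
  // Prefix a single quote so spreadsheet apps treat the cell as text, not a formula.
  if (FORMULA_TRIGGERS.some((ch) => cell.startsWith(ch))) {
    cell = `'${cell}`;
  }
  // Standard CSV quoting: wrap in double quotes and double any embedded quotes.
  if (/[",\r\n]/.test(cell)) {
    cell = `"${cell.replace(/"/g, '""')}"`;
  }
  return cell;
}

function toCsvRow(cells: string[]): string {
  return cells.map(escapeCsvCell).join(",");
}

console.log(toCsvRow(["identity.created", "=HYPERLINK(\"http://evil\")", "agent, test"]));
```

The trade-off is that legitimate values such as negative numbers (`-5`) are also prefixed, so consumers that parse the export programmatically need to know about the convention.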

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 4 new potential issues.

View 16 additional findings in Devin Review.

Comment on lines +197 to +202
if (claims.nbf !== undefined && claims.nbf > now) {
  throw invalidTokenError();
}

if (claims.exp <= now) {
  throw new TokenExpiredError();
@devin-ai-integration devin-ai-integration bot Mar 25, 2026

🔴 Go middleware lacks clock skew leeway, causing cross-SDK token rejection

The Go middleware in relayauth.go validates nbf and exp claims with zero clock skew tolerance (lines 249-253), while both the TypeScript SDK (packages/sdk/src/verify.ts:205-209) and the Python SDK (packages/python-sdk/relayauth/verifier.py:240-243) use a 30-second leeway. This means a token that is valid according to the TypeScript/Python SDKs can be rejected by the Go middleware when server clocks differ by even 1 second. For nbf, Go uses *claims.Nbf > now (exact) vs TypeScript's claims.nbf > now + 30. For exp, Go uses claims.Exp <= now (exact) vs TypeScript's claims.exp <= now - 30. In a distributed system where relayauth tokens are verified by different services using different SDKs, this inconsistency will cause intermittent 401 errors.
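
The comparison in this finding can be made concrete with a small sketch of leeway-tolerant `nbf`/`exp` checks. The function name, the claims shape, and returning a string result instead of throwing are assumptions for illustration; only the 30-second tolerance and the two comparisons come from the review text.

```typescript
const CLOCK_SKEW_LEEWAY_SECONDS = 30;

interface TimeClaims {
  nbf?: number; // not-before, seconds since epoch
  exp: number;  // expiry, seconds since epoch
}

type TimeCheck = "ok" | "not_yet_valid" | "expired";

function checkTimeClaims(
  claims: TimeClaims,
  now: number,
  leeway: number = CLOCK_SKEW_LEEWAY_SECONDS,
): TimeCheck {
  // Accept tokens whose nbf lies up to `leeway` seconds in the future.
  if (claims.nbf !== undefined && claims.nbf > now + leeway) return "not_yet_valid";
  // Accept tokens that expired less than `leeway` seconds ago.
  if (claims.exp <= now - leeway) return "expired";
  return "ok";
}

const skewNow = 1000000;
// The issuer's clock runs 5 seconds ahead of the verifier's.
const skewedToken: TimeClaims = { nbf: skewNow + 5, exp: skewNow + 3600 };
console.log(checkTimeClaims(skewedToken, skewNow));    // "ok" with the 30s leeway
console.log(checkTimeClaims(skewedToken, skewNow, 0)); // "not_yet_valid" with zero tolerance
```

The second call reproduces the zero-tolerance behavior the finding attributes to the Go middleware: the same token passes one verifier and fails the other.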

devin-ai-integration[bot]

This comment was marked as resolved.

1. scope-inheritance: fix function signature mismatch (3-param vs 2-param)
   - resolveInheritedScopes and getInheritanceChain now accept (db, identityId)
   - SQL query updated to look up identity by ID alone
   - Fixes runtime crash (identityId.trim() on undefined)

2. audit-webhook-dispatcher: reduce retry delays for CF Workers
   - Changed from [10s, 60s, 300s] to [500ms, 1s, 2s]
   - Previous delays would exceed Worker CPU time budget

3. verify.ts: add 30s leeway to match Python SDK
   - nbf and exp checks now use 30s clock-skew tolerance
   - Consistent behavior across TypeScript and Python SDKs

4. audit-webhooks: replace global mutable tableInitialized with WeakSet
   - Per-db instance tracking via WeakSet<D1Database>
   - Handles isolate recycling correctly in CF Workers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
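
Item 2 can be pictured as a bounded retry loop over a fixed delay schedule. The sketch below assumes a hypothetical `deliverWithRetry` helper with an injected `send` callback; only the `[500ms, 1s, 2s]` schedule comes from the commit message.

```typescript
const RETRY_DELAYS_MS = [500, 1000, 2000];

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls `send` once, then retries after each delay until it succeeds
// or the schedule runs out.
async function deliverWithRetry(
  send: () => Promise<boolean>,
  delaysMs: number[] = RETRY_DELAYS_MS,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<{ delivered: boolean; attempts: number }> {
  let attempts = 0;
  for (let i = 0; i <= delaysMs.length; i++) {
    attempts++;
    try {
      if (await send()) return { delivered: true, attempts };
    } catch {
      // Treat a thrown error like a failed delivery and fall through to the retry.
    }
    const delay = delaysMs[i];
    if (delay !== undefined) await wait(delay);
  }
  return { delivered: false, attempts };
}
```

With this schedule the worst case adds 3.5 seconds of waiting across four attempts, compared with 370 seconds under the previous `[10s, 60s, 300s]` schedule.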

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 23 additional findings in Devin Review.

Comment on lines +61 to +78
"token.issued",
"token.refreshed",
"token.revoked",
"token.validated",
"identity.created",
"identity.updated",
"identity.suspended",
"identity.retired",
"scope.checked",
"scope.denied",
"role.assigned",
"role.removed",
"policy.created",
"policy.updated",
"policy.deleted",
"key.rotated",
]);

🔴 Audit query API rejects extended audit actions that are actively written to the database

The AUDIT_ACTIONS set in audit-query.ts only includes the base AuditAction type values and is missing "budget.exceeded", "budget.alert", and "scope.escalation_denied". However, audit-logger.ts:46-64 and policy-evaluation.ts actively write entries with these extended actions to the audit_logs table. When a user tries to filter audit queries by these actions (e.g., GET /v1/audit?action=budget.exceeded), the parseAuditQuery function at line 124 returns a 400 error "invalid action". This also affects the audit export endpoint (audit-export.ts) which reuses the same parseAuditQuery function. As a result, budget breach events and scope escalation denial events are logged but cannot be queried or exported by action filter.

Suggested change
- "token.issued",
- "token.refreshed",
- "token.revoked",
- "token.validated",
- "identity.created",
- "identity.updated",
- "identity.suspended",
- "identity.retired",
- "scope.checked",
- "scope.denied",
- "role.assigned",
- "role.removed",
- "policy.created",
- "policy.updated",
- "policy.deleted",
- "key.rotated",
- ]);
+ export const AUDIT_ACTIONS = new Set<AuditAction | 'budget.exceeded' | 'budget.alert' | 'scope.escalation_denied'>([
+ "token.issued",
+ "token.refreshed",
+ "token.revoked",
+ "token.validated",
+ "identity.created",
+ "identity.updated",
+ "identity.suspended",
+ "identity.retired",
+ "scope.checked",
+ "scope.denied",
+ "role.assigned",
+ "role.removed",
+ "policy.created",
+ "policy.updated",
+ "policy.deleted",
+ "key.rotated",
+ "budget.exceeded",
+ "budget.alert",
+ "scope.escalation_denied",
+ ]);
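
The failure mode described in this finding comes from validating the `action` filter against a set that is narrower than what gets written. A minimal sketch of that validation, assuming a hypothetical `parseActionFilter` helper and an abbreviated action list; the real `parseAuditQuery` is not shown in this thread.

```typescript
// Abbreviated for the example; the full list appears in the suggested change.
const BASE_ACTIONS = ["token.issued", "scope.denied", "key.rotated"] as const;
const EXTENDED_ACTIONS = ["budget.exceeded", "budget.alert", "scope.escalation_denied"] as const;

type QueryableAction = (typeof BASE_ACTIONS)[number] | (typeof EXTENDED_ACTIONS)[number];

const QUERYABLE_ACTIONS: ReadonlySet<string> = new Set<string>([
  ...BASE_ACTIONS,
  ...EXTENDED_ACTIONS,
]);

type ParseResult =
  | { ok: true; action: QueryableAction | undefined }
  | { ok: false; status: 400; error: string };

function parseActionFilter(query: URLSearchParams): ParseResult {
  const action = query.get("action");
  if (action === null) return { ok: true, action: undefined }; // no filter requested
  if (!QUERYABLE_ACTIONS.has(action)) {
    return { ok: false, status: 400, error: "invalid action" };
  }
  return { ok: true, action: action as QueryableAction };
}

console.log(parseActionFilter(new URLSearchParams("action=budget.exceeded")));
console.log(parseActionFilter(new URLSearchParams("action=budget.unknown")));
```

Building the validation set from the same constants the logger uses is one way to keep the written and queryable actions from drifting apart again.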
khaliqgant and others added 3 commits March 25, 2026 18:55
The Go middleware validated nbf and exp claims with zero tolerance,
while TypeScript and Python SDKs use a 30-second leeway. This adds
a matching clockSkewLeeway constant and applies it to both checks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
budget.exceeded, budget.alert, and scope.escalation_denied are actively
written by audit-logger.ts and policy-evaluation.ts but were missing
from the query validation set, making them impossible to filter on.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the initializedDbs WeakSet guard from ensureAuditWebhookTable.
The DDL already uses CREATE TABLE/INDEX IF NOT EXISTS, making it
idempotent and safe to run every time without a mutable flag, which
is fragile in Cloudflare Workers environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@khaliqgant khaliqgant merged commit 36e3053 into main Mar 25, 2026

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

View 27 additional findings in Devin Review.

Comment on lines +210 to +214
if (
  this.options?.audience &&
  !this.options.audience.some((audience) => claims.aud.includes(audience))
) {
  throw invalidTokenError();

🔴 TokenVerifier rejects all tokens when audience is set to an empty array

In the TypeScript SDK TokenVerifier._validateClaims, the audience check at line 210 uses this.options?.audience && .... In JavaScript, an empty array [] is truthy, so new TokenVerifier({ audience: [] }) will always throw invalidTokenError() since [].some(...) returns false. This is a behavioral trap: consumers expect an empty audience to mean "skip audience validation" (which is what the Python SDK does, since [] is falsy in Python). One test (packages/sdk/src/__tests__/verify.test.ts:113) even validates audience: [] is accepted, but the verify path would reject every token.

Comparison with Python SDK

Python verifier at packages/python-sdk/relayauth/verifier.py:247:

if self.options.audience and not any(aud in claims.aud for aud in self.options.audience):

In Python, [] is falsy, so empty audience skips validation — the correct behavior.

Suggested change
- if (
-   this.options?.audience &&
-   !this.options.audience.some((audience) => claims.aud.includes(audience))
- ) {
-   throw invalidTokenError();
+ if (
+   this.options?.audience &&
+   this.options.audience.length > 0 &&
+   !this.options.audience.some((audience) => claims.aud.includes(audience))
+ ) {
+   throw invalidTokenError();
+ }
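
The truthiness difference is easy to reproduce in isolation. The sketch below uses hypothetical standalone functions rather than the SDK's `TokenVerifier`; it contrasts the original guard with the one from the suggested change.

```typescript
// Original guard: an empty array is truthy in JavaScript, so the check still runs,
// and [].some(...) is false, which rejects every token.
function audienceAllowedOriginal(expected: string[] | undefined, tokenAud: string[]): boolean {
  if (expected && !expected.some((aud) => tokenAud.includes(aud))) return false;
  return true;
}

// Suggested guard: treat an empty list the same as "no audience configured".
function audienceAllowed(expected: string[] | undefined, tokenAud: string[]): boolean {
  if (expected && expected.length > 0 && !expected.some((aud) => tokenAud.includes(aud))) {
    return false;
  }
  return true;
}

console.log(audienceAllowedOriginal([], ["relayauth"])); // false
console.log(audienceAllowed([], ["relayauth"]));         // true
console.log(audienceAllowed(["other"], ["relayauth"]));  // false
```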
Comment on lines +217 to +218
if (this.options?.maxAge !== undefined && claims.iat + this.options.maxAge < now) {
  throw new TokenExpiredError();

🟡 maxAge validation inconsistently applies clock-skew leeway across SDKs

The TypeScript SDK _validateClaims at line 217 checks claims.iat + this.options.maxAge < now without applying the 30-second leeway used for exp and nbf checks. This means a token issued exactly maxAge seconds ago will be rejected due to clock skew, even though exp and nbf both tolerate 30 seconds of drift. The Python SDK at packages/python-sdk/relayauth/verifier.py:249 applies leeway (claims.iat + self.options.max_age < now - leeway), making the two SDKs behave differently for the same token under the same clock conditions. Since maxAge is an expiry-like check, it should be consistent with the exp leeway.

Suggested change
- if (this.options?.maxAge !== undefined && claims.iat + this.options.maxAge < now) {
-   throw new TokenExpiredError();
+ if (this.options?.maxAge !== undefined && claims.iat + this.options.maxAge < now - leeway) {
+   throw new TokenExpiredError();
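
A worked boundary case shows the effect of the missing leeway. The sketch assumes a hypothetical `isOlderThanMaxAge` helper (not the SDK's method) and a verifier clock that runs 10 seconds ahead of the issuer's.

```typescript
function isOlderThanMaxAge(iat: number, maxAge: number, now: number, leeway = 0): boolean {
  return iat + maxAge < now - leeway;
}

const issuedAt = 1000;     // seconds, by the issuer's clock
const maxAgeSeconds = 300;
const verifierNow = 1310;  // token is exactly 300s old, but the verifier clock is 10s fast

console.log(isOlderThanMaxAge(issuedAt, maxAgeSeconds, verifierNow));     // true: rejected
console.log(isOlderThanMaxAge(issuedAt, maxAgeSeconds, verifierNow, 30)); // false: accepted
```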