
feat(gateway): per-session auth isolation + OAuth Bearer support #135

Merged
BYK merged 4 commits into main from feat/gateway-auth-isolation on May 7, 2026

Conversation


@BYK BYK commented May 7, 2026

Summary

  • Replace global lastSeenApiKey singleton with a typed AuthCredential registry that stores credentials per session, preventing cross-session key mixups when multiple clients are connected simultaneously
  • Add Authorization: Bearer token support alongside x-api-key — fixes silent 401s for Claude Code subscription users (Max/Pro/Team) using OAuth
  • Include model + auth credential suffix in session fingerprints so key/model changes create new sessions rather than corrupting existing ones

Changes

New: packages/gateway/src/auth.ts

AuthCredential discriminated union (api-key | bearer), header extraction/formatting, per-session registry with resolveAuth() two-level lookup (per-session → global fallback).
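A minimal sketch of what such a registry might look like. The names (`AuthCredential`, `setSessionAuth`, `resolveAuth`) follow the PR description; the internal data structure is an assumption, not the actual `auth.ts` implementation:

```typescript
// Sketch of the per-session auth registry described above (assumed internals).

type AuthCredential =
  | { kind: "api-key"; key: string }   // sent as x-api-key
  | { kind: "bearer"; token: string }; // sent as Authorization: Bearer

const sessionAuth = new Map<string, AuthCredential>();
let globalFallback: AuthCredential | undefined;

function setSessionAuth(sessionID: string, cred: AuthCredential): void {
  sessionAuth.set(sessionID, cred);
  globalFallback = cred; // last-seen credential doubles as the global fallback
}

// Two-level lookup: per-session first, then the global fallback.
function resolveAuth(sessionID?: string): AuthCredential | undefined {
  if (sessionID !== undefined) {
    const cred = sessionAuth.get(sessionID);
    if (cred) return cred;
  }
  return globalFallback;
}
```

The fallback level is what keeps background workers functional when a call arrives without a session ID, while the per-session level prevents one client's key from leaking into another client's requests.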

Core: Thread sessionID through worker calls

  • LLMClient.prompt() opts gets sessionID?: string (backward-compatible)
  • distillation.ts, curator.ts, search.ts, recall.ts pass sessionID to every llm.prompt() call
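The threading pattern can be sketched roughly as follows; the `PromptOpts` shape and `distillSegment` signature here are illustrative, not copied from the codebase:

```typescript
// Hypothetical shape of the backward-compatible prompt() options:
// sessionID is optional, so existing call sites keep compiling.

interface PromptOpts {
  model?: string;
  sessionID?: string; // new: lets the gateway resolve the right credential
}

interface LLMClient {
  prompt(text: string, opts?: PromptOpts): Promise<string>;
}

// A worker threads its session through every call it makes:
async function distillSegment(
  llm: LLMClient,
  segment: string,
  sessionID?: string,
): Promise<string> {
  return llm.prompt(`Distill:\n${segment}`, { sessionID });
}
```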

Gateway: Session-aware auth

  • llm-adapter.ts uses AuthCredential instead of bare string, calls authHeaders() for fetch
  • pipeline.ts wires extractAuth()/setSessionAuth()/resolveAuth(), updates identifySession() fingerprint with model + auth suffix
  • batch-queue.ts snapshots credentials per-item at enqueue time, groups flushes by credential
  • translate/anthropic.ts forwards both x-api-key and Authorization: Bearer
  • translate/openai.ts handles both auth schemes (always sends as Bearer)
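The two translate behaviors could look something like this sketch (function names are invented for illustration; only the header names come from the PR):

```typescript
// Sketch of header formatting for the two credential kinds (assumed names).

type AuthCredential =
  | { kind: "api-key"; key: string }
  | { kind: "bearer"; token: string };

// Anthropic-style: forward whichever scheme the client used.
function anthropicAuthHeaders(cred: AuthCredential): Record<string, string> {
  return cred.kind === "api-key"
    ? { "x-api-key": cred.key }
    : { Authorization: `Bearer ${cred.token}` };
}

// OpenAI-style: the API only accepts Bearer, so normalize both kinds.
function openaiAuthHeaders(cred: AuthCredential): Record<string, string> {
  const secret = cred.kind === "api-key" ? cred.key : cred.token;
  return { Authorization: `Bearer ${secret}` };
}
```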

Testing

  • All 629 tests pass (508 core + 121 gateway)
  • Batch queue tests updated for new AuthCredential types

Follow-up

  • Disk persistence + long-TTL session eviction (30-90 days) for restart continuity — tracked separately
  • Verify Anthropic Batch API accepts Bearer tokens; if not, Bearer-authenticated items fall back to synchronous

BYK added 3 commits May 6, 2026 19:41
When the prompt cache goes cold after idle, reduce the cold-cache write
cost by distilling aggressively and using a smaller context window:

- Force-distill ALL pending messages on idle (force: true), even below
  the normal minMessages threshold. The cache is expiring anyway, and
  distilling now means less raw content in the next context.

- Allow meta-distillation on idle unconditionally — cache is cold so
  the row ID rewrites don't cause additional cache busts.

- Add post-idle compact layer: when onIdleResume() fires, skip layer 0
  (full-raw passthrough) and use a tighter raw budget (20% of usable
  instead of 40%) for layer 1. The distilled prefix covers the older
  history; the raw window only needs the current turn + minimal recent
  context. This reduces the total cold-cache write cost by up to 20%
  of usable (~29K tokens on a 200K context model).

- Add postIdleCompact flag to SessionState (one-shot, consumed by
  transformInner). Exposed in inspectSessionState for test visibility.
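The one-shot flag described above might be consumed like this; the budget fractions (20% vs 40%) come from the commit message, while the function shapes are assumptions:

```typescript
// Sketch of the one-shot postIdleCompact flag (names from the commit message).

interface SessionState {
  postIdleCompact: boolean;
}

function onIdleResume(state: SessionState): void {
  state.postIdleCompact = true; // set when the prompt cache has gone cold
}

// transformInner consumes the flag exactly once.
function rawBudgetFraction(state: SessionState): number {
  if (state.postIdleCompact) {
    state.postIdleCompact = false; // one-shot: cleared on first use
    return 0.2;                    // tighter post-idle raw window
  }
  return 0.4;                      // normal layer-1 raw budget
}
```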
…savings)

Add BatchLLMClient wrapper in gateway that accumulates non-urgent worker
LLM calls (distillation, curation, consolidation, validation) and submits
them via Anthropic's Message Batches API for 50% cost reduction.

Key changes:
- Add urgent flag to LLMClient.prompt() opts for batch/immediate routing
- Thread urgent through distillation.run(), distillSegment(), metaDistill()
- Mark compaction, overflow recovery, and query expansion as urgent
- Background incremental distillation, idle curation, and worker validation
  are batch-safe (urgent unset)
- Flush timer (30s) + auto-flush at queue capacity (50 items)
- Poll timer (60s) checks batch status, streams JSONL results
- Fallback to synchronous on batch API errors or missing API key
- Graceful shutdown drains queue synchronously
- Disable via LORE_BATCH_DISABLED=1 env var
- 10 dedicated tests for batch queue behavior
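The routing rule described above reduces to a small decision, sketched here under the assumption that it is a pure function of the `urgent` flag and the env var (only `LORE_BATCH_DISABLED` is named in the commit message):

```typescript
// Sketch of the batch/immediate routing rule: urgent calls and disabled-batch
// mode go synchronous; everything else queues for the Message Batches API.

function routePrompt(
  opts: { urgent?: boolean } | undefined,
  env: Record<string, string | undefined>,
): "immediate" | "batch" {
  if (opts?.urgent) return "immediate";            // compaction, overflow recovery, query expansion
  if (env.LORE_BATCH_DISABLED === "1") return "immediate"; // opt-out switch
  return "batch"; // distillation, curation, validation are batch-safe
}
```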
Replace the global lastSeenApiKey singleton with a typed AuthCredential
registry that stores credentials per session. Background workers
(distillation, curation, batch queue) now use the correct session's
credential via a two-level lookup (per-session first, global fallback).

Key changes:
- New auth.ts module: AuthCredential type (api-key | bearer), extractAuth(),
  authHeaders(), per-session registry, resolveAuth() two-level lookup
- Thread sessionID through LLMClient.prompt() opts → core workers pass it
  to every prompt call (distillation, curator, expandQuery)
- Gateway LLM adapter + batch queue use resolveAuth(sessionID) instead of
  a bare global string
- Batch queue snapshots credentials per-item at enqueue time and groups
  flushes by credential for multi-tenant correctness
- Session fingerprint now includes model + auth suffix so key/model changes
  create new sessions instead of corrupting existing ones
- Translate layers (Anthropic, OpenAI) handle both x-api-key and
  Authorization: Bearer headers — fixes silent 401s for OAuth users
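The fingerprint change can be illustrated with a sketch like the one below. `identifySession` is named in the PR; the suffix format and the last-six-characters truncation are illustrative choices, not the actual implementation:

```typescript
// Sketch: fold model + a short credential suffix into the session fingerprint
// so a key or model change starts a fresh session instead of mutating one.

type AuthCredential =
  | { kind: "api-key"; key: string }
  | { kind: "bearer"; token: string };

function credentialSuffix(cred?: AuthCredential): string {
  if (!cred) return "anon";
  const secret = cred.kind === "api-key" ? cred.key : cred.token;
  return `${cred.kind}:${secret.slice(-6)}`; // last chars only, never the full secret
}

function identifySession(
  baseFingerprint: string,
  model: string,
  cred?: AuthCredential,
): string {
  return `${baseFingerprint}|${model}|${credentialSuffix(cred)}`;
}
```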
@BYK BYK enabled auto-merge (squash) May 7, 2026 09:23
…lation

# Conflicts:
#	.lore.md
#	packages/core/src/distillation.ts
#	packages/core/src/search.ts
#	packages/core/src/types.ts
#	packages/gateway/src/batch-queue.ts
#	packages/gateway/src/pipeline.ts
#	packages/gateway/test/batch-queue.test.ts
@BYK BYK merged commit bf07596 into main May 7, 2026
1 check passed
@BYK BYK deleted the feat/gateway-auth-isolation branch May 7, 2026 09:28
