feat(gateway): per-session auth isolation + OAuth Bearer support #135
Merged
Conversation
When the prompt cache goes cold after idle, reduce the cold-cache write cost by distilling aggressively and using a smaller context window:

- Force-distill ALL pending messages on idle (`force: true`), even below the normal `minMessages` threshold. The cache is expiring anyway, and distilling now means less raw content in the next context.
- Allow meta-distillation on idle unconditionally — the cache is cold, so the row ID rewrites don't cause additional cache busts.
- Add a post-idle compact layer: when `onIdleResume()` fires, skip layer 0 (full-raw passthrough) and use a tighter raw budget (20% of usable instead of 40%) for layer 1. The distilled prefix covers the older history; the raw window only needs the current turn plus minimal recent context. This reduces the total cold-cache write cost by up to 20% of usable (~29K tokens on a 200K-context model).
- Add a `postIdleCompact` flag to `SessionState` (one-shot, consumed by `transformInner`). Exposed in `inspectSessionState` for test visibility.
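The one-shot flag described above can be sketched roughly as follows. `SessionState`, `postIdleCompact`, `onIdleResume()`, `transformInner`, and the 20%/40% budgets come from this commit, but the function name `rawBudgetFraction` and the field shape below are illustrative assumptions, not the PR's actual code:

```typescript
// Illustrative sketch only: the real SessionState lives in the PR; this shows
// the one-shot consume semantics described in the commit message.
interface SessionState {
  postIdleCompact?: boolean; // set when onIdleResume() fires, consumed one-shot
}

// Hypothetical helper: pick the layer-1 raw budget for the next transform pass.
function rawBudgetFraction(state: SessionState): number {
  if (state.postIdleCompact) {
    state.postIdleCompact = false; // one-shot: cleared on the first pass that sees it
    return 0.2; // tighter post-idle raw window: 20% of usable
  }
  return 0.4; // normal layer-1 raw budget: 40% of usable
}
```

The one-shot reset matters: only the first context build after idle pays the compact layout; subsequent turns return to the normal budget.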
…savings) Add a `BatchLLMClient` wrapper in the gateway that accumulates non-urgent worker LLM calls (distillation, curation, consolidation, validation) and submits them via Anthropic's Message Batches API for a 50% cost reduction.

Key changes:
- Add an `urgent` flag to `LLMClient.prompt()` opts for batch/immediate routing
- Thread `urgent` through `distillation.run()`, `distillSegment()`, `metaDistill()`
- Mark compaction, overflow recovery, and query expansion as urgent
- Background incremental distillation, idle curation, and worker validation are batch-safe (`urgent` unset)
- Flush timer (30s) + auto-flush at queue capacity (50 items)
- Poll timer (60s) checks batch status, streams JSONL results
- Fall back to synchronous on batch API errors or a missing API key
- Graceful shutdown drains the queue synchronously
- Disable via the `LORE_BATCH_DISABLED=1` env var
- 10 dedicated tests for batch queue behavior
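The urgent/batch routing above can be sketched as follows. `BatchLLMClient`, the 50-item capacity, and `LORE_BATCH_DISABLED` are named in the commit; the constructor shape, the `disabled` parameter, and the `flush()` signature here are assumptions for illustration, not the gateway's real API:

```typescript
// Hypothetical sketch of the batch/immediate routing; the real BatchLLMClient
// in packages/gateway submits drained items to the Message Batches API and
// also flushes on a 30s timer, which this sketch omits.
type PromptOpts = { urgent?: boolean };

class BatchLLMClient {
  private queue: string[] = [];

  constructor(
    private immediate: (req: string) => Promise<string>, // synchronous-path client
    private capacity = 50, // auto-flush threshold from the commit
    private disabled = false, // the PR reads this from LORE_BATCH_DISABLED=1
  ) {}

  async prompt(req: string, opts: PromptOpts = {}): Promise<string | void> {
    // Compaction, overflow recovery, and query expansion pass urgent: true.
    if (opts.urgent || this.disabled) return this.immediate(req);
    // Distillation, curation, and validation are batch-safe (urgent unset).
    this.queue.push(req);
    if (this.queue.length >= this.capacity) this.flush();
  }

  flush(): string[] {
    // In the PR this becomes a Message Batches submission (50% cheaper);
    // here we just drain and return the queued requests.
    const drained = this.queue;
    this.queue = [];
    return drained;
  }
}
```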
Replace the global `lastSeenApiKey` singleton with a typed `AuthCredential` registry that stores credentials per session. Background workers (distillation, curation, batch queue) now use the correct session's credential via a two-level lookup (per-session first, global fallback).

Key changes:
- New `auth.ts` module: `AuthCredential` type (`api-key` | `bearer`), `extractAuth()`, `authHeaders()`, per-session registry, `resolveAuth()` two-level lookup
- Thread `sessionID` through `LLMClient.prompt()` opts → core workers pass it to every prompt call (distillation, curator, `expandQuery`)
- Gateway LLM adapter + batch queue use `resolveAuth(sessionID)` instead of a bare global string
- Batch queue snapshots credentials per item at enqueue time and groups flushes by credential for multi-tenant correctness
- Session fingerprint now includes a model + auth suffix so key/model changes create new sessions instead of corrupting existing ones
- Translate layers (Anthropic, OpenAI) handle both `x-api-key` and `Authorization: Bearer` headers — fixes silent 401s for OAuth users
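A hedged sketch of the two-level lookup: `AuthCredential`, `setSessionAuth()`, `resolveAuth()`, and `authHeaders()` are all named in this PR, but their real definitions live in `auth.ts`. The bodies below are guesses at the described behavior, including the assumption that the most recently seen credential doubles as the global fallback:

```typescript
// Hypothetical sketch of auth.ts: a per-session credential registry with a
// two-level lookup (per-session first, global fallback second).
type AuthCredential =
  | { type: "api-key"; value: string }
  | { type: "bearer"; value: string };

const sessionAuth = new Map<string, AuthCredential>();
let globalAuth: AuthCredential | undefined;

function setSessionAuth(sessionID: string, cred: AuthCredential): void {
  sessionAuth.set(sessionID, cred);
  globalAuth = cred; // assumption: latest credential serves as the global fallback
}

function resolveAuth(sessionID?: string): AuthCredential | undefined {
  // Two-level lookup: the session's own credential wins; workers that lost
  // their sessionID still get a usable (global) credential.
  if (sessionID) {
    const cred = sessionAuth.get(sessionID);
    if (cred) return cred;
  }
  return globalAuth;
}

function authHeaders(cred: AuthCredential): Record<string, string> {
  // The discriminant picks the wire format: OAuth users send a Bearer token,
  // API-key users send x-api-key. This is what fixes the silent 401s.
  return cred.type === "bearer"
    ? { Authorization: `Bearer ${cred.value}` }
    : { "x-api-key": cred.value };
}
```

The discriminated union is what lets the translate layers stay correct per scheme: a bare string cannot say whether it should become `x-api-key` or `Authorization: Bearer`.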
…lation

# Conflicts:
#	.lore.md
#	packages/core/src/distillation.ts
#	packages/core/src/search.ts
#	packages/core/src/types.ts
#	packages/gateway/src/batch-queue.ts
#	packages/gateway/src/pipeline.ts
#	packages/gateway/test/batch-queue.test.ts
Summary
- Replace the global `lastSeenApiKey` singleton with a typed `AuthCredential` registry that stores credentials per session, preventing cross-session key mixups when multiple clients are connected simultaneously
- Add `Authorization: Bearer` token support alongside `x-api-key` — fixes silent 401s for Claude Code subscription users (Max/Pro/Team) using OAuth

Changes

New:
- `packages/gateway/src/auth.ts`: `AuthCredential` discriminated union (`api-key` | `bearer`), header extraction/formatting, per-session registry with `resolveAuth()` two-level lookup (per-session → global fallback)

Core: Thread `sessionID` through worker calls
- `LLMClient.prompt()` opts gets `sessionID?: string` (backward-compatible)
- `distillation.ts`, `curator.ts`, `search.ts`, `recall.ts` pass `sessionID` to every `llm.prompt()` call

Gateway: Session-aware auth
- `llm-adapter.ts` uses `AuthCredential` instead of a bare string, calls `authHeaders()` for fetch
- `pipeline.ts` wires `extractAuth()` / `setSessionAuth()` / `resolveAuth()`, updates the `identifySession()` fingerprint with a model + auth suffix
- `batch-queue.ts` snapshots credentials per item at enqueue time, groups flushes by credential
- `translate/anthropic.ts` forwards both `x-api-key` and `Authorization: Bearer`
- `translate/openai.ts` handles both auth schemes (always sends as Bearer)

Testing
- `AuthCredential` types

Follow-up