add ai gateway#1

Merged
jaredLunde merged 7 commits into main from feat/ai-gateway
Jun 1, 2026

Conversation

@jaredLunde
Contributor

No description provided.

jaredLunde and others added 7 commits May 31, 2026 11:07
…ve smoke

A centralized internal egress L7 proxy to LLM providers (Pingora + tokio). Apps
point their stock OpenAI/Anthropic SDK at it; the gateway authenticates, swaps in
the real provider key, relays the response untouched, and emits token-usage facts
for billing. Self-contained: no path deps into the beyond repo.

Auth branches on key format: bai_… is a stateless Ed25519-signed virtual key
(verify → deny-set check → swap to the pool key); anything else is BYO — the
user's own provider token, passed through unchanged.
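
The key-format branch described above can be sketched as a small classifier. This is only an illustration under stated assumptions: `Credential`, `classify`, and `VIRTUAL_KEY_PREFIX` are hypothetical names, and the Ed25519 verification, deny-set check, and pool-key swap that follow a managed match are not shown.

```rust
/// How the gateway might treat an incoming credential (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Credential<'a> {
    /// A `bai_…` virtual key: to be verified, deny-checked, then swapped
    /// for the pool key. Holds the part after the prefix.
    Managed(&'a str),
    /// Anything else: the caller's own provider token, passed through as-is.
    Byo(&'a str),
}

/// Assumed constant; the source only says virtual keys look like `bai_…`.
const VIRTUAL_KEY_PREFIX: &str = "bai_";

/// Branch purely on the key's textual format, with no network or state lookup.
pub fn classify(token: &str) -> Credential<'_> {
    match token.strip_prefix(VIRTUAL_KEY_PREFIX) {
        Some(payload) => Credential::Managed(payload),
        None => Credential::Byo(token),
    }
}
```

Because the branch is a prefix test, a BYO token never touches the signature or deny-set machinery in this sketch.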

Providers are data: a row in route::KNOWN_PROVIDERS (name, authority, base path,
auth scheme) or a config entry — adding an OpenAI-wire provider is one line, no
new code paths. Ships 10 known providers (openai, anthropic, openrouter,
fireworks, groq, deepseek, together, cerebras, mistral, xai), each with its
connection facts verified against the provider's official docs (cited inline in
route.rs). The client's /v1 prefix is rewritten to each provider's real mount
point (Groq /openai/v1, Fireworks /inference/v1, OpenRouter /api/v1) so a
verbatim passthrough can't 404.
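
A minimal sketch of the "providers are data" idea, assuming a simplified row layout. The names `KNOWN_PROVIDERS`, `CLIENT_PREFIX`, `base_path`, and `upstream_path` appear in this PR, but the struct shape and function signatures here are guesses, `find` is a hypothetical helper, and the authority and auth-scheme fields are omitted. Only the three mount points quoted above are listed.

```rust
/// One row of provider data (simplified; real rows also carry authority
/// and auth scheme, which are left out of this sketch).
#[derive(Debug)]
pub struct Provider {
    pub name: &'static str,
    pub base_path: &'static str,
}

/// Mount points taken from the commit message above.
pub const KNOWN_PROVIDERS: &[Provider] = &[
    Provider { name: "groq", base_path: "/openai/v1" },
    Provider { name: "fireworks", base_path: "/inference/v1" },
    Provider { name: "openrouter", base_path: "/api/v1" },
];

/// The prefix clients send, as a stock SDK would.
const CLIENT_PREFIX: &str = "/v1";

/// Look a provider up by name (hypothetical helper).
pub fn find(name: &str) -> Option<&'static Provider> {
    KNOWN_PROVIDERS.iter().find(|p| p.name == name)
}

/// Rewrite the client's `/v1` prefix to the provider's real mount point.
/// Returns `None` when the path does not start with `/v1` on a segment boundary.
pub fn upstream_path(provider: &Provider, client_path: &str) -> Option<String> {
    let rest = client_path.strip_prefix(CLIENT_PREFIX)?;
    if !(rest.is_empty() || rest.starts_with('/')) {
        return None; // e.g. `/v1beta/...` is not the `/v1` prefix
    }
    Some(format!("{}{}", provider.base_path, rest))
}
```

With this shape, adding a provider is one more row in the table, with no new code path.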

Hardening: per-key rate guardrail (count-min, fixed memory), gap-free deny-set
seeding (resume-from-revision), optional on-disk snapshot for restart-before-NATS
enforcement, chunked-safe body-size cap, redacting/zeroizing Secret newtype,
TTL-cached async DNS, NATS-independent auth (fail-open deny-set).
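
To illustrate why a count-min sketch gives a fixed-memory per-key guardrail, here is a small standalone version. It is an assumption-laden sketch, not the gateway's code: `CountMin`, `record`, `reset`, and `allow` are hypothetical names, the hashing uses the standard library's `DefaultHasher`, and resetting once per window is an assumed policy.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Count-min sketch: `depth` rows of `width` counters, sized once up front,
/// so memory does not grow with the number of distinct keys.
pub struct CountMin {
    width: usize,
    depth: usize,
    cells: Vec<u32>,
}

impl CountMin {
    pub fn new(width: usize, depth: usize) -> Self {
        assert!(width > 0 && depth > 0);
        Self { width, depth, cells: vec![0; width * depth] }
    }

    /// Index of the counter for `key` in `row`; the row number seeds the hash.
    fn slot(&self, row: usize, key: &str) -> usize {
        let mut h = DefaultHasher::new();
        row.hash(&mut h);
        key.hash(&mut h);
        row * self.width + (h.finish() as usize % self.width)
    }

    /// Count one request and return the estimate (minimum across rows).
    /// Collisions can only inflate the estimate, never lower it.
    pub fn record(&mut self, key: &str) -> u32 {
        let mut estimate = u32::MAX;
        for row in 0..self.depth {
            let i = self.slot(row, key);
            self.cells[i] = self.cells[i].saturating_add(1);
            estimate = estimate.min(self.cells[i]);
        }
        estimate
    }

    /// Clear all counters, e.g. at the start of each rate window (assumed policy).
    pub fn reset(&mut self) {
        self.cells.fill(0);
    }
}

/// Allow the request while the key's estimated count stays within `limit`.
pub fn allow(sketch: &mut CountMin, key: &str, limit: u32) -> bool {
    sketch.record(key) <= limit
}
```

The trade-off is one-sided error: a busy neighbour that collides in every row can cause a key to be limited early, but a key is never undercounted.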

Verification:
- 45 unit tests; e2e suite (real beyond-ai binary + real nats-server + mock
  upstream) covering managed key-swap, BYO passthrough, both dialects, usage
  metering, deny-set propagation, rate limiting, snapshot restart.
- Live smoke suite (tests/smoke.rs, mise run test:smoke): exercises the full
  managed path — Ed25519 verify → deny-check → key-swap → real TLS — against real
  providers, gated per API key (#[ignore] + key-presence). The Anthropic managed
  path is verified green against production.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The provider is now the request's first path segment (`/{provider}/…`); the rest
of the path is forwarded to the upstream verbatim (native passthrough — the
gateway holds no per-provider mount knowledge). A bare path with no provider
prefix that starts with `/v1` is the drop-in default: dialect picks
openai/anthropic, so those two are a host-only swap. An unknown first segment is
a 404.
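
The routing rule in this paragraph can be sketched as a pure function over the request path. This is a hedged illustration: `Route`, `route`, and `KNOWN` are hypothetical names, the provider list is copied from the first commit message, and treating the bare default as "first segment is exactly `v1`" is an assumption about what "starts with `/v1`" means.

```rust
/// Outcome of inspecting the request path (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Route<'a> {
    /// `/{provider}/…`: `rest` is forwarded to the upstream verbatim.
    Prefixed { provider: &'a str, rest: &'a str },
    /// Bare `/v1…` path: the provider is then chosen from the dialect.
    BareDefault { path: &'a str },
    /// Unknown first segment: the gateway answers 404.
    NotFound,
}

/// Provider names listed in the first commit message.
const KNOWN: &[&str] = &[
    "openai", "anthropic", "openrouter", "fireworks", "groq",
    "deepseek", "together", "cerebras", "mistral", "xai",
];

pub fn route(path: &str) -> Route<'_> {
    let Some(trimmed) = path.strip_prefix('/') else {
        return Route::NotFound;
    };
    // Split off the first segment; keep the leading '/' on the remainder.
    let (first, rest) = match trimmed.find('/') {
        Some(i) => (&trimmed[..i], &trimmed[i..]),
        None => (trimmed, "/"),
    };
    if KNOWN.iter().any(|name| *name == first) {
        return Route::Prefixed { provider: first, rest };
    }
    if first == "v1" {
        return Route::BareDefault { path };
    }
    Route::NotFound
}
```

Note that the remainder is never rewritten in this sketch: a Groq request would arrive as `/groq/openai/v1/...` and leave as `/openai/v1/...`, so the function needs no per-provider mount knowledge.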

This makes the gateway a true drop-in for a provider's base URL with any stock
tool (Codex, Cursor, the OpenAI/Anthropic SDKs) — provider is a base-URL concern,
not a per-request header that tools can't set. Removes the `x-beyond-provider`
header, the per-provider `base_path` rewrite table, `CLIENT_PREFIX`, and
`Provider::upstream_path` — a net simplification. `dialect` becomes a provider
attribute (drives usage parsing + stream_options injection eligibility, now a
prefix-agnostic suffix check); `dialect_for_path` survives only for the bare-path
default.
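
One way to picture a prefix-agnostic suffix check, as a rough sketch only. `dialect_for_path` is named in this PR, but its body here is a guess (Anthropic's Messages API lives at `/v1/messages`; everything else is treated as OpenAI-wire). `Dialect` and `may_inject_stream_options` are hypothetical names, and limiting injection to paths ending in `/chat/completions` is an assumption.

```rust
/// Wire dialect of a provider (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Dialect {
    OpenAi,
    Anthropic,
}

/// Bare-path default only: guess the dialect from the path itself.
pub fn dialect_for_path(path: &str) -> Dialect {
    if path.starts_with("/v1/messages") {
        Dialect::Anthropic
    } else {
        Dialect::OpenAi
    }
}

/// Eligibility for `stream_options` injection. Checking the path suffix
/// means the provider's mount point (`/v1`, `/openai/v1`, ...) is irrelevant.
pub fn may_inject_stream_options(dialect: Dialect, upstream_path: &str) -> bool {
    // Ignore any query string before testing the suffix.
    let path = upstream_path.split('?').next().unwrap_or(upstream_path);
    dialect == Dialect::OpenAi && path.ends_with("/chat/completions")
}
```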

Auth swap / BYO passthrough, deny-set, rate limits, usage metering, model
provenance, and stream_options injection are unchanged.

Swept end to end per the plan: route/proxy/state/config + all inline comments;
e2e (reworked Fireworks → prefix-strip assertion; added `/openai`-prefix ==
bare-default and unknown→404 tests); all smoke tests (per-provider native paths,
Anthropic via `/anthropic/v1/messages`); ARCHITECTURE, README, config.example.

Verified: 68 lib + 18 e2e + 10 smoke pass, clippy clean across all targets, no
remaining x-beyond-provider/base_path/upstream_path/CLIENT_PREFIX references, and
the live Anthropic prefix route returns 200 (`mise run test:smoke`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The repo was a Cargo workspace with exactly one member (`crates/gateway`) — a
remnant of a planned multi-crate layout (SDK, control-plane) that was dropped.
Over-structured: a workspace's only leverage (`[workspace.dependencies]`,
`[workspace.package]`, shared lints across crates) needs 2+ members to pay off.

Flattened to a single crate at the repo root:
- `crates/gateway/{src,tests,benches}` → `{src,tests,benches}` (git-tracked moves,
  history preserved).
- The two manifests merged into one root `Cargo.toml`: `[workspace.*]` tables
  inlined as `[package]`/`[dependencies]`/`[dev-dependencies]`, dep versions
  resolved from `[workspace.dependencies]`. Dropped the unused `async-nats`
  workspace dep (pulled transitively via slipstream; never named directly).

Rigor preserved exactly — same `[lints.rust]` (forbid unsafe, deny
unused_must_use) and `[lints.clippy]` panic-surface denies (unwrap/expect/panic/
todo/unimplemented), same `[profile.release]` overflow-checks. `[lints]` at the
package level still binds every target (lib/bin/tests/benches), so the bin-root
gap stays closed.

`mise check:rs` drops the now-meaningless `--workspace`; CI/ci.yml comments
updated `[workspace.lints]` → `[lints]`.

All CI steps pass locally (RUSTFLAGS=-D warnings): dprint check, cargo fmt
--check, clippy --all-targets -D warnings, 68 unit + 18 e2e (10 smoke ignored),
release build. Live Anthropic smoke green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jaredLunde merged commit 85cbfe5 into main Jun 1, 2026
1 check passed
jaredLunde deleted the feat/ai-gateway branch June 1, 2026 00:09