Skip to content
nexus edited this page May 28, 2026 · 2 revisions

Nexus Gateway

Nexus Gateway is an enterprise AI traffic gateway. It sits between every application or endpoint that calls a large language model and the LLM provider itself, and runs that traffic through one compliance engine, one audit pipeline, and one control plane. A single organization can apply unified policy, observability, cost control, and access management across every way its people and applications consume AI.

Three intercept layers

Nexus intercepts LLM traffic at three independent layers. Each layer runs the full compliance pipeline on its own traffic and sends it directly to the upstream provider.

Layer Where it intercepts
AI Gateway The SDK layer — virtual keys on /v1/chat/completions, /v1/responses, /v1/embeddings, and /v1/messages.
Compliance Proxy The network layer — a transparent TLS bump over HTTPS CONNECT.
Desktop Agent The operating-system layer — packet-level capture on macOS, Linux, and Windows endpoints.

When network policy routes Desktop Agent traffic out through the Compliance Proxy, the Agent stamps a signed attestation header on each request. The proxy verifies it and passes the connection through without re-inspecting it or recording a duplicate audit event, because the Agent already ran the pipeline.

What Nexus does

  • Provider abstraction. Applications speak the OpenAI SDK shape. Nexus normalizes each request to a canonical shape and translates it to the wire format of the actual provider. Twenty in-tree adapters ship today — eleven with full bidirectional translation (OpenAI, Anthropic, Gemini, Vertex, Azure, Bedrock, Cohere, MiniMax, GLM, Replicate, Voyage) and nine OpenAI-compatible passthrough providers (DeepSeek, Moonshot, Mistral, Groq, Fireworks, Together, Perplexity, xAI, Hugging Face). Function calls, vision input, structured outputs, and reasoning tokens survive the translation.
  • Multi-tier cache. An exact-match response cache, a semantic vector cache, provider-native cached-token accounting, and single-flight folding of concurrent identical prompts into one upstream call.
  • Cost and quota control. Multi-axis quotas (per organization, virtual key, provider, or model), token- or dollar-based budgets, soft limits that fire alerts and hard limits that reject with HTTP 429, and seven routing strategies: single, fallback, load-balance, conditional, A/B split, policy, and smart.
  • Compliance pipeline. PII detection, data classification, keyword filtering, content safety, rate limiting, IP allowlists, request-size validation, webhook forwarders, per-stage audit, body capture, SIEM forwarding, a three-tier kill switch, and emergency passthrough.
  • Enterprise governance. IAM with role- and attribute-based access control, virtual keys scoped per model, OIDC federation with just-in-time user provisioning, an organization and project hierarchy with per-organization quota, a credential vault with AES-256-GCM encryption and key rotation, and Desktop Agent fleet management with config sync and out-of-sync detection.

Architecture at a glance

Five Go services and one React console make up Nexus.

Component Purpose
Nexus Hub Node registry, target-config store, config sync, scheduled jobs, agent certificate authority, SIEM bridge.
Control Plane Admin API, IAM, single sign-on, analytics.
AI Gateway /v1 AI traffic, provider adapters, routing, quota.
Compliance Proxy HTTPS CONNECT, MITM TLS bump, compliance pipeline.
Desktop Agent Endpoint traffic interception on macOS, Linux, and Windows.
Control Plane UI React admin dashboard.

The four backend services register with the Nexus Hub as nodes and pull their configuration from it on boot and whenever it signals a change; the Hub never pushes full state. Durable state lives in PostgreSQL. Valkey backs sessions, the caches, rate limiting, and quota counters. NATS JetStream carries event streams and Hub coordination.

Who uses Nexus

  • Administrator / Compliance Officer — defines policy: providers and credentials, routing rules, virtual keys, hooks and rule packs, quotas, identity providers, IAM policies, and alerts.
  • Fleet Operator — runs the Infrastructure surface: nodes, config sync, scheduled jobs, the kill switch, error and crash reporting, proxy rollout, agent setup, observability, and SIEM.
  • Developer / Application Owner — holds a personal virtual key and calls the AI Gateway on /v1.
  • Endpoint User — has AI traffic intercepted by the Compliance Proxy or the Desktop Agent, with no direct Control Plane interaction.

See also

Clone this wiki locally