The fastest, lightest, and easiest-to-integrate AI Gateway on the market.
Built by the team at Helicone, open-sourced for the community.
Quick Start • Docs • Discord • Website
Open-source, lightweight, and built on Rust.
Handle hundreds of models and millions of LLM requests with minimal latency and maximum reliability.
The NGINX of LLMs.
- Set up your `.env` file with your `PROVIDER_API_KEY`s:
```
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
```
- Run locally in your terminal:
```bash
npx @helicone/ai-gateway@latest
```
- Make your requests using any OpenAI SDK:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/ai",
    api_key="placeholder-api-key",  # the gateway handles provider API keys
)

# Route to any LLM provider through the same interface; the gateway handles the rest.
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet",  # or 100+ other models
    messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}],
)
```
That's it. No new SDKs to learn, no integrations to maintain. Fully-featured and open-sourced.
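Because every provider sits behind the same OpenAI-compatible surface, switching providers is just a change to the model string. A minimal sketch reusing the `client` from the quickstart above (model names follow the `provider/model` convention used throughout this README):

```python
# (continues from the quickstart client above)
# Same client, different provider: only the model string changes.
response = client.chat.completions.create(
    model="openai/gpt-4o",  # was "anthropic/claude-3-5-sonnet" above
    messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}],
)
```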
For advanced config, check out our configuration guide and the providers we support.
Request any LLM provider using familiar OpenAI syntax. Stop rewriting integrations: use one API for OpenAI, Anthropic, Google, AWS Bedrock, and 20+ more providers.
Load balance to always hit the fastest, cheapest, or most reliable option. Built-in strategies include latency-based P2C + PeakEWMA, weighted distribution, and cost optimization. Always aware of provider uptime and rate limits.
Rate limit to prevent runaway costs and usage abuse. Set limits per user, team, or globally, with support for request counts, token usage, and dollar amounts; a client-side handling sketch follows this list.
Cache responses to reduce costs and latency by up to 95%. Supports Redis and S3 backends with intelligent cache invalidation.
Monitor performance and debug issues with built-in Helicone integration, plus OpenTelemetry support for logs, metrics, and traces.
Deploy in seconds to your own infrastructure using our Docker image or prebuilt binary, following our deployment guides.
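When a configured rate limit is exceeded, over-limit requests are rejected. A minimal client-side handling sketch, assuming the gateway signals this with HTTP 429 (which the OpenAI Python SDK surfaces as `RateLimitError`):

```python
import time

import openai
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/ai", api_key="placeholder-api-key")

for attempt in range(3):
    try:
        response = client.chat.completions.create(
            model="anthropic/claude-3-5-sonnet",
            messages=[{"role": "user", "content": "Hello!"}],
        )
        break
    except openai.RateLimitError:
        # Assumption: the gateway rejects over-limit requests with HTTP 429,
        # which the OpenAI SDK raises as RateLimitError.
        time.sleep(2 ** attempt)  # simple exponential backoff: 1s, 2s, 4s
```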
| Metric | Helicone AI Gateway | Typical Setup |
|---|---|---|
| P95 Latency | <10ms | ~60-100ms |
| Memory Usage | ~64MB | ~512MB |
| Requests/sec | ~2,000 | ~500 |
| Binary Size | ~15MB | ~200MB |
| Cold Start | ~100ms | ~2s |
Note: These are preliminary performance metrics. See benchmarks/README.md for detailed benchmarking methodology and results.
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Your App     │────▶│   Helicone AI   │────▶│  LLM Providers  │
│                 │     │     Gateway     │     │                 │
│   OpenAI SDK    │     │                 │     │ • OpenAI        │
│ (any language)  │     │ • Load Balance  │     │ • Anthropic     │
│                 │     │ • Rate Limit    │     │ • AWS Bedrock   │
│                 │     │ • Cache         │     │ • Google Vertex │
│                 │     │ • Trace         │     │ • 20+ more      │
│                 │     │ • Fallbacks     │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │    Helicone     │
                        │  Observability  │
                        │                 │
                        │ • Dashboard     │
                        │ • Observability │
                        │ • Monitoring    │
                        │ • Debugging     │
                        └─────────────────┘
```
Include your `PROVIDER_API_KEY`s in your `.env` file:
```
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
HELICONE_API_KEY=sk-...
```
Note: This is a sample `config.yaml` file. Please refer to our configuration guide for the full list of options, examples, and defaults.
See our full provider list here.
```yaml
helicone: # Include your HELICONE_API_KEY in your .env file
  observability: true
  authentication: true

cache-store:
  in-memory: {}

global: # Global settings for all routers
  cache:
    directive: "max-age=3600, max-stale=1800"

routers:
  your-router-name: # Single router configuration
    load-balance:
      chat:
        strategy: latency
        targets:
          - openai
          - anthropic
    rate-limit:
      per-api-key:
        capacity: 1000
        refill-frequency: 1m # 1000 requests per minute
```
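The `capacity`/`refill-frequency` pair above reads like token-bucket semantics: 1,000 requests, refilled over one minute. An illustrative sketch of that behavior (not the gateway's actual implementation):

```python
import time

class TokenBucket:
    """Illustrative token bucket: `capacity` requests refilled over `refill_seconds`."""

    def __init__(self, capacity: int, refill_seconds: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_rate = capacity / refill_seconds  # tokens per second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=1000, refill_seconds=60)  # mirrors the config above
```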
```bash
npx @helicone/ai-gateway@latest --config config.yaml
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/router/your-router-name",
    api_key="placeholder-api-key",  # the gateway handles provider API keys
)

# Route to any LLM provider through the same interface; the gateway handles the rest.
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet",  # or 100+ other models
    messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}],
)
```
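Router names map one-to-one onto URL paths, so a single gateway can expose several isolated configurations side by side. A hypothetical sketch (`prod-router` and `dev-router` are placeholder names, not part of the sample config above):

```python
from openai import OpenAI

# Each router gets its own base_url; provider API keys stay on the gateway.
prod = OpenAI(base_url="http://localhost:8080/router/prod-router",
              api_key="placeholder-api-key")
dev = OpenAI(base_url="http://localhost:8080/router/dev-router",
             api_key="placeholder-api-key")
```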
```diff
from openai import OpenAI

client = OpenAI(
-    api_key=os.getenv("OPENAI_API_KEY"),
+    api_key="placeholder-api-key",  # the gateway handles provider API keys
+    base_url="http://localhost:8080/router/your-router-name",
)

# No other changes needed!
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
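For reference, the fully applied Python client after the diff above (a minimal runnable version):

```python
from openai import OpenAI

client = OpenAI(
    api_key="placeholder-api-key",  # the gateway handles provider API keys
    base_url="http://localhost:8080/router/your-router-name",
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```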
```diff
import { OpenAI } from "openai";

const client = new OpenAI({
-  apiKey: process.env.OPENAI_API_KEY,
+  apiKey: "placeholder-api-key", // the gateway handles provider API keys
+  baseURL: "http://localhost:8080/router/your-router-name",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello from Helicone AI Gateway!" }],
});
```
- Full Documentation - Complete guides and API reference
- Quickstart Guide - Get up and running in 1 minute
- Advanced Configurations - Configuration reference & examples
- Discord Server - Our community of passionate AI engineers
- GitHub Discussions - Q&A and feature requests
- Twitter - Latest updates and announcements
- Newsletter - Tips and tricks for deploying AI applications
- Report bugs: GitHub Issues
- Enterprise Support: Book a discovery call with our team
The Helicone AI Gateway is licensed under the Apache License; see the LICENSE file for details.
Made with ❤️ by Helicone.