
Helicone AI Gateway


The fastest, lightest, and easiest-to-integrate AI Gateway on the market.

Built by the team at Helicone, open-sourced for the community.

πŸš€ Quick Start β€’ πŸ“– Docs β€’ πŸ’¬ Discord β€’ 🌐 Website


πŸš† 1 API. 100+ models.

Open-source, lightweight, and built on Rust.

Handle hundreds of models and millions of LLM requests with minimal latency and maximum reliability.

The NGINX of LLMs.


πŸ‘©πŸ»β€πŸ’» Set up in seconds

  1. Set up your .env file with your provider API keys:
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
  2. Run locally in your terminal:
npx @helicone/ai-gateway@latest
  3. Make your requests using any OpenAI SDK:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/ai",
    api_key="placeholder-api-key" # Gateway handles API keys
)

# Route to any LLM provider through the same interface; the gateway handles the rest.
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet",  # Or any of 100+ other models
    messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}]
)
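
Because the gateway is OpenAI-compatible, you can also skip the SDK and call it over plain HTTP. A minimal curl sketch, assuming the standard OpenAI-style /chat/completions path under the /ai base URL:

curl http://localhost:8080/ai/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Hello from curl!"}]
  }'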

That's it. No new SDKs to learn, no integrations to maintain. Fully featured and open source.

For advanced configuration, check out our configuration guide and the providers we support.


Why Helicone AI Gateway?

🌐 Unified interface

Request any LLM provider using familiar OpenAI syntax. Stop rewriting integrationsβ€”use one API for OpenAI, Anthropic, Google, AWS Bedrock, and 20+ more providers.
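
For example, switching providers is only a change to the model string. A quick sketch reusing the quick-start client from above:

# Same client and call shape; only the model string changes.
openai_reply = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
anthropic_reply = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
)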

⚑ Smart provider selection

Load balance to always hit the fastest, cheapest, or most reliable option. Built-in strategies include latency-based P2C + PeakEWMA, weighted distribution, and cost optimization. Always aware of provider uptime and rate limits.
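
A minimal sketch of the latency strategy, using the same keys as the sample config.yaml further down (other strategies live under the same load-balance block; see the configuration guide for their exact options):

routers:
  fast:
    load-balance:
      chat:
        strategy: latency   # P2C + PeakEWMA: route to the fastest healthy target
        targets:
          - openai
          - anthropic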

πŸ’° Control your spending

Rate limit to prevent runaway costs and usage abuse. Set limits per user, team, or globally with support for request counts, token usage, and dollar amounts.
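
As a sketch, here is the per-API-key limiter from the sample config below; per-user, per-team, and global scopes follow the same shape, though their exact keys are assumptions here and best checked against the configuration guide:

routers:
  my-router:
    rate-limit:
      per-api-key:
        capacity: 1000
        refill-frequency: 1m # token bucket: at most 1000 requests per minute per key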

πŸš€ Improve performance

Cache responses to reduce costs and latency by up to 95%. Supports Redis and S3 backends with intelligent cache invalidation.
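
A minimal caching sketch using the directives from the sample config below; the in-memory store is the simplest option, with Redis and S3 available as backends (see the configuration guide for their exact settings):

cache-store:
  in-memory: {} # swap for a Redis or S3 backend in production

global:
  cache:
    directive: "max-age=3600, max-stale=1800" # serve cached responses for up to an hour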

πŸ“Š Simplified tracing

Monitor performance and debug issues with built-in Helicone integration, plus OpenTelemetry support for logs, metrics, and traces.

☁️ One-click deployment

Deploy to your own infrastructure in seconds using our Docker image or a binary download; see our deployment guides.
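
For instance, a Docker sketch; the image name and tag here are assumptions, so check the deployment guides for the published coordinates:

docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  helicone/ai-gateway:latest # image name/tag assumed; see the deployment guides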


⚑ Scalable for production

| Metric       | Helicone AI Gateway | Typical Setup |
|--------------|---------------------|---------------|
| P95 Latency  | <10ms               | ~60-100ms     |
| Memory Usage | ~64MB               | ~512MB        |
| Requests/sec | ~2,000              | ~500          |
| Binary Size  | ~15MB               | ~200MB        |
| Cold Start   | ~100ms              | ~2s           |

Note: These are preliminary performance metrics. See benchmarks/README.md for detailed benchmarking methodology and results.


πŸŽ₯ Demo

(Demo video: AI.Gateway.Demo.mp4)

πŸ—οΈ How it works

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Your App      │───▢│ Helicone AI     │───▢│  LLM Providers  β”‚
β”‚                 β”‚    β”‚ Gateway         β”‚    β”‚                 β”‚
β”‚ OpenAI SDK      β”‚    β”‚                 β”‚    β”‚ β€’ OpenAI        β”‚
β”‚ (any language)  β”‚    β”‚ β€’ Load Balance  β”‚    β”‚ β€’ Anthropic     β”‚
β”‚                 β”‚    β”‚ β€’ Rate Limit    β”‚    β”‚ β€’ AWS Bedrock   β”‚
β”‚                 β”‚    β”‚ β€’ Cache         β”‚    β”‚ β€’ Google Vertex β”‚
β”‚                 β”‚    β”‚ β€’ Trace         β”‚    β”‚ β€’ 20+ more      β”‚
β”‚                 β”‚    β”‚ β€’ Fallbacks     β”‚    β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                               β–Ό
                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                      β”‚ Helicone        β”‚
                      β”‚ Observability   β”‚
                      β”‚                 β”‚
                      β”‚ β€’ Dashboard     β”‚
                      β”‚ β€’ Monitoring    β”‚
                      β”‚ β€’ Debugging     β”‚
                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

βš™οΈ Custom configuration

1. Set up your environment variables

Include your PROVIDER_API_KEYs in your .env file.

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
HELICONE_API_KEY=sk-...

2. Customize your config file

Note: This is a sample config.yaml file. Please refer to our configuration guide for the full list of options, examples, and defaults. See our full provider list here.

helicone: # Include your HELICONE_API_KEY in your .env file
  observability: true
  authentication: true

cache-store:
  in-memory: {}

global: # Global settings for all routers
  cache:
    directive: "max-age=3600, max-stale=1800"

routers:
  your-router-name: # Single router configuration
    load-balance:
      chat:
        strategy: latency
        targets:
          - openai
          - anthropic
    rate-limit:
      per-api-key:
        capacity: 1000
        refill-frequency: 1m # 1000 requests per minute

3. Run with your custom configuration

npx @helicone/ai-gateway@latest --config config.yaml

4. Make your requests

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/router/your-router-name",
    api_key="placeholder-api-key" # Gateway handles API keys
)

# Route to any LLM provider through the same interface; the gateway handles the rest.
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet",  # Or any of 100+ other models
    messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}]
)

πŸ“š Migration guide

From OpenAI (Python)

from openai import OpenAI

client = OpenAI(
-   api_key=os.getenv("OPENAI_API_KEY")
+   api_key="placeholder-api-key",  # Gateway handles API keys
+   base_url="http://localhost:8080/router/your-router-name"
)

# No other changes needed!
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

From OpenAI (TypeScript)

import { OpenAI } from "openai";

const client = new OpenAI({
-   apiKey: process.env.OPENAI_API_KEY,
+   apiKey: "placeholder-api-key", // Gateway handles API keys
+   baseURL: "http://localhost:8080/router/your-router-name",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello from Helicone AI Gateway!" }],
});

πŸ“š Resources

Documentation

Community

Support


πŸ“„ License

The Helicone AI Gateway is licensed under the Apache License; see the LICENSE file for details.


Made with ❀️ by Helicone.

Website β€’ Docs β€’ Twitter β€’ Discord
