This is the fastest way to get started.
- Download the latest release from the GitHub Releases page.
- Unzip the downloaded file.
- Double-click `setup_env.bat`. A window will open to help you add your API keys. Follow the on-screen instructions.
- Double-click `proxy_app.exe`. This will start the proxy server.
Your proxy is now running! You can now use it in your applications.
This project provides a powerful solution for developers building complex applications, such as agentic systems, that interact with multiple Large Language Model (LLM) providers. It consists of two distinct but complementary components:
- A Universal API Proxy: A self-hosted FastAPI application that provides a single, OpenAI-compatible endpoint for all your LLM requests. Powered by `litellm`, it allows you to seamlessly switch between different providers and models without altering your application's code.
- A Resilience & Key Management Library: The core engine that powers the proxy. This reusable Python library intelligently manages a pool of API keys to ensure your application is highly available and resilient to transient provider errors or performance issues.
- Universal API Endpoint: Simplifies development by providing a single, OpenAI-compatible interface for diverse LLM providers.
- High Availability: The underlying library ensures your application remains operational by gracefully handling transient provider errors and API key-specific issues.
- Resilient Performance: A global timeout on all requests prevents your application from hanging on unresponsive provider APIs.
- Efficient Concurrency: Maximizes throughput by allowing a single API key to handle multiple concurrent requests to different models.
- Intelligent Key Management: Optimizes request distribution across your pool of keys by selecting the best available one for each call.
- Escalating Per-Model Cooldowns: If a key fails for a specific model, it's placed on a temporary, escalating cooldown for that model, allowing it to be used with others (see the sketch after this list).
- Automatic Daily Resets: Cooldowns and usage statistics are automatically reset daily, making the system self-maintaining.
- Detailed Request Logging: Enable comprehensive logging for debugging. Each request gets its own directory with full request/response details, streaming chunks, and performance metadata.
- Provider Agnostic: Compatible with any provider supported by `litellm`.
- OpenAI-Compatible Proxy: Offers a familiar API interface with additional endpoints for model and provider discovery.
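
To make the cooldown behavior concrete, here is a minimal, illustrative sketch of an escalating per-model cooldown with a daily reset. It is not the library's actual implementation; the durations and the `CooldownTracker` name are invented for illustration.

```python
import time

# Hypothetical escalation schedule (seconds); the real library chooses its own durations.
ESCALATION_STEPS = [60, 300, 1800]

class CooldownTracker:
    """Toy model of per-(key, model) cooldowns with a daily reset."""

    def __init__(self):
        self._state = {}  # (api_key, model) -> {"failures": int, "until": float}

    def record_failure(self, api_key: str, model: str) -> None:
        entry = self._state.setdefault((api_key, model), {"failures": 0, "until": 0.0})
        entry["failures"] += 1
        step = min(entry["failures"], len(ESCALATION_STEPS)) - 1
        entry["until"] = time.time() + ESCALATION_STEPS[step]  # escalate with each failure

    def is_available(self, api_key: str, model: str) -> bool:
        entry = self._state.get((api_key, model))
        return entry is None or time.time() >= entry["until"]

    def daily_reset(self) -> None:
        # Mirrors the automatic daily reset of cooldowns and usage statistics.
        self._state.clear()

tracker = CooldownTracker()
tracker.record_failure("GEMINI_KEY_1", "gemini/gemini-2.5-flash")
print(tracker.is_available("GEMINI_KEY_1", "gemini/gemini-2.5-flash"))  # False: cooling down for this model
print(tracker.is_available("GEMINI_KEY_1", "gemini/some-other-model"))  # True: other models unaffected
```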
This is the fastest way to get started for most users on Windows.
- Download the latest release from the GitHub Releases page.
- Unzip the downloaded file.
- Run `setup_env.bat`. A window will open to help you add your API keys. Follow the on-screen instructions.
- Run `proxy_app.exe`. This will start the proxy server in a new terminal window.
Your proxy is now running and ready to use at `http://127.0.0.1:8000`.
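
To confirm the proxy is up, you can query its `/v1/models` endpoint with your `PROXY_API_KEY`. A minimal check using only the Python standard library, assuming the default port and the key you entered during setup:

```python
import json
import urllib.request

# Replace with the PROXY_API_KEY you configured during setup.
PROXY_KEY = "a-very-secret-and-unique-key"

req = urllib.request.Request(
    "http://127.0.0.1:8000/v1/models",
    headers={"Authorization": f"Bearer {PROXY_KEY}"},
)
with urllib.request.urlopen(req) as resp:
    print(json.dumps(json.load(resp), indent=2))  # lists models from your configured providers
```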
This guide is for users who want to run the proxy from the source code on any operating system.
First, clone the repository and install the required dependencies into a virtual environment.
Linux/macOS:
```bash
# Clone the repository
git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git
cd LLM-API-Key-Proxy

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```
Windows:
```powershell
# Clone the repository
git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git
cd LLM-API-Key-Proxy

# Create and activate a virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1

# Install dependencies
pip install -r requirements.txt
```
Create a `.env` file to store your secret keys. You can do this by copying the example file.
Linux/macOS:
```bash
cp .env.example .env
```
Windows:
```powershell
copy .env.example .env
```
Now, open the new `.env` file and add your keys.

Refer to the `.env.example` file for the correct format and a full list of supported providers.

- `PROXY_API_KEY`: This is a secret key you create. It is used to authorize requests to your proxy, preventing unauthorized use.
- Provider Keys: These are the API keys you get from LLM providers (like Gemini, OpenAI, etc.). The proxy automatically finds them based on their name (e.g., `GEMINI_API_KEY_1`).
Example `.env` configuration:

```env
# A secret key for your proxy server to authenticate requests.
# This can be any secret string you choose.
PROXY_API_KEY="a-very-secret-and-unique-key"

# --- Provider API Keys ---
# Add your keys from various providers below.
# You can add multiple keys for one provider by numbering them (e.g., _1, _2).
GEMINI_API_KEY_1="YOUR_GEMINI_API_KEY_1"
GEMINI_API_KEY_2="YOUR_GEMINI_API_KEY_2"
OPENROUTER_API_KEY_1="YOUR_OPENROUTER_API_KEY_1"
NVIDIA_NIM_API_KEY_1="YOUR_NVIDIA_NIM_API_KEY_1"
CHUTES_API_KEY_1="YOUR_CHUTES_API_KEY_1"
```
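
The numbered-key convention above can be illustrated with a short sketch that scans the environment for `PROVIDER_API_KEY_N` style variables. This is illustrative only and is not the proxy's actual loader, which may behave differently:

```python
import os
import re
from collections import defaultdict

# Matches names like GEMINI_API_KEY_1 or NVIDIA_NIM_API_KEY_2 (but not PROXY_API_KEY).
KEY_PATTERN = re.compile(r"^([A-Z0-9_]+)_API_KEY_(\d+)$")

def collect_provider_keys(env):
    """Group numbered provider keys into per-provider pools."""
    pools = defaultdict(list)
    for name, value in env.items():
        match = KEY_PATTERN.match(name)
        if match and value:
            pools[match.group(1).lower()].append(value)  # e.g. "gemini", "openrouter"
    return dict(pools)

if __name__ == "__main__":
    print(collect_provider_keys(dict(os.environ)))
```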
You can run the proxy in two ways:
A) Using the Compiled Executable (Recommended)
A pre-compiled, standalone executable for Windows is available on the latest GitHub Release. This is the easiest way to get started, as it requires no Python environment setup.
For the simplest experience, follow the Easy Setup for Beginners guide at the top of this document.
B) Running from Source
Start the server by running the `main.py` script directly.

```bash
python src/proxy_app/main.py
```
The proxy is now running and available at `http://127.0.0.1:8000`.

You can now send requests to the proxy. The endpoint is `http://127.0.0.1:8000/v1/chat/completions`.
Remember to:
- Set the `Authorization` header to `Bearer your-super-secret-proxy-key`.
- Specify the `model` in the format `provider/model_name`.
Here is an example using `curl`:

```bash
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-super-secret-proxy-key" \
  -d '{
    "model": "gemini/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
```
The proxy is OpenAI-compatible, so you can use it directly with the `openai` Python client.

```python
import openai

# Point the client to your local proxy
client = openai.OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="a-very-secret-and-unique-key"  # Use your PROXY_API_KEY here
)

# Make a request
response = client.chat.completions.create(
    model="gemini/gemini-2.5-flash",  # Specify provider and model
    messages=[
        {"role": "user", "content": "Write a short poem about space."}
    ]
)

print(response.choices[0].message.content)
```
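
Streaming works through the same client. A brief sketch reusing the `client` from the example above (assuming the chosen provider/model supports streaming):

```python
# Reuses the `client` object created above.
stream = client.chat.completions.create(
    model="gemini/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Write a short poem about space."}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a small delta of the response text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```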
You can also send requests directly using tools like `curl`.
```bash
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer a-very-secret-and-unique-key" \
  -d '{
    "model": "gemini/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
```
- `POST /v1/chat/completions`: The main endpoint for making chat requests.
- `POST /v1/embeddings`: The endpoint for creating embeddings.
- `GET /v1/models`: Returns a list of all available models from your configured providers.
- `GET /v1/providers`: Returns a list of all configured providers.
- `POST /v1/token-count`: Calculates the token count for a given message payload.
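
A hedged example of calling the discovery and utility endpoints with the standard library. The `/v1/token-count` payload shape shown here (a `model` plus `messages`, mirroring a chat request) is an assumption; adjust it if the proxy expects a different body:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"
HEADERS = {
    "Authorization": "Bearer a-very-secret-and-unique-key",  # your PROXY_API_KEY
    "Content-Type": "application/json",
}

def get(path):
    req = urllib.request.Request(f"{BASE_URL}{path}", headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def post(path, payload):
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(f"{BASE_URL}{path}", data=data, headers=HEADERS, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(get("/v1/providers"))              # configured providers
print(get("/v1/models"))                 # available models
print(post("/v1/token-count", {          # payload shape is an assumption
    "model": "gemini/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "How many tokens is this?"}],
}))
```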
When a request is made to the proxy, the application uses its core resilience library to ensure the request is handled reliably:
- Selects an Optimal Key: The `UsageManager` selects the best available key from your pool. It uses a tiered locking strategy to find a healthy, available key, prioritizing those with the least recent usage. This allows for concurrent requests to different models using the same key, maximizing efficiency.
- Makes the Request: The proxy uses the acquired key to make the API call to the target provider via `litellm`.
- Manages Errors Gracefully:
  - It uses a `classify_error` function to determine the failure type.
  - For transient server errors, it retries the request with the same key using exponential backoff.
  - For key-specific issues (e.g., authentication or provider-side limits), it temporarily places that key on a cooldown for the specific model and seamlessly retries the request with the next available key from the pool.
- Tracks Usage & Releases Key: On a successful request, it records usage stats. The key is then released back into the available pool, ready for the next request.
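
The flow above can be condensed into a simplified, self-contained sketch. It is not the library's real code: `call_provider` and the `classify_error` stand-in below are invented placeholders for the actual `UsageManager`/`litellm` machinery, and only the control flow (backoff retries, per-model cooldowns, key rotation) mirrors the description:

```python
import random
import time

def classify_error(exc):
    # Stand-in: the real classify_error distinguishes many more cases.
    return "transient" if isinstance(exc, TimeoutError) else "key_specific"

def call_provider(model, messages, api_key):
    # Stand-in for the provider call; fails randomly to exercise both error paths.
    roll = random.random()
    if roll < 0.15:
        raise TimeoutError("simulated transient provider error")
    if roll < 0.25:
        raise PermissionError("simulated key-specific error (auth / provider limit)")
    return {"model": model, "content": "ok", "key_used": api_key}

def resilient_completion(model, messages, keys, cooldowns, max_retries=3):
    for key in keys:
        if (key, model) in cooldowns:              # skip keys cooling down for this model
            continue
        backoff = 1.0
        for attempt in range(max_retries):
            try:
                return call_provider(model, messages, api_key=key)
            except Exception as exc:
                if classify_error(exc) == "transient" and attempt < max_retries - 1:
                    time.sleep(backoff)            # retry the same key with exponential backoff
                    backoff *= 2
                else:
                    cooldowns.add((key, model))    # per-model cooldown; move to the next key
                    break
    raise RuntimeError("All keys are on cooldown for this model")

print(resilient_completion(
    "gemini/gemini-2.5-flash",
    [{"role": "user", "content": "hi"}],
    keys=["KEY_1", "KEY_2"],
    cooldowns=set(),
))
```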
The proxy server can be configured at runtime using the following command-line arguments:
- `--host`: The IP address to bind the server to. Defaults to `0.0.0.0` (accessible from your local network).
- `--port`: The port to run the server on. Defaults to `8000`.
- `--enable-request-logging`: A flag to enable detailed, per-request logging. When active, the proxy creates a unique directory for each transaction in the `logs/detailed_logs/` folder, containing the full request, response, streaming chunks, and performance metadata. This is highly recommended for debugging.
Example:
```bash
python src/proxy_app/main.py --host 127.0.0.1 --port 9999 --enable-request-logging
```
For convenience on Windows, you can use the provided `.bat` scripts in the root directory to run the proxy with common configurations:

- `start_proxy.bat`: Starts the proxy on `0.0.0.0:8000` with default settings.
- `start_proxy_debug_logging.bat`: Starts the proxy and automatically enables request logging.
- `401 Unauthorized`: Ensure your `PROXY_API_KEY` is set correctly in the `.env` file and included in the `Authorization: Bearer <key>` header of your request.
- `500 Internal Server Error`: Check the console logs of the `uvicorn` server for detailed error messages. This could indicate an issue with one of your provider API keys (e.g., it's invalid or has been revoked) or a problem with the provider's service. If you have logging enabled (`--enable-request-logging`), inspect the `final_response.json` and `metadata.json` files in the corresponding log directory under `logs/detailed_logs/` for the specific error returned by the upstream provider.
- All keys on cooldown: If you see a message that all keys are on cooldown, it means all your keys for a specific provider have recently failed. If you have logging enabled (`--enable-request-logging`), check the `logs/detailed_logs/` directory to find the logs for the failed requests and inspect `final_response.json` to see the underlying error from the provider.
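
A small diagnostic script can help narrow down which of the cases above you are hitting (proxy unreachable, bad `PROXY_API_KEY`, or an upstream error). This is a generic sketch using the standard library; adjust the URL and key to your setup:

```python
import urllib.error
import urllib.request

PROXY_URL = "http://127.0.0.1:8000"
PROXY_KEY = "a-very-secret-and-unique-key"  # your PROXY_API_KEY

req = urllib.request.Request(
    f"{PROXY_URL}/v1/models",
    headers={"Authorization": f"Bearer {PROXY_KEY}"},
)
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        print("Proxy reachable, HTTP", resp.status)
except urllib.error.HTTPError as err:
    if err.code == 401:
        print("401 Unauthorized: check PROXY_API_KEY in .env and the Authorization header")
    else:
        print(f"HTTP {err.code}: check the uvicorn console or logs/detailed_logs/")
except urllib.error.URLError as err:
    print("Proxy unreachable:", err.reason)
```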
- Using the Library: For documentation on how to use the `api-key-manager` library directly in your own Python projects, please refer to its README.md.
- Technical Details: For a more in-depth technical explanation of the library's architecture, components, and internal workings, please refer to the Technical Documentation.