A high-performance Rust-based proxy API that streams content from Google's Gemini API with intelligent chunking and load balancing across multiple API keys.
- Streaming Response Processing: Receives streaming data from Gemini's `streamGenerateContent` endpoint
- Intelligent Chunking: Accumulates and sends responses in configurable token-based chunks
- Load Balancing: Round-robin distribution across multiple Gemini API keys
- Authentication: Password-based API protection
- Docker Support: Containerized with multi-stage builds for optimal size
- Self-contained: Uses rustls for TLS, no system SSL dependencies
```
Client → Rust Proxy (Port 1231) → Gemini API (Streaming)
                  ↓
    Chunk Processing & Load Balancing
                  ↓
      Accumulated Response Chunks
```
| Variable | Default | Description |
|---|---|---|
| `MODEL_ID` | `gemini-2.5-flash` | Gemini model to use |
| `chunk_token` | `50` | Token threshold for sending chunks |
| `api_password` | `kkb` | API authentication password |
| `GEMINI_API_KEY_LIST` | Required | Comma-separated list of Gemini API keys |
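The variables in the table above could be loaded with defaults roughly as follows. This is an illustrative sketch using the standard library, not the project's actual config type; the env var names come from the table.

```rust
use std::env;

/// Illustrative config struct matching the documented environment variables.
#[derive(Debug)]
struct Config {
    model_id: String,
    chunk_token: u32,
    api_password: String,
    api_keys: Vec<String>,
}

impl Config {
    fn from_env() -> Result<Self, String> {
        // GEMINI_API_KEY_LIST is the only required variable.
        let keys = env::var("GEMINI_API_KEY_LIST")
            .map_err(|_| "GEMINI_API_KEY_LIST is required".to_string())?;
        Ok(Config {
            model_id: env::var("MODEL_ID").unwrap_or_else(|_| "gemini-2.5-flash".into()),
            chunk_token: env::var("chunk_token")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(50),
            api_password: env::var("api_password").unwrap_or_else(|_| "kkb".into()),
            api_keys: keys.split(',').map(|k| k.trim().to_string()).collect(),
        })
    }
}
```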
The proxy accumulates streaming responses and sends complete chunks based on:
- Token Count: when `totalTokenCount >= (chunk_number * chunk_token)`
- Completion: when `finishReason` is `"STOP"`

Example with `chunk_token=50`:
- Chunk 1: sent when tokens ≥ 50
- Chunk 2: sent when tokens ≥ 100
- Final: sent when `finishReason` is `"STOP"`
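The emission rule above can be sketched as a single predicate. This is illustrative, not the project's actual code; the parameter names mirror the config table and the streaming fields described above.

```rust
/// Decide whether the accumulated response should be emitted as a chunk:
/// either the token count has crossed the next threshold
/// (chunk_number * chunk_token), or the stream has finished with "STOP".
fn should_emit(
    total_tokens: u32,
    chunk_number: u32,
    chunk_token: u32,
    finish_reason: Option<&str>,
) -> bool {
    finish_reason == Some("STOP") || total_tokens >= chunk_number * chunk_token
}
```

With `chunk_token = 50`: 49 tokens do not trigger chunk 1, 65 tokens do, but 65 tokens are not yet enough for chunk 2 (threshold 100); a `"STOP"` finish reason always emits the final chunk.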
POST http://localhost:1231/generate?password=kkb

```json
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "What are you good at in a clear way?"
        }
      ]
    }
  ],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingBudget": 0
    }
  }
}
```

Example response:

```json
[
  {
    "content": "Complete accumulated response text...",
    "is_complete": false,
    "chunk_number": 1,
    "total_tokens": 65
  },
  {
    "content": "Complete accumulated response text including previous chunks...",
    "is_complete": true,
    "chunk_number": 2,
    "total_tokens": 125
  }
]
```

- Rust 1.70+
- Docker & Docker Compose
```bash
# Set environment variables
export GEMINI_API_KEY_LIST="key1,key2,key3"
export MODEL_ID="gemini-2.5-flash"
export chunk_token=50
export api_password="kkb"

# Run locally
cargo run
```

- Update docker-compose.yml with your API keys:
```yaml
environment:
  - GEMINI_API_KEY_LIST=your_actual_key_1,your_actual_key_2
```

- Build and run:

```bash
docker-compose up -d
```

- Test the API:
```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "Hello, what can you do?"}]
      }
    ]
  }' \
  "http://localhost:1231/generate?password=kkb"
```

The proxy automatically rotates through the provided API keys using atomic counters:
- Key 1 → Request 1, 4, 7, ...
- Key 2 → Request 2, 5, 8, ...
- Key 3 → Request 3, 6, 9, ...
This ensures even distribution and helps avoid rate limiting.
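The rotation described above can be sketched with a lock-free atomic counter. The `KeyPool` type here is illustrative, not the project's actual code: `fetch_add` hands each request a unique sequence number, and the modulo maps it onto the key list in round-robin order.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Illustrative round-robin key pool backed by an atomic counter,
/// so concurrent requests never need a lock to pick a key.
struct KeyPool {
    keys: Vec<String>,
    counter: AtomicUsize,
}

impl KeyPool {
    fn new(keys: Vec<String>) -> Self {
        KeyPool { keys, counter: AtomicUsize::new(0) }
    }

    /// Return the next key in round-robin order.
    fn next_key(&self) -> &str {
        // fetch_add atomically increments and returns the previous value,
        // giving each caller a distinct index even under concurrency.
        let i = self.counter.fetch_add(1, Ordering::Relaxed);
        &self.keys[i % self.keys.len()]
    }
}
```

With three keys, four successive calls yield key 1, key 2, key 3, then key 1 again, matching the distribution shown above.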
- Authentication: All requests require the `api_password`
- Non-root Container: Runs as an unprivileged user
- TLS: Uses rustls for secure HTTPS connections
- No Disk I/O: All request processing happens in memory
- Memory-only Processing: No temporary files created
- Streaming Architecture: Immediate response processing
- Connection Reuse: HTTP client with connection pooling
- Minimal Docker Image: Multi-stage build with debian:slim base
- Zero-copy: Efficient byte buffer handling
- Invalid API Keys: Returns 500 with error details
- Authentication Failure: Returns 401 Unauthorized
- Gemini API Errors: Proxies original status codes
- Network Issues: Automatic timeout handling (300s)
The application provides structured logging with:
- Request tracing with key rotation info
- Token count and chunk generation details
- Error details for debugging
- Performance metrics
- API Key Management: Use environment variables or secrets management
- Rate Limiting: Monitor Gemini API quotas across keys
- Health Checks: Built-in health endpoint for load balancers
- Monitoring: Structured logs for observability platforms
- Scaling: Stateless design allows horizontal scaling