donequant/thinking-gemini-rs

Gemini Streaming Proxy API

A high-performance Rust-based proxy API that streams content from Google's Gemini API with intelligent chunking and load balancing across multiple API keys.

Features

  • Streaming Response Processing: Receives streaming data from Gemini's streamGenerateContent endpoint
  • Intelligent Chunking: Accumulates and sends responses in configurable token-based chunks
  • Load Balancing: Round-robin distribution across multiple Gemini API keys
  • Authentication: Password-based API protection
  • Docker Support: Containerized with multi-stage builds for optimal size
  • Self-contained: Uses rustls for TLS, no system SSL dependencies

Architecture

Client → Rust Proxy (Port 1231) → Gemini API (Streaming)
                ↓
         Chunk Processing & Load Balancing
                ↓
         Accumulated Response Chunks

Configuration

Environment Variables

Variable             Default            Description
-------------------  -----------------  ---------------------------------------
MODEL_ID             gemini-2.5-flash   Gemini model to use
chunk_token          50                 Token threshold for sending chunks
api_password         kkb                API authentication password
GEMINI_API_KEY_LIST  (required)         Comma-separated list of Gemini API keys
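
To make the defaults concrete, here is a minimal sketch of how these variables could be loaded. The `Config` struct and the `config_from` / `config_from_env` functions are hypothetical names used for illustration, not taken from the project's source. Trimming whitespace and skipping empty entries in the key list are assumptions too.

```rust
use std::env;

/// Runtime settings mirroring the environment variables listed above.
/// (Hypothetical struct; the project's real type may differ.)
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub model_id: String,
    pub chunk_token: u32,
    pub api_password: String,
    pub api_keys: Vec<String>,
}

/// Build a Config from a lookup function, so the parsing can be exercised
/// without touching the real process environment.
pub fn config_from(lookup: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    // GEMINI_API_KEY_LIST has no default: fail if it is missing.
    let api_keys: Vec<String> = lookup("GEMINI_API_KEY_LIST")
        .ok_or("GEMINI_API_KEY_LIST is required")?
        .split(',')
        .map(|key| key.trim().to_string())
        .filter(|key| !key.is_empty()) // assumption: ignore empty entries
        .collect();
    if api_keys.is_empty() {
        return Err("GEMINI_API_KEY_LIST contains no keys".to_string());
    }

    // chunk_token falls back to 50 when unset, but a bad value is an error.
    let chunk_token = match lookup("chunk_token") {
        Some(raw) => raw
            .trim()
            .parse::<u32>()
            .map_err(|e| format!("invalid chunk_token: {e}"))?,
        None => 50,
    };

    Ok(Config {
        model_id: lookup("MODEL_ID").unwrap_or_else(|| "gemini-2.5-flash".to_string()),
        chunk_token,
        api_password: lookup("api_password").unwrap_or_else(|| "kkb".to_string()),
        api_keys,
    })
}

/// Convenience wrapper that reads the real environment.
pub fn config_from_env() -> Result<Config, String> {
    config_from(|name| env::var(name).ok())
}
```

Passing the lookup as a closure keeps the parsing logic independent of global state, which makes the defaults easy to check in a unit test.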

Chunking Logic

The proxy accumulates the streamed response and sends a chunk when either condition is met:

  1. Token Count: When totalTokenCount >= (chunk_number * chunk_token)
  2. Completion: When finishReason is "STOP"

Example with chunk_token=50:

  • Chunk 1: Send when tokens ≥ 50
  • Chunk 2: Send when tokens ≥ 100
  • Final: Send when finishReason: "STOP"
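
The two conditions above can be sketched as a single predicate. The function name `should_send_chunk` and its signature are hypothetical. The sketch assumes `chunk_number` is 1-based and refers to the chunk about to be sent.

```rust
/// Decide whether the accumulated text should be sent as the next chunk.
///
/// `chunk_number` is the 1-based number of the chunk about to be sent, so the
/// threshold grows by `chunk_token` after every chunk (50, 100, 150, ...).
pub fn should_send_chunk(
    total_token_count: u32,
    chunk_number: u32,
    chunk_token: u32,
    finish_reason: Option<&str>,
) -> bool {
    // saturating_mul avoids an overflow panic for very large inputs.
    let threshold = chunk_number.saturating_mul(chunk_token);
    finish_reason == Some("STOP") || total_token_count >= threshold
}
```

With chunk_token=50, a stream that reports 49 tokens sends nothing, 50 tokens triggers chunk 1, and a "STOP" finish reason triggers the final chunk regardless of the count.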

API Usage

Endpoint

POST http://localhost:1231/generate?password=kkb

Request Format

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "What are you good at in a clear way?"
        }
      ]
    }
  ],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingBudget": 0
    }
  }
}

Response Format

[
  {
    "content": "Complete accumulated response text...",
    "is_complete": false,
    "chunk_number": 1,
    "total_tokens": 65
  },
  {
    "content": "Complete accumulated response text including previous chunks...",
    "is_complete": true,
    "chunk_number": 2,
    "total_tokens": 125
  }
]
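
Each chunk carries the full text accumulated so far, not just the newest part. The sketch below shows one way that could work. `Chunk` mirrors the response fields above, but `Accumulator` and its `push` method are hypothetical and not taken from the project's source. In particular, emitting at most one chunk per streamed event is an assumption.

```rust
/// One element of the response array shown above.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub content: String,
    pub is_complete: bool,
    pub chunk_number: u32,
    pub total_tokens: u32,
}

/// Collects streamed text and emits cumulative chunks (hypothetical helper).
pub struct Accumulator {
    chunk_token: u32,
    next_chunk: u32,
    text: String,
}

impl Accumulator {
    pub fn new(chunk_token: u32) -> Self {
        Self { chunk_token, next_chunk: 1, text: String::new() }
    }

    /// Feed one streamed event: the new text, the running token count, and
    /// whether the stream finished. Returns a chunk when one is due.
    pub fn push(&mut self, delta: &str, total_tokens: u32, finished: bool) -> Option<Chunk> {
        self.text.push_str(delta);
        let threshold = self.next_chunk.saturating_mul(self.chunk_token);
        if !finished && total_tokens < threshold {
            return None; // keep accumulating
        }
        let chunk = Chunk {
            content: self.text.clone(), // cumulative, includes earlier chunks
            is_complete: finished,
            chunk_number: self.next_chunk,
            total_tokens,
        };
        self.next_chunk += 1;
        Some(chunk)
    }
}
```

Feeding token counts of 30, 65, 90 and 125 (the last one finished) with chunk_token=50 yields two chunks with total_tokens 65 and 125, matching the shape of the example response.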

Development

Prerequisites

  • Rust 1.70+
  • Docker & Docker Compose

Local Development

# Set environment variables
export GEMINI_API_KEY_LIST="key1,key2,key3"
export MODEL_ID="gemini-2.5-flash"
export chunk_token=50
export api_password="kkb"

# Run locally
cargo run

Docker Deployment

  1. Update docker-compose.yml with your API keys:
environment:
  - GEMINI_API_KEY_LIST=your_actual_key_1,your_actual_key_2
  2. Build and run:
docker-compose up -d
  3. Test the API:
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user", 
        "parts": [{"text": "Hello, what can you do?"}]
      }
    ]
  }' \
  "http://localhost:1231/generate?password=kkb"

Load Balancing

The proxy automatically rotates through provided API keys using atomic counters:

  • Key 1 → Request 1, 4, 7, ...
  • Key 2 → Request 2, 5, 8, ...
  • Key 3 → Request 3, 6, 9, ...

This spreads requests evenly across the keys, which helps keep each key under its rate limit.
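
A minimal sketch of round-robin rotation with an atomic counter follows. The `KeyRotator` type and its methods are hypothetical names, and the project's own implementation may differ in detail.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hands out API keys in round-robin order; safe to share across threads.
pub struct KeyRotator {
    keys: Vec<String>,
    counter: AtomicUsize,
}

impl KeyRotator {
    /// Panics if `keys` is empty, since there would be nothing to rotate.
    pub fn new(keys: Vec<String>) -> Self {
        assert!(!keys.is_empty(), "at least one API key is required");
        Self { keys, counter: AtomicUsize::new(0) }
    }

    /// Returns the key for the next request.
    pub fn next_key(&self) -> &str {
        // fetch_add returns the previous value and wraps around on overflow,
        // so concurrent callers each observe a distinct counter value.
        let n = self.counter.fetch_add(1, Ordering::Relaxed);
        &self.keys[n % self.keys.len()]
    }
}
```

Relaxed ordering is sufficient in this sketch because the counter only selects an index and does not guard any other shared data.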

Security

  • Authentication: All requests require the api_password
  • Non-root Container: Runs as unprivileged user
  • TLS: Uses rustls for secure HTTPS connections
  • No Disk I/O: All request processing happens in memory
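
As an illustration of the password check, the sketch below looks for a `password` parameter in a raw query string such as `password=kkb`. The function `is_authorized` is hypothetical. The sketch does not percent-decode values and uses an ordinary string comparison rather than a constant-time one, and both simplifications are assumptions, not a description of the project's code.

```rust
/// Returns true when the raw query string (the part after `?`) contains
/// `password=<expected_password>`.
///
/// Simplifications in this sketch: no percent-decoding, and a plain
/// (non-constant-time) string comparison.
pub fn is_authorized(query: &str, expected_password: &str) -> bool {
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .any(|(name, value)| name == "password" && value == expected_password)
}
```

A request that fails this check would receive the 401 Unauthorized response described under Error Handling.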

Performance Optimizations

  1. Memory-only Processing: No temporary files created
  2. Streaming Architecture: Immediate response processing
  3. Connection Reuse: HTTP client with connection pooling
  4. Minimal Docker Image: Multi-stage build on a slim Debian base image
  5. Zero-copy: Efficient byte buffer handling

Error Handling

  • Invalid API Keys: Returns 500 with error details
  • Authentication Failure: Returns 401 Unauthorized
  • Gemini API Errors: Proxies original status codes
  • Network Issues: Requests time out automatically after 300 seconds
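
The mapping above can be sketched as an error enum with a status-code lookup. `ProxyError`, `status_code` and `UPSTREAM_TIMEOUT` are hypothetical names. The README does not say which status a timeout produces, so the 504 used here is an assumption.

```rust
use std::time::Duration;

/// Upstream request timeout stated above (300 seconds).
pub const UPSTREAM_TIMEOUT: Duration = Duration::from_secs(300);

/// Failure cases listed above (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
pub enum ProxyError {
    /// Missing or wrong api_password.
    Unauthorized,
    /// A configured Gemini API key was rejected or unusable.
    InvalidApiKey(String),
    /// Gemini returned an error; its status code is passed through.
    Upstream { status: u16, body: String },
    /// The upstream request exceeded UPSTREAM_TIMEOUT.
    Timeout,
}

/// HTTP status code the proxy would return for each error.
pub fn status_code(err: &ProxyError) -> u16 {
    match err {
        ProxyError::Unauthorized => 401,
        ProxyError::InvalidApiKey(_) => 500,
        ProxyError::Upstream { status, .. } => *status,
        ProxyError::Timeout => 504, // assumption: not specified in this README
    }
}
```

Keeping the mapping in one function means the handler can turn any failure into a response without repeating status codes at each call site.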

Logs

The application provides structured logging with:

  • Request tracing with key rotation info
  • Token count and chunk generation details
  • Error details for debugging
  • Performance metrics

Production Considerations

  1. API Key Management: Use environment variables or secrets management
  2. Rate Limiting: Monitor Gemini API quotas across keys
  3. Health Checks: Built-in health endpoint for load balancers
  4. Monitoring: Structured logs for observability platforms
  5. Scaling: Stateless design allows horizontal scaling
