A high-performance Rust proxy that connects OpenAI-compatible clients (like Cursor, VSCode) to Google Gemini (via Google AI Studio or Vertex AI).
You have two ways to authenticate:

- Option A (Personal, Recommended): A Google AI Studio API key for Gemini. This is created in Google AI Studio, not in "APIs & Services > Credentials".
- Option B (Enterprise / Production): A Google Cloud service account with the `Vertex AI User` role.
- Open Google AI Studio: https://aistudio.google.com/app/apikey
- Sign in with the Google account that owns your Gemini / Vertex trial.
- Click "Create API key" and choose or confirm a Cloud project.
- Copy the key that looks like `AIzaSy...`; this is what you use as `GOOGLE_API_KEY`.

If you have the $300 Vertex trial: just select that same project when creating the API key. The key is still created in AI Studio, but billing and quotas go through that GCP project.
Set the environment variable:

```bash
export GOOGLE_API_KEY="YOUR_API_KEY"  # Required
```

Or in your `.env` file:

```dotenv
GOOGLE_API_KEY=YOUR_API_KEY

# Optional: Auth for the proxy itself
APP_AUTH__REQUIRE_AUTH=true
APP_AUTH__MASTER_KEY=sk-your-secret-key
```

When an API key is present, the bridge talks to `generativelanguage.googleapis.com` (Google AI Studio Gemini API).
- Create a service account on GCP and grant it the `Vertex AI User` role.
- Download the JSON key. Recommended: store it in a secure location outside the project root:

```bash
mkdir -p ~/.config/fkllmproxy
mv service-account.json ~/.config/fkllmproxy/
chmod 600 ~/.config/fkllmproxy/service-account.json
```

- Set the environment variables:

```bash
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/fkllmproxy/service-account.json"
export GOOGLE_CLOUD_PROJECT="your-project-id"
```

Security: See the Deployment Guide for credential security best practices.

In this mode the bridge talks to `aiplatform.googleapis.com` (Vertex AI).
```bash
cargo run
```

The server starts at `http://127.0.0.1:4000` (default). Use `APP_SERVER__HOST=0.0.0.0` to bind to all interfaces.

⚠️ Security Warning: If binding to `0.0.0.0`, always enable authentication (`APP_AUTH__REQUIRE_AUTH=true`) and use a strong master key.
- Go to Cursor Settings > Models.
- Add a custom model, e.g. `gemini-flash-latest` (or `gemini-pro-latest`).
- Set the OpenAI Base URL to `http://localhost:4000/v1`.
- Set the API Key (the client-side key) to the value you set for `APP_AUTH__MASTER_KEY` in your `.env` file.

This "API Key" is only for your local bridge and is unrelated to the Google API key. The bridge itself uses `GOOGLE_API_KEY` or the service account credentials to talk to Google.

⚠️ Security: Generate a strong key with `openssl rand -hex 32` and never commit it to version control.
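The contract is simple: the client presents the master key as a bearer token, and the bridge compares it against `APP_AUTH__MASTER_KEY`. A minimal Python sketch of that check (the function is illustrative, not part of the proxy's code):

```python
import hmac

def is_authorized(auth_header, master_key):
    """Check an Authorization header against the configured master key.

    Illustrative only: the real middleware lives in the Rust proxy; this
    mirrors the expected "Bearer <APP_AUTH__MASTER_KEY>" contract.
    """
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    presented = auth_header[len("Bearer "):]
    # Constant-time comparison avoids timing side channels on the key check.
    return hmac.compare_digest(presented, master_key)
```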
```bash
# Replace YOUR_MASTER_KEY with your actual APP_AUTH__MASTER_KEY value
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_MASTER_KEY" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

The project includes a comprehensive test suite with 25 passing tests covering critical paths:
```bash
# Run smoke tests (< 2 minutes)
./scripts/test-smoke.sh

# Run all critical tests
./scripts/test-critical.sh

# Run a specific test category
cargo test --test integration smoke_
cargo test --test integration auth_test
```

Test Coverage:

- ✅ Health endpoint
- ✅ Auth middleware (6 scenarios)
- ✅ Error handling (OpenAI-compatible format)
- ✅ Rate limiting
- ✅ Metrics endpoint
- ⚠️ Chat completions (2 E2E tests require credentials; auto-skip in local dev)
E2E tests that require real API credentials are automatically skipped when credentials are missing:

```bash
# Tests will auto-skip if no credentials
cargo test --test integration -- --ignored

# With credentials (one of these):
export VERTEX_API_KEY="AIzaSy..."
# OR
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export GOOGLE_CLOUD_PROJECT="your-project-id"

# Now E2E tests will run
cargo test --test integration -- --ignored
```

Credential Detection: Tests automatically detect credentials and skip gracefully in local development. In CI environments, set credentials as secrets to enable E2E tests.
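The skip logic amounts to an environment check. A Python sketch of it (the helper is illustrative, not project code; the variable names are the ones documented below):

```python
import os

def e2e_credentials_present():
    """Return True when E2E tests should run.

    Mirrors the credential-detection behavior described in this README:
    either an API key, or a service account plus project, or the
    FORCE_E2E_TESTS escape hatch.
    """
    if os.environ.get("VERTEX_API_KEY"):
        return True
    if os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") and os.environ.get("GOOGLE_CLOUD_PROJECT"):
        return True
    return bool(os.environ.get("FORCE_E2E_TESTS"))
```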
| Variable | Purpose | Required For |
|---|---|---|
| `VERTEX_API_KEY` | Google AI Studio API key | E2E tests with API key auth |
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to service account JSON | E2E tests with service account |
| `GOOGLE_CLOUD_PROJECT` | GCP project ID | E2E tests with service account |
| `VERTEX_REGION` | GCP region | E2E tests (default: us-central1) |
| `FORCE_E2E_TESTS` | Force E2E tests in CI even without credentials | CI (not recommended) |
See Testing Guide for full documentation.
The bridge routes requests to providers based on model name prefixes. Here's how to check supported models:
Models are automatically routed by prefix:

| Prefix | Provider | Examples |
|---|---|---|
| `gemini-*` | Google Vertex AI | gemini-3.0-pro, gemini-2.5-flash, gemini-2.5-pro, gemini-2.5-flash-lite |
| `claude-*` | Anthropic CLI | claude-3-5-sonnet, claude-3-opus, claude-3-haiku |
| `gpt-*` | OpenAI (via Harvester) | gpt-4, gpt-3.5-turbo, gpt-4-turbo |
| `deepseek-*` | DeepSeek | ❌ Not Implemented; only the routing enum exists |
| `ollama-*` | Ollama | ❌ Not Implemented; only the routing enum exists |

Default: Unknown models default to Vertex AI (`gemini-*`).
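The prefix routing in the table above can be sketched in a few lines of Python (illustrative only; the real logic is Rust, in `src/services/providers/mod.rs`):

```python
def route_by_prefix(model):
    """Route a model name to a provider by prefix, per the table above.

    Sketch of the documented behavior, not the proxy's actual code.
    DeepSeek and Ollama prefixes are recognized but their providers
    are not implemented.
    """
    prefixes = {
        "gemini-": "vertex",
        "claude-": "anthropic",
        "gpt-": "openai",
        "deepseek-": "deepseek",  # routing enum only, not implemented
        "ollama-": "ollama",      # routing enum only, not implemented
    }
    for prefix, provider in prefixes.items():
        if model.startswith(prefix):
            return provider
    # Unknown models fall back to Vertex AI.
    return "vertex"
```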
Method 1: Test Request

```bash
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-vertex-bridge-dev" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "test"}],
    "max_tokens": 10
  }'
```

Method 2: Check Provider Documentation
- Gemini: See the Google Vertex AI models documentation or Google AI Studio
- Claude: See the Anthropic model documentation
- OpenAI: Standard GPT models (`gpt-4`, `gpt-3.5-turbo`)
Method 3: Review Code

Model routing logic:

- OpenAI models (`gpt-*`): Handled first in `src/handlers/chat.rs` via an `is_openai_model()` check
- Other models: Routed by prefix via `ProviderRegistry::route_by_model()` in `src/services/providers/mod.rs`:
```rust
// Routing is handled via ProviderRegistry, which checks model prefixes:
// - "gemini-*" -> Vertex AI
// - "claude-*" -> Anthropic (via bridge)
// - Other      -> Returns None (model not supported)
let registry = ProviderRegistry::with_config(Some(bridge_url));
match registry.route_by_model("gemini-2.5-flash") {
    Some(provider) => { /* use the provider */ }
    None => { /* model not supported */ }
}
```
⚠️ Not Implemented: DeepSeek and Ollama providers are defined in the enum but not yet implemented. Requests to these models will fail.
Gemini (Vertex AI / Google AI Studio):

Note: Model IDs may vary. Always check the official Google documentation for the exact model names available in your region.

Current Models (as of November 2025):

- `gemini-3.0-pro` - Latest advanced model for complex multimodal tasks and reasoning
- `gemini-3.0-deep-think` - Optimized for agentic workflows and autonomous coding (1M context)
- `gemini-2.5-pro` - High-capability model for complex reasoning and coding (1M context, adaptive thinking)
- `gemini-2.5-flash` - Fast and highly capable, balanced speed and price
- `gemini-2.5-flash-lite` - Cost-effective for high-throughput tasks (1M context, multimodal)
- `gemini-2.5-flash-image` - Optimized for rapid creative workflows with image generation

Legacy Models (still supported):

- `gemini-1.5-pro` - Previous-generation high-capability model
- `gemini-1.5-flash` - Previous-generation fast model
- `gemini-pro` - Standard model (may be deprecated)
Claude (Anthropic):

- `claude-3-5-sonnet` - Latest, best performance
- `claude-3-opus` - Highest capability
- `claude-3-sonnet` - Balanced
- `claude-3-haiku` - Fast, cost-effective

OpenAI (requires Harvester):

- `gpt-4` - Latest GPT-4
- `gpt-4-turbo` - GPT-4 Turbo
- `gpt-3.5-turbo` - Fast, cost-effective
The bridge exposes two metrics endpoints:
JSON Metrics (`/metrics`):

```bash
curl http://localhost:4000/metrics
```

Returns JSON with:

- `cache_hit_rate`: Token cache hit percentage
- `waf_block_rate`: WAF block percentage
- `arkose_solves`: Number of Arkose tokens generated
- `avg_arkose_solve_time_ms`: Average Arkose solve time
- `total_requests`: Total requests processed
- `success_rate`: Request success percentage
Prometheus Metrics (`/metrics/prometheus`):

```bash
curl http://localhost:4000/metrics/prometheus
```

Returns Prometheus-formatted metrics (text/plain) for scraping by monitoring systems.
| Variable | Required | Description |
|---|---|---|
| `GOOGLE_API_KEY` | Yes* | Google AI Studio API key |
| `GOOGLE_APPLICATION_CREDENTIALS` | Yes* | Path to service account JSON (alternative to API key) |
| `APP_SERVER__HOST` | No | Bind address (default: 127.0.0.1) |
| `APP_SERVER__PORT` | No | Port (default: 4000) |
| `APP_SERVER__MAX_REQUEST_SIZE` | No | Max request body size in bytes (default: 10485760 = 10MB) |
| `APP_AUTH__REQUIRE_AUTH` | No | Enable auth (default: false) |
| `APP_AUTH__MASTER_KEY` | No | API key for clients to use |
| `APP_VERTEX__PROJECT_ID` | No | GCP project ID (required if using service account) |
| `APP_VERTEX__REGION` | No | GCP region (default: us-central1) |
| `APP_VERTEX__API_KEY_BASE_URL` | No | Override API key base URL (for testing/mocking) |
| `APP_VERTEX__OAUTH_BASE_URL` | No | Override OAuth base URL (for testing/mocking) |
| `APP_LOG__LEVEL` | No | Log level (default: info) |
| `APP_OPENAI__HARVESTER_URL` | No | Harvester service URL (default: http://localhost:3001) |
| `APP_OPENAI__ACCESS_TOKEN_TTL_SECS` | No | Access token cache TTL in seconds (default: 3600) |
| `APP_OPENAI__ARKOSE_TOKEN_TTL_SECS` | No | Arkose token cache TTL in seconds (default: 120) |
| `APP_ANTHROPIC__BRIDGE_URL` | No | Anthropic bridge service URL (default: http://localhost:4001) |
| `APP_RATE_LIMIT__CAPACITY` | No | Rate limit bucket capacity (default: 100 requests) |
| `APP_RATE_LIMIT__REFILL_PER_SECOND` | No | Rate limit refill rate (default: 10 requests/second) |
| `APP_CIRCUIT_BREAKER__FAILURE_THRESHOLD` | No | Circuit breaker failure threshold (default: 10) |
| `APP_CIRCUIT_BREAKER__TIMEOUT_SECS` | No | Circuit breaker timeout in seconds (default: 60) |
| `APP_CIRCUIT_BREAKER__SUCCESS_THRESHOLD` | No | Circuit breaker success threshold (default: 3) |
| `APP_CACHE__ENABLED` | No | Enable response caching (default: false) |
| `APP_CACHE__DEFAULT_TTL_SECS` | No | Cache TTL in seconds (default: 3600 = 1 hour) |
| `APP_LOG__FORMAT` | No | Log format: json or pretty (default: pretty) |
| `VERTEX_API_KEY` | No** | Google AI Studio API key (for E2E tests) |
| `VERTEX_REGION` | No** | GCP region for Vertex AI (default: us-central1, for E2E tests) |
* One of GOOGLE_API_KEY or GOOGLE_APPLICATION_CREDENTIALS is required for runtime.
** Test-specific variables (VERTEX_API_KEY, etc.) are only needed for E2E tests.
Note: For CI/CD setup, see Deployment Guide for environment configuration.
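The `APP_RATE_LIMIT__*` variables describe a token bucket: a bucket holding up to `APP_RATE_LIMIT__CAPACITY` tokens, refilled at `APP_RATE_LIMIT__REFILL_PER_SECOND`. A Python sketch of those semantics (illustrative, not the proxy's implementation; the injectable `clock` parameter is for testability):

```python
import time

class TokenBucket:
    """Sketch of the token-bucket semantics implied by
    APP_RATE_LIMIT__CAPACITY and APP_RATE_LIMIT__REFILL_PER_SECOND
    (defaults 100 and 10)."""

    def __init__(self, capacity=100.0, refill_per_second=10.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill = refill_per_second
        self.tokens = capacity       # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Consume one token if available; otherwise reject the request."""
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With the defaults, a client can burst 100 requests, then sustain 10 requests per second.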
The bridge now supports OpenAI models (`gpt-4`, `gpt-3.5-turbo`) via the ChatGPT web interface.
- Harvester Service: A separate TypeScript service manages browser sessions and token extraction.
- Valid ChatGPT Session: You need to be logged into ChatGPT in the browser.
1. Start the Harvester:

   ```bash
   cd harvester
   npm install
   npm run dev
   ```

   The Harvester runs on `http://localhost:3001` by default.

2. Configure the Bridge:

   ```dotenv
   APP_OPENAI__HARVESTER_URL=http://localhost:3001
   ```

3. Use OpenAI Models: Point Cursor to the same base URL (`http://localhost:4000/v1`) and use models like `gpt-4` or `gpt-3.5-turbo`.
All services can run via Docker Compose:

```bash
docker-compose up -d
```

This starts:

- vertex-bridge (Rust proxy) on port 4000
- harvester (OpenAI session manager) on port 3001
- anthropic-bridge (Anthropic CLI bridge) on port 4001
Testing the Setup:
After starting services, verify everything works:
```bash
# Run the Docker Compose test script
./scripts/test-docker-compose.sh
```

This script:
- Verifies all services start correctly
- Checks health endpoints
- Validates service connectivity
- Shows recent logs from each service
Note for the Anthropic Bridge: The Anthropic CLI (`claude`) must be authenticated before use. You can either:

- Authenticate inside the container: `docker exec -it anthropic-bridge claude login`
- Mount your host's CLI config (uncomment the volumes in docker-compose.yml)
- TLS Fingerprinting: Currently uses standard `reqwest` (the WAF may block it). TLS impersonation requires `reqwest-impersonate`, which needs manual setup.
- Session Management: Requires a manual login in the browser initially. Cookies are persisted for session recovery.
- Arkose Tokens: Required for GPT-4; generated automatically via browser automation.
The bridge supports Anthropic Claude models via the official CLI.
- Anthropic CLI: Install and authenticate the official CLI:

  ```bash
  npm install -g @anthropic-ai/claude-code
  claude login
  ```

- Bridge Service: A separate TypeScript service bridges CLI stdio to HTTP.
1. Start the Bridge Service:

   ```bash
   cd bridge
   npm install
   npm run dev
   ```

   The Bridge runs on `http://localhost:4001` by default.

2. Configure the Proxy (optional): The proxy defaults to `http://localhost:4001` for the Anthropic bridge. To use a different URL, set:

   ```dotenv
   APP_ANTHROPIC__BRIDGE_URL=http://localhost:4001
   ```

3. Use Claude Models: Point Cursor to the same base URL (`http://localhost:4000/v1`) and use models like `claude-3-5-sonnet`, `claude-3-opus`, `claude-3-sonnet`, or `claude-3-haiku`.
- The Rust proxy routes `claude-*` models to the Anthropic bridge service
- The bridge service spawns the `claude -p` CLI command with the prompt
- CLI output (with ANSI codes stripped) is converted to OpenAI-format SSE chunks
- Uses your Pro subscription quota directly (0% ban risk)
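The ANSI-stripping and SSE-wrapping step above can be sketched in Python (illustrative; the real conversion lives in the TypeScript bridge, and the chunk `id` shown here is invented):

```python
import json
import re

# Matches CSI escape sequences such as "\x1b[32m" (color) and "\x1b[0m" (reset).
ANSI_ESCAPES = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def cli_line_to_sse(line, model="claude-3-5-sonnet"):
    """Strip ANSI escape codes from a line of CLI output and wrap the text
    in an OpenAI-style streaming chunk, serialized as an SSE event."""
    text = ANSI_ESCAPES.sub("", line)
    chunk = {
        "id": "chatcmpl-bridge",  # placeholder id, not a real value
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {"content": text}, "finish_reason": None}],
    }
    return f"data: {json.dumps(chunk)}\n\n"
```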
- Context Window: Full conversation history is sent each time (the CLI is stateless)
- Token Consumption: Slightly higher than API mode due to history resending
- Requires CLI: The `claude` command must be available in PATH
Current Status: Credential files are secured in .gitignore and not tracked.
Best Practices:

- Store credentials outside the project root (recommended: `~/.config/fkllmproxy/`)
- Set file permissions to `600` (read/write for owner only)
- Use environment variables, not hardcoded paths
- Rotate credentials regularly (every 90 days)
Quick Migration:

```bash
mkdir -p ~/.config/fkllmproxy
mv service-account.json ~/.config/fkllmproxy/
chmod 600 ~/.config/fkllmproxy/service-account.json
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/fkllmproxy/service-account.json"
```

Documentation:

- Deployment Guide - Complete deployment guide with security best practices
- Operational Runbook - Day-to-day operations and troubleshooting
For production, use:
- Secret management systems (Kubernetes Secrets, Vault, etc.)
- Systemd environment variables
- Separate credentials per environment
- Regular credential rotation
See Deployment Guide for detailed instructions.
- Rust / Axum: High-performance async web server.
- Dual Auth Mode: Supports both Google AI Studio (API key) and Vertex AI (service account).
- Transformer: Maps OpenAI-compatible JSON to Gemini / Vertex JSON on the fly.
- Multi-Provider Support:
  - Vertex AI: Direct HTTP integration
  - OpenAI: Split-process design (Rust Enforcer + TypeScript Harvester)
  - Anthropic: Split-process design (Rust Enforcer + TypeScript Bridge)
- Production Ready: Kubernetes manifests, monitoring, security audit tools, performance testing
- Observability: Prometheus metrics, structured logging, health checks
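The Transformer's job can be illustrated with a Python sketch of the OpenAI-to-Gemini request mapping: `system` messages become `systemInstruction`, `assistant` becomes Gemini's `model` role, and message content becomes `parts`. This is a simplification (text-only content, a subset of generation parameters), not the proxy's actual code:

```python
def openai_to_gemini(request):
    """Map an OpenAI-style chat request body to a Gemini-style one.

    Illustrative sketch of the transformation; the real mapping lives
    in the Rust proxy and covers more fields than shown here.
    """
    contents, system_parts = [], []
    for msg in request.get("messages", []):
        part = {"text": msg["content"]}
        if msg["role"] == "system":
            system_parts.append(part)
        else:
            # OpenAI's "assistant" corresponds to Gemini's "model" role.
            role = "model" if msg["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [part]})
    body = {"contents": contents}
    if system_parts:
        body["systemInstruction"] = {"parts": system_parts}
    if "max_tokens" in request:
        body["generationConfig"] = {"maxOutputTokens": request["max_tokens"]}
    return body
```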
- Docker Compose: `docker-compose.prod.yml` for production
- Kubernetes: Complete manifests in the `k8s/` directory
- Monitoring: Prometheus metrics, Grafana dashboards, alerting rules
- Security: Automated security audit scripts, dependency scanning
- TLS Fingerprinting: Configuration structure for OpenAI WAF bypass (see ADR 005)
- Performance Testing: Load testing scripts in `scripts/load-test.sh`
- Security Audit: Automated security scanning in `scripts/security-audit.sh`
See Deployment Guide and Monitoring Guide for details.