An intelligent LLM routing service that automatically discovers free language models and routes requests to the best-performing ones.
Robin LLM scrapes OpenRouter's website for free LLM options, continuously tests their performance, and provides an OpenAI-compatible API that intelligently routes requests to the best available model. Think of it as a smart load balancer for free LLMs.
- Automatic Discovery: Scans OpenRouter for free models, adds them to the pool automatically
- Performance Monitoring: Continuously tests and measures model performance (latency, success rate, errors)
- Intelligent Routing: Routes requests to the best-performing models using a weighted scoring algorithm
- OpenAI Compatible: Drop-in replacement for the OpenAI API with the standard /v1/chat/completions endpoint
- Zero Configuration: Works out of the box with automatic model discovery
- Built with Java 21: Uses virtual threads for high-performance concurrent operations
- Lightweight: Built on Quarkus for minimal resource usage and fast startup
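To illustrate the virtual-threads point above: Java 21 lets a service run one lightweight thread per concurrent model probe without pooling concerns. This sketch is plain Java and assumes nothing about Robin LLM's internals; the class and method names are illustrative.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadDemo {
    // Run one probe per model, each on its own virtual thread.
    // The executor's close() (via try-with-resources) waits for all tasks.
    static List<String> probeAll(List<String> models) {
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            return models.stream()
                    .map(m -> pool.submit(() -> m + ":ok")) // stand-in for a real HTTP probe
                    .toList()
                    .stream()
                    .map(f -> {
                        try {
                            return f.get();
                        } catch (Exception e) {
                            throw new RuntimeException(e);
                        }
                    })
                    .toList();
        }
    }

    public static void main(String[] args) {
        System.out.println(probeAll(List.of("model-a", "model-b")));
    }
}
```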
- Scraping: Every hour, Robin LLM scrapes OpenRouter's model page to discover new free models
- Testing: Each model is tested with standardized prompts to measure performance
- Scoring: Models are scored based on response time (60%), success rate (30%), and rate limit proximity (10%)
- Routing: Incoming requests are automatically routed to the best-performing available model
- Failover: If a model fails or degrades, requests automatically failover to the next best model
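The weighted scoring step above can be sketched as follows. The weights mirror the documented defaults (0.6 / 0.3 / 0.1), but the normalization (latency mapped into (0,1], rate-limit proximity as remaining quota) is an assumption for illustration, not Robin LLM's actual formula.

```java
public class ModelScore {
    // Combine three signals into one score in [0, 1]; higher is better.
    // Weights match the documented defaults: latency 60%, success 30%, rate limit 10%.
    static double score(double avgLatencyMs, double successRate, double rateLimitUsed) {
        double latencyScore = 1.0 / (1.0 + avgLatencyMs / 1000.0); // 1.0 at 0 ms, 0.5 at 1 s
        double headroom = 1.0 - rateLimitUsed;                     // fraction of quota left
        return 0.6 * latencyScore + 0.3 * successRate + 0.1 * headroom;
    }

    public static void main(String[] args) {
        // A fast model outscores a slow one with identical reliability.
        System.out.printf("fast=%.3f slow=%.3f%n",
                score(400, 0.99, 0.2), score(3000, 0.99, 0.2));
    }
}
```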
- Java 21: Latest LTS with virtual threads
- Quarkus: Fast, lightweight framework for low-latency API
- SQLite: Embedded database for metrics persistence
- Maven: Build and dependency management
- Jsoup: HTML scraping for model discovery
- Retrofit/OkHttp: HTTP client for LLM API communication
- RESTEasy Reactive: Reactive REST API framework
- Circuit Breaker: Automatically stops routing to failing models and retries after cooldown
- Automatic Failover: Seamlessly switches to next best model on failure
- Round-Robin Load Balancing: Distributes requests across top-performing models
- Performance Metrics: Tracks latency, success rate, and requests per second
- Configurable Weights: Customize the scoring algorithm for model selection
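The circuit-breaker behavior described above (stop routing to a failing model, retry after a cooldown) can be sketched in a few lines. This is a minimal illustration of the pattern, not Robin LLM's actual implementation; the threshold parameter corresponds to the documented router.circuit-breaker.threshold setting.

```java
public class ModelBreaker {
    // Opens when the observed failure rate reaches the threshold,
    // then permits one retry after the cooldown (half-open state).
    private final double threshold;   // e.g. 0.5
    private final long cooldownMs;
    private int successes, failures;
    private long openedAt = -1;       // -1 means closed

    ModelBreaker(double threshold, long cooldownMs) {
        this.threshold = threshold;
        this.cooldownMs = cooldownMs;
    }

    void record(boolean ok) {
        if (ok) successes++; else failures++;
        int total = successes + failures;
        if (total > 0 && (double) failures / total >= threshold) {
            openedAt = System.currentTimeMillis(); // trip the breaker
        }
    }

    boolean allowRequest() {
        if (openedAt < 0) return true;                        // closed: route normally
        if (System.currentTimeMillis() - openedAt >= cooldownMs) {
            openedAt = -1;                                    // half-open: try again
            successes = failures = 0;
            return true;
        }
        return false;                                         // open: skip this model
    }
}
```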
- Java 21 or later
- Maven 3.8+
- OpenRouter API key (get one at https://openrouter.ai/keys)
# Clone the repository
git clone https://github.com/yourusername/robinllm.git
cd robinllm
# Build the project
mvn clean package
# Set your OpenRouter API key (get one at https://openrouter.ai/keys)
export OPENROUTER_API_KEY=your_api_key_here
# Run the application
java -jar target/quarkus-app/quarkus-run.jar

Once running, use the OpenAI-compatible API:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
]
}'

Use "model": "auto" to let Robin LLM automatically select the best model, or specify a model ID from /v1/models.
Additional examples:
# List available models
curl http://localhost:8080/v1/models
# Get model metrics
curl http://localhost:8080/v1/models/{model_id}/metrics
# Get system statistics
curl http://localhost:8080/v1/stats

All endpoints are prefixed with /v1 (e.g., health is at /v1/health).
- POST /v1/chat/completions - Send chat completion requests (OpenAI compatible)
- GET /v1/models - List all available free models with their performance metrics
- GET /v1/models/{model_id} - Get details for a specific model
- GET /v1/models/{model_id}/metrics - Get performance metrics for a specific model
- GET /v1/stats - Get routing statistics and system health
- POST /v1/stats/reset - Reset statistics and circuit breakers
- GET /v1/health - Health check endpoint (returns "OK")
- GET /v1 - Service information and available endpoints
Robin LLM can be configured via environment variables or application.properties:
scraper.enabled=true # Enable/disable model discovery
scraper.interval=1h # Scraping interval
scraper.openrouter.url=https://openrouter.ai/models
scraper.filter=free # Model filter criteria
metrics.enabled=true # Enable/disable metrics collection
metrics.interval=1h # Testing interval
metrics.test.prompts=What is 2+2?,Explain photosynthesis
metrics.top-models=3 # Number of top models to test
router.weight.latency=0.6 # Weight for latency in scoring
router.weight.success=0.3 # Weight for success rate in scoring
router.weight.rate-limit=0.1 # Weight for rate limit proximity
router.circuit-breaker.threshold=0.5 # Failure rate threshold for circuit breaker
router.retry.max=3 # Maximum retry attempts
router.retry.backoff=1000 # Backoff time in milliseconds
api.compatibility=openai # API compatibility mode
api.max-tokens=4096 # Maximum tokens per request
api.timeout=30000 # Request timeout in milliseconds
openrouter.api-key=your_api_key_here # OpenRouter API key (set via env var)
openrouter.base-url=https://openrouter.ai/api/v1

# Health check
curl http://localhost:8080/v1/health

# System statistics
curl http://localhost:8080/v1/stats

Response includes:
- Total models available
- Active models
- Free models
- Total requests served
- Total failures
- Success rate
- Service uptime
curl http://localhost:8080/v1/models/{model_id}/metrics

Response includes:
- Average latency (ms)
- Success rate
- Error rate
- P95/P99 latency
- Requests per second
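The P95/P99 figures above are latency percentiles over recorded requests. As a sketch of how such a value is computed (using the nearest-rank method; the service itself may use a different estimator):

```java
import java.util.Arrays;

public class LatencyPercentiles {
    // Nearest-rank percentile: sort the samples, take the value at rank ceil(p/100 * n).
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] samples = {120, 130, 150, 200, 900}; // latencies in ms
        System.out.println("p95=" + percentile(samples, 95)
                + " p99=" + percentile(samples, 99));
    }
}
```

With these five samples both P95 and P99 land on the slowest request, which is why a single outlier shows up clearly in tail-latency metrics.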
curl -X POST http://localhost:8080/v1/stats/reset

No models available:
- Verify OpenRouter API key is set correctly
- Check that scraper is enabled in configuration
- Review logs for scraping errors
High error rates:
- Check /v1/stats for model-specific metrics
- Review circuit breaker status
- Ensure network connectivity to OpenRouter
Slow responses:
- Check model latency metrics via /v1/models/{id}/metrics
- Consider adjusting router weights to favor faster models
- Verify network connectivity
See RobinLLM.md for the complete development plan and technical details.
# Run in development mode with hot reload
mvn quarkus:dev
# Run tests
mvn test
# Build production JAR
mvn clean package
# Build native image (requires GraalVM)
mvn package -Pnative

The project includes comprehensive unit and integration tests. To run tests:
# Run all tests
mvn test
# Run specific test class
mvn test -Dtest=OpenRouterClientTest

MIT License - see LICENSE file for details.
Contributions welcome! Please read RobinLLM.md for the detailed implementation plan and architecture.
✅ Fully functional and ready for use - See RobinLLM.md for detailed technical documentation