A powerful framework that provides a unified interface for multiple LLM providers, allowing developers to seamlessly switch between different AI models while maintaining consistent API interactions.
- Unified Interface: Access multiple LLM providers through a single, consistent API
- Proxy Support: Configure HTTP/SOCKS5 proxies for all LLM calls
- Streaming: Real-time streaming responses for better user experience
- Reasoning Models: Special support for reasoning models with thinking steps
- Temperature Control: Fine-tune creativity and randomness when supported
- Token Management: Control costs with maximum output token limits
- MCP Integration: Model Context Protocol support when available
- OpenAI Protocol: Prefer OpenAI-compatible APIs for consistency
- JSON Configuration: Easy configuration management through JSON files
| Provider | Status | Streaming | Reasoning | MCP | OpenAI Protocol |
|---|---|---|---|---|---|
| OpenAI | Ready | Yes | Yes | Yes | Yes |
| Anthropic | Ready | Yes | No | Yes | No |
| Google Gemini | Planned | Yes | No | No | No |
| Qwen (DashScope) | Ready | Yes | Yes | No | Yes |
| DeepSeek | Ready | Yes | Yes | No | Yes |
| Volcengine | Planned | Yes | No | No | Yes |
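Switching between the providers above does not change the calling code; only the model name (and the corresponding API key) differs. A minimal sketch of that switch, using model names that appear in the examples further down:

```python
import asyncio
from monollm import UnifiedLLMClient, RequestConfig

async def main():
    async with UnifiedLLMClient() as client:
        # Same prompt, same config shape - only the model name changes.
        for model in ("gpt-4o", "qwen-plus"):
            config = RequestConfig(model=model, max_tokens=200)
            response = await client.generate("Say hello in one sentence.", config)
            print(f"{model}: {response.content}")

asyncio.run(main())
```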
- Python 3.13+ (required)
- uv (recommended) or pip
```bash
# Clone the repository
git clone https://github.com/cyborgoat/MonoLLM.git
cd MonoLLM
# Install with uv (recommended)
uv sync
uv pip install -e .
# Or install with pip
pip install -e .
```

```bash
# Check CLI is working
monollm --help
# List available providers
monollm list-providers
```

```bash
# Set API keys for the providers you want to use
export DASHSCOPE_API_KEY="your-dashscope-api-key" # For Qwen
export ANTHROPIC_API_KEY="your-anthropic-api-key" # For Claude
export OPENAI_API_KEY="your-openai-api-key" # For GPT models
```

```python
import asyncio
from monollm import UnifiedLLMClient, RequestConfig
async def main():
    async with UnifiedLLMClient() as client:
        config = RequestConfig(
            model="qwq-32b",  # Qwen's reasoning model
            temperature=0.7,
            max_tokens=1000,
        )

        response = await client.generate(
            "Explain quantum computing in simple terms.",
            config
        )

        print(response.content)
        if response.usage:
            print(f"Tokens used: {response.usage.total_tokens}")

asyncio.run(main())
```

```bash
# Generate text with streaming
monollm generate "What is artificial intelligence?" --model qwen-plus --stream
# Use reasoning model with thinking steps
monollm generate "Solve: 2x + 5 = 13" --model qwq-32b --thinking
# List available models
monollm list-models --provider qwen
```

- Full Documentation - Comprehensive guides and API reference
- Quick Start Guide - Get up and running in minutes
- Configuration Guide - Advanced configuration options
- CLI Documentation - Command-line interface guide
- Machine Interface - JSON API for programmatic usage and Tauri sidecars
- Examples - Practical usage examples
MonoLLM provides a powerful machine-friendly JSON API perfect for integration with external applications, automation scripts, and Tauri sidecars:
```bash
# All commands support --machine flag for JSON output
monollm list-providers --machine
monollm generate "Hello world" --model gpt-4o --machine
monollm generate-stream "Tell a story" --model qwq-32b --thinking
```

```rust
// Rust code for Tauri app
use std::process::Command;
// Minimal standalone wrapper; in a real Tauri app this would live inside a command handler.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let output = Command::new("monollm")
        .args(&["generate", "What is AI?", "--model", "gpt-4o", "--machine"])
        .output()
        .expect("Failed to execute command");

    let response: serde_json::Value = serde_json::from_slice(&output.stdout)?;
    println!("AI Response: {}", response["content"]);
    Ok(())
}
```

- Structured JSON: All responses in consistent JSON format
- Streaming Support: Real-time JSON chunks for streaming responses
- Configuration API: Programmatic model defaults and proxy management
- Error Handling: Consistent error format with detailed context
- Validation: Parameter validation before API calls
- Usage Tracking: Token usage and performance metrics
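The machine interface is not tied to Rust: any language that can spawn a process can drive it. A minimal Python sketch mirroring the Rust example above, assuming (as that example does) that `--machine` prints a single JSON object with a `content` field:

```python
import json
import subprocess

# Call the CLI with --machine and parse the JSON it prints to stdout.
result = subprocess.run(
    ["monollm", "generate", "What is AI?", "--model", "gpt-4o", "--machine"],
    capture_output=True,
    text=True,
    check=True,
)
payload = json.loads(result.stdout)
print("AI Response:", payload["content"])
```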
Complete Machine Interface Documentation
```python
# Content generation
config = RequestConfig(model="qwen-plus", temperature=0.8, max_tokens=1000)
response = await client.generate("Write a blog post about renewable energy", config)

# Code explanation
config = RequestConfig(model="qwq-32b", temperature=0.2)
response = await client.generate("Explain this Python function: def fibonacci(n):", config)

# Data analysis with visible reasoning
config = RequestConfig(model="qwq-32b", show_thinking=True)
response = await client.generate("Analyze this data and find trends", config)
```

MonoLLM supports reasoning models that can show their internal thought process:
```python
# Enable thinking mode to see step-by-step reasoning
config = RequestConfig(
    model="qwq-32b",       # QwQ reasoning model
    show_thinking=True,    # Show internal reasoning
    temperature=0.7
)

response = await client.generate(
    "Solve this step by step: If a train travels 120 km in 2 hours, then 180 km in 3 hours, what is its average speed?",
    config
)

# Access the thinking process
if response.thinking:
    print("Thinking Process:")
    print(response.thinking)
    print("\n" + "="*50)

print("Final Answer:")
print(response.content)
```

Supported Reasoning Models:
- QwQ-32B (`qwq-32b`) - Stream-only reasoning model
- QwQ-Plus (`qwq-plus`) - Stream-only reasoning model
- Qwen3 Series (`qwen3-32b`, `qwen3-8b`, etc.) - Support both modes
- OpenAI o1 (`o1`, `o1-mini`) - Advanced reasoning models
- DeepSeek R1 (`deepseek-reasoner`) - Reasoning model
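Since the QwQ models are listed as stream-only, their output is naturally consumed through `generate_stream` rather than `generate`. A minimal sketch, assuming streamed chunks expose their text via `content` as in the streaming snippet below:

```python
import asyncio
from monollm import UnifiedLLMClient, RequestConfig

async def main():
    async with UnifiedLLMClient() as client:
        config = RequestConfig(model="qwq-32b", show_thinking=True)
        # Print streamed tokens as they arrive.
        async for chunk in await client.generate_stream(
            "What is the sum of the first 100 positive integers?", config
        ):
            if chunk.content:
                print(chunk.content, end="", flush=True)

asyncio.run(main())
```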
```python
# Creative writing with a higher temperature
config = RequestConfig(model="qwen-plus", temperature=1.0, max_tokens=2000)
response = await client.generate("Write a science fiction short story", config)
```

```python
# Stream tokens as they arrive
async for chunk in await client.generate_stream(prompt, config):
    if chunk.content:
        print(chunk.content, end="", flush=True)
```

```python
# Multi-turn conversation
messages = [
    Message(role="system", content="You are a helpful assistant."),
    Message(role="user", content="Hello!"),
]
response = await client.generate(messages, config)
```

```python
from monollm.core.exceptions import MonoLLMError, ProviderError

try:
    response = await client.generate(prompt, config)
except ProviderError as e:
    print(f"Provider error: {e}")
except MonoLLMError as e:
    print(f"MonoLLM error: {e}")
```

Configure HTTP/SOCKS5 proxies:

```bash
export PROXY_ENABLED=true
export PROXY_TYPE=http
export PROXY_HOST=127.0.0.1
export PROXY_PORT=7890
```

We welcome contributions! Please see our Contributing Guide for details.
```bash
# Clone and install in development mode
git clone https://github.com/cyborgoat/MonoLLM.git
cd MonoLLM
uv sync --dev
# Install pre-commit hooks
pre-commit install
# Run tests
pytest
# Build documentation
cd docs && make html
```

This project is licensed under the MIT License - see the LICENSE file for details.
- GitHub: https://github.com/cyborgoat/MonoLLM
- Documentation: https://cyborgoat.github.io/MonoLLM/
- Issues: https://github.com/cyborgoat/MonoLLM/issues
- Discussions: https://github.com/cyborgoat/MonoLLM/discussions
- Thanks to all the LLM providers for their amazing APIs
- Inspired by the need for a unified interface across multiple AI providers
- Built with modern Python async/await patterns for optimal performance
Created and maintained by cyborgoat
Made with ❤️ by cyborgoat