A unified async Python wrapper for multiple LLM providers with a consistent interface.
- Unified Interface - Single API for multiple LLM providers (OpenAI, AWS Bedrock)
- Async/Await - Built on asyncio for high-performance concurrent requests
- Smart Caching - Automatic response caching to reduce costs and latency
- Auto Retry - Exponential backoff retry logic for transient failures
- Structured Output - Native Pydantic model support for type-safe responses
- Streaming - Real-time streaming responses for better UX
- Rate Limiting - Built-in concurrency control per model
- Colored Logging - Beautiful console output for debugging
- OpenAI Response API - Full support for OpenAI's primary API including reasoning models

Install with pip:

```bash
pip install smartllm
```

Install only the providers you need:

```bash
# For OpenAI
pip install smartllm[openai]
# For AWS Bedrock
pip install smartllm[bedrock]
# For all providers
pip install smartllm[all]
```

A minimal example:

```python
import asyncio
from smartllm import LLMClient, TextRequest
async def main():
    # Auto-detects provider from environment variables
    async with LLMClient(provider="openai") as client:
        response = await client.generate_text(
            TextRequest(prompt="What is the capital of France?")
        )
        print(response.text)

asyncio.run(main())
```

Multi-turn conversations use `Message` objects and `client.send_message`:

```python
from smartllm import LLMClient, MessageRequest, Message
async with LLMClient(provider="openai") as client:
    messages = [
        Message(role="user", content="My name is Alice."),
        Message(role="assistant", content="Nice to meet you, Alice!"),
        Message(role="user", content="What's my name?"),
    ]
    response = await client.send_message(
        MessageRequest(messages=messages)
    )
    print(response.text)  # "Your name is Alice."
```

Set `stream=True` and iterate over the chunks to stream a response as it is generated:

```python
from smartllm import LLMClient, TextRequest
async with LLMClient(provider="openai") as client:
    request = TextRequest(
        prompt="Write a short poem about Python.",
        stream=True
    )
    async for chunk in client.generate_text_stream(request):
        print(chunk.text, end="", flush=True)
```

Pass a Pydantic model as `response_format` to receive typed, validated output:

```python
from pydantic import BaseModel
from smartllm import LLMClient, TextRequest
class Person(BaseModel):
    name: str
    age: int
    occupation: str
async with LLMClient(provider="openai") as client:
    response = await client.generate_text(
        TextRequest(
            prompt="Generate a person profile for a software engineer named John, age 30.",
            response_format=Person
        )
    )
    person = response.structured_data
    print(f"{person.name} is a {person.age} year old {person.occupation}")
```

Providers are configured via environment variables.

OpenAI:

```bash
export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="gpt-4o-mini"  # Optional
```

AWS Bedrock:

```bash
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-1"
export BEDROCK_MODEL="anthropic.claude-3-sonnet-20240229-v1:0"  # Optional
```

Alternatively, configure the client explicitly with `LLMConfig`:

```python
from smartllm import LLMClient, LLMConfig
config = LLMConfig(
provider="openai",
api_key="your-api-key",
default_model="gpt-4o",
temperature=0.7,
max_tokens=2048,
max_retries=3,
)
async with LLMClient(config) as client:
    # Use client...
    pass
```

Global defaults can be adjusted via the `defaults` module:

```python
from smartllm import defaults
# Modify global defaults
defaults.DEFAULT_TEMPERATURE = 0.7
defaults.DEFAULT_MAX_TOKENS = 4096
defaults.DEFAULT_MAX_RETRIES = 5
```

SmartLLM supports both OpenAI APIs via the `api_type` parameter:
"responses"(default) - OpenAI's primary Response API, recommended for all modern models"chat_completions"- Legacy Chat Completions API, supported indefinitely
# Response API (default)
response = await client.generate_text(
    TextRequest(prompt="Hello", api_type="responses")
)

# Chat Completions API (legacy)
response = await client.generate_text(
    TextRequest(prompt="Hello", api_type="chat_completions")
)
```

For models that support reasoning (e.g. GPT-5.x), use `reasoning_effort` to control how much the model reasons before responding. The number of reasoning tokens used is reported in `response.metadata`:

```python
response = await client.generate_text(
    TextRequest(
        prompt="Solve: what is the 100th Fibonacci number?",
        reasoning_effort="high",  # "low", "medium", or "high"
    )
)
print(response.text)
print(f"Reasoning tokens used: {response.metadata.get('reasoning_tokens', 0)}")Note: reasoning models do not support temperature. Passing a value other than 1 will raise a ValueError.

Reasoning effort can be combined with structured output:

```python
from pydantic import BaseModel
from smartllm import LLMClient, TextRequest
class Solution(BaseModel):
    answer: float
    unit: str
    explanation: str
async with LLMClient(provider="openai") as client:
    response = await client.generate_text(
        TextRequest(
            prompt="A train leaves city A at 60mph toward city B (300 miles away). Another leaves B at 90mph. When do they meet?",
            response_format=Solution,
            reasoning_effort="medium",
        )
    )
    solution = response.structured_data
    print(f"{solution.answer} {solution.unit}: {solution.explanation}")
    print(f"Reasoning tokens: {response.metadata.get('reasoning_tokens', 0)}")
```

Responses are automatically cached when `temperature=0`:

```python
# First call - hits API
response1 = await client.generate_text(
TextRequest(prompt="What is 2+2?", temperature=0)
)
# Second call - uses cache (instant, free)
response2 = await client.generate_text(
TextRequest(prompt="What is 2+2?", temperature=0)
)
# Clear cache for specific request
response3 = await client.generate_text(
TextRequest(prompt="What is 2+2?", temperature=0, clear_cache=True)
)
```
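
Per the parameter table below, caching can also be disabled for a single request with `use_cache=False` (a brief sketch):

```python
# Skip the cache for this request even though temperature is 0
response = await client.generate_text(
    TextRequest(prompt="What is 2+2?", temperature=0, use_cache=False)
)
```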

Multiple requests can be processed concurrently with `asyncio.gather`:

```python
import asyncio
from smartllm import LLMClient, TextRequest
async with LLMClient(provider="openai") as client:
prompts = ["Question 1", "Question 2", "Question 3"]
tasks = [
client.generate_text(TextRequest(prompt=p))
for p in prompts
]
    responses = await asyncio.gather(*tasks)
```

Concurrency can be limited with `max_concurrent`:

```python
# Limit concurrent requests
client = LLMClient(provider="openai", max_concurrent=5)
```

For advanced use cases, access provider-specific clients:

```python
from smartllm.openai import OpenAILLMClient, OpenAIConfig
from smartllm.bedrock import BedrockLLMClient, BedrockConfig
# OpenAI-specific features
openai_config = OpenAIConfig(api_key="...", organization="...")
async with OpenAILLMClient(openai_config) as client:
    models = await client.list_available_models()
# Bedrock-specific features
bedrock_config = BedrockConfig(aws_region="us-east-1")
async with BedrockLLMClient(bedrock_config) as client:
    models = await client.list_available_model_ids()
```

Supported providers:

- OpenAI - GPT models via OpenAI API
- AWS Bedrock - Claude, Llama, Mistral, and Titan models

Core classes:

- `LLMClient` - Unified client for all providers
- `LLMConfig` - Unified configuration
- `TextRequest` - Single prompt request
- `MessageRequest` - Multi-turn conversation request
- `TextResponse` - LLM response with metadata
- `Message` - Conversation message
- `StreamChunk` - Streaming response chunk
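
All of these are intended to be imported from the top-level package. The examples above show this for most of them; the import paths for `TextResponse` and `StreamChunk` are assumed here:

```python
from smartllm import (
    LLMClient,
    LLMConfig,
    TextRequest,
    MessageRequest,
    TextResponse,
    Message,
    StreamChunk,
)
```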

`TextRequest` accepts the following parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| `prompt` | str | Input text prompt | Required |
| `model` | str | Model ID to use | Config default |
| `temperature` | float | Sampling temperature (0-1) | 0 |
| `max_tokens` | int | Maximum output tokens | 2048 |
| `top_p` | float | Nucleus sampling | 1.0 |
| `system_prompt` | str | System context | None |
| `stream` | bool | Enable streaming | False |
| `response_format` | BaseModel | Pydantic model for structured output | None |
| `use_cache` | bool | Enable caching | True |
| `clear_cache` | bool | Clear cache before request | False |
| `api_type` | str | OpenAI API type (`"responses"` or `"chat_completions"`) | `"responses"` |
| `reasoning_effort` | str | Reasoning effort (`"low"`, `"medium"`, `"high"`) | None |
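
To tie the table together, a request combining several of these parameters might look like this (an illustrative sketch; the values are arbitrary):

```python
request = TextRequest(
    prompt="Summarize the plot of Hamlet in two sentences.",
    model="gpt-4o-mini",          # overrides the configured default model
    temperature=0,                # deterministic, so the response is cacheable
    max_tokens=256,
    system_prompt="You are a concise literary assistant.",
    api_type="responses",
)
response = await client.generate_text(request)
print(response.text)
```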

Errors can be handled with standard exception handling:

```python
from smartllm import LLMClient, TextRequest
async with LLMClient(provider="openai") as client:
    try:
        response = await client.generate_text(
            TextRequest(prompt="Hello")
        )
    except ValueError as e:
        print(f"Configuration error: {e}")
    except Exception as e:
        print(f"API error: {e}")
```

To set up a development environment:

```bash
git clone https://github.com/Redundando/smartllm.git
cd smartllm
pip install -r requirements-dev.txt
```

Run the tests with pytest:

```bash
# Unit tests
pytest tests/unit/ -v
# Integration tests (requires API keys)
export OPENAI_API_KEY="your-key"
pytest tests/integration/ -v
# All tests
pytest tests/ -v
```

Contributions are welcome! Please feel free to submit a Pull Request.

- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Initial public release
- Unified interface for multiple providers
- OpenAI support (GPT models)
- AWS Bedrock support (Claude, Llama, Mistral, Titan)
- Async/await architecture
- Smart caching with temperature=0
- Auto retry with exponential backoff
- Structured output with Pydantic models
- Streaming responses
- Rate limiting and concurrency control
- OpenAI Response API support (primary interface)
- Reasoning model support with `reasoning_effort` parameter
- Comprehensive test suite
- Issues: GitHub Issues
- Email: arved.kloehn@gmail.com
Built with love using:
- Pydantic for data validation
- aioboto3 for AWS async support
- OpenAI Python SDK for OpenAI integration