# LiteLLM Tutorial: Unified API for 100+ LLMs

LiteLLM is a Python library that provides a **unified interface** for calling 100+ Language Model APIs using the OpenAI format. It simplifies integration across providers like OpenAI, Anthropic, Google, Azure, AWS Bedrock, and more.

## Key Features
- **Unified API**: Same interface for all providers
- **Load Balancing**: Router with retry/fallback logic
- **Cost Tracking**: Built-in spend monitoring
- **Streaming Support**: Real-time response streaming
- **Error Handling**: Consistent exception handling
- **Async Support**: Full async/await compatibility

## 1. Installation & Setup

In [None]:
# Uncomment to install LiteLLM and python-dotenv with compatible versions
#!pip install "litellm>=1.0.0" "python-dotenv>=1.0.0" "pydantic>=2.0.0,<3.0.0" -q

In [15]:
# Import required modules
import litellm
from litellm import completion, Router
import os
import asyncio
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Load API keys from .env file and create explicit variables
openai_api_key = os.getenv("OPENAI_API_KEY")
anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
replicate_api_token = os.getenv("REPLICATE_API_TOKEN")
huggingface_api_key = os.getenv("HUGGINGFACE_API_KEY")
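LiteLLM reads provider credentials from standard environment variables (for example `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`), so the explicit variables above mainly decide which providers the later cells try. Below is a small sketch that reports which providers are configured. The `PROVIDER_ENV_VARS` mapping is an assumption that covers only the variables loaded above.

```python
import os
from collections.abc import Mapping

# Hypothetical mapping: provider label -> env var holding its credential.
# Covers only the variables loaded in the setup cell.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "replicate": "REPLICATE_API_TOKEN",
    "huggingface": "HUGGINGFACE_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return provider labels whose env var is set to a non-empty value."""
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


print("Configured providers:", configured_providers())
```

Passing a plain dict as `env` makes the check easy to test without touching the real environment.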

## 2. Basic Completion Calls

LiteLLM uses the **same `completion()` function** for all providers. Just change the model name:

In [8]:
# OpenAI (gpt-4o-mini)
if openai_api_key:
    response = completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello, how are you?"}]
    )
    print("OpenAI:", response.choices[0].message.content)
else:
    print("❌ OpenAI API key not found")

OpenAI: Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?
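The model string selects the provider. A prefix before the first `/` (such as `ollama/` or `huggingface/`) names the provider, and the remainder, which may itself contain slashes as in `HuggingFaceTB/SmolLM3-3B`, is the provider's model ID. Well-known names such as `gpt-4o-mini` or `claude-3-haiku-20240307` also work without a prefix. The sketch below only illustrates the prefix rule. `KNOWN_PREFIXES` and the `"openai"` default are simplifying assumptions, not LiteLLM's actual provider-detection logic.

```python
# Simplified illustration of "provider/model-id" parsing.
# KNOWN_PREFIXES is a hypothetical, incomplete list chosen for this example.
KNOWN_PREFIXES = {"openai", "anthropic", "azure", "huggingface", "ollama", "replicate"}


def split_model_string(model: str, default_provider: str = "openai") -> tuple[str, str]:
    """Split 'provider/model-id' on the first '/' only.

    Strings without a recognized prefix fall back to default_provider.
    (Real LiteLLM also recognizes bare names like claude-3-haiku-20240307
    from its own model lists; this sketch does not.)
    """
    prefix, sep, rest = model.partition("/")
    if sep and prefix in KNOWN_PREFIXES:
        return prefix, rest
    return default_provider, model


for m in ["gpt-4o-mini", "ollama/gemma:2b", "huggingface/HuggingFaceTB/SmolLM3-3B"]:
    print(m, "->", split_model_string(m))
```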


## 3. Multi-Provider Examples

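LiteLLM supports **100+ providers** through the same `completion()` call and returns the same OpenAI-style response object, so only the `model` string (and, where needed, the credentials) changes between providers: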

In [13]:
print("🔄 Testing multiple LLM providers...")
print("=" * 50)

# OpenAI (gpt-4o-mini)
if openai_api_key:
    try:
        response = completion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "What is AI?"}]
        )
        print("🟢 OpenAI GPT-4:", response.choices[0].message.content[:60] + "...")
    except Exception as e:
        print(f"🔴 OpenAI Error: {type(e).__name__}: {e}")
else:
    print("🟡 OpenAI API key not found")

print()  # Add space between providers

# Anthropic Claude
if anthropic_api_key:
    try:
        response = completion(
            model="claude-3-haiku-20240307",
            messages=[{"role": "user", "content": "What is AI?"}]
        )
        print("🟢 Anthropic Claude:", response.choices[0].message.content[:60] + "...")
    except Exception as e:
        print(f"🔴 Claude Error: {type(e).__name__}: {e}")
else:
    print("🟡 Anthropic API key not found")

print()  # Add space between providers

# HuggingFace
if huggingface_api_key:
    try:
        hf_response = completion(
            model="huggingface/HuggingFaceTB/SmolLM3-3B",
            messages=[{"role": "user", "content": "What is AI?"}],
            api_key=huggingface_api_key
        )
        print("🟢 HuggingFace:", hf_response.choices[0].message.content[:60] + "...")
    except Exception as e:
        print(f"🔴 HuggingFace Error: {type(e).__name__}: {e}")
else:
    print("🟡 HuggingFace API key not found")

print()  # Add space between providers

# Local Ollama (no API key needed)
try:
    ollama_response = completion(
        model="ollama/gemma:2b",
        messages=[{"role": "user", "content": "What is AI?"}],
        api_base="http://localhost:11434"
    )
    print("🟢 Ollama (Local):", ollama_response.choices[0].message.content[:60] + "...")
except Exception as e:
    print(f"🔴 Ollama not available locally: {str(e)[:60]}...")

🔄 Testing multiple LLM providers...
🟢 OpenAI GPT-4: Artificial Intelligence (AI) refers to the simulation of hum...

🟢 Anthropic Claude: AI, or Artificial Intelligence, refers to the field of compu...

🟢 HuggingFace: <think>
Okay, so I need to explain what AI is. Let me start ...

🟢 Ollama (Local): **Artificial Intelligence (AI)** is a rapidly developing bra...
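The cell above repeats the same try/except block for each provider. Because every provider shares one call signature, the checks collapse into a loop. In this sketch, `ProviderCheck`, `run_checks` and `fake_completion` are hypothetical helpers. `fake_completion` is an offline stand-in for `litellm.completion` that mimics the `choices[0].message.content` response shape.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Callable, Optional


@dataclass
class ProviderCheck:
    label: str
    model: str
    api_key: Optional[str] = None
    requires_key: bool = True  # False for local servers such as Ollama
    extra_kwargs: dict = field(default_factory=dict)  # e.g. {"api_base": ...}


def run_checks(checks, call_fn: Callable, prompt="What is AI?", preview=60):
    """Call each configured provider once; return {label: status string}."""
    results = {}
    for check in checks:
        if check.requires_key and not check.api_key:
            results[check.label] = "skipped (no API key)"
            continue
        kwargs = dict(check.extra_kwargs)
        if check.api_key:
            kwargs["api_key"] = check.api_key
        try:
            response = call_fn(
                model=check.model,
                messages=[{"role": "user", "content": prompt}],
                **kwargs,
            )
            text = response.choices[0].message.content or ""
            results[check.label] = "ok: " + text[:preview]
        except Exception as e:  # one failing provider should not stop the loop
            results[check.label] = f"error: {type(e).__name__}"
    return results


def fake_completion(model, messages, **kwargs):
    """Offline stand-in returning the OpenAI-style response shape."""
    if model.startswith("ollama/"):
        raise ConnectionError("no local Ollama server")
    message = SimpleNamespace(content=f"[{model}] {messages[-1]['content']}")
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


demo_checks = [
    ProviderCheck("OpenAI", "gpt-4o-mini", api_key="sk-demo"),
    ProviderCheck("Anthropic", "claude-3-haiku-20240307", api_key=None),
    ProviderCheck("Ollama", "ollama/gemma:2b", requires_key=False,
                  extra_kwargs={"api_base": "http://localhost:11434"}),
]
for label, status in run_checks(demo_checks, fake_completion).items():
    print(f"{label}: {status}")
```

To call real providers, pass `completion` as `call_fn` and build the checks from `openai_api_key`, `anthropic_api_key` and `huggingface_api_key` from the setup cell.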


## 4. Streaming Support

Get **real-time streaming responses** by setting `stream=True`:

In [None]:
# Streaming response
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a short poem about AI"}],
    stream=True
)

print("Streaming response:")
for chunk in response:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print("\n")
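Each chunk carries only a fragment in `choices[0].delta.content`, and some chunks have `None` there, which is why the loop checks `if content`. To keep the full reply after streaming, collect the fragments as you print them. The sketch below uses hand-built `SimpleNamespace` chunks (a hypothetical `fake_chunk` helper) so that it runs offline. With a real stream, pass the `response` iterator instead. LiteLLM also provides `litellm.stream_chunk_builder` for rebuilding a complete response object from collected chunks.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Print streamed fragments as they arrive and return the joined text."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # defensive: skip chunks without choices
            continue
        content = chunk.choices[0].delta.content
        if content:  # None or empty fragments are skipped
            parts.append(content)
            if echo:
                print(content, end="", flush=True)
    if echo:
        print()
    return "".join(parts)


def fake_chunk(text):
    """Offline stand-in mimicking the OpenAI-style streaming chunk shape."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


full_text = collect_stream(fake_chunk(t) for t in ["Silicon ", None, "dreams ", "hum."])
print(repr(full_text))
```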

## 5. Router: Load Balancing & Fallbacks

The **Router** enables load balancing, retries, and fallbacks across multiple deployments:

In [None]:
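# Configure multiple deployments. Entries that share a "model_name" form one
# group, and the Router load-balances requests across that group.
model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/gpt-35-turbo",  # azure/<your-deployment-name>
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_base": os.getenv("AZURE_API_BASE"),
            "api_version": os.getenv("AZURE_API_VERSION"),
            "rpm": 100  # requests per minute for this deployment
        }
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "gpt-3.5-turbo",
            "api_key": os.getenv("OPENAI_API_KEY"),
            "rpm": 200
        }
    },
    {
        # Fallback targets must also be defined in model_list
        "model_name": "gpt-4",
        "litellm_params": {
            "model": "gpt-4",
            "api_key": os.getenv("OPENAI_API_KEY")
        }
    }
]

# Create router: retry failed requests up to 3 times, then fall back to the "gpt-4" group
router = Router(
    model_list=model_list,
    num_retries=3,
    timeout=30,  # seconds per request
    fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}]
)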

# Make request through router
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello from router!"}]
)
print("Router response:", response.choices[0].message.content)
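Both `"gpt-3.5-turbo"` entries belong to one group, and the Router spreads requests between them. Under the default shuffle strategy, LiteLLM's documentation describes using `rpm` (or `tpm`/`weight`) as relative selection weights. The sketch below is a simplified illustration of weighted random selection, not LiteLLM's routing code. `pick_deployment` is hypothetical, and the seeded `random.Random` is only there to make the split reproducible. With rpm 100 vs 200, roughly one third of requests should land on Azure.

```python
import random
from collections import Counter


def pick_deployment(deployments: list[dict], rng: random.Random) -> dict:
    """Pick one deployment with probability proportional to its 'rpm' value.

    Simplified illustration only: the real Router also tracks cooldowns,
    failures and live usage, which this sketch ignores.
    """
    weights = [d["litellm_params"].get("rpm", 1) for d in deployments]
    return rng.choices(deployments, weights=weights, k=1)[0]


# Same shape as the model_list entries above, without credentials.
group = [
    {"model_name": "gpt-3.5-turbo", "litellm_params": {"model": "azure/gpt-35-turbo", "rpm": 100}},
    {"model_name": "gpt-3.5-turbo", "litellm_params": {"model": "gpt-3.5-turbo", "rpm": 200}},
]

rng = random.Random(42)  # fixed seed so the split is reproducible
counts = Counter(pick_deployment(group, rng)["litellm_params"]["model"] for _ in range(3000))
print(counts)  # roughly 1,000 Azure vs 2,000 OpenAI
```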

## 6. Cost Tracking & Callbacks

LiteLLM provides **built-in cost tracking** and supports custom callbacks for monitoring:

In [None]:
# Custom callback to track costs
def track_cost_callback(kwargs, completion_response, start_time, end_time):
    """Custom callback to log costs and usage"""
    try:
        response_cost = kwargs.get("response_cost") or 0.0  # may be None if no price is known
        model = kwargs.get("model", "unknown")
        print(f"Model: {model}, Cost: ${response_cost:.6f}")
    except Exception as e:
        print(f"Error in callback: {e}")

# Set callback
litellm.success_callback = [track_cost_callback]

# Test cost tracking
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Calculate 2+2"}]
)

# Built-in logging to external services
# litellm.success_callback = ["langfuse", "helicone", "lunary"]
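A callback sees one request at a time, so a running total needs state that lives outside the function. The sketch below keeps per-model totals using the same `(kwargs, completion_response, start_time, end_time)` signature as `track_cost_callback`. `CostLedger` is a hypothetical helper. Here it is exercised by calling it directly with fake arguments. Registering it for real would look like `litellm.success_callback = [ledger.record]`.

```python
import threading
from collections import defaultdict
from datetime import datetime


class CostLedger:
    """Hypothetical helper that accumulates response_cost per model."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.calls = defaultdict(int)
        self._lock = threading.Lock()  # callbacks may fire from worker threads

    def record(self, kwargs, completion_response, start_time, end_time):
        cost = kwargs.get("response_cost") or 0.0  # treat missing/None as zero
        model = kwargs.get("model", "unknown")
        with self._lock:
            self.totals[model] += cost
            self.calls[model] += 1

    def grand_total(self):
        with self._lock:
            return sum(self.totals.values())


ledger = CostLedger()
now = datetime.now()
# Fake callback invocations; a real run would register ledger.record instead.
ledger.record({"model": "gpt-3.5-turbo", "response_cost": 0.00025}, None, now, now)
ledger.record({"model": "gpt-3.5-turbo", "response_cost": None}, None, now, now)
ledger.record({"model": "gpt-4o-mini", "response_cost": 0.0001}, None, now, now)
print(dict(ledger.totals), f"total=${ledger.grand_total():.6f}")
```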

## 7. Error Handling & Retries

LiteLLM standardizes **error handling** across all providers and supports automatic retries:

In [None]:
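from openai import OpenAIError

try:
    # Retries up to 3 times on failure before raising
    response = completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Test message"}],
        num_retries=3
    )
    print("Success:", response.choices[0].message.content)

except litellm.AuthenticationError as e:
    # Bad or missing API key: retrying will not help
    print(f"Authentication Error: {e}")
except litellm.RateLimitError as e:
    print(f"Rate Limit Error: {e}")
except OpenAIError as e:
    # LiteLLM maps provider errors onto OpenAI's exception classes,
    # so this catches the remaining mapped errors
    print(f"LiteLLM Error: {type(e).__name__}: {e}")
except Exception as e:
    print(f"General Error: {e}")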
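`num_retries` hides the retry loop. To make the idea concrete, here is a generic sketch of retrying with exponential backoff and jitter. It is not LiteLLM's internal implementation. Which exceptions count as retryable (`retry_on`) is an assumption left to the caller: transient errors such as `litellm.RateLimitError` or `litellm.Timeout` are good candidates, while `litellm.AuthenticationError` is not. The `sleep` parameter is injectable so the demo runs without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retries: int = 3,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to `retries` extra times on the given exception types.

    The delay doubles each attempt (capped at max_delay), scaled by random jitter.
    Exceptions not listed in retry_on propagate immediately.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable when retries >= 0")


# Offline demo: a flaky call that fails twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []  # record waits instead of sleeping
print(retry_with_backoff(flaky, retry_on=(ConnectionError,), sleep=delays.append))
print("attempts:", attempts["n"], "waits:", len(delays))
```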

## 8. Async Support

LiteLLM supports **async/await** for concurrent operations:

In [None]:
import asyncio
from litellm import acompletion

async def async_completion_example():
    """Example of async completion"""
    response = await acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello async world!"}]
    )
    return response.choices[0].message.content

async def multiple_async_calls():
    """Make multiple concurrent API calls"""
    tasks = [
        acompletion(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": f"What is {topic}?"}]
        )
        for topic in ["AI", "Machine Learning", "Deep Learning"]
    ]
    
    responses = await asyncio.gather(*tasks)
    for i, response in enumerate(responses):
        print(f"Response {i+1}: {response.choices[0].message.content[:50]}...")

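# Run async examples.
# In Jupyter, an event loop is already running, so use top-level await:
# print(await async_completion_example())
# await multiple_async_calls()
# In a plain Python script, use asyncio.run() instead:
# asyncio.run(multiple_async_calls())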
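`asyncio.gather` starts every request at once, which can trip provider rate limits on larger batches. A common pattern is to cap in-flight calls with `asyncio.Semaphore`. In this sketch, `fake_acompletion` is an offline stand-in for `litellm.acompletion`, and `bounded_gather` and its `max_concurrency` knob are hypothetical. `gather` still returns results in input order.

```python
import asyncio


async def fake_acompletion(model, messages, **kwargs):
    """Offline stand-in for acompletion: sleeps briefly, echoes the prompt."""
    await asyncio.sleep(0.01)
    return f"{model}: {messages[-1]['content']}"


async def bounded_gather(prompts, call, model="gpt-3.5-turbo", max_concurrency=2):
    """Run one call per prompt with at most max_concurrency calls in flight."""
    sem = asyncio.Semaphore(max_concurrency)
    stats = {"in_flight": 0, "peak": 0}

    async def one(prompt):
        async with sem:
            stats["in_flight"] += 1
            stats["peak"] = max(stats["peak"], stats["in_flight"])
            try:
                return await call(model=model, messages=[{"role": "user", "content": prompt}])
            finally:
                stats["in_flight"] -= 1

    results = await asyncio.gather(*(one(p) for p in prompts))
    return results, stats["peak"]


topics = ["What is AI?", "What is Machine Learning?", "What is Deep Learning?"]
# In a Jupyter cell, replace asyncio.run(...) with: await bounded_gather(...)
results, peak = asyncio.run(bounded_gather(topics, fake_acompletion))
print(results, "peak concurrency:", peak)
```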

## 9. Advanced Configuration

Additional features for production use:

In [None]:
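# Set global configuration
litellm.set_verbose = True  # Enable debug logging (newer releases prefer the LITELLM_LOG=DEBUG environment variable)
litellm.max_budget = 100.0  # Spending limit in USD; calls raise litellm.BudgetExceededError once exceeded

# Context window fallbacks: if the prompt is too long for gpt-3.5-turbo, retry on a larger-context model
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Long message..."}],
    context_window_fallback_dict={"gpt-3.5-turbo": "gpt-4o-mini"}  # gpt-4o-mini has a 128k-token window
)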

# Custom metadata
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={
        "user_id": "user123",
        "session_id": "session456",
        "custom_tag": "tutorial"
    }
)

print("Advanced configuration complete!")
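A context-window fallback boils down to one decision: if the prompt is too long for the first model, switch to one with a larger window. The sketch below shows that decision in isolation. The 4-characters-per-token estimate is a rough heuristic and an assumption (LiteLLM offers `litellm.token_counter` for real counts). The `CONTEXT_WINDOWS` table and the `pick_model` helper are written by hand for this example, so check current provider limits before relying on them.

```python
# Hand-written table for this example (tokens); verify against provider docs.
CONTEXT_WINDOWS = {
    "gpt-3.5-turbo": 16_385,
    "gpt-4o-mini": 128_000,
}


def estimate_tokens(messages):
    """Crude estimate: about 4 characters per token (heuristic, not a tokenizer)."""
    chars = sum(len(m["content"]) for m in messages)
    return chars // 4 + 1


def pick_model(messages, candidates, reserve_for_output=1_000):
    """Return the first candidate whose window fits the prompt plus an output reserve."""
    needed = estimate_tokens(messages) + reserve_for_output
    for model in candidates:
        if CONTEXT_WINDOWS.get(model, 0) >= needed:
            return model
    raise ValueError(f"No candidate fits ~{needed} tokens")


short_msgs = [{"role": "user", "content": "Hello!"}]
long_msgs = [{"role": "user", "content": "x" * 200_000}]
print(pick_model(short_msgs, ["gpt-3.5-turbo", "gpt-4o-mini"]))  # gpt-3.5-turbo
print(pick_model(long_msgs, ["gpt-3.5-turbo", "gpt-4o-mini"]))   # gpt-4o-mini
```

Trying candidates in order keeps the cheaper model as the default and only escalates when the estimate says the prompt will not fit.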

## Summary

**LiteLLM** simplifies LLM integration by providing:

1. **Unified API** - Same interface for 100+ providers
2. **Reliability** - Built-in retries, fallbacks, and load balancing
3. **Observability** - Cost tracking, logging, and monitoring
4. **Performance** - Streaming and async support
5. **Production-ready** - Error handling and advanced routing
