# 🏭 09: Production Patterns

Learn production-ready patterns for building robust, reliable LLM applications with proper error handling, timeouts, and retry logic.

## 📋 Learning Objectives

By the end of this notebook, you will be able to:

- [ ] Implement comprehensive error handling with try/except
- [ ] Configure timeouts to prevent hanging operations
- [ ] Build retry logic with exponential backoff
- [ ] Manage environment-specific configuration
- [ ] Implement production-grade logging
- [ ] Create robust API wrappers
- [ ] Handle rate limiting and API errors gracefully
- [ ] Build resilient agent workflows

## 🎯 Prerequisites

- Completed notebooks 02-07 (basic usage through agents)
- Understanding of Python error handling
- Familiarity with environment variables
- Basic knowledge of logging

## ⏱️ Estimated Time: 20 minutes

## 1️⃣ Why Production Patterns Matter

**Development vs. Production:**

| Development | Production |
|-------------|------------|
| ❌ Assume everything works | ✅ Expect failures |
| ❌ Let errors crash | ✅ Handle errors gracefully |
| ❌ Wait indefinitely | ✅ Use timeouts |
| ❌ Retry manually | ✅ Automatic retry with backoff |
| ❌ Print statements | ✅ Structured logging |
| ❌ Hardcoded values | ✅ Environment configuration |

**Common Production Issues:**
- Network failures
- API rate limits
- Timeouts
- Invalid responses
- Model unavailability
- Resource exhaustion

**Goal:** Build systems that degrade gracefully and recover automatically.

## 2️⃣ Error Handling Patterns

Proper error handling is the foundation of production systems.

In [None]:
from local_llm_sdk import LocalLLMClient
from local_llm_sdk.exceptions import (
    LLMError,
    TimeoutError,  # note: shadows Python's built-in TimeoutError in this notebook
    APIError,
    ValidationError
)
import logging

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Create client
client = LocalLLMClient(
    base_url="http://169.254.83.107:1234/v1",
    model="your-model-name",
    timeout=300  # 5 minutes
)

print("✅ Client configured with production settings")

### Basic Error Handling

In [None]:
def safe_chat(prompt: str, client: LocalLLMClient) -> str:
    """
    Chat with comprehensive error handling.
    
    Returns response on success, error message on failure.
    """
    try:
        response = client.chat(prompt)
        logger.info(f"Chat successful for prompt: {prompt[:50]}...")
        return response
        
    except TimeoutError as e:
        logger.error(f"Timeout occurred: {e}")
        return "Error: Request timed out. Please try again with a simpler prompt."
        
    except APIError as e:
        logger.error(f"API error: {e}")
        return f"Error: API request failed. Details: {str(e)}"
        
    except ValidationError as e:
        logger.error(f"Validation error: {e}")
        return f"Error: Invalid request. Details: {str(e)}"
        
    except LLMError as e:
        logger.error(f"LLM error: {e}")
        return f"Error: LLM service error. Details: {str(e)}"
        
    except Exception:
        # logger.exception records the full traceback automatically
        logger.exception("Unexpected error during chat")
        return "Error: An unexpected error occurred. Please contact support."

# Test it
response = safe_chat("What is 5 + 5?", client)
print(f"\nResponse: {response}")
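
The order of the `except` clauses above matters: Python runs the first clause that matches, so a subclass must be listed before its base class or it will never be reached. The sketch below illustrates this with a hypothetical two-level hierarchy. `BaseLLMError` and `RequestTimeout` are stand-in names for illustration, not the SDK's actual classes, and the assumption that the specific errors derive from a common base is not confirmed by this notebook.

```python
class BaseLLMError(Exception):
    """Hypothetical base class for all client errors."""


class RequestTimeout(BaseLLMError):
    """Hypothetical timeout error, a subclass of the base error."""


def classify(exc: Exception) -> str:
    """Return a label for exc, checking the most specific type first."""
    try:
        raise exc
    except RequestTimeout:
        return "timeout"
    except BaseLLMError:
        return "llm_error"
    except Exception:
        return "unexpected"


print(classify(RequestTimeout("slow")))  # timeout
print(classify(BaseLLMError("bad")))     # llm_error
print(classify(ValueError("oops")))      # unexpected
```

If the `BaseLLMError` clause came first, the timeout would be reported as a generic `llm_error`.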

### Error Handling with Context

In [None]:
from typing import Optional, Dict, Any
from dataclasses import dataclass, field

@dataclass
class ChatResult:
    """Structured result with success status and optional error."""
    success: bool
    response: Optional[str] = None
    error: Optional[str] = None
    error_type: Optional[str] = None
    # default_factory gives each result its own dict, so metadata is never None
    metadata: Dict[str, Any] = field(default_factory=dict)

def robust_chat(prompt: str, client: LocalLLMClient) -> ChatResult:
    """
    Chat with structured error handling.
    
    Returns ChatResult with success flag and details.
    """
    try:
        response = client.chat(prompt)
        return ChatResult(
            success=True,
            response=response,
            metadata={"prompt_length": len(prompt)}
        )
        
    except TimeoutError as e:
        return ChatResult(
            success=False,
            error=str(e),
            error_type="timeout"
        )
        
    except APIError as e:
        return ChatResult(
            success=False,
            error=str(e),
            error_type="api_error"
        )
        
    except Exception as e:
        return ChatResult(
            success=False,
            error=str(e),
            error_type="unknown"
        )

# Test it
result = robust_chat("Calculate 123 * 456", client)

if result.success:
    print(f"✅ Success: {result.response}")
else:
    print(f"❌ Failed: {result.error_type} - {result.error}")

## 3️⃣ Timeout Configuration

Timeouts prevent operations from hanging indefinitely.

In [None]:
# Configure different timeouts for different operations

# Short timeout for simple queries
quick_client = LocalLLMClient(
    base_url="http://169.254.83.107:1234/v1",
    model="your-model-name",
    timeout=30  # 30 seconds
)

# Standard timeout
standard_client = LocalLLMClient(
    base_url="http://169.254.83.107:1234/v1",
    model="your-model-name",
    timeout=120  # 2 minutes
)

# Long timeout for complex operations
patient_client = LocalLLMClient(
    base_url="http://169.254.83.107:1234/v1",
    model="your-model-name",
    timeout=300  # 5 minutes
)

print("✅ Clients configured with different timeouts:")
print("   - Quick: 30s for simple queries")
print("   - Standard: 120s for typical operations")
print("   - Patient: 300s for complex tasks")

### Operation-Specific Timeouts

In [None]:
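def chat_with_timeout(
    prompt: str,
    client: LocalLLMClient,
    timeout: int = 60
) -> ChatResult:
    """
    Chat with a per-call timeout enforced via SIGALRM.

    Note: signal.SIGALRM is Unix-only and works only in the main thread.
    """
    import signal
    from contextlib import contextmanager

    @contextmanager
    def time_limit(seconds):
        def signal_handler(signum, frame):
            raise TimeoutError(f"Operation exceeded {seconds}s timeout")
        # Install our handler and remember the previous one so it can be restored
        previous_handler = signal.signal(signal.SIGALRM, signal_handler)
        signal.alarm(seconds)
        try:
            yield
        finally:
            signal.alarm(0)  # cancel any pending alarm
            signal.signal(signal.SIGALRM, previous_handler)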
    
    try:
        with time_limit(timeout):
            response = client.chat(prompt)
            return ChatResult(success=True, response=response)
    except TimeoutError as e:
        return ChatResult(success=False, error=str(e), error_type="timeout")
    except Exception as e:
        return ChatResult(success=False, error=str(e), error_type="error")

# Use with different timeouts
quick_result = chat_with_timeout("What is 2+2?", client, timeout=10)
print(f"Quick query: {quick_result.success}")

complex_result = chat_with_timeout(
    "Explain quantum computing in detail",
    client,
    timeout=120
)
print(f"Complex query: {complex_result.success}")
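
`signal.SIGALRM` exists only on Unix, and signal handlers can only be installed from the main thread, so the approach above does not work on Windows or inside worker threads. A portable alternative, sketched here with the standard library's `concurrent.futures`, runs the call in a worker thread and waits for the result with a timeout. `run_with_timeout` and `slow_call` are illustrative names. One caveat: the worker is not cancelled, so a timed-out call keeps running in the background until it finishes.

```python
import concurrent.futures
import time


def run_with_timeout(func, timeout_s, *args, **kwargs):
    """Run func in a worker thread and wait at most timeout_s seconds for it."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    finally:
        # Do not block here waiting for a worker that is still running
        executor.shutdown(wait=False)


def slow_call():
    """Stand-in for a slow LLM request."""
    time.sleep(0.3)
    return "done"


print(run_with_timeout(lambda: "fast", 1.0))  # fast

try:
    run_with_timeout(slow_call, 0.05)
except concurrent.futures.TimeoutError:
    print("timed out")  # timed out
```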

## 4️⃣ Retry Logic with Exponential Backoff

Automatically retry failed operations with increasing delays.

In [None]:
import time
from typing import Callable, Any

def exponential_backoff_retry(
    func: Callable,
    max_retries: int = 3,
    initial_delay: float = 1.0,
    exponential_base: float = 2.0,
    max_delay: float = 60.0,
    retriable_exceptions: tuple = (APIError, TimeoutError)
) -> Any:
    """
    Retry function with exponential backoff.
    
    Delays: 1s, 2s, 4s, 8s, etc. (capped at max_delay)
    """
    for attempt in range(max_retries):
        try:
            return func()
            
        except retriable_exceptions as e:
            if attempt == max_retries - 1:
                # Last attempt, re-raise
                logger.error(f"All {max_retries} retries exhausted")
                raise
            
            # Calculate delay with exponential backoff
            delay = min(
                initial_delay * (exponential_base ** attempt),
                max_delay
            )
            
            logger.warning(
                f"Attempt {attempt + 1}/{max_retries} failed: {e}. "
                f"Retrying in {delay}s..."
            )
            time.sleep(delay)
        
        except Exception as e:
            # Non-retriable error, fail immediately
            logger.error(f"Non-retriable error: {e}")
            raise

# Example usage
def make_chat_call():
    return client.chat("What is artificial intelligence?")

try:
    response = exponential_backoff_retry(make_chat_call, max_retries=3)
    print(f"✅ Success: {response[:100]}...")
except Exception as e:
    print(f"❌ Failed after retries: {e}")
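
To make the delay schedule concrete, the sketch below computes the capped delays used by the formula above for eight consecutive failures. It also shows one common refinement, random jitter, which spreads retries out so that many clients failing at the same moment do not all retry in lockstep. `backoff_delay` and `jittered_delay` are illustrative helper names, and the "pick uniformly between 0 and the capped delay" strategy is one of several reasonable choices.

```python
import random


def backoff_delay(attempt: int, initial_delay: float = 1.0,
                  exponential_base: float = 2.0, max_delay: float = 60.0) -> float:
    """Deterministic capped delay for a zero-based attempt number."""
    return min(initial_delay * (exponential_base ** attempt), max_delay)


def jittered_delay(attempt: int, rng: random.Random, **kwargs) -> float:
    """Pick a random delay between 0 and the capped exponential delay."""
    return rng.uniform(0.0, backoff_delay(attempt, **kwargs))


schedule = [backoff_delay(a) for a in range(8)]
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]

rng = random.Random(42)  # seeded only to make the demo repeatable
print([round(jittered_delay(a, rng), 2) for a in range(4)])
```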

### Decorator Pattern for Retries

In [None]:
from functools import wraps

def retry_with_backoff(
    max_retries: int = 3,
    initial_delay: float = 1.0,
    exponential_base: float = 2.0
):
    """
    Decorator for automatic retries with exponential backoff.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except (APIError, TimeoutError) as e:
                    if attempt == max_retries - 1:
                        raise
                    delay = initial_delay * (exponential_base ** attempt)
                    logger.warning(f"Retry {attempt + 1}/{max_retries} after {delay}s")
                    time.sleep(delay)
            return None
        return wrapper
    return decorator

# Use as decorator
@retry_with_backoff(max_retries=3, initial_delay=1.0)
def resilient_chat(prompt: str) -> str:
    """Chat with automatic retries."""
    return client.chat(prompt)

# Now this will automatically retry on failures
response = resilient_chat("Explain machine learning briefly")
print(f"Response: {response[:150]}...")
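
Retry logic is easier to trust when it can be tested without a live server or real waiting. The sketch below is a variant of the decorator above that accepts the sleep function as a parameter, so a test can record the delays instead of pausing. `TransientError`, `retry`, and the `sleep` parameter are hypothetical names introduced for this illustration.

```python
import time
from functools import wraps


class TransientError(Exception):
    """Hypothetical stand-in for a retriable error such as APIError."""


def retry(max_retries=3, initial_delay=1.0, base=2.0, sleep=time.sleep):
    """Retry decorator whose sleep function can be swapped out in tests."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except TransientError:
                    if attempt == max_retries - 1:
                        raise
                    sleep(initial_delay * (base ** attempt))
        return wrapper
    return decorator


recorded_delays = []
calls = {"count": 0}


@retry(max_retries=3, sleep=recorded_delays.append)
def flaky():
    """Fail twice, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


result = flaky()
print(result, calls["count"], recorded_delays)  # ok 3 [1.0, 2.0]
```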

## 5️⃣ Environment Configuration

Use environment variables for configuration, not hardcoded values.

In [None]:
import os
from typing import Optional

class Config:
    """Centralized configuration management."""
    
    # LLM Configuration
    LLM_BASE_URL: str = os.getenv(
        "LLM_BASE_URL",
        "http://169.254.83.107:1234/v1"  # default
    )
    LLM_MODEL: str = os.getenv("LLM_MODEL", "default-model")
    LLM_TIMEOUT: int = int(os.getenv("LLM_TIMEOUT", "300"))
    LLM_TEMPERATURE: float = float(os.getenv("LLM_TEMPERATURE", "0.7"))
    
    # Retry Configuration
    MAX_RETRIES: int = int(os.getenv("MAX_RETRIES", "3"))
    RETRY_INITIAL_DELAY: float = float(os.getenv("RETRY_INITIAL_DELAY", "1.0"))
    RETRY_MAX_DELAY: float = float(os.getenv("RETRY_MAX_DELAY", "60.0"))
    
    # Feature Flags
    ENABLE_TRACING: bool = os.getenv("ENABLE_TRACING", "false").lower() == "true"
    ENABLE_CACHING: bool = os.getenv("ENABLE_CACHING", "false").lower() == "true"
    
    # Environment
    ENV: str = os.getenv("ENV", "development")  # development, staging, production
    
    @classmethod
    def is_production(cls) -> bool:
        return cls.ENV == "production"
    
    @classmethod
    def is_development(cls) -> bool:
        return cls.ENV == "development"

# Use configuration
config_client = LocalLLMClient(
    base_url=Config.LLM_BASE_URL,
    model=Config.LLM_MODEL,
    timeout=Config.LLM_TIMEOUT,
    temperature=Config.LLM_TEMPERATURE,
    enable_tracing=Config.ENABLE_TRACING
)

print("✅ Client configured from environment:")
print(f"   Base URL: {Config.LLM_BASE_URL}")
print(f"   Timeout: {Config.LLM_TIMEOUT}s")
print(f"   Max Retries: {Config.MAX_RETRIES}")
print(f"   Environment: {Config.ENV}")
print(f"   Tracing: {Config.ENABLE_TRACING}")
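
Class attributes such as those on `Config` are evaluated once, when the class body runs, so later changes to the environment are not picked up. An alternative is to build a settings object on demand and validate it at the same time. The sketch below assumes a hypothetical `Settings` dataclass with a `from_env` constructor. The field names, the accepted boolean spellings, and the placeholder default URL are illustrative choices, not part of the SDK.

```python
import os
from dataclasses import dataclass
from typing import Mapping

TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object built from an environment mapping."""
    base_url: str
    timeout: int
    enable_tracing: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Read and validate settings at call time, not at import time."""
        timeout = int(env.get("LLM_TIMEOUT", "300"))
        if timeout <= 0:
            raise ValueError(f"LLM_TIMEOUT must be positive, got {timeout}")
        tracing = env.get("ENABLE_TRACING", "false").strip().lower()
        return cls(
            base_url=env.get("LLM_BASE_URL", "http://localhost:1234/v1"),
            timeout=timeout,
            enable_tracing=tracing in TRUE_VALUES,
        )


settings = Settings.from_env({"LLM_TIMEOUT": "45", "ENABLE_TRACING": "Yes"})
print(settings.timeout, settings.enable_tracing)  # 45 True
```

Passing a plain dictionary instead of `os.environ` keeps the example, and any tests, independent of the real environment.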

### .env File Support

In [None]:
# Install python-dotenv if needed:
# !pip install python-dotenv

from dotenv import load_dotenv

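# Load variables from a .env file into the process environment.
# By default, load_dotenv() does not override variables that are already set.
load_dotenv()

# Note: Config above reads os.getenv() when the class is defined, so call
# load_dotenv() before defining Config (or re-run the Config cell afterwards).
print("✅ Environment variables loaded from .env file")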

# Example .env file:
"""
# .env
LLM_BASE_URL=http://localhost:1234/v1
LLM_MODEL=my-model
LLM_TIMEOUT=300
LLM_TEMPERATURE=0.7
MAX_RETRIES=5
ENABLE_TRACING=true
ENV=production
"""
print("\n💡 Create a .env file with the above configuration")

## 6️⃣ Production Logging

Structured logging for production systems.

In [None]:
import logging
import json
from datetime import datetime, timezone

class ProductionLogger:
    """Structured logging for production."""

    def __init__(self, name: str):
        self.logger = logging.getLogger(name)
        self.setup_logging()

    def setup_logging(self):
        """Configure structured logging."""
        handler = logging.StreamHandler()

        # JSON formatter for structured logs
        class JSONFormatter(logging.Formatter):
            def format(self, record):
                log_data = {
                    # Timezone-aware UTC timestamp of the log event
                    # (datetime.utcnow() is deprecated since Python 3.12)
                    'timestamp': datetime.fromtimestamp(
                        record.created, tz=timezone.utc
                    ).isoformat(),
                    'level': record.levelname,
                    'logger': record.name,
                    'message': record.getMessage(),
                }
                
                # Add exception info if present
                if record.exc_info:
                    log_data['exception'] = self.formatException(record.exc_info)
                
                # Add custom fields
                if hasattr(record, 'extra_data'):
                    log_data.update(record.extra_data)
                
                return json.dumps(log_data)
        
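        handler.setFormatter(JSONFormatter())
        # Avoid stacking duplicate handlers when this cell is re-run
        if not self.logger.handlers:
            self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)
        # Don't also emit through the root logger configured by basicConfig
        self.logger.propagate = False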
    
    def log_llm_call(
        self,
        prompt: str,
        response: Optional[str],
        duration_ms: float,
        success: bool,
        error: Optional[str] = None
    ):
        """Log LLM API call with metadata."""
        extra_data = {
            'event_type': 'llm_call',
            'prompt_length': len(prompt),
            'response_length': len(response) if response else 0,
            'duration_ms': duration_ms,
            'success': success,
        }
        
        if error:
            extra_data['error'] = error
        
        # Attach the metadata via `extra`; JSONFormatter reads it from record.extra_data
        self.logger.log(
            logging.INFO if success else logging.ERROR,
            "LLM call completed",
            extra={'extra_data': extra_data}
        )

# Usage
prod_logger = ProductionLogger("llm_app")

# Log a call
start_time = time.time()
response = client.chat("Hello!")
duration = (time.time() - start_time) * 1000

prod_logger.log_llm_call(
    prompt="Hello!",
    response=response,
    duration_ms=duration,
    success=True
)

print("✅ Call logged with structured metadata")
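
The standard `extra=` argument of the logging calls is another way to attach fields to a record: each key becomes an attribute on the `LogRecord`, and a formatter can copy the ones it knows about into the JSON output. The sketch below writes to an in-memory stream so the result can be parsed and inspected. `ExtraJSONFormatter`, its field whitelist, and the logger name are illustrative assumptions.

```python
import io
import json
import logging


class ExtraJSONFormatter(logging.Formatter):
    """Serialize the message plus a whitelist of extra fields as JSON."""
    EXTRA_FIELDS = ("event_type", "duration_ms", "success")

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for name in self.EXTRA_FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(ExtraJSONFormatter())

demo_logger = logging.getLogger("json_extra_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo output out of the root logger
demo_logger.addHandler(handler)

demo_logger.info(
    "LLM call completed",
    extra={"event_type": "llm_call", "duration_ms": 12.5, "success": True},
)

parsed = json.loads(stream.getvalue())
print(parsed["event_type"], parsed["duration_ms"])  # llm_call 12.5
```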

## 7️⃣ Complete Production Wrapper

Putting it all together: a production-ready API wrapper.

In [None]:
class ProductionLLMClient:
    """
    Production-ready LLM client wrapper.
    
    Features:
    - Automatic retries with exponential backoff
    - Comprehensive error handling
    - Structured logging
    - Timeout management
    - Environment configuration
    """
    
    def __init__(self):
        self.client = LocalLLMClient(
            base_url=Config.LLM_BASE_URL,
            model=Config.LLM_MODEL,
            timeout=Config.LLM_TIMEOUT,
            temperature=Config.LLM_TEMPERATURE,
            enable_tracing=Config.ENABLE_TRACING
        )
        self.logger = ProductionLogger("production_llm_client")
        self.max_retries = Config.MAX_RETRIES
    
    def chat(
        self,
        prompt: str,
        timeout: Optional[int] = None,
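        max_retries: Optional[int] = None
    ) -> ChatResult:
        """
        Chat with full production features.

        Note: `timeout` is accepted for interface symmetry but is not applied
        per call here; the timeout set on the underlying client is used.
        """
        # Compare against None explicitly so an explicit value is never silently ignored
        max_retries = self.max_retries if max_retries is None else max_retries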
        start_time = time.time()
        
        for attempt in range(max_retries):
            try:
                response = self.client.chat(prompt)
                duration_ms = (time.time() - start_time) * 1000
                
                # Log success
                self.logger.log_llm_call(
                    prompt=prompt,
                    response=response,
                    duration_ms=duration_ms,
                    success=True
                )
                
                return ChatResult(
                    success=True,
                    response=response,
                    metadata={"duration_ms": duration_ms, "attempts": attempt + 1}
                )
                
            except (APIError, TimeoutError) as e:
                duration_ms = (time.time() - start_time) * 1000
                
                if attempt == max_retries - 1:
                    # Last attempt failed
                    self.logger.log_llm_call(
                        prompt=prompt,
                        response=None,
                        duration_ms=duration_ms,
                        success=False,
                        error=str(e)
                    )
                    
                    return ChatResult(
                        success=False,
                        error=f"Failed after {max_retries} attempts: {str(e)}",
                        error_type=type(e).__name__
                    )
                
                # Retry with backoff
                delay = Config.RETRY_INITIAL_DELAY * (2 ** attempt)
                delay = min(delay, Config.RETRY_MAX_DELAY)
                
                logger.warning(
                    f"Attempt {attempt + 1}/{max_retries} failed. "
                    f"Retrying in {delay}s..."
                )
                time.sleep(delay)
                
            except Exception as e:
                # Non-retriable error
                duration_ms = (time.time() - start_time) * 1000
                
                self.logger.log_llm_call(
                    prompt=prompt,
                    response=None,
                    duration_ms=duration_ms,
                    success=False,
                    error=str(e)
                )
                
                return ChatResult(
                    success=False,
                    error=f"Non-retriable error: {str(e)}",
                    error_type="unknown"
                )
        
        return ChatResult(
            success=False,
            error="Unexpected: exhausted retries without exception",
            error_type="unknown"
        )

# Create production client
prod_client = ProductionLLMClient()

print("✅ Production client initialized with:")
print(f"   - Automatic retries ({Config.MAX_RETRIES})")
print(f"   - Timeout management ({Config.LLM_TIMEOUT}s)")
print(f"   - Structured logging")
print(f"   - Error handling")
print(f"   - Environment config")

### Test the Production Client

In [None]:
# Test successful call
result = prod_client.chat("What is the capital of France?")

print("\n🧪 Test Results:\n")
print(f"Success: {result.success}")

if result.success:
    print(f"Response: {result.response}")
    print(f"Duration: {result.metadata.get('duration_ms', 0):.2f}ms")
    print(f"Attempts: {result.metadata.get('attempts', 0)}")
else:
    print(f"Error Type: {result.error_type}")
    print(f"Error: {result.error}")
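
Retries multiply the worst-case latency: several attempts, each with its own timeout and backoff delay, can take far longer than a caller expects. One way to bound the total is an overall deadline that stops retrying once the next wait would overrun it. The sketch below is a minimal illustration with an injectable clock and sleep function so it can be exercised without waiting. `call_with_deadline`, `TransientError`, and all parameters are hypothetical names, not part of the wrapper above.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retriable error."""


def call_with_deadline(func, deadline_s, max_retries=3, initial_delay=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Retry func on TransientError, but never sleep past the overall deadline."""
    start = clock()
    for attempt in range(max_retries):
        try:
            return func()
        except TransientError:
            delay = initial_delay * (2 ** attempt)
            remaining = deadline_s - (clock() - start)
            # Give up when out of attempts or when the next wait would overrun
            if attempt == max_retries - 1 or delay >= remaining:
                raise
            sleep(delay)


fake_now = [0.0]


def fake_sleep(seconds):
    fake_now[0] += seconds  # advance the fake clock instead of waiting


def always_fails():
    raise TransientError("still down")


try:
    call_with_deadline(always_fails, deadline_s=2.5, max_retries=5,
                       clock=lambda: fake_now[0], sleep=fake_sleep)
except TransientError:
    print(f"gave up at t={fake_now[0]}s")  # gave up at t=1.0s
```

With a 2.5 second budget, the first 1 second wait fits, but the following 2 second wait would exceed the remaining 1.5 seconds, so the error is raised after two attempts instead of five.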

## 🏋️ Exercise: Build a Robust Agent Wrapper

**Challenge:** Create a production-ready agent wrapper with:

1. Error handling for agent failures
2. Retry logic for transient errors
3. Progress monitoring
4. Timeout management
5. Detailed logging

**Requirements:**
- Wrap the `client.react()` method
- Handle agent failures gracefully
- Log each step of agent execution
- Return structured result with success/failure status
- Support custom stop conditions

Try it yourself first!

In [None]:
# Your code here:



<details>
<summary>Click to see solution</summary>

```python
# Solution: Production Agent Wrapper

from dataclasses import dataclass
from typing import Optional
import time

@dataclass
class AgentJobResult:
    """Result of an agent job execution."""
    success: bool
    task: str
    result: Optional[str] = None
    error: Optional[str] = None
    steps_taken: int = 0
    duration_ms: float = 0
    attempts: int = 1

class ProductionAgentRunner:
    """
    Production-ready agent execution wrapper.
    """
    
    def __init__(self, client: LocalLLMClient):
        self.client = client
        self.logger = logging.getLogger("agent_runner")
    
    def run_agent(
        self,
        task: str,
        max_steps: int = 10,
        max_retries: int = 2,
        timeout: int = 300  # reserved: not enforced in this solution
    ) -> AgentJobResult:
        """
        Run agent task with production features.
        """
        self.logger.info(f"Starting agent task: {task[:100]}...")
        start_time = time.time()
        
        for attempt in range(max_retries):
            try:
                self.logger.info(f"Attempt {attempt + 1}/{max_retries}")
                
                # Run agent
                result = self.client.react(
                    task,
                    max_iterations=max_steps
                )
                
                duration_ms = (time.time() - start_time) * 1000
                
                if result.status == "success":
                    self.logger.info(
                        f"Agent succeeded in {result.iterations} steps, "
                        f"{duration_ms:.2f}ms"
                    )
                    
                    return AgentJobResult(
                        success=True,
                        task=task,
                        result=result.final_response,
                        steps_taken=result.iterations,
                        duration_ms=duration_ms,
                        attempts=attempt + 1
                    )
                else:
                    self.logger.warning(
                        f"Agent did not succeed: {result.stop_reason}"
                    )
                    
                    if attempt < max_retries - 1:
                        # Retry
                        delay = 2 ** attempt
                        self.logger.info(f"Retrying in {delay}s...")
                        time.sleep(delay)
                    else:
                        # Last attempt, return failure
                        return AgentJobResult(
                            success=False,
                            task=task,
                            error=f"Agent stopped: {result.stop_reason}",
                            steps_taken=result.iterations,
                            duration_ms=duration_ms,
                            attempts=attempt + 1
                        )
                        
            except Exception as e:
                duration_ms = (time.time() - start_time) * 1000
                self.logger.error(f"Agent error: {e}")
                
                if attempt < max_retries - 1:
                    delay = 2 ** attempt
                    self.logger.info(f"Retrying in {delay}s...")
                    time.sleep(delay)
                else:
                    return AgentJobResult(
                        success=False,
                        task=task,
                        error=str(e),
                        duration_ms=duration_ms,
                        attempts=attempt + 1
                    )
        
        return AgentJobResult(
            success=False,
            task=task,
            error="Unexpected: exhausted retries"
        )

# Test it
test_client = LocalLLMClient(
    base_url="http://169.254.83.107:1234/v1",
    model="mistralai/magistral-small-2509"
)
# Register built-in tools
test_client.register_tools_from(None)

runner = ProductionAgentRunner(test_client)

result = runner.run_agent(
    "Calculate 15 factorial, then count how many digits it has",
    max_steps=8,
    max_retries=2
)

print("\n🤖 Agent Job Results:\n")
print(f"Success: {result.success}")
print(f"Task: {result.task}")
print(f"Steps taken: {result.steps_taken}")
print(f"Duration: {result.duration_ms:.2f}ms")
print(f"Attempts: {result.attempts}")

if result.success:
    print(f"\nResult: {result.result}")
else:
    print(f"\nError: {result.error}")
```
</details>

In [None]:
# Solution cell (run to see the answer)
# [Solution code from above would go here]
print("See the solution in the dropdown above!")

## ⚠️ Common Pitfalls

### 1. Not Handling Specific Exceptions
```python
# ❌ Bad: Catch-all exception handling
try:
    response = client.chat(prompt)
except Exception:
    pass  # Silently fails, no insight

# ✅ Good: Handle specific exceptions
try:
    response = client.chat(prompt)
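except TimeoutError as e:
    logger.warning(f"Request timed out: {e}")  # e.g. retry or shorten the prompt
except APIError as e:
    logger.error(f"API request failed: {e}")  # e.g. surface the error to the caller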
```

### 2. Infinite Retries
```python
# ❌ Bad: Infinite retry loop
while True:
    try:
        return client.chat(prompt)
    except:
        continue  # Could run forever!

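# ✅ Good: Limited retries on known transient errors
for attempt in range(max_retries):
    try:
        return client.chat(prompt)
    except (APIError, TimeoutError):
        if attempt == max_retries - 1:
            raise
        time.sleep(2 ** attempt)  # back off before the next attempt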
```

### 3. No Timeout
```python
# ⚠️ Warning: Could hang forever
client = LocalLLMClient(base_url="...", model="...")
# No timeout set!

# ✅ Good: Always set timeout
client = LocalLLMClient(
    base_url="...",
    model="...",
    timeout=300
)
```

### 4. Hardcoded Configuration
```python
# ❌ Bad: Hardcoded values
client = LocalLLMClient(
    base_url="http://localhost:1234/v1",
    model="my-model"
)

# ✅ Good: Environment-based config
client = LocalLLMClient(
    base_url=os.getenv("LLM_BASE_URL"),
    model=os.getenv("LLM_MODEL")
)
```

### 5. Poor Logging
```python
# ❌ Bad: Print statements
print("Calling LLM...")
response = client.chat(prompt)
print("Done")

# ✅ Good: Structured logging
logger.info("Calling LLM", extra={"prompt_length": len(prompt)})
response = client.chat(prompt)
logger.info("LLM call completed", extra={"response_length": len(response)})
```

## 🎓 What You Learned

✅ **Error Handling**: Comprehensive try/except patterns for different failure modes

✅ **Timeouts**: Preventing hanging operations with configurable timeouts

✅ **Retry Logic**: Exponential backoff for transient failures

✅ **Environment Config**: Using environment variables and .env files

✅ **Structured Logging**: JSON logging with metadata for production

✅ **Production Wrapper**: Complete production-ready client implementation

✅ **Best Practices**: When and how to use each pattern effectively

## 🚀 Next Steps

You've mastered production patterns! Now let's apply everything you've learned in hands-on projects.

➡️ Continue to [10-mini-project-code-helper.ipynb](./10-mini-project-code-helper.ipynb) to build:
- A complete code review assistant agent
- Filesystem operations for loading code
- Code analysis with execute_python
- Automated testing and bug detection
- Fix suggestions and application