# LSM-006: Production Monitoring - Enterprise-Grade Operations

## 🎯 Learning Objectives

By the end of this notebook, you will:
- Implement comprehensive production monitoring for LLM applications
- Set up real-time alerts and performance dashboards
- Master cost tracking and optimization strategies
- Build automated quality monitoring systems
- Integrate LangSmith with enterprise monitoring infrastructure

## 🏭 Production Monitoring Overview

Production monitoring for LLM applications requires a multi-layered approach:

### 📊 Key Monitoring Dimensions

**Performance Metrics**:
- Latency (P50, P95, P99)
- Throughput (requests per second)
- Error rates and failure patterns
- Token usage and processing speed

**Quality Metrics**:
- Response quality scores
- Semantic drift detection
- User satisfaction ratings
- Safety and compliance violations

**Cost Metrics**:
- Token consumption per model
- Cost per request/user/session
- Resource utilization efficiency
- Budget threshold alerts

**Operational Metrics**:
- System availability
- Deployment health
- Data pipeline status
- Infrastructure performance
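
The performance dimensions above reduce to a few numbers per monitoring window. The following is a minimal sketch, assuming per-request latencies and an error count have already been collected for a fixed window; the function name `summarize_requests` and its parameters are illustrative and not part of LangSmith.

```python
import statistics

def summarize_requests(latencies_s, errors, window_s):
    """Summarize one monitoring window.

    latencies_s: per-request latencies in seconds (at least two samples)
    errors:      number of those requests that failed
    window_s:    length of the window in seconds
    """
    total = len(latencies_s)
    # quantiles(n=100) returns the 99 cut points P1..P99
    cuts = statistics.quantiles(latencies_s, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "throughput_rps": total / window_s,
        "error_rate": errors / total,
    }
```

Tail percentiles (P95, P99) are usually more informative than the mean for LLM calls, because a small share of slow requests dominates the user-perceived experience.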

## 🛠️ Environment Setup

Let's set up our production monitoring environment with comprehensive instrumentation.

In [None]:
import os
from datetime import datetime, timedelta
import asyncio
from typing import Dict, List, Optional, Any
import json
from dataclasses import dataclass
from enum import Enum
import time
import random

# LangSmith and LangChain imports
from langsmith import Client, traceable
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Monitoring and alerting
import logging
from collections import defaultdict, deque
import statistics
import threading
from concurrent.futures import ThreadPoolExecutor

In [None]:
# Configure environment
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "production-monitoring-demo"

# Initialize clients
client = Client()
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

print("✅ Production monitoring environment configured")
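
Both `Client()` and `ChatOpenAI` read their credentials from the environment, so a missing key only shows up on the first request. A small pre-flight check can surface this earlier. This is a sketch that assumes the keys are supplied as `LANGCHAIN_API_KEY` and `OPENAI_API_KEY`; the helper name `missing_env_vars` is hypothetical.

```python
import os

# Assumed credential variables for this notebook's setup
REQUIRED_VARS = ("LANGCHAIN_API_KEY", "OPENAI_API_KEY")

def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_env_vars()` before initializing the clients and failing fast on a non-empty result keeps configuration errors out of the monitoring data.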

## 📈 Real-Time Performance Monitoring

Let's build a performance monitor that keeps a rolling window of samples per metric, computes summary statistics, and raises alerts when thresholds are crossed.

In [None]:
class AlertSeverity(Enum):
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"

@dataclass
class MetricPoint:
    timestamp: datetime
    value: float
    metadata: Dict[str, Any]

@dataclass
class Alert:
    severity: AlertSeverity
    message: str
    timestamp: datetime
    metric: str
    value: float
    threshold: float

class ProductionMonitor:
    def __init__(self, window_size: int = 100):
        self.window_size = window_size
        self.metrics = defaultdict(lambda: deque(maxlen=window_size))
        self.alerts = deque(maxlen=1000)
        self.thresholds = {
            'latency_p95': 5.0,  # seconds
            'error_rate': 0.05,  # 5%
            'token_usage_spike': 2.0,  # 2x normal
            'cost_per_hour': 100.0,  # dollars
            'quality_score': 0.7  # minimum acceptable
        }
        self.baseline_metrics = {}
        self._lock = threading.Lock()
    
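    def record_metric(self, metric_name: str, value: float, metadata: Optional[Dict] = None):
        """Record a metric point with timestamp and metadata"""
        with self._lock:
            point = MetricPoint(
                timestamp=datetime.now(),
                value=value,
                metadata=metadata or {}
            )
            self.metrics[metric_name].append(point)
            
            # Check for alerts
            self._check_alerts(metric_name, value)
            
            # The 'latency_p95' threshold applies to the rolling window of raw
            # 'latency' samples, so derive the P95 here and check it as well
            if metric_name == 'latency':
                window = [p.value for p in self.metrics['latency']]
                self._check_alerts('latency_p95', self._percentile(window, 0.95))
    
    def _check_alerts(self, metric_name: str, value: float):
        """Check if metric value triggers any alerts"""
        threshold = self.thresholds.get(metric_name)
        if threshold is None:  # a threshold of 0 is still a valid threshold
            return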
        
        severity = None
        message = None
        
        if metric_name == 'latency_p95' and value > threshold:
            severity = AlertSeverity.WARNING if value < threshold * 1.5 else AlertSeverity.CRITICAL
            message = f"High latency detected: {value:.2f}s (threshold: {threshold}s)"
        
        elif metric_name == 'error_rate' and value > threshold:
            severity = AlertSeverity.WARNING if value < threshold * 2 else AlertSeverity.CRITICAL
            message = f"High error rate: {value:.1%} (threshold: {threshold:.1%})"
        
        elif metric_name == 'quality_score' and value < threshold:
            severity = AlertSeverity.WARNING if value > threshold * 0.8 else AlertSeverity.CRITICAL
            message = f"Low quality score: {value:.2f} (threshold: {threshold})"
        
        if severity and message:
            alert = Alert(
                severity=severity,
                message=message,
                timestamp=datetime.now(),
                metric=metric_name,
                value=value,
                threshold=threshold
            )
            self.alerts.append(alert)
            print(f"🚨 {severity.value.upper()}: {message}")
    
    def get_statistics(self, metric_name: str) -> Dict[str, float]:
        """Get statistical summary of a metric"""
        with self._lock:
            points = list(self.metrics[metric_name])
            
        if not points:
            return {}
        
        values = [p.value for p in points]
        
        return {
            'count': len(values),
            'mean': statistics.mean(values),
            'median': statistics.median(values),
            'std': statistics.stdev(values) if len(values) > 1 else 0,
            'min': min(values),
            'max': max(values),
            'p95': self._percentile(values, 0.95),
            'p99': self._percentile(values, 0.99)
        }
    
    def _percentile(self, values: List[float], p: float) -> float:
        """Calculate percentile of values"""
        sorted_values = sorted(values)
        index = int(p * len(sorted_values))
        return sorted_values[min(index, len(sorted_values) - 1)]
    
    def get_recent_alerts(self, hours: int = 24) -> List[Alert]:
        """Get recent alerts within specified hours"""
        cutoff = datetime.now() - timedelta(hours=hours)
        return [alert for alert in self.alerts if alert.timestamp > cutoff]

# Initialize production monitor
monitor = ProductionMonitor()
print("📊 Production monitor initialized")
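
The `_percentile` helper above uses `int(p * len(values))` as a zero-based index, which for 100 samples and `p=0.95` returns the 96th smallest value. If the textbook nearest-rank definition is preferred, a minimal sketch might look like this; the function name and the integer `pct` parameter are assumptions made for the example.

```python
def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile; pct is an integer in 1..100."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # ceil(pct * n / 100) computed in integer arithmetic to avoid float rounding
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

For small windows the two definitions usually agree (both return the maximum for 15 samples at P95), so the difference matters mainly once the window fills up.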

## 🔍 Instrumented Application

Let's wrap an LLM call in a service that records latency, token usage, estimated cost, quality score, and error rate for every request.

In [None]:
class ProductionLLMService:
    def __init__(self, monitor: ProductionMonitor):
        self.monitor = monitor
        self.llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
        self.request_count = 0
        self.error_count = 0
        
        # Quality evaluation prompt
        self.quality_evaluator = ChatOpenAI(model="gpt-4o", temperature=0)
        
    @traceable(name="production_customer_service")
    def handle_customer_inquiry(self, user_message: str, context: Dict = None) -> Dict[str, Any]:
        """Handle customer service inquiry with full monitoring"""
        start_time = time.time()
        self.request_count += 1
        
        try:
            # Create system prompt
            system_prompt = """
            You are a helpful customer service representative. 
            Provide accurate, empathetic, and professional responses.
            If you cannot help with something, clearly explain limitations.
            """
            
            # Generate response
            messages = [
                SystemMessage(content=system_prompt),
                HumanMessage(content=user_message)
            ]
            
            response = self.llm.invoke(messages)
            response_text = response.content
            
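            # Calculate metrics
            latency = time.time() - start_time
            # Prefer provider-reported token usage; fall back to a rough
            # whitespace word count when usage metadata is unavailable
            usage = getattr(response, 'usage_metadata', None) or {}
            token_count = usage.get('total_tokens') or (
                len(user_message.split()) + len(response_text.split())
            )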
            
            # Estimate cost (approximate)
            cost = self._estimate_cost(token_count, "gpt-4o-mini")
            
            # Record performance metrics
            self.monitor.record_metric('latency', latency, {
                'model': 'gpt-4o-mini',
                'tokens': token_count
            })
            
            self.monitor.record_metric('token_usage', token_count, {
                'model': 'gpt-4o-mini',
                'request_type': 'customer_service'
            })
            
            self.monitor.record_metric('cost', cost, {
                'model': 'gpt-4o-mini'
            })
            
            # Evaluate quality (synchronous LLM-as-judge call; not included in `latency` above)
            quality_score = self._evaluate_response_quality(user_message, response_text)
            self.monitor.record_metric('quality_score', quality_score)
            
            # Calculate and record error rate
            error_rate = self.error_count / self.request_count
            self.monitor.record_metric('error_rate', error_rate)
            
            return {
                'response': response_text,
                'metrics': {
                    'latency': latency,
                    'tokens': token_count,
                    'cost': cost,
                    'quality_score': quality_score
                },
                'status': 'success'
            }
            
        except Exception as e:
            self.error_count += 1
            error_rate = self.error_count / self.request_count
            self.monitor.record_metric('error_rate', error_rate)
            
            return {
                'error': str(e),
                'status': 'error',
                'metrics': {
                    'latency': time.time() - start_time
                }
            }
    
    def _estimate_cost(self, token_count: int, model: str) -> float:
        """Estimate cost based on token usage"""
        # Illustrative per-1K-token rates; check the provider's current price list before relying on them
        rates = {
            'gpt-4o-mini': {'input': 0.00015, 'output': 0.0006},  # per 1K tokens
            'gpt-4o': {'input': 0.005, 'output': 0.015}
        }
        
        if model not in rates:
            return 0.0
        
        # Assume 60% input, 40% output tokens
        input_tokens = int(token_count * 0.6)
        output_tokens = int(token_count * 0.4)
        
        cost = (
            (input_tokens / 1000) * rates[model]['input'] +
            (output_tokens / 1000) * rates[model]['output']
        )
        
        return cost
    
    def _evaluate_response_quality(self, question: str, response: str) -> float:
        """Evaluate response quality using LLM-as-judge"""
        try:
            eval_prompt = f"""
            Evaluate the quality of this customer service response on a scale of 0-1:
            
            Customer Question: {question}
            Response: {response}
            
            Consider:
            - Relevance and accuracy
            - Helpfulness and completeness
            - Professional tone
            - Clarity and coherence
            
            Respond with only a number between 0 and 1.
            """
            
            result = self.quality_evaluator.invoke([HumanMessage(content=eval_prompt)])
            score = float(result.content.strip())
            return max(0, min(1, score))  # Clamp to [0,1]
            
        except Exception:
            # Covers API failures and non-numeric judge output
            return 0.5  # Default neutral score if evaluation fails

# Initialize production service
service = ProductionLLMService(monitor)
print("🏭 Production LLM service initialized")
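
`_evaluate_response_quality` calls `float()` on the raw judge output, so any reply such as "Score: 0.8" falls through to the neutral default. A more forgiving parser is sketched below; `parse_score` is a hypothetical helper, and it assumes the first number in the reply is the score.

```python
import re

# Matches the first integer or decimal number, with optional sign
_NUMBER = re.compile(r"[-+]?\d*\.?\d+")

def parse_score(text, default=0.5):
    """Extract the first number from judge output and clamp it to [0, 1]."""
    match = _NUMBER.search(text)
    if match is None:
        return default
    return max(0.0, min(1.0, float(match.group())))
```

Replies on a different scale (for example "8/10") would be clamped to 1.0 by this sketch, so keeping the "respond with only a number between 0 and 1" instruction in the prompt still matters.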

## 🧪 Load Testing and Monitoring

Let's simulate production load to see our monitoring system in action.

In [None]:
# Sample customer inquiries for load testing
sample_inquiries = [
    "I need help with my order #12345. It hasn't arrived yet.",
    "Can you explain your return policy?",
    "I was charged twice for the same item. Please help.",
    "How do I change my shipping address?",
    "Is this product compatible with iPhone 15?",
    "I want to cancel my subscription.",
    "The product I received is damaged. What should I do?",
    "Can I get a refund for this item?",
    "When will this product be back in stock?",
    "I forgot my password. How do I reset it?"
]

async def simulate_load_test(num_requests: int = 20):
    """Simulate production load with concurrent requests"""
    print(f"🚀 Starting load test with {num_requests} requests...")
    
    async def make_request(request_id: int):
        # Random delay to simulate real traffic patterns
        await asyncio.sleep(random.uniform(0, 2))
        
        inquiry = random.choice(sample_inquiries)
        
        # Simulate occasional errors (5% failure rate)
        if random.random() < 0.05:
            # Simulate timeout or API error without blocking the event loop
            await asyncio.sleep(random.uniform(8, 12))  # Long delay
            raise Exception("API timeout")
        
        # handle_customer_inquiry is synchronous, so run it in a worker
        # thread to let the requests actually overlap
        result = await asyncio.to_thread(service.handle_customer_inquiry, inquiry)
        icon = "✅" if result['status'] == 'success' else "❌"
        print(f"{icon} Request {request_id}: {result['status']} (latency: {result['metrics']['latency']:.2f}s)")
        return result
    
    # Execute requests concurrently
    tasks = [make_request(i) for i in range(num_requests)]
    
    try:
        results = await asyncio.gather(*tasks, return_exceptions=True)
        
        # Count successes and failures
        successes = sum(1 for r in results if isinstance(r, dict) and r.get('status') == 'success')
        failures = len(results) - successes
        
        print(f"\n📊 Load test completed: {successes} successes, {failures} failures")
        
    except Exception as e:
        print(f"❌ Load test error: {e}")

# Run load test
await simulate_load_test(15)
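
When the service call is synchronous, a load test only overlaps requests if each call runs off the event loop, and an upper bound on concurrency keeps the test from tripping provider rate limits. The sketch below shows that pattern with a stand-in function; `blocking_call`, `run_bounded`, and the failure rule are all hypothetical and not part of the notebook's service.

```python
import asyncio
import time

def blocking_call(request_id):
    """Stand-in for a synchronous LLM request (hypothetical)."""
    time.sleep(0.01)
    if request_id % 5 == 0:
        raise TimeoutError(f"request {request_id} timed out")
    return {"id": request_id, "status": "success"}

async def run_bounded(num_requests, max_concurrency=4):
    """Run blocking calls in worker threads, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(request_id):
        async with semaphore:
            # to_thread (Python 3.9+) keeps the event loop free while the call blocks
            return await asyncio.to_thread(blocking_call, request_id)

    results = await asyncio.gather(
        *(one(i) for i in range(num_requests)), return_exceptions=True
    )
    successes = [r for r in results if isinstance(r, dict)]
    failures = [r for r in results if isinstance(r, Exception)]
    return successes, failures
```

In a notebook this would be invoked as `await run_bounded(10)`; in a script, `asyncio.run(run_bounded(10))`.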

## 📊 Dashboard and Analytics

Let's print a text dashboard that summarizes the collected metrics, recent alerts, and an overall health score.

In [None]:
def display_production_dashboard():
    """Display comprehensive production dashboard"""
    print("\n" + "="*60)
    print("🏭 PRODUCTION DASHBOARD")
    print("="*60)
    
    # Performance metrics
    print("\n📈 PERFORMANCE METRICS")
    print("-" * 30)
    
    latency_stats = monitor.get_statistics('latency')
    if latency_stats:
        print(f"Latency (seconds):")
        print(f"  Mean: {latency_stats['mean']:.3f}s")
        print(f"  P95:  {latency_stats['p95']:.3f}s")
        print(f"  P99:  {latency_stats['p99']:.3f}s")
        print(f"  Max:  {latency_stats['max']:.3f}s")
    
    error_stats = monitor.get_statistics('error_rate')
    if error_stats:
        # error_rate samples are cumulative, so the latest sample is the current rate
        current_error_rate = monitor.metrics['error_rate'][-1].value
        print(f"\nError Rate: {current_error_rate:.1%}")
        if current_error_rate > 0.05:
            print("  ⚠️ Above threshold (5%)")
        else:
            print("  ✅ Within acceptable range")
    
    # Quality metrics
    print("\n🎯 QUALITY METRICS")
    print("-" * 30)
    
    quality_stats = monitor.get_statistics('quality_score')
    if quality_stats:
        print(f"Quality Score:")
        print(f"  Mean: {quality_stats['mean']:.3f}")
        print(f"  Min:  {quality_stats['min']:.3f}")
        print(f"  Max:  {quality_stats['max']:.3f}")
        
        if quality_stats['mean'] < 0.7:
            print("  ⚠️ Below target (0.7)")
        else:
            print("  ✅ Meeting quality targets")
    
    # Cost metrics
    print("\n💰 COST METRICS")
    print("-" * 30)
    
    cost_stats = monitor.get_statistics('cost')
    token_stats = monitor.get_statistics('token_usage')
    
    if cost_stats:
        total_cost = sum(point.value for point in monitor.metrics['cost'])
        avg_cost_per_request = cost_stats['mean']
        print(f"Total Cost: ${total_cost:.4f}")
        print(f"Average Cost/Request: ${avg_cost_per_request:.4f}")
    
    if token_stats:
        total_tokens = sum(point.value for point in monitor.metrics['token_usage'])
        print(f"Total Tokens: {total_tokens:,}")
        print(f"Average Tokens/Request: {token_stats['mean']:.0f}")
    
    # Recent alerts
    print("\n🚨 RECENT ALERTS (Last 24 hours)")
    print("-" * 30)
    
    recent_alerts = monitor.get_recent_alerts(24)
    if recent_alerts:
        for alert in recent_alerts[-5:]:  # Show last 5 alerts
            severity_icon = {
                AlertSeverity.INFO: "ℹ️",
                AlertSeverity.WARNING: "⚠️",
                AlertSeverity.CRITICAL: "🚨"
            }[alert.severity]
            
            print(f"{severity_icon} {alert.timestamp.strftime('%H:%M:%S')}: {alert.message}")
    else:
        print("✅ No alerts in the last 24 hours")
    
    # System health summary
    print("\n🏥 SYSTEM HEALTH SUMMARY")
    print("-" * 30)
    
    health_score = calculate_health_score()
    health_icon = "🟢" if health_score > 0.8 else "🟡" if health_score > 0.6 else "🔴"
    print(f"Overall Health: {health_icon} {health_score:.1%}")
    
    if health_score < 0.8:
        print("\n🔧 RECOMMENDED ACTIONS:")
        if latency_stats and latency_stats['p95'] > 3:
            print("  • Investigate high latency issues")
        if error_stats and error_stats['mean'] > 0.05:
            print("  • Review and fix error patterns")
        if quality_stats and quality_stats['mean'] < 0.7:
            print("  • Improve prompt engineering and model selection")

def calculate_health_score() -> float:
    """Calculate overall system health score"""
    scores = []
    
    # Latency score (0-1, where 1 is best)
    latency_stats = monitor.get_statistics('latency')
    if latency_stats:
        latency_score = max(0, min(1, 1 - (latency_stats['p95'] - 1) / 4))  # 1s = 1.0, 5s = 0.0
        scores.append(latency_score)
    
    # Error rate score
    error_stats = monitor.get_statistics('error_rate')
    if error_stats:
        error_score = max(0, min(1, 1 - error_stats['mean'] / 0.1))  # 0% = 1.0, 10% = 0.0
        scores.append(error_score)
    
    # Quality score
    quality_stats = monitor.get_statistics('quality_score')
    if quality_stats:
        scores.append(quality_stats['mean'])
    
    return statistics.mean(scores) if scores else 0.5

# Display dashboard
display_production_dashboard()
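
To make the health score concrete, here is the same arithmetic as a standalone function with the three inputs passed in directly. It is a simplified sketch: unlike `calculate_health_score`, it assumes all three metrics are available, and the name `health_score` is only illustrative.

```python
def clamp01(x):
    """Clamp a value to the range [0, 1]."""
    return max(0.0, min(1.0, x))

def health_score(p95_latency_s, error_rate, quality):
    """Average of three sub-scores, each in [0, 1] (1 is best)."""
    latency_score = clamp01(1 - (p95_latency_s - 1) / 4)  # 1s -> 1.0, 5s -> 0.0
    error_score = clamp01(1 - error_rate / 0.1)           # 0% -> 1.0, 10% -> 0.0
    return (latency_score + error_score + quality) / 3
```

For a P95 of 2 s, a 2% error rate, and a quality mean of 0.85, the sub-scores are 0.75, 0.80, and 0.85, which average to roughly 0.80, right at the boundary between the green and yellow indicators.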

## 🔔 Advanced Alerting System

Let's route alerts to several notification channels by severity and suppress repeated alerts for the same metric.

In [None]:
from abc import ABC, abstractmethod
from typing import List, Callable
import smtplib
from email.mime.text import MIMEText
import requests

class AlertChannel(ABC):
    """Abstract base class for alert notification channels"""
    
    @abstractmethod
    def send_alert(self, alert: Alert) -> bool:
        """Send alert notification"""
        pass

class ConsoleAlertChannel(AlertChannel):
    """Console-based alert notifications"""
    
    def send_alert(self, alert: Alert) -> bool:
        severity_icons = {
            AlertSeverity.INFO: "ℹ️",
            AlertSeverity.WARNING: "⚠️",
            AlertSeverity.CRITICAL: "🚨"
        }
        
        icon = severity_icons.get(alert.severity, "❓")
        print(f"\n{icon} ALERT [{alert.severity.value.upper()}] - {alert.timestamp.strftime('%Y-%m-%d %H:%M:%S')}")
        print(f"   Metric: {alert.metric}")
        print(f"   Message: {alert.message}")
        print(f"   Value: {alert.value} (Threshold: {alert.threshold})")
        return True

class SlackAlertChannel(AlertChannel):
    """Slack webhook alert notifications"""
    
    def __init__(self, webhook_url: str):
        self.webhook_url = webhook_url
    
    def send_alert(self, alert: Alert) -> bool:
        """Send alert to Slack (simulated)"""
        # In real implementation, you would send to actual Slack webhook
        color = {
            AlertSeverity.INFO: "good",
            AlertSeverity.WARNING: "warning", 
            AlertSeverity.CRITICAL: "danger"
        }[alert.severity]
        
        payload = {
            "attachments": [{
                "color": color,
                "title": f"Production Alert - {alert.severity.value.title()}",
                "fields": [
                    {"title": "Metric", "value": alert.metric, "short": True},
                    {"title": "Value", "value": f"{alert.value}", "short": True},
                    {"title": "Threshold", "value": f"{alert.threshold}", "short": True},
                    {"title": "Message", "value": alert.message, "short": False}
                ],
                "ts": int(alert.timestamp.timestamp())  # Slack attachments take epoch seconds in "ts"
            }]
        }
        
        print(f"📱 [SIMULATED] Slack alert sent: {alert.message}")
        # requests.post(self.webhook_url, json=payload)
        return True

class EmailAlertChannel(AlertChannel):
    """Email alert notifications"""
    
    def __init__(self, smtp_host: str, smtp_port: int, username: str, password: str, recipients: List[str]):
        self.smtp_host = smtp_host
        self.smtp_port = smtp_port
        self.username = username
        self.password = password
        self.recipients = recipients
    
    def send_alert(self, alert: Alert) -> bool:
        """Send alert via email (simulated)"""
        subject = f"Production Alert [{alert.severity.value.upper()}] - {alert.metric}"
        
        body = f"""
Production Alert Details:

Severity: {alert.severity.value.upper()}
Metric: {alert.metric}
Value: {alert.value}
Threshold: {alert.threshold}
Timestamp: {alert.timestamp}

Message: {alert.message}

Please investigate and take appropriate action.

Best regards,
Production Monitoring System
        """
        
        print(f"📧 [SIMULATED] Email alert sent to {len(self.recipients)} recipients")
        print(f"   Subject: {subject}")
        # In real implementation:
        # msg = MIMEText(body)
        # msg['Subject'] = subject
        # msg['From'] = self.username
        # msg['To'] = ', '.join(self.recipients)
        # 
        # with smtplib.SMTP(self.smtp_host, self.smtp_port) as server:
        #     server.starttls()
        #     server.login(self.username, self.password)
        #     server.send_message(msg)
        
        return True

class AlertManager:
    """Manages alert routing and notification channels"""
    
    def __init__(self):
        self.channels: List[AlertChannel] = []
        self.rules: List[Callable[[Alert], bool]] = []
        self.alert_history = deque(maxlen=1000)
        self.suppression_rules = {}
    
    def add_channel(self, channel: AlertChannel):
        """Add alert notification channel"""
        self.channels.append(channel)
    
    def add_suppression_rule(self, metric: str, min_interval_minutes: int):
        """Add rule to suppress duplicate alerts"""
        self.suppression_rules[metric] = min_interval_minutes
    
    def should_suppress_alert(self, alert: Alert) -> bool:
        """Check if alert should be suppressed due to recent similar alerts"""
        if alert.metric not in self.suppression_rules:
            return False
        
        min_interval = timedelta(minutes=self.suppression_rules[alert.metric])
        cutoff_time = alert.timestamp - min_interval
        
        # Check for recent similar alerts
        for recent_alert in reversed(list(self.alert_history)):
            if recent_alert.timestamp < cutoff_time:
                break
            
            if (recent_alert.metric == alert.metric and 
                recent_alert.severity == alert.severity):
                return True
        
        return False
    
    def send_alert(self, alert: Alert):
        """Process and send alert through appropriate channels"""
        # Check suppression rules
        if self.should_suppress_alert(alert):
            print(f"🔇 Alert suppressed: {alert.message}")
            return
        
        # Add to history
        self.alert_history.append(alert)
        
        # Send through all channels based on severity
        for channel in self.channels:
            try:
                if self._should_send_to_channel(alert, channel):
                    channel.send_alert(alert)
            except Exception as e:
                print(f"❌ Failed to send alert via {type(channel).__name__}: {e}")
    
    def _should_send_to_channel(self, alert: Alert, channel: AlertChannel) -> bool:
        """Determine if alert should be sent to specific channel"""
        # Send all alerts to console for demo
        if isinstance(channel, ConsoleAlertChannel):
            return True
        
        # Send WARNING and CRITICAL to Slack
        if isinstance(channel, SlackAlertChannel):
            return alert.severity in [AlertSeverity.WARNING, AlertSeverity.CRITICAL]
        
        # Send only CRITICAL to email
        if isinstance(channel, EmailAlertChannel):
            return alert.severity == AlertSeverity.CRITICAL
        
        return True

# Set up alert manager with multiple channels
alert_manager = AlertManager()
alert_manager.add_channel(ConsoleAlertChannel())
alert_manager.add_channel(SlackAlertChannel("https://hooks.slack.com/services/fake/webhook"))
alert_manager.add_channel(EmailAlertChannel(
    smtp_host="smtp.company.com",
    smtp_port=587,
    username="alerts@company.com",
    password=os.getenv("ALERT_SMTP_PASSWORD", ""),  # placeholder variable name; avoid hardcoding credentials
    recipients=["devops@company.com", "team-lead@company.com"]
))

# Add suppression rules
alert_manager.add_suppression_rule('latency_p95', 15)  # Max 1 alert per 15 minutes
alert_manager.add_suppression_rule('error_rate', 10)   # Max 1 alert per 10 minutes

print("🔔 Advanced alerting system configured")

# Test alerting system with sample alerts
test_alerts = [
    Alert(AlertSeverity.WARNING, "Test warning alert", datetime.now(), "test_metric", 0.8, 0.7),
    Alert(AlertSeverity.CRITICAL, "Test critical alert", datetime.now(), "test_metric_critical", 0.9, 0.5)
]

print("\n🧪 Testing alert system...")
for alert in test_alerts:
    alert_manager.send_alert(alert)
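
The suppression logic in `AlertManager` scans the alert history on every call. An alternative that behaves similarly is to remember only the last send time per (metric, severity) pair. The following sketch shows that idea in isolation; the class `AlertSuppressor` and its method names are hypothetical.

```python
from datetime import datetime, timedelta

class AlertSuppressor:
    """Allow at most one alert per (metric, severity) within min_interval."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last_sent = {}  # (metric, severity) -> datetime of last sent alert

    def should_send(self, metric, severity, now):
        key = (metric, severity)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.min_interval:
            return False  # still inside the suppression window
        self._last_sent[key] = now
        return True
```

Keying on severity as well as metric means an escalation from WARNING to CRITICAL is never suppressed by an earlier warning, which matches the comparison in `should_suppress_alert`.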

## 🔗 Integration with Enterprise Infrastructure

Let's explore how to integrate LangSmith with existing enterprise monitoring and observability tools.

In [None]:
# Integration patterns for enterprise infrastructure

class OpenTelemetryIntegration:
    """Integration with OpenTelemetry for distributed tracing"""
    
    def __init__(self):
        # In real implementation, you would configure OpenTelemetry
        # from opentelemetry import trace, metrics
        # from opentelemetry.exporter.jaeger.thrift import JaegerExporter
        # from opentelemetry.sdk.trace import TracerProvider
        # from opentelemetry.sdk.trace.export import BatchSpanProcessor
        pass
    
    def configure_tracing(self):
        """Configure OpenTelemetry tracing"""
        print("🔧 Configuring OpenTelemetry tracing...")
        
        # Example configuration
        config_example = """
        # OpenTelemetry Configuration Example
        
        import os
        from opentelemetry import trace
        from opentelemetry.exporter.jaeger.thrift import JaegerExporter
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import BatchSpanProcessor
        # The opentelemetry-instrumentation-langchain package exports LangchainInstrumentor
        from opentelemetry.instrumentation.langchain import LangchainInstrumentor
        
        # Configure tracer
        trace.set_tracer_provider(TracerProvider())
        tracer = trace.get_tracer(__name__)
        
        # Configure Jaeger exporter
        # (the Jaeger exporters are deprecated upstream in favor of OTLP)
        jaeger_exporter = JaegerExporter(
            agent_host_name="jaeger-agent",
            agent_port=6831,
        )
        
        # Add span processor
        span_processor = BatchSpanProcessor(jaeger_exporter)
        trace.get_tracer_provider().add_span_processor(span_processor)
        
        # Auto-instrument LangChain
        LangchainInstrumentor().instrument()
        
        # Configure environment variables
        os.environ["LANGCHAIN_TRACING_V2"] = "true"
        os.environ["LANGCHAIN_PROJECT"] = "production-app"
        """
        
        print(config_example)
        return config_example

class PrometheusIntegration:
    """Integration with Prometheus metrics collection"""
    
    def __init__(self):
        self.metrics_registry = {}
    
    def setup_metrics_export(self):
        """Setup Prometheus metrics export"""
        print("📊 Setting up Prometheus metrics export...")
        
        prometheus_config = """
        # Prometheus Integration Configuration
        
        from prometheus_client import Counter, Histogram, Gauge, start_http_server
        import time
        
        # Define metrics
        request_count = Counter('llm_requests_total', 
                               'Total LLM requests', 
                               ['model', 'status'])
        
        request_duration = Histogram('llm_request_duration_seconds',
                                   'LLM request duration',
                                   ['model'])
        
        token_usage = Counter('llm_tokens_total',
                            'Total tokens processed',
                            ['model', 'type'])
        
        quality_score = Gauge('llm_quality_score',
                            'Current quality score',
                            ['model'])
        
        # Start metrics server
        start_http_server(8000)
        
        # Example usage in your LLM application:
        @traceable(name="monitored_llm_call")
        def make_llm_call(prompt, model="gpt-4o-mini"):
            start_time = time.time()
            
            try:
                # Your LLM call logic here
                response = llm.invoke(prompt)
                
                # Record metrics
                request_count.labels(model=model, status='success').inc()
                request_duration.labels(model=model).observe(time.time() - start_time)
                token_usage.labels(model=model, type='input').inc(len(prompt.split()))
                # llm.invoke returns a message object; its text is in .content
                token_usage.labels(model=model, type='output').inc(len(response.content.split()))
                
                return response
                
            except Exception as e:
                request_count.labels(model=model, status='error').inc()
                raise
        """
        
        print(prometheus_config)
        return prometheus_config

class DatadogIntegration:
    """Integration with Datadog APM and logging"""
    
    def setup_datadog_integration(self):
        """Setup Datadog integration"""
        print("🐕 Setting up Datadog integration...")
        
        datadog_config = """
        # Datadog Integration Configuration
        
        import os
        import time
        import logging
        from datadog import initialize, statsd
        
        # Initialize Datadog
        options = {
            'api_key': os.getenv('DATADOG_API_KEY'),
            'app_key': os.getenv('DATADOG_APP_KEY')
        }
        initialize(**options)
        
        # Configure logging
        logging.basicConfig(
            format='%(asctime)s %(levelname)s %(name)s %(message)s',
            level=logging.INFO
        )
        
        # Custom metrics function
        def send_metrics(metric_name, value, tags=None):
            statsd.gauge(f'langsmith.{metric_name}', value, tags=tags or [])
        
        # Example usage:
        @traceable(name="datadog_monitored_call")
        def monitored_llm_call(prompt):
            start_time = time.time()
            
            try:
                response = llm.invoke(prompt)
                
                # Send custom metrics to Datadog
                duration = time.time() - start_time
                send_metrics('llm.duration', duration, ['model:gpt-4o-mini'])
                # Whitespace word count is only a rough proxy for the real token count
                send_metrics('llm.tokens', len(prompt.split()), ['type:input'])
                # Request counts are counters, so increment instead of setting a gauge
                statsd.increment('langsmith.llm.requests', tags=['status:success'])
                
                # Log structured data
                # get_trace_id() / get_span_id() are placeholders: supply them from
                # your tracer so Datadog can correlate logs with traces
                logging.info('LLM call completed', extra={
                    'duration': duration,
                    'model': 'gpt-4o-mini',
                    'tokens': len(prompt.split()),
                    'dd.trace_id': get_trace_id(),
                    'dd.span_id': get_span_id()
                })
                
                return response
                
            except Exception as e:
                statsd.increment('langsmith.llm.requests', tags=['status:error'])
                logging.error(f'LLM call failed: {e}')
                raise
        """
        
        print(datadog_config)
        return datadog_config

# Demonstrate integrations
print("🏢 Enterprise Infrastructure Integrations\n")

# OpenTelemetry
otel = OpenTelemetryIntegration()
otel.configure_tracing()

print("\n" + "="*60 + "\n")

# Prometheus
prometheus = PrometheusIntegration()
prometheus.setup_metrics_export()

print("\n" + "="*60 + "\n")

# Datadog
datadog = DatadogIntegration()
datadog.setup_datadog_integration()

print("\n✅ All enterprise integrations configured")
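
One detail in the Datadog snippet is easy to miss: fields passed through `extra=` are attached to the log record, but a plain format string like `'%(asctime)s %(levelname)s %(name)s %(message)s'` never prints them. The following is a minimal sketch, assuming your log pipeline accepts one JSON object per line, of a formatter that emits those fields. `JsonFormatter` and `build_json_logger` are illustrative names introduced here, not part of LangSmith or Datadog.

```python
import io
import json
import logging

# Attributes every LogRecord carries by default; anything else came from `extra=`.
_STANDARD_ATTRS = set(logging.makeLogRecord({}).__dict__) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: renders each record, including extras, as JSON."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy over the caller-supplied extra fields (duration, model, ...).
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


def build_json_logger(name, stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


buffer = io.StringIO()
demo_logger = build_json_logger("llm.demo", buffer)
demo_logger.info("LLM call completed", extra={"duration": 0.42, "model": "gpt-4o-mini"})
logged = json.loads(buffer.getvalue())
print(logged)
```

In a real service the stream would be `sys.stdout` or a file that your log agent tails; the `StringIO` buffer is used here only so the result can be inspected in the notebook.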

## 💰 Cost Optimization and Budget Management

Let's implement advanced cost tracking and optimization strategies.

In [None]:
import logging
from collections import defaultdict, deque
from datetime import date, datetime, timedelta
from typing import Any, Dict, List, Optional

class CostManager:
    """Advanced cost tracking and optimization"""
    
    def __init__(self):
        self.cost_history = deque(maxlen=10000)
        # Illustrative prices in USD per 1,000 tokens; verify against current provider pricing
        self.model_pricing = {
            'gpt-4o': {'input': 0.005, 'output': 0.015},
            'gpt-4o-mini': {'input': 0.00015, 'output': 0.0006},
            'claude-3-sonnet': {'input': 0.003, 'output': 0.015},
            'claude-3-haiku': {'input': 0.00025, 'output': 0.00125}
        }
        self.budgets = {}
        self.cost_alerts = []
    
    def set_budget(self, period: str, amount: float, alert_threshold: float = 0.8):
        """Set budget for a time period (daily, weekly, monthly)"""
        self.budgets[period] = {
            'amount': amount,
            'alert_threshold': alert_threshold,
            'start_date': date.today()
        }
    
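    def track_cost(self, model: str, input_tokens: int, output_tokens: int,
                   user_id: Optional[str] = None, project: Optional[str] = None) -> float:
        """Track cost for a specific model usage"""
        if model not in self.model_pricing:
            # Unknown models have no price; warn instead of silently recording nothing
            logging.warning("No pricing configured for model '%s'; cost not tracked", model)
            return 0.0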
        
        pricing = self.model_pricing[model]
        cost = (
            (input_tokens / 1000) * pricing['input'] +
            (output_tokens / 1000) * pricing['output']
        )
        
        cost_record = {
            'timestamp': datetime.now(),
            'model': model,
            'input_tokens': input_tokens,
            'output_tokens': output_tokens,
            'cost': cost,
            'user_id': user_id,
            'project': project
        }
        
        self.cost_history.append(cost_record)
        self._check_budget_alerts()
        
        return cost
    
    def _check_budget_alerts(self):
        """Check if current spending exceeds budget thresholds"""
        today = date.today()
        
        for period, budget_info in self.budgets.items():
            if period == 'daily':
                period_cost = self.get_daily_cost(today)
            elif period == 'weekly':
                period_cost = self.get_weekly_cost(today)
            elif period == 'monthly':
                period_cost = self.get_monthly_cost(today.year, today.month)
            else:
                continue
            
            threshold_amount = budget_info['amount'] * budget_info['alert_threshold']
            
            if period_cost > threshold_amount:
                alert = {
                    'timestamp': datetime.now(),
                    'period': period,
                    'current_cost': period_cost,
                    'budget': budget_info['amount'],
                    'threshold': threshold_amount,
                    'percentage': (period_cost / budget_info['amount']) * 100
                }
                # Keep a history of alerts so callers can inspect or forward them
                self.cost_alerts.append(alert)
                
                if period_cost > budget_info['amount']:
                    print(f"🚨 BUDGET EXCEEDED: {period} spending (${period_cost:.2f}) exceeded budget (${budget_info['amount']:.2f})")
                else:
                    print(f"⚠️ BUDGET WARNING: {period} spending (${period_cost:.2f}) is {alert['percentage']:.1f}% of budget")
    
    def get_daily_cost(self, target_date: date) -> float:
        """Get total cost for a specific day"""
        start = datetime.combine(target_date, datetime.min.time())
        end = start + timedelta(days=1)
        
        return sum(record['cost'] for record in self.cost_history 
                  if start <= record['timestamp'] < end)
    
    def get_weekly_cost(self, target_date: date) -> float:
        """Get total cost for the week containing target_date"""
        week_start = target_date - timedelta(days=target_date.weekday())
        week_end = week_start + timedelta(days=7)
        
        start = datetime.combine(week_start, datetime.min.time())
        end = datetime.combine(week_end, datetime.min.time())
        
        return sum(record['cost'] for record in self.cost_history 
                  if start <= record['timestamp'] < end)
    
    def get_monthly_cost(self, year: int, month: int) -> float:
        """Get total cost for a specific month"""
        start = datetime(year, month, 1)
        if month == 12:
            end = datetime(year + 1, 1, 1)
        else:
            end = datetime(year, month + 1, 1)
        
        return sum(record['cost'] for record in self.cost_history 
                  if start <= record['timestamp'] < end)
    
    def get_cost_breakdown(self) -> Dict[str, Any]:
        """Get detailed cost breakdown"""
        if not self.cost_history:
            return {}
        
        total_cost = sum(record['cost'] for record in self.cost_history)
        
        # By model
        model_costs = defaultdict(float)
        model_tokens = defaultdict(lambda: {'input': 0, 'output': 0})
        
        for record in self.cost_history:
            model_costs[record['model']] += record['cost']
            model_tokens[record['model']]['input'] += record['input_tokens']
            model_tokens[record['model']]['output'] += record['output_tokens']
        
        # By project (a missing project is stored as None, so map it to 'unknown')
        project_costs = defaultdict(float)
        for record in self.cost_history:
            project = record.get('project') or 'unknown'
            project_costs[project] += record['cost']
        
        # By user (a missing user_id is stored as None, so map it to 'unknown')
        user_costs = defaultdict(float)
        for record in self.cost_history:
            user = record.get('user_id') or 'unknown'
            user_costs[user] += record['cost']
        
        return {
            'total_cost': total_cost,
            'total_records': len(self.cost_history),
            'by_model': dict(model_costs),
            'by_project': dict(project_costs),
            'by_user': dict(user_costs),
            'token_usage': dict(model_tokens)
        }
    
    def get_optimization_recommendations(self) -> List[str]:
        """Get cost optimization recommendations"""
        recommendations = []
        breakdown = self.get_cost_breakdown()
        
        if not breakdown:
            return recommendations
        
        # Check model usage patterns
        model_costs = breakdown['by_model']
        total_cost = breakdown['total_cost']
        
        # Recommend cheaper models if expensive ones dominate
        if model_costs.get('gpt-4o', 0) > total_cost * 0.6:
            recommendations.append(
                "Consider using gpt-4o-mini for simpler tasks to reduce costs by up to 95%"
            )
        
        # Check for high token usage, averaged over each model's own requests
        model_requests = defaultdict(int)
        for record in self.cost_history:
            model_requests[record['model']] += 1
        
        token_usage = breakdown['token_usage']
        for model, tokens in token_usage.items():
            avg_tokens_per_request = (tokens['input'] + tokens['output']) / max(1, model_requests[model])
            if avg_tokens_per_request > 2000:
                recommendations.append(
                    f"High token usage detected for {model} (avg: {avg_tokens_per_request:.0f} tokens/request). "
                    "Consider prompt optimization or input truncation."
                )
        
        # Check for uneven project distribution
        project_costs = breakdown['by_project']
        if len(project_costs) > 1:
            max_project_cost = max(project_costs.values())
            if max_project_cost > total_cost * 0.8:
                expensive_project = max(project_costs.keys(), key=project_costs.get)
                recommendations.append(
                    f"Project '{expensive_project}' accounts for {(max_project_cost/total_cost)*100:.1f}% "
                    "of total costs. Consider reviewing and optimizing it."
                )
        
        return recommendations

# Initialize cost manager
cost_manager = CostManager()

# Set budgets
cost_manager.set_budget('daily', 50.0, alert_threshold=0.8)   # $50/day
cost_manager.set_budget('weekly', 300.0, alert_threshold=0.8)  # $300/week
cost_manager.set_budget('monthly', 1200.0, alert_threshold=0.8) # $1200/month

print("💰 Cost management system initialized")
print("📊 Budgets set: Daily $50, Weekly $300, Monthly $1200")

# Simulate some usage to demonstrate cost tracking
print("\n🧪 Simulating API usage for cost tracking...")

# Simulate various model usage patterns
usage_patterns = [
    ('gpt-4o-mini', 150, 200, 'user1', 'project-a'),
    ('gpt-4o', 300, 150, 'user2', 'project-b'),
    ('gpt-4o-mini', 200, 300, 'user1', 'project-a'),
    ('gpt-4o-mini', 100, 150, 'user3', 'project-c'),
    ('gpt-4o', 250, 180, 'user2', 'project-b'),
]

for model, input_tokens, output_tokens, user, project in usage_patterns:
    cost = cost_manager.track_cost(model, input_tokens, output_tokens, user, project)
    print(f"  💸 {model}: {input_tokens}+{output_tokens} tokens = ${cost:.4f} (User: {user}, Project: {project})")

# Display cost breakdown
print("\n📈 Cost Analysis Dashboard:")
print("=" * 40)

breakdown = cost_manager.get_cost_breakdown()
print(f"Total Cost: ${breakdown['total_cost']:.4f}")
print(f"Total Requests: {breakdown['total_records']}")
print(f"Average Cost/Request: ${breakdown['total_cost']/breakdown['total_records']:.4f}")

print("\nCost by Model:")
for model, cost in breakdown['by_model'].items():
    percentage = (cost / breakdown['total_cost']) * 100
    print(f"  {model}: ${cost:.4f} ({percentage:.1f}%)")

print("\nCost by Project:")
for project, cost in breakdown['by_project'].items():
    percentage = (cost / breakdown['total_cost']) * 100
    print(f"  {project}: ${cost:.4f} ({percentage:.1f}%)")

print("\nCost by User:")
for user, cost in breakdown['by_user'].items():
    percentage = (cost / breakdown['total_cost']) * 100
    print(f"  {user}: ${cost:.4f} ({percentage:.1f}%)")

# Show optimization recommendations
print("\n🎯 Optimization Recommendations:")
print("=" * 40)
recommendations = cost_manager.get_optimization_recommendations()
if recommendations:
    for i, rec in enumerate(recommendations, 1):
        print(f"{i}. {rec}")
else:
    print("✅ No optimization recommendations at this time")
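
To make the model-routing recommendation concrete, the sketch below re-prices the simulated `gpt-4o` requests as if they had been sent to `gpt-4o-mini`. It is a rough illustration under stated assumptions: the prices are copied from `CostManager` (USD per 1,000 tokens), token counts are assumed to be identical on both models, and `request_cost` and `rerouting_savings` are hypothetical helper names. It says nothing about whether the cheaper model's answers would be good enough.

```python
# Prices copied from CostManager (USD per 1,000 tokens); illustrative only.
PRICING = {
    "gpt-4o": {"input": 0.005, "output": 0.015},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}


def request_cost(model, input_tokens, output_tokens):
    """Cost of one request: tokens are billed per 1,000, input and output separately."""
    price = PRICING[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


def rerouting_savings(requests, source_model, target_model):
    """Return (current_cost, rerouted_cost, fraction_saved) for source_model's requests."""
    selected = [(i, o) for m, i, o in requests if m == source_model]
    current = sum(request_cost(source_model, i, o) for i, o in selected)
    rerouted = sum(request_cost(target_model, i, o) for i, o in selected)
    saved = 1 - rerouted / current if current else 0.0
    return current, rerouted, saved


# Same (model, input_tokens, output_tokens) triples as the simulation above.
simulated = [
    ("gpt-4o-mini", 150, 200),
    ("gpt-4o", 300, 150),
    ("gpt-4o-mini", 200, 300),
    ("gpt-4o-mini", 100, 150),
    ("gpt-4o", 250, 180),
]

current, rerouted, saved = rerouting_savings(simulated, "gpt-4o", "gpt-4o-mini")
print(f"gpt-4o cost: ${current:.4f}, on gpt-4o-mini: ${rerouted:.4f}, saved: {saved:.1%}")
```

At these list prices the two `gpt-4o` requests cost about $0.0077 together and would cost about $0.0003 on `gpt-4o-mini`, a saving of roughly 96%. In practice only the requests a cheaper model can handle well should be rerouted, so the realized saving will be smaller.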

## 🎯 Production Best Practices Summary

Here are the key production monitoring best practices covered in this notebook:

In [None]:
def print_production_best_practices():
    """Summary of production monitoring best practices"""
    
    best_practices = {
        "🔍 Observability": [
            "Implement comprehensive tracing with LangSmith for all LLM interactions",
            "Track key metrics: latency, throughput, error rates, and quality scores",
            "Use structured logging with correlation IDs for distributed systems",
            "Implement health checks and readiness probes for services"
        ],
        
        "🚨 Alerting": [
            "Set up multi-channel alerting (console, Slack, email, PagerDuty)",
            "Configure alert suppression to prevent notification fatigue",
            "Use severity-based routing (INFO → console, CRITICAL → on-call)",
            "Implement escalation policies for unacknowledged critical alerts"
        ],
        
        "📊 Metrics & Monitoring": [
            "Monitor the four golden signals: latency, traffic, errors, saturation",
            "Track business metrics: quality scores, user satisfaction, conversion rates",
            "Implement SLI/SLO monitoring with error budgets",
            "Use percentile-based latency monitoring (P50, P95, P99)"
        ],
        
        "💰 Cost Management": [
            "Implement real-time cost tracking with budget alerts",
            "Monitor token usage patterns and optimize for efficiency",
            "Use model routing (cheaper models for simple tasks)",
            "Track cost attribution by user, project, and feature"
        ],
        
        "🏗️ Infrastructure": [
            "Integrate with existing monitoring stack (Prometheus, Grafana, Datadog)",
            "Use OpenTelemetry for standardized distributed tracing",
            "Implement circuit breakers and retry logic with exponential backoff",
            "Set up automated scaling based on demand and performance metrics"
        ],
        
        "🔒 Security & Compliance": [
            "Monitor for PII and sensitive data in prompts and responses",
            "Implement audit logs for all LLM interactions",
            "Set up anomaly detection for unusual usage patterns",
            "Ensure compliance with data retention and privacy policies"
        ],
        
        "🔄 Continuous Improvement": [
            "Regularly review monitoring metrics and alert thresholds",
            "Conduct post-incident reviews and update monitoring accordingly",
            "Use A/B testing to evaluate new features and optimizations",
            "Use monitoring data to inform capacity planning and architecture decisions"
        ]
    }
    
    print("\n" + "="*70)
    print("🏭 PRODUCTION MONITORING BEST PRACTICES")
    print("="*70)
    
    for category, practices in best_practices.items():
        print(f"\n{category}")
        print("-" * (len(category) - 2))  # Subtract 2 for the emoji and the space after it
        for practice in practices:
            print(f"• {practice}")
    
    print("\n" + "="*70)
    print("✅ Remember: Production monitoring is an iterative process.")
    print("   Start with basic monitoring and gradually add sophistication.")
    print("="*70)

# Display best practices
print_production_best_practices()
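
Two of the practices above, percentile-based latency and SLO error budgets, reduce to a few lines of arithmetic. The sketch below is one possible way to compute them with the standard library, assuming latencies are collected in milliseconds and the SLO is expressed as a target success ratio. `latency_percentiles` and `error_budget_remaining` are illustrative names, and the 99.9% target is an assumed example value.

```python
import statistics


def latency_percentiles(latencies_ms):
    """Return P50/P95/P99 for a list of latencies (needs at least two samples)."""
    # With n=100, quantiles() returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def error_budget_remaining(total_requests, failed_requests, slo_target=0.999):
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    allowed_failures = total_requests * (1 - slo_target)
    if allowed_failures == 0:
        return 0.0  # no requests yet, or a 100% target that leaves no budget
    return 1 - failed_requests / allowed_failures


# Example: evenly spread latencies from 1 ms to 100 ms.
percentiles = latency_percentiles(list(range(1, 101)))
print(percentiles)

# Example: 40 failures out of 100,000 requests against a 99.9% SLO.
remaining = error_budget_remaining(100_000, 40, slo_target=0.999)
print(f"Error budget remaining: {remaining:.0%}")
```

A 99.9% target over 100,000 requests allows about 100 failures, so 40 failures leave roughly 60% of the budget. Alerting on how fast that budget is being spent tends to be less noisy than alerting on individual errors.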

## 🎉 Congratulations!

You've successfully completed the Production Monitoring notebook! Here's what you've accomplished:

### ✅ Key Achievements
- **Built a comprehensive monitoring system** with real-time metrics tracking
- **Implemented multi-channel alerting** with smart suppression rules
- **Created cost management systems** with budget tracking and optimization
- **Learned enterprise integration patterns** for existing monitoring infrastructure
- **Mastered production best practices** for LLM application operations

### 🚀 Next Steps

1. **Implement in your production environment**:
   - Start with basic metrics and alerting
   - Gradually add sophisticated monitoring features
   - Integrate with your existing infrastructure

2. **Continue with advanced topics**:
   - **LSM-007**: Advanced Patterns - Complex use cases and integrations
   - **LSM-008**: Tips and FAQs - Pro tips and troubleshooting

3. **Join the community**:
   - Share your monitoring setups and learnings
   - Contribute to LangSmith documentation and examples

### 💡 Pro Tips for Production
- **Start simple**: Begin with basic monitoring and add complexity gradually
- **Monitor the monitors**: Ensure your monitoring system is reliable
- **Test your alerts**: Regularly test alert channels and escalation paths
- **Review and iterate**: Continuously improve based on production experience

---

**Ready for advanced patterns?** Continue to LSM-007 for complex use cases and integration patterns! 🚀