# LSM-003: Mastering LLM Observability

## 🎯 Learning Objectives

By the end of this notebook, you will:
- Master advanced tracing techniques and patterns
- Understand and implement Agent Observability (2025 feature)
- Learn debugging strategies using LangSmith traces
- Implement custom instrumentation and context propagation
- Analyze performance bottlenecks and optimization opportunities
- Work with complex multi-agent and tool-using applications

## 🛠️ Setup and Imports

Let's start by setting up our environment with the latest LangSmith capabilities.

In [None]:
# Install required packages
!pip install langsmith langchain langchain-openai langchain-community
!pip install tavily-python wikipedia-api
!pip install python-dotenv

In [None]:
import os
import time
import asyncio
from datetime import datetime
from typing import List, Dict, Any

from dotenv import load_dotenv
from langsmith import traceable, Client
from langsmith.run_helpers import tracing_context

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
from langchain_core.tools import Tool
from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Load environment variables
load_dotenv()

# Verify setup
print(f"✅ LangSmith Project: {os.getenv('LANGSMITH_PROJECT', 'Not set')}")

## 🔍 Advanced Tracing Patterns

Let's explore sophisticated tracing techniques that go beyond basic function decoration.

In [None]:
# Pattern 1: Dynamic run naming and metadata
@traceable(
    run_type="chain",
    tags=["advanced-tracing", "dynamic-metadata"]
)
def adaptive_text_processor(text: str, processing_mode: str = "standard"):
    """Demonstrates dynamic metadata and run naming"""
    
    # Update run info dynamically
    from langsmith.run_helpers import get_current_run_tree
    
    current_run = get_current_run_tree()
    if current_run:
        current_run.name = f"text_processor_{processing_mode}"
        current_run.extra["processing_mode"] = processing_mode
        current_run.extra["text_length"] = len(text)
        current_run.extra["word_count"] = len(text.split())
    
    # Different processing based on mode
    if processing_mode == "summarize":
        return _summarize_text(text)
    elif processing_mode == "analyze":
        return _analyze_text(text)
    else:
        return _standard_processing(text)

@traceable(run_type="llm")
def _summarize_text(text: str):
    """Summarize the given text"""
    llm = ChatOpenAI(temperature=0.3, model="gpt-3.5-turbo")
    messages = [
        SystemMessage(content="Summarize the following text concisely:"),
        HumanMessage(content=text)
    ]
    response = llm.invoke(messages)
    return {"summary": response.content, "original_length": len(text)}

@traceable(run_type="llm")
def _analyze_text(text: str):
    """Analyze the text for key themes"""
    llm = ChatOpenAI(temperature=0.2, model="gpt-3.5-turbo")
    messages = [
        SystemMessage(content="Analyze this text for key themes, sentiment, and complexity:"),
        HumanMessage(content=text)
    ]
    response = llm.invoke(messages)
    return {"analysis": response.content, "complexity_score": len(text) / 100}

@traceable(run_type="chain")
def _standard_processing(text: str):
    """Standard text processing"""
    return {
        "processed_text": text.strip().title(),
        "stats": {
            "character_count": len(text),
            "word_count": len(text.split()),
            "sentence_count": text.count('.') + text.count('!') + text.count('?')
        }
    }

# Test adaptive processing
sample_text = """Artificial intelligence is transforming industries across the globe. 
From healthcare to finance, AI applications are becoming more sophisticated and widespread. 
Machine learning algorithms can now process vast amounts of data and identify patterns 
that would be impossible for humans to detect manually."""

print("🔄 Testing Adaptive Text Processing:\n")

for mode in ["standard", "summarize", "analyze"]:
    try:
        result = adaptive_text_processor(sample_text, mode)
        print(f"📊 Mode: {mode}")
        print(f"Result: {str(result)[:100]}...")
        print()
    except Exception as e:
        print(f"❌ Error in {mode} mode: {e}\n")

## 🤖 Agent Observability - 2025 Feature Deep Dive

LangSmith's new Agent Observability provides enhanced insights into agent behavior, tool usage, and decision-making processes.

In [None]:
# Create custom tools for our agent

@traceable(run_type="tool", tags=["web-search", "agent-tool"])
def web_search_tool(query: str) -> str:
    """Simulate web search - in production, you'd use a real search API"""
    # Add tool-specific metadata
    from langsmith.run_helpers import get_current_run_tree
    current_run = get_current_run_tree()
    if current_run:
        current_run.extra["search_query"] = query
        current_run.extra["search_timestamp"] = datetime.now().isoformat()
    
    # Simulate search latency
    time.sleep(0.5)
    
    # Simulated search results
    results = f"""Search results for '{query}':
1. Recent developments in {query} show significant progress
2. Industry experts predict {query} will continue to evolve
3. Key challenges in {query} include scalability and adoption"""
    
    return results

@traceable(run_type="tool", tags=["calculator", "agent-tool"])
def calculator_tool(expression: str) -> str:
    """Safe calculator for mathematical expressions"""
    try:
        # Simple evaluation - in production, use a safer approach
        allowed_chars = set('0123456789+-*/.()')
        if all(c in allowed_chars for c in expression.replace(' ', '')):
            result = eval(expression)
            return f"Result: {result}"
        else:
            return "Error: Invalid characters in expression"
    except Exception as e:
        return f"Error: {str(e)}"

@traceable(run_type="tool", tags=["knowledge-base", "agent-tool"])
def knowledge_lookup(topic: str) -> str:
    """Lookup information from knowledge base"""
    
    # Simulate knowledge base lookup
    knowledge_base = {
        "langsmith": "LangSmith is a platform for building production-grade LLM applications with observability and evaluation.",
        "ai": "Artificial Intelligence involves creating systems that can perform tasks typically requiring human intelligence.",
        "python": "Python is a high-level programming language known for its simplicity and versatility.",
        "machine learning": "Machine Learning is a subset of AI that enables systems to learn from data without explicit programming."
    }
    
    topic_lower = topic.lower()
    for key, value in knowledge_base.items():
        if key in topic_lower:
            return f"Knowledge base entry for '{topic}': {value}"
    
    return f"No specific information found for '{topic}' in knowledge base."

# Create LangChain tools
tools = [
    Tool(
        name="web_search",
        func=web_search_tool,
        description="Search the web for current information about any topic"
    ),
    Tool(
        name="calculator",
        func=calculator_tool,
        description="Perform mathematical calculations. Input should be a mathematical expression."
    ),
    Tool(
        name="knowledge_lookup",
        func=knowledge_lookup,
        description="Look up information from the internal knowledge base"
    )
]

print("🛠️ Created agent tools:")
for tool in tools:
    print(f"  - {tool.name}: {tool.description}")
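
The `calculator_tool` above notes that production code should avoid `eval`. Here is a minimal sketch of one safer approach, assuming only `+`, `-`, `*`, `/` and parentheses are needed. It walks the parsed syntax tree and rejects everything else. The name `safe_arithmetic` is hypothetical and not part of LangChain or LangSmith.

```python
import ast
import operator

# Arithmetic operators the evaluator accepts; anything else is rejected
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_arithmetic(expression: str) -> float:
    """Evaluate +, -, *, / and parentheses without calling eval()."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept plain int/float literals only (bools and strings are rejected)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_arithmetic("2 + 3 * (4 - 1)"))
```

Exponentiation, names, and function calls raise `ValueError`, and malformed input raises `SyntaxError`, so a tool wrapper would still want the same `try`/`except` that `calculator_tool` uses.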

In [None]:
# Create an advanced agent with enhanced observability

@traceable(
    run_type="chain",
    tags=["research-agent", "multi-tool", "agent-observability"]
)
def create_research_agent():
    """Create a research agent with comprehensive observability"""
    
    # Initialize the LLM
    llm = ChatOpenAI(
        temperature=0.1,
        model="gpt-3.5-turbo",
        seed=42  # Best-effort reproducibility; outputs can still vary
    )
    
    # Create a detailed system prompt
    system_prompt = """You are a research assistant with access to multiple tools.
    
Your capabilities:
- web_search: Find current information on the internet
- calculator: Perform mathematical calculations
- knowledge_lookup: Access internal knowledge base

Instructions:
1. Always think step by step
2. Use tools when you need external information or calculations
3. Provide comprehensive answers with sources when possible
4. If you use multiple tools, explain why each was necessary

Be thorough but concise in your responses."""
    
    # Create the prompt template
    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad")
    ])
    
    # Create the agent
    agent = create_openai_functions_agent(llm, tools, prompt)
    
    # Create agent executor with enhanced configuration
    agent_executor = AgentExecutor(
        agent=agent,
        tools=tools,
        verbose=True,
        max_iterations=5,
        early_stopping_method="force",
        handle_parsing_errors=True,
        return_intermediate_steps=True
    )
    
    return agent_executor

# Initialize the research agent
research_agent = create_research_agent()
print("✅ Research agent created successfully!")

In [None]:
# Test the agent with complex queries that require multiple tools

@traceable(
    run_type="chain",
    tags=["agent-interaction", "multi-step-reasoning"]
)
def run_research_query(agent_executor, query: str, context: str = ""):
    """Execute a research query with full observability"""
    
    # Add query metadata
    from langsmith.run_helpers import get_current_run_tree
    current_run = get_current_run_tree()
    if current_run:
        current_run.extra["query_length"] = len(query)
        current_run.extra["has_context"] = bool(context)
        current_run.extra["query_timestamp"] = datetime.now().isoformat()
    
    start_time = time.time()
    
    try:
        # Execute the agent
        result = agent_executor.invoke({
            "input": query,
            "chat_history": []
        })
        
        execution_time = time.time() - start_time
        
        # Extract intermediate steps for analysis
        intermediate_steps = result.get("intermediate_steps", [])
        tool_calls = []
        
        for step in intermediate_steps:
            if len(step) >= 2:
                action, observation = step[0], step[1]
                tool_calls.append({
                    "tool": getattr(action, 'tool', 'unknown'),
                    "input": getattr(action, 'tool_input', {}),
                    "output_length": len(str(observation))
                })
        
        # Update run with execution metrics
        if current_run:
            current_run.extra["execution_time_seconds"] = round(execution_time, 2)
            current_run.extra["tool_calls_count"] = len(tool_calls)
            current_run.extra["tools_used"] = [tc["tool"] for tc in tool_calls]
            current_run.extra["total_iterations"] = len(intermediate_steps)
        
        return {
            "query": query,
            "answer": result["output"],
            "execution_time": execution_time,
            "tool_calls": tool_calls,
            "iterations": len(intermediate_steps)
        }
        
    except Exception as e:
        execution_time = time.time() - start_time
        if current_run:
            current_run.extra["error"] = str(e)
            current_run.extra["execution_time_seconds"] = round(execution_time, 2)
        raise

# Test queries that showcase agent observability
test_queries = [
    {
        "query": "What is LangSmith and how much would it cost to process 10,000 API calls?",
        "description": "Knowledge lookup + calculation"
    },
    {
        "query": "Find recent developments in AI and calculate the percentage growth if adoption increased from 30% to 45%",
        "description": "Web search + calculation"
    },
    {
        "query": "Compare machine learning with traditional programming approaches",
        "description": "Knowledge base synthesis"
    }
]

print("🧪 Testing Agent Observability with Complex Queries:\n")

for i, test in enumerate(test_queries, 1):
    print(f"📝 Query {i}: {test['description']}")
    print(f"Question: {test['query'][:80]}...")
    
    try:
        result = run_research_query(research_agent, test["query"])
        
        print(f"✅ Completed in {result['execution_time']:.2f}s")
        print(f"🛠️ Tools used: {', '.join(set(tc['tool'] for tc in result['tool_calls']))}")
        print(f"🔄 Iterations: {result['iterations']}")
        print(f"📄 Answer: {result['answer'][:150]}...\n")
        
    except Exception as e:
        print(f"❌ Error: {e}\n")

## 🔧 Custom Instrumentation and Context Propagation

Learn how to implement custom instrumentation for non-LangChain components and propagate context across async operations.
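
Before the LangSmith-specific code, it helps to see how context follows async calls in plain Python. Tracing libraries typically track the active run in a `contextvars.ContextVar`, and asyncio gives each task a copy of the current context. Here is a minimal standard-library sketch of that mechanism. The names `current_span`, `child_step`, and `parent_workflow` are hypothetical stand-ins, not LangSmith APIs.

```python
import asyncio
import contextvars
from typing import List

# Hypothetical stand-in for the "current run" a tracing library keeps track of
current_span = contextvars.ContextVar("current_span", default=None)


async def child_step(name: str, log: List[str]) -> None:
    """Read the parent span visible to this coroutine, then nest under it."""
    parent = current_span.get()
    token = current_span.set(f"{parent}/{name}")
    try:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        log.append(current_span.get())
    finally:
        current_span.reset(token)


async def parent_workflow() -> List[str]:
    log: List[str] = []
    token = current_span.set("workflow")
    try:
        # gather() wraps each coroutine in a task; every task receives a copy
        # of the current context, so both children see "workflow" as parent
        await asyncio.gather(child_step("load", log), child_step("transform", log))
    finally:
        current_span.reset(token)
    return log


# In a notebook cell, where an event loop is already running,
# use `spans = await parent_workflow()` instead.
spans = asyncio.run(parent_workflow())
print(sorted(spans))
```

Each child changes only its own copy of the context, so sibling tasks never overwrite each other's span and the value outside the workflow stays unset.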

In [None]:
# Advanced instrumentation patterns

from contextlib import contextmanager
from datetime import timezone
from types import SimpleNamespace
import uuid

@contextmanager
def custom_trace_context(name: str, run_type: str = "chain", **kwargs):
    """Custom context manager for manual tracing"""
    from langsmith import Client
    from langsmith.run_helpers import get_current_run_tree
    
    client = Client()
    parent_run = get_current_run_tree()
    
    # create_run() does not return the run, so generate the ID up front and
    # yield a small handle that the caller can attach outputs to
    run = SimpleNamespace(id=uuid.uuid4(), outputs=kwargs.get('outputs', {}))
    
    # Start the run
    client.create_run(
        name=name,
        run_type=run_type,
        inputs=kwargs.get('inputs', {}),
        id=run.id,
        parent_run_id=parent_run.id if parent_run else None,
        start_time=datetime.now(timezone.utc),
        extra=kwargs.get('extra', {}),
        tags=kwargs.get('tags', [])
    )
    
    try:
        yield run
    except Exception as e:
        # Record the error, then re-raise for the caller
        client.update_run(
            run.id,
            error=str(e),
            end_time=datetime.now(timezone.utc)
        )
        raise
    else:
        # End the run successfully with whatever outputs the caller set
        client.update_run(
            run.id,
            outputs=run.outputs,
            end_time=datetime.now(timezone.utc)
        )

@traceable(run_type="chain", tags=["custom-instrumentation", "async"])
def complex_data_pipeline(data_source: str, transformations: List[str]):
    """A complex data pipeline with custom instrumentation"""
    
    results = []
    
    # Step 1: Data loading with custom tracing
    with custom_trace_context(
        name="data_loading",
        run_type="retriever",
        inputs={"source": data_source},
        extra={"data_source_type": "simulated"}
    ) as load_run:
        
        # Simulate data loading
        raw_data = [f"record_{i}" for i in range(100)]
        time.sleep(0.2)  # Simulate I/O
        
        load_run.outputs = {
            "records_loaded": len(raw_data),
            "data_sample": raw_data[:3]
        }
    
    # Step 2: Apply transformations with individual tracing
    processed_data = raw_data
    
    for i, transformation in enumerate(transformations):
        with custom_trace_context(
            name=f"transformation_{i+1}_{transformation}",
            run_type="chain",
            inputs={
                "transformation": transformation,
                "input_size": len(processed_data)
            }
        ) as transform_run:
            
            # Apply transformation
            if transformation == "uppercase":
                processed_data = [item.upper() for item in processed_data]
            elif transformation == "filter":
                processed_data = [item for item in processed_data if "5" not in item]
            elif transformation == "prefix":
                processed_data = [f"processed_{item}" for item in processed_data]
            
            transform_run.outputs = {
                "output_size": len(processed_data),
                "transformation_applied": transformation
            }
            
            time.sleep(0.1)  # Simulate processing time
    
    # Step 3: Final aggregation
    final_result = {
        "original_count": len(raw_data),
        "final_count": len(processed_data),
        "transformations_applied": transformations,
        "sample_output": processed_data[:3]
    }
    
    return final_result

# Test the custom instrumentation
try:
    pipeline_result = complex_data_pipeline(
        data_source="user_interactions_db",
        transformations=["uppercase", "filter", "prefix"]
    )
    
    print("🏭 Data Pipeline Results:")
    print(f"📊 Original records: {pipeline_result['original_count']}")
    print(f"🔄 Transformations: {', '.join(pipeline_result['transformations_applied'])}")
    print(f"📈 Final records: {pipeline_result['final_count']}")
    print(f"🔍 Sample output: {pipeline_result['sample_output']}")
    print("\n✅ Custom instrumentation pipeline completed!")
    
except Exception as e:
    print(f"❌ Pipeline error: {e}")

## 🐛 Advanced Debugging Techniques

Learn how to use LangSmith traces for debugging complex issues and performance optimization.

In [None]:
# Debugging helper functions

@traceable(run_type="chain", tags=["debugging", "error-handling"])
def robust_llm_chain(user_input: str, max_retries: int = 3):
    """LLM chain with comprehensive error handling and debugging info"""
    
    from langsmith.run_helpers import get_current_run_tree
    current_run = get_current_run_tree()
    
    if current_run:
        current_run.extra["max_retries"] = max_retries
        current_run.extra["input_length"] = len(user_input)
    
    llm = ChatOpenAI(temperature=0.7, model="gpt-3.5-turbo")
    attempts = []
    
    for attempt in range(max_retries):
        attempt_start = time.time()
        
        try:
            # Create messages
            messages = [
                SystemMessage(content="You are a helpful assistant. Respond clearly and concisely."),
                HumanMessage(content=user_input)
            ]
            
            # Make the LLM call
            response = llm.invoke(messages)
            
            attempt_time = time.time() - attempt_start
            
            # Record successful attempt
            attempts.append({
                "attempt": attempt + 1,
                "status": "success",
                "duration": round(attempt_time, 3),
                "response_length": len(response.content)
            })
            
            if current_run:
                current_run.extra["attempts"] = attempts
                current_run.extra["successful_attempt"] = attempt + 1
            
            return {
                "response": response.content,
                "attempts": attempts,
                "success": True
            }
            
        except Exception as e:
            attempt_time = time.time() - attempt_start
            
            # Record failed attempt
            attempts.append({
                "attempt": attempt + 1,
                "status": "failed",
                "duration": round(attempt_time, 3),
                "error": str(e)
            })
            
            # If this was the last attempt, return a failure result
            if attempt == max_retries - 1:
                if current_run:
                    current_run.extra["attempts"] = attempts
                    current_run.extra["all_attempts_failed"] = True
                
                return {
                    "response": None,
                    "attempts": attempts,
                    "success": False,
                    "final_error": str(e)
                }
            
            # Wait before retry
            time.sleep(0.5 * (attempt + 1))

@traceable(run_type="chain", tags=["performance-analysis"])
def analyze_performance_patterns(test_inputs: List[str]):
    """Analyze performance patterns across multiple inputs"""
    
    results = []
    total_start_time = time.time()
    
    for i, input_text in enumerate(test_inputs):
        print(f"📊 Processing input {i+1}/{len(test_inputs)}...")
        
        result = robust_llm_chain(input_text)
        results.append({
            "input": input_text,
            "input_length": len(input_text),
            "success": result["success"],
            "attempts_count": len(result["attempts"]),
            "total_duration": sum(attempt["duration"] for attempt in result["attempts"]),
            "response_length": len(result["response"]) if result["response"] else 0
        })
    
    total_time = time.time() - total_start_time
    
    # Analyze patterns
    successful_results = [r for r in results if r["success"]]
    avg_duration = sum(r["total_duration"] for r in successful_results) / len(successful_results) if successful_results else 0
    
    analysis = {
        "total_inputs": len(test_inputs),
        "successful_count": len(successful_results),
        "success_rate": len(successful_results) / len(test_inputs) * 100 if test_inputs else 0.0,
        "average_duration": round(avg_duration, 3),
        "total_processing_time": round(total_time, 3),
        "detailed_results": results
    }
    
    return analysis

# Test with various input types to analyze patterns
test_inputs = [
    "What is the capital of France?",
    "Explain quantum computing in detail with examples and mathematical formulas.",
    "Hi",
    "Write a comprehensive analysis of the impact of artificial intelligence on modern society, including economic, social, and ethical considerations.",
    "42"
]

print("🔍 Performance Analysis Starting...\n")

try:
    performance_analysis = analyze_performance_patterns(test_inputs)
    
    print("\n📈 Performance Analysis Results:")
    print(f"✅ Success Rate: {performance_analysis['success_rate']:.1f}%")
    print(f"⏱️ Average Duration: {performance_analysis['average_duration']}s")
    print(f"🕒 Total Processing Time: {performance_analysis['total_processing_time']}s")
    
    print("\n📊 Individual Results:")
    for i, result in enumerate(performance_analysis['detailed_results'], 1):
        status = "✅" if result["success"] else "❌"
        print(f"{status} Input {i}: {result['input_length']} chars → {result['response_length']} chars ({result['total_duration']:.3f}s)")
    
except Exception as e:
    print(f"❌ Analysis failed: {e}")

## 📊 Trace Analysis and Insights

Learn how to extract insights from your traces programmatically.

In [None]:
# Programmatic trace analysis

from langsmith import Client
from datetime import datetime, timedelta

def analyze_project_traces(project_name: str = None, hours_back: int = 24):
    """Analyze traces from your LangSmith project"""
    
    client = Client()
    
    if not project_name:
        project_name = os.getenv("LANGSMITH_PROJECT")
    
    if not project_name:
        print("⚠️ No project name provided or found in environment")
        return None
    
    # Calculate time range
    end_time = datetime.now()
    start_time = end_time - timedelta(hours=hours_back)
    
    print(f"📊 Analyzing traces from project '{project_name}'...")
    print(f"⏰ Time range: {start_time.strftime('%Y-%m-%d %H:%M')} to {end_time.strftime('%Y-%m-%d %H:%M')}")
    
    try:
        # Fetch traces
        runs = list(client.list_runs(
            project_name=project_name,
            start_time=start_time,
            end_time=end_time
        ))
        
        if not runs:
            print("📭 No traces found in the specified time range")
            return None
        
        # Analyze traces
        analysis = {
            "total_runs": len(runs),
            "run_types": {},
            "tags": {},
            "errors": 0,
            "avg_latency": 0,
            "total_tokens": 0
        }
        
        latencies = []
        
        for run in runs:
            # Count run types
            run_type = getattr(run, 'run_type', 'unknown')
            analysis["run_types"][run_type] = analysis["run_types"].get(run_type, 0) + 1
            
            # Count tags
            if hasattr(run, 'tags') and run.tags:
                for tag in run.tags:
                    analysis["tags"][tag] = analysis["tags"].get(tag, 0) + 1
            
            # Check for errors
            if getattr(run, 'error', None):
                analysis["errors"] += 1
            
            # Calculate latency
            if hasattr(run, 'start_time') and hasattr(run, 'end_time') and run.end_time:
                if run.start_time and run.end_time:
                    latency = (run.end_time - run.start_time).total_seconds()
                    latencies.append(latency)
            
            # Count tokens from the run's own total_tokens field (None when
            # not recorded). Only LLM runs are counted, because parent runs
            # roll up their children's usage and would be double counted.
            if getattr(run, 'run_type', None) == "llm":
                analysis["total_tokens"] += getattr(run, 'total_tokens', None) or 0
        
        # Calculate averages
        if latencies:
            analysis["avg_latency"] = round(sum(latencies) / len(latencies), 3)
            analysis["min_latency"] = round(min(latencies), 3)
            analysis["max_latency"] = round(max(latencies), 3)
        
        return analysis
        
    except Exception as e:
        print(f"❌ Error analyzing traces: {e}")
        return None

# Run the analysis
trace_analysis = analyze_project_traces()

if trace_analysis:
    print("\n📈 Trace Analysis Results:")
    print(f"📊 Total Runs: {trace_analysis['total_runs']}")
    print(f"❌ Errors: {trace_analysis['errors']} ({trace_analysis['errors']/trace_analysis['total_runs']*100:.1f}%)")
    print(f"⏱️ Average Latency: {trace_analysis['avg_latency']}s")
    
    if 'min_latency' in trace_analysis:
        print(f"⚡ Min Latency: {trace_analysis['min_latency']}s")
        print(f"🐌 Max Latency: {trace_analysis['max_latency']}s")
    
    print(f"🔢 Total Tokens: {trace_analysis['total_tokens']}")
    
    print("\n🏷️ Run Types:")
    for run_type, count in trace_analysis['run_types'].items():
        print(f"  - {run_type}: {count}")
    
    if trace_analysis['tags']:
        print("\n🔖 Most Common Tags:")
        sorted_tags = sorted(trace_analysis['tags'].items(), key=lambda x: x[1], reverse=True)
        for tag, count in sorted_tags[:5]:
            print(f"  - {tag}: {count}")

print("\n✅ Observability deep dive completed! Check your LangSmith dashboard for detailed trace views.")

## 💡 Key Takeaways and Best Practices

### ✅ What You've Mastered

1. **Advanced Tracing Patterns**:
   - Dynamic metadata and run naming
   - Custom instrumentation for non-LangChain components
   - Context propagation across complex workflows

2. **Agent Observability (2025 Feature)**:
   - Tool usage analytics and performance insights
   - Multi-step reasoning analysis
   - Agent decision-making transparency

3. **Debugging and Performance Analysis**:
   - Error handling and retry logic tracing
   - Performance pattern analysis
   - Programmatic trace analysis

### 🎯 Best Practices for Production

1. **Comprehensive Instrumentation**:
   - Instrument all critical paths in your application
   - Use meaningful run names and metadata
   - Tag runs consistently for easy filtering

2. **Performance Monitoring**:
   - Track latency trends across different input types
   - Monitor token usage and costs
   - Set up alerts for error rate spikes

3. **Agent Optimization**:
   - Analyze tool usage patterns to optimize agent workflows
   - Monitor tool call latencies and success rates
   - Use intermediate steps for detailed debugging
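
The performance monitoring practices above can be sketched in plain Python. This is a rough illustration, assuming each run has already been reduced to a dict with hypothetical `latency_s` and `error` keys (these are not LangSmith field names), and that a 5% error rate is the alert threshold.

```python
from statistics import quantiles
from typing import Any, Dict, List


def summarize_runs(
    runs: List[Dict[str, Any]], error_rate_threshold: float = 0.05
) -> Dict[str, Any]:
    """Compute the error rate, an alert flag, and p50/p95 latency."""
    latencies = [r["latency_s"] for r in runs if r.get("latency_s") is not None]
    errors = sum(1 for r in runs if r.get("error"))
    error_rate = errors / len(runs) if runs else 0.0

    summary: Dict[str, Any] = {
        "error_rate": error_rate,
        "alert": error_rate > error_rate_threshold,
    }

    # quantiles() needs at least two data points; n=100 yields 99 cut points,
    # so index 49 is the 50th percentile and index 94 is the 95th
    if len(latencies) >= 2:
        cuts = quantiles(latencies, n=100, method="inclusive")
        summary["p50"] = cuts[49]
        summary["p95"] = cuts[94]
    return summary


sample_runs = [
    {"latency_s": 1.0, "error": None},
    {"latency_s": 2.0, "error": None},
    {"latency_s": 3.0, "error": "timeout"},
    {"latency_s": 4.0, "error": None},
    {"latency_s": 5.0, "error": None},
]
print(summarize_runs(sample_runs))
```

Percentiles are usually more informative than the average for latency, because a handful of slow runs can hide behind a healthy-looking mean.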

### 🔧 Advanced Tips

- **Sampling**: Use trace sampling in high-volume environments
- **Context**: Leverage context managers for complex instrumentation
- **Metadata**: Store business metrics in run metadata for analysis
- **Tags**: Use hierarchical tags for multi-dimensional filtering
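
As a rough illustration of the sampling tip, here is a minimal sketch of deterministic sampling keyed on a request ID. The function `should_trace` and the ID format are hypothetical. This is not LangSmith's built-in sampling mechanism, just one way to decide per request whether to record a trace.

```python
import hashlib


def should_trace(request_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of all request IDs."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Compare the first 8 bytes, read as an integer in [0, 2**64),
    # against the matching fraction of that range
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(sample_rate * 2**64)


kept = sum(should_trace(f"req-{i}", 0.25) for i in range(10_000))
print(f"Kept {kept} of 10000 requests")
```

Hashing the ID rather than drawing a random number means every service that sees the same request makes the same decision, so a sampled trace is never missing some of its steps.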

## 🚀 What's Next?

You're now equipped with advanced observability skills! Continue to:

- **LSM-004: Evaluation Mastery** - Build comprehensive testing pipelines
- **LSM-005: Prompt Engineering** - Master collaborative prompt development
- **LSM-006: Production Monitoring** - Set up enterprise-grade monitoring

---

**Ready to build robust evaluation pipelines?** Continue to **LSM-004: Evaluation Mastery** to master systematic testing of your LLM applications! 🧪