````xml
<!-- filepath: e:\hack\langchain-in-action\notebooks\chapter01\ch01_02_debugging_demo.ipynb -->
<VSCode.Cell language="markdown">
# Chapter 1: Agent Debugging and Observability

**Purpose:** Learn to debug and monitor LangChain agents using built-in tools and custom logging  
**Prerequisites:** Completed ch01_01_first_agent.ipynb  
**Duration:** 20-30 minutes  
**Key Concepts:** LangSmith tracing, custom logging, performance monitoring, error handling

---

This notebook demonstrates how to effectively debug, monitor, and optimize LangChain agents using both built-in observability tools and custom logging frameworks.
</VSCode.Cell>

<VSCode.Cell language="python">
# Enhanced Setup with Debugging Tools
import sys
import warnings
warnings.filterwarnings('ignore')

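# Add project paths ('../../codes' lets `shared` resolve as a package for the imports below)
sys.path.append('../../codes')
sys.path.append('../../codes/shared')
sys.path.append('../../codes/chapter01')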

# Standard imports (the agent below uses an Anthropic model string, so no OpenAI import is needed)
from langchain.agents import create_agent
import os
import time
from datetime import datetime

# Custom debugging utilities
from shared.logging_utils import AgentLogger, error_handling_context, monitor_performance
from shared.config import config

# Load environment
from dotenv import load_dotenv
load_dotenv()

print("🔍 Agent Debugging & Observability Setup Complete")
print(f"📊 LangSmith Tracing: {'Enabled' if config.langchain_tracing_v2 else 'Disabled'}")
print(f"🐛 Debug Mode: {'On' if config.debug else 'Off'}")
</VSCode.Cell>

<VSCode.Cell language="markdown">
## Part 1: LangSmith Tracing - Built-in Observability

LangSmith provides automatic tracing for LangChain agents, giving you deep visibility into:
- Agent reasoning steps
- Tool execution details  
- Token usage and costs
- Performance metrics
- Error tracking

Let's enable and demonstrate LangSmith tracing.
</VSCode.Cell>

<VSCode.Cell language="python">
# Enable LangSmith Tracing

def setup_langsmith_tracing():
    """Enable LangSmith tracing for comprehensive observability."""
    
    if config.langchain_api_key:
        # Enable tracing
        os.environ["LANGCHAIN_TRACING_V2"] = "true"
        os.environ["LANGCHAIN_PROJECT"] = "langchain-book-debugging"
        os.environ["LANGCHAIN_API_KEY"] = config.langchain_api_key
        
        print("✅ LangSmith tracing enabled")
        print(f"📊 Project: langchain-book-debugging")
        print("🔗 View traces at: https://smith.langchain.com/")
        return True
    else:
        print("⚠️  LangSmith API key not configured")
        print("💡 Set LANGCHAIN_API_KEY environment variable to enable tracing")
        return False

# Setup tracing
tracing_enabled = setup_langsmith_tracing()

if tracing_enabled:
    print("\n🎯 All agent interactions will now be traced!")
    print("   - Reasoning steps visible")  
    print("   - Tool executions logged")
    print("   - Performance metrics captured")
    print("   - Errors automatically tracked")
</VSCode.Cell>

<VSCode.Cell language="python">
# Create Agent with Tracing

@monitor_performance("weather_tool_execution")
def get_weather_with_monitoring(city: str) -> str:
    """Get weather for a city with performance monitoring."""
    # Simulate API call delay
    time.sleep(0.1)
    return f"It's always sunny in {city}! Perfect weather for outdoor activities."

def create_monitored_agent():
    """Create agent with monitoring and tracing enabled."""
    try:
        agent = create_agent(
            model="anthropic:claude-sonnet-4-5",
            tools=[get_weather_with_monitoring],
            system_prompt="You are a helpful weather assistant who provides detailed forecasts.",
        )
        return agent
    except Exception as e:
        print(f"❌ Error creating monitored agent: {e}")
        return None

# Create the monitored agent
monitored_agent = create_monitored_agent()

if monitored_agent:
    print("✅ Monitored agent created successfully!")
    print("🔍 Agent ready with full observability")
else:
    print("⚠️ Could not create monitored agent")
</VSCode.Cell>
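
<VSCode.Cell language="markdown">
The `monitor_performance` decorator used above lives in the shared utilities and is not shown in this notebook. As a rough idea of how such a decorator could work, here is a minimal sketch. The names `timed` and `TIMINGS` are hypothetical and are not part of the shared module. The sketch uses `functools.wraps` so the wrapped function keeps its name and docstring, which matters for frameworks that build a tool description from them.

```python
import functools
import time

# Hypothetical registry: maps an operation name to a list of durations in seconds.
TIMINGS = {}

def timed(operation_name: str):
    """Decorator factory that records how long each call takes."""
    def decorator(func):
        @functools.wraps(func)  # preserve __name__ and __doc__ of the wrapped function
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on success and on exceptions, so failed calls are timed too.
                elapsed = time.perf_counter() - start
                TIMINGS.setdefault(operation_name, []).append(elapsed)
        return wrapper
    return decorator

@timed("demo_lookup")
def lookup(city: str) -> str:
    """Return a canned forecast for a city."""
    return f"Sunny in {city}"

result = lookup("Paris")
print(result, TIMINGS["demo_lookup"])
```

Recording in a `finally` block is a design choice: slow failures are often the most interesting data points when debugging.
</VSCode.Cell>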

<VSCode.Cell language="python">
# Test Agent with Tracing

if monitored_agent:
    print("🧪 Testing Agent with Full Tracing")
    print("=" * 45)
    
    # Test queries that will generate traces
    test_queries = [
        "What's the weather in San Francisco?",
        "How about the weather in New York City?", 
        "Tell me about London's weather today"
    ]
    
    for i, query in enumerate(test_queries, 1):
        print(f"\n🔍 Trace {i}: {query}")
        print("-" * 40)
        
        try:
            start_time = time.time()
            
            response = monitored_agent.invoke({
                "messages": [{"role": "user", "content": query}]
            })
            
            execution_time = time.time() - start_time
            
            # Extract response: the final entry is a message object, so read .content
            if isinstance(response, dict) and 'messages' in response:
                agent_response = str(response['messages'][-1].content)
            else:
                agent_response = str(response)
                
            print(f"🤖 Response: {agent_response[:100]}...")
            print(f"⏱️  Execution Time: {execution_time:.2f}s")
            
            if tracing_enabled:
                print("📊 Trace captured in LangSmith")
                
        except Exception as e:
            print(f"❌ Error: {e}")
    
    print(f"\n✅ Tracing demonstration complete!")
    if tracing_enabled:
        print("🔗 View detailed traces at: https://smith.langchain.com/")
        
else:
    print("⚠️ Skipping tracing tests - agent not available")
</VSCode.Cell>

<VSCode.Cell language="markdown">
## Part 2: Custom Agent Logging Framework

While LangSmith provides excellent built-in tracing, sometimes you need custom logging for specific debugging scenarios. Let's explore our custom logging framework.

Key features:
- **Step-by-step logging** of agent reasoning
- **Tool execution tracking** with performance metrics
- **Error context capture** for debugging failures
- **Conversation history** for analysis
- **Performance summaries** for optimization
</VSCode.Cell>
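
<VSCode.Cell language="markdown">
The features listed above can be illustrated with a small in-memory logger. This is a sketch only: `MiniAgentLogger` and its methods are hypothetical and much simpler than the `AgentLogger` class from the shared utilities, whose implementation is not shown here. It stores each step as a dictionary and derives a summary from the stored entries.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class MiniAgentLogger:
    """Hypothetical in-memory stand-in for a structured agent logger."""
    name: str
    entries: list = field(default_factory=list)

    def log_step(self, step_type: str, content: str,
                 metadata: Optional[dict] = None) -> None:
        """Append one structured entry to the conversation log."""
        self.entries.append({
            "timestamp": time.time(),
            "step_type": step_type,
            "content": content,
            "metadata": metadata or {},
        })

    def log_tool(self, tool_name: str, duration_s: float, success: bool) -> None:
        """Record a tool call with its duration in milliseconds."""
        self.log_step("TOOL_EXECUTION", tool_name, {
            "tool_name": tool_name,
            "duration_ms": duration_s * 1000,
            "success": success,
        })

    def summary(self) -> dict:
        """Aggregate the log into a few headline numbers."""
        tools = [e for e in self.entries if e["step_type"] == "TOOL_EXECUTION"]
        return {
            "total_steps": len(self.entries),
            "tool_executions": len(tools),
            "tools_used": sorted({e["metadata"]["tool_name"] for e in tools}),
            "total_duration_ms": sum(e["metadata"]["duration_ms"] for e in tools),
        }

logger = MiniAgentLogger("demo")
logger.log_step("USER_QUERY", "What's the weather in Miami?")
logger.log_tool("get_weather", duration_s=0.12, success=True)
summary = logger.summary()
print(summary)
```

Keeping entries as plain dictionaries makes them easy to dump to JSON or load into a dataframe for later analysis.
</VSCode.Cell>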

<VSCode.Cell language="python">
# Custom Agent Logger Demonstration

# Initialize custom logger
agent_logger = AgentLogger("debugging_demo", log_level="DEBUG")

print("🎯 Custom Agent Logger Initialized")
print("=" * 40)

# Simulate agent interaction with detailed logging
def simulate_agent_interaction(query: str, logger: AgentLogger):
    """Simulate agent interaction with comprehensive logging."""
    
    with error_handling_context("agent_interaction", logger):
        # Log user query
        logger.log_agent_step("USER_QUERY", query, {
            "timestamp": datetime.now().isoformat(),
            "query_length": len(query),
            "query_type": "weather" if "weather" in query.lower() else "general"
        })
        
        # Simulate reasoning step
        logger.log_agent_step("AGENT_REASONING", "Analyzing query for weather request", {
            "reasoning_type": "intent_classification",
            "confidence": 0.95
        })
        
        # Simulate tool execution
        tool_start = time.time()
        time.sleep(0.1)  # Simulate API call
        tool_duration = time.time() - tool_start
        tool_result = f"Weather data for query: {query}"
        
        logger.log_tool_execution(
            tool_name="get_weather",
            input_data=query,
            output=tool_result,
            duration=tool_duration,
            success=True
        )
        
        # Simulate LLM response generation
        llm_start = time.time()
        time.sleep(0.05)  # Simulate LLM processing
        llm_duration = time.time() - llm_start
        llm_response = f"Based on the weather data, it's sunny in the requested location!"
        
        logger.log_llm_interaction(
            prompt=f"Generate weather response for: {query}",
            response=llm_response,
            duration=llm_duration,
            token_usage={"prompt_tokens": 50, "completion_tokens": 30, "total_tokens": 80}
        )
        
        return llm_response

# Test custom logging
test_queries = [
    "What's the weather in Miami?",
    "How about Seattle today?",
    "Is it raining in Portland?"
]

print("Testing custom logging framework...")
for query in test_queries:
    result = simulate_agent_interaction(query, agent_logger)
    print(f"✅ Logged interaction: {query[:30]}...")

print(f"\n📊 Custom logging demonstration complete!")
</VSCode.Cell>

<VSCode.Cell language="python">
# Analyze Logged Data

print("📈 Analyzing Logged Agent Data")
print("=" * 35)

# Get conversation summary
summary = agent_logger.get_conversation_summary()

print(f"📋 Conversation Summary:")
print(f"   Total steps logged: {summary['total_steps']}")
print(f"   Tool executions: {summary['tool_executions']}")
print(f"   LLM interactions: {summary['llm_interactions']}")
print(f"   Tools used: {', '.join(summary['tools_used'])}")
print(f"   Total duration: {summary['total_duration_ms']:.0f}ms")

# Get performance metrics
performance = agent_logger.get_performance_summary()

if performance:
    print(f"\n🔧 Tool Performance Analysis:")
    for tool_name, metrics in performance.items():
        print(f"   {tool_name}:")
        print(f"     • Total calls: {metrics['total_calls']}")
        print(f"     • Success rate: {metrics['success_rate']}%")
        print(f"     • Average duration: {metrics['average_duration_ms']:.1f}ms")
        print(f"     • Total time: {metrics['total_duration_ms']:.1f}ms")

# Show recent log entries
print(f"\n📝 Recent Log Entries:")
recent_logs = summary['conversation_log'][-3:]  # Last 3 entries
for i, log_entry in enumerate(recent_logs, 1):
    print(f"{i}. [{log_entry['step_type']}] {log_entry['content'][:50]}...")
    if log_entry['metadata']:
        key_metadata = {k: v for k, v in log_entry['metadata'].items() if k in ['duration_ms', 'success', 'tool_name']}
        if key_metadata:
            print(f"   Metadata: {key_metadata}")
</VSCode.Cell>

<VSCode.Cell language="markdown">
## Part 3: Error Handling and Recovery Patterns

Robust agents need comprehensive error handling. Let's explore common failure patterns and recovery strategies.

Common agent errors:
- **API timeouts** - external service delays
- **Tool failures** - invalid inputs or service unavailability  
- **Parsing errors** - malformed model responses
- **Rate limiting** - API quota exceeded
- **Context overflow** - conversation history exceeds the model's context window
</VSCode.Cell>

<VSCode.Cell language="python">
# Error Handling Demonstration

def create_error_prone_tool():
    """Create a tool that demonstrates various error conditions."""
    
    @monitor_performance("error_prone_operation")
    def unreliable_calculator(expression: str) -> str:
        """Calculator that sometimes fails to demonstrate error handling."""
        
        # Simulate different error conditions
        import random
        error_chance = random.random()
        
        if error_chance < 0.2:  # 20% chance of timeout
            time.sleep(0.1)
            raise TimeoutError("Calculator service timeout")
        elif error_chance < 0.3:  # 10% chance of invalid input
            raise ValueError(f"Invalid expression: {expression}")
        elif error_chance < 0.35:  # 5% chance of service error
            raise ConnectionError("Calculator service unavailable")
        else:
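            # Success case
            # NOTE: eval() executes arbitrary code. It is tolerable here only because the
            # demo feeds it hard-coded expressions; never use it on untrusted input.
            try:
                result = eval(expression.replace('^', '**'))
                return f"Calculation result: {result}"
            except Exception as e:
                raise ValueError(f"Mathematical error: {e}") from e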
    
    return unreliable_calculator

# Create error-prone tool
error_tool = create_error_prone_tool()

print("⚠️  Error-Prone Tool Created")
print("🎲 Tool has random failure patterns for demonstration")
</VSCode.Cell>
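
<VSCode.Cell language="markdown">
The calculator above relies on `eval()`, which will run any Python expression it is given. If a tool like this ever receives model-generated or user-supplied input, one possible alternative is to parse the expression with the standard `ast` module and allow only arithmetic nodes. The helper below, `safe_eval`, is a hypothetical sketch and not part of the book's code.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def safe_eval(expression: str):
    """Evaluate +, -, *, / and ** on numeric literals only."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    try:
        # Mirror the notebook's convention of treating '^' as exponentiation.
        tree = ast.parse(expression.replace("^", "**"), mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"Invalid expression: {expression!r}") from exc
    return _eval(tree)

print(safe_eval("2 ^ 8"))
print(safe_eval("100 / 4"))
```

This sketch blocks function calls, attribute access, and names, but it does not bound the cost of the arithmetic itself. A very large exponent can still consume a lot of CPU, so a production tool would also want input limits.
</VSCode.Cell>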

<VSCode.Cell language="python">
# Test Error Handling

def test_error_handling_patterns():
    """Test various error handling patterns."""
    
    print("🧪 Testing Error Handling Patterns")
    print("=" * 40)
    
    error_logger = AgentLogger("error_handling_demo")
    
    test_expressions = [
        "10 + 5",
        "20 * 3", 
        "100 / 4",
        "15 - 8",
        "2 ^ 8"  # Power operation
    ]
    
    success_count = 0
    total_attempts = len(test_expressions)
    
    for i, expression in enumerate(test_expressions, 1):
        print(f"\n🔍 Test {i}: {expression}")
        print("-" * 20)
        
        try:
            with error_handling_context(f"calculation_{i}", error_logger):
                result = error_tool(expression)
                print(f"✅ Success: {result}")
                success_count += 1
                
        except TimeoutError as e:
            print(f"⏰ Timeout Error: {e}")
            error_logger.log_agent_step("TIMEOUT_ERROR", str(e), {
                "error_type": "TimeoutError",
                "expression": expression,
                "retry_recommended": True
            })
            
        except ValueError as e:
            print(f"📊 Value Error: {e}")
            error_logger.log_agent_step("VALUE_ERROR", str(e), {
                "error_type": "ValueError", 
                "expression": expression,
                "user_input_issue": True
            })
            
        except ConnectionError as e:
            print(f"🌐 Connection Error: {e}")
            error_logger.log_agent_step("CONNECTION_ERROR", str(e), {
                "error_type": "ConnectionError",
                "expression": expression,
                "service_issue": True
            })
            
        except Exception as e:
            print(f"❌ Unexpected Error: {e}")
            error_logger.log_agent_step("UNEXPECTED_ERROR", str(e), {
                "error_type": type(e).__name__,
                "expression": expression
            })
    
    # Summary
    success_rate = (success_count / total_attempts) * 100
    print(f"\n📊 Error Handling Test Results:")
    print(f"   Successful operations: {success_count}/{total_attempts}")
    print(f"   Success rate: {success_rate:.1f}%")
    print(f"   Errors handled gracefully: {total_attempts - success_count}")
    
    # Analyze error patterns
    error_summary = error_logger.get_conversation_summary()
    error_types = {}
    for log_entry in error_summary['conversation_log']:
        if 'ERROR' in log_entry['step_type']:
            error_type = (log_entry['metadata'] or {}).get('error_type', 'Unknown')
            error_types[error_type] = error_types.get(error_type, 0) + 1
    
    if error_types:
        print(f"\n🔧 Error Type Analysis:")
        for error_type, count in error_types.items():
            print(f"   {error_type}: {count} occurrences")
    
    return error_logger

# Run error handling tests
error_test_logger = test_error_handling_patterns()
</VSCode.Cell>
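
<VSCode.Cell language="markdown">
The test above logs `retry_recommended` for timeouts but never actually retries. One common recovery strategy for transient failures is retrying with exponential backoff. The sketch below shows the idea with standard-library code only. `call_with_retry` and `flaky_add` are hypothetical helpers, and the delays are kept tiny so the cell runs quickly.

```python
import time

def call_with_retry(func, *args, retries=3, base_delay=0.01,
                    retry_on=(TimeoutError, ConnectionError)):
    """Call func, retrying transient errors with exponential backoff."""
    for attempt in range(1, retries + 1):
        try:
            return func(*args)
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))

# Hypothetical flaky tool: times out twice, then succeeds.
calls = {"count": 0}

def flaky_add(a, b):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return a + b

result = call_with_retry(flaky_add, 10, 5)
print(result, calls["count"])
```

Only `TimeoutError` and `ConnectionError` are retried here. A `ValueError` caused by bad input propagates immediately, because repeating the same call would fail the same way.
</VSCode.Cell>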

<VSCode.Cell language="markdown">
## Part 4: Performance Monitoring and Optimization

Understanding agent performance is crucial for production deployments. Let's explore performance monitoring techniques and optimization strategies.

Key performance metrics:
- **Response time** - total interaction duration
- **Tool execution time** - external service latency
- **Token usage** - LLM costs and efficiency
- **Success rates** - reliability metrics  
- **Memory usage** - conversation context size
</VSCode.Cell>

<VSCode.Cell language="python">
# Performance Monitoring Demo

class PerformanceMonitor:
    """Advanced performance monitoring for LangChain agents."""
    
    def __init__(self):
        self.metrics = {
            'total_interactions': 0,
            'total_duration': 0,
            'tool_calls': 0,
            'tool_duration': 0,
            'llm_calls': 0,
            'llm_duration': 0,
            'errors': 0,
            'token_usage': {'prompt': 0, 'completion': 0, 'total': 0}
        }
        self.response_times = []
    
    def record_interaction(self, duration: float, tool_calls: int = 0, 
                         tool_duration: float = 0, tokens: dict = None):
        """Record interaction metrics."""
        self.metrics['total_interactions'] += 1
        self.metrics['total_duration'] += duration
        self.metrics['tool_calls'] += tool_calls
        self.metrics['tool_duration'] += tool_duration
        self.response_times.append(duration)
        
        if tokens:
            for key in ['prompt', 'completion', 'total']:
                if key in tokens:
                    self.metrics['token_usage'][key] += tokens[key]
    
    def get_performance_report(self):
        """Generate comprehensive performance report."""
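        if self.metrics['total_interactions'] == 0:
            # Same keys as the populated report, so callers can index the result safely
            return {
                'total_interactions': 0,
                'average_response_time': 0.0,
                'p50_response_time': 0.0,
                'p95_response_time': 0.0,
                'tool_calls': 0,
                'average_tool_time': 0.0,
                'total_tokens': 0,
                'errors': self.metrics['errors']
            }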
        
        avg_response_time = self.metrics['total_duration'] / self.metrics['total_interactions']
        avg_tool_time = self.metrics['tool_duration'] / max(self.metrics['tool_calls'], 1)
        
        # Calculate percentiles
        sorted_times = sorted(self.response_times)
        n = len(sorted_times)
        p50 = sorted_times[n//2] if n > 0 else 0
        p95 = sorted_times[int(n*0.95)] if n > 0 else 0
        
        return {
            'total_interactions': self.metrics['total_interactions'],
            'average_response_time': round(avg_response_time, 3),
            'p50_response_time': round(p50, 3),
            'p95_response_time': round(p95, 3),
            'tool_calls': self.metrics['tool_calls'],
            'average_tool_time': round(avg_tool_time, 3),
            'total_tokens': self.metrics['token_usage']['total'],
            'errors': self.metrics['errors']
        }

# Initialize performance monitor
perf_monitor = PerformanceMonitor()

print("📊 Performance Monitor Initialized")
print("🎯 Ready to track agent performance metrics")
</VSCode.Cell>
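
<VSCode.Cell language="markdown">
The monitor above picks percentiles by indexing into the sorted list. There are several percentile definitions in use. As a point of comparison, here is a sketch of the nearest-rank definition, which returns the smallest sample such that at least `pct` percent of the samples are at or below it. The function name and the sample latencies are made up for illustration, and `pct` is assumed to be an integer.

```python
def nearest_rank_percentile(values, pct: int):
    """Return the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Integer form of ceil(pct * n / 100), which avoids float rounding issues.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Made-up latencies in seconds, with one slow outlier.
latencies = [0.10, 0.12, 0.11, 0.95, 0.13, 0.12, 0.14, 0.10, 0.11, 0.12]
p50 = nearest_rank_percentile(latencies, 50)
p95 = nearest_rank_percentile(latencies, 95)
print(p50, p95)
```

With only ten samples, the 95th percentile is simply the maximum, so tail-latency figures from small benchmarks should be read with caution.
</VSCode.Cell>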

<VSCode.Cell language="python">
# Performance Testing

def run_performance_benchmark():
    """Run performance benchmark on agent operations."""
    
    print("🏃 Running Agent Performance Benchmark")
    print("=" * 45)
    
    # Simulate various agent interactions
    test_scenarios = [
        {"query": "Simple weather check", "expected_tools": 1, "complexity": "low"},
        {"query": "Calculate 25 * 4 + 10", "expected_tools": 1, "complexity": "low"},
        {"query": "Weather in 3 cities", "expected_tools": 3, "complexity": "medium"},
        {"query": "Complex analysis task", "expected_tools": 2, "complexity": "high"},
        {"query": "Multi-step calculation", "expected_tools": 2, "complexity": "medium"}
    ]
    
    for i, scenario in enumerate(test_scenarios, 1):
        print(f"\n🎯 Scenario {i}: {scenario['query'][:30]}...")
        print(f"   Complexity: {scenario['complexity']}")
        
        # Simulate interaction
        start_time = time.time()
        
        # Simulate tool calls based on complexity
        tool_calls = scenario['expected_tools']
        tool_start = time.time()
        time.sleep(0.02 * tool_calls)  # Simulate tool execution
        tool_duration = time.time() - tool_start
        
        # Simulate LLM processing
        time.sleep(0.05)  # Simulate LLM response time
        
        total_duration = time.time() - start_time
        
        # Record metrics
        token_usage = {
            'prompt': 50 + (tool_calls * 20),
            'completion': 30 + (tool_calls * 10),
            'total': 80 + (tool_calls * 30)
        }
        
        perf_monitor.record_interaction(
            duration=total_duration,
            tool_calls=tool_calls,
            tool_duration=tool_duration,
            tokens=token_usage
        )
        
        print(f"   ⏱️  Duration: {total_duration:.3f}s")
        print(f"   🔧 Tool calls: {tool_calls}")
        print(f"   🎫 Tokens: {token_usage['total']}")
    
    # Generate performance report
    report = perf_monitor.get_performance_report()
    
    print(f"\n📈 Performance Benchmark Results:")
    print("=" * 40)
    print(f"Total interactions: {report['total_interactions']}")
    print(f"Average response time: {report['average_response_time']:.3f}s")
    print(f"50th percentile: {report['p50_response_time']:.3f}s")
    print(f"95th percentile: {report['p95_response_time']:.3f}s")
    print(f"Total tool calls: {report['tool_calls']}")
    print(f"Average tool time: {report['average_tool_time']:.3f}s")
    print(f"Total tokens used: {report['total_tokens']}")
    
    # Performance analysis
    print(f"\n🎯 Performance Analysis:")
    if report['average_response_time'] < 0.5:
        print("   ✅ Excellent response times")
    elif report['average_response_time'] < 1.0:
        print("   ✅ Good response times")
    else:
        print("   ⚠️  Consider optimization")
    
    if report['p95_response_time'] > report['average_response_time'] * 3:
        print("   ⚠️  High response time variance detected")
    else:
        print("   ✅ Consistent performance")
    
    return report

# Run performance benchmark
benchmark_results = run_performance_benchmark()
</VSCode.Cell>

<VSCode.Cell language="markdown">
## Part 5: Debugging Best Practices and Troubleshooting

Based on our exploration, here are the key debugging and monitoring best practices for LangChain agents:

### 🔍 Observability Stack

1. **LangSmith Tracing** (Built-in)
   - Automatic trace capture
   - Visual reasoning flow
   - Token usage tracking
   - Error context preservation

2. **Custom Logging** (Detailed)
   - Step-by-step execution logs
   - Performance metrics
   - Error categorization
   - Conversation history

3. **Performance Monitoring** (Proactive)
   - Response time tracking
   - Tool execution analysis
   - Resource usage monitoring
   - Success rate metrics

### 🛠️ Debugging Workflow

1. **Enable Tracing** - Start with LangSmith for overview
2. **Add Custom Logging** - Detailed debugging for specific issues
3. **Monitor Performance** - Track metrics over time
4. **Analyze Patterns** - Identify optimization opportunities
5. **Implement Fixes** - Apply targeted improvements

### ⚡ Performance Optimization Tips

- **Cache tool results** when appropriate
- **Batch similar operations** to reduce overhead
- **Optimize prompts** to reduce token usage
- **Implement timeouts** for external services
- **Use structured outputs** to reduce parsing errors and retries
</VSCode.Cell>
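
<VSCode.Cell language="markdown">
As an illustration of the first tip, tool results can be memoized with `functools.lru_cache` from the standard library. This is a minimal sketch under the assumption that the tool is a pure function of hashable arguments. `cached_weather` is a hypothetical tool, and the sleep stands in for network latency.

```python
import functools
import time

calls = {"count": 0}  # counts how often the slow path actually runs

@functools.lru_cache(maxsize=128)
def cached_weather(city: str) -> str:
    """Hypothetical slow lookup; results are memoized per city."""
    calls["count"] += 1
    time.sleep(0.05)  # stand-in for network latency
    return f"Sunny in {city}"

first = cached_weather("Miami")   # slow path, result stored in the cache
second = cached_weather("Miami")  # served from the cache, no sleep
info = cached_weather.cache_info()
print(calls["count"], info.hits, info.misses)
```

`lru_cache` has no expiry, so it suits results that do not change during a session. For time-sensitive data such as real weather, a cache with a time-to-live would be a better fit.
</VSCode.Cell>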

<VSCode.Cell language="python">
# Debugging Checklist and Summary

def generate_debugging_checklist():
    """Generate comprehensive debugging checklist for agents."""
    
    checklist = {
        "🔍 Observability Setup": [
            "✓ LangSmith tracing enabled",
            "✓ Custom logging configured", 
            "✓ Performance monitoring active",
            "✓ Error handling implemented"
        ],
        "🧪 Testing Strategy": [
            "✓ Unit tests for individual tools",
            "✓ Integration tests for full flows",
            "✓ Error condition testing",
            "✓ Performance benchmarking"
        ],
        "⚡ Performance Optimization": [
            "✓ Response time monitoring",
            "✓ Token usage tracking",
            "✓ Tool execution profiling", 
            "✓ Memory usage analysis"
        ],
        "🛡️ Error Handling": [
            "✓ Timeout handling",
            "✓ Retry logic implementation",
            "✓ Graceful degradation",
            "✓ User-friendly error messages"
        ],
        "📊 Monitoring & Alerts": [
            "✓ Success rate tracking",
            "✓ Performance thresholds",
            "✓ Error rate monitoring",
            "✓ Cost tracking (tokens/API calls)"
        ]
    }
    
    return checklist

# Display debugging checklist
checklist = generate_debugging_checklist()

print("📋 LangChain Agent Debugging Checklist")
print("=" * 45)

for category, items in checklist.items():
    print(f"\n{category}:")
    for item in items:
        print(f"  {item}")

print(f"\n🎯 Key Takeaways:")
print("   • Always enable tracing for development")
print("   • Implement comprehensive error handling")
print("   • Monitor performance metrics continuously")
print("   • Test error conditions explicitly")
print("   • Use structured logging for analysis")

print(f"\n✅ Debugging and observability setup complete!")
print("🚀 Your agents are now ready for production monitoring")
</VSCode.Cell>

<VSCode.Cell language="markdown">
## Summary

You've successfully learned to debug and monitor LangChain agents using:

### ✅ Observability Tools
- **LangSmith Tracing**: Automatic trace capture and visualization
- **Custom Logging**: Detailed step-by-step execution tracking  
- **Performance Monitoring**: Response time and resource usage analysis
- **Error Handling**: Comprehensive error capture and recovery

### ✅ Best Practices
- Enable tracing during development
- Implement structured logging for analysis
- Monitor performance metrics continuously
- Test error conditions explicitly  
- Use graceful error handling and recovery

### ✅ Production Readiness
- Comprehensive error handling patterns
- Performance optimization strategies
- Monitoring and alerting setup
- Debugging workflow establishment

With these tools and techniques, you can deploy and maintain LangChain agents in production with visibility into their behavior, failures, and performance.

---

**Next**: Ready to explore advanced architectural patterns in Chapter 2!
</VSCode.Cell>
````