# Tutorial 05: Middleware and Filters - Production-Ready Safeguards

## Learning Objectives
By the end of this notebook, you will:
- Understand middleware vs filters in the Agent Framework
- Implement content filtering for safety and compliance
- Add logging and observability to your agents
- Transform requests and responses in real-time
- Build production-ready safety mechanisms
- Handle rate limiting and error recovery

## Key Concepts

### Middleware vs Filters: What's the Difference?

**Middleware:**
- Wraps around the entire agent execution
- Can modify requests before they reach the agent
- Can transform responses after the agent responds
- Example: Authentication, rate limiting, logging

**Filters:**
- Focus on content safety and compliance
- Block harmful or inappropriate content
- Example: Profanity filters, PII detection, content moderation
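
The distinction can be sketched without the framework. In the plain-Python sketch below, `is_allowed` plays the role of a filter (a yes/no check on content) and `timing_middleware` plays the role of middleware (it wraps the whole call and can act before and after it). `Handler`, `fake_agent`, and both function names are hypothetical stand-ins and are not part of the Agent Framework API.

```python
import time
from collections.abc import Awaitable, Callable

Handler = Callable[[str], Awaitable[str]]  # hypothetical: takes a prompt, returns a reply


def is_allowed(text: str) -> bool:
    """Filter: a pure content check that answers yes or no."""
    return "hack into" not in text.lower()


def timing_middleware(handler: Handler) -> Handler:
    """Middleware: wraps the whole call and can act before and after it."""

    async def wrapped(prompt: str) -> str:
        if not is_allowed(prompt):  # a filter used inside middleware
            return "Request blocked by content policy."
        start = time.perf_counter()
        reply = await handler(prompt)  # the wrapped agent call
        elapsed = time.perf_counter() - start
        return f"{reply} (took {elapsed:.3f}s)"

    return wrapped


async def fake_agent(prompt: str) -> str:
    """Stand-in for a real agent call."""
    return f"echo: {prompt}"


guarded_agent = timing_middleware(fake_agent)
```

In a notebook cell, `await guarded_agent("Weather in Paris?")` runs the wrapped call. The middleware classes later in this tutorial follow the same shape: the filter logic lives inside a middleware's `process` method.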

### Production Concerns

Real-world agents need:
1. **Safety**: Content filtering, harmful request blocking
2. **Observability**: Logging, metrics, tracing
3. **Reliability**: Error handling, retry logic, circuit breakers
4. **Compliance**: Data privacy, audit trails
5. **Performance**: Caching, rate limiting
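
Retry logic (item 3) is not implemented in the cells of this tutorial, so here is a minimal sketch of one common approach: exponential backoff with jitter. `call_with_retry` and its parameters are hypothetical names, and the sketch retries on any `Exception`; real code would usually retry only on errors known to be transient.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def call_with_retry(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Retry an async operation with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The delay doubles each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

A call such as `await call_with_retry(lambda: agent.run(query, thread=thread))` would wrap a single agent request, assuming the request is safe to repeat.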

---

## Step 1: Setup and Imports

In [3]:
import asyncio
import json
import re
import time
from collections.abc import MutableSequence, Sequence, Callable, Awaitable
from datetime import datetime
from typing import Annotated, Any
from random import choice, randint

from agent_framework import (
    ChatAgent,
    ChatMessage,
    ChatMiddleware,
    ChatOptions,
    ChatResponse,
    ChatContext,
    Context,
    ContextProvider,
)
from agent_framework.azure import AzureAIAgentClient
from azure.identity.aio import AzureCliCredential
from pydantic import BaseModel, Field
from dotenv import load_dotenv

load_dotenv()
print(" Imports successful!")

 Imports successful!


## Step 2: The Problem - Unsafe Production Agent

Let's first see what happens without any safety mechanisms.

In [4]:
async def unsafe_agent_demo():
    """
    Demonstrates an agent without safety filters.
    In production, this could be dangerous!
    """
    print("=== Unsafe Agent (No Filters) ===")
    print("This agent has no safety mechanisms - NOT suitable for production!\n")
    
    async with (
        AzureCliCredential() as credential,
        ChatAgent(
            chat_client=AzureAIAgentClient(async_credential=credential),
            name="UnsafeTravelAgent",
            instructions="You are a travel assistant. Answer any travel questions.",
        ) as agent,
    ):
        thread = agent.get_new_thread()
        
        # Simulate potentially problematic requests
        test_queries = [
            "What's the weather like in Paris?",  # Normal query
            "My email is john.doe@company.com, can you help plan my trip?",  # PII exposure
            "Tell me about destinations, you stupid bot!",  # Mild profanity
        ]
        
        for query in test_queries:
            print(f"User: {query}")
            response = await agent.run(query, thread=thread)
            print(f"Agent: {response.text}\n")
        
        print(" Problems with this approach:")
        print("- No content filtering for inappropriate language")
        print("- No PII (email) detection or masking")
        print("- No logging for audit/compliance")
        print("- No rate limiting or abuse prevention")
        print("- No error recovery mechanisms\n")

await unsafe_agent_demo()

=== Unsafe Agent (No Filters) ===
This agent has no safety mechanisms - NOT suitable for production!

User: What's the weather like in Paris?
Agent: I can't provide real-time weather updates. However, typically in early June, Paris usually experiences mild to warm weather. Average daytime temperatures range from 16°C to 24°C (about 61°F to 75°F), with occasional rain showers and partly cloudy skies. For the most accurate and current weather forecast, I recommend checking a trusted weather website or app such as Weather.com or AccuWeather. Would you like tips on what to pack for Paris in June?

User: My email is john.doe@company.com, can you help plan my trip?
Agent: I can't provide real-time weather updates. However, typically in early June, Paris usually experiences mild to warm weather. Average daytime temperatures range from 16°C to 24°C (about 61°F to 75°F), with occasional rain showers and partly cloudy skies. For the most accurate and current weather forecast, I recommend

## Step 3: Content Safety Filter

Let's build our first middleware - a content safety filter.

In [5]:
class ContentSafetyMiddleware(ChatMiddleware):
    """
    Middleware that filters inappropriate content in both requests and responses.
    
    This middleware:
    1. Checks user messages for profanity/inappropriate content
    2. Blocks requests that violate content policy
    3. Can also filter agent responses if needed
    """
    
    def __init__(self):
        # Simple profanity list (in production, use a proper service)
        self.blocked_words = {
            "stupid", "idiot", "dumb", "moron", 
            "hate", "kill", "die", "bomb"
        }
        
        # Inappropriate request patterns
        self.blocked_patterns = [
            r"hack\s+into",
            r"illegal\s+activities?",
            r"how\s+to\s+hurt",
        ]
    
    def _contains_inappropriate_content(self, text: str) -> tuple[bool, str | None]:
        """Check if text contains inappropriate content."""
        text_lower = text.lower()
        
        # Check for blocked words as whole words, so that "die" does not
        # match "diet" and "hate" does not match "whatever"
        for word in self.blocked_words:
            if re.search(rf"\b{re.escape(word)}\b", text_lower):
                return True, f"inappropriate language detected: '{word}'"
        
        # Check for blocked patterns
        for pattern in self.blocked_patterns:
            if re.search(pattern, text_lower):
                return True, "inappropriate request pattern detected"
        
        return False, None
    
    async def process(
        self,
        context: ChatContext,
        next: Callable[[ChatContext], Awaitable[None]],
    ) -> None:
        """Process messages through content safety filter."""
        
        # Filter incoming user messages
        for message in context.messages:
            if hasattr(message, 'role') and message.role.value == 'user':
                # Extract text from message contents
                text_contents = []
                if hasattr(message, 'contents'):
                    for content in message.contents:
                        if hasattr(content, 'text'):
                            text_contents.append(content.text)
                
                # Check all text content
                for text in text_contents:
                    is_inappropriate, reason = self._contains_inappropriate_content(text)
                    if is_inappropriate:
                        print(f"🚫 Content blocked: {reason}")
                        # Set result to override execution
                        context.result = ChatResponse(
                            messages=[
                                ChatMessage(
                                    role="assistant",
                                    text="I'm sorry, but I cannot process requests that contain inappropriate language or content. Please rephrase your message respectfully."
                                )
                            ]
                        )
                        return  # Don't call next()
        
        print(" Content safety check passed")
        
        # Continue to next middleware or chat client
        await next(context)

print(" Content Safety Middleware created!")

 Content Safety Middleware created!
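
Word-list filters are sensitive to how matching is done. The sketch below compares a plain substring check with a whole-word regular expression; the substring version produces false positives on harmless words. `BLOCKED_WORDS`, `substring_hit`, and `whole_word_hit` are illustrative names, not part of the middleware above.

```python
import re

BLOCKED_WORDS = {"hate", "die", "kill"}  # small illustrative list

# One compiled pattern that only matches whole words.
WORD_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in sorted(BLOCKED_WORDS)) + r")\b",
    re.IGNORECASE,
)


def substring_hit(text: str) -> bool:
    """Naive check: true if any blocked word appears anywhere in the text."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED_WORDS)


def whole_word_hit(text: str) -> bool:
    """Stricter check: true only if a blocked word appears as its own word."""
    return WORD_PATTERN.search(text) is not None
```

For the input `"Whatever diet you follow, Paris has options"`, `substring_hit` returns `True` (it finds "hate" inside "Whatever" and "die" inside "diet"), while `whole_word_hit` returns `False`.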


## Step 4: PII Detection and Masking Middleware

Let's add privacy protection by detecting and masking personally identifiable information.

In [6]:
class PIIDetectionMiddleware(ChatMiddleware):
    """
    Middleware that detects and masks Personally Identifiable Information (PII).
    
    This middleware:
    1. Detects emails, phone numbers, credit cards, etc.
    2. Masks sensitive data before sending to AI
    3. Logs PII detection events for compliance
    """
    
    def __init__(self):
        # PII detection patterns
        self.pii_patterns = {
            "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
            "phone": r"\b(?:\+?1[-.]?)?\(?([0-9]{3})\)?[-.]?([0-9]{3})[-.]?([0-9]{4})\b",
            "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
            "credit_card": r"\b(?:\d{4}[-\s]?){3}\d{4}\b",
        }
    
    def _mask_pii(self, text: str) -> tuple[str, list[str]]:
        """Mask PII in text and return masked text + detected PII types."""
        masked_text = text
        detected_pii = []
        
        for pii_type, pattern in self.pii_patterns.items():
            matches = re.finditer(pattern, text)
            for match in matches:
                detected_pii.append(pii_type)
                # Mask with type-specific replacement
                if pii_type == "email":
                    replacement = "[EMAIL_REDACTED]"
                elif pii_type == "phone":
                    replacement = "[PHONE_REDACTED]"
                elif pii_type == "ssn":
                    replacement = "[SSN_REDACTED]"
                elif pii_type == "credit_card":
                    replacement = "[CARD_REDACTED]"
                else:
                    replacement = "[PII_REDACTED]"
                
                masked_text = masked_text.replace(match.group(), replacement)
        
        return masked_text, list(set(detected_pii))  # Remove duplicates
    
    async def process(
        self,
        context: ChatContext,
        next: Callable[[ChatContext], Awaitable[None]],
    ) -> None:
        """Process messages through PII detection and masking."""
        
        # Process user messages for PII
        for message in context.messages:
            if hasattr(message, 'role') and message.role.value == 'user':
                if hasattr(message, 'contents'):
                    for content in message.contents:
                        if hasattr(content, 'text'):
                            original_text = content.text
                            masked_text, detected_pii = self._mask_pii(original_text)
                            
                            if detected_pii:
                                print(f"🔒 PII detected and masked: {', '.join(detected_pii)}")
                                # Update the message content with masked version
                                content.text = masked_text
                                
                                # In production, log this event for compliance
                                print(f"📋 Compliance log: PII types {detected_pii} detected at {datetime.now()}")
        
        print(" PII detection check complete")
        
        # Continue to next middleware or chat client
        await next(context)

print(" PII Detection Middleware created!")

 PII Detection Middleware created!
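
As an alternative to the `finditer` plus `str.replace` loop, masking can be done in one pass per pattern with `re.subn`, which also reports how many matches were replaced. This is a sketch with a reduced, simplified pattern set; `PII_PATTERNS` and `mask_pii` are hypothetical names, and production systems would normally rely on a dedicated PII detection service.

```python
import re

# Hypothetical, simplified patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_pii(text: str) -> tuple[str, dict[str, int]]:
    """Return the masked text and how many matches of each PII type were replaced."""
    counts: dict[str, int] = {}
    for pii_type, pattern in PII_PATTERNS.items():
        # subn replaces every match and reports how many it replaced.
        text, n = pattern.subn(f"[{pii_type.upper()}_REDACTED]", text)
        if n:
            counts[pii_type] = n
    return text, counts
```

The per-type counts are convenient for a compliance log because they record what was found without storing the sensitive values themselves.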


## Step 5: Logging and Observability Middleware

Essential for production: comprehensive logging and monitoring.

In [7]:
class LoggingMiddleware(ChatMiddleware):
    """
    Middleware that provides comprehensive logging and observability.
    
    This middleware:
    1. Logs all requests and responses
    2. Tracks processing time and performance metrics
    3. Records errors and exceptions
    4. Provides audit trail for compliance
    """
    
    def __init__(self):
        self.request_count = 0
        self.total_processing_time = 0
    
    async def process(
        self,
        context: ChatContext,
        next: Callable[[ChatContext], Awaitable[None]],
    ) -> None:
        """Process messages with comprehensive logging."""
        
        self.request_count += 1
        start_time = time.time()
        timestamp = datetime.now().isoformat()
        
        # Log incoming request
        user_messages = [
            msg for msg in context.messages 
            if hasattr(msg, 'role') and msg.role.value == 'user'
        ]
        
        print(f"📥 [{timestamp}] Request #{self.request_count}")
        print(f"   Messages: {len(context.messages)} total, {len(user_messages)} from user")
        
        # Extract user message preview (first 100 chars)
        if user_messages:
            for msg in user_messages:
                if hasattr(msg, 'contents'):
                    for content in msg.contents:
                        if hasattr(content, 'text'):
                            preview = content.text[:100] + "..." if len(content.text) > 100 else content.text
                            print(f"   User input: {preview}")
        
        try:
            # Continue to next middleware or chat client
            await next(context)
            
            # Calculate processing time
            processing_time = time.time() - start_time
            self.total_processing_time += processing_time
            
            # Log successful response
            response_preview = ""
            if context.result and hasattr(context.result, 'messages') and context.result.messages:
                first_msg = context.result.messages[0]
                if hasattr(first_msg, 'contents'):
                    for content in first_msg.contents:
                        if hasattr(content, 'text'):
                            response_preview = content.text[:100] + "..." if len(content.text) > 100 else content.text
                            break
            
            print(f"📤 [{timestamp}] Response #{self.request_count}")
            print(f"   Processing time: {processing_time:.2f}s")
            print("   Status: SUCCESS")
            print(f"   Response preview: {response_preview}")
            print(f"   Avg processing time: {self.total_processing_time / self.request_count:.2f}s")
            
        except Exception as e:
            # Log error
            processing_time = time.time() - start_time
            print(f" [{timestamp}] Error in request #{self.request_count}")
            print(f"   Processing time: {processing_time:.2f}s")
            print(f"   Error type: {type(e).__name__}")
            print(f"   Error message: {str(e)}")
            
            # Re-raise the exception
            raise

print(" Logging Middleware created!")

 Logging Middleware created!
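
The middleware above uses `print` so that the output shows up in the notebook. For an audit trail, one option is to emit one JSON object per request through the standard `logging` module, which log collectors can parse. The sketch below assumes a hypothetical logger name `agent.audit` and a hypothetical helper `build_log_record`.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent.audit")  # hypothetical logger name


def build_log_record(request_id: int, status: str, started: float, preview: str) -> str:
    """Serialize one request's outcome as a single JSON line."""
    return json.dumps(
        {
            "request_id": request_id,
            "status": status,
            "duration_s": round(time.perf_counter() - started, 3),
            "preview": preview[:100],  # truncate to keep log lines small
        }
    )


started = time.perf_counter()
record = build_log_record(1, "success", started, "What's the weather like in Tokyo?")
logger.info(record)
```

If previews are logged at all, they should be taken after PII masking so that raw personal data never reaches the log.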


## Step 6: Rate Limiting Middleware

Protect against abuse and manage resource usage.

In [8]:
class RateLimitingMiddleware(ChatMiddleware):
    """
    Middleware that implements rate limiting to prevent abuse.
    
    This middleware:
    1. Tracks request timestamps in a sliding time window
    2. Blocks requests that exceed the limit
    3. Applies one shared limit to all callers (per-user limits
       would need a user identifier and are not implemented here)
    """
    
    def __init__(self, requests_per_minute: int = 10, window_size: int = 60):
        self.requests_per_minute = requests_per_minute
        self.window_size = window_size  # seconds
        self.request_times = []  # List of request timestamps
    
    def _clean_old_requests(self):
        """Remove request timestamps outside the current window."""
        current_time = time.time()
        cutoff_time = current_time - self.window_size
        self.request_times = [
            req_time for req_time in self.request_times 
            if req_time > cutoff_time
        ]
    
    def _is_rate_limited(self) -> tuple[bool, int]:
        """Check if current request should be rate limited."""
        self._clean_old_requests()
        
        current_requests = len(self.request_times)
        is_limited = current_requests >= self.requests_per_minute
        
        return is_limited, current_requests
    
    async def process(
        self,
        context: ChatContext,
        next: Callable[[ChatContext], Awaitable[None]],
    ) -> None:
        """Process messages through rate limiting."""
        
        # Check rate limit
        is_limited, current_requests = self._is_rate_limited()
        
        if is_limited:
            print(f"🚦 Rate limit exceeded: {current_requests}/{self.requests_per_minute} requests per minute")
            context.result = ChatResponse(
                messages=[
                    ChatMessage(
                        role="assistant",
                        text=f"I'm sorry, but you've reached the rate limit of {self.requests_per_minute} requests per minute. "
                        f"Please wait a moment before sending another message."
                    )
                ]
            )
            return  # Don't call next()
        
        # Record this request
        self.request_times.append(time.time())
        
        remaining_requests = self.requests_per_minute - current_requests - 1
        print(f" Rate limit check passed ({remaining_requests} requests remaining)")
        
        # Continue to next middleware or chat client
        await next(context)

print(" Rate Limiting Middleware created!")

 Rate Limiting Middleware created!
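
The middleware above keeps a single list of timestamps, so every caller shares one budget. A per-user variant could key the window on a user identifier, as in the sketch below. `PerUserRateLimiter` is a hypothetical class, and how a user ID would be obtained from the chat context is left open because it depends on the application.

```python
import time
from collections import defaultdict, deque
from collections.abc import Callable


class PerUserRateLimiter:
    """Sliding-window limiter that tracks each user separately."""

    def __init__(
        self,
        limit: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        """Record the request and return True if the user is under the limit."""
        now = self.clock()
        hits = self._hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Using `time.monotonic` rather than `time.time` keeps the window correct if the system clock is adjusted, and the `deque` makes expiring old entries cheap.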


## Step 7: Production-Safe Agent with All Middleware

Let's combine all our middleware into a production-ready agent!

In [13]:
# Simple travel tools for our demo
def get_weather(location: Annotated[str, Field(description="City or country name")]) -> str:
    """Get current weather for a location."""
    conditions = ["sunny", "partly cloudy", "cloudy", "rainy"]
    temp = randint(15, 32)
    return f"Weather in {location}: {choice(conditions)}, {temp}°C"

def book_hotel(
    location: Annotated[str, Field(description="City or location")],
    checkin: Annotated[str, Field(description="Check-in date (YYYY-MM-DD)")],
    nights: Annotated[int, Field(description="Number of nights")],
) -> str:
    """Book a hotel reservation."""
    return f"Hotel booked in {location} for {nights} nights starting {checkin}. Confirmation: HTL{randint(1000,9999)}"

print(" Travel tools defined")

 Travel tools defined


In [14]:
async def production_safe_agent():
    """
    Demonstrate a production-ready agent with comprehensive middleware.
    """
    print("=== Production-Safe Agent with Middleware ===")
    print("This agent includes: Content filtering, PII masking, Logging, Rate limiting\n")
    
    # Create all middleware instances
    content_filter = ContentSafetyMiddleware()
    pii_detector = PIIDetectionMiddleware()
    logger = LoggingMiddleware()
    rate_limiter = RateLimitingMiddleware(requests_per_minute=5)  # Lower limit for demo
    
    async with AzureCliCredential() as credential:
        chat_client = AzureAIAgentClient(async_credential=credential)
        
        # Create agent with middleware stack
        # Middleware executes in the order provided
        async with ChatAgent(
            chat_client=chat_client,
            name="ProductionTravelAgent",
            instructions="""
            You are a professional travel assistant.
            Be helpful, polite, and informative.
            Always prioritize user safety and privacy.
            """,
            tools=[get_weather, book_hotel],
            middleware=[
                rate_limiter,     # Check rate limit first
                logger,           # Log everything
                content_filter,   # Filter inappropriate content
                pii_detector,     # Mask PII
            ],
        ) as agent:
            thread = agent.get_new_thread()
            
            # Test scenarios
            test_scenarios = [
                # 1. Normal request
                "What's the weather like in Tokyo?",
                
                # 2. Request with PII (will be masked)
                "I'm planning a trip to Paris. My email is sarah.jones@example.com if you need to contact me.",
                
                # 3. Inappropriate content (will be blocked)
                "You're stupid! Tell me about Rome.",
                
                # 4. Normal booking request
                "Book me a hotel in Barcelona for 3 nights starting 2024-12-01.",
                
                # 5. Another normal request (might hit rate limit)
                "What are some good restaurants in Barcelona?",
                
                # 6. One more to test rate limiting
                "Tell me about Barcelona's attractions.",
            ]
            
            for i, query in enumerate(test_scenarios, 1):
                print(f"\n{'='*60}")
                print(f"TEST SCENARIO {i}")
                print(f"{'='*60}")
                print(f"User: {query}")
                print("\n--- Middleware Processing ---")
                
                try:
                    response = await agent.run(query, thread=thread)
                    print("\n--- Final Response ---")
                    print(f"Agent: {response.text}")
                    
                except Exception as e:
                    print(f"\n Error: {e}")
                
                # Small delay between requests
                await asyncio.sleep(1)
            
            print(f"\n{'='*60}")
            print(" PRODUCTION SAFETY DEMO COMPLETE!")
            print(f"{'='*60}")
            print("Middleware successfully:")
            print(" Filtered inappropriate content")
            print(" Detected and masked PII (email addresses)")
            print(" Logged all requests and responses")
            print(" Enforced rate limiting")
            print(" Provided comprehensive audit trail")

await production_safe_agent()

=== Production-Safe Agent with Middleware ===
This agent includes: Content filtering, PII masking, Logging, Rate limiting


TEST SCENARIO 1
User: What's the weather like in Tokyo?

--- Middleware Processing ---
 Rate limit check passed (4 requests remaining)
📥 [2025-10-02T15:34:13.139723] Request #1
   Messages: 2 total, 1 from user
   User input: What's the weather like in Tokyo?
 Content safety check passed
 PII detection check complete
📤 [2025-10-02T15:34:13.139723] Response #1
   Processing time: 6.31s
   Status: SUCCESS
   Response preview: 
   Avg processing time: 6.31s
 Rate limit check passed (3 requests remaining)
📥 [2025-10-02T15:34:19.451125] Request #2
   Messages: 3 total, 0 from user
 Content safety check passed
 PII detection check complete

## Step 8: Response Transformation Middleware

Sometimes you need to transform or enhance agent responses.

In [15]:
class ResponseTransformMiddleware(ChatMiddleware):
    """
    Middleware that transforms agent responses.
    
    This middleware:
    1. Adds disclaimers to responses
    2. Formats responses for better readability
    3. Adds metadata or branding
    """
    
    def __init__(self, add_disclaimers: bool = True, add_branding: bool = True):
        self.add_disclaimers = add_disclaimers
        self.add_branding = add_branding
        self.response_cache = {}  # Reserved for response caching; unused in this demo
    
    def _add_travel_disclaimer(self, response_text: str) -> str:
        """Add travel-specific disclaimers to responses."""
        disclaimer = ("\n\n📋 *Please note: Travel conditions and requirements may change. "
                     "Always verify current information with official sources before traveling.*")
        return response_text + disclaimer
    
    def _add_branding(self, response_text: str) -> str:
        """Add company branding to responses."""
        branding = "\n\n *Powered by SafeTravel AI Assistant*"
        return response_text + branding
    
    def _format_response(self, response_text: str) -> str:
        """Improve response formatting."""
        # Add emoji icons for better visual appeal
        replacements = {
            "Weather": " Weather",
            "Temperature": "🌡 Temperature", 
            "Hotel": " Hotel",
            "Booking": " Booking",
            "Confirmation": " Confirmation",
        }
        
        formatted_text = response_text
        for old, new in replacements.items():
            formatted_text = formatted_text.replace(old, new)
        
        return formatted_text
    
    async def process(
        self,
        context: ChatContext,
        next: Callable[[ChatContext], Awaitable[None]],
    ) -> None:
        """Process messages and transform responses."""
        
        # Continue to next middleware or chat client
        await next(context)
        
        # Transform each response message if we have a result
        if context.result and hasattr(context.result, 'messages'):
            for message in context.result.messages:
                if hasattr(message, 'contents'):
                    for content in message.contents:
                        if hasattr(content, 'text'):
                            original_text = content.text
                            transformed_text = original_text
                            
                            # Apply transformations
                            transformed_text = self._format_response(transformed_text)
                            
                            if self.add_disclaimers:
                                transformed_text = self._add_travel_disclaimer(transformed_text)
                            
                            if self.add_branding:
                                transformed_text = self._add_branding(transformed_text)
                            
                            # Update the content
                            content.text = transformed_text
                            
                            print(" Response transformed (added formatting, disclaimers, branding)")

print(" Response Transform Middleware created!")

 Response Transform Middleware created!
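
Response transforms are easier to reason about when each one is a small function and the set is safe to apply more than once. The sketch below composes two hypothetical transforms; `add_disclaimer` checks whether the disclaimer is already present, so running the chain twice does not duplicate it. All names and the disclaimer text are illustrative.

```python
from collections.abc import Callable

DISCLAIMER = "\n\n*Please verify travel information with official sources.*"  # hypothetical text

Transform = Callable[[str], str]


def add_disclaimer(text: str) -> str:
    """Append the disclaimer unless the text already ends with it."""
    return text if text.endswith(DISCLAIMER) else text + DISCLAIMER


def collapse_blank_lines(text: str) -> str:
    """Replace runs of three or more newlines with a single blank line."""
    while "\n\n\n" in text:
        text = text.replace("\n\n\n", "\n\n")
    return text


def apply_transforms(text: str, transforms: list[Transform]) -> str:
    """Run each transform in order, feeding the output of one into the next."""
    for transform in transforms:
        text = transform(text)
    return text
```

Idempotent transforms matter when the same message object can pass through the middleware more than once, for example when a thread's earlier messages are processed again on a later turn.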


## Step 9: Complete Production Pipeline

Let's put it all together - a complete production pipeline with all safety measures.

In [16]:
async def complete_production_pipeline():
    """
    Demonstrate the complete production middleware pipeline.
    """
    print("=== COMPLETE PRODUCTION PIPELINE ===")
    print("Full middleware stack: Rate Limiting → Logging → Content Safety → PII Detection → Response Transform\n")
    
    # Create complete middleware stack
    rate_limiter = RateLimitingMiddleware(requests_per_minute=3)  # Very low for demo
    logger = LoggingMiddleware()
    content_filter = ContentSafetyMiddleware()
    pii_detector = PIIDetectionMiddleware()
    response_transformer = ResponseTransformMiddleware()
    
    async with AzureCliCredential() as credential:
        chat_client = AzureAIAgentClient(async_credential=credential)
        
        async with ChatAgent(
            chat_client=chat_client,
            name="EnterpriseTravelAgent",
            instructions="""
            You are an enterprise-grade travel assistant.
            Provide accurate, helpful, and safe travel information.
            Always be professional and courteous.
            """,
            tools=[get_weather, book_hotel],
            middleware=[
                rate_limiter,          # 1. Check rate limits first
                logger,                # 2. Log all activity
                content_filter,        # 3. Filter inappropriate content
                pii_detector,         # 4. Detect and mask PII
                response_transformer,  # 5. Transform responses (runs after agent)
            ],
        ) as agent:
            thread = agent.get_new_thread()
            
            # Comprehensive test scenarios
            scenarios = [
                {
                    "name": "Normal Weather Query",
                    "query": "What's the weather like in London today?",
                    "expected": "Should work normally with enhanced formatting"
                },
                {
                    "name": "PII Detection Test",
                    "query": "I'm traveling to Paris. Contact me at john.smith@email.com or call 555-123-4567.",
                    "expected": "PII should be detected and masked"
                },
                {
                    "name": "Content Filter Test", 
                    "query": "You're a stupid bot! Tell me about Rome.",
                    "expected": "Should be blocked by content filter"
                },
                {
                    "name": "Rate Limit Test",
                    "query": "What about Barcelona?",
                    "expected": "Might hit rate limit (3 requests max)"
                },
            ]
            
            for i, scenario in enumerate(scenarios, 1):
                print(f"\n{'='*80}")
                print(f"SCENARIO {i}: {scenario['name']}")
                print(f"Expected: {scenario['expected']}")
                print(f"{'='*80}")
                print(f"User: {scenario['query']}")
                print("\n--- Processing through middleware stack ---")
                
                try:
                    response = await agent.run(scenario['query'], thread=thread)
                    print("\n--- Final Enhanced Response ---")
                    print(f"Agent: {response.text}")
                    
                except Exception as e:
                    print(f"\n Pipeline Error: {e}")
                
                # Delay between requests
                if i < len(scenarios):
                    print("\n⏳ Waiting 2 seconds before next request...")
                    await asyncio.sleep(2)
            
            print(f"\n{'='*80}")
            print("🏆 ENTERPRISE PRODUCTION PIPELINE COMPLETE!")
            print(f"{'='*80}")
            print("\n🛡 Security Features Demonstrated:")
            print("    Rate limiting (3 req/min)")
            print("    Comprehensive request/response logging")
            print("    Content safety filtering")
            print("    PII detection and masking")
            print("    Response transformation and branding")
            print("    Error handling and recovery")
            print("    Audit trail for compliance")
            print("\n This agent is now production-ready!")

await complete_production_pipeline()

=== COMPLETE PRODUCTION PIPELINE ===
Full middleware stack: Rate Limiting → Logging → Content Safety → PII Detection → Response Transform


SCENARIO 1: Normal Weather Query
Expected: Should work normally with enhanced formatting
User: What's the weather like in London today?

--- Processing through middleware stack ---
 Rate limit check passed (2 requests remaining)
📥 [2025-10-02T15:34:48.081331] Request #1
   Messages: 2 total, 1 from user
   User input: What's the weather like in London today?
 Content safety check passed
 PII detection check complete
📤 [2025-10-02T15:34:48.081331] Response #1
   Processing time: 4.63s
   Status: SUCCESS
   Response preview: 
   Avg processing time: 4.63s
 Rate limit check passed (1 requests remaining)
📥 [2025-10-02T15:34:52.715895] Request #2
   Messages: 3 total, 0 from user
 Content safety check passed
 PII detection check complete

## Understanding Middleware Architecture

### Middleware Execution Order

Middleware executes in a "pipeline" pattern:

```text
# Request flows through middleware in order:
User Input
    ↓
Rate Limiter        # Check limits first
    ↓
Logger              # Log everything
    ↓
Content Filter      # Safety checks
    ↓
PII Detector        # Privacy protection
    ↓
AGENT EXECUTION     # Core AI processing
    ↓
Response Transform  # Format output
    ↓
Logger              # Log response
    ↓
Final Response
```
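
The nesting in the diagram can be reproduced in a few lines of plain Python. The sketch below assumes that each middleware wraps the next one in list order, which matches the order of the log lines in the output above; it does not use the framework, and the context is a plain `dict` rather than a `ChatContext`. `tracer`, `blocker`, `bind`, and `run_pipeline` are hypothetical names.

```python
from collections.abc import Awaitable, Callable

# Hypothetical stand-ins: each middleware is an async function
# taking (context, call_next).
Next = Callable[[dict], Awaitable[None]]
Middleware = Callable[[dict, Next], Awaitable[None]]


def tracer(name: str) -> Middleware:
    """Build a middleware that records when it runs before and after the core."""

    async def middleware(context: dict, call_next: Next) -> None:
        context["trace"].append(f"{name}:before")
        await call_next(context)
        context["trace"].append(f"{name}:after")

    return middleware


async def blocker(context: dict, call_next: Next) -> None:
    """Middleware that terminates early by never calling call_next."""
    context["trace"].append("blocker:stop")
    context["result"] = "blocked"


def bind(mw: Middleware, nxt: Next) -> Next:
    """Turn a middleware plus its successor into a single handler."""

    async def handler(ctx: dict) -> None:
        await mw(ctx, nxt)

    return handler


async def run_pipeline(middlewares: list[Middleware], context: dict) -> dict:
    """Invoke the middlewares in list order around a core handler."""

    async def core(ctx: dict) -> None:
        ctx["trace"].append("agent")
        ctx["result"] = "reply"

    handler: Next = core
    # Wrap from the last middleware inward so the first one ends up outermost.
    for mw in reversed(middlewares):
        handler = bind(mw, handler)
    await handler(context)
    return context
```

Running `await run_pipeline([tracer("rate"), tracer("log")], {"trace": []})` in a notebook cell records the "before" steps in list order and the "after" steps in reverse. Placing `blocker` between the two means the inner middleware and the core never run, which is the early-termination pattern used by the rate limiter and content filter.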

### Key Patterns

**1. Early Termination**
- Middleware can stop the pipeline early
- Example: Rate limiter blocks excessive requests
- Example: Content filter rejects inappropriate content

**2. Request Transformation**
- Modify messages before they reach the agent
- Example: PII masking replaces sensitive data

**3. Response Enhancement**
- Transform agent responses before returning
- Example: Add disclaimers and formatting

**4. Cross-Cutting Concerns**
- Logging, monitoring, and observability
- Applied consistently across all requests
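The first three patterns can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the notebook's actual middleware classes: the context is a plain `dict`, the stage functions and `handle_request` are hypothetical names, the blocked-term list is a toy, and the email regex is deliberately naive.

```python
import asyncio  # used by the script-style call shown in the note below
import re

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # naive, for illustration only
BLOCKED_TERMS = {"hack", "exploit"}  # toy list, not a real safety policy


async def content_filter(ctx: dict, next_handler) -> None:
    # Early termination: set a result and return without calling next_handler
    if any(term in ctx["input"].lower() for term in BLOCKED_TERMS):
        ctx["result"] = "Request blocked by content filter."
        return
    await next_handler(ctx)


async def pii_masker(ctx: dict, next_handler) -> None:
    # Request transformation: mask emails before the agent sees them
    ctx["input"] = EMAIL_PATTERN.sub("[EMAIL]", ctx["input"])
    await next_handler(ctx)


async def add_disclaimer(ctx: dict, next_handler) -> None:
    # Response enhancement: run the rest of the pipeline, then decorate the result
    await next_handler(ctx)
    ctx["result"] += "\n\n(AI-generated response.)"


async def fake_agent(ctx: dict) -> None:
    # Stand-in for the real agent call
    ctx["result"] = f"You said: {ctx['input']}"


async def handle_request(user_input: str) -> str:
    ctx = {"input": user_input, "result": ""}

    async def after_masking(c: dict) -> None:
        await add_disclaimer(c, fake_agent)

    async def after_filter(c: dict) -> None:
        await pii_masker(c, after_masking)

    await content_filter(ctx, after_filter)
    return ctx["result"]
```

In a notebook, try `await handle_request("Email me at jane@example.com")`; in a script, wrap the call in `asyncio.run(...)`. Because the content filter is the outermost stage here, a blocked request never reaches the masker, the agent, or the disclaimer stage.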

### Production Considerations

| Middleware Type | Purpose | Production Features |
|----------------|---------|--------------------|
| **Rate Limiting** | Prevent abuse | Redis-backed counters, per-user limits |
| **Content Safety** | Block harmful content | AI-powered detection, custom rules |
| **PII Detection** | Privacy compliance | ML-based detection, data residency |
| **Logging** | Observability | Structured logs, distributed tracing |
| **Caching** | Performance | Redis/Memcached, TTL policies |
| **Authentication** | Security | OAuth, API keys, role-based access |
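As one example of the "per-user limits" row, here is a minimal in-memory sliding-window limiter. It is only a sketch: `SlidingWindowRateLimiter` is a hypothetical class, and a multi-process deployment would keep these counters in a shared store such as Redis rather than in a local dictionary. The injectable `clock` parameter is an assumption added to make the behavior easy to test.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per user within the last `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def _prune(self, user_id: str) -> deque:
        """Drop timestamps that have fallen out of the window."""
        now = self._clock()
        hits = self._hits[user_id]
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        return hits

    def allow(self, user_id: str) -> bool:
        """Record the request and return True, or return False if over the limit."""
        hits = self._prune(user_id)
        if len(hits) >= self.max_requests:
            return False
        hits.append(self._clock())
        return True

    def remaining(self, user_id: str) -> int:
        """Number of requests the user can still make in the current window."""
        return self.max_requests - len(self._prune(user_id))
```

A rate-limiting middleware could call `allow(user_id)` at the top of its handler and terminate the pipeline when it returns `False`.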

##  Key Takeaways

### What We Learned

1. **Middleware Architecture**
   - Pipeline pattern for processing requests/responses
   - Early termination for safety and limits
   - Cross-cutting concerns (logging, monitoring)

2. **Production Safety**
   - Content filtering prevents inappropriate interactions
   - PII detection protects user privacy
   - Rate limiting prevents abuse
   - Comprehensive logging enables debugging

3. **Enterprise Features**
   - Response transformation for consistency
   - Audit trails for compliance
   - Error handling and recovery
   - Performance monitoring

### Best Practices

1. **Layer Security** - Multiple middleware layers provide defense in depth
2. **Fail Fast** - Check limits and safety early in the pipeline
3. **Log Everything** - Comprehensive logging for debugging and compliance
4. **Transform Carefully** - Preserve original context while enhancing output
5. **Monitor Performance** - Track processing times and bottlenecks
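For "Log Everything", structured records are usually easier to search and audit than free-form print statements. The helper below is a hypothetical sketch (the function name and field names are assumptions, not part of the framework) that builds one JSON log line per request. It records the input length rather than the raw text so that the audit log does not itself become a store of PII.

```python
import json
from datetime import datetime, timezone


def build_audit_record(request_id: int, user_input: str, status: str, duration_s: float) -> str:
    """Return a single JSON line describing one request/response cycle."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "input_chars": len(user_input),  # size only, not the raw text
        "status": status,
        "duration_s": round(duration_s, 3),
    }
    return json.dumps(record)
```

A logging middleware could pass the returned string to `logging.getLogger(...).info(...)` after the pipeline completes.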

### Production Patterns

```python
# Enterprise middleware stack
middleware_stack = [
    AuthenticationMiddleware(),     # Verify identity
    RateLimitingMiddleware(),      # Prevent abuse
    LoggingMiddleware(),           # Audit everything
    ContentSafetyMiddleware(),     # Safety first
    PIIDetectionMiddleware(),      # Privacy protection
    CachingMiddleware(),           # Performance optimization
    ResponseTransformMiddleware(), # Consistent formatting
]

# Agent with production middleware
agent = ChatAgent(
    chat_client=client,
    chat_middleware=middleware_stack
)
```

##  Practice Exercises

1. **Custom Content Filter** - Create a middleware that blocks requests about sensitive topics (politics, medical advice)
2. **Performance Monitor** - Build middleware that tracks and alerts on slow responses
3. **A/B Testing** - Create middleware that randomly selects between different agent configurations
4. **Caching Layer** - Implement response caching to improve performance for repeated queries
5. **Multi-Language Support** - Build middleware that detects language and routes to appropriate specialized agents

In [None]:
# Exercise: Create a performance monitoring middleware

class PerformanceMonitorMiddleware(ChatMiddleware):
    """
    Monitor agent performance and alert on slow responses.
    
    Your task:
    1. Track response times
    2. Calculate rolling averages
    3. Alert when responses are unusually slow
    4. Provide performance metrics
    """
    
    def __init__(self, slow_threshold_seconds: float = 5.0):
        self.slow_threshold = slow_threshold_seconds
        # Your code here!
        pass
    
    async def process(self, context, next):
        # Your implementation here!
        # Remember to:
        # - Measure processing time
        # - Track statistics
        # - Alert on slow responses
        # - Call await next(context) to continue pipeline
        pass

# Test your middleware here!
print(" Exercise ready - implement PerformanceMonitorMiddleware!")
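One possible building block for the exercise is a small rolling-average tracker. This is a hint rather than a full solution: `RollingStats` is a hypothetical helper, and wiring it into `process` (timing the `await next(context)` call and deciding how to alert) is left to you.

```python
from collections import deque


class RollingStats:
    """Keep the most recent `window` response times and summarize them."""

    def __init__(self, window: int = 20):
        # A bounded deque discards the oldest sample automatically
        self._samples = deque(maxlen=window)

    def add(self, seconds: float) -> None:
        self._samples.append(seconds)

    @property
    def count(self) -> int:
        return len(self._samples)

    @property
    def average(self) -> float:
        """Mean of the samples currently in the window, or 0.0 if empty."""
        return sum(self._samples) / len(self._samples) if self._samples else 0.0

    def is_slow(self, seconds: float, threshold: float) -> bool:
        """True when a single response exceeds the configured threshold."""
        return seconds > threshold
```

Inside the middleware, you could record `time.perf_counter()` before and after `await next(context)`, pass the difference to `add`, and compare it against `self.slow_threshold`.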

##  What's Next?

Congratulations! You now have a production-ready agent with comprehensive safety measures.

But enterprise applications need capabilities this agent still lacks:
- Multi-agent coordination
- Workflow orchestration
- Human-in-the-loop approvals

**In Tutorial 06: Multi-Agent Systems**, you'll learn to:
- Create teams of specialized agents
- Orchestrate complex multi-step workflows
- Handle agent-to-agent communication
- Build hierarchical agent systems

---

### Quick Reference

**Create Middleware:**
```python
class MyMiddleware(ChatMiddleware):
    async def process(self, context, next):
        # Inspect or modify the request here
        await next(context)  # Continue down the pipeline
        # Inspect or transform the response here
```

**Use with Agent:**
```python
agent = ChatAgent(
    chat_client=client,
    chat_middleware=[MyMiddleware()]
)
```

**Early Termination:**
```python
# Inside process(): block the request without calling the agent.
# Assumes ChatContext exposes `result` and `terminate`; check your installed version.
context.result = ChatResponse(
    messages=[ChatMessage(role="assistant", text="Request blocked")]
)
context.terminate = True
return  # Skip `await next(context)`
```
```