# Advanced Agentic AI Systems

This notebook covers how to build production-grade agentic AI systems, a critical skill for FAANG ML engineers building autonomous AI applications.

## Topics Covered
1. **ReAct Pattern** - Reasoning and Acting framework
2. **Multi-Agent Systems** - Orchestrating multiple specialized agents
3. **Agent Memory Systems** - Working, episodic, and long-term memory
4. **Tool Use & Function Calling** - Safe tool execution patterns
5. **Agent Evaluation & Monitoring** - Measuring agent performance
6. **Production Patterns** - Reliability, safety, and scalability

In [None]:
import torch
import torch.nn as nn
import numpy as np
from typing import Dict, List, Any, Optional, Callable, Tuple, Union
from dataclasses import dataclass, field
from enum import Enum
from abc import ABC, abstractmethod
import json
import time
import re
import ast
import operator
from collections import deque
from datetime import datetime
import hashlib
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

## 1. ReAct Agent Pattern

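ReAct (Reasoning + Acting) is a paradigm in which an agent interleaves reasoning traces with actions: it writes a thought, calls a tool, reads the resulting observation, and repeats until it can give a final answer.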

In [None]:
class AgentState(Enum):
    """Agent execution states"""
    IDLE = "idle"
    THINKING = "thinking"
    ACTING = "acting"
    OBSERVING = "observing"
    FINISHED = "finished"
    ERROR = "error"


@dataclass
class AgentAction:
    """Represents an action taken by the agent"""
    tool_name: str
    tool_input: Dict[str, Any]
    reasoning: str
    timestamp: float = field(default_factory=time.time)


@dataclass
class AgentObservation:
    """Represents an observation from tool execution"""
    content: str
    success: bool
    tool_name: str
    execution_time: float
    timestamp: float = field(default_factory=time.time)


@dataclass
class AgentStep:
    """A single step in agent execution"""
    thought: str
    action: Optional[AgentAction]
    observation: Optional[AgentObservation]
    step_number: int

In [None]:
class SafeMathParser:
    """
    Safe mathematical expression parser.
    Avoids eval() by using AST parsing with whitelisted operations.
    """
    
    # Whitelisted operators
    OPERATORS = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.FloorDiv: operator.floordiv,
        ast.Mod: operator.mod,
        ast.Pow: operator.pow,
        ast.USub: operator.neg,
        ast.UAdd: operator.pos,
    }
    
    # Whitelisted functions
    FUNCTIONS = {
        'abs': abs,
        'round': round,
        'min': min,
        'max': max,
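        'sum': lambda *args: sum(args),  # builtin sum() expects an iterable, but only scalar args are parsed
        'sqrt': lambda x: x ** 0.5,  # note: a negative x yields a complex result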
    }
    
    @classmethod
    def evaluate(cls, expression: str) -> float:
        """Safely evaluate a mathematical expression"""
        try:
            tree = ast.parse(expression, mode='eval')
            return cls._eval_node(tree.body)
        except Exception as e:
            # Chain the original exception so the root cause stays in the traceback
            raise ValueError(f"Invalid expression: {expression}. Error: {e}") from e
    
    @classmethod
    def _eval_node(cls, node) -> float:
        """Recursively evaluate AST nodes"""
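        if isinstance(node, ast.Constant):  # Numeric literals
            # bool is a subclass of int, so reject True/False explicitly
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
            raise ValueError(f"Unsupported constant: {node.value}")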
        
        elif isinstance(node, ast.BinOp):  # Binary operations
            op_type = type(node.op)
            if op_type not in cls.OPERATORS:
                raise ValueError(f"Unsupported operator: {op_type}")
            left = cls._eval_node(node.left)
            right = cls._eval_node(node.right)
            return cls.OPERATORS[op_type](left, right)
        
        elif isinstance(node, ast.UnaryOp):  # Unary operations
            op_type = type(node.op)
            if op_type not in cls.OPERATORS:
                raise ValueError(f"Unsupported operator: {op_type}")
            operand = cls._eval_node(node.operand)
            return cls.OPERATORS[op_type](operand)
        
        elif isinstance(node, ast.Call):  # Function calls
            if isinstance(node.func, ast.Name):
                func_name = node.func.id
                if func_name not in cls.FUNCTIONS:
                    raise ValueError(f"Unsupported function: {func_name}")
                args = [cls._eval_node(arg) for arg in node.args]
                return cls.FUNCTIONS[func_name](*args)
            raise ValueError("Unsupported function call")
        
        else:
            raise ValueError(f"Unsupported node type: {type(node)}")


# Test safe math parser
parser = SafeMathParser()
print(f"2 + 3 * 4 = {parser.evaluate('2 + 3 * 4')}")
print(f"sqrt(16) = {parser.evaluate('sqrt(16)')}")
print(f"(10 + 5) / 3 = {parser.evaluate('(10 + 5) / 3')}")
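
An operator whitelist blocks arbitrary code, but it does not bound the cost of an allowed operation: an input such as `9 ** 9 ** 9` is valid arithmetic and can still exhaust CPU and memory. The sketch below is a minimal, standalone illustration of capping power operands. The names `MAX_EXPONENT`, `MAX_BASE`, `bounded_pow`, and this `evaluate` function are hypothetical and not part of `SafeMathParser`; the limits are arbitrary, and unary operators and function calls are left out to keep it short.

```python
import ast
import operator

# Hypothetical limits, chosen for illustration only.
MAX_EXPONENT = 100
MAX_BASE = 10**6

_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def bounded_pow(base, exponent):
    """Power with size checks so inputs like 9 ** 9 ** 9 fail fast."""
    if abs(exponent) > MAX_EXPONENT or abs(base) > MAX_BASE:
        raise ValueError("power operands too large")
    return operator.pow(base, exponent)


def evaluate(expression: str):
    """Evaluate +, -, *, / and bounded ** over numeric literals only."""
    def walk(node):
        if isinstance(node, ast.Constant):
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
            raise ValueError(f"unsupported constant: {node.value!r}")
        if isinstance(node, ast.BinOp):
            left, right = walk(node.left), walk(node.right)
            if isinstance(node.op, ast.Pow):
                return bounded_pow(left, right)
            if type(node.op) in _BIN_OPS:
                return _BIN_OPS[type(node.op)](left, right)
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval").body)


print(evaluate("2 ** 10"))
try:
    evaluate("9 ** 9 ** 9")
except ValueError as exc:
    print(f"Rejected: {exc}")
```

Because `**` is right-associative, the inner `9 ** 9` is computed first and its result is then rejected as an exponent, so the oversized power is never evaluated.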

In [None]:
class Tool(ABC):
    """Base class for agent tools"""
    
    @property
    @abstractmethod
    def name(self) -> str:
        pass
    
    @property
    @abstractmethod
    def description(self) -> str:
        pass
    
    @property
    @abstractmethod
    def parameters(self) -> Dict[str, Any]:
        pass
    
    @abstractmethod
    def execute(self, **kwargs) -> str:
        pass
    
    def to_schema(self) -> Dict[str, Any]:
        """Convert tool to OpenAI function schema format"""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters
            }
        }


class CalculatorTool(Tool):
    """Safe calculator tool using AST-based parsing"""
    
    @property
    def name(self) -> str:
        return "calculator"
    
    @property
    def description(self) -> str:
        return "Performs mathematical calculations safely. Supports +, -, *, /, //, %, ** and the functions sqrt, abs, round, min, max."
    
    @property
    def parameters(self) -> Dict[str, Any]:
        return {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Mathematical expression to evaluate"
                }
            },
            "required": ["expression"]
        }
    
    def execute(self, expression: str) -> str:
        try:
            result = SafeMathParser.evaluate(expression)
            return f"Result: {result}"
        except Exception as e:
            return f"Error: {str(e)}"


class SearchTool(Tool):
    """Simulated search tool"""
    
    def __init__(self):
        self.knowledge_base = {
            "pytorch": "PyTorch is an open-source machine learning framework developed by Meta AI.",
            "transformer": "Transformers are neural network architectures using self-attention mechanisms.",
            "llm": "Large Language Models are AI models trained on vast text data for language tasks.",
            "rag": "Retrieval-Augmented Generation combines retrieval with generation for accurate responses.",
        }
    
    @property
    def name(self) -> str:
        return "search"
    
    @property
    def description(self) -> str:
        return "Search for information on a given topic"
    
    @property
    def parameters(self) -> Dict[str, Any]:
        return {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query"
                }
            },
            "required": ["query"]
        }
    
    def execute(self, query: str) -> str:
        query_lower = query.lower()
        for key, value in self.knowledge_base.items():
            if key in query_lower:
                return f"Found: {value}"
        return "No relevant information found."


# Test tools
calc = CalculatorTool()
search = SearchTool()

print(calc.execute(expression="2 ** 10"))
print(search.execute(query="What is PyTorch?"))
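
Each tool declares its inputs as a JSON-Schema-style `parameters` dictionary, which makes it possible to check model-supplied arguments before calling `execute`. The following is a rough sketch of such a check, assuming only the `properties`, `required`, and `type` keys used above. The helper `validate_args` and the `_JSON_TYPES` table are hypothetical names, and a real deployment would more likely rely on a full JSON Schema validator.

```python
from typing import Any, Dict, List

# Maps JSON Schema type names to Python types (a small illustrative subset).
_JSON_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def validate_args(schema: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    properties = schema.get("properties", {})

    # Every required parameter must be present.
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")

    # Every supplied parameter must be declared and have the declared type.
    for name, value in args.items():
        if name not in properties:
            problems.append(f"unexpected parameter: {name}")
            continue
        expected = _JSON_TYPES.get(properties[name].get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"parameter {name} should be {properties[name]['type']}")

    return problems


calculator_schema = {
    "type": "object",
    "properties": {
        "expression": {"type": "string", "description": "Mathematical expression to evaluate"}
    },
    "required": ["expression"],
}

print(validate_args(calculator_schema, {"expression": "2 ** 10"}))
print(validate_args(calculator_schema, {"query": "2 ** 10"}))
```

Returning the problems as text, rather than raising, lets an agent feed them back to the model as an observation so it can correct the call on the next step.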

In [None]:
class ReActAgent:
    """
    ReAct Agent implementing Reasoning + Acting pattern.
    
    The agent follows a loop:
    1. Thought: Reason about what to do next
    2. Action: Select and execute a tool
    3. Observation: Process tool output
    4. Repeat until task is complete
    """
    
    def __init__(
        self,
        tools: List[Tool],
        max_steps: int = 10,
        verbose: bool = True
    ):
        self.tools = {tool.name: tool for tool in tools}
        self.max_steps = max_steps
        self.verbose = verbose
        self.state = AgentState.IDLE
        self.steps: List[AgentStep] = []
        self.scratchpad = ""
    
    def _format_tools(self) -> str:
        """Format available tools for prompt"""
        tool_descriptions = []
        for name, tool in self.tools.items():
            tool_descriptions.append(f"- {name}: {tool.description}")
        return "\n".join(tool_descriptions)
    
    def _parse_action(self, response: str) -> Optional[Tuple[str, Dict[str, Any]]]:
        """Parse action from LLM response"""
        # Look for Action: tool_name(args) pattern
        action_pattern = r"Action:\s*(\w+)\((.*)\)"
        match = re.search(action_pattern, response)
        
        if match:
            tool_name = match.group(1)
            args_str = match.group(2)
            
            # Parse arguments (simplified)
            args = {}
            if args_str:
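                # Handle key=value format; quoted values may contain commas or parentheses
                arg_pattern = r'(\w+)\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^,)]+))'
                for arg_match in re.finditer(arg_pattern, args_str):
                    key, dq, sq, bare = arg_match.groups()
                    # Exactly one of the three value groups matches
                    args[key] = dq if dq is not None else sq if sq is not None else bare.strip()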
                
                # If no key=value, treat as single positional arg
                if not args and args_str.strip():
                    # Get first parameter name from tool
                    if tool_name in self.tools:
                        tool = self.tools[tool_name]
                        params = tool.parameters.get("properties", {})
                        if params:
                            first_param = list(params.keys())[0]
                            args[first_param] = args_str.strip().strip('"\'')
            
            return tool_name, args
        
        return None
    
    def _execute_tool(self, tool_name: str, args: Dict[str, Any]) -> AgentObservation:
        """Execute a tool and return observation"""
        start_time = time.time()
        
        if tool_name not in self.tools:
            return AgentObservation(
                content=f"Tool '{tool_name}' not found. Available: {list(self.tools.keys())}",
                success=False,
                tool_name=tool_name,
                execution_time=time.time() - start_time
            )
        
        try:
            result = self.tools[tool_name].execute(**args)
            return AgentObservation(
                content=result,
                success=True,
                tool_name=tool_name,
                execution_time=time.time() - start_time
            )
        except Exception as e:
            return AgentObservation(
                content=f"Error executing {tool_name}: {str(e)}",
                success=False,
                tool_name=tool_name,
                execution_time=time.time() - start_time
            )
    
    def _simulate_llm_response(self, task: str, step: int) -> str:
        """
        Simulate LLM response for demonstration.
        In production, this would call an actual LLM API.
        """
        # Simple rule-based simulation for demo
        if step == 0:
            if "calculate" in task.lower() or any(op in task for op in ['+', '-', '*', '/']):
                # Extract math expression
                numbers = re.findall(r'\d+(?:\.\d+)?', task)
                if len(numbers) >= 2:
                    expr = f"{numbers[0]} + {numbers[1]}"
                    if "multiply" in task.lower() or "*" in task:
                        expr = f"{numbers[0]} * {numbers[1]}"
                    elif "divide" in task.lower() or "/" in task:
                        expr = f"{numbers[0]} / {numbers[1]}"
                    return f"Thought: I need to perform a calculation.\nAction: calculator(expression=\"{expr}\")"
            elif "search" in task.lower() or "what is" in task.lower():
                topic = task.split()[-1] if task.split() else "topic"
                return f"Thought: I need to search for information.\nAction: search(query=\"{topic}\")"
        
        return "Thought: I have gathered enough information.\nFinal Answer: Task completed based on observations."
    
    def run(self, task: str) -> str:
        """Execute the agent on a task"""
        self.state = AgentState.THINKING
        self.steps = []
        self.scratchpad = f"Task: {task}\n\nAvailable Tools:\n{self._format_tools()}\n\n"
        
        for step_num in range(self.max_steps):
            if self.verbose:
                print(f"\n--- Step {step_num + 1} ---")
            
            # Get LLM response (simulated)
            response = self._simulate_llm_response(task, step_num)
            
            if self.verbose:
                print(f"Response: {response}")
            
            # Check for final answer
            if "Final Answer:" in response:
                self.state = AgentState.FINISHED
                final_answer = response.split("Final Answer:")[-1].strip()
                return final_answer
            
            # Parse and execute action
            action_result = self._parse_action(response)
            
            if action_result:
                tool_name, args = action_result
                self.state = AgentState.ACTING
                
                # Extract thought
                thought = ""
                if "Thought:" in response:
                    thought = response.split("Thought:")[1].split("Action:")[0].strip()
                
                action = AgentAction(
                    tool_name=tool_name,
                    tool_input=args,
                    reasoning=thought
                )
                
                # Execute tool, then switch to observing its output
                observation = self._execute_tool(tool_name, args)
                self.state = AgentState.OBSERVING
                
                if self.verbose:
                    print(f"Observation: {observation.content}")
                
                # Record step
                self.steps.append(AgentStep(
                    thought=thought,
                    action=action,
                    observation=observation,
                    step_number=step_num + 1
                ))
                
                # Update scratchpad
                self.scratchpad += f"\nStep {step_num + 1}:\n"
                self.scratchpad += f"Thought: {thought}\n"
                self.scratchpad += f"Action: {tool_name}({args})\n"
                self.scratchpad += f"Observation: {observation.content}\n"
            else:
                # No valid action found
                self.state = AgentState.THINKING
        
        self.state = AgentState.ERROR
        return "Max steps reached without completing task."


# Test ReAct Agent
agent = ReActAgent(
    tools=[CalculatorTool(), SearchTool()],
    max_steps=5,
    verbose=True
)

result = agent.run("Calculate 15 multiply 20")
print(f"\nFinal Result: {result}")
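
The `Action: tool_name(args)` format is syntactically a Python call, so one alternative to hand-written regular expressions is to let the `ast` module do the parsing. The sketch below is one possible approach, not the notebook's `_parse_action`. The standalone function `parse_action` is a hypothetical name, and it assumes the model emits keyword arguments with literal values only.

```python
import ast
import re
from typing import Any, Dict, Optional, Tuple


def parse_action(response: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Parse a line like `Action: tool(key="value")` into (tool, kwargs)."""
    match = re.search(r"^Action:\s*(.+)$", response, flags=re.MULTILINE)
    if not match:
        return None
    try:
        call = ast.parse(match.group(1).strip(), mode="eval").body
    except SyntaxError:
        return None
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        return None
    if call.args:
        return None  # this sketch accepts keyword arguments only
    try:
        # literal_eval only accepts literals, so no code is executed here.
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    except (ValueError, TypeError):
        return None
    return call.func.id, kwargs


response = 'Thought: I need to compute this.\nAction: calculator(expression="(10 + 5) / 3")'
print(parse_action(response))
print(parse_action("Final Answer: done"))
```

Quoted values that contain commas or parentheses survive intact because Python's own tokenizer handles the quoting. Anything that is not a plain call with literal keyword values yields `None`, which the caller can treat as "no valid action".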

## 2. Multi-Agent Systems

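Complex tasks often require multiple specialized agents working together: a supervisor decomposes the task, routes each subtask to a specialist agent, and aggregates the results.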

In [None]:
@dataclass
class AgentMessage:
    """Message passed between agents"""
    sender: str
    receiver: str
    content: str
    message_type: str  # 'task', 'result', 'query', 'feedback'
    timestamp: float = field(default_factory=time.time)
    metadata: Dict[str, Any] = field(default_factory=dict)


class SpecializedAgent:
    """Base class for specialized agents in a multi-agent system"""
    
    def __init__(self, name: str, specialty: str):
        self.name = name
        self.specialty = specialty
        self.message_history: List[AgentMessage] = []
    
    def receive_message(self, message: AgentMessage) -> AgentMessage:
        """Process incoming message and generate response"""
        self.message_history.append(message)
        response_content = self._process_task(message.content)
        
        return AgentMessage(
            sender=self.name,
            receiver=message.sender,
            content=response_content,
            message_type="result"
        )
    
    def _process_task(self, task: str) -> str:
        """Override in subclasses for specialized processing"""
        return f"[{self.name}] Processed: {task}"


class ResearchAgent(SpecializedAgent):
    """Agent specialized in research and information gathering"""
    
    def __init__(self):
        super().__init__("ResearchAgent", "information_retrieval")
        self.knowledge = {
            "ml": "Machine Learning is a subset of AI focused on learning from data.",
            "deep_learning": "Deep Learning uses neural networks with many layers.",
            "nlp": "Natural Language Processing enables machines to understand human language."
        }
    
    def _process_task(self, task: str) -> str:
        task_lower = task.lower()
        findings = []
        
        for key, value in self.knowledge.items():
            if key.replace("_", " ") in task_lower or key in task_lower:
                findings.append(value)
        
        if findings:
            return f"Research findings: {' '.join(findings)}"
        return "No specific research findings. General analysis needed."


class AnalysisAgent(SpecializedAgent):
    """Agent specialized in data analysis"""
    
    def __init__(self):
        super().__init__("AnalysisAgent", "data_analysis")
    
    def _process_task(self, task: str) -> str:
        # Simulate analysis
        if "compare" in task.lower():
            return "Comparative analysis: Both approaches have merits. Recommend hybrid solution."
        elif "evaluate" in task.lower():
            return "Evaluation complete: 85% confidence in proposed solution."
        else:
            return "Analysis: Data patterns suggest strong correlation with target metrics."


class CodeAgent(SpecializedAgent):
    """Agent specialized in code generation and review"""
    
    def __init__(self):
        super().__init__("CodeAgent", "code_generation")
    
    def _process_task(self, task: str) -> str:
        if "implement" in task.lower() or "code" in task.lower():
            return """Generated code structure:
```python
class Solution:
    def __init__(self):
        self.initialized = True
    
    def process(self, data):
        return self._transform(data)
```"""
        elif "review" in task.lower():
            return "Code review: No critical issues. Suggest adding error handling."
        else:
            return "Code task received. Ready to generate implementation."

In [None]:
class SupervisorAgent:
    """
    Supervisor agent that orchestrates multiple specialized agents.
    
    Responsibilities:
    - Task decomposition
    - Agent selection and routing
    - Result aggregation
    - Conflict resolution
    """
    
    def __init__(self, agents: List[SpecializedAgent]):
        self.agents = {agent.name: agent for agent in agents}
        self.conversation_history: List[AgentMessage] = []
        self.task_queue: deque = deque()
    
    def _decompose_task(self, task: str) -> List[Tuple[str, str]]:
        """
        Decompose a complex task into subtasks for different agents.
        Returns list of (agent_name, subtask) tuples.
        """
        subtasks = []
        task_lower = task.lower()
        
        # Simple rule-based decomposition (in production: use LLM)
        if "research" in task_lower or "find" in task_lower or "what is" in task_lower:
            subtasks.append(("ResearchAgent", f"Research: {task}"))
        
        if "analyze" in task_lower or "compare" in task_lower or "evaluate" in task_lower:
            subtasks.append(("AnalysisAgent", f"Analyze: {task}"))
        
        if "implement" in task_lower or "code" in task_lower or "build" in task_lower:
            subtasks.append(("CodeAgent", f"Implement: {task}"))
        
        # Default: send to all agents if no specific match
        if not subtasks:
            for agent_name in self.agents:
                subtasks.append((agent_name, task))
        
        return subtasks
    
    def _aggregate_results(self, results: List[AgentMessage]) -> str:
        """Aggregate results from multiple agents"""
        aggregated = "\n\n=== Aggregated Results ===\n\n"
        
        for result in results:
            aggregated += f"From {result.sender}:\n{result.content}\n\n"
        
        # Add synthesis (in production: use LLM for intelligent synthesis)
        aggregated += "=== Synthesis ===\n"
        aggregated += "Based on inputs from all agents, the task has been addressed comprehensively."
        
        return aggregated
    
    def execute(self, task: str) -> str:
        """Execute a task using the multi-agent system"""
        print(f"\nSupervisor received task: {task}")
        print("="*50)
        
        # Decompose task
        subtasks = self._decompose_task(task)
        print(f"\nDecomposed into {len(subtasks)} subtasks")
        
        # Execute subtasks
        results = []
        for agent_name, subtask in subtasks:
            if agent_name in self.agents:
                print(f"\nRouting to {agent_name}: {subtask}")
                
                message = AgentMessage(
                    sender="Supervisor",
                    receiver=agent_name,
                    content=subtask,
                    message_type="task"
                )
                
                response = self.agents[agent_name].receive_message(message)
                results.append(response)
                self.conversation_history.extend([message, response])
                
                print(f"Response: {response.content[:100]}...")
        
        # Aggregate results
        final_result = self._aggregate_results(results)
        return final_result


# Test Multi-Agent System
supervisor = SupervisorAgent([
    ResearchAgent(),
    AnalysisAgent(),
    CodeAgent()
])

result = supervisor.execute(
    "Research machine learning, analyze the findings, and implement a simple solution"
)
print(result)
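
`SupervisorAgent.execute` dispatches subtasks one after another. When subtasks do not depend on each other's output, they could instead run concurrently. The following is a minimal sketch of that idea using a thread pool. The function `run_subtasks` and the plain callables in `demo_handlers` are hypothetical stand-ins for the agents' `receive_message` methods, and the sketch assumes each handler is safe to call from a worker thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_subtasks(
    handlers: Dict[str, Callable[[str], str]],
    subtasks: List[Tuple[str, str]],
    max_workers: int = 4,
) -> List[Tuple[str, str]]:
    """Run independent (agent_name, subtask) pairs concurrently, preserving order."""
    # Skip subtasks addressed to unknown agents, as the supervisor does.
    known = [(name, task) for name, task in subtasks if name in handlers]
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(handlers[name], task) for name, task in known]
        for (name, _), future in zip(known, futures):
            try:
                results.append((name, future.result()))
            except Exception as exc:  # one failing agent should not sink the rest
                results.append((name, f"Error: {exc}"))
    return results


demo_handlers = {
    "ResearchAgent": lambda task: f"research notes for: {task}",
    "AnalysisAgent": lambda task: f"analysis of: {task}",
}
print(run_subtasks(demo_handlers, [("ResearchAgent", "ml"), ("AnalysisAgent", "ml")]))
```

Threads suit I/O-bound work such as LLM API calls. If a later subtask needs an earlier result (for example, analysis of research findings), it still has to run sequentially.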

## 3. Agent Memory Systems

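Effective agents need several kinds of memory: a small working memory for the current context, a short-term store for recent interactions, a long-term store for persistent knowledge, and episodic memory for specific events.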

In [None]:
@dataclass
class MemoryEntry:
    """A single memory entry"""
    content: str
    memory_type: str  # 'working', 'episodic', 'semantic', 'procedural'
    timestamp: float
    importance: float  # 0-1 scale
    access_count: int = 0
    last_accessed: Optional[float] = None
    embedding: Optional[np.ndarray] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    
    def __post_init__(self):
        if self.last_accessed is None:
            self.last_accessed = self.timestamp


class AgentMemory:
    """
    Comprehensive memory system for agents.
    
    Memory Types:
    - Working Memory: Current context, limited capacity
    - Short-term Memory: Recent interactions, moderate capacity
    - Long-term Memory: Persistent knowledge, unlimited capacity
    - Episodic Memory: Specific events/interactions
    """
    
    def __init__(
        self,
        working_memory_capacity: int = 5,
        short_term_capacity: int = 50,
        embedding_dim: int = 768
    ):
        self.working_memory_capacity = working_memory_capacity
        self.short_term_capacity = short_term_capacity
        self.embedding_dim = embedding_dim
        
        # Memory stores
        self.working_memory: deque = deque(maxlen=working_memory_capacity)
        self.short_term_memory: List[MemoryEntry] = []
        self.long_term_memory: List[MemoryEntry] = []
        self.episodic_memory: List[MemoryEntry] = []
        
        # Placeholder projection layer (in production: use proper embeddings).
        # The hash-based demo embedding below does not use it.
        self.embedding_layer = nn.Linear(100, embedding_dim)
    
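    def _compute_embedding(self, text: str) -> np.ndarray:
        """Compute a deterministic pseudo-embedding for text (demo only)"""
        # In production: use sentence-transformers or similar.
        # The hash only seeds a random vector, so similar texts do NOT get
        # similar embeddings; similarity scores in this demo are effectively noise.
        hash_val = int(hashlib.md5(text.encode()).hexdigest(), 16)
        # A local Generator avoids reseeding NumPy's global RNG on every call
        rng = np.random.default_rng(hash_val % (2**32))
        return rng.standard_normal(self.embedding_dim).astype(np.float32)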
    
    def _compute_importance(self, content: str) -> float:
        """Compute importance score for memory"""
        # Simple heuristics (in production: use LLM or learned model)
        importance = 0.5
        
        # Longer content might be more important
        importance += min(len(content) / 1000, 0.2)
        
        # Keywords that indicate importance
        important_keywords = ['error', 'important', 'critical', 'remember', 'key', 'must']
        for keyword in important_keywords:
            if keyword in content.lower():
                importance += 0.1
        
        return min(importance, 1.0)
    
    def add_to_working_memory(self, content: str) -> None:
        """Add to working memory (most recent context)"""
        entry = MemoryEntry(
            content=content,
            memory_type='working',
            timestamp=time.time(),
            importance=1.0,  # Working memory is always high priority
            embedding=self._compute_embedding(content)
        )
        self.working_memory.append(entry)
    
    def add_to_short_term(self, content: str, memory_type: str = 'episodic') -> None:
        """Add to short-term memory with importance scoring"""
        importance = self._compute_importance(content)
        
        entry = MemoryEntry(
            content=content,
            memory_type=memory_type,
            timestamp=time.time(),
            importance=importance,
            embedding=self._compute_embedding(content)
        )
        
        self.short_term_memory.append(entry)
        
        # Consolidate to long-term if exceeding capacity
        if len(self.short_term_memory) > self.short_term_capacity:
            self._consolidate_to_long_term()
    
    def add_episode(self, episode: Dict[str, Any]) -> None:
        """Add an episodic memory (specific interaction/event)"""
        content = json.dumps(episode)
        
        entry = MemoryEntry(
            content=content,
            memory_type='episodic',
            timestamp=time.time(),
            importance=self._compute_importance(content),
            embedding=self._compute_embedding(content),
            metadata={'episode_type': episode.get('type', 'unknown')}
        )
        
        self.episodic_memory.append(entry)
    
    def _consolidate_to_long_term(self) -> None:
        """Move the lower-importance half of short-term memory to long-term storage"""
        # Sort by importance
        self.short_term_memory.sort(key=lambda x: x.importance, reverse=True)
        
        # Keep top half in short-term, move rest to long-term
        midpoint = len(self.short_term_memory) // 2
        
        for entry in self.short_term_memory[midpoint:]:
            entry.memory_type = 'long_term'
            self.long_term_memory.append(entry)
        
        self.short_term_memory = self.short_term_memory[:midpoint]
    
    def retrieve(self, query: str, top_k: int = 5, memory_types: Optional[List[str]] = None) -> List[MemoryEntry]:
        """
        Retrieve relevant memories using similarity search.
        
        Uses combination of:
        - Semantic similarity (embedding cosine similarity)
        - Recency (time decay)
        - Importance score
        """
        query_embedding = self._compute_embedding(query)
        current_time = time.time()
        
        # Collect all relevant memories
        all_memories = []
        
        if memory_types is None:
            memory_types = ['working', 'episodic', 'short_term', 'long_term']
        
        if 'working' in memory_types:
            all_memories.extend(list(self.working_memory))
        if 'short_term' in memory_types or 'episodic' in memory_types:
            all_memories.extend(self.short_term_memory)
        if 'long_term' in memory_types:
            all_memories.extend(self.long_term_memory)
        if 'episodic' in memory_types:
            all_memories.extend(self.episodic_memory)
        
        # Score each memory
        scored_memories = []
        for memory in all_memories:
            if memory.embedding is not None:
                # Cosine similarity
                similarity = np.dot(query_embedding, memory.embedding) / (
                    np.linalg.norm(query_embedding) * np.linalg.norm(memory.embedding) + 1e-8
                )
                
                # Time decay (exponential decay over hours)
                time_diff_hours = (current_time - memory.timestamp) / 3600
                recency_score = np.exp(-0.1 * time_diff_hours)
                
                # Combined score
                score = 0.5 * similarity + 0.3 * recency_score + 0.2 * memory.importance
                
                scored_memories.append((score, memory))
        
        # Sort by score and return top-k
        scored_memories.sort(key=lambda x: x[0], reverse=True)
        
        results = []
        for score, memory in scored_memories[:top_k]:
            memory.access_count += 1
            memory.last_accessed = current_time
            results.append(memory)
        
        return results
    
    def get_context_window(self) -> str:
        """Get current context from working memory"""
        context_parts = []
        for entry in self.working_memory:
            context_parts.append(entry.content)
        return "\n".join(context_parts)
    
    def summarize_memory_stats(self) -> Dict[str, Any]:
        """Get memory system statistics"""
        return {
            "working_memory_size": len(self.working_memory),
            "working_memory_capacity": self.working_memory_capacity,
            "short_term_memory_size": len(self.short_term_memory),
            "long_term_memory_size": len(self.long_term_memory),
            "episodic_memory_size": len(self.episodic_memory),
            "total_memories": (
                len(self.working_memory) + len(self.short_term_memory) +
                len(self.long_term_memory) + len(self.episodic_memory)
            )
        }


# Test Memory System
memory = AgentMemory(working_memory_capacity=5, short_term_capacity=10)

# Add some memories
memory.add_to_working_memory("User asked about machine learning")
memory.add_to_working_memory("Explained neural networks")
memory.add_to_short_term("Important: User prefers PyTorch over TensorFlow")
memory.add_to_short_term("Discussed transformer architectures")
memory.add_episode({"type": "error", "content": "API call failed", "resolution": "Retried successfully"})

# Retrieve relevant memories
results = memory.retrieve("Tell me about neural networks", top_k=3)
print("Retrieved memories:")
for r in results:
    print(f"  - {r.content[:50]}... (importance: {r.importance:.2f})")

print(f"\nMemory stats: {memory.summarize_memory_stats()}")
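
To see how the three terms in `retrieve` trade off, it helps to compute the combined score by hand. The sketch below restates the formula from the method (weights 0.5, 0.3, and 0.2, with recency decaying as `exp(-0.1 * hours)`) as a standalone function. The name `memory_score` and the sample inputs are illustrative assumptions.

```python
import math


def memory_score(similarity: float, age_hours: float, importance: float) -> float:
    """Combined retrieval score: 0.5*similarity + 0.3*recency + 0.2*importance."""
    recency = math.exp(-0.1 * age_hours)  # 1.0 for a brand-new memory
    return 0.5 * similarity + 0.3 * recency + 0.2 * importance


# A brand-new memory that barely matches the query:
fresh_weak = memory_score(similarity=0.2, age_hours=0.0, importance=0.5)
# A day-old memory that matches the query closely:
old_strong = memory_score(similarity=0.9, age_hours=24.0, importance=0.5)

print(f"fresh, weak match:     {fresh_weak:.3f}")   # 0.500
print(f"day-old, strong match: {old_strong:.3f}")   # 0.577
```

With a decay rate of 0.1 per hour, the recency term halves roughly every 7 hours (`ln 2 / 0.1`), so after a day it contributes under 0.03 and similarity dominates the ranking.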

## 4. Production Agent Patterns

Patterns for building reliable, safe, and scalable agent systems.
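
One of the patterns in this section is token-bucket rate limiting. As a rough, self-contained illustration of the refill arithmetic, the sketch below uses an explicit clock value instead of `time.time()` so each step can be checked by hand. `TokenBucket` and its `now` parameter are hypothetical names for this example, and it assumes the standard refill rule `tokens = min(max_tokens, tokens + elapsed * rate)`.

```python
class TokenBucket:
    """Token bucket with an explicit clock so the arithmetic is easy to follow."""

    def __init__(self, tokens_per_second: float, max_tokens: float):
        self.tokens_per_second = tokens_per_second
        self.max_tokens = max_tokens
        self.tokens = max_tokens  # start full
        self.last_update = 0.0

    def acquire(self, now: float, tokens: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at bucket capacity.
        elapsed = now - self.last_update
        self.tokens = min(self.max_tokens, self.tokens + elapsed * self.tokens_per_second)
        self.last_update = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


bucket = TokenBucket(tokens_per_second=10.0, max_tokens=100.0)
assert bucket.acquire(now=0.0, tokens=100)       # drain the full bucket
assert not bucket.acquire(now=0.125, tokens=5)   # only 1.25 tokens refilled so far
assert bucket.acquire(now=0.5, tokens=5)         # 0.5 s * 10 tokens/s = 5 tokens in total
```

The capacity (`max_tokens`) bounds the size of a burst, while `tokens_per_second` bounds the sustained rate. A rejected request does not consume tokens, so the caller can retry later.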

In [None]:
class AgentCircuitBreaker:
    """
    Circuit breaker pattern for agent reliability.
    Prevents cascading failures in multi-agent systems.
    """
    
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 60.0,
        half_open_requests: int = 3
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.half_open_requests = half_open_requests
        
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time = None
        self.state = "closed"  # closed, open, half-open
    
    def can_execute(self) -> bool:
        """Check if execution is allowed"""
        if self.state == "closed":
            return True
        
        if self.state == "open":
            # Check if recovery timeout has passed
            if time.time() - self.last_failure_time >= self.recovery_timeout:
                self.state = "half-open"
                self.success_count = 0
                return True
            return False
        
        if self.state == "half-open":
            return True
        
        return False
    
    def record_success(self) -> None:
        """Record successful execution"""
        self.success_count += 1
        
        if self.state == "half-open":
            if self.success_count >= self.half_open_requests:
                self.state = "closed"
                self.failure_count = 0
        
        if self.state == "closed":
            self.failure_count = max(0, self.failure_count - 1)
    
    def record_failure(self) -> None:
        """Record failed execution"""
        self.failure_count += 1
        self.last_failure_time = time.time()
        
        if self.state == "half-open":
            self.state = "open"
        elif self.state == "closed" and self.failure_count >= self.failure_threshold:
            self.state = "open"


class AgentRateLimiter:
    """
    Rate limiter for agent API calls.
    Uses token bucket algorithm.
    """
    
    def __init__(self, tokens_per_second: float = 10.0, max_tokens: int = 100):
        self.tokens_per_second = tokens_per_second
        self.max_tokens = max_tokens
        self.tokens = float(max_tokens)  # bucket starts full
        # Monotonic clock, so wall-clock adjustments cannot distort the refill
        self.last_update = time.monotonic()
    
    def acquire(self, tokens: int = 1) -> bool:
        """Try to acquire tokens, return True if successful"""
        self._refill()
        
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
    
    def _refill(self) -> None:
        """Refill tokens based on elapsed time"""
        now = time.monotonic()
        elapsed = now - self.last_update
        self.tokens = min(
            self.max_tokens,
            self.tokens + elapsed * self.tokens_per_second
        )
        self.last_update = now


class SafetyGuard:
    """
    Safety guard for agent actions.
    Prevents dangerous or unauthorized actions.
    """
    
    def __init__(self):
        self.blocked_patterns = [
            r'rm\s+-rf',
            r'sudo\s+',
            r'DROP\s+TABLE',
            r'DELETE\s+FROM.*WHERE\s+1\s*=\s*1',
            r'format\s+c:',
        ]
        self.max_action_length = 10000
        self.allowed_tools = set()
    
    def register_tool(self, tool_name: str) -> None:
        """Register an allowed tool"""
        self.allowed_tools.add(tool_name)
    
    def check_action(self, action: AgentAction) -> Tuple[bool, str]:
        """Check if action is safe to execute"""
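        # Check tool is allowed. An empty allowlist permits every tool, so
        # register at least one tool to turn this check on.
        if self.allowed_tools and action.tool_name not in self.allowed_tools:
            return False, f"Tool '{action.tool_name}' is not in allowed list"
        
        # Check action length. default=str keeps inputs that are not
        # JSON-serializable from raising TypeError here.
        action_str = json.dumps(action.tool_input, default=str)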
        if len(action_str) > self.max_action_length:
            return False, f"Action too long: {len(action_str)} > {self.max_action_length}"
        
        # Check for dangerous patterns
        for pattern in self.blocked_patterns:
            if re.search(pattern, action_str, re.IGNORECASE):
                return False, f"Blocked pattern detected: {pattern}"
        
        return True, "Action approved"


# Test production patterns
circuit_breaker = AgentCircuitBreaker(failure_threshold=3)
rate_limiter = AgentRateLimiter(tokens_per_second=5)
safety_guard = SafetyGuard()

safety_guard.register_tool("calculator")
safety_guard.register_tool("search")

# Test safety guard
safe_action = AgentAction(
    tool_name="calculator",
    tool_input={"expression": "2 + 2"},
    reasoning="Simple calculation"
)

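# Rejected by the tool allowlist ("shell" is not registered), so the
# blocked-pattern check is never reached for this action
unsafe_action = AgentAction(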
    tool_name="shell",
    tool_input={"command": "rm -rf /"},
    reasoning="Dangerous command"
)

print(f"Safe action check: {safety_guard.check_action(safe_action)}")
print(f"Unsafe action check: {safety_guard.check_action(unsafe_action)}")

## 5. Agent Evaluation & Monitoring

Measuring and monitoring agent performance in production.

In [None]:
@dataclass
class AgentMetrics:
    """Metrics collected for agent execution"""
    task_id: str
    start_time: float
    end_time: float
    total_steps: int
    successful_tool_calls: int
    failed_tool_calls: int
    total_tokens: int
    task_completed: bool
    error_message: Optional[str] = None
    
    @property
    def duration(self) -> float:
        return self.end_time - self.start_time
    
    @property
    def success_rate(self) -> float:
        total = self.successful_tool_calls + self.failed_tool_calls
        return self.successful_tool_calls / total if total > 0 else 0.0


class AgentEvaluator:
    """
    Evaluates agent performance across multiple dimensions.
    """
    
    def __init__(self):
        self.metrics_history: List[AgentMetrics] = []
        self.evaluation_results: List[Dict[str, Any]] = []
    
    def evaluate_task_completion(
        self,
        task: str,
        agent_output: str,
        expected_output: Optional[str] = None
    ) -> Dict[str, float]:
        """
        Evaluate task completion quality.
        """
        scores = {}
        
        # Completeness: Does the output address the task?
        # Tokenize on word characters so trailing punctuation ("4." vs "4")
        # does not hide a match
        task_keywords = set(re.findall(r"\w+", task.lower()))
        output_keywords = set(re.findall(r"\w+", agent_output.lower()))
        keyword_overlap = len(task_keywords & output_keywords) / len(task_keywords) if task_keywords else 0
        scores['completeness'] = min(keyword_overlap * 2, 1.0)
        
        # Length appropriateness
        output_length = len(agent_output)
        if output_length < 10:
            scores['length_score'] = 0.2
        elif output_length < 50:
            scores['length_score'] = 0.5
        elif output_length < 500:
            scores['length_score'] = 1.0
        else:
            scores['length_score'] = 0.8  # Might be too verbose
        
        # Accuracy (if expected output provided): fraction of expected tokens found in the output
        if expected_output:
            expected_keywords = set(re.findall(r"\w+", expected_output.lower()))
            accuracy_overlap = len(output_keywords & expected_keywords) / len(expected_keywords) if expected_keywords else 0
            scores['accuracy'] = accuracy_overlap
        
        # Overall score
        scores['overall'] = float(np.mean(list(scores.values())))
        
        return scores
    
    def evaluate_efficiency(
        self,
        metrics: AgentMetrics,
        baseline_steps: int = 5,
        baseline_duration: float = 10.0
    ) -> Dict[str, float]:
        """
        Evaluate agent efficiency.
        """
        scores = {}
        
        # Step efficiency
        scores['step_efficiency'] = min(baseline_steps / max(metrics.total_steps, 1), 1.0)
        
        # Time efficiency
        scores['time_efficiency'] = min(baseline_duration / max(metrics.duration, 0.1), 1.0)
        
        # Tool success rate
        scores['tool_success_rate'] = metrics.success_rate
        
        # Token efficiency: fewer tokens per step scores higher, capped at 1.0
        # once usage is at or below 500 tokens per step
        tokens_per_step = metrics.total_tokens / max(metrics.total_steps, 1)
        scores['token_efficiency'] = min(500 / max(tokens_per_step, 1), 1.0)
        
        return scores
    
    def evaluate_safety(
        self,
        agent_steps: List[AgentStep],
        safety_guard: SafetyGuard
    ) -> Dict[str, Any]:
        """
        Evaluate agent safety behavior.
        """
        # Only steps that carry an action count toward the safety score
        actions = [step for step in agent_steps if step.action is not None]
        results = {
            'total_actions': len(actions),
            'blocked_actions': 0,
            'safety_score': 1.0,
            'violations': []
        }
        
        for step in actions:
            is_safe, message = safety_guard.check_action(step.action)
            if not is_safe:
                results['blocked_actions'] += 1
                results['violations'].append({
                    'step': step.step_number,
                    'action': step.action.tool_name,
                    'message': message
                })
        
        if results['total_actions'] > 0:
            results['safety_score'] = 1.0 - (results['blocked_actions'] / results['total_actions'])
        
        return results


class AgentMonitor:
    """
    Real-time monitoring for agent systems.
    """
    
    def __init__(self):
        self.active_tasks: Dict[str, Dict[str, Any]] = {}
        self.completed_tasks: List[Dict[str, Any]] = []
        self.alerts: List[Dict[str, Any]] = []
        
        # Thresholds
        self.max_duration_threshold = 60.0  # seconds
        self.max_steps_threshold = 20
        self.min_success_rate_threshold = 0.7
    
    def start_task(self, task_id: str, task: str) -> None:
        """Record task start"""
        self.active_tasks[task_id] = {
            'task': task,
            'start_time': time.time(),
            'steps': 0,
            'successful_calls': 0,
            'failed_calls': 0
        }
    
    def record_step(self, task_id: str, success: bool) -> None:
        """Record a step in task execution"""
        if task_id in self.active_tasks:
            self.active_tasks[task_id]['steps'] += 1
            if success:
                self.active_tasks[task_id]['successful_calls'] += 1
            else:
                self.active_tasks[task_id]['failed_calls'] += 1
            
            self._check_alerts(task_id)
    
    def _check_alerts(self, task_id: str) -> None:
        """Check if any alert thresholds are exceeded"""
        task_info = self.active_tasks[task_id]
        
        # Duration alert
        duration = time.time() - task_info['start_time']
        if duration > self.max_duration_threshold:
            self._create_alert(task_id, 'duration_exceeded', f"Duration {duration:.1f}s exceeds threshold")
        
        # Steps alert
        if task_info['steps'] > self.max_steps_threshold:
            self._create_alert(task_id, 'steps_exceeded', f"Steps {task_info['steps']} exceeds threshold")
        
        # Success rate alert
        total_calls = task_info['successful_calls'] + task_info['failed_calls']
        if total_calls >= 5:
            success_rate = task_info['successful_calls'] / total_calls
            if success_rate < self.min_success_rate_threshold:
                self._create_alert(task_id, 'low_success_rate', f"Success rate {success_rate:.1%} below threshold")
    
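    def _create_alert(self, task_id: str, alert_type: str, message: str) -> None:
        """Create an alert, at most once per task and alert type"""
        # _check_alerts runs on every step, so skip alerts already raised for this task
        if any(a['task_id'] == task_id and a['type'] == alert_type for a in self.alerts):
            return
        
        alert = {
            'task_id': task_id,
            'type': alert_type,
            'message': message,
            'timestamp': time.time()
        }
        self.alerts.append(alert)
        logger.warning(f"Agent Alert: {message}")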
    
    def end_task(self, task_id: str, success: bool) -> None:
        """Record task completion"""
        if task_id in self.active_tasks:
            task_info = self.active_tasks.pop(task_id)
            task_info['end_time'] = time.time()
            task_info['success'] = success
            task_info['duration'] = task_info['end_time'] - task_info['start_time']
            self.completed_tasks.append(task_info)
    
    def get_dashboard_metrics(self) -> Dict[str, Any]:
        """Get metrics for monitoring dashboard"""
        completed = self.completed_tasks
        
        if not completed:
            return {'message': 'No completed tasks yet'}
        
        return {
            'total_tasks': len(completed),
            'success_rate': sum(1 for t in completed if t['success']) / len(completed),
            # Cast NumPy scalars to plain floats so the dict prints and serializes cleanly
            'avg_duration': float(np.mean([t['duration'] for t in completed])),
            'avg_steps': float(np.mean([t['steps'] for t in completed])),
            'active_tasks': len(self.active_tasks),
            'total_alerts': len(self.alerts),
            'recent_alerts': self.alerts[-5:] if self.alerts else []
        }


# Test evaluation and monitoring
evaluator = AgentEvaluator()
monitor = AgentMonitor()

# Simulate task execution
task_id = "task_001"
monitor.start_task(task_id, "Calculate 2 + 2")

for i in range(3):
    monitor.record_step(task_id, success=True)
    time.sleep(0.1)

monitor.end_task(task_id, success=True)

# Evaluate
task_scores = evaluator.evaluate_task_completion(
    task="Calculate the sum of 2 and 2",
    agent_output="The sum of 2 and 2 is 4. I used the calculator tool to compute this.",
    expected_output="4"
)
print(f"Task completion scores: {task_scores}")

print(f"\nDashboard metrics: {monitor.get_dashboard_metrics()}")
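
The `accuracy` score in `evaluate_task_completion` only measures how many expected tokens appear in the output (recall), so a long answer that happens to contain the expected tokens scores perfectly. A minimal sketch of a token-level F1, which also penalizes extra tokens, is shown below. `tokenize` and `token_f1` are hypothetical helper names and not part of the classes above.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-word characters so '4.' and '4' match."""
    return re.findall(r"\w+", text.lower())


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a prediction and a reference answer."""
    pred_tokens = tokenize(prediction)
    ref_tokens = tokenize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss
        return float(pred_tokens == ref_tokens)

    # Multiset intersection: a shared token counts as often as it
    # appears in both strings
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0

    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(f"verbose answer: {token_f1('The sum of 2 and 2 is 4.', '4'):.3f}")
print(f"exact answer:   {token_f1('4', '4'):.3f}")
```

The verbose answer has full recall but low precision, so its F1 is well below that of the exact answer. Whether that penalty is desirable depends on the task: for free-form explanations, recall alone may be the better signal.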

## 6. Advanced: Agentic RAG

Combining agents with Retrieval-Augmented Generation for enhanced capabilities.

In [None]:
class AgenticRAG:
    """
    Agent-enhanced RAG system.
    
    Combines:
    - Intelligent query routing
    - Multi-step retrieval
    - Self-reflection and refinement
    """
    
    def __init__(
        self,
        knowledge_base: Dict[str, str],
        embedding_dim: int = 128
    ):
        self.knowledge_base = knowledge_base
        self.embedding_dim = embedding_dim
        
        # Pre-compute embeddings for knowledge base
        self.kb_embeddings = {}
        for key, value in knowledge_base.items():
            self.kb_embeddings[key] = self._compute_embedding(value)
    
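    def _compute_embedding(self, text: str) -> np.ndarray:
        """Compute a deterministic pseudo-random embedding (demo purposes).

        The vector is seeded from a hash of the text, so it carries no semantic
        meaning: similarity scores between different texts are arbitrary.
        """
        hash_val = int(hashlib.md5(text.encode()).hexdigest(), 16)
        # A local generator avoids reseeding NumPy's global RNG as a side effect
        rng = np.random.default_rng(hash_val % (2**32))
        return rng.standard_normal(self.embedding_dim).astype(np.float32)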
    
    def _route_query(self, query: str) -> str:
        """
        Route query to appropriate retrieval strategy.
        """
        query_lower = query.lower()
        
        if any(word in query_lower for word in ['compare', 'difference', 'versus', 'vs']):
            return 'comparison'
        elif any(word in query_lower for word in ['how to', 'steps', 'guide', 'tutorial']):
            return 'procedural'
        elif any(word in query_lower for word in ['what is', 'define', 'explain']):
            return 'factual'
        else:
            return 'general'
    
    def _retrieve(self, query: str, top_k: int = 3) -> List[Tuple[str, str, float]]:
        """
        Retrieve relevant documents.
        """
        query_embedding = self._compute_embedding(query)
        
        scored_docs = []
        for key, value in self.knowledge_base.items():
            doc_embedding = self.kb_embeddings[key]
            similarity = np.dot(query_embedding, doc_embedding) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(doc_embedding) + 1e-8
            )
            scored_docs.append((key, value, float(similarity)))
        
        scored_docs.sort(key=lambda x: x[2], reverse=True)
        return scored_docs[:top_k]
    
    def _decompose_query(self, query: str) -> List[str]:
        """
        Decompose complex query into sub-queries.
        """
        sub_queries = [query]  # Always include original
        
        # Simple decomposition based on 'and'
        if ' and ' in query.lower():
            parts = query.lower().split(' and ')
            sub_queries.extend(parts)
        
        # Handle comparison queries
        if 'compare' in query.lower() or 'vs' in query.lower():
            # Extract entities being compared
            words = query.split()
            for i, word in enumerate(words):
                if word.lower() in ['vs', 'versus', 'and', 'compare']:
                    if i > 0:
                        sub_queries.append(f"what is {words[i-1]}")
                    if i < len(words) - 1:
                        sub_queries.append(f"what is {words[i+1]}")
        
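        # De-duplicate while preserving order; set() would make the order,
        # and therefore the retrieved context, vary between runs
        return list(dict.fromkeys(sub_queries))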
    
    def _self_reflect(self, query: str, response: str, context: List[str]) -> Dict[str, Any]:
        """
        Self-reflection to evaluate response quality.
        """
        reflection = {
            'is_grounded': False,
            'is_complete': False,
            'needs_refinement': False,
            'missing_aspects': []
        }
        
        # Check if response is grounded in context
        response_words = set(response.lower().split())
        context_text = ' '.join(context).lower()
        context_words = set(context_text.split())
        
        overlap = len(response_words & context_words) / len(response_words) if response_words else 0
        reflection['is_grounded'] = overlap > 0.3
        
        # Check if response addresses query
        query_keywords = set(query.lower().split()) - {'what', 'is', 'the', 'a', 'an', 'how', 'to'}
        addressed = sum(1 for kw in query_keywords if kw in response.lower())
        reflection['is_complete'] = addressed / len(query_keywords) > 0.5 if query_keywords else True
        
        # Identify missing aspects
        for kw in query_keywords:
            if kw not in response.lower():
                reflection['missing_aspects'].append(kw)
        
        reflection['needs_refinement'] = not reflection['is_grounded'] or not reflection['is_complete']
        
        return reflection
    
    def query(self, query: str, max_iterations: int = 3) -> Dict[str, Any]:
        """
        Execute agentic RAG query.
        """
        result = {
            'query': query,
            'route': self._route_query(query),
            'iterations': [],
            'final_response': '',
            'sources': []
        }
        
        # Decompose query
        sub_queries = self._decompose_query(query)
        
        all_context = []
        for sub_query in sub_queries:
            retrieved = self._retrieve(sub_query, top_k=2)
            for key, value, score in retrieved:
                if value not in all_context:
                    all_context.append(value)
                    result['sources'].append({'key': key, 'score': score})
        
        # Generate initial response
        response = f"Based on the retrieved information: {' '.join(all_context[:2])}"
        
        # Self-reflection loop
        for i in range(max_iterations):
            reflection = self._self_reflect(query, response, all_context)
            result['iterations'].append({
                'iteration': i + 1,
                'response_preview': response[:100],
                'reflection': reflection
            })
            
            if not reflection['needs_refinement']:
                break
            
            # Refine response (simplified)
            if not reflection['missing_aspects']:
                # Nothing new to retrieve; another pass would repeat the same reflection
                break
            
            # Retrieve more for missing aspects
            for aspect in reflection['missing_aspects'][:2]:
                additional = self._retrieve(aspect, top_k=1)
                for key, value, score in additional:
                    if value not in all_context:
                        all_context.append(value)
                        result['sources'].append({'key': key, 'score': score})
            
            response = f"Refined response based on {len(all_context)} sources: {' '.join(all_context[:3])}"
        
        result['final_response'] = response
        return result


# Test Agentic RAG
knowledge_base = {
    "pytorch": "PyTorch is an open-source machine learning framework originally developed by Meta AI. It's known for dynamic computation graphs and Pythonic design.",
    "tensorflow": "TensorFlow is Google's open-source machine learning framework. It runs eagerly by default since 2.x, can compile static graphs with tf.function, and has strong production deployment support.",
    "transformer": "Transformers are neural network architectures that use self-attention mechanisms. They power models like BERT and GPT.",
    "bert": "BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model developed by Google.",
    "gpt": "GPT (Generative Pre-trained Transformer) is a family of language models developed by OpenAI."
}

agentic_rag = AgenticRAG(knowledge_base)
result = agentic_rag.query("Compare PyTorch and TensorFlow for deep learning")

print(f"Query Route: {result['route']}")
print(f"Iterations: {len(result['iterations'])}")
print(f"Sources used: {len(result['sources'])}")
print(f"\nFinal Response: {result['final_response'][:200]}...")
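
`_compute_embedding` seeds a random vector from a hash of the whole text, so two texts that share words still get unrelated vectors and the similarity ranking is effectively arbitrary. A minimal sketch of a hashed bag-of-words embedding, which at least makes lexical overlap show up in cosine similarity, is shown below. `hashed_bow_embedding` and `cosine` are hypothetical names, and a real system would more likely use a learned embedding model.

```python
import hashlib
import re

import numpy as np


def hashed_bow_embedding(text: str, dim: int = 128) -> np.ndarray:
    """Hashing-trick bag of words: each token increments one bucket."""
    vec = np.zeros(dim, dtype=np.float32)
    for token in re.findall(r"\w+", text.lower()):
        # md5 serves only as a stable hash; Python's built-in hash() is
        # salted per process and would not be reproducible
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


docs = {
    "pytorch": "PyTorch is an open-source machine learning framework",
    "bert": "BERT is a pre-trained language model",
}
query = hashed_bow_embedding("open-source machine learning framework")
scores = {key: cosine(query, hashed_bow_embedding(text)) for key, text in docs.items()}
for key, score in scores.items():
    print(f"{key}: {score:.3f}")
```

Documents that share tokens with the query now score higher than documents that do not, apart from occasional bucket collisions. Word order and synonyms are still ignored, which is the main limitation compared with a learned model.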

## FAANG Interview Questions

### Q1: How would you design a multi-agent system for a complex task like code review?

**Answer:**
I would design a hierarchical multi-agent system:

1. **Supervisor Agent**: Orchestrates the review process, routes tasks, aggregates findings
2. **Specialized Agents**:
   - **Security Agent**: Scans for vulnerabilities (OWASP Top 10, injection risks)
   - **Style Agent**: Checks coding standards, naming conventions
   - **Logic Agent**: Analyzes correctness, edge cases, race conditions
   - **Performance Agent**: Identifies bottlenecks, complexity issues
   - **Documentation Agent**: Checks comments, docstrings, API docs

3. **Communication Pattern**: Pub/sub for parallel execution, shared memory for context
4. **Conflict Resolution**: Priority-based (security > correctness > performance > style)
5. **Output**: Aggregated report with severity rankings and suggested fixes
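
The aggregation and conflict-resolution steps above can be sketched as follows. The `Finding` dataclass, the `CATEGORY_PRIORITY` table, and the rule "keep the highest-priority finding per line" are illustrative assumptions rather than a fixed design, and the specialized agents are stubbed out as hand-written findings.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Lower number = higher priority (security > correctness > performance > style)
CATEGORY_PRIORITY: Dict[str, int] = {
    "security": 0,
    "correctness": 1,
    "performance": 2,
    "style": 3,
}


@dataclass
class Finding:
    agent: str      # which specialized agent reported it
    category: str   # key into CATEGORY_PRIORITY
    file: str
    line: int
    message: str


def aggregate_findings(findings: List[Finding]) -> List[Finding]:
    """Keep one finding per (file, line), preferring higher-priority categories."""
    best: Dict[Tuple[str, int], Finding] = {}
    for finding in findings:
        key = (finding.file, finding.line)
        current = best.get(key)
        if current is None or (
            CATEGORY_PRIORITY[finding.category] < CATEGORY_PRIORITY[current.category]
        ):
            best[key] = finding
    # Report the highest-priority categories first, then order by location
    return sorted(
        best.values(),
        key=lambda f: (CATEGORY_PRIORITY[f.category], f.file, f.line),
    )


findings = [
    Finding("style_agent", "style", "app.py", 10, "Rename variable x"),
    Finding("security_agent", "security", "app.py", 10, "SQL built by string concatenation"),
    Finding("performance_agent", "performance", "app.py", 42, "Query inside a loop"),
]
for f in aggregate_findings(findings):
    print(f"[{f.category}] {f.file}:{f.line} {f.message}")
```

Here the style finding on line 10 is dropped in favor of the security finding on the same line. A real supervisor might instead keep both and only use the priority table for ordering.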

### Q2: What memory systems would you implement for a long-running agent?

**Answer:**
I would implement a hierarchical memory system:

1. **Working Memory** (Limited, ~5 items): Current context, active task state
2. **Short-term Memory** (Moderate, ~50 items): Recent interactions, session context
3. **Long-term Memory** (Unlimited): Persistent knowledge, user preferences
4. **Episodic Memory**: Specific past interactions, retrievable as few-shot examples

Key components:
- **Importance scoring** for memory consolidation
- **Vector similarity search** for retrieval
- **Time decay** for recency bias
- **Compression** for long-term storage efficiency
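
One way to combine importance scoring, vector similarity, and time decay into a single retrieval score is sketched below. The multiplicative form, the one-hour half-life, and the `MemoryItem` fields are assumptions chosen for illustration, not the scoring used by `AgentMemory` earlier in this notebook.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class MemoryItem:
    content: str
    embedding: np.ndarray
    importance: float   # in [0, 1], assigned when the memory is stored
    created_at: float   # timestamp in seconds


def retrieval_score(
    item: MemoryItem,
    query_embedding: np.ndarray,
    now: float,
    half_life_s: float = 3600.0,
) -> float:
    """Cosine similarity weighted by importance and exponential time decay."""
    similarity = float(
        np.dot(item.embedding, query_embedding)
        / (np.linalg.norm(item.embedding) * np.linalg.norm(query_embedding) + 1e-8)
    )
    age_s = max(now - item.created_at, 0.0)
    recency = 0.5 ** (age_s / half_life_s)  # halves every half_life_s seconds
    return similarity * item.importance * recency


def rank_memories(
    items: List[MemoryItem],
    query_embedding: np.ndarray,
    now: float,
    top_k: int = 2,
) -> List[MemoryItem]:
    """Return the top_k memories by retrieval score, best first."""
    ranked = sorted(
        items,
        key=lambda m: retrieval_score(m, query_embedding, now),
        reverse=True,
    )
    return ranked[:top_k]


# Toy 2-d embeddings and fixed timestamps keep the example deterministic
items = [
    MemoryItem("User prefers PyTorch", np.array([1.0, 0.0]), importance=0.9, created_at=0.0),
    MemoryItem("Discussed the weather", np.array([0.0, 1.0]), importance=0.9, created_at=3600.0),
    MemoryItem("Discussed PyTorch autograd", np.array([1.0, 0.0]), importance=0.5, created_at=3600.0),
]
query = np.array([1.0, 0.0])
for m in rank_memories(items, query, now=7200.0):
    print(f"{retrieval_score(m, query, 7200.0):.3f}  {m.content}")
```

With a one-hour half-life, the more recent but less important memory narrowly outranks the older, more important one. Lengthening `half_life_s` reverses that order, which is the knob that controls recency bias.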

### Q3: How do you ensure agent safety in production?

**Answer:**
Multiple layers of safety:

1. **Action Validation**: Whitelist allowed tools, block dangerous patterns
2. **Rate Limiting**: Token bucket algorithm to prevent runaway costs
3. **Circuit Breaker**: Stop execution on repeated failures
4. **Sandboxing**: Execute tool calls in isolated environments
5. **Human-in-the-Loop**: Require approval for high-risk actions
6. **Monitoring**: Real-time alerts for anomalous behavior
7. **Audit Logging**: Complete trace of all actions for review
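
A minimal sketch of how a few of these layers can be chained in front of a tool call is shown below. `GuardedExecutor`, the `approve` callback, and the `high_risk_tools` set are hypothetical names. Rate limiting, sandboxing, and the circuit breaker are left out to keep the example short.

```python
import json
import time
from typing import Any, Callable, Dict, List, Set


class GuardedExecutor:
    """Runs tools only after allowlist and approval checks; logs every attempt."""

    def __init__(
        self,
        tools: Dict[str, Callable[..., Any]],
        high_risk_tools: Set[str],
        approve: Callable[[str, Dict[str, Any]], bool],
    ):
        self.tools = tools                      # allowlist: only these can run
        self.high_risk_tools = high_risk_tools  # these need explicit approval
        self.approve = approve                  # human-in-the-loop hook
        self.audit_log: List[Dict[str, Any]] = []

    def _log(self, tool: str, args: Dict[str, Any], outcome: str) -> None:
        self.audit_log.append({
            "timestamp": time.time(),
            "tool": tool,
            "args": json.dumps(args, default=str),
            "outcome": outcome,
        })

    def execute(self, tool: str, args: Dict[str, Any]) -> Any:
        if tool not in self.tools:
            self._log(tool, args, "denied:not_allowlisted")
            raise PermissionError(f"tool '{tool}' is not allowlisted")
        if tool in self.high_risk_tools and not self.approve(tool, args):
            self._log(tool, args, "denied:not_approved")
            raise PermissionError(f"tool '{tool}' was not approved")
        try:
            result = self.tools[tool](**args)
        except Exception:
            self._log(tool, args, "error")
            raise
        self._log(tool, args, "ok")
        return result


executor = GuardedExecutor(
    tools={
        "add": lambda a, b: a + b,
        "delete_file": lambda path: f"deleted {path}",
    },
    high_risk_tools={"delete_file"},
    approve=lambda tool, args: False,  # stand-in for a real approval flow
)

print(executor.execute("add", {"a": 2, "b": 2}))
for name, args in [("delete_file", {"path": "/tmp/x"}), ("shell", {"command": "ls"})]:
    try:
        executor.execute(name, args)
    except PermissionError as exc:
        print(f"blocked: {exc}")
print([entry["outcome"] for entry in executor.audit_log])
```

Denied and failed attempts are logged as well as successful ones, since the rejected calls are often the most useful entries when reviewing an incident.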

## Summary

This notebook covered:

1. **ReAct Pattern**: Reasoning and acting loop for structured problem-solving
2. **Multi-Agent Systems**: Orchestrating specialized agents with supervisors
3. **Memory Systems**: Hierarchical memory with working, short-term, long-term, and episodic stores
4. **Production Patterns**: Circuit breakers, rate limiters, safety guards
5. **Evaluation & Monitoring**: Metrics, dashboards, and alerting
6. **Agentic RAG**: Enhanced retrieval with query decomposition and self-reflection

### Key Takeaways for FAANG Interviews:
- Agents require careful design for reliability and safety
- Memory systems enable context retention across interactions
- Multi-agent systems need clear communication and coordination patterns
- Production systems require monitoring, circuit breakers, and rate limiting
- Safety is paramount: always validate actions and maintain audit trails