# Agentic AI Implementation: GenAI Use-Case Evaluation & Recommendation Engine

This notebook builds an agentic AI solution that evaluates whether a use case should use Generative AI, deterministic logic, or a hybrid approach. The implementation progressively applies the GAI-3101 course lab techniques listed below.

## Use Case Overview

**Problem**: AI architects spend ~2.75 hours per use case manually evaluating whether GenAI is appropriate, applying decision trees, and documenting recommendations.

**Solution**: Multi-agent system that automates data collection, decision-tree evaluation, and recommendation report generation.

**Business Impact**:
- Lead time reduction: 13.75 hours → 4 hours (71% improvement)
- Manual effort savings: 2.75 hours → 1 hour per use case (64% reduction)
- Annual cost savings: $67,200 (40 evaluations/month at $80/hour expert rate)
- ROI: Breakeven in Year 1, $47,200/year net benefit from Year 2
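The Business Impact figures follow from the per-use-case numbers above. A quick check in Python: the constant names are illustrative, and the annual running cost is not stated in the source. It is only what the gap between gross savings and the Year-2 net benefit implies.

```python
# Reproduce the Business Impact arithmetic from the stated figures.
LEAD_TIME_BEFORE_H = 13.75
LEAD_TIME_AFTER_H = 4.0
EFFORT_BEFORE_H = 2.75
EFFORT_AFTER_H = 1.0
EVALUATIONS_PER_MONTH = 40
EXPERT_RATE_USD = 80
NET_BENEFIT_FROM_YEAR_2_USD = 47_200


def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100


lead_time_cut = pct_reduction(LEAD_TIME_BEFORE_H, LEAD_TIME_AFTER_H)  # ~70.9%
effort_cut = pct_reduction(EFFORT_BEFORE_H, EFFORT_AFTER_H)           # ~63.6%

# 1.75 h saved per evaluation x 40 evaluations/month x 12 months
hours_saved_per_year = (EFFORT_BEFORE_H - EFFORT_AFTER_H) * EVALUATIONS_PER_MONTH * 12
annual_savings = hours_saved_per_year * EXPERT_RATE_USD

# Assumption: the difference between gross savings and the Year-2 net benefit
# is the annual running cost of the system (not stated explicitly in the source).
implied_annual_cost = annual_savings - NET_BENEFIT_FROM_YEAR_2_USD

print(f"Lead time reduction: {lead_time_cut:.0f}%")
print(f"Effort reduction:    {effort_cut:.0f}%")
print(f"Hours saved/year:    {hours_saved_per_year:.0f}")
print(f"Annual savings:      ${annual_savings:,.0f}")
print(f"Implied annual cost: ${implied_annual_cost:,.0f}")
```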

## Workflow Steps

1. **Collect Evaluation Data** (45 min → 15 min)
2. **Apply Decision Tree & Draft Verdict** (60 min → 15 min)
3. **Generate Recommendation & Documentation** (60 min → 30 min)
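The per-step times add up to the per-use-case effort figures in Business Impact (2.75 hours manual, 1 hour with agents). A small sketch that sums them; the `steps` dict simply restates the list above.

```python
# Per-step durations in minutes: (manual, with agents), as listed above.
steps = {
    "Collect Evaluation Data": (45, 15),
    "Apply Decision Tree & Draft Verdict": (60, 15),
    "Generate Recommendation & Documentation": (60, 30),
}

manual_total = sum(before for before, _ in steps.values())
agent_total = sum(after for _, after in steps.values())

for name, (before, after) in steps.items():
    print(f"{name}: {before} -> {after} min ({before - after} min saved)")

print(f"Total: {manual_total} min ({manual_total / 60} h) -> {agent_total} min ({agent_total / 60} h)")
```

The largest single saving (45 minutes) comes from Step 2, where the decision tree is applied automatically.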

## Lab Techniques Applied

- **Lab 1**: Simple Python Agent
- **Lab 2**: Round Robin Communication
- **Lab 4**: Deliberative Agent (LangGraph)
- **Lab 6**: Observation Tools
- **Lab 7**: Action Tools
- **Lab 8**: Hierarchical Planning
- **Lab 9**: Rule-Based Reasoning
- **Lab 12**: Error Recovery
- **Lab 11**: Complete End-to-End System

## Part 1: Environment Setup

In [None]:
# Install required packages
!pip install -q openai pyautogen langchain langchain-openai langgraph python-dotenv

In [None]:
# Configure OpenAI API Key
import os
from getpass import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

In [None]:
# Initialize OpenAI client
from openai import OpenAI
import json
from typing import Dict, List, Any, Optional
from datetime import datetime
from enum import Enum

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
print("✓ OpenAI client initialized")

In [None]:
# Define recommendation types
class RecommendationType(Enum):
    DETERMINISTIC = "deterministic"
    GENAI = "genai"
    HYBRID = "hybrid"

# Sample use case submissions for testing
sample_use_cases = [
    {
        "use_case_id": "UC-2025-001",
        "title": "Customer Email Response Generator",
        "description": "Generate personalized email responses to customer inquiries based on their account status and query type.",
        "submitted_by": "customer_service_team",
        "submitted_at": "2025-11-25",
        "input_data_types": ["customer_email_text", "account_data", "query_history"],
        "expected_output": "Natural language email response",
        "volume": "500 emails/day",
        "compliance_requirements": ["GDPR", "brand_guidelines"]
    },
    {
        "use_case_id": "UC-2025-002",
        "title": "Invoice Validation System",
        "description": "Validate incoming invoices against predefined business rules: check totals match line items, verify tax calculations, confirm vendor is approved.",
        "submitted_by": "finance_team",
        "submitted_at": "2025-11-25",
        "input_data_types": ["structured_invoice_data", "vendor_list", "tax_rules"],
        "expected_output": "Pass/Fail validation result with error codes",
        "volume": "200 invoices/day",
        "compliance_requirements": ["SOX", "audit_trail"]
    },
    {
        "use_case_id": "UC-2025-003",
        "title": "Contract Clause Analyzer",
        "description": "Analyze legal contracts to extract key clauses, identify risks, and generate summary with specific rule-based flags for non-standard terms.",
        "submitted_by": "legal_team",
        "submitted_at": "2025-11-25",
        "input_data_types": ["pdf_contracts", "clause_library", "risk_rules"],
        "expected_output": "Structured extraction + natural language risk summary",
        "volume": "50 contracts/week",
        "compliance_requirements": ["legal_review", "audit_requirements"]
    }
]

print("✓ Sample use cases loaded")
print(f"   - {sample_use_cases[0]['title']} (likely GenAI)")
print(f"   - {sample_use_cases[1]['title']} (likely Deterministic)")
print(f"   - {sample_use_cases[2]['title']} (likely Hybrid)")

## Part 2: Foundation – Simple Python Agent (Lab 1)

We create a base `Agent` class and 3 specialized agents for the evaluation workflow:

1. **InfoCollectionAgent**: Gathers and structures use case information
2. **EvaluationAgent**: Applies decision tree and generates verdict
3. **ReportGenerationAgent**: Creates final recommendation with trade-offs

In [None]:
# Base Agent class (Lab 1 - Simple Python Agent)
class Agent:
    """Base agent class with action selection and execution."""
    
    def __init__(self, name: str, role: str):
        self.name = name
        self.role = role
        self.state = {}
        
    def _select_action(self, observation: Dict[str, Any]) -> str:
        """Select action based on observation. Override in subclasses."""
        raise NotImplementedError("Subclasses must implement _select_action")
        
    def act(self, observation: Dict[str, Any]) -> Dict[str, Any]:
        """Execute action based on observation."""
        action = self._select_action(observation)
        result = self._execute_action(action, observation)
        return result
        
    def _execute_action(self, action: str, observation: Dict[str, Any]) -> Dict[str, Any]:
        """Execute the selected action."""
        return {
            "agent": self.name,
            "action": action,
            "status": "completed",
            "observation": observation
        }

print("✓ Base Agent class defined")
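As written, `Agent.act()` records the selected action but never calls the subclass methods (`collect_and_structure`, `evaluate_and_recommend`, `generate_report`); the test cells call those methods directly. One possible way to route `act()` to them is name-based dispatch with `getattr`. This is a minimal sketch under assumptions: `DispatchingAgent` and `EchoAgent` are hypothetical names, and the convention that each action method receives the whole observation dict is not part of the notebook.

```python
from typing import Any, Dict


class DispatchingAgent:
    """Hypothetical variant of the notebook's base Agent: act() calls the method
    whose name matches the selected action, if one exists."""

    def __init__(self, name: str, role: str):
        self.name = name
        self.role = role
        self.state: Dict[str, Any] = {}

    def _select_action(self, observation: Dict[str, Any]) -> str:
        raise NotImplementedError("Subclasses must implement _select_action")

    def act(self, observation: Dict[str, Any]) -> Dict[str, Any]:
        action = self._select_action(observation)
        handler = getattr(self, action, None)
        # Only public methods are dispatchable; anything else is reported, not raised.
        if action.startswith("_") or not callable(handler):
            return {"agent": self.name, "action": action, "status": "unhandled"}
        return handler(observation)


class EchoAgent(DispatchingAgent):
    """Toy subclass used only to demonstrate the dispatch."""

    def __init__(self):
        super().__init__("EchoAgent", "Echo the use case title")

    def _select_action(self, observation: Dict[str, Any]) -> str:
        return "echo_title" if "use_case" in observation else "request_information"

    def echo_title(self, observation: Dict[str, Any]) -> Dict[str, Any]:
        title = observation["use_case"].get("title", "Untitled")
        return {"agent": self.name, "action": "echo_title", "status": "success", "title": title}


agent = EchoAgent()
print(agent.act({"use_case": {"title": "Invoice Validation System"}}))
print(agent.act({}))  # no method named request_information -> "unhandled"
```

Methods with extra parameters, such as `generate_report(structured_summary, evaluation)`, would need a thin wrapper that unpacks the observation before this pattern applies.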

In [None]:
# Specialized Evaluation Agents

class InfoCollectionAgent(Agent):
    """Agent for collecting and structuring use case information (Step 1)."""
    
    def __init__(self):
        super().__init__("InfoCollectionAgent", "Collect and structure use case data")
        self.required_fields = [
            "title", "description", "input_data_types", 
            "expected_output", "volume", "compliance_requirements"
        ]
        self.decision_questions = [
            {
                "id": "q1_deterministic_rules",
                "question": "Can the problem be solved with CLEAR DETERMINISTIC RULES?",
                "guidance": "If all inputs map to specific outputs via known rules, answer YES"
            },
            {
                "id": "q2_natural_language_output",
                "question": "Does the desired output require NATURAL LANGUAGE generation?",
                "guidance": "If output is free-form text, summaries, or conversational, answer YES"
            },
            {
                "id": "q3_unstructured_data",
                "question": "Must UNSTRUCTURED DATA be analyzed (text, images, audio)?",
                "guidance": "If inputs include documents, images, or audio for understanding, answer YES"
            },
            {
                "id": "q4_mixed_requirements",
                "question": "Are there MIXED requirements (some deterministic, some generative)?",
                "guidance": "If workflow combines rule-based validation with generative outputs, answer YES"
            }
        ]
        
    def _select_action(self, observation: Dict[str, Any]) -> str:
        if "use_case" in observation:
            return "collect_and_structure"
        return "request_information"
        
    def collect_and_structure(self, use_case: Dict) -> Dict[str, Any]:
        """Collect use case info and apply decision-tree questions."""
        
        # Check completeness
        missing_fields = [f for f in self.required_fields if f not in use_case or not use_case[f]]
        
        # Analyze use case to infer answers to decision questions
        criteria_answers = self._infer_criteria_answers(use_case)
        
        # Build structured summary
        structured_summary = {
            "use_case_id": use_case.get("use_case_id", "unknown"),
            "title": use_case.get("title", "Untitled"),
            "description": use_case.get("description", ""),
            "data_characteristics": {
                "input_types": use_case.get("input_data_types", []),
                "has_unstructured_data": self._has_unstructured_data(use_case.get("input_data_types", [])),
                "output_type": use_case.get("expected_output", ""),
                "requires_natural_language": self._requires_nlg(use_case.get("expected_output", ""))
            },
            "operational_context": {
                "volume": use_case.get("volume", "unknown"),
                "compliance": use_case.get("compliance_requirements", [])
            },
            "criteria_answers": criteria_answers,
            "completeness": {
                "is_complete": len(missing_fields) == 0,
                "missing_fields": missing_fields
            },
            "collected_at": datetime.now().isoformat()
        }
        
        return {
            "agent": self.name,
            "action": "collect_and_structure",
            "status": "success",
            "structured_summary": structured_summary,
            "message": f"Collected data for use case: {use_case.get('title', 'unknown')}"
        }
        
    def _infer_criteria_answers(self, use_case: Dict) -> Dict[str, Any]:
        """Infer answers to decision-tree questions based on use case data."""
        input_types = use_case.get("input_data_types", [])
        output = use_case.get("expected_output", "").lower()
        description = use_case.get("description", "").lower()
        
        # Heuristics for inferring answers
        has_unstructured = self._has_unstructured_data(input_types)
        requires_nlg = self._requires_nlg(output)
        
        # Check for deterministic indicators
        deterministic_keywords = ["validate", "check", "verify", "match", "compare", "calculate", "rule"]
        has_deterministic = any(kw in description for kw in deterministic_keywords)
        
        return {
            "q1_deterministic_rules": has_deterministic and not has_unstructured and not requires_nlg,
            "q2_natural_language_output": requires_nlg,
            "q3_unstructured_data": has_unstructured,
            "q4_mixed_requirements": has_deterministic and (has_unstructured or requires_nlg)
        }
        
    def _has_unstructured_data(self, input_types: List[str]) -> bool:
        """Check if input types include unstructured data."""
        unstructured_indicators = ["text", "email", "document", "pdf", "image", "audio", "video", "contract"]
        return any(
            any(ind in input_type.lower() for ind in unstructured_indicators)
            for input_type in input_types
        )
        
    def _requires_nlg(self, output: str) -> bool:
        """Check if output requires natural language generation."""
        nlg_indicators = ["natural language", "email", "response", "summary", "narrative", "text", "description"]
        return any(ind in output.lower() for ind in nlg_indicators)


class EvaluationAgent(Agent):
    """Agent for applying decision tree and generating verdict (Step 2)."""
    
    def __init__(self):
        super().__init__("EvaluationAgent", "Apply decision tree and recommend approach")
        
    def _select_action(self, observation: Dict[str, Any]) -> str:
        if "structured_summary" in observation:
            return "evaluate_and_recommend"
        return "insufficient_data"
        
    def evaluate_and_recommend(self, structured_summary: Dict) -> Dict[str, Any]:
        """Apply decision tree and generate preliminary verdict."""
        
        criteria = structured_summary.get("criteria_answers", {})
        
        # Apply decision tree logic
        decision_path = []
        
        # Question 1: Clear deterministic rules?
        if criteria.get("q1_deterministic_rules", False):
            decision_path.append("Q1: YES - Problem can be solved with deterministic rules")
            recommendation = RecommendationType.DETERMINISTIC
        else:
            decision_path.append("Q1: NO - Cannot be fully solved with deterministic rules")
            
            # Question 2: Natural language output?
            if criteria.get("q2_natural_language_output", False):
                decision_path.append("Q2: YES - Requires natural language generation")
                recommendation = RecommendationType.GENAI
            else:
                decision_path.append("Q2: NO - Does not require natural language generation")
                
                # Question 3: Unstructured data?
                if criteria.get("q3_unstructured_data", False):
                    decision_path.append("Q3: YES - Must analyze unstructured data")
                    recommendation = RecommendationType.GENAI
                else:
                    decision_path.append("Q3: NO - No unstructured data analysis needed")
                    recommendation = RecommendationType.DETERMINISTIC
        
        # Check for hybrid scenarios
        if criteria.get("q4_mixed_requirements", False):
            decision_path.append("Q4: YES - Mixed requirements detected")
            recommendation = RecommendationType.HYBRID
            
        # Generate confidence score based on clarity of decision
        confidence = self._calculate_confidence(criteria, recommendation)
        
        evaluation_result = {
            "use_case_id": structured_summary.get("use_case_id", "unknown"),
            "title": structured_summary.get("title", "unknown"),
            "recommendation": recommendation.value,
            "confidence": confidence,
            "decision_path": decision_path,
            "criteria_evaluated": criteria,
            "preliminary_rationale": self._generate_rationale(recommendation, criteria),
            "evaluated_at": datetime.now().isoformat()
        }
        
        return {
            "agent": self.name,
            "action": "evaluate_and_recommend",
            "status": "success",
            "evaluation": evaluation_result,
            "message": f"Recommendation: {recommendation.value.upper()} (confidence: {confidence}%)"
        }
        
    def _calculate_confidence(self, criteria: Dict, recommendation: RecommendationType) -> int:
        """Calculate confidence score for the recommendation."""
        # Higher confidence when criteria clearly point one direction
        if recommendation == RecommendationType.DETERMINISTIC:
            if criteria.get("q1_deterministic_rules") and not criteria.get("q2_natural_language_output") and not criteria.get("q3_unstructured_data"):
                return 95
            return 75
        elif recommendation == RecommendationType.GENAI:
            if criteria.get("q2_natural_language_output") or criteria.get("q3_unstructured_data"):
                return 90
            return 70
        else:  # HYBRID
            return 85  # Hybrid is typically well-justified when mixed requirements exist
            
    def _generate_rationale(self, recommendation: RecommendationType, criteria: Dict) -> str:
        """Generate preliminary rationale for the recommendation."""
        if recommendation == RecommendationType.DETERMINISTIC:
            return "The use case can be addressed with deterministic logic because the problem has clear, rule-based solutions and does not require natural language generation or unstructured data analysis."
        elif recommendation == RecommendationType.GENAI:
            reasons = []
            if criteria.get("q2_natural_language_output"):
                reasons.append("natural language output is required")
            if criteria.get("q3_unstructured_data"):
                reasons.append("unstructured data must be analyzed")
            return f"GenAI is recommended because {' and '.join(reasons)}. These capabilities are not achievable with traditional deterministic approaches."
        else:  # HYBRID
            return "A hybrid approach is recommended because the use case has both deterministic components (rule-based validation/processing) and generative components (natural language or unstructured data). Combining both approaches optimizes for accuracy and auditability."


class ReportGenerationAgent(Agent):
    """Agent for generating final recommendation report (Step 3)."""
    
    def __init__(self):
        super().__init__("ReportGenerationAgent", "Generate final recommendation with trade-offs")
        
    def _select_action(self, observation: Dict[str, Any]) -> str:
        if "structured_summary" in observation and "evaluation" in observation:
            return "generate_report"
        return "insufficient_data"
        
    def generate_report(self, structured_summary: Dict, evaluation: Dict) -> Dict[str, Any]:
        """Generate comprehensive recommendation report with trade-offs."""
        
        recommendation = evaluation.get("recommendation", "unknown")
        
        # Generate trade-off analysis
        trade_offs = self._generate_trade_offs(recommendation, structured_summary)
        
        # Generate warnings and considerations
        warnings = self._generate_warnings(recommendation, structured_summary)
        
        # Generate implementation guidance
        implementation_guidance = self._generate_implementation_guidance(recommendation)
        
        # Build final report
        report = {
            "report_id": f"REC-{evaluation.get('use_case_id', 'unknown')}",
            "generated_at": datetime.now().isoformat(),
            
            "executive_summary": self._generate_executive_summary(evaluation, structured_summary),
            
            "use_case_details": {
                "id": structured_summary.get("use_case_id"),
                "title": structured_summary.get("title"),
                "description": structured_summary.get("description")
            },
            
            "recommendation": {
                "approach": recommendation,
                "confidence": evaluation.get("confidence"),
                "rationale": evaluation.get("preliminary_rationale")
            },
            
            "decision_analysis": {
                "decision_path": evaluation.get("decision_path"),
                "criteria_evaluated": evaluation.get("criteria_evaluated")
            },
            
            "trade_off_analysis": trade_offs,
            
            "warnings_and_considerations": warnings,
            
            "implementation_guidance": implementation_guidance,
            
            "next_steps": self._generate_next_steps(recommendation)
        }
        
        return {
            "agent": self.name,
            "action": "generate_report",
            "status": "success",
            "report": report,
            "message": f"Report {report['report_id']} generated successfully"
        }
        
    def _generate_executive_summary(self, evaluation: Dict, summary: Dict) -> str:
        """Generate executive summary."""
        title = summary.get("title", "Unknown")
        rec = evaluation.get("recommendation", "unknown").upper()
        conf = evaluation.get("confidence", 0)
        
        return f"""Use case \"{title}\" has been evaluated using the GenAI Decision Framework.

**Recommendation: {rec} APPROACH** (Confidence: {conf}%)

{evaluation.get('preliminary_rationale', '')}

This recommendation is based on analysis of the use case requirements, data characteristics, and expected outputs. See detailed trade-off analysis below."""
        
    def _generate_trade_offs(self, recommendation: str, summary: Dict) -> Dict[str, Any]:
        """Generate trade-off analysis for the recommendation."""
        
        trade_offs = {
            "deterministic": {
                "pros": [
                    "100% predictable and deterministic outputs",
                    "Full auditability and explainability",
                    "Lower operational costs",
                    "Easier compliance and testing",
                    "No model drift or prompt engineering needed"
                ],
                "cons": [
                    "Cannot handle unstructured data effectively",
                    "Limited flexibility for edge cases",
                    "Rules must be manually maintained",
                    "Cannot generate natural language outputs"
                ]
            },
            "genai": {
                "pros": [
                    "Can process unstructured data (text, images, audio)",
                    "Generates natural, contextual language outputs",
                    "Handles edge cases and ambiguity well",
                    "Faster development for complex NLP tasks"
                ],
                "cons": [
                    "Non-deterministic outputs (variability)",
                    "Harder to audit and explain decisions",
                    "Higher operational costs (token usage)",
                    "Potential hallucinations and errors",
                    "Model drift and prompt maintenance required"
                ]
            },
            "hybrid": {
                "pros": [
                    "Best of both worlds",
                    "Deterministic validation with generative flexibility",
                    "Clear separation of concerns",
                    "Better auditability for rule-based components"
                ],
                "cons": [
                    "More complex architecture",
                    "Higher development and maintenance effort",
                    "Need to define clear boundaries",
                    "Testing complexity increases"
                ]
            }
        }
        
        return {
            "recommended_approach": trade_offs.get(recommendation, {}),
            "alternatives": {
                k: v for k, v in trade_offs.items() if k != recommendation
            }
        }
        
    def _generate_warnings(self, recommendation: str, summary: Dict) -> List[str]:
        """Generate warnings and considerations."""
        warnings = []
        compliance = summary.get("operational_context", {}).get("compliance", [])
        
        if recommendation == "genai":
            warnings.append("⚠️ GenAI outputs are non-deterministic; implement output validation")
            warnings.append("⚠️ Consider explainability requirements for audit purposes")
            if "SOX" in compliance or "audit" in str(compliance).lower():
                warnings.append("⚠️ Compliance requirement detected: Ensure GenAI decisions can be explained and traced")
                
        if recommendation == "hybrid":
            warnings.append("⚠️ Define clear boundaries between deterministic and GenAI components")
            warnings.append("⚠️ Ensure deterministic validation runs before/after GenAI processing")
            
        if "GDPR" in compliance:
            warnings.append("⚠️ GDPR compliance: Ensure no PII is sent to external LLM APIs")
            
        return warnings
        
    def _generate_implementation_guidance(self, recommendation: str) -> Dict[str, Any]:
        """Generate implementation guidance."""
        guidance = {
            "deterministic": {
                "suggested_technologies": ["Python/Java rule engine", "SQL-based validation", "Business rules management system"],
                "architecture_pattern": "Rule-based processing pipeline",
                "key_considerations": [
                    "Document all rules clearly",
                    "Implement comprehensive unit tests",
                    "Version control rule changes",
                    "Build monitoring for rule hit rates"
                ]
            },
            "genai": {
                "suggested_technologies": ["OpenAI GPT-4", "Claude", "LangChain", "Custom fine-tuned models"],
                "architecture_pattern": "LLM-powered processing with validation layer",
                "key_considerations": [
                    "Implement prompt versioning and testing",
                    "Add output validation and guardrails",
                    "Monitor for model drift and hallucinations",
                    "Implement fallback mechanisms",
                    "Consider token cost optimization"
                ]
            },
            "hybrid": {
                "suggested_technologies": ["Rule engine + LangChain", "Python + OpenAI API", "Orchestration framework"],
                "architecture_pattern": "Layered architecture: Deterministic validation → GenAI processing → Deterministic post-validation",
                "key_considerations": [
                    "Define clear interface between components",
                    "Use deterministic pre-processing for data validation",
                    "Apply deterministic post-processing for output validation",
                    "Document which component handles which logic"
                ]
            }
        }
        
        return guidance.get(recommendation, {})
        
    def _generate_next_steps(self, recommendation: str) -> List[str]:
        """Generate recommended next steps."""
        base_steps = [
            "Review this recommendation with the requesting team",
            "Validate decision criteria answers with stakeholders",
            "Assess implementation complexity and timeline"
        ]
        
        if recommendation == "genai":
            base_steps.extend([
                "Conduct prompt engineering proof-of-concept",
                "Evaluate LLM provider options and costs",
                "Define output validation strategy"
            ])
        elif recommendation == "hybrid":
            base_steps.extend([
                "Design component boundary specification",
                "Define data flow between deterministic and GenAI layers",
                "Plan integration testing strategy"
            ])
        else:
            base_steps.extend([
                "Document complete rule set",
                "Design rule engine architecture",
                "Plan rule maintenance process"
            ])
            
        return base_steps

print("✓ All 3 specialized evaluation agents defined")
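Step 3's documentation output is a nested dict. To hand it to reviewers as a readable document, one option is to render it to Markdown. This is a minimal sketch: `render_report_markdown` is a hypothetical helper, and the sample dict mirrors only a subset of the keys that `ReportGenerationAgent.generate_report` builds, with abbreviated values.

```python
from typing import Any, Dict, List


def render_report_markdown(report: Dict[str, Any]) -> str:
    """Render a subset of the recommendation report dict as Markdown (hypothetical helper)."""
    rec = report.get("recommendation", {})
    lines: List[str] = [
        f"# {report.get('report_id', 'REC-unknown')}",
        "",
        f"**Approach:** {str(rec.get('approach', 'unknown')).upper()} "
        f"(confidence: {rec.get('confidence', 'n/a')}%)",
        "",
        str(rec.get("rationale", "")),
        "",
        "## Decision path",
    ]
    lines += [f"- {step}" for step in report.get("decision_analysis", {}).get("decision_path", [])]
    lines += ["", "## Warnings"]
    warnings = report.get("warnings_and_considerations", [])
    lines += [f"- {w}" for w in warnings] or ["- None"]
    lines += ["", "## Next steps"]
    lines += [f"{i}. {s}" for i, s in enumerate(report.get("next_steps", []), start=1)]
    return "\n".join(lines)


# Sample input shaped like generate_report()'s "report" value (values abbreviated).
sample_report = {
    "report_id": "REC-UC-2025-002",
    "recommendation": {
        "approach": "deterministic",
        "confidence": 95,
        "rationale": "The use case can be addressed with deterministic logic.",
    },
    "decision_analysis": {
        "decision_path": ["Q1: YES - Problem can be solved with deterministic rules"],
    },
    "warnings_and_considerations": [],
    "next_steps": ["Review this recommendation with the requesting team", "Document complete rule set"],
}

print(render_report_markdown(sample_report))
```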

In [None]:
# Test Specialized Agents with sample use cases

# Test with GenAI-likely use case (Customer Email Response)
print("=" * 80)
print("Testing with: Customer Email Response Generator")
print("=" * 80)

# 1. Info Collection Agent
info_agent = InfoCollectionAgent()
info_result = info_agent.collect_and_structure(sample_use_cases[0])
print(f"\n1. InfoCollectionAgent: {info_result['message']}")
print(f"   Has unstructured data: {info_result['structured_summary']['data_characteristics']['has_unstructured_data']}")
print(f"   Requires NLG: {info_result['structured_summary']['data_characteristics']['requires_natural_language']}")

# 2. Evaluation Agent
eval_agent = EvaluationAgent()
eval_result = eval_agent.evaluate_and_recommend(info_result['structured_summary'])
print(f"\n2. EvaluationAgent: {eval_result['message']}")
print("   Decision path:")
for step in eval_result['evaluation']['decision_path']:
    print(f"      - {step}")

# 3. Report Generation Agent
report_agent = ReportGenerationAgent()
report_result = report_agent.generate_report(info_result['structured_summary'], eval_result['evaluation'])
print(f"\n3. ReportGenerationAgent: {report_result['message']}")
print(f"   Recommendation: {report_result['report']['recommendation']['approach'].upper()}")
print(f"   Confidence: {report_result['report']['recommendation']['confidence']}%")

In [None]:
# Test with Deterministic-likely use case (Invoice Validation)
print("=" * 80)
print("Testing with: Invoice Validation System")
print("=" * 80)

info_result_2 = info_agent.collect_and_structure(sample_use_cases[1])
print(f"\n1. InfoCollectionAgent: {info_result_2['message']}")
print(f"   Has unstructured data: {info_result_2['structured_summary']['data_characteristics']['has_unstructured_data']}")
print(f"   Requires NLG: {info_result_2['structured_summary']['data_characteristics']['requires_natural_language']}")

eval_result_2 = eval_agent.evaluate_and_recommend(info_result_2['structured_summary'])
print(f"\n2. EvaluationAgent: {eval_result_2['message']}")

report_result_2 = report_agent.generate_report(info_result_2['structured_summary'], eval_result_2['evaluation'])
print(f"\n3. ReportGenerationAgent: {report_result_2['message']}")
print(f"   Recommendation: {report_result_2['report']['recommendation']['approach'].upper()}")
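Why does the invoice case land on DETERMINISTIC? `InfoCollectionAgent._infer_criteria_answers` relies on substring keyword heuristics. The sketch below re-implements those same heuristics as a standalone function (keyword lists copied from the class; `infer_criteria` is a hypothetical name) so the inferred answers for all three samples can be compared side by side. Substring matching is coarse: `"rule"` matches `"rule-based"` in the contract description, which is what sets Q4 for that case, but `"check"` would equally match a word like `"checkout"`.

```python
from typing import Dict, List

# Keyword lists copied from InfoCollectionAgent.
UNSTRUCTURED_KW = ["text", "email", "document", "pdf", "image", "audio", "video", "contract"]
NLG_KW = ["natural language", "email", "response", "summary", "narrative", "text", "description"]
DETERMINISTIC_KW = ["validate", "check", "verify", "match", "compare", "calculate", "rule"]


def infer_criteria(input_types: List[str], expected_output: str, description: str) -> Dict[str, bool]:
    """Standalone copy of the substring heuristics in _infer_criteria_answers."""
    has_unstructured = any(kw in t.lower() for t in input_types for kw in UNSTRUCTURED_KW)
    requires_nlg = any(kw in expected_output.lower() for kw in NLG_KW)
    has_deterministic = any(kw in description.lower() for kw in DETERMINISTIC_KW)
    return {
        "q1_deterministic_rules": has_deterministic and not has_unstructured and not requires_nlg,
        "q2_natural_language_output": requires_nlg,
        "q3_unstructured_data": has_unstructured,
        "q4_mixed_requirements": has_deterministic and (has_unstructured or requires_nlg),
    }


# (input_data_types, expected_output, description) from sample_use_cases.
samples = {
    "UC-2025-001": (["customer_email_text", "account_data", "query_history"],
                    "Natural language email response",
                    "Generate personalized email responses to customer inquiries based on their account status and query type."),
    "UC-2025-002": (["structured_invoice_data", "vendor_list", "tax_rules"],
                    "Pass/Fail validation result with error codes",
                    "Validate incoming invoices against predefined business rules: check totals match line items, verify tax calculations, confirm vendor is approved."),
    "UC-2025-003": (["pdf_contracts", "clause_library", "risk_rules"],
                    "Structured extraction + natural language risk summary",
                    "Analyze legal contracts to extract key clauses, identify risks, and generate summary with specific rule-based flags for non-standard terms."),
}

for uc_id, args in samples.items():
    print(uc_id, infer_criteria(*args))
```

For the invoice, `"validate"` fires the deterministic keywords while no input type or output phrase matches an unstructured or NLG keyword, so only Q1 is true.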

In [None]:
# Test with Hybrid-likely use case (Contract Clause Analyzer)
print("=" * 80)
print("Testing with: Contract Clause Analyzer")
print("=" * 80)

info_result_3 = info_agent.collect_and_structure(sample_use_cases[2])
print(f"\n1. InfoCollectionAgent: {info_result_3['message']}")
print(f"   Has unstructured data: {info_result_3['structured_summary']['data_characteristics']['has_unstructured_data']}")
print(f"   Mixed requirements: {info_result_3['structured_summary']['criteria_answers']['q4_mixed_requirements']}")

eval_result_3 = eval_agent.evaluate_and_recommend(info_result_3['structured_summary'])
print(f"\n2. EvaluationAgent: {eval_result_3['message']}")

report_result_3 = report_agent.generate_report(info_result_3['structured_summary'], eval_result_3['evaluation'])
print(f"\n3. ReportGenerationAgent: {report_result_3['message']}")
print(f"   Recommendation: {report_result_3['report']['recommendation']['approach'].upper()}")
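`EvaluationAgent.evaluate_and_recommend` walks Q1 → Q2 → Q3 and then lets Q4 (mixed requirements) override whatever the first three questions chose. That override is how the contract analyzer ends up HYBRID even though Q2 alone would say GENAI. A compact standalone restatement of that ordering, including the confidence rules from `_calculate_confidence`, makes the behavior easy to check; `decide` is a hypothetical name, and the enum mirrors the notebook's `RecommendationType`.

```python
from enum import Enum
from typing import Dict, List, Tuple


class RecommendationType(Enum):
    DETERMINISTIC = "deterministic"
    GENAI = "genai"
    HYBRID = "hybrid"


def decide(criteria: Dict[str, bool]) -> Tuple[RecommendationType, int, List[str]]:
    """Restate EvaluationAgent's decision order: Q1, then Q2, then Q3, with Q4 overriding."""
    path: List[str] = []
    if criteria.get("q1_deterministic_rules"):
        path.append("Q1: YES")
        rec = RecommendationType.DETERMINISTIC
    elif criteria.get("q2_natural_language_output"):
        path += ["Q1: NO", "Q2: YES"]
        rec = RecommendationType.GENAI
    elif criteria.get("q3_unstructured_data"):
        path += ["Q1: NO", "Q2: NO", "Q3: YES"]
        rec = RecommendationType.GENAI
    else:
        path += ["Q1: NO", "Q2: NO", "Q3: NO"]
        rec = RecommendationType.DETERMINISTIC

    if criteria.get("q4_mixed_requirements"):  # Q4 overrides the earlier verdict
        path.append("Q4: YES")
        rec = RecommendationType.HYBRID

    # Confidence rules copied from _calculate_confidence.
    q2 = criteria.get("q2_natural_language_output", False)
    q3 = criteria.get("q3_unstructured_data", False)
    if rec is RecommendationType.DETERMINISTIC:
        confidence = 95 if criteria.get("q1_deterministic_rules") and not q2 and not q3 else 75
    elif rec is RecommendationType.GENAI:
        confidence = 90 if q2 or q3 else 70
    else:
        confidence = 85
    return rec, confidence, path


contract = {"q1_deterministic_rules": False, "q2_natural_language_output": True,
            "q3_unstructured_data": True, "q4_mixed_requirements": True}
print(decide(contract))
```

Because Q4 is checked last, Q3 is never reached for the contract case: the path records Q2 as YES and then the Q4 override.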

## Part 3: Multi-Agent Communication (Lab 2)

Use AutoGen's `GroupChat` with `speaker_selection_method="round_robin"`, managed by a `GroupChatManager`, to orchestrate sequential, turn-by-turn communication between the evaluation agents.

In [None]:
# Configure the LLM used by the AutoGen agents
from autogen import ConversableAgent, GroupChat, GroupChatManager

llm_config = {
    "model": "gpt-4",
    "api_key": os.environ["OPENAI_API_KEY"],
    "temperature": 0.7
}

print("✓ AutoGen LLM config initialized")

In [None]:
# Define AutoGen agents for GenAI evaluation workflow

evaluation_coordinator = ConversableAgent(
    name="EvaluationCoordinator",
    system_message="""You are the GenAI Use-Case Evaluation coordinator. You receive use case submissions 
    and orchestrate the evaluation workflow through specialized agents: 
    InfoCollector → DecisionTreeEvaluator → ReportWriter.
    Ensure the final recommendation is clear: DETERMINISTIC, GENAI, or HYBRID.""",
    llm_config=llm_config,
    human_input_mode="NEVER"
)

info_collector = ConversableAgent(
    name="InfoCollector",
    system_message="""You collect and structure use case information. Analyze the submission to answer:
    1. Can this be solved with clear deterministic rules?
    2. Does it require natural language generation?
    3. Must unstructured data be analyzed?
    4. Are there mixed requirements?
    Provide a structured summary with these answers.""",
    llm_config=llm_config,
    human_input_mode="NEVER"
)

decision_evaluator = ConversableAgent(
    name="DecisionTreeEvaluator",
    system_message="""You apply the GenAI decision tree:
    - If deterministic rules suffice → DETERMINISTIC
    - If NLG or unstructured data needed → GENAI
    - If mixed requirements → HYBRID
    Provide the decision path and preliminary verdict with confidence level.""",
    llm_config=llm_config,
    human_input_mode="NEVER"
)

report_writer = ConversableAgent(
    name="ReportWriter",
    system_message="""You generate the final recommendation report with:
    - Executive summary
    - Trade-off analysis (pros/cons of recommended approach vs alternatives)
    - Warnings and compliance considerations
    - Implementation guidance
    - Next steps
    Make the report actionable and clear.""",
    llm_config=llm_config,
    human_input_mode="NEVER"
)

print("✓ AutoGen agents for evaluation workflow defined")

In [None]:
# Create Round-Robin Group Chat

evaluation_group_chat = GroupChat(
    agents=[evaluation_coordinator, info_collector, decision_evaluator, report_writer],
    messages=[],
    max_round=8,
    speaker_selection_method="round_robin"
)

evaluation_manager = GroupChatManager(
    groupchat=evaluation_group_chat,
    llm_config=llm_config
)

print("✓ Evaluation Round-Robin Group Chat created")
print("Workflow: Coordinator → InfoCollector → DecisionEvaluator → ReportWriter")
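Round-robin selection gives the floor to agents in list order and wraps around at the end. Assume the coordinator speaks first and each round is one message. AutoGen's exact round accounting, such as whether the opening message counts, may shift this by one. Under that assumption, `max_round=8` gives each of the four agents two turns. The hypothetical `round_robin_order` helper below only reproduces that rotation. It is not part of AutoGen.

```python
from typing import List


def round_robin_order(agent_names: List[str], max_round: int, first_speaker: str) -> List[str]:
    """Speaker sequence for a plain round-robin rotation starting at first_speaker.

    Assumes one message per round; AutoGen's own round accounting may differ by one.
    """
    start = agent_names.index(first_speaker)
    # Each round advances one position in list order, wrapping back to the start
    return [agent_names[(start + i) % len(agent_names)] for i in range(max_round)]


chat_agents = ["EvaluationCoordinator", "InfoCollector", "DecisionTreeEvaluator", "ReportWriter"]
speaker_order = round_robin_order(chat_agents, max_round=8, first_speaker="EvaluationCoordinator")
for turn, name in enumerate(speaker_order, start=1):
    print(f"Round {turn}: {name}")
```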

## Part 4: Deliberative Agent with LangGraph (Lab 4)

Implement a deliberative workflow using LangGraph's StateGraph for the evaluation process.

In [None]:
# Initialize LangChain LLM
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4", temperature=0.7, api_key=os.environ["OPENAI_API_KEY"])
print("✓ LangChain LLM initialized")

In [None]:
# Define State Schema
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
import operator

class EvaluationWorkflowState(TypedDict):
    """State for GenAI evaluation workflow."""
    use_case: Dict[str, Any]
    structured_summary: Optional[Dict[str, Any]]
    evaluation: Optional[Dict[str, Any]]
    report: Optional[Dict[str, Any]]
    messages: Annotated[List[str], operator.add]
    next_step: str

print("✓ Evaluation workflow state schema defined")
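`Annotated[List[str], operator.add]` attaches a reducer to `messages`. LangGraph combines each node's returned `messages` with the current value instead of overwriting it. Unannotated keys such as `next_step` are simply replaced. The simplified model below uses a hypothetical `merge_update`, which is not LangGraph's internal code. It shows why a node should return only its new messages: returning the whole list re-adds entries that are already in the state.

```python
import operator
from typing import Any, Callable, Dict


def merge_update(state: Dict[str, Any], update: Dict[str, Any],
                 reducers: Dict[str, Callable[[Any, Any], Any]]) -> Dict[str, Any]:
    """Merge a node's partial update into the state (simplified reducer model).

    Keys with a reducer are combined with the old value; other keys are overwritten.
    """
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](state.get(key, []), value)
        else:
            merged[key] = value
    return merged


reducers = {"messages": operator.add}
state = {"messages": ["[InfoCollector] done"], "next_step": "evaluate"}

# Correct: the node returns only its *new* message
good = merge_update(state, {"messages": ["[Evaluator] done"], "next_step": "generate_report"}, reducers)
print(good["messages"])  # ['[InfoCollector] done', '[Evaluator] done']

# Pitfall: the node returns the full, already-appended list, so old entries repeat
full_list = state["messages"] + ["[Evaluator] done"]
bad = merge_update(state, {"messages": full_list}, reducers)
print(bad["messages"])  # ['[InfoCollector] done', '[InfoCollector] done', '[Evaluator] done']
```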

In [None]:
# Define workflow nodes

def collect_info_node(state: EvaluationWorkflowState) -> Dict[str, Any]:
    """Collect and structure use case information.

    Nodes return only the keys they change; LangGraph merges them into the state.
    Because `messages` has an `operator.add` reducer, return just the new entry:
    appending in place and returning the full list would duplicate earlier messages.
    """
    agent = InfoCollectionAgent()
    result = agent.collect_and_structure(state["use_case"])
    return {
        "structured_summary": result["structured_summary"],
        "messages": [f"[InfoCollector] {result['message']}"],
        "next_step": "evaluate",
    }

def evaluate_node(state: EvaluationWorkflowState) -> Dict[str, Any]:
    """Apply decision tree and generate verdict."""
    agent = EvaluationAgent()
    result = agent.evaluate_and_recommend(state["structured_summary"])
    return {
        "evaluation": result["evaluation"],
        "messages": [f"[Evaluator] {result['message']}"],
        "next_step": "generate_report",
    }

def generate_report_node(state: EvaluationWorkflowState) -> Dict[str, Any]:
    """Generate final recommendation report."""
    agent = ReportGenerationAgent()
    result = agent.generate_report(state["structured_summary"], state["evaluation"])
    return {
        "report": result["report"],
        "messages": [f"[ReportGenerator] {result['message']}"],
        "next_step": "end",
    }

print("✓ Workflow nodes defined")

In [None]:
# Build the workflow graph

workflow = StateGraph(EvaluationWorkflowState)

# Add nodes
workflow.add_node("collect_info", collect_info_node)
workflow.add_node("evaluate", evaluate_node)
workflow.add_node("generate_report", generate_report_node)

# Define edges
workflow.add_edge("collect_info", "evaluate")
workflow.add_edge("evaluate", "generate_report")
workflow.add_edge("generate_report", END)

# Set entry point
workflow.set_entry_point("collect_info")

# Compile
evaluation_workflow_app = workflow.compile()

print("✓ LangGraph workflow compiled")

In [None]:
# Execute workflow for all sample use cases

print("=" * 80)
print("EXECUTING LANGGRAPH WORKFLOW FOR ALL USE CASES")
print("=" * 80)

workflow_results = []

for use_case in sample_use_cases:
    print(f"\n--- Evaluating: {use_case['title']} ---")
    
    initial_state = EvaluationWorkflowState(
        use_case=use_case,
        structured_summary=None,
        evaluation=None,
        report=None,
        messages=[],
        next_step="collect_info"
    )
    
    final_state = evaluation_workflow_app.invoke(initial_state)
    workflow_results.append(final_state)
    
    print(f"Workflow Steps:")
    for msg in final_state["messages"]:
        print(f"  {msg}")
    print(f"Final Recommendation: {final_state['report']['recommendation']['approach'].upper()}")

print("\n" + "=" * 80)
print("WORKFLOW EXECUTION COMPLETE")
print("=" * 80)

## Part 5: Observation & Action Tools (Labs 6-7)

Implement observation tools (data retrieval) and action tools (persistence, stakeholder notification, and report export) for the evaluation workflow.

In [None]:
# Observation Tools (Lab 6)

# Simulated knowledge base of past evaluations
CASE_KNOWLEDGE_BASE = [
    {
        "case_id": "HIST-001",
        "title": "Chatbot for FAQ",
        "recommendation": "genai",
        "rationale": "Required natural language understanding and generation"
    },
    {
        "case_id": "HIST-002",
        "title": "Transaction Fraud Rules",
        "recommendation": "deterministic",
        "rationale": "Clear threshold-based rules for fraud detection"
    },
    {
        "case_id": "HIST-003",
        "title": "Document Summarization with Validation",
        "recommendation": "hybrid",
        "rationale": "Combine GenAI for summarization with rules for validation"
    }
]

def search_similar_cases(use_case_title: str, use_case_description: str) -> Dict[str, Any]:
    """
    Search knowledge base for similar past evaluations.
    
    Args:
        use_case_title: Title of the use case
        use_case_description: Description of the use case (unused in this demo)
        
    Returns:
        Dict with the query, the matching past cases, and a search timestamp
    """
    # In production, rank cases by embedding similarity (vector search).
    # For this demo, every case in the knowledge base is returned as "similar".
    return {
        "query": use_case_title,
        "similar_cases": CASE_KNOWLEDGE_BASE,
        "search_timestamp": datetime.now().isoformat()
    }

def get_evaluation_criteria() -> Dict[str, Any]:
    """
    Retrieve the current evaluation criteria and decision tree.
    
    Returns:
        Evaluation criteria configuration
    """
    return {
        "version": "1.0",
        "decision_tree": [
            {
                "question_id": "q1",
                "question": "Can the problem be solved with CLEAR DETERMINISTIC RULES?",
                "yes_action": "recommend_deterministic",
                "no_action": "continue_to_q2"
            },
            {
                "question_id": "q2",
                "question": "Does the desired output require NATURAL LANGUAGE generation?",
                "yes_action": "recommend_genai",
                "no_action": "continue_to_q3"
            },
            {
                "question_id": "q3",
                "question": "Must UNSTRUCTURED DATA be analyzed?",
                "yes_action": "recommend_genai",
                "no_action": "recommend_deterministic"
            },
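            # q4 cannot be reached by walking q1-q3 in order (every q3 branch ends in a
            # recommendation); its "use_previous_recommendation" branch suggests it acts
            # as an override check on the verdict from q1-q3.
            {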
                "question_id": "q4",
                "question": "Are there MIXED requirements?",
                "yes_action": "recommend_hybrid",
                "no_action": "use_previous_recommendation"
            }
        ],
        "retrieved_at": datetime.now().isoformat()
    }

def get_compliance_requirements(compliance_codes: List[str]) -> Dict[str, Any]:
    """
    Retrieve compliance requirement details.
    
    Args:
        compliance_codes: List of compliance requirement codes
        
    Returns:
        Compliance requirement details
    """
    compliance_details = {
        "GDPR": {
            "name": "General Data Protection Regulation",
            "genai_considerations": ["No PII to external LLMs", "Right to explanation", "Data minimization"],
            "risk_level": "HIGH"
        },
        "SOX": {
            "name": "Sarbanes-Oxley Act",
            "genai_considerations": ["Full audit trail required", "Decision explainability", "Version control"],
            "risk_level": "HIGH"
        },
        "audit_trail": {
            "name": "Audit Trail Requirements",
            "genai_considerations": ["Log all inputs and outputs", "Trace decision paths"],
            "risk_level": "MEDIUM"
        },
        "brand_guidelines": {
            "name": "Brand Guidelines",
            "genai_considerations": ["Output validation for brand consistency", "Tone monitoring"],
            "risk_level": "LOW"
        }
    }
    
    return {
        "requested_codes": compliance_codes,
        "details": {code: compliance_details.get(code, {"name": "Unknown", "genai_considerations": [], "risk_level": "UNKNOWN"}) for code in compliance_codes}
    }

print("✓ Observation tools defined")
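If q1-q3 are read in order, they always end in a recommendation. That means q4 is only meaningful as a check on the earlier verdict, and its `no_action` is `use_previous_recommendation`. The minimal walker below follows that reading. `walk_decision_tree` is a hypothetical helper that takes yes/no answers keyed by `question_id`. Treating a q4 "yes" as an override to HYBRID is an assumption about how the tree is meant to be applied.

```python
from typing import Dict, List, Tuple

# (question_id, verdict on "yes") for the sequential part of the tree; a "no"
# on q3 falls through to DETERMINISTIC, matching get_evaluation_criteria().
SEQUENTIAL_QUESTIONS = [("q1", "deterministic"), ("q2", "genai"), ("q3", "genai")]


def walk_decision_tree(answers: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Return (recommendation, decision_path) for yes/no answers keyed q1-q4.

    Assumption: a "yes" on q4 (mixed requirements) overrides the q1-q3 verdict with HYBRID.
    """
    path: List[str] = []
    verdict = "deterministic"  # reached only if q1, q2 and q3 are all "no"
    for question_id, yes_verdict in SEQUENTIAL_QUESTIONS:
        if answers.get(question_id, False):
            path.append(f"{question_id}=yes")
            verdict = yes_verdict
            break  # first "yes" ends the sequential walk
        path.append(f"{question_id}=no")

    # q4 is checked after the walk, as an override
    if answers.get("q4", False):
        path.append("q4=yes (override)")
        verdict = "hybrid"
    else:
        path.append("q4=no")
    return verdict, path


print(walk_decision_tree({"q1": False, "q2": True, "q3": False, "q4": False}))
# ('genai', ['q1=no', 'q2=yes', 'q4=no'])
print(walk_decision_tree({"q1": True, "q2": False, "q3": False, "q4": True}))
# ('hybrid', ['q1=yes', 'q4=yes (override)'])
```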

In [None]:
# Action Tools (Lab 7)

# Simulated persistent storage
EVALUATION_DATABASE = []

def save_evaluation_to_kb(report: Dict, structured_summary: Dict) -> Dict[str, Any]:
    """
    Save evaluation report to knowledge base.
    
    Args:
        report: Final recommendation report
        structured_summary: Structured use case summary
        
    Returns:
        Save confirmation
    """
    record = {
        "record_id": f"KB-{len(EVALUATION_DATABASE) + 1:04d}",
        "report_id": report.get("report_id"),
        "use_case_id": structured_summary.get("use_case_id"),
        "title": structured_summary.get("title"),
        "recommendation": report.get("recommendation", {}).get("approach"),
        "confidence": report.get("recommendation", {}).get("confidence"),
        "criteria_answers": structured_summary.get("criteria_answers"),
        "saved_at": datetime.now().isoformat()
    }
    
    EVALUATION_DATABASE.append(record)
    
    return {
        "status": "SAVED",
        "record_id": record["record_id"],
        "saved_at": record["saved_at"]
    }

def notify_stakeholders(use_case_id: str, recommendation: str, submitted_by: str) -> Dict[str, Any]:
    """
    Notify stakeholders of evaluation completion.
    
    Args:
        use_case_id: Use case identifier
        recommendation: Final recommendation
        submitted_by: Original requester
        
    Returns:
        Notification confirmation
    """
    # In production, send actual notifications
    return {
        "notification_id": f"NOTIF-{datetime.now().strftime('%Y%m%d%H%M%S')}",
        "use_case_id": use_case_id,
        "recipients": [submitted_by, "ai_governance_team"],
        "message": f"Evaluation complete: {recommendation.upper()} approach recommended",
        "sent_at": datetime.now().isoformat(),
        "status": "SENT"
    }

def export_report(report: Dict, format: str = "json") -> Dict[str, Any]:
    """
    Export recommendation report.
    
    Args:
        report: Report to export
        format: Export format (json, pdf, etc.)
        
    Returns:
        Export confirmation with location
    """
    # In production, generate actual file
    return {
        "report_id": report.get("report_id"),
        "format": format,
        "location": f"/exports/{report.get('report_id')}.{format}",
        "exported_at": datetime.now().isoformat(),
        "status": "EXPORTED"
    }

print("✓ Action tools defined")
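`export_report` only returns the path where a file would be written. Below is a sketch of the JSON case that actually writes the file. `export_report_json` and the `RPT-DEMO` report ID are hypothetical, and a temporary directory stands in for `/exports`. It returns the same confirmation fields as `export_report`.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Any, Dict


def export_report_json(report: Dict[str, Any], export_dir: Path) -> Dict[str, Any]:
    """Write the report to <export_dir>/<report_id>.json and return a confirmation."""
    export_dir.mkdir(parents=True, exist_ok=True)
    report_id = report.get("report_id") or "untitled"
    path = export_dir / f"{report_id}.json"
    # default=str serializes datetimes and other non-JSON values as strings
    path.write_text(json.dumps(report, indent=2, default=str), encoding="utf-8")
    return {
        "report_id": report_id,
        "format": "json",
        "location": str(path),
        "exported_at": datetime.now().isoformat(),
        "status": "EXPORTED",
    }


with tempfile.TemporaryDirectory() as tmp:
    confirmation = export_report_json(
        {"report_id": "RPT-DEMO", "recommendation": {"approach": "hybrid"}}, Path(tmp)
    )
    # Read the file back to confirm it round-trips
    round_trip = json.loads(Path(confirmation["location"]).read_text(encoding="utf-8"))
    print(confirmation["status"], confirmation["location"])
```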

In [None]:
# Test observation and action tools

print("=== Testing Observation Tools ===")

print("\n1. Search Similar Cases:")
similar = search_similar_cases("Customer Email Generator", "Generate email responses")
print(f"   Found {len(similar['similar_cases'])} similar cases")
for case in similar['similar_cases']:
    print(f"     - {case['title']}: {case['recommendation'].upper()}")

print("\n2. Get Evaluation Criteria:")
criteria = get_evaluation_criteria()
print(f"   Version: {criteria['version']}")
print(f"   Questions in decision tree: {len(criteria['decision_tree'])}")

print("\n3. Get Compliance Requirements:")
compliance = get_compliance_requirements(["GDPR", "SOX"])
for code, details in compliance['details'].items():
    print(f"   {code}: {details['name']} (Risk: {details['risk_level']})")

print("\n=== Testing Action Tools ===")

print("\n1. Save Evaluation to Knowledge Base:")
save_result = save_evaluation_to_kb(
    workflow_results[0]['report'],
    workflow_results[0]['structured_summary']
)
print(f"   Status: {save_result['status']}")
print(f"   Record ID: {save_result['record_id']}")

print("\n2. Notify Stakeholders:")
notification = notify_stakeholders(
    sample_use_cases[0]['use_case_id'],
    workflow_results[0]['report']['recommendation']['approach'],
    sample_use_cases[0]['submitted_by']
)
print(f"   Status: {notification['status']}")
print(f"   Recipients: {notification['recipients']}")

print("\n3. Export Report:")
export = export_report(workflow_results[0]['report'], "json")
print(f"   Status: {export['status']}")
print(f"   Location: {export['location']}")
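`search_similar_cases` returns the whole knowledge base, so every query reports all three historical cases as similar. A lexical ranking is a cheap stand-in until the embedding search mentioned in its comment exists. The sketch below scores each case by Jaccard overlap between the query's words and the case's title plus rationale. `rank_cases_by_overlap` is a hypothetical helper, and the `top_k` cutoff is an arbitrary choice.

```python
import re
from typing import Any, Dict, List, Set


def word_tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric word tokens; punctuation and hyphens split words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_cases_by_overlap(query: str, cases: List[Dict[str, Any]], top_k: int = 2) -> List[Dict[str, Any]]:
    """Rank past cases by Jaccard overlap between the query and title + rationale."""
    query_tokens = word_tokens(query)
    scored = []
    for case in cases:
        case_tokens = word_tokens(f"{case['title']} {case['rationale']}")
        union = query_tokens | case_tokens
        score = len(query_tokens & case_tokens) / len(union) if union else 0.0
        scored.append({**case, "similarity": round(score, 3)})
    # Highest overlap first; ties keep knowledge-base order because sort() is stable
    scored.sort(key=lambda c: c["similarity"], reverse=True)
    return scored[:top_k]


kb = [
    {"case_id": "HIST-001", "title": "Chatbot for FAQ", "recommendation": "genai",
     "rationale": "Required natural language understanding and generation"},
    {"case_id": "HIST-002", "title": "Transaction Fraud Rules", "recommendation": "deterministic",
     "rationale": "Clear threshold-based rules for fraud detection"},
]
ranked = rank_cases_by_overlap("Fraud detection rules for card transactions", kb)
for case in ranked:
    print(case["case_id"], case["similarity"])
```

Word-overlap scoring misses synonyms and inflections. For example, "transactions" does not match "transaction" above, and closing that gap is what embeddings would address.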

## Part 6: Rule-Based Reasoning (Lab 9)

Implement deterministic rule-based validation for evaluation quality and completeness.

In [None]:
# Rule-Based Validator for Evaluation Quality

class RuleBasedEvaluationValidator:
    """Deterministic rule-based validation for use case evaluations."""
    
    def __init__(self):
        self.validation_rules = [
            self.rule_required_use_case_fields,
            self.rule_decision_tree_completeness,
            self.rule_recommendation_consistency,
            self.rule_confidence_threshold,
            self.rule_trade_offs_present,
            self.rule_compliance_addressed
        ]
        
    def validate_all(self, use_case: Dict, structured_summary: Dict, evaluation: Dict, report: Optional[Dict] = None) -> Dict[str, Any]:
        """Run all validation rules."""
        results = []
        
        for rule in self.validation_rules:
            result = rule(use_case, structured_summary, evaluation, report)
            results.append(result)
            
        passed = sum(1 for r in results if r["passed"])
        failed = len(results) - passed
        
        return {
            "validation_summary": {
                "total_rules": len(results),
                "passed": passed,
                "failed": failed,
                "overall_status": "PASS" if failed == 0 else "FAIL"
            },
            "rule_results": results
        }
        
    def rule_required_use_case_fields(self, use_case: Dict, summary: Dict, evaluation: Dict, report: Optional[Dict]) -> Dict:
        """Validate required use case fields are present."""
        required_fields = ["title", "description", "input_data_types", "expected_output"]
        missing = [f for f in required_fields if f not in use_case or not use_case[f]]
        
        return {
            "rule": "required_use_case_fields",
            "passed": len(missing) == 0,
            "message": "All required fields present" if len(missing) == 0 else f"Missing fields: {missing}"
        }
        
    def rule_decision_tree_completeness(self, use_case: Dict, summary: Dict, evaluation: Dict, report: Optional[Dict]) -> Dict:
        """Validate all decision tree questions were answered."""
        criteria = summary.get("criteria_answers", {})
        required_questions = ["q1_deterministic_rules", "q2_natural_language_output", "q3_unstructured_data", "q4_mixed_requirements"]
        answered = [q for q in required_questions if q in criteria and criteria[q] is not None]
        
        return {
            "rule": "decision_tree_completeness",
            "passed": len(answered) == len(required_questions),
            "message": f"Decision tree complete: {len(answered)}/{len(required_questions)} questions answered"
        }
        
    def rule_recommendation_consistency(self, use_case: Dict, summary: Dict, evaluation: Dict, report: Optional[Dict]) -> Dict:
        """Validate recommendation is consistent with criteria answers."""
        criteria = summary.get("criteria_answers", {})
        rec = evaluation.get("recommendation", "")
        
        # Check consistency
        is_consistent = True
        issues = []
        
        if rec == "deterministic":
            if criteria.get("q2_natural_language_output") or criteria.get("q3_unstructured_data"):
                is_consistent = False
                issues.append("Deterministic recommended but NLG or unstructured data detected")
                
        elif rec == "genai":
            if criteria.get("q1_deterministic_rules") and not criteria.get("q2_natural_language_output") and not criteria.get("q3_unstructured_data"):
                is_consistent = False
                issues.append("GenAI recommended but problem appears deterministic")
                
        return {
            "rule": "recommendation_consistency",
            "passed": is_consistent,
            "message": "Recommendation consistent with criteria" if is_consistent else f"Inconsistencies: {issues}"
        }
        
    def rule_confidence_threshold(self, use_case: Dict, summary: Dict, evaluation: Dict, report: Optional[Dict]) -> Dict:
        """Validate confidence meets minimum threshold."""
        confidence = evaluation.get("confidence", 0)
        threshold = 60  # Minimum 60% confidence required
        
        return {
            "rule": "confidence_threshold",
            "passed": confidence >= threshold,
            "message": f"Confidence {confidence}% meets threshold ({threshold}%)" if confidence >= threshold else f"Confidence {confidence}% below threshold ({threshold}%)"
        }
        
    def rule_trade_offs_present(self, use_case: Dict, summary: Dict, evaluation: Dict, report: Optional[Dict]) -> Dict:
        """Validate trade-off analysis is present in report."""
        if not report:
            return {"rule": "trade_offs_present", "passed": True, "message": "Skipped (no report)"}
            
        trade_offs = report.get("trade_off_analysis", {})
        has_trade_offs = (
            "recommended_approach" in trade_offs and
            "pros" in trade_offs.get("recommended_approach", {}) and
            "cons" in trade_offs.get("recommended_approach", {})
        )
        
        return {
            "rule": "trade_offs_present",
            "passed": has_trade_offs,
            "message": "Trade-off analysis complete" if has_trade_offs else "Trade-off analysis missing or incomplete"
        }
        
    def rule_compliance_addressed(self, use_case: Dict, summary: Dict, evaluation: Dict, report: Optional[Dict]) -> Dict:
        """Validate compliance requirements are addressed."""
        compliance_reqs = use_case.get("compliance_requirements", [])
        
        if not compliance_reqs:
            return {"rule": "compliance_addressed", "passed": True, "message": "No compliance requirements specified"}
            
        if not report:
            return {"rule": "compliance_addressed", "passed": True, "message": "Skipped (no report)"}
            
        # Addressed if any warning mentions "compliance" or names one of the required codes
        warnings = report.get("warnings_and_considerations", [])
        has_compliance_warnings = any(
            "compliance" in str(w).lower()
            or any(req.lower() in str(w).lower() for req in compliance_reqs)
            for w in warnings
        )
        
        # compliance_reqs is non-empty here: the empty case already returned above
        return {
            "rule": "compliance_addressed",
            "passed": has_compliance_warnings,
            "message": "Compliance requirements addressed in warnings" if has_compliance_warnings else f"Compliance requirements {compliance_reqs} not addressed in warnings"
        }

print("✓ Rule-based evaluation validator defined")
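`rule_recommendation_consistency` checks only DETERMINISTIC and GENAI verdicts, so a HYBRID verdict always passes it. Below is a hypothetical companion rule that returns the same result shape as the validator's rules. It assumes `q4_mixed_requirements` is the answer that should drive HYBRID, as the DecisionTreeEvaluator prompt suggests.

```python
from typing import Any, Dict, List


def rule_hybrid_consistency(criteria: Dict[str, Any], recommendation: str) -> Dict[str, Any]:
    """Check the verdict against q4_mixed_requirements (hypothetical companion rule)."""
    mixed = bool(criteria.get("q4_mixed_requirements"))
    issues: List[str] = []
    if recommendation == "hybrid" and not mixed:
        issues.append("HYBRID recommended but no mixed requirements detected")
    elif recommendation in ("deterministic", "genai") and mixed:
        issues.append(f"{recommendation.upper()} recommended despite mixed requirements")
    return {
        "rule": "hybrid_consistency",
        "passed": not issues,
        "message": "Verdict consistent with q4 answer" if not issues else f"Inconsistencies: {issues}",
    }


print(rule_hybrid_consistency({"q4_mixed_requirements": True}, "genai"))
```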

In [None]:
# Test rule-based validation

validator = RuleBasedEvaluationValidator()

print("=== Rule-Based Validation Results ===")

for i, result in enumerate(workflow_results):
    print(f"\n--- Use Case: {sample_use_cases[i]['title']} ---")
    
    validation = validator.validate_all(
        sample_use_cases[i],
        result['structured_summary'],
        result['evaluation'],
        result['report']
    )
    
    print(f"Overall Status: {validation['validation_summary']['overall_status']}")
    print(f"Passed: {validation['validation_summary']['passed']}/{validation['validation_summary']['total_rules']}")
    
    for rule_result in validation['rule_results']:
        status_icon = "✓" if rule_result['passed'] else "✗"
        print(f"  {status_icon} {rule_result['rule']}: {rule_result['message']}")

## Part 7: Hierarchical Planning (Lab 8)

Implement hierarchical task decomposition for the evaluation workflow.

In [None]:
# Hierarchical Planner for Evaluation Workflow

class HierarchicalEvaluationPlanner:
    """Hierarchical planner for use case evaluation workflows."""
    
    def __init__(self):
        self.tasks = self._define_tasks()
        self.task_dependencies = self._define_dependencies()
        self.agent_assignments = self._define_agent_assignments()
        
    def _define_tasks(self) -> Dict[str, Dict]:
        """Define hierarchical task structure."""
        return {
            "genai_evaluation": {
                "description": "Complete GenAI use-case evaluation",
                "type": "composite",
                "subtasks": ["data_collection", "decision_analysis", "documentation"]
            },
            "data_collection": {
                "description": "Collect and structure use case information",
                "type": "composite",
                "subtasks": ["extract_use_case_info", "answer_decision_questions", "search_similar_cases"]
            },
            "extract_use_case_info": {
                "description": "Extract basic use case information",
                "type": "primitive",
                "estimated_time_min": 5
            },
            "answer_decision_questions": {
                "description": "Infer answers to decision tree questions",
                "type": "primitive",
                "estimated_time_min": 5
            },
            "search_similar_cases": {
                "description": "Search knowledge base for similar past evaluations",
                "type": "primitive",
                "estimated_time_min": 3
            },
            "decision_analysis": {
                "description": "Apply decision tree and generate verdict",
                "type": "composite",
                "subtasks": ["apply_decision_tree", "calculate_confidence", "generate_rationale"]
            },
            "apply_decision_tree": {
                "description": "Execute decision tree logic",
                "type": "primitive",
                "estimated_time_min": 3
            },
            "calculate_confidence": {
                "description": "Calculate recommendation confidence score",
                "type": "primitive",
                "estimated_time_min": 2
            },
            "generate_rationale": {
                "description": "Generate preliminary rationale for recommendation",
                "type": "primitive",
                "estimated_time_min": 5
            },
            "documentation": {
                "description": "Generate final report and persist to KB",
                "type": "composite",
                "subtasks": ["generate_trade_offs", "generate_warnings", "assemble_report", "save_to_kb"]
            },
            "generate_trade_offs": {
                "description": "Generate trade-off analysis",
                "type": "primitive",
                "estimated_time_min": 8
            },
            "generate_warnings": {
                "description": "Generate compliance warnings and considerations",
                "type": "primitive",
                "estimated_time_min": 5
            },
            "assemble_report": {
                "description": "Assemble final recommendation report",
                "type": "primitive",
                "estimated_time_min": 10
            },
            "save_to_kb": {
                "description": "Save evaluation to knowledge base",
                "type": "primitive",
                "estimated_time_min": 2
            }
        }
        
    def _define_dependencies(self) -> Dict[str, List[str]]:
        """Define task dependencies."""
        return {
            "answer_decision_questions": ["extract_use_case_info"],
            "search_similar_cases": ["extract_use_case_info"],
            "apply_decision_tree": ["answer_decision_questions"],
            "calculate_confidence": ["apply_decision_tree"],
            "generate_rationale": ["apply_decision_tree", "search_similar_cases"],
            "generate_trade_offs": ["generate_rationale"],
            "generate_warnings": ["generate_rationale", "extract_use_case_info"],
            "assemble_report": ["generate_trade_offs", "generate_warnings", "calculate_confidence"],
            "save_to_kb": ["assemble_report"]
        }
        
    def _define_agent_assignments(self) -> Dict[str, str]:
        """Define which agent handles each task."""
        return {
            "extract_use_case_info": "InfoCollectionAgent",
            "answer_decision_questions": "InfoCollectionAgent",
            "search_similar_cases": "InfoCollectionAgent",
            "apply_decision_tree": "EvaluationAgent",
            "calculate_confidence": "EvaluationAgent",
            "generate_rationale": "EvaluationAgent",
            "generate_trade_offs": "ReportGenerationAgent",
            "generate_warnings": "ReportGenerationAgent",
            "assemble_report": "ReportGenerationAgent",
            "save_to_kb": "ReportGenerationAgent"
        }
        
    def create_execution_plan(self, root_task: str = "genai_evaluation") -> Dict[str, Any]:
        """Create hierarchical execution plan."""
        
        def get_all_primitive_tasks(task_name: str) -> List[str]:
            task = self.tasks[task_name]
            if task["type"] == "primitive":
                return [task_name]
            else:
                primitives = []
                for subtask in task.get("subtasks", []):
                    primitives.extend(get_all_primitive_tasks(subtask))
                return primitives
                
        primitive_tasks = get_all_primitive_tasks(root_task)
        ordered_tasks = self._topological_sort(primitive_tasks)
        
        execution_plan = {
            "root_task": root_task,
            "total_tasks": len(ordered_tasks),
            "estimated_total_time_min": sum(
                self.tasks[task].get("estimated_time_min", 0)
                for task in ordered_tasks
            ),
            "execution_sequence": [
                {
                    "order": idx + 1,
                    "task_name": task,
                    "description": self.tasks[task]["description"],
                    "agent": self.agent_assignments.get(task, "Unknown"),
                    "estimated_time_min": self.tasks[task].get("estimated_time_min", 0),
                    "dependencies": self.task_dependencies.get(task, [])
                }
                for idx, task in enumerate(ordered_tasks)
            ]
        }
        
        return execution_plan
        
    def _topological_sort(self, tasks: List[str]) -> List[str]:
        """Topological sort of tasks based on dependencies."""
        in_degree = {task: 0 for task in tasks}
        adj_list = {task: [] for task in tasks}
        
        for task in tasks:
            deps = self.task_dependencies.get(task, [])
            for dep in deps:
                if dep in tasks:
                    adj_list[dep].append(task)
                    in_degree[task] += 1
                    
        queue = [task for task in tasks if in_degree[task] == 0]
        sorted_tasks = []
        
        while queue:
            current = queue.pop(0)
            sorted_tasks.append(current)
            
            for neighbor in adj_list[current]:
                in_degree[neighbor] -= 1
                if in_degree[neighbor] == 0:
                    queue.append(neighbor)
                    
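        if len(sorted_tasks) != len(tasks):
            # Tasks that never reach in-degree 0 are in, or depend on, a dependency cycle
            unresolved = [t for t in tasks if t not in sorted_tasks]
            raise ValueError(f"Circular task dependencies detected among: {unresolved}")
        return sorted_tasks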

print("✓ Hierarchical evaluation planner defined")

In [None]:
# Test hierarchical planning

planner = HierarchicalEvaluationPlanner()
execution_plan = planner.create_execution_plan()

print("=== Hierarchical Execution Plan ===")
print(f"\nRoot Task: {execution_plan['root_task']}")
print(f"Total Tasks: {execution_plan['total_tasks']}")
print(f"Estimated Total Time: {execution_plan['estimated_total_time_min']} minutes")

print("\nExecution Sequence:")
for step in execution_plan['execution_sequence']:
    deps_str = f" (depends on: {', '.join(step['dependencies'])})" if step['dependencies'] else ""
    print(f"  {step['order']}. {step['task_name']} [{step['agent']}] - {step['estimated_time_min']}min{deps_str}")
    print(f"     {step['description']}")
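`estimated_total_time_min` adds up every primitive task, which comes to 48 minutes for this plan. That total assumes the tasks run one after another. The planner does not assume that independent tasks (e.g. `search_similar_cases` and `answer_decision_questions`) can run concurrently. If they could, the critical path through the dependency graph would give the lower bound on elapsed time. Below is a sketch with a hypothetical `critical_path` helper, reusing the planner's durations and dependencies.

```python
from typing import Dict, List, Tuple

# Durations (minutes) and dependencies copied from HierarchicalEvaluationPlanner
TASK_MINUTES: Dict[str, int] = {
    "extract_use_case_info": 5, "answer_decision_questions": 5, "search_similar_cases": 3,
    "apply_decision_tree": 3, "calculate_confidence": 2, "generate_rationale": 5,
    "generate_trade_offs": 8, "generate_warnings": 5, "assemble_report": 10, "save_to_kb": 2,
}
TASK_DEPS: Dict[str, List[str]] = {
    "answer_decision_questions": ["extract_use_case_info"],
    "search_similar_cases": ["extract_use_case_info"],
    "apply_decision_tree": ["answer_decision_questions"],
    "calculate_confidence": ["apply_decision_tree"],
    "generate_rationale": ["apply_decision_tree", "search_similar_cases"],
    "generate_trade_offs": ["generate_rationale"],
    "generate_warnings": ["generate_rationale", "extract_use_case_info"],
    "assemble_report": ["generate_trade_offs", "generate_warnings", "calculate_confidence"],
    "save_to_kb": ["assemble_report"],
}


def critical_path(minutes: Dict[str, int], deps: Dict[str, List[str]]) -> Tuple[int, List[str]]:
    """Return (length in minutes, task chain) of the longest dependency chain."""
    finish: Dict[str, int] = {}

    def earliest_finish(task: str) -> int:
        # A task starts once its slowest dependency finishes (memoized in `finish`)
        if task not in finish:
            start = max((earliest_finish(d) for d in deps.get(task, [])), default=0)
            finish[task] = start + minutes[task]
        return finish[task]

    for task in minutes:
        earliest_finish(task)

    # Walk back from the last task to finish, following the latest-finishing dependency
    task = max(finish, key=finish.get)
    chain = [task]
    while deps.get(task):
        task = max(deps[task], key=finish.get)
        chain.append(task)
    return finish[chain[0]], chain[::-1]


length, chain = critical_path(TASK_MINUTES, TASK_DEPS)
print(f"Serial estimate: {sum(TASK_MINUTES.values())} min, critical path: {length} min")
print(" -> ".join(chain))
```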

## Part 8: Error Recovery (Lab 12)

Implement error handling with retries (exponential backoff), a circuit breaker, and graceful degradation to a fallback recommendation.

In [None]:
# Error Recovery Mechanisms

import time
from functools import wraps

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    """Circuit breaker pattern for fault tolerance."""
    
    def __init__(self, failure_threshold: int = 3, timeout: int = 60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED
        
    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time >= self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker OPEN - service unavailable")
                
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception as e:
            self._on_failure()
            raise e
            
    def _on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED
        
    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

def with_retry(max_attempts: int = 3, backoff_factor: float = 2.0):
    """Retry decorator with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while attempt < max_attempts:
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    attempt += 1
                    if attempt >= max_attempts:
                        raise e
                    wait_time = backoff_factor ** attempt
                    print(f"  [Retry] Attempt {attempt} failed: {str(e)}. Retrying in {wait_time}s...")
                    time.sleep(wait_time)
        return wrapper
    return decorator

class ResilientEvaluationAgent:
    """Evaluation agent with error recovery capabilities."""
    
    def __init__(self, base_agent: Agent):
        self.base_agent = base_agent
        self.circuit_breaker = CircuitBreaker(failure_threshold=3, timeout=60)
        self.error_log = []
        
    def execute_with_recovery(self, method_name: str, *args, **kwargs) -> Dict[str, Any]:
        """Run a base-agent method with retries and a circuit breaker; degrade gracefully on final failure."""
        try:
            method = getattr(self.base_agent, method_name)

            # Retry wraps only the guarded call. Decorating the whole method would never
            # retry, because the except block below catches the exception first.
            @with_retry(max_attempts=3, backoff_factor=2.0)
            def guarded_call():
                return self.circuit_breaker.call(method, *args, **kwargs)

            result = guarded_call()
            return {
                "status": "success",
                "result": result,
                "agent": self.base_agent.name
            }
        except Exception as e:
            self.error_log.append({
                "timestamp": datetime.now().isoformat(),
                "agent": self.base_agent.name,
                "method": method_name,
                "error": str(e)
            })
            return {
                "status": "degraded",
                "result": self._graceful_degradation(method_name),
                "agent": self.base_agent.name,
                "error": str(e)
            }
            
    def _graceful_degradation(self, method_name: str) -> Dict[str, Any]:
        """Fallback payload returned when a method still fails after all retries."""
        return {
            "degraded": True,
            "message": f"Service degraded during '{method_name}' - using default recommendation",
            "fallback_recommendation": "hybrid"  # Default to hybrid as safest option
        }
        
    def get_error_summary(self) -> Dict[str, Any]:
        return {
            "total_errors": len(self.error_log),
            "circuit_state": self.circuit_breaker.state.value,
            "recent_errors": self.error_log[-5:] if self.error_log else []
        }

print("✓ Error recovery mechanisms defined")

In [None]:
# Test error recovery

info_agent = InfoCollectionAgent()
resilient_agent = ResilientEvaluationAgent(info_agent)

print("=== Testing Error Recovery ===")

print("\n1. Successful execution:")
result = resilient_agent.execute_with_recovery("collect_and_structure", sample_use_cases[0])
print(f"   Status: {result['status']}")
print(f"   Agent: {result['agent']}")

print("\n2. Error summary:")
error_summary = resilient_agent.get_error_summary()
print(f"   Total Errors: {error_summary['total_errors']}")
print(f"   Circuit State: {error_summary['circuit_state']}")
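
The cell above exercises only the success path. The failure path can be checked without a live agent; below is a minimal, self-contained sketch of the same retry-then-degrade idea using a hypothetical `FlakyAgent` stub and hypothetical helpers `call_with_retry` and `execute_with_degradation` (none of these exist elsewhere in the notebook). The `sleep` parameter is injected so the backoff waits are recorded instead of actually slept.

```python
import time
from typing import Any, Callable, Dict, List


def call_with_retry(func: Callable[[], Any], max_attempts: int = 3,
                    backoff_factor: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Retry func with exponential backoff; re-raise after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(backoff_factor ** attempt)  # same schedule as with_retry: 2s, 4s, ...


class FlakyAgent:
    """Hypothetical stub that fails a fixed number of times, then succeeds."""
    name = "FlakyAgent"

    def __init__(self, failures_before_success: int):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def collect_and_structure(self, use_case: Dict[str, Any]) -> Dict[str, Any]:
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise RuntimeError("simulated transient failure")
        return {"structured_summary": {"title": use_case["title"]}}


def execute_with_degradation(agent: FlakyAgent, use_case: Dict[str, Any]) -> Dict[str, Any]:
    waits: List[float] = []
    try:
        result = call_with_retry(lambda: agent.collect_and_structure(use_case),
                                 sleep=waits.append)  # record waits instead of sleeping
        return {"status": "success", "result": result, "waits": waits}
    except Exception as e:
        # All attempts failed: fall back to the hybrid default
        return {"status": "degraded", "error": str(e), "waits": waits,
                "result": {"fallback_recommendation": "hybrid"}}


transient = FlakyAgent(failures_before_success=2)
transient_result = execute_with_degradation(transient, {"title": "Demo"})
persistent = FlakyAgent(failures_before_success=10)
persistent_result = execute_with_degradation(persistent, {"title": "Demo"})
print(transient_result["status"], transient.calls, transient_result["waits"])
print(persistent_result["status"], persistent.calls, persistent_result["result"])
```

With the defaults, a stub that fails twice succeeds on its third call after recorded waits of 2.0 and 4.0 seconds, while one that keeps failing returns a `degraded` result carrying the `hybrid` fallback.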

## Part 9: Complete End-to-End System (Lab 11)

This part integrates the agents, planner, validator, and observation/action tools from the previous parts into a single end-to-end evaluation system.

In [None]:
# Complete GenAI Evaluation System

class GenAIEvaluationSystem:
    """Production-ready GenAI use-case evaluation system."""
    
    def __init__(self):
        # Initialize agents with resilience
        self.info_agent = ResilientEvaluationAgent(InfoCollectionAgent())
        self.eval_agent = ResilientEvaluationAgent(EvaluationAgent())
        self.report_agent = ResilientEvaluationAgent(ReportGenerationAgent())
        
        # Initialize supporting components
        self.validator = RuleBasedEvaluationValidator()
        self.planner = HierarchicalEvaluationPlanner()
        
        # System state
        self.processing_log = []
        
    def evaluate_use_case(self, use_case: Dict) -> Dict[str, Any]:
        """Evaluate a use case end-to-end."""
        start_time = time.time()
        self.processing_log = []
        
        try:
            # Step 0: Create execution plan
            self._log(f"Creating execution plan for use case: {use_case.get('title', 'unknown')}")
            execution_plan = self.planner.create_execution_plan()
            
            # Step 1: Collect and structure information
            self._log("Collecting and structuring use case information")
            info_result = self.info_agent.execute_with_recovery("collect_and_structure", use_case)
            if info_result['status'] != "success":
                return self._create_error_response(use_case, "Info collection failed", info_result)
            structured_summary = info_result['result']['structured_summary']
            
            # Step 1b: Search similar cases
            self._log("Searching for similar past evaluations")
            similar_cases = search_similar_cases(use_case.get('title', ''), use_case.get('description', ''))
            
            # Step 2: Apply decision tree
            self._log("Applying decision tree and generating verdict")
            eval_result = self.eval_agent.execute_with_recovery("evaluate_and_recommend", structured_summary)
            if eval_result['status'] != "success":
                return self._create_error_response(use_case, "Evaluation failed", eval_result)
            evaluation = eval_result['result']['evaluation']
            
            # Step 3: Generate report
            self._log("Generating recommendation report")
            report_result = self.report_agent.execute_with_recovery("generate_report", structured_summary, evaluation)
            if report_result['status'] != "success":
                return self._create_error_response(use_case, "Report generation failed", report_result)
            report = report_result['result']['report']
            
            # Step 4: Validate evaluation
            self._log("Running validation checks")
            validation = self.validator.validate_all(use_case, structured_summary, evaluation, report)
            
            # Step 5: Save to knowledge base
            self._log("Saving evaluation to knowledge base")
            save_result = save_evaluation_to_kb(report, structured_summary)
            
            # Step 6: Notify stakeholders
            self._log("Notifying stakeholders")
            notification = notify_stakeholders(
                use_case.get('use_case_id', 'unknown'),
                evaluation['recommendation'],
                use_case.get('submitted_by', 'unknown')
            )
            
            processing_time = time.time() - start_time
            
            return {
                "status": "SUCCESS",
                "use_case_id": use_case.get('use_case_id'),
                "processing_time_seconds": round(processing_time, 2),
                "execution_plan": execution_plan,
                "structured_summary": structured_summary,
                "similar_cases": similar_cases,
                "evaluation": evaluation,
                "report": report,
                "validation": validation,
                "actions_taken": {
                    "saved_to_kb": save_result,
                    "notification": notification
                },
                "processing_log": self.processing_log
            }
            
        except Exception as e:
            return self._create_error_response(use_case, f"System error: {str(e)}", {"exception": str(e)})
            
    def _log(self, message: str):
        # Take one timestamp so the stored log and the printed line agree
        now = datetime.now()
        self.processing_log.append({
            "timestamp": now.isoformat(),
            "message": message
        })
        print(f"[{now.strftime('%H:%M:%S')}] {message}")
        
    def _create_error_response(self, use_case: Dict, error_message: str, details: Dict) -> Dict:
        return {
            "status": "ERROR",
            "use_case_id": use_case.get('use_case_id', 'unknown'),
            "error_message": error_message,
            "error_details": details,
            "processing_log": self.processing_log
        }
        
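    def get_system_health(self) -> Dict[str, Any]:
        agents = {
            "info_agent": self.info_agent.get_error_summary(),
            "eval_agent": self.eval_agent.get_error_summary(),
            "report_agent": self.report_agent.get_error_summary()
        }
        # Simple heuristic: report DEGRADED once any agent has logged an error
        has_errors = any(summary["total_errors"] > 0 for summary in agents.values())
        return {
            "agents": agents,
            "knowledge_base_records": len(EVALUATION_DATABASE),
            "system_status": "DEGRADED" if has_errors else "OPERATIONAL"
        }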

print("✓ Complete GenAI Evaluation System defined")
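
`evaluate_use_case` is a fail-fast pipeline: the agent steps run in order, and the first result whose `status` is not `"success"` returns an error response instead of continuing. A minimal sketch of just that control flow, with a hypothetical `run_fail_fast` helper and hypothetical step functions (`collect`, `evaluate`, `degraded_report`) standing in for the real agents:

```python
from typing import Any, Callable, Dict, List, Tuple

# A step takes the shared context and returns {"status": ..., "result": {...}},
# the same shape that execute_with_recovery returns.
Step = Tuple[str, Callable[[Dict[str, Any]], Dict[str, Any]]]


def run_fail_fast(steps: List[Step], context: Dict[str, Any]) -> Dict[str, Any]:
    """Run steps in order; stop at the first result whose status is not "success"."""
    completed: List[str] = []
    for name, step in steps:
        outcome = step(context)
        if outcome.get("status") != "success":
            return {"status": "ERROR", "failed_step": name, "completed": completed}
        context.update(outcome.get("result", {}))  # later steps see earlier results
        completed.append(name)
    return {"status": "SUCCESS", "context": context, "completed": completed}


# Hypothetical stand-ins for the info, evaluation, and report agents
def collect(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"status": "success", "result": {"summary": ctx["title"].lower()}}


def evaluate(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"status": "success", "result": {"recommendation": "hybrid"}}


def degraded_report(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"status": "degraded", "result": {}}


ok = run_fail_fast([("collect", collect), ("evaluate", evaluate)], {"title": "Demo"})
failed = run_fail_fast(
    [("collect", collect), ("evaluate", evaluate), ("report", degraded_report)],
    {"title": "Demo"},
)
print(ok["status"], ok["completed"])
print(failed["status"], failed["failed_step"])
```

Because the check is `!= "success"`, a `degraded` result (such as the `hybrid` fallback from graceful degradation) also stops the pipeline; if degraded results should flow through instead, that comparison is the place to change it.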

In [None]:
# Execute end-to-end evaluation for all sample use cases

print("=" * 80)
print("COMPLETE END-TO-END GENAI USE-CASE EVALUATION")
print("=" * 80)

# Initialize system
evaluation_system = GenAIEvaluationSystem()

# Process all sample use cases
final_results = []

for use_case in sample_use_cases:
    print(f"\n{'='*60}")
    print(f"EVALUATING: {use_case['title']}")
    print(f"{'='*60}")
    
    result = evaluation_system.evaluate_use_case(use_case)
    final_results.append(result)
    
    print("\n--- RESULT ---")
    print(f"Status: {result['status']}")
    
    if result['status'] == "SUCCESS":
        print(f"Processing Time: {result['processing_time_seconds']}s")
        print(f"Recommendation: {result['report']['recommendation']['approach'].upper()}")
        print(f"Confidence: {result['report']['recommendation']['confidence']}%")
        print(f"KB Record: {result['actions_taken']['saved_to_kb']['record_id']}")
        
        print(f"\nValidation: {result['validation']['validation_summary']['overall_status']}")
        print(f"  Rules Passed: {result['validation']['validation_summary']['passed']}/{result['validation']['validation_summary']['total_rules']}")
    else:
        # Error responses from _create_error_response carry no timing or report fields
        print(f"Error: {result['error_message']}")

print("\n" + "=" * 80)
print("EVALUATION SUMMARY")
print("=" * 80)

print("\nRecommendations:")
for use_case, result in zip(sample_use_cases, final_results):
    if result['status'] == "SUCCESS":
        rec = result['report']['recommendation']['approach'].upper()
        conf = result['report']['recommendation']['confidence']
        print(f"  {use_case['title']}: {rec} ({conf}% confidence)")
    else:
        print(f"  {use_case['title']}: ERROR ({result['error_message']})")

print("\nSystem Health:")
health = evaluation_system.get_system_health()
print(f"  Status: {health['system_status']}")
print(f"  Knowledge Base Records: {health['knowledge_base_records']}")
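
The summary above prints one line per use case. To see the overall mix of recommendations, here is a minimal sketch that tallies statuses, approaches, and mean confidence. It assumes the result-dict keys used in the loop above (`status` and `report["recommendation"]["approach"]` / `["confidence"]`); the `demo_results` entries are hypothetical placeholders, not notebook output. In the notebook it would be called as `summarize_results(final_results)`.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def summarize_results(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Tally statuses, recommended approaches, and mean confidence."""
    successes = [r for r in results if r["status"] == "SUCCESS"]
    statuses = Counter(r["status"] for r in results)
    approaches = Counter(r["report"]["recommendation"]["approach"] for r in successes)
    confidences = [r["report"]["recommendation"]["confidence"] for r in successes]
    mean_confidence: Optional[float] = (
        round(sum(confidences) / len(confidences), 1) if confidences else None
    )
    return {
        "statuses": dict(statuses),
        "approaches": dict(approaches),
        "mean_confidence": mean_confidence,
    }


# Hypothetical placeholder results shaped like evaluate_use_case() output
demo_results = [
    {"status": "SUCCESS", "report": {"recommendation": {"approach": "genai", "confidence": 85}}},
    {"status": "SUCCESS", "report": {"recommendation": {"approach": "hybrid", "confidence": 70}}},
    {"status": "ERROR", "error_message": "Evaluation failed"},
]
print(summarize_results(demo_results))
```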

## Summary & Key Takeaways

This notebook demonstrated a comprehensive agentic AI implementation for GenAI use-case evaluation, applying nine lab techniques from the GAI-3101 course (Labs 1, 2, 4, 6, 7, 8, 9, 11, and 12):

### Implementation Highlights

1. **Simple Python Agents (Lab 1)**: Created 3 specialized agents for the evaluation workflow
2. **Multi-Agent Communication (Lab 2)**: Orchestrated agents using AutoGen round-robin chat
3. **Deliberative Agents (Lab 4)**: Implemented LangGraph StateGraph for workflow orchestration
4. **Observation Tools (Lab 6)**: Built tools for knowledge base search, criteria retrieval, compliance lookup
5. **Action Tools (Lab 7)**: Implemented KB persistence, notifications, report export
6. **Hierarchical Planning (Lab 8)**: Decomposed evaluation into 10 primitive tasks
7. **Rule-Based Reasoning (Lab 9)**: Created 6 deterministic validation rules
8. **Error Recovery (Lab 12)**: Added retry logic, circuit breakers, and graceful degradation
9. **End-to-End System (Lab 11)**: Integrated all components into `GenAIEvaluationSystem`

### Business Impact

**Per Evaluation:**
- Lead time: 13.75 hours → 4 hours (71% reduction)
- Manual effort: 2.75 hours → 1 hour (64% reduction)
- Expert time saved: 1.75 hours per use case

**Annual (40 evaluations/month, i.e. 480/year):**
- Hours saved: 840 hours/year
- Cost savings: $67,200/year (at the $80/hour expert rate)
- Year 1 net benefit: $7,200
- Year 2+ net benefit: $47,200/year
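
The figures above can be reproduced from the stated inputs. The sketch below recomputes them; the two cost lines are derived only by subtracting the stated net benefits from the annual savings, so they are implied figures rather than costs stated in this section.

```python
# Inputs as stated in the notebook
lead_time_before, lead_time_after = 13.75, 4.0  # hours per evaluation
effort_before, effort_after = 2.75, 1.0         # expert hours per evaluation
evaluations_per_month = 40
hourly_rate = 80                                # USD per expert hour
year1_net, year2_net = 7_200, 47_200            # USD, as stated

lead_time_reduction = (lead_time_before - lead_time_after) / lead_time_before
effort_saved = effort_before - effort_after
effort_reduction = effort_saved / effort_before
hours_saved_per_year = effort_saved * evaluations_per_month * 12
annual_savings = hours_saved_per_year * hourly_rate

# Implied costs: savings minus the stated net benefit (not stated directly)
implied_year1_cost = annual_savings - year1_net
implied_recurring_cost = annual_savings - year2_net

print(f"Lead-time reduction:    {lead_time_reduction:.0%}")
print(f"Manual-effort reduction: {effort_reduction:.0%}")
print(f"Hours saved per year:   {hours_saved_per_year:,.0f}")
print(f"Annual cost savings:    ${annual_savings:,.0f}")
print(f"Implied Year 1 cost:    ${implied_year1_cost:,.0f}")
print(f"Implied recurring cost: ${implied_recurring_cost:,.0f}")
```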

### Decision Tree Summary

The evaluation follows this decision tree:
1. **Clear deterministic rules?** → DETERMINISTIC
2. **Natural language output?** → GENAI
3. **Unstructured data analysis?** → GENAI
4. **Mixed requirements?** → HYBRID
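
A minimal sketch of the decision tree as a function. It assumes the structured summary has already been reduced to three hypothetical boolean signals (`UseCaseSignals`), which is not necessarily how `EvaluationAgent` represents them. Read strictly top to bottom, a use case with both clear rules and natural-language output would stop at DETERMINISTIC, so this sketch checks the mixed case first and falls back to HYBRID when no signal applies; both choices are assumptions.

```python
from dataclasses import dataclass


@dataclass
class UseCaseSignals:
    """Hypothetical boolean signals distilled from a structured summary."""
    has_clear_rules: bool
    needs_natural_language_output: bool
    analyzes_unstructured_data: bool


def apply_decision_tree(signals: UseCaseSignals) -> str:
    """Return "deterministic", "genai", or "hybrid" for a set of signals."""
    genai_signal = (signals.needs_natural_language_output
                    or signals.analyzes_unstructured_data)
    # Assumption: "mixed requirements" means clear rules AND a GenAI signal,
    # checked first so it is not shadowed by the deterministic branch.
    if signals.has_clear_rules and genai_signal:
        return "hybrid"
    if signals.has_clear_rules:
        return "deterministic"
    if signals.needs_natural_language_output:
        return "genai"
    if signals.analyzes_unstructured_data:
        return "genai"
    # Assumption: no clear signal falls back to hybrid, the "safest option"
    # default also used by _graceful_degradation.
    return "hybrid"


print(apply_decision_tree(UseCaseSignals(True, False, False)))  # deterministic
print(apply_decision_tree(UseCaseSignals(True, True, False)))   # hybrid
print(apply_decision_tree(UseCaseSignals(False, False, True)))  # genai
```

The lowercase labels match the `"hybrid"` fallback returned by `_graceful_degradation` and the `.upper()` formatting used when results are printed.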

### Production Considerations

1. Replace simulated KB with actual vector database (Pinecone, Weaviate)
2. Integrate with ticketing/governance systems
3. Add user interface for use case submission
4. Implement feedback loop for recommendation accuracy tracking
5. Set up monitoring and alerting
6. Train teams on self-service evaluation
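
For item 4, one minimal way to track recommendation accuracy is to record the recommended approach next to the approach that was eventually adopted and report the agreement rate. `FeedbackTracker` and the use-case IDs below are hypothetical; this is a sketch, not a prescribed design.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FeedbackTracker:
    """Hypothetical log of recommended vs. eventually adopted approaches."""
    records: List[Tuple[str, str, str]] = field(default_factory=list)

    def record(self, use_case_id: str, recommended: str, adopted: str) -> None:
        # Normalize case so "HYBRID" and "hybrid" compare equal
        self.records.append((use_case_id, recommended.lower(), adopted.lower()))

    def agreement_rate(self) -> Optional[float]:
        """Fraction of use cases where the adopted approach matched the recommendation."""
        if not self.records:
            return None
        matches = sum(1 for _, rec, adopted in self.records if rec == adopted)
        return matches / len(self.records)

    def agreement_by_approach(self) -> Dict[str, float]:
        """Agreement rate grouped by the recommended approach."""
        totals: Dict[str, int] = defaultdict(int)
        hits: Dict[str, int] = defaultdict(int)
        for _, rec, adopted in self.records:
            totals[rec] += 1
            hits[rec] += int(rec == adopted)
        return {rec: hits[rec] / totals[rec] for rec in totals}


tracker = FeedbackTracker()
tracker.record("UC-A", "genai", "genai")           # hypothetical IDs and outcomes
tracker.record("UC-B", "hybrid", "deterministic")
tracker.record("UC-C", "HYBRID", "hybrid")
print(tracker.agreement_rate())
print(tracker.agreement_by_approach())
```

Grouping by recommended approach shows which branch of the decision tree is most often overridden, which is where rule tuning would likely pay off first.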

---

**Notebook Version:** 1.0  
**Last Updated:** November 2025  
**Course:** GAI-3101 Custom Agentic AI Solutions