# Advanced Prompt Engineering Techniques

**Interactive Notebook** - Section 13: Prompt Engineering and Advanced Techniques

This notebook provides hands-on practice with advanced prompt engineering techniques: chain-of-thought reasoning, tree-of-thoughts exploration, ReAct patterns, and systematic prompt evaluation and optimization.

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:
- Implement chain-of-thought reasoning for complex problems
- Use tree-of-thoughts for multi-path exploration
- Build ReAct agents that can use external tools
- Optimize prompts using systematic evaluation
- Deploy enterprise-grade prompt engineering systems

## 📋 Prerequisites

- Python 3.10+
- Basic understanding of LLM APIs
- Familiarity with prompt engineering concepts
- Access to OpenAI or Anthropic APIs (or local models)

**Estimated Time**: 2-3 hours

## 🔧 Setup and Installation

Let's start by installing the required dependencies and setting up our environment.

In [None]:
# Install required packages
!pip install -q openai anthropic pandas numpy matplotlib seaborn plotly ipywidgets tqdm

# Import necessary libraries
import os
import json
import time
import asyncio
from typing import Dict, List, Any, Optional
from dataclasses import dataclass
from datetime import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
from IPython.display import display, HTML, Markdown

# Set up plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Configure pandas display
pd.set_option('display.max_colwidth', None)
pd.set_option('display.expand_frame_repr', False)

## ⚙️ Configuration

Set up your API keys and configuration parameters. You can either set environment variables or enter them directly in the cells below.

In [None]:
# API Configuration - keys are read from environment variables when set;
# otherwise replace the placeholder strings below with your actual API keys
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "sk-your-openai-key-here")
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", "sk-ant-your-anthropic-key-here")

# Model Configuration
DEFAULT_MODEL = "gpt-3.5-turbo"  # Fallback to 3.5-turbo for cost efficiency
MAX_TOKENS = 2000
TEMPERATURE = 0.7

# Display configuration status
api_status = {
    "OpenAI": "✅ Configured" if OPENAI_API_KEY != "sk-your-openai-key-here" else "❌ Not Configured",
    "Anthropic": "✅ Configured" if ANTHROPIC_API_KEY != "sk-ant-your-anthropic-key-here" else "❌ Not Configured"
}

display(Markdown("### API Configuration Status"))
for api, status in api_status.items():
    display(Markdown(f"- **{api}**: {status}"))

if not any("✅" in status for status in api_status.values()):
    display(Markdown("⚠️ **Warning**: No API keys configured. Some examples will use mock responses."))

## 🏗️ Core Classes and Utilities

Let's define the core classes for our advanced prompt engineering system.

In [None]:
@dataclass
class PromptConfig:
    """Configuration for prompt generation"""
    model: str = DEFAULT_MODEL
    max_tokens: int = MAX_TOKENS
    temperature: float = TEMPERATURE
    technique: str = "standard"
    show_reasoning: bool = True

@dataclass
class TaskDescription:
    """Description of the task to be performed"""
    description: str
    domain: str = "general"
    complexity: str = "medium"
    output_format: str = "text"

@dataclass
class PromptResult:
    """Result of prompt generation and execution"""
    prompt: str
    response: str
    technique: str
    execution_time: float
    token_count: int
    cost: float
    quality_score: float = 0.0

@dataclass
class PerformanceMetrics:
    """Performance metrics for prompt evaluation"""
    accuracy: float
    completeness: float
    coherence: float
    relevance: float
    efficiency: float
    overall_score: float

In [None]:
class MockLLMClient:
    """Mock LLM client for demonstration purposes"""
    
    def __init__(self):
        self.responses = {
            "chain_of_thought": self._generate_cot_response,
            "tree_of_thoughts": self._generate_tot_response,
            "react": self._generate_react_response,
            "standard": self._generate_standard_response
        }
    
    async def generate_response(self, prompt: str, technique: str = "standard") -> str:
        """Generate a mock response based on the technique"""
        await asyncio.sleep(0.5)  # Simulate API latency
        
        if technique in self.responses:
            return self.responses[technique](prompt)
        else:
            return self._generate_standard_response(prompt)
    
    def _generate_cot_response(self, prompt: str) -> str:
        """Generate chain-of-thought response"""
        return f"""
Let me think through this step by step:

Step 1: First, I need to understand what the prompt is asking for.
The prompt appears to be requesting analysis of: "{prompt[:100]}..."

Step 2: Breaking down the key components:
- Main task: Analysis and reasoning
- Domain: General problem-solving
- Expected output: Structured response

Step 3: Applying logical reasoning:
Based on the input, I can identify several key insights:
1. The problem requires systematic analysis
2. Multiple perspectives should be considered
3. A structured approach will yield the best results

Step 4: Formulating the final answer:
After careful consideration, here's my comprehensive analysis...
"""
    
    def _generate_tot_response(self, prompt: str) -> str:
        """Generate tree-of-thoughts response"""
        return f"""
Exploring multiple reasoning paths for: "{prompt[:100]}..."

Path 1: Analytical Approach
- Break down the problem systematically
- Use logical deduction
- Consider all variables
Evaluation: High accuracy, moderate speed

Path 2: Creative Approach
- Think outside conventional boundaries
- Consider innovative solutions
- Explore unconventional angles
Evaluation: High creativity, variable reliability

Path 3: Practical Approach
- Focus on actionable steps
- Consider real-world constraints
- Prioritize implementable solutions
Evaluation: High practicality, immediate applicability

Selected Path: Analytical Approach (best balance of accuracy and completeness)

Final analysis based on selected path...
"""
    
    def _generate_react_response(self, prompt: str) -> str:
        """Generate ReAct (Reasoning + Acting) response"""
        return f"""
Thought: I need to analyze this prompt and determine what external tools or information might be helpful.

The prompt: "{prompt[:100]}..." requires analysis and reasoning.

Action: I should search for relevant information and tools that can help with this analysis.

Observation: I have access to analysis tools and can reason through the problem systematically.

Thought: Based on the available tools and information, I can approach this by:
1. Breaking down the problem into manageable components
2. Applying analytical reasoning
3. Considering multiple perspectives
4. Synthesizing a comprehensive answer

Action: Proceeding with systematic analysis using available reasoning capabilities.

Observation: The analysis is proceeding well and I'm generating structured insights.

Thought: I have sufficient information to provide a comprehensive response.

Final Answer: Here's my analysis based on the reasoning process...
"""
    
    def _generate_standard_response(self, prompt: str) -> str:
        """Generate standard response"""
        return f"""
Based on the input "{prompt[:100]}...", I can provide the following analysis:

This appears to be a request for analysis and reasoning. The key elements I can identify are:

1. The prompt requires thoughtful consideration
2. Multiple aspects need to be addressed
3. A structured response would be most helpful

After considering the request, here's my response:

[Comprehensive analysis would go here in a real implementation]
"""

# Initialize mock client for demonstration
mock_client = MockLLMClient()

## 🧠 Technique 1: Chain-of-Thought (CoT) Reasoning

Chain-of-Thought prompting encourages models to show their reasoning process step-by-step, leading to improved performance on complex reasoning tasks.
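
As a minimal sketch of the idea, separate from the engine defined in the next cell, the difference between a standard prompt and a CoT prompt can be as small as one appended instruction plus a convention for locating the final answer. The helper names `make_standard_prompt`, `make_cot_prompt`, and `extract_answer`, and the exact instruction wording, are illustrative assumptions rather than part of any library.

```python
from typing import Optional


def make_standard_prompt(question: str) -> str:
    """Ask for the answer directly, with no reasoning requested."""
    return f"Question: {question}\nAnswer:"


def make_cot_prompt(question: str) -> str:
    """Ask for numbered reasoning steps before the answer (illustrative wording)."""
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, numbering each step, "
        "then give the result on a final line starting with 'Answer:'.\n"
        "Step 1:"
    )


def extract_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if there is none."""
    for line in reversed(response.splitlines()):
        if line.strip().lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return None


question = "A company has 150 employees. 20% are in sales. How many is that?"
print(make_standard_prompt(question))
print(make_cot_prompt(question))
print(extract_answer("Step 1: 20% of 150 is 0.2 * 150.\nAnswer: 30"))
```

Ending the prompt with `Step 1:` nudges the model to begin reasoning immediately, and the fixed `Answer:` prefix makes the result easy to parse out of a longer response.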

In [None]:
class ChainOfThoughtEngine:
    """Advanced Chain-of-Thought reasoning engine"""
    
    def __init__(self, llm_client):
        self.llm_client = llm_client
        self.templates = {
            'math': self._math_template,
            'logic': self._logic_template,
            'analysis': self._analysis_template,
            'creative': self._creative_template
        }
    
    def generate_cot_prompt(self, task: TaskDescription, context: Dict = None) -> str:
        """Generate a Chain-of-Thought enhanced prompt"""
        
        base_template = self.templates.get(task.domain, self.templates['analysis'])
        
        cot_enhancement = """

IMPORTANT: Please solve this step-by-step, showing your reasoning at each stage:

1. First, identify and analyze the key components of the problem
2. Break down complex elements into manageable parts
3. Solve each part systematically
4. Combine the results to reach the final answer
5. Verify your solution by checking it against the original problem

Show all intermediate steps and calculations. Your reasoning should be clear and logical.
"""
        
        # Templates are bound methods, so call them rather than using str.format
        return base_template(task, **(context or {})) + cot_enhancement
    
    async def execute_cot_reasoning(self, task: TaskDescription, context: Dict = None) -> PromptResult:
        """Execute Chain-of-Thought reasoning"""
        
        start_time = time.time()
        
        # Generate CoT-enhanced prompt
        cot_prompt = self.generate_cot_prompt(task, context)
        
        # Get response from LLM
        response = await self.llm_client.generate_response(cot_prompt, "chain_of_thought")
        
        execution_time = time.time() - start_time
        
        return PromptResult(
            prompt=cot_prompt,
            response=response,
            technique="chain_of_thought",
            execution_time=execution_time,
            token_count=len(cot_prompt.split()) + len(response.split()),
            cost=self._calculate_cost(cot_prompt, response),
            quality_score=self._estimate_quality(response)
        )
    
    def _math_template(self, task, **kwargs) -> str:
        return f"""
Mathematical Problem: {task.description}

Please solve this mathematical problem step-by-step:
"""
    
    def _logic_template(self, task, **kwargs) -> str:
        return f"""
Logical Reasoning Problem: {task.description}

Please analyze this logical reasoning problem step-by-step:
"""
    
    def _analysis_template(self, task, **kwargs) -> str:
        return f"""
Analysis Task: {task.description}

Domain: {task.domain}
Complexity: {task.complexity}

Please provide a step-by-step analysis:
"""
    
    def _creative_template(self, task, **kwargs) -> str:
        return f"""
Creative Task: {task.description}

Please approach this creative task with step-by-step reasoning:
"""
    
    def _calculate_cost(self, prompt: str, response: str) -> float:
        """Calculate approximate cost"""
        # Simplified cost calculation
        total_tokens = len(prompt.split()) + len(response.split())
        return total_tokens * 0.00002  # $0.02 per 1K tokens
    
    def _estimate_quality(self, response: str) -> float:
        """Estimate response quality based on characteristics"""
        # Lowercase once so matching is case-insensitive
        text = response.lower()

        # Look for step indicators
        step_indicators = ['step', 'first', 'second', 'then', 'finally']
        step_score = sum(1 for indicator in step_indicators if indicator in text)

        # Look for logical connectors ('therefore' is counted here only, not twice)
        logical_connectors = ['because', 'since', 'therefore', 'however', 'moreover']
        logic_score = sum(1 for connector in logical_connectors if connector in text)

        # Normalize to 0-1 scale (10 indicators in total)
        return min(1.0, (step_score + logic_score) / 10.0)

# Initialize CoT engine
cot_engine = ChainOfThoughtEngine(mock_client)

### 🎮 Interactive CoT Demonstration

Let's test Chain-of-Thought reasoning with different types of problems. Use the widgets below to experiment:

In [None]:
# Create interactive widgets for CoT demonstration
task_dropdown = widgets.Dropdown(
    options=[
        ('Mathematical Problem', 'math'),
        ('Logical Reasoning', 'logic'), 
        ('Data Analysis', 'analysis'),
        ('Creative Problem', 'creative')
    ],
    value='analysis',
    description='Task Type:'
)

complexity_slider = widgets.IntSlider(
    value=3,
    min=1,
    max=5,
    step=1,
    description='Complexity:',
    style={'description_width': 'initial'}
)

task_input = widgets.Textarea(
    value='A company has 150 employees. If 20% are in sales, 30% in engineering, and the rest in other departments, how many employees are in other departments?',
    placeholder='Enter your task or problem...',
    layout=widgets.Layout(width='100%', height='100px')
)

execute_button = widgets.Button(
    description='Execute CoT Reasoning',
    button_style='success',
    layout=widgets.Layout(width='200px')
)

output_area = widgets.Output()

async def execute_cot_demo(b):
    with output_area:
        output_area.clear_output()
        
        display(Markdown("### 🔄 Executing Chain-of-Thought Reasoning..."))
        
        # Create task description
        task = TaskDescription(
            description=task_input.value,
            domain=task_dropdown.value,
            complexity=['low', 'medium', 'high', 'very high', 'expert'][complexity_slider.value - 1]
        )
        
        # Execute CoT reasoning
        result = await cot_engine.execute_cot_reasoning(task)
        
        # Display results
        display(Markdown(f"""### 📊 Results

**Technique**: Chain-of-Thought Reasoning  
**Domain**: {task.domain.title()}  
**Complexity**: {task.complexity.title()}  
**Execution Time**: {result.execution_time:.2f}s  
**Estimated Cost**: ${result.cost:.4f}  
**Quality Score**: {result.quality_score:.2f}/1.0

### 🤖 Generated Response
```markdown
{result.response}
```
"""))

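# on_click invokes its callback synchronously, so schedule the coroutine
# on the notebook's running event loop instead of passing it directly
execute_button.on_click(lambda b: asyncio.ensure_future(execute_cot_demo(b)))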

# Display widgets
display(Markdown("## 🎛️ Chain-of-Thought Configuration"))
display(widgets.VBox([task_dropdown, complexity_slider, task_input, execute_button]))
display(output_area)

## 🌳 Technique 2: Tree-of-Thoughts (ToT) Exploration

Tree-of-Thoughts extends CoT by exploring multiple reasoning paths simultaneously and selecting the most promising one.
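
The engine in the next cell asks the model to explore several paths inside a single prompt. The search form of the technique can be sketched as a breadth-first search that expands each partial solution, scores the candidates, and keeps only the best few per level. In the sketch below, `propose` and `score` are hypothetical stand-ins for LLM calls, and the toy implementations exist only to make the example runnable.

```python
from typing import Callable, List, Tuple


def tree_of_thoughts_search(
    root: str,
    propose: Callable[[str], List[str]],
    score: Callable[[str], float],
    beam_width: int = 2,
    max_depth: int = 2,
) -> Tuple[str, float]:
    """Breadth-first search over partial solutions, keeping the top
    `beam_width` candidates at each level."""
    frontier = [root]
    for _ in range(max_depth):
        # Expand every surviving state into its candidate next thoughts
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # Keep only the most promising candidates for the next level
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    best = max(frontier, key=score)
    return best, score(best)


# Toy stand-ins (assumptions): a real system would call an LLM in both functions
def toy_propose(state: str) -> List[str]:
    """Extend the state with each of three possible next 'thoughts'."""
    return [state + ch for ch in "abc"]


def toy_score(state: str) -> float:
    """Pretend that thoughts containing more 'a' characters are better."""
    return float(state.count("a"))


best, best_score = tree_of_thoughts_search("", toy_propose, toy_score)
print(best, best_score)
```

The `beam_width` and `max_depth` parameters play the same role as `max_branches` and `max_depth` in the engine: they bound how many LLM calls the search can make.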

In [None]:
class TreeOfThoughtsEngine:
    """Tree-of-Thoughts exploration engine"""
    
    def __init__(self, llm_client):
        self.llm_client = llm_client
        self.max_branches = 3
        self.max_depth = 2
    
    def generate_tot_prompt(self, task: TaskDescription) -> str:
        """Generate a Tree-of-Thoughts enhanced prompt"""
        
        return f"""
Complex Problem: {task.description}

Domain: {task.domain}
Complexity: {task.complexity}

TREE OF THOUGHTS APPROACH:
Please explore multiple solution paths simultaneously:

1. Generate {self.max_branches} different initial approaches to this problem
2. For each approach:
   a. Evaluate its potential effectiveness
   b. Identify pros and cons
   c. Estimate success probability (0-100%)
   d. Consider required resources and constraints
3. Select the most promising approach(es)
4. Develop the selected approach(es) further with sub-steps
5. Compare final solutions and select the best one

Document your reasoning tree structure clearly with branching points and decision criteria.

Format your response as:
- **Path 1**: [Approach name]
  - Evaluation: [Effectiveness assessment]
  - Pros: [Advantages]
  - Cons: [Disadvantages]
  - Success Probability: [0-100%]
  
- **Selected Path**: [Best approach]
- **Final Solution**: [Comprehensive answer based on selected path]
"""
    
    async def explore_thoughts(self, task: TaskDescription) -> PromptResult:
        """Execute Tree-of-Thoughts exploration"""
        
        start_time = time.time()
        
        # Generate ToT-enhanced prompt
        tot_prompt = self.generate_tot_prompt(task)
        
        # Get response from LLM
        response = await self.llm_client.generate_response(tot_prompt, "tree_of_thoughts")
        
        execution_time = time.time() - start_time
        
        return PromptResult(
            prompt=tot_prompt,
            response=response,
            technique="tree_of_thoughts",
            execution_time=execution_time,
            token_count=len(tot_prompt.split()) + len(response.split()),
            cost=self._calculate_cost(tot_prompt, response),
            quality_score=self._estimate_tot_quality(response)
        )
    
    def _calculate_cost(self, prompt: str, response: str) -> float:
        """Calculate approximate cost"""
        total_tokens = len(prompt.split()) + len(response.split())
        return total_tokens * 0.00002
    
    def _estimate_tot_quality(self, response: str) -> float:
        """Estimate ToT response quality"""
        # Look for path exploration indicators
        path_indicators = ['path', 'approach', 'branch', 'option']
        path_score = sum(1 for indicator in path_indicators if indicator.lower() in response.lower())
        
        # Look for evaluation criteria
        eval_indicators = ['evaluate', 'probability', 'pros', 'cons', 'effectiveness']
        eval_score = sum(1 for indicator in eval_indicators if indicator.lower() in response.lower())
        
        # Look for selection logic
        select_indicators = ['selected', 'best', 'chosen', 'final']
        select_score = sum(1 for indicator in select_indicators if indicator.lower() in response.lower())
        
        # Normalize to 0-1 scale (13 indicators in total)
        return min(1.0, (path_score + eval_score + select_score) / 13.0)

# Initialize ToT engine
tot_engine = TreeOfThoughtsEngine(mock_client)

### 📊 CoT vs ToT Comparison

Let's compare Chain-of-Thought and Tree-of-Thoughts approaches on the same problem:

In [None]:
comparison_task = TaskDescription(
    description="Design a sustainable urban transportation system for a city of 1 million people",
    domain="analysis",
    complexity="high"
)

async def run_comparison():
    display(Markdown("### 🔄 Running CoT vs ToT Comparison..."))
    
    # Execute both techniques
    cot_result = await cot_engine.execute_cot_reasoning(comparison_task)
    tot_result = await tot_engine.explore_thoughts(comparison_task)
    
    # Create comparison DataFrame
    comparison_data = {
        'Technique': ['Chain-of-Thought', 'Tree-of-Thoughts'],
        'Execution Time (s)': [cot_result.execution_time, tot_result.execution_time],
        'Quality Score': [cot_result.quality_score, tot_result.quality_score],
        'Token Count': [cot_result.token_count, tot_result.token_count],
        'Estimated Cost ($)': [cot_result.cost, tot_result.cost]
    }
    
    df_comparison = pd.DataFrame(comparison_data)
    
    # Display comparison
    display(Markdown("### 📈 Performance Comparison"))
    display(df_comparison.style.background_gradient(cmap='Blues', subset=['Quality Score'])
                   .format({'Execution Time (s)': '{:.2f}', 'Estimated Cost ($)': '${:.4f}'}))
    
    # Create visual comparison
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Execution Time', 'Quality Score', 'Token Count', 'Cost'),
        specs=[[{"secondary_y": False}, {"secondary_y": False}],
               [{"secondary_y": False}, {"secondary_y": False}]]
    )
    
    # Execution Time
    fig.add_trace(
        go.Bar(x=['CoT', 'ToT'], y=[cot_result.execution_time, tot_result.execution_time],
               name='Execution Time', marker_color='lightblue'),
        row=1, col=1
    )
    
    # Quality Score
    fig.add_trace(
        go.Bar(x=['CoT', 'ToT'], y=[cot_result.quality_score, tot_result.quality_score],
               name='Quality Score', marker_color='lightgreen'),
        row=1, col=2
    )
    
    # Token Count
    fig.add_trace(
        go.Bar(x=['CoT', 'ToT'], y=[cot_result.token_count, tot_result.token_count],
               name='Token Count', marker_color='lightcoral'),
        row=2, col=1
    )
    
    # Cost
    fig.add_trace(
        go.Bar(x=['CoT', 'ToT'], y=[cot_result.cost, tot_result.cost],
               name='Cost', marker_color='lightyellow'),
        row=2, col=2
    )
    
    fig.update_layout(
        title_text="Chain-of-Thought vs Tree-of-Thoughts Comparison",
        showlegend=False,
        height=600
    )
    
    fig.show()
    
    # Show sample responses
    display(Markdown("### 📝 Sample Responses"))
    
    display(Markdown("**Chain-of-Thought Response (excerpt):**"))
    display(Markdown(f"```markdown\n{cot_result.response[:500]}...\n```"))
    
    display(Markdown("**Tree-of-Thoughts Response (excerpt):**"))
    display(Markdown(f"```markdown\n{tot_result.response[:500]}...\n```"))

# Run the comparison
await run_comparison()

## 🤖 Technique 3: ReAct (Reasoning + Acting)

ReAct combines reasoning with action-taking capabilities, allowing AI systems to interact with external tools and APIs.
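
The Thought-Action-Observation cycle can be sketched as a loop that parses each `Action:` line from the model output, runs the named tool, and feeds the result back as an `Observation:`. This is a minimal illustration under stated assumptions: `run_react_loop`, `scripted_model`, and the `tool_name(argument)` action syntax are hypothetical, and the scripted model replaces a real LLM call so the example runs offline.

```python
import re
from typing import Callable, Dict

# Matches lines such as: Action: calculate(6*7)
ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)


def run_react_loop(
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Alternate model output and tool execution until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = model(transcript)
        transcript += output + "\n"
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(output)
        if match is None:
            transcript += "Observation: no valid action found\n"
            continue
        tool_name = match.group(1)
        argument = match.group(2).strip().strip("'\"")
        tool = tools.get(tool_name)
        observation = tool(argument) if tool else f"unknown tool '{tool_name}'"
        transcript += f"Observation: {observation}\n"
    return "No final answer within the step limit."


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: request a tool first, then answer."""
    if "Observation:" not in transcript:
        return "Thought: I need the product.\nAction: calculate(6*7)"
    return "Thought: I have the result.\nFinal Answer: 42"


def calculate(expression: str) -> str:
    """Toy tool that multiplies two integers written as 'a*b'."""
    left, right = expression.split("*")
    return str(int(left) * int(right))


answer = run_react_loop(scripted_model, {"calculate": calculate}, "What is 6 times 7?")
print(answer)
```

Capping the loop with `max_steps` keeps a model that never emits `Final Answer:` from running indefinitely.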

In [None]:
class ReActEngine:
    """ReAct (Reasoning + Acting) engine"""
    
    def __init__(self, llm_client):
        self.llm_client = llm_client
        self.tools = {
            'search': self._search_tool,
            'calculate': self._calculate_tool,
            'analyze': self._analyze_tool,
            'summarize': self._summarize_tool
        }
    
    def generate_react_prompt(self, task: TaskDescription, available_tools: List[str] = None) -> str:
        """Generate a ReAct-enhanced prompt"""
        
        tools_desc = self._get_tools_description(available_tools or list(self.tools.keys()))
        
        return f"""
Task: {task.description}

Domain: {task.domain}
Complexity: {task.complexity}

REACT (Reasoning + Acting) Approach:
You are an AI assistant that can reason and take actions. For each step, first think about what you need to do, then take the appropriate action.

Available Tools:
{tools_desc}

Instructions:
1. Start by analyzing the problem and determining what information or actions you need
2. Use the Thought-Action-Observation cycle:
   - Thought: What do I need to do next?
   - Action: Which tool should I use? (specify tool name and parameters)
   - Observation: What did I learn from the action?
3. Continue this cycle until you have enough information to provide a comprehensive answer
4. Provide your final answer based on all observations

Format each step as:
Thought: [your reasoning]
Action: [tool_name]([parameters])
Observation: [result of action]

Important: Be specific about what you're looking for in each action.
"""
    
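    async def execute_react_reasoning(self, task: TaskDescription, max_steps: int = 5) -> PromptResult:
        """Execute ReAct reasoning.

        Note: this demonstration issues a single LLM call, so `max_steps` is
        accepted but not enforced, and the tools in `self.tools` are described
        in the prompt rather than invoked.
        """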
        
        start_time = time.time()
        
        # Generate ReAct-enhanced prompt
        react_prompt = self.generate_react_prompt(task)
        
        # Get response from LLM
        response = await self.llm_client.generate_response(react_prompt, "react")
        
        execution_time = time.time() - start_time
        
        return PromptResult(
            prompt=react_prompt,
            response=response,
            technique="react",
            execution_time=execution_time,
            token_count=len(react_prompt.split()) + len(response.split()),
            cost=self._calculate_cost(react_prompt, response),
            quality_score=self._estimate_react_quality(response)
        )
    
    def _get_tools_description(self, tools: List[str]) -> str:
        """Get description of available tools"""
        descriptions = {
            'search': 'Search for information (query: str) -> results: str',
            'calculate': 'Perform mathematical calculations (expression: str) -> result: float',
            'analyze': 'Analyze data or text (data: str, analysis_type: str) -> insights: str',
            'summarize': 'Summarize text content (text: str, max_length: int) -> summary: str'
        }
        
        return '\n'.join(f"- {tool}: {descriptions.get(tool, 'No description')}" for tool in tools)
    
    def _search_tool(self, query: str) -> str:
        """Mock search tool"""
        return f"Search results for '{query}': Found relevant information including key concepts and examples."
    
    def _calculate_tool(self, expression: str) -> float:
        """Mock calculation tool"""
        try:
            # Simple expression evaluation for demonstration
            if '+' in expression:
                return sum(float(x) for x in expression.split('+'))
            elif '*' in expression:
                result = 1
                for x in expression.split('*'):
                    result *= float(x)
                return result
            else:
                return float(expression)
        except ValueError:  # raised by float() on malformed input
            return 0.0
    
    def _analyze_tool(self, data: str, analysis_type: str) -> str:
        """Mock analysis tool"""
        return f"Analysis of '{data[:50]}...' using {analysis_type}: Identified patterns and insights."
    
    def _summarize_tool(self, text: str, max_length: int) -> str:
        """Mock summarization tool"""
        return f"Summary of text (max {max_length} chars): {text[:max_length]}..."
    
    def _calculate_cost(self, prompt: str, response: str) -> float:
        """Calculate approximate cost"""
        total_tokens = len(prompt.split()) + len(response.split())
        return total_tokens * 0.00002
    
    def _estimate_react_quality(self, response: str) -> float:
        """Estimate ReAct response quality"""
        # Look for ReAct pattern indicators
        react_indicators = ['thought:', 'action:', 'observation:']
        react_score = sum(1 for indicator in react_indicators if indicator.lower() in response.lower())
        
        # Look for tool usage
        tool_indicators = ['search', 'calculate', 'analyze', 'summarize']
        tool_score = sum(1 for indicator in tool_indicators if indicator.lower() in response.lower())
        
        # Normalize to 0-1 scale (7 indicators in total)
        return min(1.0, (react_score + tool_score) / 7.0)

# Initialize ReAct engine
react_engine = ReActEngine(mock_client)

### 🎮 Interactive ReAct Demonstration

Test the ReAct framework with different tool combinations:

In [None]:
# Create interactive widgets for ReAct demonstration
tool_checkboxes = widgets.SelectMultiple(
    options=['search', 'calculate', 'analyze', 'summarize'],
    value=['search', 'analyze'],
    description='Available Tools:',
    disabled=False
)

react_task_input = widgets.Textarea(
    value='What are the key factors affecting employee productivity in remote work environments?',
    placeholder='Enter a task that would benefit from tool usage...',
    layout=widgets.Layout(width='100%', height='80px')
)

max_steps_slider = widgets.IntSlider(
    value=3,
    min=1,
    max=5,
    step=1,
    description='Max Steps:',
    style={'description_width': 'initial'}
)

react_execute_button = widgets.Button(
    description='Execute ReAct Reasoning',
    button_style='info',
    layout=widgets.Layout(width='200px')
)

react_output_area = widgets.Output()

async def execute_react_demo(b):
    with react_output_area:
        react_output_area.clear_output()
        
        display(Markdown("### 🔄 Executing ReAct Reasoning..."))
        
        # Create task description
        task = TaskDescription(
            description=react_task_input.value,
            domain="analysis",
            complexity="medium"
        )
        
        # Execute ReAct reasoning
        result = await react_engine.execute_react_reasoning(task, max_steps_slider.value)
        
        # Display results
        display(Markdown(f"""### 📊 ReAct Results

**Available Tools**: {', '.join(tool_checkboxes.value)}  
**Max Steps**: {max_steps_slider.value}  
**Execution Time**: {result.execution_time:.2f}s  
**Estimated Cost**: ${result.cost:.4f}  
**Quality Score**: {result.quality_score:.2f}/1.0

### 🤖 ReAct Response
```markdown
{result.response}
```
"""))

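# on_click invokes its callback synchronously, so schedule the coroutine
# on the notebook's running event loop instead of passing it directly
react_execute_button.on_click(lambda b: asyncio.ensure_future(execute_react_demo(b)))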

# Display widgets
display(Markdown("## 🎛️ ReAct Configuration"))
display(widgets.VBox([tool_checkboxes, max_steps_slider, react_task_input, react_execute_button]))
display(react_output_area)

## 📈 Performance Analysis & Optimization

Let's analyze the performance of different prompt engineering techniques and identify optimization opportunities.
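
Before reading the benchmark, it can help to see how per-run records reduce to per-technique figures. The sketch below groups records by technique and reports average quality, average token count, and quality per 1,000 tokens, using only the standard library. The function name `summarize_by_technique` and the sample records are invented for illustration. They are not benchmark results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def summarize_by_technique(records: List[dict]) -> Dict[str, dict]:
    """Average quality and token usage per technique, plus quality per 1K tokens."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["technique"]].append(record)

    summary = {}
    for technique, rows in grouped.items():
        avg_quality = mean(r["quality_score"] for r in rows)
        avg_tokens = mean(r["token_count"] for r in rows)
        summary[technique] = {
            "avg_quality": avg_quality,
            "avg_tokens": avg_tokens,
            # Guard against division by zero for empty responses
            "quality_per_1k_tokens": 1000 * avg_quality / avg_tokens if avg_tokens else 0.0,
        }
    return summary


# Made-up sample records, for illustration only
sample = [
    {"technique": "A", "quality_score": 0.5, "token_count": 100},
    {"technique": "A", "quality_score": 0.7, "token_count": 300},
    {"technique": "B", "quality_score": 0.9, "token_count": 600},
]
summary = summarize_by_technique(sample)
for name, stats in summary.items():
    print(name, stats)
```

A technique with a higher raw quality score can still rank lower per token, which matters when comparing a single-pass prompt against multi-path exploration.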

In [None]:
async def benchmark_techniques():
    """Benchmark different prompt engineering techniques"""
    
    # Test tasks
    test_tasks = [
        TaskDescription("Calculate the total cost of 150 items at $23.50 each with 8% tax", "math", "medium"),
        TaskDescription("Analyze the pros and cons of remote work policies", "analysis", "medium"),
        TaskDescription("Design a sustainable urban garden", "creative", "medium"),
        TaskDescription("Debug why a Python program is crashing", "logic", "high"),
    ]
    
    techniques = [
        ("Standard", lambda task: mock_client.generate_response(task.description, "standard")),
        ("Chain-of-Thought", lambda task: cot_engine.execute_cot_reasoning(task)),
        ("Tree-of-Thoughts", lambda task: tot_engine.explore_thoughts(task)),
        ("ReAct", lambda task: react_engine.execute_react_reasoning(task)),
    ]
    
    results = []
    
    display(Markdown("### 🔄 Running Performance Benchmark..."))
    
    for task in test_tasks:
        display(Markdown(f"**Testing**: {task.description[:50]}..."))
        
        for technique_name, technique_func in techniques:
            try:
                if technique_name == "Standard":
                    start_time = time.time()
                    response = await technique_func(task)
                    execution_time = time.time() - start_time
                    
                    result = PromptResult(
                        prompt=task.description,
                        response=response,
                        technique=technique_name.lower(),
                        execution_time=execution_time,
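                        # Count whitespace-separated words and price them like the other engines
                        token_count=len(task.description.split()) + len(response.split()),
                        cost=(len(task.description.split()) + len(response.split())) * 0.00002,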
                        quality_score=0.7  # Mock quality score
                    )
                else:
                    result = await technique_func(task)
                
                results.append({
                    'task': task.description[:50] + '...',
                    'technique': technique_name,
                    'execution_time': result.execution_time,
                    'quality_score': result.quality_score,
                    'token_count': result.token_count,
                    'cost': result.cost
                })
                
            except Exception as e:
                print(f"Error with {technique_name}: {e}")
    
    return pd.DataFrame(results)

# Run benchmark
benchmark_df = await benchmark_techniques()

# Display comprehensive results
display(Markdown("### 📊 Comprehensive Performance Benchmark"))
display(benchmark_df.style.background_gradient(cmap='RdYlGn', subset=['quality_score'])
               .format({'execution_time': '{:.3f}s', 'cost': '${:.4f}'}))

# Create visualizations
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('Execution Time by Technique', 'Quality Score by Technique', 
                   'Cost vs Quality', 'Token Efficiency'),
    specs=[[{"secondary_y": False}, {"secondary_y": False}],
           [{"secondary_y": False}, {"secondary_y": False}]]
)

# Execution Time by Technique
time_data = benchmark_df.groupby('technique')['execution_time'].mean().reset_index()
fig.add_trace(
    go.Bar(x=time_data['technique'], y=time_data['execution_time'],
           name='Avg Execution Time', marker_color='lightblue'),
    row=1, col=1
)

# Quality Score by Technique
quality_data = benchmark_df.groupby('technique')['quality_score'].mean().reset_index()
fig.add_trace(
    go.Bar(x=quality_data['technique'], y=quality_data['quality_score'],
           name='Avg Quality Score', marker_color='lightgreen'),
    row=1, col=2
)

# Cost vs Quality Scatter
for technique in benchmark_df['technique'].unique():
    technique_data = benchmark_df[benchmark_df['technique'] == technique]
    fig.add_trace(
        go.Scatter(
            x=technique_data['cost'],
            y=technique_data['quality_score'],
            mode='markers',
            name=technique,
            marker=dict(size=8),
            showlegend=False
        ),
        row=2, col=1
    )

# Token Efficiency (Quality per Token)
benchmark_df['efficiency'] = benchmark_df['quality_score'] / benchmark_df['token_count']
efficiency_data = benchmark_df.groupby('technique')['efficiency'].mean().reset_index()
fig.add_trace(
    go.Bar(x=efficiency_data['technique'], y=efficiency_data['efficiency'],
           name='Token Efficiency', marker_color='lightcoral'),
    row=2, col=2
)

fig.update_layout(
    title_text="Prompt Engineering Techniques Performance Analysis",
    height=800,
    showlegend=True
)

fig.show()
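
The benchmark above records `len(task.description) + len(response)` as `token_count` and derives `cost` from execution time, so both are proxies rather than measurements. A slightly closer approximation could look like the sketch below. The helper names `estimate_tokens` and `estimate_cost`, the four-characters-per-token ratio, and the price parameters are assumptions made for illustration; for real figures, use your provider's tokenizer and published pricing.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# This is a rule of thumb, not a tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on character length."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(prompt: str, response: str,
                  input_price_per_1k: float,
                  output_price_per_1k: float) -> float:
    """Estimate request cost from caller-supplied prices per 1,000 tokens."""
    input_tokens = estimate_tokens(prompt)
    output_tokens = estimate_tokens(response)
    return ((input_tokens / 1000) * input_price_per_1k
            + (output_tokens / 1000) * output_price_per_1k)
```

Swapping these helpers in for the character count and the latency-based cost keeps the benchmark columns unchanged while making the `efficiency` metric less sensitive to how verbose a response is.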

## 🎯 Optimization Strategies

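The cell below summarizes the benchmark per technique and then lists rules of thumb for when to use each one. The standard baseline uses a mock quality score, so treat the numbers as illustrative: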

In [None]:
# Analyze optimization opportunities
display(Markdown("### 📈 Optimization Insights"))

# Calculate key metrics
technique_stats = benchmark_df.groupby('technique').agg({
    'execution_time': ['mean', 'std'],
    'quality_score': ['mean', 'std'],
    'cost': ['mean', 'std'],
    'token_count': ['mean', 'std']
}).round(3)

display(Markdown("#### Technique Performance Statistics"))
display(technique_stats)

# Optimization recommendations
optimization_tips = [
    {
        'technique': 'Chain-of-Thought',
        'best_for': 'Complex mathematical and logical problems',
        'optimization': 'Use for problems requiring step-by-step reasoning',
        'cost_tip': 'Moderate cost, high accuracy for structured problems'
    },
    {
        'technique': 'Tree-of-Thoughts',
        'best_for': 'Creative and multi-solution problems',
        'optimization': 'Ideal for exploring multiple approaches',
        'cost_tip': 'Higher cost but better for ambiguous problems'
    },
    {
        'technique': 'ReAct',
        'best_for': 'Problems requiring external information',
        'optimization': 'Use when tools and APIs can enhance reasoning',
        'cost_tip': 'Variable cost depending on tool usage'
    },
    {
        'technique': 'Standard',
        'best_for': 'Simple, straightforward tasks',
        'optimization': 'Most efficient for basic queries',
        'cost_tip': 'Lowest cost, sufficient for simple tasks'
    }
]

for tip in optimization_tips:
    display(Markdown(f"""#### {tip['technique']}
- **Best For**: {tip['best_for']}
- **Optimization**: {tip['optimization']}
- **Cost Tip**: {tip['cost_tip']}
"""))

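The per-technique statistics can also be reduced to a single ranking, for example mean quality per unit of cost. The sketch below is one possible way to do that with plain dictionaries; `rank_techniques` is a hypothetical helper, and the input rows are assumed to carry the same `technique`, `quality_score`, and `cost` keys as the benchmark results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def rank_techniques(records: List[dict]) -> List[dict]:
    """Rank techniques by mean quality per unit of mean cost (highest first)."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in records:
        grouped[row["technique"]].append(row)

    summary = []
    for technique, rows in grouped.items():
        avg_quality = mean(r["quality_score"] for r in rows)
        avg_cost = mean(r["cost"] for r in rows)
        summary.append({
            "technique": technique,
            "avg_quality": avg_quality,
            "avg_cost": avg_cost,
            # None when the mean cost is zero, to avoid dividing by zero
            "quality_per_cost": avg_quality / avg_cost if avg_cost > 0 else None,
        })

    # Techniques without a defined ratio sort last
    return sorted(
        summary,
        key=lambda s: (s["quality_per_cost"] is None,
                       -(s["quality_per_cost"] or 0.0)),
    )
```

In the notebook, the input could be produced with `benchmark_df.to_dict("records")`. A single ratio hides trade-offs such as latency, so it is best read alongside the full statistics table rather than instead of it.
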
## 🏭 Production Systems Integration

Let's explore how to integrate these techniques into production systems:

In [None]:
class ProductionPromptEngine:
    """Enterprise-grade prompt engineering system"""
    
    def __init__(self, llm_client):
        self.llm_client = llm_client
        self.cot_engine = ChainOfThoughtEngine(llm_client)
        self.tot_engine = TreeOfThoughtsEngine(llm_client)
        self.react_engine = ReActEngine(llm_client)
        
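        # Performance tracking (records are appended by the caller)
        self.performance_history = []
        self.cache = {}  # Placeholder for response caching; not used by the methods in this class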
    
    async def optimize_and_execute(self, task: TaskDescription, 
                                  context: Optional[Dict] = None,
                                  technique: str = "auto") -> PromptResult:
        """Automatically select and execute best technique"""
        
        # Technique selection logic
        if technique == "auto":
            technique = self._select_best_technique(task, context)
        
        # Execute with selected technique
        if technique == "chain_of_thought":
            return await self.cot_engine.execute_cot_reasoning(task, context)
        elif technique == "tree_of_thoughts":
            return await self.tot_engine.explore_thoughts(task)
        elif technique == "react":
            return await self.react_engine.execute_react_reasoning(task)
        else:
            # Standard response
            start_time = time.time()
            response = await self.llm_client.generate_response(task.description, "standard")
            execution_time = time.time() - start_time
            
            return PromptResult(
                prompt=task.description,
                response=response,
                technique="standard",
                execution_time=execution_time,
                token_count=len(task.description) + len(response),
                cost=execution_time * 0.001,
                quality_score=0.7
            )
    
    def _select_best_technique(self, task: TaskDescription, context: Optional[Dict]) -> str:
        """Automatically select the best technique based on task characteristics"""
        
        # Simple heuristic-based selection
        if task.domain == "math":
            return "chain_of_thought"
        elif task.domain == "creative":
            return "tree_of_thoughts"
        elif task.complexity == "high":
            return "tree_of_thoughts"
        elif context and context.get("tools_available"):
            return "react"
        else:
            return "chain_of_thought"  # Default to CoT
    
    def get_performance_dashboard(self) -> Dict:
        """Generate performance dashboard data"""
        if not self.performance_history:
            return {"message": "No performance data available"}
        
        df = pd.DataFrame(self.performance_history)
        
        dashboard = {
            "total_requests": len(df),
            "avg_execution_time": df['execution_time'].mean(),
            "avg_quality_score": df['quality_score'].mean(),
            "total_cost": df['cost'].sum(),
            "technique_usage": df['technique'].value_counts().to_dict(),
            "performance_by_technique": df.groupby('technique')[['quality_score', 'execution_time']].mean().to_dict()
        }
        
        return dashboard

# Initialize production engine
production_engine = ProductionPromptEngine(mock_client)

# Demonstrate auto-optimization
demo_tasks = [
    TaskDescription("Calculate compound interest on $10,000 at 5% for 10 years", "math", "medium"),
    TaskDescription("Design a mobile app for habit tracking", "creative", "high"),
    TaskDescription("Find the best restaurants in Tokyo for sushi", "analysis", "medium", "tools_available"),
]

async def demonstrate_production_system():
    display(Markdown("### 🏭 Production System Demonstration"))
    
    results = []
    
    for task in demo_tasks:
        display(Markdown(f"**Task**: {task.description}"))
        
        # Auto-select and execute technique
        result = await production_engine.optimize_and_execute(task)
        
        # Report the technique that actually ran instead of re-deriving it with an empty context
        selected_technique = result.technique
        
        display(Markdown(f"- **Selected Technique**: {selected_technique.replace('_', ' ').title()}"))
        display(Markdown(f"- **Execution Time**: {result.execution_time:.2f}s"))
        display(Markdown(f"- **Quality Score**: {result.quality_score:.2f}/1.0"))
        display(Markdown(f"- **Cost**: ${result.cost:.4f}"))
        
        results.append(result)
        production_engine.performance_history.append({
            'technique': result.technique,
            'execution_time': result.execution_time,
            'quality_score': result.quality_score,
            'cost': result.cost
        })
        
        display(Markdown("---"))
    
    # Show dashboard
    dashboard = production_engine.get_performance_dashboard()
    
    display(Markdown("### 📊 Performance Dashboard"))
    for key, value in dashboard.items():
        if isinstance(value, dict):
            display(Markdown(f"**{key.replace('_', ' ').title()}:**"))
            for k, v in value.items():
                if isinstance(v, dict):
                    # Nested dict maps metric name -> {technique: mean value}; keep the metric label
                    for sub_k, sub_v in v.items():
                        display(Markdown(f"- {k} / {sub_k}: {sub_v:.3f}"))
                else:
                    display(Markdown(f"- {k}: {v}"))
        else:
            display(Markdown(f"**{key.replace('_', ' ').title()}:** {value}"))

# Run production demonstration
await demonstrate_production_system()
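
`ProductionPromptEngine` initializes a `cache` dictionary but none of its methods read or write it. One way such a cache could be wired in is sketched below. `CachedGenerator` is a hypothetical wrapper, and the single-argument async `generate` callable it wraps is an assumption, not the interface of the notebook's client.

```python
import asyncio
import hashlib
from typing import Awaitable, Callable, Dict


class CachedGenerator:
    """Wrap an async generate function with a simple in-memory cache."""

    def __init__(self, generate: Callable[[str], Awaitable[str]]):
        self._generate = generate
        self._cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(technique: str, prompt: str) -> str:
        # Hash technique and prompt together so equal prompts run under
        # different techniques do not share a cache entry
        raw = f"{technique}\x00{prompt}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    async def generate(self, prompt: str, technique: str = "standard") -> str:
        key = self._key(technique, prompt)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        response = await self._generate(prompt)
        self._cache[key] = response
        return response
```

The sketch has no eviction or expiry, and reusing a cached answer only makes sense when repeated prompts are expected to yield the same response, for example with deterministic sampling settings.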

## 📋 Best Practices and Guidelines

The following cell summarizes key best practices for advanced prompt engineering, grouped by category:

In [None]:
# Create best practices summary
best_practices = {
    "Technique Selection": {
        "Chain-of-Thought": "Best for structured, logical problems with clear steps",
        "Tree-of-Thoughts": "Ideal for creative problems with multiple solutions",
        "ReAct": "Perfect for tasks requiring external tools or information",
        "Standard": "Most efficient for simple, straightforward queries"
    },
    "Performance Optimization": {
        "Cache Results": "Implement prompt caching for repeated queries",
        "Batch Processing": "Process similar prompts together for efficiency",
        "Model Selection": "Choose appropriate model based on complexity",
        "Token Management": "Optimize prompts to reduce token usage"
    },
    "Quality Assurance": {
        "Output Validation": "Validate responses against expected criteria",
        "A/B Testing": "Continuously test and refine prompts",
        "Human Review": "Include human oversight for critical applications",
        "Feedback Loops": "Collect and incorporate user feedback"
    },
    "Production Deployment": {
        "Monitoring": "Track performance metrics and costs",
        "Scalability": "Design for horizontal scaling",
        "Error Handling": "Implement robust error recovery",
        "Security": "Include input validation and rate limiting"
    }
}

display(Markdown("## 🎯 Best Practices Summary"))

for category, practices in best_practices.items():
    display(Markdown(f"### {category}"))
    for practice, description in practices.items():
        display(Markdown(f"- **{practice}**: {description}"))
    display(Markdown(""))
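
Two of the practices above, output validation and error recovery, can be combined in a small retry loop. The sketch below is one minimal way to do it; `generate_with_validation`, its parameters, and the async `generate` callable are hypothetical names chosen for illustration, not part of any provider SDK.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def generate_with_validation(
    generate: Callable[[str], Awaitable[str]],
    prompt: str,
    validate: Callable[[str], bool],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> str:
    """Retry until a response passes validation or attempts run out."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            response = await generate(prompt)
            if validate(response):
                return response
            last_error = ValueError("response failed validation")
        except Exception as exc:  # Assumption: narrow this to provider-specific errors
            last_error = exc
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, then 2x, 4x, ...
            await asyncio.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(
        f"no valid response after {max_attempts} attempts"
    ) from last_error
```

In a notebook cell this could be called as `await generate_with_validation(my_generate, prompt, lambda r: r.strip().isdigit())`, where `my_generate` stands for whatever async function wraps your client. Each retry is another billed call, so `max_attempts` is worth keeping small.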

## 🎉 Conclusion & Next Steps

### 🏆 What You've Learned

✅ **Chain-of-Thought Reasoning**: Step-by-step logical reasoning for complex problems  
✅ **Tree-of-Thoughts Exploration**: Multi-path problem solving that selects the most promising path  
✅ **ReAct Framework**: Integration of reasoning with external tools and APIs  
✅ **Performance Analysis**: Comprehensive benchmarking and optimization strategies  
✅ **Production Integration**: Enterprise-grade deployment patterns  

### 🚀 Real-World Applications

- **Customer Service**: Automated support with intelligent reasoning
- **Content Creation**: Enhanced writing and creative processes
- **Data Analysis**: Complex analytical workflows
- **Education**: Intelligent tutoring and learning systems
- **Research**: Automated literature review and synthesis

### 📚 Further Learning

1. **Advanced Topics**: Meta-prompting, self-consistency decoding
2. **Model Fine-tuning**: Custom models for specific domains
3. **Multi-Modal Integration**: Combining text, images, and audio
4. **Safety & Ethics**: Responsible AI deployment practices
5. **Emerging Research**: Latest papers and techniques

### 🛠️ Practice Exercises

1. **Apply CoT to your specific domain problems**
2. **Build a ReAct agent for your workflow**
3. **Optimize existing prompts using these techniques**
4. **Implement A/B testing for prompt variants**
5. **Deploy a production prompt engineering system**

### 🤝 Community & Resources

- **GitHub Repository**: Complete code examples and implementations
- **Documentation**: Detailed guides and API references
- **Community Forum**: Ask questions and share insights
- **Workshop Schedule**: Live training sessions
- **Research Papers**: Latest advances in prompt engineering

---

**🎯 Congratulations!** You've completed the Advanced Prompt Engineering interactive notebook. You now have hands-on experience with cutting-edge prompt engineering techniques and are ready to apply them in real-world applications.

**Next Steps**: Explore the other interactive notebooks in this series to dive deeper into specific applications and advanced topics!