# Day 2, Session 4 - Demo: Adaptive Prompt System

## The Problem with Static Prompts

- Same prompt fails on different invoice formats
- No learning from failures
- Token waste on verbose prompts
- No performance tracking

**Solution: Adaptive prompt system that evolves!**

This demo shows how to build prompts that:
- Adapt based on extraction errors
- Progressively enhance for difficult documents
- A/B test different versions
- Optimize token usage
- Track performance in real-time

**Duration: 15 minutes**

In [None]:
# Global configuration - Instructor will fill these
OLLAMA_URL = "http://XX.XX.XX.XX"  # Course server IP (port 80)
API_TOKEN = "YOUR_TOKEN_HERE"      # Instructor provides token
MODEL = "qwen3:8b"                  # Default model on server

In [None]:
!pip install langchain instructor pydantic requests matplotlib pandas

In [None]:
from langchain.prompts import PromptTemplate
from instructor import patch
from pydantic import BaseModel, Field, ValidationError
from typing import List, Optional, Dict, Any
import json
import time
import random
import threading
from functools import lru_cache
from collections import defaultdict
import requests
from datetime import datetime
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
# Download Invoice Dataset
import requests
import zipfile
import io

dropbox_url = "https://www.dropbox.com/scl/fo/m9hyfmvi78snwv0nh34mo/AMEXxwXMLAOeve-_yj12ck8?rlkey=urinkikgiuven0fro7r4x5rcu&st=hv3of7g7&dl=1"

print(f"Downloading data from: {dropbox_url}")

try:
    response = requests.get(dropbox_url, timeout=60)  # Fail fast instead of hanging on a stalled download
    response.raise_for_status()

    with zipfile.ZipFile(io.BytesIO(response.content)) as z:
        z.extractall("downloaded_images")

    print("✅ Downloaded and extracted images to 'downloaded_images' folder.")

except requests.exceptions.RequestException as e:
    print(f"❌ Error downloading the file: {e}")
except zipfile.BadZipFile:
    print("❌ Error: The downloaded file is not a valid zip file.")
except Exception as e:
    print(f"❌ An unexpected error occurred: {e}")

## 1. Define Structured Output with Pydantic

First, we define exactly what we want to extract using Pydantic models for validation.
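
The schema in the next cell declares `invoice_date` as a plain string, so the YYYY-MM-DD format is described to the model but not enforced. A minimal sketch of a normalizer that could run before validation is shown here; the helper name `to_iso_date` and the list of accepted input formats are assumptions for illustration, not part of the demo.

```python
from datetime import datetime
from typing import Optional

# Candidate input formats are assumptions; extend the tuple for your documents.
# Note that "%d/%m/%Y" reads 01/02/2024 as 1 February (day first).
DATE_FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%d/%m/%Y")

def to_iso_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next candidate format
    return None

print(to_iso_date("January 15, 2024"))
print(to_iso_date("15/01/2024"))
print(to_iso_date("sometime in January"))
```

Returning `None` rather than guessing keeps an unrecognized date visible as a failure that a later prompt level can be asked to fix.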

In [None]:
class LineItem(BaseModel):
    """Individual line item from invoice"""
    description: str = Field(description="Product or service description")
    quantity: int = Field(description="Number of items", ge=0)
    unit_price: float = Field(description="Price per unit", ge=0)
    total: float = Field(description="Line total amount", ge=0)

class InvoiceData(BaseModel):
    """Complete invoice extraction model"""
    invoice_number: str = Field(description="Invoice identifier")
    vendor_name: str = Field(description="Company issuing invoice")
    invoice_date: str = Field(description="Invoice date in YYYY-MM-DD format")
    line_items: List[LineItem] = Field(description="List of invoice line items")
    subtotal: float = Field(description="Subtotal before tax", ge=0)
    tax_amount: float = Field(description="Tax amount", ge=0)
    total_amount: float = Field(description="Total amount due", ge=0)
    currency: str = Field(description="Currency code (USD, EUR, etc)")
    confidence: float = Field(description="Extraction confidence 0-1", ge=0, le=1)

class ExtractionResult(BaseModel):
    """Wrapper for extraction with metadata"""
    success: bool
    data: Optional[InvoiceData] = None
    error: Optional[str] = None
    tokens_used: int = 0
    processing_time: float = 0
    prompt_version: str = "unknown"

print("✅ Pydantic models defined for structured extraction")
print(f"   InvoiceData has {len(InvoiceData.__fields__)} fields")
print(f"   LineItem has {len(LineItem.__fields__)} fields")

## 2. Create Adaptive Prompt Templates

We'll create a hierarchy of prompts that get progressively more sophisticated.

In [None]:
# Level 1: Basic prompt
basic_prompt = PromptTemplate(
    template="""Extract invoice data from the following text:

{text}

Return valid JSON matching the InvoiceData schema.""",
    input_variables=["text"]
)

# Level 2: Enhanced with examples
enhanced_prompt = PromptTemplate(
    template="""Extract invoice data from text. Use these examples as guidance:

EXAMPLES:
{examples}

Now extract from this text:
{text}

Return valid JSON matching the InvoiceData schema with high confidence scores.""",
    input_variables=["text", "examples"]
)

# Level 3: Adaptive with error feedback
adaptive_prompt = PromptTemplate(
    template="""PREVIOUS EXTRACTION FAILED with error: {error}

The following fields had problems: {problem_fields}

Try again with better parsing. Pay special attention to:
- Number formatting (remove commas, handle decimals)
- Date formats (convert to YYYY-MM-DD)
- Currency symbols (extract amounts only)
- Line item structure (ensure quantity × unit_price = total)

Text to extract from:
{text}

Return valid JSON matching the InvoiceData schema.""",
    input_variables=["text", "error", "problem_fields"]
)

# Level 4: Chain-of-thought reasoning
cot_prompt = PromptTemplate(
    template="""Extract invoice data step by step:

1. First, identify the vendor name and invoice number
2. Find the invoice date and convert to YYYY-MM-DD format
3. Locate the line items table and extract each row
4. For each line item, verify: quantity × unit_price = total
5. Find subtotal, tax, and total amounts
6. Verify: subtotal + tax = total_amount
7. Assign confidence based on data quality

Text:
{text}

Think through each step, then return valid JSON.""",
    input_variables=["text"]
)

# Store prompt levels
PROMPT_LEVELS = {
    'basic': basic_prompt,
    'enhanced': enhanced_prompt,
    'adaptive': adaptive_prompt,
    'cot': cot_prompt
}

print("🎯 Created 4 levels of prompt sophistication:")
for level, prompt in PROMPT_LEVELS.items():
    word_count = len(prompt.template.split())  # Word count is only a rough proxy for tokens
    print(f"   {level}: ~{word_count} words")

## 3. LLM Interface with Metrics Tracking

Create a wrapper for our LLM that tracks performance metrics.
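
The wrapper has to pull a JSON object out of free-form model output. Slicing from the first `{` to the last `}` breaks when the reply contains braces outside the JSON, for example in reasoning text. One possible alternative, sketched here with the standard library's `json.JSONDecoder.raw_decode`, scans for the first position that decodes as a JSON object. The helper name `first_json_object` is hypothetical.

```python
import json
from typing import Any, Dict, Optional

def first_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            value, _end = decoder.raw_decode(text, start)
            if isinstance(value, dict):
                return value
        except json.JSONDecodeError:
            pass  # this "{" did not start valid JSON; try the next one
        start = text.find("{", start + 1)
    return None

reply = 'Reasoning: totals use {braces}. Answer: {"invoice_number": "INV-1", "total_amount": 500.0} Done.'
print(first_json_object(reply))
```

This sketch returns the first object it can decode, so a stray `{}` earlier in the reply would still win; a stricter version might require expected keys before accepting a candidate.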

In [None]:
class AdaptiveLLM:
    """LLM wrapper with adaptive prompting and metrics"""
    
    def __init__(self, url, token, model):
        self.url = url
        self.token = token
        self.model = model
        self.metrics = defaultdict(list)
        self.examples_cache = []
        
    def call_llm(self, prompt: str) -> Dict[str, Any]:
        """Make API call to LLM"""
        headers = {
            "Authorization": f"Bearer {self.token}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": self.model,
            "prompt": prompt
        }
        
        start_time = time.time()
        
        try:
            response = requests.post(
                f"{self.url}/think",
                headers=headers,
                json=data,
                timeout=120  # Avoid hanging the notebook if the server stalls
            )
            
            processing_time = time.time() - start_time
            
            if response.status_code == 200:
                result = response.json()
                return {
                    'response': result.get('response', ''),
                    'tokens_used': len(prompt.split()) + len(result.get('response', '').split()),
                    'processing_time': processing_time,
                    'success': True
                }
            else:
                return {
                    'error': f"HTTP {response.status_code}",
                    'tokens_used': len(prompt.split()),
                    'processing_time': processing_time,
                    'success': False
                }
                
        except Exception as e:
            return {
                'error': str(e),
                'tokens_used': len(prompt.split()),
                'processing_time': time.time() - start_time,
                'success': False
            }
    
    def extract_with_validation(self, text: str, prompt_level: str = 'basic') -> ExtractionResult:
        """Extract with Pydantic validation"""
        
        # Get the appropriate prompt
        prompt_template = PROMPT_LEVELS[prompt_level]
        
        # Handle different prompt types
        if prompt_level == 'enhanced':
            examples = self._get_examples()
            prompt_text = prompt_template.format(text=text, examples=examples)
        elif prompt_level == 'adaptive':
            # This would be called with error context
            prompt_text = prompt_template.format(
                text=text, 
                error="Previous validation failed", 
                problem_fields="amount formatting"
            )
        else:
            prompt_text = prompt_template.format(text=text)
        
        # Call LLM
        llm_result = self.call_llm(prompt_text)
        
        if not llm_result['success']:
            return ExtractionResult(
                success=False,
                error=llm_result['error'],
                tokens_used=llm_result['tokens_used'],
                processing_time=llm_result['processing_time'],
                prompt_version=prompt_level
            )
        
        # Try to parse and validate JSON
        try:
            # Extract JSON from response
            response_text = llm_result['response']
            json_start = response_text.find('{')
            json_end = response_text.rfind('}') + 1
            
            if json_start == -1 or json_end == 0:
                raise ValueError("No JSON found in response")
            
            json_text = response_text[json_start:json_end]
            raw_data = json.loads(json_text)
            
            # Validate with Pydantic
            invoice_data = InvoiceData(**raw_data)
            
            # Store successful example
            self._cache_successful_example(text, invoice_data)
            
            # Track metrics
            self._record_success(prompt_level, llm_result)
            
            return ExtractionResult(
                success=True,
                data=invoice_data,
                tokens_used=llm_result['tokens_used'],
                processing_time=llm_result['processing_time'],
                prompt_version=prompt_level
            )
            
        except (json.JSONDecodeError, ValidationError, ValueError) as e:
            # Track failure
            self._record_failure(prompt_level, str(e), llm_result)
            
            return ExtractionResult(
                success=False,
                error=f"Validation failed: {str(e)}",
                tokens_used=llm_result['tokens_used'],
                processing_time=llm_result['processing_time'],
                prompt_version=prompt_level
            )
    
    def _get_examples(self) -> str:
        """Get cached successful examples"""
        if not self.examples_cache:
            return "No examples available yet."
        
        # Return the most recent successful example
        example = self.examples_cache[-1]
        return f"Input: {example['text'][:100]}...\nOutput: {example['output']}"
    
    def _cache_successful_example(self, text: str, result: InvoiceData):
        """Cache successful extractions as examples"""
        self.examples_cache.append({
            'text': text,
            'output': result.json()
        })
        
        # Keep only recent examples
        if len(self.examples_cache) > 5:
            self.examples_cache.pop(0)
    
    def _record_success(self, prompt_level: str, llm_result: Dict):
        """Record successful extraction metrics"""
        self.metrics[f'{prompt_level}_success'].append(1)
        self.metrics[f'{prompt_level}_tokens'].append(llm_result['tokens_used'])
        self.metrics[f'{prompt_level}_time'].append(llm_result['processing_time'])
    
    def _record_failure(self, prompt_level: str, error: str, llm_result: Dict):
        """Record failed extraction metrics"""
        self.metrics[f'{prompt_level}_success'].append(0)
        self.metrics[f'{prompt_level}_tokens'].append(llm_result['tokens_used'])
        self.metrics[f'{prompt_level}_time'].append(llm_result['processing_time'])
        self.metrics[f'{prompt_level}_errors'].append(error)

# Initialize adaptive LLM
adaptive_llm = AdaptiveLLM(OLLAMA_URL, API_TOKEN, MODEL)
print("🤖 Adaptive LLM initialized with metrics tracking")

## 4. Progressive Enhancement Engine

This system tries progressively more sophisticated prompt levels until extraction succeeds or the attempt limit is reached.
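
The idea of escalating on failure can be sketched independently of the LLM. In this minimal version each level receives the error from the previous attempt, so a later prompt can be rendered with real error context. The `escalate` function, the `Extractor` signature, and the two stub extractors are assumptions for illustration; they stand in for real model calls.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical extractor signature: (text, previous_error) -> (data, error)
Extractor = Callable[[str, Optional[str]], Tuple[Optional[dict], Optional[str]]]

def escalate(text: str, levels: List[Tuple[str, Extractor]], max_attempts: int = 3):
    """Try each level in order, passing the previous error forward."""
    last_error: Optional[str] = None
    history = []
    for name, extractor in levels[:max_attempts]:
        data, error = extractor(text, last_error)
        history.append((name, error))
        if error is None:
            return data, history
        last_error = error
    return None, history

# Stub extractors standing in for real LLM calls.
def basic(text, prev_error):
    return None, "total_amount: value is not a valid float"

def with_feedback(text, prev_error):
    # A real implementation would render prev_error into the prompt.
    if prev_error and "total_amount" in prev_error:
        return {"total_amount": 3146.50}, None
    return None, "no error context supplied"

data, history = escalate("TOTAL: $3,146.50", [("basic", basic), ("adaptive", with_feedback)])
print(data)
print(history)
```

Keeping the per-level history makes it easy to see which level succeeded and what each earlier level got wrong.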

In [None]:
def extract_with_enhancement(text: str, max_attempts: int = 3) -> ExtractionResult:
    """
    Progressively enhance prompt on failure
    Start basic → enhanced → chain-of-thought → adaptive
    """
    
    enhancement_levels = ['basic', 'enhanced', 'cot', 'adaptive']
    
    print(f"🎯 Starting progressive enhancement for text excerpt: '{text[:50]}...'")
    
    for attempt, level in enumerate(enhancement_levels[:max_attempts]):
        print(f"\n   Attempt {attempt + 1}: Using '{level}' prompt")
        
        result = adaptive_llm.extract_with_validation(text, level)
        
        print(f"   ⏱️  Time: {result.processing_time:.2f}s, Tokens: {result.tokens_used}")
        
        if result.success:
            print(f"   ✅ Success with '{level}' prompt!")
            print(f"   📊 Confidence: {result.data.confidence:.2f}")
            print(f"   💰 Total amount: {result.data.total_amount} {result.data.currency}")
            return result
        else:
            print(f"   ❌ Failed: {result.error[:100]}...")
    
    print(f"\n💥 All enhancement levels failed after {max_attempts} attempts")
    return result  # Return last attempt

# Test with sample invoice text
sample_invoice = """
INVOICE #INV-2024-001
TechSupplies Inc.
Date: January 15, 2024

Line Items:
1. Laptop Computer - Qty: 2 - Unit Price: $1,200.00 - Total: $2,400.00
2. Software License - Qty: 1 - Unit Price: $500.00 - Total: $500.00

Subtotal: $2,900.00
Tax (8.5%): $246.50
TOTAL: $3,146.50
"""

print("🧪 Testing Progressive Enhancement:")
print("=" * 50)

# Test the enhancement system
enhancement_result = extract_with_enhancement(sample_invoice)

if enhancement_result.success:
    print(f"\n🎉 Final Result:")
    print(f"   Vendor: {enhancement_result.data.vendor_name}")
    print(f"   Invoice #: {enhancement_result.data.invoice_number}")
    print(f"   Items: {len(enhancement_result.data.line_items)}")
    print(f"   Total: {enhancement_result.data.total_amount} {enhancement_result.data.currency}")
else:
    print(f"\n💥 Final failure: {enhancement_result.error}")

## 5. A/B Testing Framework

Test different prompt variants to find the best performing ones.
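
With only a handful of samples per variant, observed success rates are very noisy, so a difference between two variants may not mean much. A rough way to see this is to attach an interval to each rate. The sketch here uses the Wilson score interval with z = 1.96 (about 95%); the function name `wilson_interval` is an assumption, and the counts are illustrative rather than results from this notebook.

```python
import math
from typing import Tuple

def wilson_interval(successes: int, attempts: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a success rate."""
    if attempts == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = successes / attempts
    denom = 1 + z**2 / attempts
    center = (p + z**2 / (2 * attempts)) / denom
    half = z * math.sqrt(p * (1 - p) / attempts + z**2 / (4 * attempts**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

# Same observed rate (2/3) at two very different sample sizes.
for successes, attempts in [(2, 3), (200, 300)]:
    low, high = wilson_interval(successes, attempts)
    print(f"{successes}/{attempts}: {low:.2f} to {high:.2f}")
```

If the intervals of two variants overlap heavily, collecting more samples is usually a better next step than declaring a winner.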

In [None]:
class PromptVariant:
    """A/B testing variant for prompts"""
    
    def __init__(self, name: str, template: PromptTemplate):
        self.name = name
        self.template = template
        self.attempts = 0
        self.successes = 0
        self.total_tokens = 0
        self.total_time = 0
        self.errors = []
    
    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts > 0 else 0
    
    @property
    def avg_tokens(self) -> float:
        return self.total_tokens / self.attempts if self.attempts > 0 else 0
    
    @property
    def avg_time(self) -> float:
        return self.total_time / self.attempts if self.attempts > 0 else 0
    
    def record_result(self, result: ExtractionResult):
        """Record test result"""
        self.attempts += 1
        self.total_tokens += result.tokens_used
        self.total_time += result.processing_time
        
        if result.success:
            self.successes += 1
        else:
            self.errors.append(result.error)

# Create test variants
concise_template = PromptTemplate(
    template="Extract invoice data concisely: {text}",
    input_variables=["text"]
)

detailed_template = PromptTemplate(
    template="""Extract complete invoice information from the following document.
    
    Please provide:
    - Invoice number and vendor name
    - Complete line item details with quantities and prices
    - All financial totals including tax calculations
    - Currency information
    - Your confidence in the extraction accuracy
    
    Document text:
    {text}
    
    Return as valid JSON matching InvoiceData schema.""",
    input_variables=["text"]
)

# Initialize variants
variants = {
    'concise': PromptVariant('concise', concise_template),
    'detailed': PromptVariant('detailed', detailed_template),
    'basic': PromptVariant('basic', basic_prompt),
    'cot': PromptVariant('cot', cot_prompt)
}

def run_ab_test(test_texts: List[str], variants: Dict[str, PromptVariant]) -> Dict[str, Dict]:
    """Run A/B test across multiple prompt variants"""
    
    print(f"🧪 Running A/B test with {len(variants)} variants on {len(test_texts)} samples")
    print("=" * 60)
    
    results = {}
    
    for variant_name, variant in variants.items():
        print(f"\n📊 Testing variant: {variant_name}")
        
        for i, text in enumerate(test_texts):
            print(f"   Sample {i+1}/{len(test_texts)}: ", end="")
            
            # Simulate extraction with this variant
            if variant_name in ['basic', 'cot']:  # Use existing system
                result = adaptive_llm.extract_with_validation(text, variant_name)
            else:  # Simulate for custom variants
                simulated_success = random.random() > 0.3  # ~70% simulated success rate
                result = ExtractionResult(
                    success=simulated_success,
                    tokens_used=random.randint(100, 500),
                    processing_time=random.uniform(0.5, 3.0),
                    prompt_version=variant_name,
                    # Keep the error consistent with the simulated outcome
                    error=None if simulated_success else "Simulated parsing error"
                )
            
            variant.record_result(result)
            print("✅" if result.success else "❌")
        
        # Store results
        results[variant_name] = {
            'success_rate': variant.success_rate,
            'avg_tokens': variant.avg_tokens,
            'avg_time': variant.avg_time,
            'attempts': variant.attempts
        }
        
        print(f"   Results: {variant.success_rate:.1%} success, {variant.avg_tokens:.0f} tokens avg")
    
    return results

# Create test samples
test_samples = [
    sample_invoice,
    "Invoice 123, ABC Corp, $500 total",  # Minimal
    "Complex multi-page invoice with 20 line items...",  # Complex
]

# Run A/B test
ab_results = run_ab_test(test_samples, variants)

print(f"\n📈 A/B Test Summary:")
print("=" * 40)
for variant_name, metrics in ab_results.items():
    print(f"{variant_name:10} | {metrics['success_rate']:.1%} success | {metrics['avg_tokens']:.0f} tokens | {metrics['avg_time']:.2f}s")

## 6. Real-time Performance Dashboard

Visualize the prompt performance metrics collected during the A/B test.

In [None]:
def display_performance_dashboard(variants: Dict[str, PromptVariant]):
    """Display comprehensive performance dashboard"""
    
    # Prepare data for visualization
    variant_names = list(variants.keys())
    success_rates = [v.success_rate for v in variants.values()]
    avg_tokens = [v.avg_tokens for v in variants.values()]
    avg_times = [v.avg_time for v in variants.values()]
    
    # Create subplots
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle('Adaptive Prompt Performance Dashboard', fontsize=16, fontweight='bold')
    
    # 1. Success Rate Comparison
    bars1 = ax1.bar(variant_names, success_rates, color=['#2ecc71', '#3498db', '#e74c3c', '#f39c12'])
    ax1.set_title('Success Rate by Prompt Variant')
    ax1.set_ylabel('Success Rate (0-1)')
    ax1.set_ylim(0, 1)
    
    # Add value labels on bars
    for bar, rate in zip(bars1, success_rates):
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height + 0.01,
                f'{rate:.1%}', ha='center', va='bottom')
    
    # 2. Token Usage Efficiency
    bars2 = ax2.bar(variant_names, avg_tokens, color=['#9b59b6', '#34495e', '#e67e22', '#1abc9c'])
    ax2.set_title('Average Tokens Used')
    ax2.set_ylabel('Tokens')
    
    for bar, tokens in zip(bars2, avg_tokens):
        height = bar.get_height()
        ax2.text(bar.get_x() + bar.get_width()/2., height + 5,
                f'{tokens:.0f}', ha='center', va='bottom')
    
    # 3. Processing Time
    bars3 = ax3.bar(variant_names, avg_times, color=['#e74c3c', '#f39c12', '#2ecc71', '#3498db'])
    ax3.set_title('Average Processing Time')
    ax3.set_ylabel('Seconds')
    
    for bar, time_val in zip(bars3, avg_times):
        height = bar.get_height()
        ax3.text(bar.get_x() + bar.get_width()/2., height + 0.02,
                f'{time_val:.2f}s', ha='center', va='bottom')
    
    # 4. Efficiency Score (success rate per 100 tokens)
    efficiency_scores = [sr / (at/100) if at > 0 else 0 for sr, at in zip(success_rates, avg_tokens)]
    bars4 = ax4.bar(variant_names, efficiency_scores, color=['#95a5a6', '#2c3e50', '#8e44ad', '#27ae60'])
    ax4.set_title('Efficiency Score (Success Rate per 100 Tokens)')
    ax4.set_ylabel('Efficiency')
    
    for bar, eff in zip(bars4, efficiency_scores):
        height = bar.get_height()
        ax4.text(bar.get_x() + bar.get_width()/2., height + 0.01,
                f'{eff:.2f}', ha='center', va='bottom')
    
    plt.tight_layout()
    plt.show()
    
    # Print summary table
    print("\n📊 Performance Summary Table:")
    print("=" * 80)
    print(f"{'Variant':<12} {'Success Rate':<12} {'Avg Tokens':<12} {'Avg Time':<12} {'Efficiency':<12}")
    print("=" * 80)
    
    for i, name in enumerate(variant_names):
        print(f"{name:<12} {success_rates[i]:<12.1%} {avg_tokens[i]:<12.0f} {avg_times[i]:<12.2f} {efficiency_scores[i]:<12.2f}")
    
    # Recommendations
    best_success = variant_names[success_rates.index(max(success_rates))]
    most_efficient = variant_names[efficiency_scores.index(max(efficiency_scores))]
    fastest = variant_names[avg_times.index(min(avg_times))]
    
    print(f"\n🎯 Recommendations:")
    print(f"   Best Success Rate: {best_success} ({max(success_rates):.1%})")
    print(f"   Most Efficient: {most_efficient} ({max(efficiency_scores):.2f})")
    print(f"   Fastest: {fastest} ({min(avg_times):.2f}s)")

# Display the dashboard
display_performance_dashboard(variants)

## 7. Token Usage Optimization

Analyze and optimize prompts for token efficiency.
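
The cost impact of trimming a prompt is simple arithmetic: tokens per request times requests per day times the price per token. A small sketch of that calculation is given here. The function name `daily_cost` is hypothetical, the price and volume match the example values used in the cell below, and the token counts are illustrative assumptions.

```python
def daily_cost(avg_tokens: float, daily_requests: int, price_per_1k: float) -> float:
    """Cost per day = tokens per request x requests per day x price per 1k tokens / 1000."""
    return avg_tokens * daily_requests * price_per_1k / 1000

# Illustrative numbers only; substitute your own prices and volumes.
before = daily_cost(avg_tokens=240, daily_requests=1000, price_per_1k=0.002)
after = daily_cost(avg_tokens=200, daily_requests=1000, price_per_1k=0.002)
print(f"before: ${before:.2f}/day, after: ${after:.2f}/day, saved: ${before - after:.2f}/day")
```

Because the template is only part of each request, shortening it reduces the prompt side of the bill; the document text and the model's reply are unaffected.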

In [None]:
def optimize_prompt_tokens(template: str, results: List[ExtractionResult]) -> str:
    """Optimize prompt for token usage while maintaining performance"""
    
    print(f"🔧 Optimizing prompt template...")
    print(f"   Original length: {len(template)} characters")
    print(f"   Original tokens: ~{len(template.split())} words")
    
    # Calculate current performance
    successful_results = [r for r in results if r.success]
    success_rate = len(successful_results) / len(results) if results else 0
    avg_tokens = sum(r.tokens_used for r in results) / len(results) if results else 0
    
    print(f"   Current success rate: {success_rate:.1%}")
    print(f"   Current avg tokens: {avg_tokens:.0f}")
    
    # Optimization strategies
    optimizations = {
        'remove_redundancy': template.replace('Return valid JSON matching the InvoiceData schema.', 'Return JSON.'),
        'compress_examples': template.replace('Use these examples as guidance:', 'Examples:'),
        'shorten_instructions': template.replace(
            'Pay special attention to:', 'Focus on:'
        ),
        'abbreviate': template.replace('invoice', 'inv').replace('extraction', 'extract')
    }
    
    # Show optimization results
    print(f"\n📈 Optimization Strategies:")
    print("=" * 50)
    
    best_optimization = template  # Fall back to the original when no strategy saves tokens
    best_savings = 0
    
    for strategy, optimized_template in optimizations.items():
        original_tokens = len(template.split())
        optimized_tokens = len(optimized_template.split())
        savings = original_tokens - optimized_tokens
        savings_pct = (savings / original_tokens) * 100
        
        print(f"{strategy:<20} | {optimized_tokens:>3} tokens | {savings:>3} saved ({savings_pct:>5.1f}%)")
        
        if savings > best_savings:
            best_savings = savings
            best_optimization = optimized_template
    
    # Cost analysis
    token_cost_per_1k = 0.002  # Example cost
    daily_requests = 1000  # Example volume
    
    current_daily_cost = (avg_tokens * daily_requests * token_cost_per_1k) / 1000
    optimized_daily_cost = ((avg_tokens - best_savings) * daily_requests * token_cost_per_1k) / 1000
    daily_savings = current_daily_cost - optimized_daily_cost
    
    print(f"\n💰 Cost Impact Analysis:")
    print(f"   Current daily cost: ${current_daily_cost:.2f}")
    print(f"   Optimized daily cost: ${optimized_daily_cost:.2f}")
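    daily_savings_pct = (daily_savings / current_daily_cost) * 100 if current_daily_cost else 0.0  # Guard against empty results
    print(f"   Daily savings: ${daily_savings:.2f} ({daily_savings_pct:.1f}%)")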
    print(f"   Annual savings: ${daily_savings * 365:.2f}")
    
    return best_optimization

# Simulate some results for optimization
mock_results = [
    ExtractionResult(success=True, tokens_used=250, processing_time=1.2, prompt_version='detailed'),
    ExtractionResult(success=True, tokens_used=180, processing_time=0.8, prompt_version='basic'),
    ExtractionResult(success=False, tokens_used=300, processing_time=1.5, prompt_version='cot', error="Parse error"),
    ExtractionResult(success=True, tokens_used=220, processing_time=1.0, prompt_version='enhanced'),
]

# Optimize the detailed template
optimized_template = optimize_prompt_tokens(detailed_template.template, mock_results)

print(f"\n✅ Optimized Template:")
print("=" * 40)
print(optimized_template[:200] + "...")

## 8. Prompt Caching and Thread Safety

Implement caching for frequently used prompts and ensure thread safety.
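
For simple cases, the standard library's `functools.lru_cache` may be enough. It evicts the least recently used entry rather than the oldest, exposes hit and miss counts through `cache_info()`, and its documentation states that the underlying data structure stays coherent under concurrent updates (the wrapped function can still run more than once for the same arguments if two threads miss at the same time). The sketch below assumes plain-string templates in a hypothetical `TEMPLATES` dict rather than the `PromptTemplate` objects used elsewhere in this notebook.

```python
from functools import lru_cache

# Hypothetical plain-string templates keyed by level name.
TEMPLATES = {
    "basic": "Extract invoice data from the following text:\n\n{text}\n\nReturn valid JSON.",
}

@lru_cache(maxsize=50)
def render_prompt(template_key: str, text: str) -> str:
    """Render a template; repeated (template_key, text) pairs are served from cache."""
    return TEMPLATES[template_key].format(text=text)

render_prompt("basic", "Invoice 123, ABC Corp, $500 total")  # first call: cache miss
render_prompt("basic", "Invoice 123, ABC Corp, $500 total")  # same arguments: cache hit
info = render_prompt.cache_info()
print(info.hits, info.misses, info.currsize)
```

Either way, a cache keyed on the full document text only pays off when the same text is rendered more than once.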

In [None]:
import threading
from functools import lru_cache
import hashlib
from typing import Tuple

class ThreadSafePromptCache:
    """Thread-safe prompt cache with performance metrics"""
    
    def __init__(self, max_size: int = 100):
        self.cache = {}
        self.max_size = max_size
        self.hits = 0
        self.misses = 0
        self.lock = threading.RLock()
    
    def _generate_key(self, template_key: str, **kwargs) -> str:
        """Generate cache key from template and parameters"""
        # Create a hash of the template and parameters
        content = f"{template_key}:{sorted(kwargs.items())}"
        return hashlib.md5(content.encode()).hexdigest()
    
    def get_prompt(self, template_key: str, **kwargs) -> Tuple[str, bool]:
        """Get cached prompt or render new one"""
        cache_key = self._generate_key(template_key, **kwargs)
        
        with self.lock:
            if cache_key in self.cache:
                self.hits += 1
                return self.cache[cache_key], True  # (prompt, was_cached)
            
            self.misses += 1
            
            # Render new prompt
            if template_key in PROMPT_LEVELS:
                template = PROMPT_LEVELS[template_key]
                rendered_prompt = template.format(**kwargs)
            else:
                rendered_prompt = f"Unknown template: {template_key}"
            
            # Cache the result
            if len(self.cache) >= self.max_size:
                # Remove oldest entry (simple FIFO)
                oldest_key = next(iter(self.cache))
                del self.cache[oldest_key]
            
            self.cache[cache_key] = rendered_prompt
            return rendered_prompt, False
    
    @property
    def hit_rate(self) -> float:
        """Calculate cache hit rate"""
        total = self.hits + self.misses
        return self.hits / total if total > 0 else 0
    
    def get_stats(self) -> Dict[str, Any]:
        """Get cache statistics"""
        with self.lock:
            return {
                'hits': self.hits,
                'misses': self.misses,
                'hit_rate': self.hit_rate,
                'cache_size': len(self.cache),
                'max_size': self.max_size
            }

# Initialize prompt cache
prompt_cache = ThreadSafePromptCache(max_size=50)

def simulate_concurrent_usage():
    """Simulate multiple threads using the prompt cache"""
    
    def worker_thread(thread_id: int, iterations: int):
        """Simulate a worker thread making prompt requests"""
        for i in range(iterations):
            # Simulate different prompt requests
            templates = ['basic', 'enhanced', 'cot']
            template = random.choice(templates)
            
            if template == 'enhanced':
                prompt, cached = prompt_cache.get_prompt(
                    template, 
                    text=f"invoice text {i}", 
                    examples="example data"
                )
            else:
                prompt, cached = prompt_cache.get_prompt(
                    template, 
                    text=f"invoice text {i}"
                )
            
            # Simulate some processing time
            time.sleep(0.01)
    
    print("🧵 Testing concurrent prompt cache usage...")
    
    # Create and start multiple threads
    threads = []
    for i in range(5):  # 5 concurrent threads
        thread = threading.Thread(target=worker_thread, args=(i, 20))
        threads.append(thread)
        thread.start()
    
    # Wait for all threads to complete
    for thread in threads:
        thread.join()
    
    return prompt_cache.get_stats()

# Test concurrent cache usage
cache_stats = simulate_concurrent_usage()

print(f"\n📊 Cache Performance Results:")
print("=" * 40)
print(f"Total requests: {cache_stats['hits'] + cache_stats['misses']}")
print(f"Cache hits: {cache_stats['hits']}")
print(f"Cache misses: {cache_stats['misses']}")
print(f"Hit rate: {cache_stats['hit_rate']:.1%}")
print(f"Cache utilization: {cache_stats['cache_size']}/{cache_stats['max_size']} slots")

# Performance impact calculation
rendering_time_saved = cache_stats['hits'] * 0.002  # Assume 2ms saved per cache hit
print(f"\n⚡ Performance Impact:")
print(f"Rendering time saved: {rendering_time_saved*1000:.1f}ms")
print(f"Memory usage: ~{cache_stats['cache_size'] * 0.5:.1f}KB")

if cache_stats['hit_rate'] > 0.5:
    print("✅ Cache is effective - high hit rate achieved")
else:
    print("⚠️ Cache hit rate could be improved - consider larger cache size")

## 9. Live Demo: Complete Adaptive System

Demonstrate the full adaptive prompt system working together.

In [None]:
def run_complete_adaptive_demo():
    """Run complete demonstration of adaptive prompt system"""
    
    print("🎬 LIVE DEMO: Complete Adaptive Prompt System")
    print("=" * 60)
    
    # Difficult invoice that will challenge the system
    difficult_invoice = """
    FACTURA #F-2024-0156
    Empresa: Suministros Técnicos S.A.
    Fecha: 15/01/2024
    
    Artículos:
    1) Ordenador portátil - Cant: 2 - Precio unit: 1.200,00€ - Total: 2.400,00€
    2) Licencia software - Cant: 1 - Precio unit: 500,00€ - Total: 500,00€
    
    Subtotal: 2.900,00€
    IVA (21%): 609,00€
    TOTAL: 3.509,00€
    """
    
    print("📄 Processing challenging invoice (Spanish, different format):")
    print(difficult_invoice[:100] + "...")
    
    # Track system adaptation
    start_time = time.time()
    
    print("\n🎯 Adaptive System in Action:")
    print("-" * 40)
    
    # Step 1: Try basic extraction
    print("\n1️⃣ Attempting basic extraction...")
    basic_result = adaptive_llm.extract_with_validation(difficult_invoice, 'basic')
    
    if basic_result.success:
        print("   ✅ Basic prompt succeeded!")
        final_result = basic_result
    else:
        print(f"   ❌ Basic failed: {basic_result.error[:50]}...")
        
        # Step 2: Progressive enhancement
        print("\n2️⃣ Applying progressive enhancement...")
        final_result = extract_with_enhancement(difficult_invoice, max_attempts=2)
    
    # Step 3: Cache the successful pattern
    if final_result.success:
        print("\n3️⃣ Caching successful prompt pattern...")
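        cache_kwargs = {'text': difficult_invoice}
        if final_result.prompt_version == 'enhanced':
            # The enhanced template also requires an 'examples' variable
            cache_kwargs['examples'] = adaptive_llm._get_examples()
        successful_prompt, was_cached = prompt_cache.get_prompt(
            final_result.prompt_version,
            **cache_kwargs
        )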
        print(f"   📦 Prompt cached for future similar invoices")
    
    # Step 4: Display final metrics
    total_time = time.time() - start_time
    
    print(f"\n📊 Final Results:")
    print("=" * 30)
    
    if final_result.success:
        print(f"✅ Extraction successful with '{final_result.prompt_version}' prompt")
        print(f"   Vendor: {final_result.data.vendor_name}")
        print(f"   Invoice #: {final_result.data.invoice_number}")
        print(f"   Total: {final_result.data.total_amount} {final_result.data.currency}")
        print(f"   Confidence: {final_result.data.confidence:.2f}")
        print(f"   Items extracted: {len(final_result.data.line_items)}")
    else:
        print(f"❌ All attempts failed: {final_result.error}")
    
    print(f"\n⚡ Performance Metrics:")
    print(f"   Total processing time: {total_time:.2f}s")
    print(f"   Tokens used: {final_result.tokens_used}")
    print(f"   Cache hit rate: {prompt_cache.hit_rate:.1%}")
    
    # Step 5: Show learning impact
    print(f"\n🧠 System Learning:")
    print(f"   Examples cached: {len(adaptive_llm.examples_cache)}")
    print(f"   Successful patterns identified: {final_result.prompt_version}")
    print(f"   Future similar invoices will use optimized approach")
    
    return final_result

# Run the complete demo
demo_result = run_complete_adaptive_demo()

print(f"\n🎉 Demo complete! Adaptive system {'succeeded' if demo_result.success else 'needs tuning'}")

## Key Learnings

### Adaptive Prompting Strategies

1. **Progressive Enhancement**
   - Start with simple prompts for cost efficiency
   - Escalate complexity only when needed
   - Each level adds specific capabilities

2. **Performance-Driven Optimization**
   - A/B test different prompt variants
   - Track success rates, token usage, and processing time
   - Choose prompts based on efficiency metrics

3. **Learning from Failures**
   - Capture error patterns to improve prompts
   - Use successful extractions as few-shot examples
   - Adapt prompts based on document characteristics

4. **Token Economics**
   - Optimize prompts for token efficiency
   - Cache frequently used prompt patterns
   - Balance verbosity with accuracy needs

5. **Structured Validation**
   - Use Pydantic models to validate output structure
   - Implement retry logic for validation failures
   - Track confidence scores for quality assessment
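
The arithmetic checks that the prompts ask the model to perform (quantity × unit_price = total, subtotal + tax = total_amount) can also be enforced in code after extraction. The following is a minimal sketch on plain dictionaries; the function name `arithmetic_issues` and the 0.01 tolerance are assumptions, and the figures come from the sample invoice used earlier.

```python
import math
from typing import Any, Dict, List

def arithmetic_issues(invoice: Dict[str, Any], tol: float = 0.01) -> List[str]:
    """Return human-readable descriptions of totals that do not add up."""
    issues = []
    for i, item in enumerate(invoice["line_items"]):
        expected = item["quantity"] * item["unit_price"]
        if not math.isclose(expected, item["total"], abs_tol=tol):
            issues.append(f"line_items[{i}].total: expected {expected:.2f}, got {item['total']:.2f}")
    expected_total = invoice["subtotal"] + invoice["tax_amount"]
    if not math.isclose(expected_total, invoice["total_amount"], abs_tol=tol):
        issues.append(f"total_amount: expected {expected_total:.2f}, got {invoice['total_amount']:.2f}")
    return issues

# Values taken from the sample invoice (INV-2024-001).
sample = {
    "line_items": [
        {"quantity": 2, "unit_price": 1200.00, "total": 2400.00},
        {"quantity": 1, "unit_price": 500.00, "total": 500.00},
    ],
    "subtotal": 2900.00,
    "tax_amount": 246.50,
    "total_amount": 3146.50,
}
print(arithmetic_issues(sample))
```

The returned messages name the offending field, which makes them usable as the error context for a follow-up prompt.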

### Production Considerations

- **Thread Safety**: Ensure prompt caching works in concurrent environments
- **Memory Management**: Monitor cache size and prompt complexity
- **Cost Control**: Track token usage and optimize for efficiency
- **Quality Assurance**: Validate outputs and maintain accuracy metrics
- **Continuous Learning**: Update prompts based on real-world performance
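
For quality assurance on documents like the Spanish invoice in the live demo, amounts such as `1.200,00€` can be normalized in code rather than relying on the prompt alone. The helper below is a heuristic sketch, not a complete locale-aware parser; the name `parse_amount` and its rules are assumptions.

```python
import re

def parse_amount(raw: str) -> float:
    """Heuristically parse '$1,200.00', '1.200,00€' or '$500' into a float."""
    cleaned = re.sub(r"[^\d.,-]", "", raw)  # drop currency symbols and spaces
    last_dot, last_comma = cleaned.rfind("."), cleaned.rfind(",")
    if last_dot != -1 and last_comma != -1:
        # Both present: whichever comes last is the decimal separator.
        if last_comma > last_dot:
            cleaned = cleaned.replace(".", "").replace(",", ".")
        else:
            cleaned = cleaned.replace(",", "")
    elif last_comma != -1:
        # Only commas: decimal separator if exactly two digits follow, else thousands.
        if len(cleaned) - last_comma - 1 == 2:
            cleaned = cleaned.replace(",", ".")
        else:
            cleaned = cleaned.replace(",", "")
    # Only dots (or no separator) is passed through unchanged, so an input
    # like '1.200' is read as 1.2; that case is genuinely ambiguous.
    return float(cleaned)

for text in ["$1,200.00", "1.200,00€", "609,00€", "$500"]:
    print(text, "->", parse_amount(text))
```

Inputs that cannot be parsed raise `ValueError`, which can be recorded as an extraction failure like any other validation error.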

### Next Steps

- Implement prompt versioning for A/B testing in production
- Add automatic prompt optimization based on success metrics
- Integrate with document classification for prompt selection
- Build feedback loops for continuous prompt improvement