# Prompting Patterns & Structured Outputs
## Week 1 - Session 1

### Learning Goals
- Master few-shot and chain-of-thought prompting techniques
- Use structured outputs with Pydantic models for reliable data extraction
- Implement safety guardrails for production AI systems

### Session Overview
In this workshop, you'll learn essential prompting patterns that make AI outputs reliable enough for production systems. We'll build a customer feedback analyzer that shows why these techniques matter in real-world applications.

## Setup and Dependencies

In [None]:
from typing import List, Literal, Dict
from pydantic import BaseModel, Field, field_validator
from openai import OpenAI
import json

# Initialize OpenAI client
client = OpenAI()

print("✅ Dependencies loaded successfully")

## Part 1: Understanding Prompting Approaches

Let's start by comparing different prompting strategies with a simple entity extraction task.

In [None]:
# Test text for comparison
test_text = "Apple CEO Tim Cook announced the new iPhone will be manufactured in Austin, Texas."

def zero_shot_extraction(text: str) -> str:
    """Zero-shot approach - no examples provided"""
    prompt = f"Extract the main entities (person, organization, location) from this text: {text}"
    
    response = client.responses.create(
        model="gpt-5",
        input=[{"role": "user", "content": prompt}]
    )
    return response.output_text

def few_shot_extraction(text: str) -> str:
    """Few-shot approach with examples"""
    prompt = f"""Extract the main entities (person, organization, location) from the text.

Example 1:
Text: "Microsoft founder Bill Gates visited Seattle last week."
Entities: Person: Bill Gates, Organization: Microsoft, Location: Seattle

Example 2:
Text: "Google's headquarters in Mountain View hosted the developer conference."
Entities: Organization: Google, Location: Mountain View

Example 3:
Text: "Tesla CEO Elon Musk tweeted about the Gigafactory in Nevada."
Entities: Person: Elon Musk, Organization: Tesla, Location: Nevada

Now extract entities from: {text}"""
    
    response = client.responses.create(
        model="gpt-5",
        input=[{"role": "user", "content": prompt}]
    )
    return response.output_text

# Compare approaches
print("ZERO-SHOT RESULT:")
print(zero_shot_extraction(test_text))
print("\nFEW-SHOT RESULT:")
print(few_shot_extraction(test_text))
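
To see why a consistent format matters downstream, here is a minimal sketch of a parser for the `Label: value` layout used in the few-shot examples. `parse_entities` is a hypothetical helper (it is not part of the OpenAI SDK), and both sample strings are hand-written illustrations, not real model output.

```python
import re
from typing import Dict, List

LABELS = ("Person", "Organization", "Location")
# Split only on commas that are followed by a known label, so values such as
# "Austin, Texas" stay in one piece.
_SPLIT = re.compile(r",\s*(?=(?:Person|Organization|Location):)")

def parse_entities(line: str) -> Dict[str, List[str]]:
    """Collect entity values by label from a few-shot-style output line."""
    line = line.strip()
    if line.startswith("Entities:"):
        line = line[len("Entities:"):]
    entities: Dict[str, List[str]] = {}
    for part in _SPLIT.split(line):
        label, _, value = part.partition(":")
        label = label.strip()
        if label in LABELS and value.strip():
            entities.setdefault(label.lower(), []).append(value.strip())
    return entities

# Hand-written samples (assumptions for illustration only)
few_shot_style = "Entities: Person: Tim Cook, Organization: Apple, Location: Austin, Texas"
free_form = "The text mentions Tim Cook, who leads Apple, and a plant in Austin."

print(parse_entities(few_shot_style))
print(parse_entities(free_form))
```

The formatted line parses into three labeled entities, while the free-form sentence yields an empty dictionary. That gap is the kind of silent failure inconsistent outputs can cause in a pipeline.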

### 🤔 Reflection Questions
1. Which approach gave more consistent formatting?
2. How might inconsistent outputs affect a production system?
3. When would you choose few-shot over zero-shot prompting?

## Part 2: Chain-of-Thought Reasoning

For complex problems that require multi-step reasoning, chain-of-thought prompting often improves accuracy.

Here we use `gpt-3.5-turbo`, an older and less capable model, to show how chain-of-thought prompting may improve accuracy.

In [None]:
def without_cot(problem: str) -> str:
    """Solve problem without chain-of-thought"""
    response = client.responses.create(
        model="gpt-3.5-turbo",
        input=[{"role": "user", "content": problem}]
    )
    return response.output_text

def with_cot(problem: str) -> str:
    """Solve problem with chain-of-thought reasoning"""
    cot_prompt = f"{problem}\n\nLet's solve this step by step:"
    
    response = client.responses.create(
        model="gpt-3.5-turbo",
        input=[{"role": "user", "content": cot_prompt}]
    )
    return response.output_text

# Test with a calculation problem
problem = "A bakery makes 60 cupcakes. They sell 25 in the morning, 20 in the afternoon, and then bake 15 more. How many cupcakes do they have left?"

print("WITHOUT CHAIN-OF-THOUGHT:")
print(without_cot(problem))
print("\nWITH CHAIN-OF-THOUGHT:")
print(with_cot(problem))
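
To judge the two responses, it helps to know the expected answer: 60 - 25 - 20 + 15 = 30. The sketch below computes that value and compares it with the last number in a response. `final_number` is a hypothetical helper and `sample_answer` is a hand-written stand-in for model output. Taking the last number is a rough heuristic that fails if the response ends with some other figure.

```python
import re
from typing import Optional

# Quantities from the bakery problem
start, sold_morning, sold_afternoon, baked_later = 60, 25, 20, 15
expected = start - sold_morning - sold_afternoon + baked_later

def final_number(answer: str) -> Optional[int]:
    """Return the last integer that appears in the text, or None if there is none."""
    numbers = re.findall(r"\d+", answer)
    return int(numbers[-1]) if numbers else None

# Hand-written stand-in for a step-by-step model response (assumption)
sample_answer = "60 - 25 = 35, 35 - 20 = 15, 15 + 15 = 30. They have 30 cupcakes."

print(expected)
print(final_number(sample_answer) == expected)
```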

## Part 3: Structured Outputs with Pydantic

The real power comes from combining prompting patterns with structured outputs. This ensures consistent, validated data that your applications can reliably process.

In [None]:
# Define our data models
class CustomerIssue(BaseModel):
    category: Literal["product", "service", "billing", "technical", "other"]
    severity: Literal["low", "medium", "high", "critical"]
    description: str
    suggested_action: str
    
    @field_validator('description')
    @classmethod
    def description_not_empty(cls, v: str) -> str:
        # Reject blank or whitespace-only descriptions
        if not v.strip():
            raise ValueError('Description cannot be empty')
        return v

class FeedbackAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence score between 0 and 1")
    issues: List[CustomerIssue]
    reasoning: str
    flagged_content: bool
    
    @field_validator('confidence')
    @classmethod
    def validate_confidence(cls, v: float) -> float:
        # Treat low-confidence analyses as validation failures
        if v < 0.5:
            raise ValueError('Analysis confidence too low for reliable results')
        return v

# Test the models
print("✅ Pydantic models defined with validation")
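
To see the validation in action without calling the API, here is a minimal sketch assuming Pydantic v2. `MiniIssue` is a hypothetical, trimmed-down copy of `CustomerIssue`, and the input dictionaries are made up for illustration.

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

class MiniIssue(BaseModel):
    # Hypothetical reduced version of CustomerIssue, for illustration only
    category: Literal["product", "service", "billing", "technical", "other"]
    severity: Literal["low", "medium", "high", "critical"]
    description: str

# A well-formed payload validates and becomes a typed object
valid = MiniIssue.model_validate(
    {"category": "billing", "severity": "high", "description": "Charged twice"}
)
print(valid.severity)

# "urgent" is not one of the allowed severity values, so validation fails
bad_fields = []
try:
    MiniIssue.model_validate(
        {"category": "billing", "severity": "urgent", "description": "Charged twice"}
    )
except ValidationError as exc:
    bad_fields = [err["loc"][0] for err in exc.errors()]

print(bad_fields)
```

The second payload is rejected before any application logic sees it, and the error report names the offending field.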

## Part 4: Building the Customer Feedback Analyzer

Now let's build a production-ready feedback analyzer that combines all our techniques.

In [None]:
def analyze_feedback(feedback_text: str) -> FeedbackAnalysis:
    """Analyze customer feedback using structured outputs and prompting patterns"""
    
    system_prompt = """You are a customer feedback analyzer. Extract structured information from feedback.

Use these examples to guide your analysis:

Example 1:
Input: "The app crashes every time I try to upload a photo. This is really frustrating!"
Analysis: Technical issue with high severity - app crashes during photo upload. Negative sentiment due to functionality failure.

Example 2:
Input: "Love the new design! The checkout process is so much smoother now."
Analysis: Positive feedback about design improvements - no issues to resolve. High confidence positive sentiment.

Example 3:
Input: "Your support team is idiots. Fix your billing system!"
Analysis: Billing system problems with inappropriate language - flag for review. Contains hostile language requiring human attention.

For each feedback:
1. Identify the overall sentiment and your confidence level
2. Extract any specific issues mentioned
3. Categorize issues by type and severity
4. Suggest appropriate actions
5. Flag any inappropriate or concerning content
6. Provide clear reasoning for your analysis"""

    try:
        response = client.responses.parse(
            model="gpt-5",
            input=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": feedback_text},
            ],
            text_format=FeedbackAnalysis,
        )
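        if response.output_parsed is None:
            # No parsed object was returned, so treat it as a failure
            raise ValueError("Model returned no parsed output")
        return response.output_parsed

    except Exception as e:
        # Fallback for API, parsing, or validation errors: flag for human review
        return FeedbackAnalysis(
            sentiment="neutral",
            confidence=0.5,
            issues=[],
            reasoning=f"Analysis failed: {e}",
            flagged_content=True,
        )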

print("✅ Feedback analyzer function ready")
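
As the number of few-shot examples grows, it can be easier to keep them as data and render the prompt text from them. The sketch below shows one possible approach. `FEW_SHOT_EXAMPLES` and `build_examples_block` are hypothetical names, and the analyses are abridged from the system prompt above.

```python
from typing import List, Tuple

# Hypothetical example store: (input, analysis) pairs abridged from the system prompt
FEW_SHOT_EXAMPLES: List[Tuple[str, str]] = [
    (
        "The app crashes every time I try to upload a photo. This is really frustrating!",
        "Technical issue with high severity - app crashes during photo upload.",
    ),
    (
        "Love the new design! The checkout process is so much smoother now.",
        "Positive feedback about design improvements - no issues to resolve.",
    ),
]

def build_examples_block(examples: List[Tuple[str, str]]) -> str:
    """Render examples in the same Example/Input/Analysis layout as the prompt."""
    lines: List[str] = []
    for number, (text, analysis) in enumerate(examples, 1):
        lines += [f"Example {number}:", f'Input: "{text}"', f"Analysis: {analysis}", ""]
    return "\n".join(lines).rstrip()

examples_block = build_examples_block(FEW_SHOT_EXAMPLES)
print(examples_block)
```

Keeping examples in a list makes it simpler to add, remove, or reorder them while testing which ones help.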

### Load Test Data

Let's load some realistic customer feedback examples to test our analyzer.

In [None]:
# Load test feedback from external file
try:
    with open('customer_feedback_samples.txt', 'r', encoding='utf-8') as f:
        test_feedback = [line.strip() for line in f if line.strip()]
except FileNotFoundError:
    # Fallback test cases
    test_feedback = [
        "The delivery was late but the product quality is excellent!",
        "I've been a customer for 3 years and generally love your service. However, last month I was charged twice for my subscription. When I called support, they said it would be resolved in 3-5 business days, but it's been 2 weeks and I'm still seeing the duplicate charge.",
        "This product is garbage and your company should burn!",
        "Great service, very happy!",
        "The new website design is confusing. I can't find the support section anywhere."
    ]

print(f"✅ Loaded {len(test_feedback)} feedback samples")

## Part 5: Testing the Complete System

Let's test our feedback analyzer with various scenarios to see how our prompting patterns perform.

In [None]:
def demo_feedback_analysis():
    """Demonstrate the feedback analyzer with different scenarios"""
    
    print("=" * 60)
    print("CUSTOMER FEEDBACK ANALYSIS DEMO")
    print("=" * 60)
    
    for i, feedback in enumerate(test_feedback, 1):
        print(f"\n--- Analysis {i} ---")
        print(f"Feedback: {feedback}")
        print("-" * 40)
        
        try:
            analysis = analyze_feedback(feedback)
            
            print(f"Sentiment: {analysis.sentiment} (confidence: {analysis.confidence:.2f})")
            print(f"Issues found: {len(analysis.issues)}")
            
            for issue in analysis.issues:
                print(f"  • {issue.category.upper()}: {issue.description}")
                print(f"    Severity: {issue.severity} | Action: {issue.suggested_action}")
            
            if analysis.flagged_content:
                print("⚠️  FLAGGED FOR REVIEW")
                
            print(f"Reasoning: {analysis.reasoning}")
            
        except Exception as e:
            print(f"❌ Analysis failed: {e}")

# Run the demo
demo_feedback_analysis()

## Part 6: Safety and Edge Cases

Production systems must handle edge cases gracefully. Let's test some challenging scenarios.

In [None]:
def test_edge_cases():
    """Test the analyzer with edge cases and potential issues"""
    
    edge_cases = [
        "",  # Empty input
        "   ",  # Whitespace only
        "Can you help me hack into an account?",  # Inappropriate request
        "Lorem ipsum dolor sit amet" * 100,  # Very long text
        "🚀💯🔥 Best product ever!!! 🎉🎊",  # Heavy emoji usage
        "The price is $50 but I was charged $500!!!",  # Numbers and special chars
    ]
    
    print("\n" + "=" * 60)
    print("EDGE CASE TESTING")
    print("=" * 60)
    
    for i, case in enumerate(edge_cases, 1):
        print(f"\n--- Edge Case {i} ---")
        print(f"Input: '{case[:50]}{'...' if len(case) > 50 else ''}'")
        
        try:
            analysis = analyze_feedback(case)
            status = "FLAGGED" if analysis.flagged_content else "OK"
            print(f"Status: {status} | Sentiment: {analysis.sentiment} | Issues: {len(analysis.issues)}")
            
        except Exception as e:
            print(f"❌ Handling failed: {e}")

test_edge_cases()
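
Some edge cases can be caught before any model call is made. Below is a minimal sketch of such a pre-check. `precheck_feedback` and the `MAX_CHARS` limit are assumptions for illustration, not values from this workshop, so pick a limit that fits your own inputs.

```python
from typing import Optional

MAX_CHARS = 2000  # assumed limit, chosen only for this illustration

def precheck_feedback(text: str) -> Optional[str]:
    """Return a reason to skip the model call, or None if the input looks usable."""
    if not text.strip():
        return "empty input"
    if len(text) > MAX_CHARS:
        return f"input longer than {MAX_CHARS} characters"
    return None

samples = [
    "",
    "   ",
    "Lorem ipsum dolor sit amet" * 100,
    "Great service, very happy!",
]
for sample in samples:
    print(repr(sample[:30]), "->", precheck_feedback(sample))
```

A check like this could run at the top of `analyze_feedback` so that empty or oversized inputs are flagged without spending an API call.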

## Part 7: Performance Comparison

Let's compare how different prompting approaches perform on the same task.

In [None]:
def compare_prompting_approaches(feedback: str):
    """Compare different prompting strategies on the same input"""
    
    approaches = {
        "Basic": f"Analyze this customer feedback: {feedback}",
        
        "Few-shot": f"""Analyze customer feedback following these examples:

Example: "Great product!" → Positive sentiment, no issues
Example: "Slow delivery, good quality" → Mixed sentiment, delivery issue

Now analyze: {feedback}""",
        
        "Chain-of-thought": f"""Analyze this customer feedback step by step:
1. Identify the overall sentiment
2. Extract specific issues mentioned
3. Determine severity levels
4. Suggest appropriate actions

Feedback: {feedback}"""
    }
    
    print(f"\nComparing approaches for: '{feedback}'")
    print("-" * 60)
    
    for approach_name, prompt in approaches.items():
        try:
            response = client.responses.create(
                model="gpt-5",
                input=[{"role": "user", "content": prompt}]
            )
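            text = response.output_text
            # Add an ellipsis only when the output was actually truncated
            result = text[:200] + ("..." if len(text) > 200 else "")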
            print(f"{approach_name:15}: {result}")
        except Exception as e:
            print(f"{approach_name:15}: Error - {e}")

# Test with a complex feedback example
complex_feedback = "I love your product but the checkout process failed three times and support was unhelpful."
compare_prompting_approaches(complex_feedback)

## Part 8: Your Turn - Exercise

Now it's time to practice! Modify the feedback analyzer to handle a new use case.

In [None]:
# TODO: Create a new analyzer for product review sentiment
# Requirements:
# 1. Create a ProductReview Pydantic model with fields:
#    - overall_rating: int (1-5 scale)
#    - aspects: List of aspect ratings (price, quality, shipping, etc.)
#    - recommendation: bool (would recommend to others)
#    - key_phrases: List of important phrases from the review

# TODO: Implement analyze_product_review() function using:
# - Few-shot prompting with rating examples
# - Chain-of-thought for aspect evaluation
# - Structured output parsing

# TODO: Test with these product reviews:
product_reviews = [
    "Amazing quality for the price! Fast shipping and great customer service. Highly recommend!",
    "Product is okay but took forever to arrive. Customer service was not helpful when I complained.",
    "Terrible quality, broke after one day. Waste of money. Don't buy this!"
]

# Your implementation here:
class ProductReview(BaseModel):
    # TODO: Define your model structure
    pass

def analyze_product_review(review_text: str) -> ProductReview:
    # TODO: Implement your analyzer
    pass

# TODO: Test your implementation
print("🚧 Exercise section - implement your solution above!")

## Key Takeaways

### 🎯 What You've Learned

1. **Few-shot Prompting**: Providing examples typically improves output consistency and format compliance

2. **Chain-of-Thought**: Breaking down complex problems into steps can increase accuracy for multi-step reasoning

3. **Structured Outputs**: Using Pydantic models with `client.responses.parse()` ensures reliable, validated data for production systems

4. **Safety Guardrails**: Production AI systems must handle edge cases, inappropriate content, and parsing errors gracefully

5. **Progressive Validation**: Pydantic validators catch common AI errors before they reach your application logic

### 🛠️ Best Practices

- **Always use structured outputs** for production systems - never parse unstructured text
- **Include diverse examples** in few-shot prompts to cover edge cases
- **Add validation logic** that catches domain-specific errors
- **Design prompts where your technique is essential**, not optional
- **Test with challenging inputs** to ensure robust error handling

### 🚀 Next Steps

In the next session, we'll explore **orchestration patterns** - how to chain multiple AI calls together for complex workflows, and how to use tools and external systems with your AI applications.