# Structured Output with LangChain and OpenAI

## Tutorial Overview

This tutorial covers **structured output** in LangChain: a feature that returns predictable, validated LLM responses in specific formats such as JSON objects, Pydantic models, or typed dictionaries.

### What You'll Learn:
1. What is structured output and why it matters
2. Different schema types (Pydantic, TypedDict, JSON Schema)
3. Using `with_structured_output()` method
4. Real-world examples and use cases
5. Validation and error handling
6. Advanced patterns and best practices

---

## 1. Setup and Installation

First, let's install the required packages and set up our environment.

In [None]:
# Install required packages
# !pip install langchain langchain-openai python-dotenv pydantic

In [None]:
# Import necessary libraries
import os
from dotenv import load_dotenv
from typing import Optional, List
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

# Load environment variables
load_dotenv()

# Verify API key is loaded
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not found in environment variables")

print("✅ Environment setup complete!")

---

## 2. What is Structured Output?

### The Problem with Unstructured Responses

Traditional LLM responses are unstructured text that requires parsing:
- **Unpredictable format**: Responses vary in structure
- **Difficult to parse**: Requires complex regex or string manipulation
- **Error-prone**: Parsing can fail unexpectedly
- **No validation**: No guarantee the data is in the expected format
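
To make the fragility concrete, here is a small sketch that uses only the standard library. The helper `parse_person` and both sample replies are made up for illustration; they are not output from a real model.

```python
import re
from typing import Optional

# Hypothetical pattern: expects the exact layout "Name: ..., Age: ...".
PATTERN = re.compile(r"Name:\s*(.+?),\s*Age:\s*(\d+)")

def parse_person(text: str) -> Optional[dict]:
    """Return {'name': ..., 'age': ...}, or None when the text does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return {"name": match.group(1), "age": int(match.group(2))}

# Two made-up replies that carry the same facts in different wording.
reply_a = "Name: John Doe, Age: 30"
reply_b = "John Doe is a 30-year-old engineer."

print(parse_person(reply_a))  # matches the expected layout
print(parse_person(reply_b))  # same facts, different wording: no match
```

The second reply contains the same information, yet the parser returns `None` because the wording changed.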

### The Solution: Structured Output

Structured output gives you:
- ✅ **Predictable data formats** (JSON, Pydantic models, typed dictionaries)
- ✅ **Automatic validation** of response data
- ✅ **Type safety** with Python type hints
- ✅ **Easy integration** with your application logic
- ✅ **Nested structures** for complex data

### Key Differences from Tools

| Feature | Structured Output | Tools |
|---------|------------------|-------|
| Response guarantee | Always responds in specified format | May or may not call a tool |
| Number of responses | Single response | Can call multiple tools |
| Use case | Data extraction, classification | Function execution, actions |

---

## 3. Basic Example: Getting Started

Let's start with a simple example to extract person information.

In [None]:
# Initialize the ChatOpenAI model
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0  # Lower temperature for more consistent outputs
)

print("✅ Model initialized successfully!")

In [None]:
# Define a simple Pydantic model for person information
class Person(BaseModel):
    """Information about a person."""
    name: str = Field(description="The person's full name")
    age: int = Field(description="The person's age in years")
    email: Optional[str] = Field(default=None, description="The person's email address")

# Create a model with structured output
structured_llm = llm.with_structured_output(Person)

# Test it with a query
response = structured_llm.invoke("My name is John Doe and I'm 30 years old. My email is john@example.com")

print("Response type:", type(response))
print("\nStructured Output:")
print(f"Name: {response.name}")
print(f"Age: {response.age}")
print(f"Email: {response.email}")
print("\nFull object:", response)

---

## 4. Schema Types

LangChain supports three main schema types for structured output:

### 4.1 Pydantic Models (Recommended)

**Advantages:**
- Rich feature set with field validation
- Detailed field descriptions
- Nested structures support
- Automatic type conversion
- Runtime validation
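
The last two points, type conversion and runtime validation, can be tried without calling a model. The sketch below assumes Pydantic v2 (`model_validate`); `PersonRecord` is a hypothetical model used only for this illustration.

```python
from pydantic import BaseModel, Field, ValidationError

class PersonRecord(BaseModel):
    """Hypothetical model used only for this illustration."""
    name: str
    age: int = Field(ge=0)

# Automatic type conversion: the numeric string "30" is coerced to an int.
ok = PersonRecord.model_validate({"name": "John Doe", "age": "30"})
print(type(ok.age).__name__, ok.age)

# Runtime validation: a non-numeric age is rejected with a ValidationError.
try:
    PersonRecord.model_validate({"name": "John Doe", "age": "thirty"})
except ValidationError as exc:
    print("rejected field:", exc.errors()[0]["loc"])
```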

In [None]:
# Example: Movie Information Extraction
class Actor(BaseModel):
    """Information about an actor."""
    name: str = Field(description="Actor's full name")
    role: str = Field(description="Character name they played")

class MovieDetails(BaseModel):
    """Detailed information about a movie."""
    title: str = Field(description="Movie title")
    year: int = Field(description="Release year")
    cast: List[Actor] = Field(description="List of main actors")
    genres: List[str] = Field(description="Movie genres")
    budget: Optional[float] = Field(default=None, description="Budget in millions USD")
    rating: Optional[float] = Field(default=None, description="IMDb rating out of 10")

# Create structured model
movie_extractor = llm.with_structured_output(MovieDetails)

# Extract movie information
movie_info = movie_extractor.invoke(
    "Tell me about The Matrix from 1999. It starred Keanu Reeves as Neo and "
    "Laurence Fishburne as Morpheus. It's a sci-fi action movie with a budget of 63 million dollars "
    "and has an IMDb rating of 8.7."
)

print("Movie Information Extracted:")
print(f"Title: {movie_info.title}")
print(f"Year: {movie_info.year}")
print(f"Genres: {', '.join(movie_info.genres)}")
print(f"Budget: ${movie_info.budget}M")
print(f"Rating: {movie_info.rating}/10")
print("\nCast:")
for actor in movie_info.cast:
    print(f"  - {actor.name} as {actor.role}")

### 4.2 TypedDict (Simpler Alternative)

**Use when:**
- You don't need runtime validation
- You want a lighter-weight solution
- Performance is critical

In [None]:
from typing import TypedDict

class ProductInfo(TypedDict):
    """Product information."""
    name: str
    price: float
    category: str
    in_stock: bool

# Note: TypedDict support may vary by provider
# For maximum compatibility, use Pydantic models
print("TypedDict schema defined (use Pydantic for better support)")
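
The trade-off behind "no runtime validation" can be seen with the standard library alone. This is a minimal sketch; `ProductRecord` is a hypothetical schema that mirrors some of the fields above.

```python
from typing import TypedDict

class ProductRecord(TypedDict):
    """Hypothetical schema mirroring the fields above."""
    name: str
    price: float
    in_stock: bool

# The price is a string here, which violates the annotation, yet nothing
# fails at runtime: only a static type checker would flag this line.
record: ProductRecord = {"name": "Laptop", "price": "cheap", "in_stock": True}

print(type(record).__name__)  # a TypedDict value is a plain dict at runtime
print(record["price"])        # access by key, not by attribute
```

If bad values must be caught when the response arrives, a Pydantic model is the safer choice.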

### 4.3 JSON Schema (Maximum Control)

**Use when:**
- You need maximum control over the schema
- You're integrating with external systems
- You need cross-language compatibility

In [None]:
# JSON Schema example
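json_schema = {
    # A top-level "title" lets LangChain name the schema when this dict is
    # passed to with_structured_output(); the name here is illustrative
    "title": "SentimentResult",
    "description": "Sentiment classification with a confidence score",
    "type": "object",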
    "properties": {
        "sentiment": {
            "type": "string",
            "enum": ["positive", "negative", "neutral"],
            "description": "The sentiment of the text"
        },
        "confidence": {
            "type": "number",
            "minimum": 0,
            "maximum": 1,
            "description": "Confidence score between 0 and 1"
        }
    },
    "required": ["sentiment", "confidence"]
}

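# Note: pass the dict to llm.with_structured_output(json_schema, method="json_schema");
# with a dict schema the result is a plain dict rather than a model instance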
print("JSON Schema defined")
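
To show what the schema above actually constrains, here is a hand-rolled checker. It is a sketch only: `find_violations` is a hypothetical helper that understands just the keywords used in this schema (`required`, `enum`, `minimum`, `maximum`) and is not a full JSON Schema validator.

```python
from typing import List

# Same constraints as the schema above, repeated so the snippet runs on its own.
schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["sentiment", "confidence"],
}

def find_violations(payload: dict, schema: dict) -> List[str]:
    """Return human-readable problems; an empty list means the payload passes."""
    problems = []
    for key in schema.get("required", []):
        if key not in payload:
            problems.append(f"missing required field: {key}")
    for key, rules in schema.get("properties", {}).items():
        if key not in payload:
            continue
        value = payload[key]
        if "enum" in rules and value not in rules["enum"]:
            problems.append(f"{key}: {value!r} is not an allowed value")
        if isinstance(value, (int, float)):
            if "minimum" in rules and value < rules["minimum"]:
                problems.append(f"{key}: {value} is below {rules['minimum']}")
            if "maximum" in rules and value > rules["maximum"]:
                problems.append(f"{key}: {value} is above {rules['maximum']}")
    return problems

print(find_violations({"sentiment": "positive", "confidence": 0.92}, schema))
print(find_violations({"sentiment": "happy", "confidence": 1.5}, schema))
```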

---

## 5. Real-World Use Cases

### 5.1 Sentiment Analysis

In [None]:
from typing import Literal

class SentimentAnalysis(BaseModel):
    """Sentiment analysis result."""
    sentiment: Literal["positive", "negative", "neutral"] = Field(
        description="Overall sentiment of the text"
    )
    confidence: float = Field(
        description="Confidence score between 0 and 1",
        ge=0.0,
        le=1.0
    )
    key_phrases: List[str] = Field(
        description="Key phrases that influenced the sentiment"
    )
    emotions: List[str] = Field(
        description="Detected emotions (e.g., joy, anger, sadness)"
    )

# Create sentiment analyzer
sentiment_analyzer = llm.with_structured_output(SentimentAnalysis)

# Analyze different texts
texts = [
    "I absolutely love this product! It exceeded all my expectations and the customer service was amazing!",
    "This is the worst experience I've ever had. Complete waste of money and time.",
    "The product is okay. Nothing special, but it does what it's supposed to do."
]

print("Sentiment Analysis Results:\n")
for i, text in enumerate(texts, 1):
    result = sentiment_analyzer.invoke(f"Analyze the sentiment: {text}")
    print(f"Text {i}: {text[:50]}...")
    print(f"  Sentiment: {result.sentiment.upper()}")
    print(f"  Confidence: {result.confidence:.2%}")
    print(f"  Emotions: {', '.join(result.emotions)}")
    print(f"  Key Phrases: {', '.join(result.key_phrases)}")
    print()

### 5.2 Data Extraction from Unstructured Text

In [None]:
class ContactInformation(BaseModel):
    """Contact information extracted from text."""
    full_name: str = Field(description="Person's full name")
    phone: Optional[str] = Field(default=None, description="Phone number")
    email: Optional[str] = Field(default=None, description="Email address")
    company: Optional[str] = Field(default=None, description="Company name")
    job_title: Optional[str] = Field(default=None, description="Job title")
    address: Optional[str] = Field(default=None, description="Physical address")

# Create contact extractor
contact_extractor = llm.with_structured_output(ContactInformation)

# Extract from business card text
business_card = """
Dr. Sarah Johnson
Chief Technology Officer
TechCorp Solutions Inc.
Email: sarah.johnson@techcorp.com
Phone: +1 (555) 123-4567
123 Innovation Drive, Silicon Valley, CA 94025
"""

contact = contact_extractor.invoke(f"Extract contact information from: {business_card}")

print("Extracted Contact Information:")
print(f"Name: {contact.full_name}")
print(f"Title: {contact.job_title}")
print(f"Company: {contact.company}")
print(f"Email: {contact.email}")
print(f"Phone: {contact.phone}")
print(f"Address: {contact.address}")

### 5.3 Content Classification and Tagging

In [None]:
class ArticleClassification(BaseModel):
    """Article classification and metadata."""
    title: str = Field(description="Suggested article title")
    category: Literal["Technology", "Business", "Science", "Health", "Entertainment", "Sports", "Politics"] = Field(
        description="Primary category"
    )
    tags: List[str] = Field(description="Relevant tags (3-5 tags)")
    summary: str = Field(description="Brief summary in 1-2 sentences")
    reading_time_minutes: int = Field(description="Estimated reading time in minutes")
    target_audience: str = Field(description="Target audience description")

# Create article classifier
article_classifier = llm.with_structured_output(ArticleClassification)

# Classify an article
article_text = """
Artificial Intelligence is revolutionizing healthcare by enabling early disease detection, 
personalized treatment plans, and drug discovery. Machine learning algorithms can now analyze 
medical images with accuracy comparable to expert radiologists. Recent studies show that AI-powered 
diagnostic tools have reduced diagnosis time by 40% while improving accuracy. Major hospitals are 
implementing AI systems to predict patient deterioration and optimize resource allocation. However, 
challenges remain in data privacy, algorithmic bias, and regulatory approval.
"""

classification = article_classifier.invoke(f"Classify this article: {article_text}")

print("Article Classification:")
print(f"Title: {classification.title}")
print(f"Category: {classification.category}")
print(f"Tags: {', '.join(classification.tags)}")
print(f"Summary: {classification.summary}")
print(f"Reading Time: {classification.reading_time_minutes} minutes")
print(f"Target Audience: {classification.target_audience}")

### 5.4 E-commerce Product Analysis

In [None]:
class ProductReview(BaseModel):
    """Structured product review analysis."""
    overall_rating: int = Field(description="Overall rating from 1-5 stars", ge=1, le=5)
    pros: List[str] = Field(description="Positive aspects mentioned")
    cons: List[str] = Field(description="Negative aspects mentioned")
    would_recommend: bool = Field(description="Whether the reviewer would recommend the product")
    quality_score: int = Field(description="Product quality score 1-10", ge=1, le=10)
    value_for_money: int = Field(description="Value for money score 1-10", ge=1, le=10)
    key_features: List[str] = Field(description="Key features highlighted")

# Create review analyzer
review_analyzer = llm.with_structured_output(ProductReview)

# Analyze a product review
review_text = """
I've been using this laptop for 3 months now and I'm really impressed! The battery life is 
exceptional - easily lasts 12 hours on a single charge. The display is crisp and vibrant, 
perfect for photo editing. The build quality feels premium with the aluminum chassis. 
However, it does get a bit warm during intensive tasks, and the price is quite steep. 
The keyboard is comfortable for long typing sessions. Overall, despite the high price, 
I would definitely recommend this to professionals who need reliability and performance.
"""

review_analysis = review_analyzer.invoke(f"Analyze this product review: {review_text}")

print("Product Review Analysis:")
print(f"Overall Rating: {'⭐' * review_analysis.overall_rating}")
print(f"Quality Score: {review_analysis.quality_score}/10")
print(f"Value for Money: {review_analysis.value_for_money}/10")
print(f"Would Recommend: {'Yes ✅' if review_analysis.would_recommend else 'No ❌'}")
print("\nPros:")
for pro in review_analysis.pros:
    print(f"  ✓ {pro}")
print("\nCons:")
for con in review_analysis.cons:
    print(f"  ✗ {con}")
print("\nKey Features:")
for feature in review_analysis.key_features:
    print(f"  • {feature}")

### 5.5 Resume/CV Parsing

In [None]:
class Education(BaseModel):
    """Education details."""
    degree: str = Field(description="Degree name")
    institution: str = Field(description="Educational institution")
    year: Optional[int] = Field(default=None, description="Graduation year")
    field: Optional[str] = Field(default=None, description="Field of study")

class WorkExperience(BaseModel):
    """Work experience details."""
    job_title: str = Field(description="Job title")
    company: str = Field(description="Company name")
    duration: str = Field(description="Duration of employment")
    responsibilities: List[str] = Field(description="Key responsibilities")

class ResumeData(BaseModel):
    """Structured resume data."""
    name: str = Field(description="Candidate's full name")
    email: Optional[str] = Field(default=None, description="Email address")
    phone: Optional[str] = Field(default=None, description="Phone number")
    skills: List[str] = Field(description="Technical and soft skills")
    education: List[Education] = Field(description="Educational background")
    experience: List[WorkExperience] = Field(description="Work experience")
    summary: str = Field(description="Professional summary")

# Create resume parser
resume_parser = llm.with_structured_output(ResumeData)

# Parse a resume
resume_text = """
ALEX MARTINEZ
Email: alex.martinez@email.com | Phone: (555) 987-6543

PROFESSIONAL SUMMARY
Senior Software Engineer with 8+ years of experience in full-stack development, 
specializing in cloud-native applications and microservices architecture.

SKILLS
Python, JavaScript, React, Node.js, AWS, Docker, Kubernetes, PostgreSQL, MongoDB, CI/CD

EXPERIENCE
Senior Software Engineer - CloudTech Inc. (2020-Present)
- Led development of microservices architecture serving 1M+ users
- Implemented CI/CD pipelines reducing deployment time by 60%
- Mentored team of 5 junior developers

Software Engineer - StartupXYZ (2016-2020)
- Built RESTful APIs using Node.js and Express
- Developed responsive web applications with React
- Optimized database queries improving performance by 40%

EDUCATION
Bachelor of Science in Computer Science - Tech University (2016)
Master of Science in Software Engineering - Innovation Institute (2018)
"""

resume_data = resume_parser.invoke(f"Parse this resume: {resume_text}")

print("Parsed Resume Data:")
print(f"\nName: {resume_data.name}")
print(f"Email: {resume_data.email}")
print(f"Phone: {resume_data.phone}")
print(f"\nSummary: {resume_data.summary}")
print(f"\nSkills: {', '.join(resume_data.skills)}")
print("\nEducation:")
for edu in resume_data.education:
    print(f"  • {edu.degree} in {edu.field} - {edu.institution} ({edu.year})")
print("\nExperience:")
for exp in resume_data.experience:
    print(f"  • {exp.job_title} at {exp.company} ({exp.duration})")
    for resp in exp.responsibilities:
        print(f"    - {resp}")

---

## 6. Advanced Features

### 6.1 Including Raw Response

Sometimes you need both the structured output AND the raw AI message (for metadata like token counts).

In [None]:
class SimpleQuery(BaseModel):
    """A simple query response."""
    answer: str = Field(description="The answer to the question")
    confidence: float = Field(description="Confidence level 0-1")

# Create model with include_raw=True
model_with_raw = llm.with_structured_output(SimpleQuery, include_raw=True)

# Invoke and get both structured output and raw message
result = model_with_raw.invoke("What is the capital of France?")

print("Structured Output:")
print(f"  Answer: {result['parsed'].answer}")
print(f"  Confidence: {result['parsed'].confidence}")
print("\nRaw Message Metadata:")
print(f"  Type: {type(result['raw'])}")
print(f"  Content: {result['raw'].content[:100]}...")
if hasattr(result['raw'], 'usage_metadata'):
    print(f"  Usage: {result['raw'].usage_metadata}")
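
With `include_raw=True` the result is a dictionary with the keys `raw`, `parsed`, and `parsing_error`, so it is worth checking for a parsing error before using `parsed`. The sketch below uses stand-in dictionaries instead of a real API call, and `unwrap_parsed` is a hypothetical helper name.

```python
from typing import Any

def unwrap_parsed(result: dict) -> Any:
    """Hypothetical helper: return the parsed object or raise the stored error."""
    error = result.get("parsing_error")
    if error is not None:
        raise ValueError(f"structured output could not be parsed: {error}")
    return result["parsed"]

# Stand-in dictionaries shaped like the include_raw=True result; no API call is made.
good = {"raw": "AIMessage(...)", "parsed": {"answer": "Paris"}, "parsing_error": None}
bad = {"raw": "AIMessage(...)", "parsed": None, "parsing_error": "invalid JSON"}

print(unwrap_parsed(good))
try:
    unwrap_parsed(bad)
except ValueError as exc:
    print(exc)
```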

### 6.2 Different Methods for Structured Output

OpenAI and other providers support different methods:
- `json_schema`: Uses dedicated structured output features (recommended)
- `function_calling`: Derives structured output via tool calls
- `json_mode`: Generates valid JSON (schema must be in prompt)
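
Because `json_mode` only guarantees valid JSON, the schema has to travel inside the prompt. One way this might look is sketched below; `task_schema` and `build_json_mode_prompt` are hypothetical names, and the exact prompt wording is an assumption rather than a required format.

```python
import json

# Hypothetical schema, used only to illustrate the prompt shape.
task_schema = {
    "type": "object",
    "properties": {
        "task": {"type": "string"},
        "priority": {"type": "string", "enum": ["high", "medium", "low"]},
        "urgency": {"type": "boolean"},
    },
    "required": ["task", "priority", "urgency"],
}

def build_json_mode_prompt(user_text: str, schema: dict) -> str:
    """Embed the schema in the prompt text, since json_mode does not send it separately."""
    return (
        "Respond with a single JSON object that matches this JSON Schema:\n"
        f"{json.dumps(schema, indent=2)}\n\n"
        f"Input: {user_text}"
    )

prompt = build_json_mode_prompt("Finish the quarterly report by tomorrow.", task_schema)
print(prompt)
```

The resulting string would then be sent to a model created with `method="json_mode"`.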

In [None]:
class TaskPriority(BaseModel):
    """Task priority classification."""
    task: str = Field(description="The task description")
    priority: Literal["high", "medium", "low"] = Field(description="Priority level")
    urgency: bool = Field(description="Whether the task is urgent")

# Using json_schema method (strict mode)
strict_model = llm.with_structured_output(
    TaskPriority,
    method="json_schema",
    strict=True  # Enforces strict schema adherence
)

task_result = strict_model.invoke(
    "I need to finish the quarterly report by tomorrow morning for the board meeting."
)

print("Task Priority Analysis:")
print(f"Task: {task_result.task}")
print(f"Priority: {task_result.priority.upper()}")
print(f"Urgent: {'Yes ⚠️' if task_result.urgency else 'No'}")

### 6.3 Validation and Error Handling

In [None]:
from pydantic import field_validator, ValidationError

class ValidatedProduct(BaseModel):
    """Product with validation rules."""
    name: str = Field(description="Product name", min_length=3, max_length=100)
    price: float = Field(description="Price in USD", gt=0, lt=1000000)
    quantity: int = Field(description="Quantity in stock", ge=0)
    discount_percentage: Optional[float] = Field(
        default=0,
        description="Discount percentage",
        ge=0,
        le=100
    )

    # Pydantic v2 style; the v1 `@validator` decorator is deprecated
    @field_validator('price')
    @classmethod
    def price_must_be_reasonable(cls, v):
        if v > 100000:
            raise ValueError('Price seems unreasonably high')
        return v

# Create validated model
validated_extractor = llm.with_structured_output(ValidatedProduct)

try:
    product = validated_extractor.invoke(
        "We have a laptop priced at $1299.99 with 50 units in stock and a 15% discount."
    )
    print("✅ Validation Passed!")
    print(f"Product: {product.name}")
    print(f"Price: ${product.price}")
    print(f"Quantity: {product.quantity}")
    print(f"Discount: {product.discount_percentage}%")
except ValidationError as e:
    print("❌ Validation Failed:")
    print(e)
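
Instead of only printing the error, a common follow-up is to retry the call a limited number of times. The sketch below is one possible shape, not a LangChain feature: `invoke_with_retry` and `fake_call` are hypothetical, and the model call is replaced by a stand-in so the snippet runs offline.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def invoke_with_retry(call: Callable[[], T], attempts: int = 3) -> T:
    """Hypothetical helper: retry `call` when it raises ValueError.

    Pydantic's ValidationError is a subclass of ValueError, so validation
    failures are covered by this except clause.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return call()
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid result after {attempts} attempts") from last_error

# Stand-in for a model call that fails validation once, then succeeds.
outcomes = iter([ValueError("price must be greater than 0"),
                 {"name": "Laptop", "price": 1299.99}])

def fake_call():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome

print(invoke_with_retry(fake_call))
```

In the notebook this would wrap the real call, for example `invoke_with_retry(lambda: validated_extractor.invoke(text))`. With `temperature=0` a retry may well return the same invalid answer, so adjusting the prompt between attempts is often more useful than repeating it unchanged.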

### 6.4 Complex Nested Structures

In [None]:
class Address(BaseModel):
    """Address information."""
    street: str
    city: str
    state: str
    zip_code: str
    country: str = "USA"

class PaymentMethod(BaseModel):
    """Payment method details."""
    type: Literal["credit_card", "debit_card", "paypal", "bank_transfer"]
    last_four: Optional[str] = Field(default=None, description="Last 4 digits")

class OrderItem(BaseModel):
    """Individual order item."""
    product_name: str
    quantity: int
    unit_price: float
    total_price: float

class Order(BaseModel):
    """Complete order information."""
    order_id: str
    customer_name: str
    customer_email: str
    shipping_address: Address
    billing_address: Optional[Address] = None
    items: List[OrderItem]
    payment_method: PaymentMethod
    subtotal: float
    tax: float
    shipping_cost: float
    total: float
    order_date: str

# Create order parser
order_parser = llm.with_structured_output(Order)

# Parse complex order information
order_text = """
Order #ORD-2024-001234 placed on January 31, 2024
Customer: Jane Smith (jane.smith@email.com)

Shipping Address:
456 Oak Avenue, Portland, OR 97201, USA

Items:
1. Wireless Headphones - Quantity: 2, Price: $79.99 each, Total: $159.98
2. USB-C Cable - Quantity: 3, Price: $12.99 each, Total: $38.97

Payment: Credit Card ending in 4532

Subtotal: $198.95
Tax: $17.91
Shipping: $8.99
Total: $225.85
"""

order = order_parser.invoke(f"Parse this order: {order_text}")

print("Order Details:")
print(f"Order ID: {order.order_id}")
print(f"Date: {order.order_date}")
print(f"Customer: {order.customer_name} ({order.customer_email})")
print("\nShipping Address:")
print(f"  {order.shipping_address.street}")
print(f"  {order.shipping_address.city}, {order.shipping_address.state} {order.shipping_address.zip_code}")
print("\nItems:")
for item in order.items:
    print(f"  • {item.product_name}: {item.quantity} x ${item.unit_price} = ${item.total_price}")
print(f"\nPayment: {order.payment_method.type.replace('_', ' ').title()} ending in {order.payment_method.last_four}")
print(f"\nSubtotal: ${order.subtotal}")
print(f"Tax: ${order.tax}")
print(f"Shipping: ${order.shipping_cost}")
print(f"Total: ${order.total}")
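
Schema validation checks types, not arithmetic, so extracted amounts can still disagree with each other. A minimal consistency check is sketched below using the figures from the sample order; `order_totals_consistent` is a hypothetical helper and the half-cent tolerance is an assumption.

```python
import math

def order_totals_consistent(items, subtotal, tax, shipping_cost, total, tol=0.005):
    """Hypothetical check: do the extracted amounts add up (within half a cent)?"""
    items_sum = sum(quantity * unit_price for quantity, unit_price in items)
    return (
        math.isclose(items_sum, subtotal, abs_tol=tol)
        and math.isclose(subtotal + tax + shipping_cost, total, abs_tol=tol)
    )

# (quantity, unit_price) pairs copied from the sample order text above.
items = [(2, 79.99), (3, 12.99)]

print(order_totals_consistent(items, 198.95, 17.91, 8.99, 225.85))  # True
print(order_totals_consistent(items, 198.95, 17.91, 8.99, 252.85))  # False
```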

---

## 7. Best Practices

### 7.1 Clear Field Descriptions

In [None]:
# ❌ Bad: Unclear descriptions
class BadExample(BaseModel):
    data: str
    value: int

# ✅ Good: Clear, detailed descriptions
class GoodExample(BaseModel):
    """Analysis result with clear field descriptions."""
    analysis_summary: str = Field(
        description="A concise summary of the analysis in 1-2 sentences"
    )
    confidence_score: int = Field(
        description="Confidence level from 1-100, where 100 is highest confidence",
        ge=1,
        le=100
    )

print("✅ Always use clear, detailed field descriptions!")

### 7.2 Use Appropriate Types

In [None]:
from enum import Enum

class Priority(str, Enum):
    """Priority levels."""
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class WellTypedModel(BaseModel):
    """Example of well-typed model."""
    # Use Literal for fixed choices
    status: Literal["pending", "approved", "rejected"]
    
    # Use Enum for reusable choices
    priority: Priority
    
    # Use Optional for nullable fields
    notes: Optional[str] = None
    
    # Use List for arrays
    tags: List[str]
    
    # Use constraints for validation
    score: int = Field(ge=0, le=100)

print("✅ Use appropriate types and constraints!")
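
The `str, Enum` combination used above has a few practical properties that are easy to verify with the standard library. The class is redefined here so the snippet runs on its own.

```python
from enum import Enum

class Priority(str, Enum):
    """Priority levels (same definition as above)."""
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

# Look up a member from the raw string a model would return.
level = Priority("high")
print(level is Priority.HIGH)

# Because Priority also subclasses str, members compare equal to plain strings.
print(level == "high")

# Values outside the enum raise ValueError.
try:
    Priority("urgent")
except ValueError as exc:
    print(exc)
```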

### 7.3 Temperature Settings

In [None]:
# For structured output, use lower temperature for consistency
consistent_model = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0  # More deterministic
)

# For creative structured output, use higher temperature
creative_model = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7  # More creative
)

print("✅ Adjust temperature based on your use case!")
print("   - Low (0-0.3): Data extraction, classification")
print("   - Medium (0.4-0.7): Content generation, analysis")
print("   - High (0.8-1.0): Creative tasks")

---

## 8. Comparison: Before vs After Structured Output

### Before: Manual Parsing

In [None]:
import re
import json

# ❌ Old way: Manual parsing (error-prone)
basic_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

response = basic_llm.invoke(
    "Extract the name, age, and email from: John Doe is 30 years old, email: john@example.com. "
    "Return as JSON."
)

print("Raw Response (needs parsing):")
print(response.content)
print("\nType:", type(response.content))

# Manual parsing required
try:
    # Try to extract JSON from markdown code block
    json_match = re.search(r'```json\s*(.+?)\s*```', response.content, re.DOTALL)
    if json_match:
        data = json.loads(json_match.group(1))
    else:
        data = json.loads(response.content)
    print("\n✅ Successfully parsed (but fragile!)")
    print(data)
except Exception as e:
    print(f"\n❌ Parsing failed: {e}")

### After: Structured Output

In [None]:
# ✅ New way: Structured output (reliable)
class PersonInfo(BaseModel):
    """Person information."""
    name: str
    age: int
    email: str

structured_llm = basic_llm.with_structured_output(PersonInfo)

result = structured_llm.invoke(
    "Extract the name, age, and email from: John Doe is 30 years old, email: john@example.com"
)

print("Structured Response (ready to use):")
print(f"Name: {result.name}")
print(f"Age: {result.age}")
print(f"Email: {result.email}")
print("\nType:", type(result))
print("\n✅ No parsing needed! Direct access to validated data!")

---

## 9. Common Pitfalls and Solutions

### Pitfall 1: Missing Field Descriptions

In [None]:
# ❌ Without descriptions - LLM might misunderstand
class PoorSchema(BaseModel):
    data: str
    value: int

# ✅ With descriptions - Clear expectations
class GoodSchema(BaseModel):
    """Well-documented schema."""
    data: str = Field(description="The main content or message")
    value: int = Field(description="Numeric score from 1-100")

print("✅ Always provide clear field descriptions!")

### Pitfall 2: Overly Complex Schemas

In [None]:
# ❌ Too complex - harder for LLM to fill correctly
class OverlyComplex(BaseModel):
    field1: str
    field2: int
    field3: List[str]
    field4: Optional[float]
    field5: dict
    field6: List[dict]
    field7: Optional[List[Optional[str]]]
    # ... 20 more fields

# ✅ Break into smaller, focused schemas
class FocusedSchema(BaseModel):
    """Focused on specific task."""
    title: str = Field(description="Document title")
    summary: str = Field(description="Brief summary")
    tags: List[str] = Field(description="Relevant tags")

print("✅ Keep schemas focused and manageable!")

### Pitfall 3: Not Handling Optional Fields

In [None]:
# ✅ Properly handle optional fields
class ProperOptionals(BaseModel):
    """Schema with proper optional handling."""
    required_field: str = Field(description="This must be present")
    optional_field: Optional[str] = Field(
        default=None,
        description="This may or may not be present"
    )
    field_with_default: str = Field(
        default="default_value",
        description="This has a default if not provided"
    )

print("✅ Use Optional and defaults appropriately!")

---

## 10. Performance Considerations

In [None]:
import time

class QuickExtraction(BaseModel):
    """Simple extraction for performance testing."""
    category: str
    sentiment: Literal["positive", "negative", "neutral"]

# Test performance
quick_model = ChatOpenAI(model="gpt-4o-mini", temperature=0).with_structured_output(QuickExtraction)

test_texts = [
    "This product is amazing!",
    "Terrible experience, would not recommend.",
    "It's okay, nothing special."
]

start_time = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
results = []
for text in test_texts:
    result = quick_model.invoke(f"Categorize and analyze sentiment: {text}")
    results.append(result)
end_time = time.perf_counter()

print(f"Processed {len(test_texts)} texts in {end_time - start_time:.2f} seconds")
print(f"Average: {(end_time - start_time) / len(test_texts):.2f} seconds per text")
print("\nResults:")
for i, (text, result) in enumerate(zip(test_texts, results), 1):
    print(f"{i}. {result.sentiment.upper()} - {result.category}")
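
The loop above sends requests one at a time, so total time grows with the number of texts. Since each call mostly waits on the network, running them concurrently usually helps. The sketch below shows the idea with a thread pool and a stand-in function (`fake_invoke`, which sleeps instead of calling an API). LangChain runnables also provide a `batch()` method that serves the same purpose.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_invoke(text: str) -> str:
    """Stand-in for a network-bound model call; sleeps instead of calling an API."""
    time.sleep(0.1)
    return "positive" if "amazing" in text else "other"

texts = ["This product is amazing!", "Terrible experience.", "It's okay."]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns results in input order even though calls run concurrently.
    labels = list(pool.map(fake_invoke, texts))
elapsed = time.perf_counter() - start

print(labels)
print(f"{elapsed:.2f}s for {len(texts)} calls")
```

Provider rate limits still apply, so the worker count is something to tune rather than maximize.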

---

## 11. Summary and Key Takeaways

### What We Learned:

1. **Structured Output Benefits:**
   - Predictable, validated responses
   - No manual parsing required
   - Type safety and automatic validation
   - Easy integration with applications

2. **Schema Types:**
   - **Pydantic Models**: Best for most use cases (validation, nested structures)
   - **TypedDict**: Lighter weight, less validation
   - **JSON Schema**: Maximum control and interoperability

3. **Best Practices:**
   - Always provide clear field descriptions
   - Use appropriate types and constraints
   - Keep schemas focused and manageable
   - Use lower temperature for consistency
   - Handle optional fields properly

4. **Common Use Cases:**
   - Data extraction from unstructured text
   - Sentiment analysis and classification
   - Content tagging and categorization
   - Resume/CV parsing
   - Product review analysis
   - Order processing

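5. **Methods and Options:**
   - `json_schema`: Recommended for OpenAI (strict mode)
   - `function_calling`: Derives structured output via tool calls
   - `json_mode`: Generates valid JSON (schema must be in prompt)
   - `include_raw=True`: Option that returns both the parsed and raw responses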

### Next Steps:

- Experiment with different schema designs
- Try structured output in your own applications
- Combine with other LangChain features (chains, agents)
- Explore provider-specific features

---

## 12. Additional Resources

- [LangChain Structured Output Documentation](https://docs.langchain.com/oss/python/langchain/structured-output)
- [Pydantic Documentation](https://docs.pydantic.dev/)
- [OpenAI Structured Output Guide](https://platform.openai.com/docs/guides/structured-outputs)
- [LangChain Models Documentation](https://docs.langchain.com/oss/python/langchain/models)

---

**Happy Coding! 🚀**