# Batch example with a more complex model

The code below demonstrates several of Pydantic's validation capabilities, including:

## Field-Level Validation
- **Field constraints with `Field()`** - Basic validation like `ge=0.0, le=1.0` for ranges, `ge=1` for minimums
- **Annotated field types** - Using `Annotated[float, Field(...)]` for inline type and constraint definition
- **Field descriptions** - Documentation via `description` parameter
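
As a minimal sketch of the field-level features listed above, the snippet below uses a hypothetical `Score` model (not part of the notebook) and assumes Pydantic v2. It shows a value outside the `ge`/`le` range being rejected and the `description` showing up in the generated JSON schema.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError


class Score(BaseModel):
    """Hypothetical model, for illustration only."""
    confidence: Annotated[float, Field(ge=0.0, le=1.0, description="Model confidence (0-1)")]
    word_count: Annotated[int, Field(ge=1, description="Word count")]


# Values inside the declared ranges are accepted
ok = Score(confidence=0.8, word_count=5)

# Values outside the ranges raise a single ValidationError listing every failure
try:
    Score(confidence=1.2, word_count=0)
except ValidationError as exc:
    failed_fields = sorted(err["loc"][0] for err in exc.errors())
    error_types = sorted(err["type"] for err in exc.errors())

# Descriptions are carried into the JSON schema
schema_description = Score.model_json_schema()["properties"]["confidence"]["description"]
```

Both failures are reported together, which is convenient when checking a whole batch of model outputs.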

## Type Validation
- **Enum validation** - String enums like `SentimentLabel` and `ConfidenceLevel` for controlled vocabularies
- **Nested model validation** - `EmotionScores` model embedded within `SentimentAnalysis`
- **Optional fields** - `Optional[List[str]]` and `Optional[str]` with default values
- **List validation** - `List[str]` for collections with type checking
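
The type checks above can be sketched with a small stand-in model. The names `Label`, `Scores`, and `Result` are hypothetical and much simpler than the notebook's models; Pydantic v2 is assumed.

```python
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class Label(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


class Scores(BaseModel):
    positive: float
    negative: float


class Result(BaseModel):
    """Hypothetical model combining an enum, a nested model, a list, and an optional field."""
    label: Label
    scores: Scores
    key_phrases: List[str] = Field(default_factory=list)
    reasoning: Optional[str] = None


# Plain strings and dicts are validated into the enum and the nested model
result = Result(label="positive", scores={"positive": 0.9, "negative": 0.1})

# An unknown enum value, a missing nested field, and a non-list value all fail
try:
    Result(label="mixed", scores={"positive": 0.9}, key_phrases="not a list")
except ValidationError as exc:
    bad_locations = {err["loc"] for err in exc.errors()}
```

Each error location is a tuple, so a failure inside the nested model is reported as `("scores", "negative")`.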

## Default Value Handling
- **Factory defaults** - `default_factory=list` and `default_factory=datetime.now` for mutable/dynamic defaults
- **Static defaults** - `default=None` for simple default values
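
A short sketch of why factory defaults matter, using a hypothetical `Record` model under Pydantic v2: each instance gets its own list and its own timestamp, rather than sharing one value created at class-definition time.

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, Field


class Record(BaseModel):
    """Hypothetical model, for illustration only."""
    key_phrases: List[str] = Field(default_factory=list)       # new list per instance
    reasoning: Optional[str] = Field(default=None)              # static default
    analyzed_at: datetime = Field(default_factory=datetime.now)  # evaluated per instance


first = Record()
second = Record()

# Mutating one instance's list does not affect the other
first.key_phrases.append("great value")
```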

## Custom Validation Logic
- **Model validators** - `@model_validator(mode='after')` for cross-field validation (emotion sum check)
- **Field validators** - `@field_validator` for single-field checks (imported but not used in this example)
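
Since `@field_validator` is not exercised in the notebook's model, here is a hedged sketch of how it could sit next to a model validator. The `Scores` model and the `strip_phrases` validator are hypothetical; Pydantic v2 is assumed.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError, field_validator, model_validator


class Scores(BaseModel):
    """Hypothetical model, for illustration only."""
    positive: float = Field(ge=0.0, le=1.0)
    negative: float = Field(ge=0.0, le=1.0)
    neutral: float = Field(ge=0.0, le=1.0)
    key_phrases: List[str] = Field(default_factory=list)

    @field_validator("key_phrases")
    @classmethod
    def strip_phrases(cls, value: List[str]) -> List[str]:
        # Single-field check: trim whitespace and drop empty phrases
        return [phrase.strip() for phrase in value if phrase.strip()]

    @model_validator(mode="after")
    def check_sum(self):
        # Cross-field check: runs after every field has been validated
        total = self.positive + self.negative + self.neutral
        if not (0.95 <= total <= 1.05):
            raise ValueError(f"Emotion scores must sum to ~1.0, got {total:.3f}")
        return self


good = Scores(positive=0.7, negative=0.2, neutral=0.1, key_phrases=["  love it ", ""])

try:
    Scores(positive=0.9, negative=0.9, neutral=0.9)
except ValidationError as exc:
    message = exc.errors()[0]["msg"]
```

The `ValueError` text raised in the validator is surfaced inside the `ValidationError`, so descriptive messages reach whoever inspects the failure.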

## Computed Properties
- **Computed fields** - `@computed_field` with `@property` for derived values like `is_high_quality` and `quality_score`
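
A minimal sketch of a computed field, assuming Pydantic v2. The `Metrics` model and its weights are hypothetical and simpler than the notebook's `quality_score`. The point is that the derived value is included when the model is dumped, so it can become a column downstream.

```python
from pydantic import BaseModel, computed_field


class Metrics(BaseModel):
    """Hypothetical model, for illustration only."""
    confidence: float
    word_count: int

    @computed_field
    @property
    def quality_score(self) -> float:
        # Illustrative weights: 60% confidence, 40% capped word-count ratio
        return round(self.confidence * 0.6 + min(self.word_count / 20, 1) * 0.4, 3)


metrics = Metrics(confidence=0.9, word_count=10)

# Computed fields are serialized alongside regular fields
dumped = metrics.model_dump()
```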

## Advanced Features
- **Validation modes** - `mode='after'` runs the validator once all fields have been validated, so it works with typed values
- **Custom error messages** - Descriptive `ValueError` messages in validators
- **Type coercion** - Strings are converted to enum members, and `analyzed_at` is populated automatically by its default factory
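
Coercion is easiest to see when validating raw JSON, which is roughly the shape a model response arrives in. The sketch below assumes Pydantic v2 and uses a hypothetical `Event` model with a made-up payload.

```python
from datetime import datetime
from enum import Enum

from pydantic import BaseModel


class Level(str, Enum):
    LOW = "low"
    HIGH = "high"


class Event(BaseModel):
    """Hypothetical model, for illustration only."""
    level: Level
    confidence: float
    analyzed_at: datetime


# Made-up JSON payload: every value arrives as a JSON string or number
raw = '{"level": "high", "confidence": 0.75, "analyzed_at": "2024-01-15T10:30:00"}'

# Parsing and validation happen in one step
event = Event.model_validate_json(raw)
```

After validation, `event.level` is an enum member and `event.analyzed_at` is a `datetime`, not a string.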

In [None]:
from typing import List, Optional, Annotated
from enum import Enum
from datetime import datetime
from pydantic import BaseModel, Field, model_validator, field_validator, computed_field

# === ENUMS AND NESTED MODELS ===
class SentimentLabel(str, Enum):
    """Sentiment classification options"""
    POSITIVE = "positive"
    NEGATIVE = "negative" 
    NEUTRAL = "neutral"

class ConfidenceLevel(str, Enum):
    """Confidence categories for easy filtering"""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class EmotionScores(BaseModel):
    """Core emotion breakdown - must sum to 1.0"""
    positive: Annotated[float, Field(ge=0.0, le=1.0, description="Positive emotion score")]
    negative: Annotated[float, Field(ge=0.0, le=1.0, description="Negative emotion score")]
    neutral: Annotated[float, Field(ge=0.0, le=1.0, description="Neutral emotion score")]

# === MAIN MODEL ===
class SentimentAnalysis(BaseModel):
    """Sentiment analysis result with field-level, nested, and cross-field validation"""
    
    # === NESTED DATA ===
    emotion_breakdown: EmotionScores = Field(..., description="Emotion score distribution")
    sentiment: SentimentLabel = Field(..., description="Primary sentiment classification")
    
    # === CORE FIELDS ===
    confidence: Annotated[float, Field(ge=0.0, le=1.0, description="Model confidence (0-1)")]
    confidence_level: ConfidenceLevel = Field(..., description="Categorical confidence for filtering")

    # === TEXT METRICS ===
    text_length: Annotated[int, Field(ge=1, le=10000, description="Character count")]
    word_count: Annotated[int, Field(ge=1, le=1000, description="Word count")]
    
    # === OPTIONAL FIELDS ===
    key_phrases: Optional[List[str]] = Field(default_factory=list, description="Top sentiment-bearing phrases")
    reasoning: Optional[str] = Field(default=None, description="Brief explanation of classification")
    
    # === METADATA ===
    analyzed_at: datetime = Field(default_factory=datetime.now)
    
    # === COMPUTED FIELDS ===
    @computed_field
    @property
    def is_high_quality(self) -> bool:
        """Binary quality flag for filtering low-quality analyses"""
        return (
            self.confidence >= 0.6 and
            self.text_length >= 10 and
            self.word_count >= 3
        )
    
    @computed_field
    @property
    def quality_score(self) -> float:
        """Weighted quality score for ranking/sorting"""
        confidence_score = self.confidence * 0.6
        length_score = min(self.text_length / 100, 1) * 0.2
        word_score = min(self.word_count / 20, 1) * 0.2
        return round(confidence_score + length_score + word_score, 3)
    
    @model_validator(mode='after')
    def validate_emotion_sum(self):
        """Ensure emotion scores sum to approximately 1.0"""
        total = self.emotion_breakdown.positive + self.emotion_breakdown.negative + self.emotion_breakdown.neutral
        if not (0.95 <= total <= 1.05):
            raise ValueError(f"Emotion scores must sum to ~1.0, got {total:.3f}")
        return self


In [None]:
import pandas as pd

# Test data: labelled reviews grouped by category

test_reviews = [
    # === CLEARLY POSITIVE ===
    {
        "text": "This product is absolutely amazing! Love everything about it.",
        "expected_sentiment": "positive",
        "category": "clear_positive"
    },
    {
        "text": "Outstanding quality and incredible value. Highly recommend to everyone!",
        "expected_sentiment": "positive", 
        "category": "clear_positive"
    },
    {
        "text": "Perfect! Exceeded all my expectations. Five stars!",
        "expected_sentiment": "positive",
        "category": "clear_positive"
    },
    
    # === CLEARLY NEGATIVE ===
    {
        "text": "Terrible product. Complete waste of money. Very disappointed.",
        "expected_sentiment": "negative",
        "category": "clear_negative"
    },
    {
        "text": "Worst purchase ever. Poor quality and awful customer service.",
        "expected_sentiment": "negative",
        "category": "clear_negative"
    },
    {
        "text": "Broken on arrival. Cheap materials. Would not recommend.",
        "expected_sentiment": "negative",
        "category": "clear_negative"
    },
    
    # === NEUTRAL/FACTUAL ===
    {
        "text": "The product arrived on time. Standard packaging. Works as described.",
        "expected_sentiment": "neutral",
        "category": "neutral_factual"
    },
    {
        "text": "Average quality. Nothing special but does the job. Standard price.",
        "expected_sentiment": "neutral",
        "category": "neutral_factual"
    },
    {
        "text": "This is a blue widget that measures 5 inches. Came with instructions.",
        "expected_sentiment": "neutral",
        "category": "neutral_factual"
    },
    
    # === MIXED SENTIMENT ===
    {
        "text": "Great design and features, but the price is way too high for what you get.",
        "expected_sentiment": "neutral",  # Mixed leans neutral
        "category": "mixed_sentiment"
    },
    {
        "text": "Love the color and style, but it broke after just two weeks of use.",
        "expected_sentiment": "negative",  # Negative outcome dominates
        "category": "mixed_sentiment"
    },
    {
        "text": "Customer service was helpful, though the product itself is mediocre.",
        "expected_sentiment": "neutral",
        "category": "mixed_sentiment"
    },
    
    # === EDGE CASES FOR VALIDATION TESTING ===
    {
        "text": "Ok.",  # Very short
        "expected_sentiment": "neutral",
        "category": "edge_case_short"
    },
    {
        "text": "This product has some really great features that I absolutely love, and the design is stunning, but unfortunately the build quality seems questionable and I'm worried it might not last very long, which is disappointing given the premium price point they're charging for it.",  # Very long
        "expected_sentiment": "neutral",
        "category": "edge_case_long"
    },
    {
        "text": "Good good good good good.",  # Repetitive
        "expected_sentiment": "positive",
        "category": "edge_case_repetitive"
    },
    
    # === SARCASM/IRONY DETECTION ===
    {
        "text": "Oh great, another broken product. Just what I needed today.",
        "expected_sentiment": "negative",
        "category": "sarcasm_irony"
    },
    {
        "text": "Sure, waiting 3 weeks for shipping was totally worth it.",
        "expected_sentiment": "negative",
        "category": "sarcasm_irony"
    },
    
    # === CONTEXT DEPENDENT ===
    {
        "text": "It's fine for the price. You get what you pay for.",
        "expected_sentiment": "neutral",
        "category": "context_dependent"
    },
    {
        "text": "Not bad considering it was free with my order.",
        "expected_sentiment": "neutral",
        "category": "context_dependent"
    },
    
    # === EMOTIONAL INTENSITY VARIATIONS ===
    {
        "text": "I like it.",  # Low intensity positive
        "expected_sentiment": "positive",
        "category": "low_intensity"
    },
    {
        "text": "It's okay I guess.",  # Low intensity neutral
        "expected_sentiment": "neutral", 
        "category": "low_intensity"
    },
    {
        "text": "ABSOLUTELY INCREDIBLE! BEST PURCHASE EVER!!!",  # High intensity positive
        "expected_sentiment": "positive",
        "category": "high_intensity"
    },
    {
        "text": "HORRIBLE! COMPLETELY USELESS! AVOID AT ALL COSTS!",  # High intensity negative
        "expected_sentiment": "negative",
        "category": "high_intensity"
    },
    
    # === SPECIFIC PRODUCT DOMAINS ===
    {
        "text": "The camera quality is excellent and battery life is impressive.",
        "expected_sentiment": "positive",
        "category": "tech_product"
    },
    {
        "text": "Food arrived cold and the portions were tiny. Service was slow.",
        "expected_sentiment": "negative",
        "category": "restaurant_review"
    },
    {
        "text": "The hotel room was clean and the staff was friendly.",
        "expected_sentiment": "positive",
        "category": "hotel_review"
    },
    
    # === COMPARATIVE STATEMENTS ===
    {
        "text": "Much better than my previous purchase, though still not perfect.",
        "expected_sentiment": "positive",
        "category": "comparative"
    },
    {
        "text": "Worse than expected but better than nothing.",
        "expected_sentiment": "neutral",
        "category": "comparative"
    }
]

df = pd.DataFrame(test_reviews)

display(df)

In [None]:
# Nested models such as EmotionScores are flattened so that the result is a flat DataFrame:
# each nested field gets its own column, e.g. emotion_breakdown.positive,
# emotion_breakdown.negative, and so on.

results = await processor.process_dataframe(df, 'text', response_model=SentimentAnalysis)
display(results)
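
How `processor` flattens nested models is not shown here. As a rough sketch of the same idea, `pandas.json_normalize` turns dumped models into dotted column names. The `Emotions` and `Analysis` models below are hypothetical stand-ins, and this is not necessarily what `process_dataframe` does internally.

```python
import pandas as pd
from pydantic import BaseModel


class Emotions(BaseModel):
    positive: float
    negative: float
    neutral: float


class Analysis(BaseModel):
    """Hypothetical stand-in for a model with one nested field."""
    sentiment: str
    emotion_breakdown: Emotions


analyses = [
    Analysis(sentiment="positive",
             emotion_breakdown=Emotions(positive=0.8, negative=0.1, neutral=0.1)),
    Analysis(sentiment="negative",
             emotion_breakdown=Emotions(positive=0.1, negative=0.7, neutral=0.2)),
]

# model_dump() yields nested dicts; json_normalize expands them into one column per leaf
flat = pd.json_normalize([a.model_dump() for a in analyses], sep=".")
```

The resulting frame has one row per analysis and columns such as `emotion_breakdown.positive`, which can be filtered and sorted like any other column.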