# Phase C: Generative AI and Agents

## Objectives
1. **Zero-shot / Few-shot Classification** with an LLM
2. **Aspect-Based Sentiment Analysis (ABSA)** with keyword matching and a local sentiment model

This notebook uses pretrained language models to analyze reviews without any task-specific training.

In [None]:
import pandas as pd
import json
from tqdm import tqdm

# Load sample of reviews
df = pd.read_parquet("../Data/prepared_reviews.parquet")
# Small sample for LLM testing (API calls are expensive/slow)
sample = df.sample(20, random_state=42)
print(f"Testing on {len(sample)} reviews")
sample.head(3)

## 1. Zero-shot Classification with Transformers (Local)

We use the Hugging Face `pipeline` API for zero-shot classification.
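
For intuition, the zero-shot pipeline wraps each candidate label in a hypothesis sentence (the default template is `"This example is {}."`), scores it against the review with an NLI model and, in single-label mode, applies a softmax over the per-label entailment logits. The sketch below reproduces only that bookkeeping in plain Python. The logits are made-up numbers for illustration, not model outputs, and `build_hypotheses` and `rank_labels` are hypothetical helper names, not part of the library.

```python
import math

def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]

def rank_labels(labels, entailment_logits):
    """Softmax the per-label entailment logits and sort labels by score."""
    shift = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in entailment_logits]
    total = sum(exps)
    scored = [(label, e / total) for label, e in zip(labels, exps)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

labels = ["positive", "negative", "neutral"]
hypotheses = build_hypotheses(labels)

# Illustrative logits only (assumed values, not produced by a model)
ranking = rank_labels(labels, [2.0, -1.0, 0.5])

print(hypotheses)
print(ranking)
```

Because the scores are normalized across labels, a high top score only means the label beat the other candidates, not that the model is confident in an absolute sense.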

In [None]:
from transformers import pipeline

# Zero-shot classification pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device=-1)

candidate_labels = ["positive", "negative", "neutral"]

In [None]:
# Test on sample
results = []
for idx, row in tqdm(sample.iterrows(), total=len(sample), desc="Zero-shot"):
    text = row['text'][:512]  # Truncate for speed
    result = classifier(text, candidate_labels)
    predicted = result['labels'][0]
    results.append({
        'text': text[:100] + '...',
        'actual': row['polarity'],
        'predicted': predicted,
        'confidence': result['scores'][0]
    })

results_df = pd.DataFrame(results)
print(results_df)

In [None]:
# Accuracy (assumes `polarity` uses the same label strings as `candidate_labels`)
from sklearn.metrics import accuracy_score, classification_report

accuracy = accuracy_score(results_df['actual'], results_df['predicted'])
print(f"Zero-shot Accuracy: {accuracy:.4f}")
print(classification_report(results_df['actual'], results_df['predicted']))

## 2. Few-shot Classification (with examples)

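We give the model a few labeled examples to steer its predictions. Note that the NLI-based zero-shot pipeline reads the whole prompt, examples included, as the text to classify, so this only approximates few-shot prompting with a generative LLM.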
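
As a point of comparison, here is a minimal sketch of how a few-shot prompt could be assembled for a text-generation model, the setting where in-context examples usually apply. `build_few_shot_prompt` and `parse_label` are hypothetical helpers, the prompt layout is an assumption, and the call to an actual LLM is deliberately left out: the completion string in the usage lines is a stand-in.

```python
EXAMPLES = [
    ("The food was amazing and service was great!", "positive"),
    ("Terrible experience. Never coming back.", "negative"),
    ("It was okay, nothing special.", "neutral"),
]
LABELS = ("positive", "negative", "neutral")

def build_few_shot_prompt(text, examples=EXAMPLES, max_chars=200):
    """Build a prompt with labeled examples followed by the review to classify."""
    lines = ["Classify each review as positive, negative, or neutral.", ""]
    for review, label in examples:
        lines.append(f'Review: "{review}"')
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f'Review: "{text[:max_chars]}"')
    lines.append("Sentiment:")
    return "\n".join(lines)

def parse_label(completion, labels=LABELS):
    """Return the first known label found in the completion, else None."""
    for word in completion.strip().lower().split():
        cleaned = word.strip(".,!\"'")
        if cleaned in labels:
            return cleaned
    return None

prompt = build_few_shot_prompt("Great pizza, slow delivery.")
print(prompt)

# Stand-in for a model completion (assumed, not generated here)
print(parse_label(" Positive."))
```

Ending the prompt on `Sentiment:` leaves the model a single slot to fill, which keeps parsing simple; `parse_label` returns `None` when the completion contains no known label so that failures stay visible.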

In [None]:
# Few-shot examples
few_shot_examples = """
Examples:
- "The food was amazing and service was great!" -> positive
- "Terrible experience. Never coming back." -> negative
- "It was okay, nothing special." -> neutral

Now classify this review:
"""

def few_shot_classify(text):
    prompt = few_shot_examples + f'"{text[:200]}" -> '
    result = classifier(prompt, candidate_labels)
    return result['labels'][0], result['scores'][0]

# Test
test_text = sample.iloc[0]['text']
pred, conf = few_shot_classify(test_text)
print(f"Prediction: {pred} (confidence: {conf:.2f})")
print(f"Actual: {sample.iloc[0]['polarity']}")

## 3. Aspect-Based Sentiment Analysis (ABSA)

We extract the aspects mentioned in each review and the sentiment associated with each one.
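
As a sketch of the aspect-detection step on its own, the snippet below splits a review into sentences and matches aspect keywords on word boundaries (allowing a plural `s`), so that `price` matches `prices` but not `priceless`. The sentiment model is left out here. `split_sentences` and `find_aspect_sentences` are hypothetical helpers, the aspect list is shortened for the demo, and the regex-based splitter is a simplifying assumption rather than a full sentence tokenizer.

```python
import re

ASPECTS = ["food", "service", "price"]

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def find_aspect_sentences(text, aspects=ASPECTS):
    """Map each aspect to the first sentence mentioning it as a whole word."""
    found = {}
    for sentence in split_sentences(text):
        for aspect in aspects:
            if aspect in found:
                continue  # keep only the first mention per aspect
            pattern = rf"\b{re.escape(aspect)}s?\b"
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                found[aspect] = sentence
    return found

review = "The food was great! Service was slow. A priceless view."
result = find_aspect_sentences(review)
print(result)
```

Each matched sentence could then be passed to a sentiment classifier to obtain a per-aspect label.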

In [None]:
# Simple ABSA using keyword detection + sentiment
ASPECTS = ['food', 'service', 'price', 'atmosphere', 'location', 'cleanliness', 'staff', 'quality']

sentiment_classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

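def extract_aspects(text):
    """Return at most one sentiment record per aspect keyword found in `text`."""
    found_aspects = []
    
    # Naive sentence split on periods only ('!' and '?' do not end a sentence here)
    sentences = text.split('.')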
    
    for aspect in ASPECTS:
        for sentence in sentences:
            if aspect in sentence.lower():
                sentiment = sentiment_classifier(sentence[:512])[0]
                found_aspects.append({
                    'aspect': aspect,
                    'sentiment': sentiment['label'],
                    'confidence': sentiment['score'],
                    'context': sentence.strip()[:100]
                })
                break  # One mention per aspect
    
    return found_aspects

In [None]:
# Test ABSA on samples
print("=== Aspect-Based Sentiment Analysis ===")
for idx in range(min(5, len(sample))):
    text = sample.iloc[idx]['text']
    print(f"\n--- Review {idx+1} (Stars: {sample.iloc[idx]['stars']}) ---")
    print(f"Text: {text[:200]}...")
    
    aspects = extract_aspects(text)
    if aspects:
        print("Aspects found:")
        for a in aspects:
            print(f"  - {a['aspect'].upper()}: {a['sentiment']} ({a['confidence']:.2f})")
    else:
        print("No specific aspects found.")

## 4. Structured Output (JSON)

In [None]:
def analyze_review(text):
    """Complete analysis of a review with structured output."""
    # Overall sentiment
    overall = classifier(text[:512], candidate_labels)
    
    # Aspects
    aspects = extract_aspects(text)
    
    return {
        'overall_sentiment': overall['labels'][0],
        'overall_confidence': float(overall['scores'][0]),
        'aspects': aspects,
        'text_preview': text[:200]
    }

# Demo
demo_review = sample.iloc[0]['text']
analysis = analyze_review(demo_review)
print(json.dumps(analysis, indent=2))

## 5. Summary & Comparison with Phase B

In [None]:
print("=== GenAI Phase Summary ===")
print(f"Zero-shot Accuracy (on {len(sample)} samples): {accuracy:.2%}")
print("\nAdvantages of GenAI approach:")
print("  - No training required")
print("  - Can extract aspects and explanations")
print("  - Flexible to new domains")
print("\nDisadvantages:")
print("  - Slower inference")
print("  - May be less accurate than fine-tuned models")
print("  - Requires larger models for best results")