# Phase C: Generative AI and Agents

## Objectives
1. **Zero-shot / Few-shot Classification** with an LLM
2. **Aspect-Based Sentiment Analysis (ABSA)** with LangChain

This notebook uses pretrained language models to analyze reviews without any task-specific training.

In [1]:
import pandas as pd
import json
from tqdm import tqdm

# Load sample of reviews
df = pd.read_parquet("../Data/prepared_reviews.parquet")
# Small sample for LLM testing (API calls are expensive/slow)
sample = df.sample(20, random_state=42)
print(f"Testing on {len(sample)} reviews")
sample.head(3)

Testing on 20 reviews


Unnamed: 0,text,stars,polarity,text_length
987231,I'm not quite sure what the hype is all about ...,1,negative,238
79954,Fleming's for a special occasion or just dinne...,5,positive,546
567130,We found this location clean and well-kept. I ...,4,positive,460


## 1. Zero-shot Classification with Transformers (Local)

We use the Hugging Face `pipeline` API for zero-shot classification.
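
Under the hood, the zero-shot pipeline recasts classification as natural language inference: each candidate label is slotted into a hypothesis (the default template is `This example is {}.`), the NLI model scores whether the review entails that hypothesis, and the entailment scores are normalized across labels. A minimal sketch of the scoring step follows. The `fake_entailment_logits` values are made up for illustration and are not measurements from the model.

```python
import math

HYPOTHESIS_TEMPLATE = "This example is {}."  # default template of the zero-shot pipeline

def build_hypotheses(labels):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [HYPOTHESIS_TEMPLATE.format(label) for label in labels]

def rank_labels(labels, entailment_logits):
    """Softmax the per-label entailment logits and sort labels by score."""
    exps = [math.exp(x) for x in entailment_logits]
    total = sum(exps)
    scored = [(label, e / total) for label, e in zip(labels, exps)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

labels = ["positive", "negative", "neutral"]
hypotheses = build_hypotheses(labels)

# Hypothetical logits: in the real pipeline these come from the NLI model,
# one (review, hypothesis) pair per candidate label.
fake_entailment_logits = [-1.2, 2.3, 0.1]
ranking = rank_labels(labels, fake_entailment_logits)
print(hypotheses)
print(ranking[0][0])
```

Each candidate label adds one premise-hypothesis pair per review, so three labels mean three pairs to score for every review, which contributes to the slow CPU inference.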

In [2]:
from transformers import pipeline

# Zero-shot classification pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device=-1)

candidate_labels = ["positive", "negative", "neutral"]


In [3]:
# Test on sample
results = []
for idx, row in tqdm(sample.iterrows(), total=len(sample), desc="Zero-shot"):
    text = row['text'][:512]  # Truncate to 512 characters for speed
    result = classifier(text, candidate_labels)
    predicted = result['labels'][0]
    results.append({
        'text': text[:100] + '...',
        'actual': row['polarity'],
        'predicted': predicted,
        'confidence': result['scores'][0]
    })

results_df = pd.DataFrame(results)
print(results_df)

Zero-shot: 100%|█| 20/20 [00:32<00:00,  1.64s/it]

                                                 text    actual predicted  \
0   I'm not quite sure what the hype is all about ...  negative  negative   
1   Fleming's for a special occasion or just dinne...  positive  positive   
2   We found this location clean and well-kept. I ...  positive  positive   
3   My husband has been coming here for a few year...  positive  positive   
4   Been here twice since they've opened. Was cert...  negative  negative   
5   I was pleasantly surprised during our first vi...  positive  positive   
6   We are very particular about our food . My hus...  positive  positive   
7   This is a halfway descent motel 6 but there's ...  negative  negative   
8   We tried looking for some brunch places that w...  positive  positive   
9   Came back to Tucson to visit family and I'm go...  positive  positive   
10  Terrible experience. My wife and I took our da...  negative  negative   
11  I am happy to report that I recently received ...  positive  positive   




In [4]:
# Accuracy
from sklearn.metrics import accuracy_score, classification_report

accuracy = accuracy_score(results_df['actual'], results_df['predicted'])
print(f"Zero-shot Accuracy: {accuracy:.4f}")
print(classification_report(results_df['actual'], results_df['predicted']))

Zero-shot Accuracy: 0.9000
              precision    recall  f1-score   support

    negative       0.80      0.80      0.80         5
    positive       0.93      0.93      0.93        15

    accuracy                           0.90        20
   macro avg       0.87      0.87      0.87        20
weighted avg       0.90      0.90      0.90        20
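
With only 20 reviews, the 90% figure carries a lot of uncertainty. As a rough illustration, a Wilson score interval for 18 correct out of 20 (the counts implied by the report above) can be computed as follows. The choice of the Wilson interval and of the 95% level are assumptions made here, not part of the original analysis.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width

low, high = wilson_interval(correct=18, total=20)
print(f"95% interval for accuracy: [{low:.2f}, {high:.2f}]")
```

The interval spans roughly 0.70 to 0.97, so a sample of this size cannot reliably separate this classifier from one that is noticeably better or worse.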



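## 2. Few-shot Classification (with examples)

We give the model a few labeled examples to try to improve its predictions. Note that the zero-shot pipeline treats the whole prompt, examples included, as the text to classify, so this is only an approximation of few-shot prompting.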

In [5]:
# Few-shot examples
few_shot_examples = """
Examples:
- "The food was amazing and service was great!" -> positive
- "Terrible experience. Never coming back." -> negative
- "It was okay, nothing special." -> neutral

Now classify this review:
"""

def few_shot_classify(text):
    prompt = few_shot_examples + f'"{text[:200]}" -> '
    result = classifier(prompt, candidate_labels)
    return result['labels'][0], result['scores'][0]

# Test
test_text = sample.iloc[0]['text']
pred, conf = few_shot_classify(test_text)
print(f"Prediction: {pred} (confidence: {conf:.2f})")
print(f"Actual: {sample.iloc[0]['polarity']}")

Prediction: neutral (confidence: 0.44)
Actual: negative
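
The prediction above is worse than the zero-shot one. Few-shot prompting is normally done with a generative model that continues the prompt, rather than with an NLI classifier. A minimal sketch of that pattern is shown here, where `generate` is a hypothetical callable standing in for any text-generation model; the stub exists only to make the example runnable.

```python
FEW_SHOT = [
    ("The food was amazing and service was great!", "positive"),
    ("Terrible experience. Never coming back.", "negative"),
    ("It was okay, nothing special.", "neutral"),
]
VALID_LABELS = {"positive", "negative", "neutral"}

def build_prompt(text, examples=FEW_SHOT, max_chars=200):
    """Format labeled examples followed by the review to classify."""
    lines = ["Classify the sentiment of each review as positive, negative, or neutral.", ""]
    for review, label in examples:
        lines.append(f'Review: "{review}"\nSentiment: {label}\n')
    lines.append(f'Review: "{text[:max_chars]}"\nSentiment:')
    return "\n".join(lines)

def parse_label(completion):
    """Return the first word of the completion if it is a valid label, else None."""
    words = completion.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!\"'")
    return first if first in VALID_LABELS else None

def few_shot_classify_generative(text, generate):
    """`generate` is a hypothetical callable: prompt string -> completion string."""
    return parse_label(generate(build_prompt(text)))

# Stub generator for illustration only; a real model call would go here.
def stub_generate(prompt):
    return " negative\n"

prompt = build_prompt("Service was not that great either.")
label = few_shot_classify_generative("Service was not that great either.", stub_generate)
print(label)
```

Returning `None` for unparseable completions keeps malformed model output from being silently counted as a valid class.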


## 3. Aspect-Based Sentiment Analysis (ABSA)

We extract the aspects mentioned in each review and the sentiment associated with each one.

In [6]:
# Simple ABSA using keyword detection + sentiment
ASPECTS = ['food', 'service', 'price', 'atmosphere', 'location', 'cleanliness', 'staff', 'quality']

sentiment_classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

def extract_aspects(text):
    found_aspects = []

    # Naive sentence split on periods only
    sentences = text.split('.')
    
    for aspect in ASPECTS:
        for sentence in sentences:
            if aspect in sentence.lower():
                sentiment = sentiment_classifier(sentence[:512])[0]
                found_aspects.append({
                    'aspect': aspect,
                    'sentiment': sentiment['label'],
                    'confidence': sentiment['score'],
                    'context': sentence.strip()[:100]
                })
                break  # One mention per aspect
    
    return found_aspects


In [7]:
# Test ABSA on samples
print("=== Aspect-Based Sentiment Analysis ===")
for idx in range(min(5, len(sample))):
    text = sample.iloc[idx]['text']
    print(f"\n--- Review {idx+1} (Stars: {sample.iloc[idx]['stars']}) ---")
    print(f"Text: {text[:200]}...")
    
    aspects = extract_aspects(text)
    if aspects:
        print("Aspects found:")
        for a in aspects:
            print(f"  - {a['aspect'].upper()}: {a['sentiment']} ({a['confidence']:.2f})")
    else:
        print("No specific aspects found.")

=== Aspect-Based Sentiment Analysis ===

--- Review 1 (Stars: 1) ---
Text: I'm not quite sure what the hype is all about here. First off the infamous turtle soup was beyond salty. Service was not that great either. Our server seemed annoyed when we asked the history of the p...
Aspects found:
  - SERVICE: NEGATIVE (1.00)
  - PRICE: NEGATIVE (1.00)

--- Review 2 (Stars: 5) ---
Text: Fleming's for a special occasion or just dinner night out is unbeatable! I celebrated my graduation from college at Fleming's and they practically rolled out the red carpet! Spencer was our server and...


Aspects found:
  - FOOD: POSITIVE (1.00)
  - SERVICE: NEGATIVE (0.76)
  - STAFF: POSITIVE (1.00)

--- Review 3 (Stars: 4) ---
Text: We found this location clean and well-kept. I ordered my usual custom Smash Chicken and veggie frites. Shortly after ordering, our cashier approached to tell me veggie frites were no longer available-...
Aspects found:
  - FOOD: POSITIVE (1.00)
  - LOCATION: POSITIVE (1.00)

--- Review 4 (Stars: 5) ---
Text: My husband has been coming here for a few years and has always loved it. I had my last eye exam here, and I agree.  They were super nice and the exam was the most thorough I have ever had, and that's ...
No specific aspects found.

--- Review 5 (Stars: 1) ---
Text: Been here twice since they've opened. Was certainly not impressed either time. The food is "eh" & the service wasn't great either...


Aspects found:
  - FOOD: NEGATIVE (1.00)
  - SERVICE: NEGATIVE (1.00)
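
Two details of `extract_aspects` shape these results: the `aspect in sentence.lower()` test is a substring match, so `staff` would also fire on a place name such as "Flagstaff", and `text.split('.')` does not split on `!` or `?`. One possible refinement, sketched here as an assumption rather than as the notebook's method, uses whole-word keyword lists and a regex sentence splitter. The `ASPECT_KEYWORDS` lexicon and the helper names are hypothetical.

```python
import re

# Hypothetical keyword lists; extend per domain.
ASPECT_KEYWORDS = {
    "price": ["price", "prices", "priced", "pricing"],
    "staff": ["staff", "server", "waiter", "waitress"],
    "service": ["service"],
}

# One compiled pattern per aspect, matching whole words only.
ASPECT_PATTERNS = {
    aspect: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    for aspect, words in ASPECT_KEYWORDS.items()
}

def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace; drop empty pieces."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def find_aspect_sentences(text):
    """Map each aspect to the first sentence that mentions one of its keywords."""
    found = {}
    for sentence in split_sentences(text):
        for aspect, pattern in ASPECT_PATTERNS.items():
            if aspect not in found and pattern.search(sentence):
                found[aspect] = sentence
    return found

review = "We drove in from Flagstaff! Over priced and under delivered. Service was slow."
print(find_aspect_sentences(review))
```

Each matched sentence could then be passed to `sentiment_classifier`, as in the original function.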


## 4. Structured Output (JSON)

In [8]:
def analyze_review(text):
    """Complete analysis of a review with structured output."""
    # Overall sentiment
    overall = classifier(text[:512], candidate_labels)
    
    # Aspects
    aspects = extract_aspects(text)
    
    return {
        'overall_sentiment': overall['labels'][0],
        'overall_confidence': float(overall['scores'][0]),
        'aspects': aspects,
        'text_preview': text[:200]
    }

# Demo
demo_review = sample.iloc[0]['text']
analysis = analyze_review(demo_review)
print(json.dumps(analysis, indent=2))

{
  "overall_sentiment": "negative",
  "overall_confidence": 0.976250171661377,
  "aspects": [
    {
      "aspect": "service",
      "sentiment": "NEGATIVE",
      "confidence": 0.9996477365493774,
      "context": "Service was not that great either"
    },
    {
      "aspect": "price",
      "sentiment": "NEGATIVE",
      "confidence": 0.9997197985649109,
      "context": "Over priced and under delivered"
    }
  ],
  "text_preview": "I'm not quite sure what the hype is all about here. First off the infamous turtle soup was beyond salty. Service was not that great either. Our server seemed annoyed when we asked the history of the p"
}
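
In the output above, the overall sentiment is lowercase (`negative`) while the aspect sentiments carry the uppercase labels of the SST-2 model (`NEGATIVE`). If downstream code consumes this JSON, a small normalization step can make the schema consistent. The sketch below assumes the dictionary layout returned by `analyze_review`; the `normalize_analysis` helper and the rounding to four decimals are illustrative choices.

```python
import json

def normalize_analysis(analysis, ndigits=4):
    """Return a copy with lowercase sentiment labels and rounded confidences."""
    return {
        "overall_sentiment": analysis["overall_sentiment"].lower(),
        "overall_confidence": round(float(analysis["overall_confidence"]), ndigits),
        "aspects": [
            {
                "aspect": a["aspect"],
                "sentiment": a["sentiment"].lower(),
                "confidence": round(float(a["confidence"]), ndigits),
                "context": a["context"],
            }
            for a in analysis["aspects"]
        ],
        "text_preview": analysis["text_preview"],
    }

# Values copied from the demo output above (text_preview shortened here).
raw = {
    "overall_sentiment": "negative",
    "overall_confidence": 0.976250171661377,
    "aspects": [
        {"aspect": "service", "sentiment": "NEGATIVE",
         "confidence": 0.9996477365493774, "context": "Service was not that great either"},
    ],
    "text_preview": "I'm not quite sure what the hype is all about here.",
}
clean = normalize_analysis(raw)
print(json.dumps(clean, indent=2))
```

The function builds a new dictionary rather than mutating its input, so the raw model output stays available for debugging.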


## 5. Summary & Comparison with Phase B

In [9]:
print("=== GenAI Phase Summary ===")
print(f"Zero-shot Accuracy (on {len(sample)} samples): {accuracy:.2%}")
print("\nAdvantages of GenAI approach:")
print("  - No training required")
print("  - Can extract aspects and explanations")
print("  - Flexible to new domains")
print("\nDisadvantages:")
print("  - Slower inference")
print("  - May be less accurate than fine-tuned models")
print("  - Requires larger models for best results")

=== GenAI Phase Summary ===
Zero-shot Accuracy (on 20 samples): 90.00%

Advantages of GenAI approach:
  - No training required
  - Can extract aspects and explanations
  - Flexible to new domains

Disadvantages:
  - Slower inference
  - May be less accurate than fine-tuned models
  - Requires larger models for best results
