# Phase C: Generative AI and Agents

## Objectives
1. **Zero-shot / Few-shot Classification** with an LLM
2. **Aspect-Based Sentiment Analysis (ABSA)** with LangChain

This notebook uses pretrained language models to analyze reviews without task-specific training.

In [9]:
import pandas as pd
import json
from tqdm import tqdm

df = pd.read_parquet("../Data/prepared_reviews.parquet")

sample = df.sample(20, random_state=42)
print(f"Testing on {len(sample)} reviews")
sample.head(3)

Testing on 20 reviews


|  | text | stars | polarity | text_length |
|---|---|---|---|---|
| 987231 | I'm not quite sure what the hype is all about ... | 1 | negative | 238 |
| 79954 | Fleming's for a special occasion or just dinne... | 5 | positive | 546 |
| 567130 | We found this location clean and well-kept. I ... | 4 | positive | 460 |


## 1. Zero-shot Classification with Transformers (Local)

Uses the HuggingFace `pipeline` API for zero-shot classification.
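As background, a zero-shot pipeline built on an NLI model such as `facebook/bart-large-mnli` turns each candidate label into a hypothesis (the default template is "This example is {}."), scores how strongly the review entails it, and, in single-label mode, applies a softmax over the entailment logits. The sketch below illustrates only that ranking step in plain Python. The logits are made up for illustration, and the helper names `build_hypotheses` and `rank_labels` are hypothetical, not part of the library.

```python
import math

def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def rank_labels(labels, entailment_logits):
    """Sort labels by softmaxed entailment score, highest first."""
    scores = softmax(entailment_logits)
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return {
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }

candidate_labels = ["positive", "negative", "neutral"]
hypotheses = build_hypotheses(candidate_labels)

# Made-up entailment logits, one per hypothesis (illustration only)
made_up_logits = [-1.2, 3.4, 0.3]
result = rank_labels(candidate_labels, made_up_logits)

print(hypotheses)
print(result["labels"][0], round(result["scores"][0], 3))
```

The returned dictionary mirrors the `labels` / `scores` keys that the cells below read from the real pipeline output.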

In [10]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device=-1)

candidate_labels = ["positive", "negative", "neutral"]


In [11]:
results = []
for idx, row in tqdm(sample.iterrows(), total=len(sample), desc="Zero-shot"):
    # Truncate to the first 512 characters (a character count, not a token count)
    text = row['text'][:512]
    result = classifier(text, candidate_labels)
    # Labels come back sorted by descending score, so index 0 is the top prediction
    predicted = result['labels'][0]
    results.append({
        'text': text[:100] + '...',
        'actual': row['polarity'],
        'predicted': predicted,
        'confidence': result['scores'][0]
    })

results_df = pd.DataFrame(results)
print(results_df)

Zero-shot: 100%|██████████| 20/20 [00:32<00:00,  1.61s/it]

                                                 text    actual predicted  \
0   I'm not quite sure what the hype is all about ...  negative  negative   
1   Fleming's for a special occasion or just dinne...  positive  positive   
2   We found this location clean and well-kept. I ...  positive  positive   
3   My husband has been coming here for a few year...  positive  positive   
4   Been here twice since they've opened. Was cert...  negative  negative   
5   I was pleasantly surprised during our first vi...  positive  positive   
6   We are very particular about our food . My hus...  positive  positive   
7   This is a halfway descent motel 6 but there's ...  negative  negative   
8   We tried looking for some brunch places that w...  positive  positive   
9   Came back to Tucson to visit family and I'm go...  positive  positive   
10  Terrible experience. My wife and I took our da...  negative  negative   
11  I am happy to report that I recently received ...  positive  positive   




In [12]:
from sklearn.metrics import accuracy_score, classification_report

accuracy = accuracy_score(results_df['actual'], results_df['predicted'])
print(f"Zero-shot Accuracy: {accuracy:.4f}")
print(classification_report(results_df['actual'], results_df['predicted']))

Zero-shot Accuracy: 0.9000
              precision    recall  f1-score   support

    negative       0.80      0.80      0.80         5
    positive       0.93      0.93      0.93        15

    accuracy                           0.90        20
   macro avg       0.87      0.87      0.87        20
weighted avg       0.90      0.90      0.90        20
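With only 20 reviews, the 90% accuracy carries wide uncertainty. The sketch below recomputes the report from confusion counts inferred from it (an assumption: 4 of 5 negatives and 14 of 15 positives are correct, and no review is predicted neutral) and adds a Wilson 95% interval, which is not part of the original notebook.

```python
import math

# Counts inferred from the classification report (assumption, see lead-in)
tp_neg, fp_neg, fn_neg = 4, 1, 1
tp_pos, fp_pos, fn_pos = 14, 1, 1
n_total = 20

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(p, r):
    return 2 * p * r / (p + r)

def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = correct / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

p_neg, r_neg = precision(tp_neg, fp_neg), recall(tp_neg, fn_neg)
p_pos, r_pos = precision(tp_pos, fp_pos), recall(tp_pos, fn_pos)
accuracy_check = (tp_neg + tp_pos) / n_total
low, high = wilson_interval(tp_neg + tp_pos, n_total)

print(f"negative: P={p_neg:.2f} R={r_neg:.2f} F1={f1(p_neg, r_neg):.2f}")
print(f"positive: P={p_pos:.2f} R={r_pos:.2f} F1={f1(p_pos, r_pos):.2f}")
print(f"accuracy={accuracy_check:.2f}, Wilson 95% interval=({low:.2f}, {high:.2f})")
```

The interval spans roughly 0.70 to 0.97, so the sample is too small to separate a 90% classifier from one that is noticeably better or worse.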



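## 2. Few-shot Classification (with examples)

We prepend a few labeled examples to the input in an attempt to improve the predictions. Note that `facebook/bart-large-mnli` is an NLI model rather than an instruction-following LLM, so it treats the examples as part of the text to classify.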

In [None]:
exemples = """
Examples:
- "The food was amazing and service was great!" -> positive
- "Terrible experience. Never coming back." -> negative
- "It was okay, nothing special." -> neutral

"""

def few_shot_classify(text):
    prompt = exemples + f'"{text[:200]}" -> '
    result = classifier(prompt, candidate_labels)
    return result['labels'][0], result['scores'][0]

# Test
test_text = sample.iloc[0]['text']
pred, conf = few_shot_classify(test_text)
print(f"Prediction: {pred} (confidence: {conf:.2f})")
print(f"Actual: {sample.iloc[0]['polarity']}")

Prediction: neutral (confidence: 0.44)
Actual: negative
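Few-shot prompting is normally aimed at generative, instruction-following models, which continue the pattern set by the examples. As a minimal sketch of that setup, the code below only builds the prompt and parses a completion; it does not call any model. The function names `build_few_shot_prompt` and `parse_label` and the prompt layout are assumptions for illustration, not the notebook's method.

```python
FEW_SHOT_EXAMPLES = [
    ("The food was amazing and service was great!", "positive"),
    ("Terrible experience. Never coming back.", "negative"),
    ("It was okay, nothing special.", "neutral"),
]
LABELS = ("positive", "negative", "neutral")

def build_few_shot_prompt(text, examples=FEW_SHOT_EXAMPLES, max_chars=200):
    """Build a prompt that ends right where the model should write the label."""
    lines = ["Classify the sentiment of each review as positive, negative, or neutral.", ""]
    for review, label in examples:
        lines.append(f'Review: "{review}"')
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f'Review: "{text[:max_chars]}"')
    lines.append("Sentiment:")
    return "\n".join(lines)

def parse_label(completion, labels=LABELS):
    """Map a raw model completion to a known label, or None if it is unusable."""
    words = completion.strip().lower().split()
    first = words[0].strip('.,!"\'') if words else ""
    return first if first in labels else None

prompt = build_few_shot_prompt("Service was not that great either.")
print(prompt)
print(parse_label(" Negative.\n"))
```

Returning `None` for an unexpected completion keeps parsing failures visible instead of silently assigning a label.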


## 3. Aspect-Based Sentiment Analysis (ABSA)

Extracts aspects and their associated sentiment.

In [None]:
ASPECTS = ['food', 'service', 'price', 'atmosphere', 'location', 'cleanliness', 'staff', 'quality']

sentiment_classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

def extract_aspects(text):
    """Return the sentiment of the first sentence that mentions each aspect keyword."""
    found_aspects = []

    # Naive sentence split on periods
    sentences = text.split('.')

    for aspect in ASPECTS:
        for sentence in sentences:
            # Substring match, so 'price' also matches 'priced'
            if aspect in sentence.lower():
                sentiment = sentiment_classifier(sentence[:512])[0]
                found_aspects.append({
                    'aspect': aspect,
                    'sentiment': sentiment['label'],
                    'confidence': sentiment['score'],
                    'context': sentence.strip()[:100]
                })
                break  # keep only the first matching sentence per aspect

    return found_aspects

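The aspect matcher above relies on a single keyword per aspect and splits sentences on periods only. One possible refinement, sketched here under assumptions, is to split on `.`, `!` and `?` and to match whole words from a small synonym list. The synonym lists in `ASPECT_KEYWORDS` are hypothetical, and the sample text is adapted from the first review in the sample.

```python
import re

# Hypothetical synonym lists, for illustration only
ASPECT_KEYWORDS = {
    "food": ["food", "soup", "meal", "dish"],
    "service": ["service", "server", "waiter", "waitress"],
    "price": ["price", "priced", "expensive", "cheap"],
}

def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

def find_aspect_sentences(text, aspect_keywords=ASPECT_KEYWORDS):
    """Map each aspect to every sentence containing one of its keywords."""
    matches = {}
    for sentence in split_sentences(text):
        for aspect, keywords in aspect_keywords.items():
            # \b anchors the match on whole words only
            pattern = r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b"
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                matches.setdefault(aspect, []).append(sentence)
    return matches

review = (
    "First off the infamous turtle soup was beyond salty. "
    "Service was not that great either. "
    "Over priced and under delivered."
)
aspect_sentences = find_aspect_sentences(review)
print(aspect_sentences)
```

Each matched sentence could then be passed to `sentiment_classifier` exactly as `extract_aspects` does.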

In [15]:
print("Sentiment Analysis")
for idx in range(min(5, len(sample))):
    text = sample.iloc[idx]['text']
    print(f"\nReview {idx+1} (Stars: {sample.iloc[idx]['stars']})")
    print(f"Text: {text[:200]}...")
    
    aspects = extract_aspects(text)
    if aspects:
        print("Aspects found:")
        for a in aspects:
            print(f"  - {a['aspect'].upper()}: {a['sentiment']} ({a['confidence']:.2f})")
    else:
        print("No specific aspect was found.")

Sentiment Analysis

Review 1 (Stars: 1)
Text: I'm not quite sure what the hype is all about here. First off the infamous turtle soup was beyond salty. Service was not that great either. Our server seemed annoyed when we asked the history of the p...
Aspects found:
  - SERVICE: NEGATIVE (1.00)
  - PRICE: NEGATIVE (1.00)

Review 2 (Stars: 5)
Text: Fleming's for a special occasion or just dinner night out is unbeatable! I celebrated my graduation from college at Fleming's and they practically rolled out the red carpet! Spencer was our server and...
Aspects found:
  - FOOD: POSITIVE (1.00)
  - SERVICE: NEGATIVE (0.76)
  - STAFF: POSITIVE (1.00)

Review 3 (Stars: 4)
Text: We found this location clean and well-kept. I ordered my usual custom Smash Chicken and veggie frites. Shortly after ordering, our cashier approached to tell me veggie frites were no longer available-...
Aspects found:
  - FOOD: POSITIVE (1.00)
  - LOCATION: POSITIVE (1.00)

Review 4 (Stars: 5)
Text: My 
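Per-review aspect lists become more useful once aggregated across reviews. As a rough sketch, the code below counts sentiment labels per aspect, using the aspect/sentiment pairs printed for the first three reviews above as input. The helper name `aggregate_aspects` is hypothetical.

```python
from collections import Counter, defaultdict

# Aspect/sentiment pairs copied from the first three reviews in the output above
aspect_results = [
    [
        {"aspect": "service", "sentiment": "NEGATIVE"},
        {"aspect": "price", "sentiment": "NEGATIVE"},
    ],
    [
        {"aspect": "food", "sentiment": "POSITIVE"},
        {"aspect": "service", "sentiment": "NEGATIVE"},
        {"aspect": "staff", "sentiment": "POSITIVE"},
    ],
    [
        {"aspect": "food", "sentiment": "POSITIVE"},
        {"aspect": "location", "sentiment": "POSITIVE"},
    ],
]

def aggregate_aspects(per_review):
    """Count how often each sentiment label occurs for each aspect."""
    counts = defaultdict(Counter)
    for aspects in per_review:
        for item in aspects:
            counts[item["aspect"]][item["sentiment"]] += 1
    return {aspect: dict(counter) for aspect, counter in counts.items()}

summary = aggregate_aspects(aspect_results)
for aspect, label_counts in sorted(summary.items()):
    print(f"{aspect}: {label_counts}")
```

The same function accepts the lists returned by `extract_aspects`, since it only reads the `aspect` and `sentiment` keys.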

## 4. Structured Output (JSON)

In [16]:
def analyze_review(text):
    """Full analysis of a review with structured results."""
    overall = classifier(text[:512], candidate_labels)
    
    aspects = extract_aspects(text)
    
    return {
        'overall_sentiment': overall['labels'][0],
        'overall_confidence': float(overall['scores'][0]),
        'aspects': aspects,
        'text_preview': text[:200]
    }

# Demo
demo_review = sample.iloc[0]['text']
analysis = analyze_review(demo_review)
print(json.dumps(analysis, indent=2))

{
  "overall_sentiment": "negative",
  "overall_confidence": 0.976250171661377,
  "aspects": [
    {
      "aspect": "service",
      "sentiment": "NEGATIVE",
      "confidence": 0.9996477365493774,
      "context": "Service was not that great either"
    },
    {
      "aspect": "price",
      "sentiment": "NEGATIVE",
      "confidence": 0.9997197985649109,
      "context": "Over priced and under delivered"
    }
  ],
  "text_preview": "I'm not quite sure what the hype is all about here. First off the infamous turtle soup was beyond salty. Service was not that great either. Our server seemed annoyed when we asked the history of the p"
}
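A structured result is easier to consume downstream if it is validated when read back. The sketch below parses JSON shaped like the output above into dataclasses and checks that confidences lie in [0, 1]. The class names `AspectSentiment` and `ReviewAnalysis` and the function `parse_analysis` are hypothetical, and the payload is a shortened copy of the printed result.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class AspectSentiment:
    aspect: str
    sentiment: str
    confidence: float
    context: str

@dataclass
class ReviewAnalysis:
    overall_sentiment: str
    overall_confidence: float
    aspects: List[AspectSentiment]
    text_preview: str

def check_confidence(value, field_name):
    """Raise ValueError when a confidence falls outside [0, 1]."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{field_name} out of range: {value}")

def parse_analysis(raw_json):
    """Parse and validate a JSON string shaped like the analyze_review output."""
    data = json.loads(raw_json)
    check_confidence(data["overall_confidence"], "overall_confidence")
    aspects = [AspectSentiment(**item) for item in data["aspects"]]
    for item in aspects:
        check_confidence(item.confidence, f"confidence for {item.aspect}")
    return ReviewAnalysis(
        overall_sentiment=data["overall_sentiment"],
        overall_confidence=data["overall_confidence"],
        aspects=aspects,
        text_preview=data["text_preview"],
    )

# Shortened copy of the JSON printed above
raw = json.dumps({
    "overall_sentiment": "negative",
    "overall_confidence": 0.976250171661377,
    "aspects": [
        {"aspect": "service", "sentiment": "NEGATIVE",
         "confidence": 0.9996477365493774,
         "context": "Service was not that great either"},
    ],
    "text_preview": "I'm not quite sure what the hype is all about here.",
})
parsed = parse_analysis(raw)
print(parsed.overall_sentiment, parsed.aspects[0].aspect)
```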


## 5. Summary & Comparison with Phase B

In [None]:
print("=== GenAI Phase Summary ===")
print(f"Zero-shot accuracy (on {len(sample)} samples): {accuracy:.2%}")
print("\nAdvantages of the GenAI approach:")
print("  - No training required")
print("  - Can extract aspects and provide explanations")
print("  - Flexible for new domains")
print("\nDrawbacks:")
print("  - Slower inference")
print("  - May be less accurate than task-specific trained models")
print("  - Needs larger models for better results")

=== GenAI Phase Summary ===
Zero-shot accuracy (on 20 samples): 90.00%

Advantages of the GenAI approach:
  - No training required
  - Can extract aspects and provide explanations
  - Flexible for new domains

Drawbacks:
  - Slower inference
  - May be less accurate than task-specific trained models
  - Needs larger models for better results
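To put the inference-speed drawback in numbers, the zero-shot loop earlier in this notebook reported about 1.61 s per review on CPU (`device=-1`). The back-of-the-envelope sketch below scales that figure to a larger batch. The batch size of 100,000 reviews is a hypothetical value chosen for illustration, not the size of the dataset.

```python
def estimate_hours(n_reviews, seconds_per_review=1.61):
    """Estimate total wall-clock hours at a constant per-review latency."""
    return n_reviews * seconds_per_review / 3600

# Hypothetical batch size, chosen only to illustrate the arithmetic
hours = estimate_hours(100_000)
print(f"{hours:.1f} hours")
```

At that rate the hypothetical batch would take close to two days on one CPU, which is why batching or GPU inference is usually considered before scaling up.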
