<img src="https://toppng.com/uploads/preview/linkedin-logo-png-photo-116602552293wtc4qogql.png" width="20" height="20" /> [Bharath Hemachandran](https://www.linkedin.com/in/bharath-hemachandran/)

# 📊 Traditional NLP vs Generative AI: Customer Sentiment

This notebook tackles the **same task**, **customer sentiment classification**, in two ways: **traditional NLP** (classical ML) and **generative AI**. Run the cells in order. Text cells explain each step; code cells run it.

**Task:** Input = a short customer review. Output = **positive**, **negative**, or **neutral**.

<div style="background: #e8f5e9; padding: 14px; border-radius: 8px; border-left: 4px solid #4caf50;">
<strong>🎯 What you'll do:</strong> Train a small TF-IDF + logistic regression classifier on 15 reviews, compare it with a Groq-hosted LLM on in-distribution and challenge examples, and see when the generative model should abstain.
</div>

### 📋 Notebook objective (table of contents)

This notebook covers:
- **Fairness note**: Why the setup is intentionally uneven (traditional vs generative)
- **Setup**: Install `openai` and `scikit-learn`; set `GROQ_API_KEY`
- **Data**: Training, test (in-distribution), challenge (beyond training), should-abstain inputs
- **Traditional NLP**: TF-IDF + Logistic Regression pipeline; train on 15 reviews
- **Part 1**: In-distribution: both approaches on reviews like those in training
- **Generative AI**: Groq API with an instruction; no training for this task
- **Part 2**: Beyond training: negation, sarcasm, slang; traditional vs generative
- **Part 3**: Should abstain: empty/off-topic inputs; ideal behavior vs what happens
- **Summary**: Positives/negatives of generative AI; interpretability, cost, privacy, etc.
- **Additional reading**: Sentiment analysis and NLP resources


## ⚖️ Fairness note

<div style="background: #fff3e0; padding: 12px; border-radius: 8px; border-left: 4px solid #ff9800;">
The setup is <strong>intentionally uneven</strong>: the traditional model is trained on <strong>15 examples</strong> here; the generative model is <strong>pre-trained on huge data</strong>. We are not claiming "traditional is worse"; we are illustrating the <strong>tradeoffs</strong> and <strong>strengths/weaknesses</strong> of each.
</div>

## 🔧 Setup (run once)

Install `openai` and `scikit-learn`. On **Google Colab**, run this cell first.

In [None]:
!pip install -q openai scikit-learn

### 🔑 Set your Groq API key

Get a free key at [console.groq.com](https://console.groq.com/keys). In Colab you can use **Secrets** (🔑) or run the cell below and paste the key when prompted.
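If you store the key in Colab Secrets, something has to read it back. The sketch below is one possible way to do that, not part of the original notebook: `load_groq_api_key` is a hypothetical helper, and it assumes the secret is named `GROQ_API_KEY` and that notebook access to it has been granted. Outside Colab it only checks the environment variable.

```python
import os


def load_groq_api_key():
    """Return the Groq API key from the environment or Colab Secrets, else None."""
    key = os.environ.get("GROQ_API_KEY")
    if key:
        return key
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        return None
    try:
        # Assumes the secret was saved under the name GROQ_API_KEY.
        return userdata.get("GROQ_API_KEY")
    except Exception:
        # Secret missing, or notebook access to it was not granted.
        return None
```

If the helper returns `None`, the `getpass` prompt in the next cell still works as a fallback.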

In [None]:
import os
from getpass import getpass

if not os.environ.get("GROQ_API_KEY"):
    os.environ["GROQ_API_KEY"] = getpass("Paste your GROQ_API_KEY: ")

from openai import OpenAI

def get_groq_client():
    return OpenAI(
        api_key=os.environ["GROQ_API_KEY"],
        base_url="https://api.groq.com/openai/v1",
    )

print("Groq client ready (generative cells will use it).")

## 📂 Data

We use four sets:
- **Training:** 15 labeled reviews (for the traditional model only).
- **Test (in-distribution):** Three reviews taken verbatim from the training set, so both models should classify them correctly.
- **Challenge (beyond training):** Negation, sarcasm, slang. The traditional model was never trained on these; the generative model can generalize.
- **Should abstain:** Inputs that are *not* reviews (empty, question, off-topic); ideal behavior = abstain, but both often output a label anyway.
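A quick way to see how close a test set is to the training set is to count exact matches. The sketch below uses a hypothetical helper, `exact_overlap`, and a few reviews copied from the lists defined in the next cell. It is an illustration, not part of the original notebook.

```python
def exact_overlap(train_pairs, test_texts):
    """Return the test texts that also appear verbatim in the training pairs."""
    train_texts = {text for text, _ in train_pairs}
    return [text for text in test_texts if text in train_texts]


# A few entries copied from REVIEWS_TRAIN and REVIEWS_TEST (defined in the next cell).
train_sample = [
    ("Great product, very happy with my purchase.", "positive"),
    ("Complete waste of time and money.", "negative"),
    ("It's fine for the price.", "neutral"),
]
test_sample = [
    "Great product, very happy with my purchase.",
    "Complete waste of time and money.",
]

shared = exact_overlap(train_sample, test_sample)
print(f"{len(shared)} of {len(test_sample)} test reviews also appear in training")
```

When every test review also appears in training, a correct prediction shows that the model fits its training data, not that it generalizes. That is why the challenge set matters.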

In [None]:
REVIEWS_TRAIN = [
    ("The product is amazing, I love it!", "positive"),
    ("Great product, very happy with my purchase.", "positive"),
    ("Terrible experience, would not buy again.", "negative"),
    ("It's okay, nothing special.", "neutral"),
    ("Best purchase I've ever made. Highly recommend!", "positive"),
    ("Waste of money. Broke after one day.", "negative"),
    ("Complete waste of time and money.", "negative"),
    ("Does what it says. No complaints.", "neutral"),
    ("It works as described. Nothing more.", "neutral"),
    ("Excellent quality and fast shipping.", "positive"),
    ("Poor quality, very disappointed.", "negative"),
    ("It's fine for the price.", "neutral"),
    ("Absolutely love this! Five stars.", "positive"),
    ("Worst product ever. Avoid.", "negative"),
    ("Meets expectations. Average.", "neutral"),
]

REVIEWS_TEST = [
    "Great product, very happy with my purchase.",
    "Complete waste of time and money.",
    "It works as described. Nothing more.",
]

REVIEWS_CHALLENGE = [
    ("Not bad at all! Pleasantly surprised.", "positive"),
    ("Oh great, another broken product. Just what I needed.", "negative"),
    ("This slaps fr", "positive"),
    ("mid tbh wouldn't buy again", "negative"),
    ("It's okay I guess", "neutral"),
    ("Actually way better than I expected", "positive"),
    ("Meh.", "neutral"),
    ("Wouldn't say I'm not happy with it", "positive"),
]

INPUTS_SHOULD_ABSTAIN = [
    ("", "abstain"),
    ("Is this product good?", "abstain"),
    ("The weather is nice today.", "abstain"),
    ("asdf qwerty zxcv", "abstain"),
    ("1", "abstain"),
]

print(f"Training: {len(REVIEWS_TRAIN)} | Test: {len(REVIEWS_TEST)} | Challenge: {len(REVIEWS_CHALLENGE)} | Should abstain: {len(INPUTS_SHOULD_ABSTAIN)}")

## 📐 Traditional NLP: TF-IDF + Logistic Regression

**Pipeline:** text → tokenize + **TF-IDF** (hand-crafted features) → **logistic regression** (trained on labels).

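**Characteristics:** One model per task, needs labeled data, fast at inference, **cannot go beyond** what it was trained on. Words and n-grams that are absent from the training vocabulary contribute nothing to the prediction, so novel phrasing, sarcasm, and slang often fail.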

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = [t for t, _ in REVIEWS_TRAIN]
labels = [label for _, label in REVIEWS_TRAIN]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=500, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=500, random_state=42)),
])
pipe.fit(texts, labels)

def run_traditional(reviews):
    return list(pipe.predict(reviews))

print("Traditional model trained. Use run_traditional(reviews) to predict.")

## 1️⃣ Part 1: In-distribution (reviews similar to training)

Both approaches should get these right.

In [None]:
trad_test = run_traditional(REVIEWS_TEST)
print("Traditional (TF-IDF + LogReg):")
for review, pred in zip(REVIEWS_TEST, trad_test):
    print(f"  {review!r} → {pred}")

## 🤖 Generative AI: instruction + model, no training

**Pipeline:** prompt + review → LLM (Groq) → parse reply. No feature engineering and no training for this task. The model **can go beyond training** (nuance, sarcasm, slang), but it can also **generate when it shouldn't** (e.g. assign sentiment to empty or off-topic input).
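The "parse reply" step decides what happens when the model answers with something other than a clean label. The sketch below is one stricter option, under stated assumptions: `parse_label` and the `"unparsed"` fallback are hypothetical names, not part of the notebook. It accepts a reply only if it is exactly one of the three labels.

```python
VALID_LABELS = ("positive", "negative", "neutral")


def parse_label(reply, fallback="unparsed"):
    """Return the label only if the reply is exactly one valid label."""
    # Normalize case and trim whitespace plus common trailing punctuation or quotes.
    cleaned = (reply or "").strip().lower().strip(".!\"'")
    return cleaned if cleaned in VALID_LABELS else fallback


for reply in ["Positive.", " negative ", "I cannot classify this.", ""]:
    print(repr(reply), "->", parse_label(reply))
```

Keeping a separate fallback value makes unexpected replies visible instead of folding them into one of the three classes.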

In [None]:
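def run_generative(reviews):
    """Classify each review with a Groq-hosted LLM; returns one label per review."""
    client = get_groq_client()
    predictions = []
    for review in reviews:
        response = client.responses.create(
            model="llama-3.3-70b-versatile",
            input=(
                "Classify the sentiment of this customer review as exactly one "
                "word: positive, negative, or neutral. Reply with only that word.\n\n"
                f"Review: {review}"
            ),
            temperature=0,  # reduce run-to-run variation
            max_output_tokens=10,  # the reply should be a single word
        )
        out = (response.output_text or "").strip().lower()
        # Map the free-text reply to a label. Any reply that mentions neither
        # "positive" nor "negative" (including a refusal) falls back to "neutral".
        if "positive" in out:
            pred = "positive"
        elif "negative" in out:
            pred = "negative"
        else:
            pred = "neutral"
        predictions.append(pred)
    return predictions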

print("run_generative(reviews) defined. Requires GROQ_API_KEY.")

In [None]:
gen_test = run_generative(REVIEWS_TEST)
print("Generative (Groq, no training):")
for review, pred in zip(REVIEWS_TEST, gen_test):
    print(f"  {review!r} → {pred}")

## 2️⃣ Part 2: Beyond training (nuance, sarcasm, slang, negation)

These phrasings were **not** in the training set. The traditional model is often wrong (✗); the generative model is usually right (✓). **Lesson:** The traditional model cannot go beyond its training data; the generative model can.
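To turn a comparison like this into one number per model, a small accuracy helper is enough. The sketch below uses a hypothetical function, `accuracy`, and made-up prediction lists that only show how it behaves. They are not results from either model.

```python
def accuracy(predictions, expected):
    """Fraction of predictions that match the expected labels."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predictions, expected))
    return correct / len(expected)


# Made-up values for illustration only; not output from either model.
expected_demo = ["positive", "negative", "neutral", "positive"]
predicted_demo = ["positive", "neutral", "neutral", "negative"]
print(f"accuracy: {accuracy(predicted_demo, expected_demo):.2f}")
```

Once the next cell has run, `accuracy(trad_challenge, expected)` and `accuracy(gen_challenge, expected)` would give the two scores. With only eight challenge examples, treat them as rough indicators.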

In [None]:
challenge_texts = [t for t, _ in REVIEWS_CHALLENGE]
expected = [label for _, label in REVIEWS_CHALLENGE]
trad_challenge = run_traditional(challenge_texts)
gen_challenge = run_generative(challenge_texts)

print("Expected | Traditional | Generative\n")
for i, (text, exp) in enumerate(zip(challenge_texts, expected)):
    t_pred = trad_challenge[i]
    g_pred = gen_challenge[i]
    t_ok = "✓" if t_pred == exp else "✗"
    g_ok = "✓" if g_pred == exp else "✗"
    print(f"  {text!r}")
    print(f"    Exp: {exp} | Trad: {t_pred} {t_ok} | Gen: {g_pred} {g_ok}\n")

## 3️⃣ Part 3: When the model should NOT generate (ideal: abstain)

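Here the input is **not** a review (empty, question, off-topic, gibberish). The ideal behavior is to abstain or answer "not applicable". Both approaches often **output a label anyway**: the traditional pipeline must pick one of its three classes, and the generative model tends to **generate something even when it shouldn't**. Note that `run_generative` maps any reply containing neither "positive" nor "negative" to "neutral", so a refusal would also be printed as neutral. **Lesson:** This is a weakness of generative AI.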
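One common mitigation is a cheap guard that runs before either model. The sketch below is a deliberately simple heuristic, not a recommended production filter. `should_abstain` and its `min_letters` threshold are hypothetical, and the rules (too few letters, ends with a question mark) are assumptions chosen to match the examples in this section.

```python
def should_abstain(text, min_letters=3):
    """Heuristic guard: True when the input is unlikely to be a review."""
    stripped = (text or "").strip()
    letters = sum(ch.isalpha() for ch in stripped)
    if letters < min_letters:
        return True  # empty input, a lone digit, or similar
    if stripped.endswith("?"):
        return True  # a question, not an opinion
    return False


for text in ["", "1", "Is this product good?", "Meh.", "The weather is nice today."]:
    print(repr(text), "->", "abstain" if should_abstain(text) else "classify")
```

The guard catches empty input, the lone digit, and the question. It does not catch off-topic sentences or gibberish, since those look like ordinary text. Handling them needs something more capable, for example adding a "not a review" option to the prompt.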

In [None]:
abstain_texts = [t for t, _ in INPUTS_SHOULD_ABSTAIN]
should_list = [s for _, s in INPUTS_SHOULD_ABSTAIN]
trad_abstain = run_traditional(abstain_texts)
gen_abstain = run_generative(abstain_texts)

print("Input (short) | Should | Traditional | Generative\n")
for i, text in enumerate(abstain_texts):
    snippet = (text[:20] + "…") if len(text) > 20 else (text or "(empty)")
    print(f"  {snippet!r} | {should_list[i]} | {trad_abstain[i]} | {gen_abstain[i]}")

## 📋 Summary: Positives and negatives of generative AI

**Positives:** Can go beyond training (nuance, sarcasm, slang); one model, many tasks; no training for this task.

**Negatives:** Generates something even when it shouldn't (e.g. empty/off-topic → still outputs a label); hallucination; cost and latency; non-determinism.

**More lessons:**
- **Interpretability:** Traditional = explainable (weights); generative = black box.
- **Consistency:** Traditional = same input → same output; generative = can vary.
- **Cost at scale:** Traditional = cheap inference; generative = pay per token.
- **Privacy:** Traditional = data stays local; generative = text sent to API.
- **Latency:** Traditional = ms; generative = network + generation.
- **Task boundaries:** Traditional = one task; generative = flexible but softer fence.

In [None]:
print("Notebook complete. Re-run any cell to re-run that step.")

## 📚 Additional reading

**YouTube (verified)**  
- [Sentiment Analysis with Transformers](https://www.youtube.com/watch?v=cXWuSWX0Va0): Hugging Face + Python.  
- [YouTube Sentiment Analysis with DistilBERT](https://www.youtube.com/watch?v=FIGoOP2b0V4): NLP pipeline with Hugging Face.

**Blogs (popular)**  
- [Getting Started with Sentiment Analysis using Python](https://huggingface.co/blog/sentiment-analysis-python): Hugging Face blog.  
- [Hugging Face NLP course](https://huggingface.co/course/chapter1/1): Tokenization, models, and tasks.