[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dz-web3/DS-Tech-2026spring/blob/main/Module8_LLM_Finetuning/Task1_Sentiment_Finetuning.ipynb)

**Click the badge above to open this notebook in Google Colab!**

# Task 1: Fine-Tuning for Sentiment Analysis

**Data Science for Business (Technical), Spring 2026**

---

## 🎯 Learning Goals

In this task, you will:
1. **Run** a working sentiment classification model fine-tuned on product reviews
2. **Add** your own training examples to customize the model
3. **Test** the model on your own text to see how it performs
4. **Reflect** on when fine-tuning is the right business choice

---

## 📋 What You Need to Do

1. **First**: Run all the cells from top to bottom to see the demo
2. **Then**: Complete the 3 exercises marked with ✏️
3. **Finally**: Answer the reflection questions at the end

**Estimated time**: 20-30 minutes

## Step 1: Setup (Just Run This Cell)

In [None]:
# Check GPU availability
import torch
if torch.cuda.is_available():
    print(f"✅ GPU enabled: {torch.cuda.get_device_name(0)}")
else:
    print("⚠️ No GPU detected. Go to Runtime → Change runtime type → Select T4 GPU")

In [None]:
%%capture
# Install required libraries (this takes ~1 minute)
!pip install transformers datasets evaluate accelerate -q

In [None]:
# Import libraries
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer
)
from datasets import Dataset
import evaluate
import numpy as np

print("✅ Libraries loaded successfully!")

## Step 2: Prepare Training Data

We'll train the model to classify product reviews as **positive** (1) or **negative** (0).

Here's our training dataset:

In [None]:
# Training data: Product reviews with sentiment labels
# label = 1 means POSITIVE, label = 0 means NEGATIVE

training_examples = [
    # POSITIVE reviews (label = 1)
    {"text": "This product exceeded my expectations! Absolutely love it.", "label": 1},
    {"text": "Great quality for the price. Would definitely buy again.", "label": 1},
    {"text": "Fast shipping and the item works perfectly. Very satisfied!", "label": 1},
    {"text": "Best purchase I've made this year. Highly recommend!", "label": 1},
    {"text": "The customer service was amazing and the product is fantastic.", "label": 1},
    {"text": "Exactly what I needed. Works as described.", "label": 1},
    {"text": "Love this product! My whole family uses it now.", "label": 1},
    {"text": "Five stars! Couldn't be happier with my purchase.", "label": 1},
    
    # NEGATIVE reviews (label = 0)
    {"text": "Terrible quality. Broke after one week of use.", "label": 0},
    {"text": "Waste of money. Does not work as advertised.", "label": 0},
    {"text": "Very disappointed. Would not recommend to anyone.", "label": 0},
    {"text": "Poor quality and awful customer service.", "label": 0},
    {"text": "Arrived damaged and getting a refund was a nightmare.", "label": 0},
    {"text": "Complete scam. Nothing like the pictures showed.", "label": 0},
    {"text": "Don't buy this. Regret wasting my money.", "label": 0},
    {"text": "Stopped working after two days. Total junk.", "label": 0},
]

print(f"📊 Training data: {len(training_examples)} examples")
print(f"   - Positive reviews: {sum(1 for x in training_examples if x['label'] == 1)}")
print(f"   - Negative reviews: {sum(1 for x in training_examples if x['label'] == 0)}")

## Step 3: Load the Pre-trained Model

We'll use **DistilBERT**, a smaller, faster version of BERT that's perfect for learning.
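
To make "smaller" concrete, the sketch below tallies the weights of `distilbert-base-uncased` by hand. The configuration values it uses (a 30,522-token vocabulary, 512 positions, hidden size 768, feed-forward size 3,072, 6 layers) are assumed from the published model configuration rather than set anywhere in this notebook, and the variable names are illustrative.

```python
# Rough parameter tally for distilbert-base-uncased plus a 2-class head.
# All config values below are assumptions taken from the published config.
vocab_size, max_positions = 30_522, 512
hidden, ffn_hidden, num_layers, num_labels = 768, 3_072, 6, 2

def linear(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

layer_norm = 2 * hidden  # one scale and one shift value per hidden unit

embeddings = vocab_size * hidden + max_positions * hidden + layer_norm
per_layer = (
    4 * linear(hidden, hidden)      # query, key, value, output projections
    + linear(hidden, ffn_hidden)    # feed-forward expansion
    + linear(ffn_hidden, hidden)    # feed-forward projection back
    + 2 * layer_norm                # two layer norms per transformer block
)
encoder = embeddings + num_layers * per_layer
head = linear(hidden, hidden) + linear(hidden, num_labels)

print(f"Encoder: {encoder:,}")
print(f"Classification head: {head:,}")
print(f"Total: {encoder + head:,}")
```

Under these assumptions the encoder comes to about 66.4 million weights, consistent with the "~66 million" printed by the loading cell. The classification head adds roughly 0.6 million more and starts from random values, which is why the model needs fine-tuning before it can label reviews.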

In [None]:
# Load tokenizer and model
model_name = "distilbert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,  # 2 classes: positive and negative
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1}
)

print(f"✅ Model loaded: {model_name}")
print("   Parameters: ~66 million")

## Step 4: Prepare Data for Training
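
The tokenizer call in the next cell pads or truncates every review to the same length so that examples can be stacked into batches. The sketch below mimics that idea on made-up token IDs. The helper `pad_or_truncate`, the IDs, and the pad ID of 0 are illustrative assumptions, not the tokenizer's actual implementation (the real tokenizer, for instance, also keeps its special end-of-sequence token when it truncates).

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]                   # truncation: drop the overflow
    padding = [pad_id] * (max_length - len(kept))   # padding: fill the remainder
    attention_mask = [1] * len(kept) + [0] * len(padding)
    return kept + padding, attention_mask

short_review = [101, 2307, 3737, 102]   # hypothetical token IDs
long_review = list(range(1, 11))        # ten hypothetical token IDs

print(pad_or_truncate(short_review, max_length=6))
print(pad_or_truncate(long_review, max_length=6))
```

The attention mask marks real tokens with 1 and padding with 0, so the model can ignore the filler positions.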

In [None]:
# Convert to Hugging Face Dataset format
dataset = Dataset.from_list(training_examples)

# Tokenize the text
def tokenize(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

tokenized_dataset = dataset.map(tokenize, batched=True)

print("✅ Data tokenized and ready for training!")

## Step 5: Train the Model 🚀

This will take about **2-3 minutes** on a T4 GPU. Watch the loss decrease!
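
For a two-class classifier like this one, the reported loss is by default a cross-entropy loss: the negative log of the probability the model assigns to the correct label. The sketch below computes it by hand for made-up logits, to show why a more confident correct prediction gives a lower loss. The function name and the logit values are illustrative assumptions.

```python
import math

def cross_entropy(logits, true_label):
    """Negative log-probability that softmax assigns to the true label."""
    peak = max(logits)
    shifted = [z - peak for z in logits]   # subtract the max for numerical stability
    log_sum = math.log(sum(math.exp(z) for z in shifted))
    return -(shifted[true_label] - log_sum)

# Hypothetical logits [NEGATIVE, POSITIVE] for one positive review (label = 1)
early_in_training = [0.1, 0.2]
late_in_training = [-2.0, 2.5]

print(f"Early loss: {cross_entropy(early_in_training, 1):.3f}")
print(f"Late loss:  {cross_entropy(late_in_training, 1):.3f}")
```

A model that guesses 50/50 has a loss of ln 2 ≈ 0.693, so values near 0.69 at the start of training are expected; the number should fall as the model separates the two classes.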

In [None]:
# Define evaluation metric
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)

# Training configuration
training_args = TrainingArguments(
    output_dir="./sentiment_model",
    num_train_epochs=5,
    per_device_train_batch_size=4,
    learning_rate=2e-5,
    logging_steps=5,
    save_strategy="no",
    report_to="none",
)

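# Create trainer
# Note: compute_metrics is only called during evaluation (for example,
# trainer.evaluate with an evaluation dataset). This run has no evaluation
# set, so only the training loss is reported.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    compute_metrics=compute_metrics,
)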

print("🚀 Starting training...")
print("   Watch the 'loss' value decrease: that means the model is learning!\n")

In [None]:
# Train the model
trainer.train()

print("\n✅ Training complete!")

## Step 6: Test the Fine-Tuned Model

Let's see how our trained model performs on new reviews!
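
The prediction function in the next cell reports a label and a confidence. Both come from the model's two raw scores (logits): softmax turns them into probabilities, the larger one picks the label, and its value is the confidence. A minimal sketch with made-up logits, using plain Python in place of the `torch` calls:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["NEGATIVE", "POSITIVE"]
logits = [-1.2, 2.3]  # hypothetical model output for one review

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # plays the role of argmax
print(f"{labels[best]} (confidence: {probs[best]:.1%})")
```

With only two classes the confidence can never drop below 50%, so a lukewarm review such as "It's okay" is still forced into one label. A confidence close to 50% is the signal that the model is unsure.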

In [None]:
# Function to predict sentiment
def predict_sentiment(text):
    """Predict the sentiment of a given text and return (label, confidence)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    
    model.eval()  # switch off dropout so repeated calls give the same result
    with torch.no_grad():
        outputs = model(**inputs)
        prediction = torch.argmax(outputs.logits, dim=-1).item()
        confidence = torch.softmax(outputs.logits, dim=-1).max().item()
    
    sentiment = "POSITIVE 😊" if prediction == 1 else "NEGATIVE 😞"
    return sentiment, confidence

print("🎯 Model ready for predictions!")

In [None]:
# Test on new reviews (not in training data)
test_reviews = [
    "Amazing product! Will definitely purchase again.",
    "Total garbage. Don't waste your money on this.",
    "It's okay. Nothing special but does the job.",
    "My kids absolutely love this toy!",
    "Broken on arrival. Very frustrating experience.",
]

print("📋 Testing on new reviews:\n")
for review in test_reviews:
    sentiment, confidence = predict_sentiment(review)
    print(f"Review: \"{review}\"")
    print(f"Prediction: {sentiment} (confidence: {confidence:.1%})")
    print("-" * 60)

---

# ✏️ Exercise 1: Add Your Own Training Examples

Add **5 new reviews** to the training data below, with at least 2 positive and 2 negative.

Think about reviews that might be challenging for the model:
- Mixed opinions ("Good quality but slow shipping")
- Sarcasm ("Wow, so great that it broke immediately")
- Industry-specific language

**After adding your examples**, run all the cells below to retrain the model.
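
Before retraining, it can help to check that your examples follow the expected format. The sketch below is one possible check; `check_examples` is a hypothetical helper, not part of any library, and the sample reviews and their labels are for illustration only.

```python
def check_examples(examples, min_per_class=2):
    """Return a list of problems found; an empty list means the data looks fine."""
    problems = []
    for i, ex in enumerate(examples):
        text = ex.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"Example {i}: 'text' must be a non-empty string")
        if ex.get("label") not in (0, 1):
            problems.append(f"Example {i}: 'label' must be 0 or 1")
    # Count how many examples each class has
    for label, name in ((1, "positive"), (0, "negative")):
        count = sum(1 for ex in examples if ex.get("label") == label)
        if count < min_per_class:
            problems.append(
                f"Need at least {min_per_class} {name} examples, found {count}"
            )
    return problems

# Hypothetical examples; the last one is deliberately malformed
sample = [
    {"text": "Good quality but slow shipping", "label": 1},
    {"text": "Wow, so great that it broke immediately", "label": 0},
    {"text": "", "label": 2},
]
for problem in check_examples(sample):
    print(problem)
```

In the notebook, you could call `check_examples(my_examples)` after filling in your list and fix anything it reports before running the retraining cell.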

In [None]:
# ✏️ YOUR CODE: Add 5 new training examples here!

my_examples = [
    # Add your examples below (at least 2 positive, 2 negative)
    # Format: {"text": "Your review here", "label": 1}  # 1 = positive
    # Format: {"text": "Your review here", "label": 0}  # 0 = negative
    
    # Example (delete this and add your own):
    # {"text": "This laptop runs so smoothly!", "label": 1},
    
]

# Combine with original training data
combined_data = training_examples + my_examples

print(f"📊 Combined training data: {len(combined_data)} examples")
print(f"   - Original: {len(training_examples)}")
print(f"   - Your additions: {len(my_examples)}")

In [None]:
# Retrain with combined data (only run this after adding your examples!)
if len(my_examples) >= 5:
    print("🔄 Retraining model with your examples...\n")
    
    # Reload fresh model
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2,
        id2label={0: "NEGATIVE", 1: "POSITIVE"},
        label2id={"NEGATIVE": 0, "POSITIVE": 1}
    )
    
    # Prepare new dataset
    new_dataset = Dataset.from_list(combined_data)
    new_tokenized = new_dataset.map(tokenize, batched=True)
    
    # Train
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=new_tokenized,
    )
    trainer.train()
    print("\n✅ Retrained with your examples!")
else:
    print("⚠️ Please add at least 5 examples to my_examples list above!")

---

# ✏️ Exercise 2: Test on Your Own Text

Write **3 product reviews** to test the model. Try to find cases where the model might be wrong!
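
One way to hunt for mistakes systematically is to write down the label you expect for each review and list the cases where the prediction disagrees. The sketch below assumes a predictor that returns `(sentiment, confidence)` the way `predict_sentiment` does. Here `find_mistakes` and `stand_in_predict` are hypothetical names, and the stand-in is a toy keyword rule used only so the example runs without the trained model.

```python
def find_mistakes(labeled_reviews, predict):
    """Return (review, expected, predicted) for every wrong prediction."""
    mistakes = []
    for review, expected in labeled_reviews:
        sentiment, _confidence = predict(review)
        predicted = sentiment.split()[0]  # keep only the label word, drop any emoji
        if predicted != expected:
            mistakes.append((review, expected, predicted))
    return mistakes

def stand_in_predict(text):
    """Toy keyword rule standing in for the real model."""
    return ("NEGATIVE", 0.9) if "broke" in text.lower() else ("POSITIVE", 0.9)

# Hypothetical reviews paired with the label a human would give them
labeled = [
    ("Wow, so great that it broke immediately", "NEGATIVE"),
    ("Good quality but slow shipping", "POSITIVE"),
    ("Not bad at all", "POSITIVE"),
    ("I expected more for this price", "NEGATIVE"),
]
for review, expected, predicted in find_mistakes(labeled, stand_in_predict):
    print(f"Expected {expected}, got {predicted}: {review!r}")
```

In the notebook, pass `predict_sentiment` as the second argument in place of the stand-in. The reviews it gets wrong are good material for the reflection questions.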

In [None]:
# ✏️ YOUR CODE: Test your own reviews!

my_test_reviews = [
    # Add your test reviews here:
    "Your first review here",
    "Your second review here",
    "Your third review here",
]

print("📋 Testing your reviews:\n")
for review in my_test_reviews:
    sentiment, confidence = predict_sentiment(review)
    print(f"Review: \"{review}\"")
    print(f"Prediction: {sentiment} (confidence: {confidence:.1%})")
    print("-" * 60)

---

# ✏️ Exercise 3: Compare Before and After

Think about the difference between using a pre-trained model vs. your fine-tuned model.

**Answer these questions** (edit the text below):

### Your Answers:

**Q1: Did the model correctly classify all your test reviews? Which ones did it get wrong and why do you think that happened?**

*Your answer here:* 

---

**Q2: If you were building a sentiment analysis system for a specific industry (e.g., restaurants, hotels, tech products), what kind of training data would you need?**

*Your answer here:*

---

**Q3: When would a company choose fine-tuning vs. just using better prompts with ChatGPT? Name one advantage of each approach.**

*Your answer here:*

---

## 🎉 Congratulations!

You've successfully:
- ✅ Fine-tuned a language model for sentiment classification
- ✅ Added your own training examples
- ✅ Tested the model on custom inputs
- ✅ Thought about business applications of fine-tuning

**Next**: Complete Task 2 to compare fine-tuning with prompting approaches!