<a href="https://colab.research.google.com/github/Hsiao-2007/Hsiao-2007.github.io/blob/main/MLB_Project_4_Main_FineTuning_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MLB Project 4 Main - Fine-Tuning a Language Model

## Project Overview

Welcome to the Fine-Tuning Project! In this project, you'll learn how to **fine-tune a pre-trained language model** for sentiment analysis on product reviews.

### What is Fine-Tuning?
Fine-tuning is the process of taking a pre-trained model and adapting it to a specific task by training it on task-specific data. This allows you to leverage the general language understanding the model has already learned while specializing it for your particular use case.

### What You'll Learn
- How to load and preprocess a dataset for fine-tuning
- How to prepare a pre-trained model for a classification task
- How to set up training arguments and optimize hyperparameters
- How to train and evaluate a fine-tuned model
- How to make predictions with your trained model

### Project Structure
1. Setup and imports
2. Data loading and exploration
3. Data preprocessing and tokenization
4. Model preparation
5. Training configuration
6. Model training
7. Evaluation and inference

### Dataset: Amazon Product Reviews
We'll use a subset of Amazon product reviews with ratings from 1-5 stars. Your task is to predict the sentiment (positive/negative/neutral) based on the review text.

---

## Step 1: Setup and Imports

First, let's install the required libraries and import them.

In [4]:
# Install required packages (run this cell first!)
!pip install transformers datasets torch scikit-learn accelerate evaluate -q

In [5]:
# Import all necessary libraries
import os
import numpy as np
import pandas as pd
import warnings
import torch
from datasets import load_dataset, Dataset, DatasetDict
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
    DataCollatorWithPadding
)
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import evaluate

# Suppress unnecessary warnings for cleaner output
warnings.filterwarnings("ignore")

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print("All libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

All libraries imported successfully!
PyTorch version: 2.9.0+cu126
CUDA available: True


## Step 2: Configuration

Let's set up our model names and training configuration.

In [6]:
# Configuration settings
MODEL_NAME = "distilbert-base-uncased"  # A lighter, faster version of BERT
MAX_LENGTH = 128  # Maximum sequence length for tokenization
BATCH_SIZE = 16  # Number of samples per training batch
NUM_EPOCHS = 3  # Number of training epochs
LEARNING_RATE = 2e-5  # Learning rate for the optimizer

# Label mapping
LABEL_MAPPING = {
    0: "negative",  # 1-2 star reviews
    1: "neutral",   # 3 star reviews
    2: "positive"   # 4-5 star reviews
}

print(f"Model: {MODEL_NAME}")
print(f"Max Length: {MAX_LENGTH}")
print(f"Batch Size: {BATCH_SIZE}")
print(f"Epochs: {NUM_EPOCHS}")
print(f"Learning Rate: {LEARNING_RATE}")

Model: distilbert-base-uncased
Max Length: 128
Batch Size: 16
Epochs: 3
Learning Rate: 2e-05


## Step 3: Load and Explore the Dataset

We'll use the Amazon Polarity dataset which contains product reviews. We'll convert the ratings into sentiment labels.

### 3.1: Load the Dataset

**TODO**: Load the dataset and take a smaller subset for faster training.

In [48]:
print("Loading dataset...")

# TODO: Load the amazon_polarity dataset
dataset = load_dataset('fancyzhx/amazon_polarity')  # Replace None with your code

# We'll take a smaller subset for faster training (5000 samples)
# TODO: Select the first 5000 samples from the dataset
dataset['train'] = dataset['train'].select(range(5000))  # Replace None with your code
dataset['test'] = dataset['test'].select(range(5000))  # Replace None with your code

print(f"Dataset loaded with {len(dataset)} samples")
print(f"\nDataset structure: {dataset}")

Loading dataset...
Dataset loaded with 2 samples

Dataset structure: DatasetDict({
    train: Dataset({
        features: ['label', 'title', 'content'],
        num_rows: 5000
    })
    test: Dataset({
        features: ['label', 'title', 'content'],
        num_rows: 5000
    })
})


### 3.2: Explore the Data

Let's examine some sample reviews to understand the data better.

In [49]:
# TODO: Display the first 3 examples from the dataset
print("Sample reviews:\n")
for i in range(3):
    sample = dataset["train"][i]  # Replace None with code to get sample i
    print(f"Example {i+1}:")
    print(f"Title: {sample['title']}")
    print(f"Content: {sample['content'][:200]}...")  # Show first 200 chars
    print(f"Label: {sample['label']} ({LABEL_MAPPING[sample['label']]})")
    print("-" * 80)
    print()

Sample reviews:

Example 1:
Title: Stuning even for the non-gamer
Content: This sound track was beautiful! It paints the senery in your mind so well I would recomend it even to people who hate vid. game music! I have played the game Chrono Cross but out of all of the games I...
Label: 1 (neutral)
--------------------------------------------------------------------------------

Example 2:
Title: The best soundtrack ever to anything.
Content: I'm reading a lot of reviews saying that this is the best 'game soundtrack' and I figured that I'd write a review to disagree a bit. This in my opinino is Yasunori Mitsuda's ultimate masterpiece. The ...
Label: 1 (neutral)
--------------------------------------------------------------------------------

Example 3:
Title: Amazing!
Content: This soundtrack is my favorite music of all time, hands down. The intense sadness of "Prisoners of Fate" (which means all the more if you've played the game) and the hope in "A Distant Promise" and "G...
Label: 1 (

### 3.3: Prepare Labels

The amazon_polarity dataset has binary labels (0=negative, 1=positive). For this project, we'll work with these as-is, but you could extend this to include neutral sentiment.

**TODO**: Create a function to combine title and content into a single text field.

In [50]:
def prepare_text(example):
    """
    Combine title and content into a single text field.

    Args:
        example: A single example from the dataset

    Returns:
        Dictionary with 'text' and 'label' fields
    """
    # TODO: Combine title and content with a separator
    text = example['title'] + '|' + example['content']  # Replace None with your code

    return {
        'text': text,
        'label': example['label']
    }

# TODO: Apply the prepare_text function to the entire dataset
dataset = dataset.map(prepare_text) # Replace None with your code
dataset = dataset.remove_columns(['title','content'])

print("Text preparation complete!")
print(f"\nUpdated dataset structure: {dataset}")

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

Text preparation complete!

Updated dataset structure: DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 5000
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 5000
    })
})
{'label': 1, 'text': 'Stuning even for the non-gamer|This sound track was beautiful! It paints the senery in your mind so well I would recomend it even to people who hate vid. game music! I have played the game Chrono Cross but out of all of the games I have ever played it has the best music! It backs away from crude keyboarding and takes a fresher step with grate guitars and soulful orchestras. It would impress anyone who cares to listen! ^_^'}


### 3.4: Split the Dataset

**TODO**: Split the dataset into training, validation, and test sets.

In [52]:
# TODO: Split the dataset into train (70%), validation (15%), and test (15%)

# First split: 70% train, 30% temp
train_test = None # Replace None with your code

# Second split: Split the temp set into 50% validation, 50% test
val_test = None  # Replace None with your code

# Create the final dataset dictionary
dataset_dict = DatasetDict({
    'train': train_test['train'],
    'validation': val_test['train'],
    'test': val_test['test']
})

print("Dataset split complete!")
print(f"\nDataset splits:")
print(f"   Training: {len(dataset_dict['train'])} samples")
print(f"   Validation: {len(dataset_dict['validation'])} samples")
print(f"   Test: {len(dataset_dict['test'])} samples")

TypeError: 'NoneType' object is not subscriptable

## Step 4: Tokenization

Tokenization converts text into numerical tokens that the model can understand.

### 4.1: Load the Tokenizer

**TODO**: Load the tokenizer for our model.

In [54]:
print("Loading tokenizer...")

# TODO: Load the tokenizer using AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)  # Replace None with your code

print(f"Tokenizer loaded: {tokenizer.__class__.__name__}")
print(f"Vocabulary size: {len(tokenizer)}")

Loading tokenizer...


tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

Tokenizer loaded: DistilBertTokenizerFast
Vocabulary size: 30522


### 4.2: Create Tokenization Function

**TODO**: Create a function to tokenize the text data.

In [56]:
def tokenize_function(examples):
    """
    Tokenize the text in the examples.

    Args:
        examples: Batch of examples from the dataset

    Returns:
        Tokenized examples
    """
    # TODO: Tokenize the text with padding and truncation
    return tokenizer.tokenize(examples)  # Replace None with your code

# TODO: Apply tokenization to all splits
print("Tokenizing datasets...")
tokenized_datasets = dataset.map(tokenize_function)  # Replace None with your code

print("Tokenization complete!")
print(f"\nTokenized dataset structure: {tokenized_datasets}")

Tokenizing datasets...


Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

TypeError: Provided `function` which is applied to all elements of table returns a variable of type <class 'list'>. Make sure provided `function` returns a variable of type `dict` (or a pyarrow table) to update the dataset or `None` if you are only interested in side effects.

### 4.3: Verify Tokenization

Let's check that tokenization worked correctly.

In [None]:
# Examine a tokenized example
print("Sample tokenized example:\n")
sample = tokenized_datasets['train'][0]
print(f"Input IDs shape: {len(sample['input_ids'])}")
print(f"Input IDs (first 20): {sample['input_ids'][:20]}")
print(f"\nDecoded text: {tokenizer.decode(sample['input_ids'][:50])}...")
print(f"Label: {sample['label']}")

## Step 5: Prepare the Model

Now we'll load the pre-trained model and prepare it for our classification task.

### 5.1: Load the Model

**TODO**: Load the pre-trained model for sequence classification.

In [None]:
print("Loading pre-trained model...")

# TODO: Load the model for sequence classification
model = None  # Replace None with your code

print(f"Model loaded: {model.__class__.__name__}")
print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")

### 5.2: Create Data Collator

A data collator handles batching and padding during training.

In [None]:
# TODO: Create a data collator with padding
data_collator = None  # Replace None with your code

print("Data collator created!")

## Step 6: Define Evaluation Metrics

We need to define how to evaluate our model's performance.

In [None]:
def compute_metrics(eval_pred):
    """
    Compute evaluation metrics for the model.

    Args:
        eval_pred: Tuple of (predictions, labels)

    Returns:
        Dictionary of metric values
    """
    # TODO: Extract predictions and labels
    predictions, labels = None, None  # Replace None with your code

    # TODO: Get predicted class by taking argmax
    predictions = None  # Replace None with your code

    # TODO: Calculate accuracy
    accuracy = None  # Replace None with your code

    # TODO: Calculate precision, recall, and F1
    precision, recall, f1, _ = None, None, None, None  # Replace with your code

    return {
        'accuracy': accuracy,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

print("Metrics function defined!")

## Step 7: Set Up Training Arguments

Training arguments control how the model is trained.

In [None]:
# TODO: Define training arguments
training_args = TrainingArguments(
    output_dir="./results",              # Output directory for checkpoints
    num_train_epochs=None,               # TODO
    per_device_train_batch_size=None,    # TODO
    per_device_eval_batch_size=None,     # TODO
    learning_rate=None,                  # TODO
    weight_decay=0.01,                   # Regularization
    eval_strategy="epoch",               # Evaluate after each epoch
    save_strategy="epoch",               # Save checkpoint after each epoch
    load_best_model_at_end=True,         # Load best model at the end
    metric_for_best_model="accuracy",    # Use accuracy to determine best model
    logging_dir="./logs",                # TensorBoard logs
    logging_steps=50,                    # Log every 50 steps
    warmup_steps=100,                    # Warmup steps for learning rate
    seed=42,                             # Random seed
    report_to="none",
)

print("Training arguments configured!")
print(f"\nTraining configuration:")
print(f"Epochs: {training_args.num_train_epochs}")
print(f"Batch size: {training_args.per_device_train_batch_size}")
print(f"Learning rate: {training_args.learning_rate}")

## Step 8: Create the Trainer

The Trainer handles the training loop, evaluation, and logging.

In [None]:
# TODO: Create a Trainer instance
trainer = None  # Replace None with your code

print("Trainer created!")

## Step 9: Train the Model

Now we're ready to train! This will take several minutes.

**TODO**: Start the training process.

In [None]:
print("Starting training...\n")
print("=" * 80)

# TODO: Train the model
train_result = None  # Replace None with your code

print("\n" + "=" * 80)
print("Training complete!")
print(f"\nTraining metrics:")
print(f"Final loss: {train_result.training_loss:.4f}")
print(f"Training time: {train_result.metrics['train_runtime']:.2f} seconds")

## Step 10: Evaluate the Model

Let's evaluate our fine-tuned model on the validation and test sets.

### 10.1: Validation Set Evaluation

**TODO**: Evaluate on the validation set.

In [None]:
print("Evaluating on validation set...\n")

# TODO: Evaluate the model on validation set
val_results = None  # Replace None with your code

print("Validation results:")
for metric, value in val_results.items():
    if metric.startswith('eval_'):
        print(f"   {metric[5:]}: {value:.4f}")

### 10.2: Test Set Evaluation

**TODO**: Evaluate on the test set.

In [None]:
print("Evaluating on test set...\n")

# TODO: Evaluate the model on test set
test_results = None  # Replace None with your code

print("Test results:")
for metric, value in test_results.items():
    if metric.startswith('eval_'):
        print(f"   {metric[5:]}: {value:.4f}")

## Step 11: Make Predictions

Now let's use our fine-tuned model to make predictions on new text!

### 11.1: Create Prediction Function

**TODO**: Create a function to predict sentiment for new reviews.

In [None]:
def predict_sentiment(text: str, model, tokenizer):
    """
    Predict the sentiment of a text review.

    Args:
        text: Review text to classify
        model: Fine-tuned model
        tokenizer: Tokenizer

    Returns:
        Tuple of (predicted_label, confidence_score)
    """
    # TODO: Tokenize the input text
    inputs = None  # Replace None with your code

    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # TODO: Get model predictions (no gradient calculation needed)
    with torch.no_grad():
        outputs = None  # Replace None with your code

    # TODO: Get the predicted class
    predicted_class = None  # Replace None with your code

    # TODO: Calculate confidence score using softmax
    probabilities = torch.softmax(outputs.logits, dim=1)
    confidence = None  # Replace None with your code

    # Map to label name
    label_name = LABEL_MAPPING[predicted_class.item()]

    return label_name, confidence.item()

print("Prediction function defined!")

### 11.2: Test with Sample Reviews

Let's test our model with some example reviews!

In [None]:
# Sample reviews to test
test_reviews = [
    "This product is amazing! Best purchase I've ever made. Highly recommend!",
    "Terrible quality. Broke after one day. Complete waste of money.",
    "This is absolutely the worst product I have ever purchased. Save your money!",
    "Great value for the price. Works exactly as described.",
    "Not bad, but not great either. It's okay for the price."
]

print("Making predictions on sample reviews:\n")
print("=" * 80)

for i, review in enumerate(test_reviews, 1):
    # TODO: Get prediction for the review
    sentiment, confidence = None, None  # Replace with your code

    print(f"\nReview {i}: {review}")
    print(f"Predicted Sentiment: {sentiment.upper()}")
    print(f"Confidence: {confidence:.2%}")
    print("-" * 80)

### 11.3: Interactive Prediction

Try it yourself! Enter your own reviews to classify.

In [None]:
print("\n" + "=" * 80)
print("Interactive Sentiment Classifier")
print("Enter product reviews to classify their sentiment.")
print("Type 'exit' or 'quit' to stop.")
print("=" * 80 + "\n")

while True:
    # Get user input
    review = input("\nüìù Enter a review: ")

    # Check if user wants to exit
    if review.lower() in ["exit", "quit"]:
        print("\nüëã Thank you for using the sentiment classifier!")
        break

    # Skip empty input
    if not review.strip():
        continue

    # TODO: Get prediction
    sentiment, confidence = None, None  # Replace with your code

    print(f"\n‚ú® Prediction: {sentiment.upper()}")
    print(f"üìä Confidence: {confidence:.2%}")

## üéì Congratulations!

You've successfully fine-tuned a language model for sentiment analysis! Here's what you accomplished:

1. ‚úÖ Loaded and preprocessed the Amazon product reviews dataset
2. ‚úÖ Tokenized text data for model input
3. ‚úÖ Configured and loaded a pre-trained DistilBERT model
4. ‚úÖ Set up training arguments and evaluation metrics
5. ‚úÖ Fine-tuned the model on sentiment classification
6. ‚úÖ Evaluated model performance on validation and test sets
7. ‚úÖ Created a prediction function for new reviews

### Next Steps

Want to improve your model? Try:
- Training for more epochs
- Experimenting with different learning rates
- Using a larger model (e.g., BERT-base instead of DistilBERT)
- Adding more training data
- Implementing data augmentation techniques
- Trying different optimizers or schedulers
- Fine-tuning on a different task (e.g., multi-class classification)
- Adding attention visualization to understand model decisions

### Additional Resources

- [Hugging Face Transformers Documentation](https://huggingface.co/docs/transformers/)
- [Fine-tuning Guide](https://huggingface.co/docs/transformers/training)
- [DistilBERT Paper](https://arxiv.org/abs/1910.01108)
- [Transfer Learning in NLP](https://ruder.io/transfer-learning/)

### Challenge Tasks

1. **Hyperparameter Tuning**: Experiment with different batch sizes, learning rates, and epochs to improve performance
2. **Model Comparison**: Try fine-tuning different models (BERT, RoBERTa, ALBERT) and compare results
3. **Error Analysis**: Analyze misclassified examples to understand model limitations
4. **Deployment**: Create a simple web app using Gradio or Streamlit to deploy your model

Great work! üéâ