# 🎯 Chapter 6.1: Fine-tuning Pretrained Models on Hugging Face

---

## 📚 What You'll Learn

In this notebook, we'll explore **fine-tuning** - a powerful technique that allows you to take a pretrained model and adapt it for your specific use case. By the end of this notebook, you'll understand:

1. **What is Fine-tuning?** - Understanding the concept and why it's important
2. **Loading & Preparing Datasets** - Working with Hugging Face datasets
3. **Tokenization** - Converting text to model-ready format
4. **Setting Up Models** - Configuring pretrained models for classification
5. **Training with Trainer API** - Using Hugging Face's powerful training utilities
6. **Inference** - Using your fine-tuned model for predictions
7. **Multiclass Classification** - Extending to more complex scenarios

---

## 🧠 Understanding Fine-tuning

**Fine-tuning** is like teaching an expert a new specialty. Imagine a doctor who has years of general medical training (pretrained model). Instead of training a new doctor from scratch for cardiology, you take this experienced doctor and give them specialized cardiology training (fine-tuning). They already understand medicine fundamentals, so they learn the specialty faster and with less effort!

### Why Fine-tune?

| Approach | Training Data Needed | Time Required | Performance |
|----------|---------------------|---------------|-------------|
| Train from Scratch | Millions of examples | Days/Weeks | Variable |
| Fine-tuning | Thousands of examples | Minutes/Hours | High |
| Zero-shot (no training) | None | Instant | Moderate |

Fine-tuning offers the **sweet spot** between effort and performance!

---

## 🛠️ Setup & Installation

Let's start by installing and importing the necessary libraries.

In [1]:
# Install required packages (uncomment if needed)
# !pip install transformers datasets torch accelerate -q

In [2]:
# Import essential libraries
from datasets import load_dataset, DatasetDict
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments
)
import torch
import warnings
warnings.filterwarnings('ignore')

print("✅ All libraries imported successfully!")

✅ All libraries imported successfully!


In [3]:
# Check for GPU availability
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"🚀 Using CUDA GPU: {torch.cuda.get_device_name(0)}")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    print("🍎 Using Apple Silicon MPS")
else:
    device = torch.device("cpu")
    print("💻 Using CPU (training will be slower)")

print(f"Device selected: {device}")

🚀 Using CUDA GPU: 
Device selected: cuda


---

## 📊 Part 1: Binary Classification - Movie Review Sentiment Analysis

For our first fine-tuning exercise, we'll use the **IMDB movie reviews dataset** to classify reviews as positive or negative.

### 1.1 Loading the Dataset

In [4]:
# Load the IMDB dataset
imdb_dataset = load_dataset("imdb")

print("📦 IMDB Dataset Structure:")
print(imdb_dataset)

📦 IMDB Dataset Structure:
DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})


In [5]:
# Let's examine a sample review
sample = imdb_dataset['train'][0]

print("🎬 Sample Movie Review:")
print("=" * 50)
print(f"Text (first 500 chars): {sample['text'][:500]}...")
print(f"\nLabel: {sample['label']} ({'Positive 👍' if sample['label'] == 1 else 'Negative 👎'})")

🎬 Sample Movie Review:
Text (first 500 chars): I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attent...

Label: 0 (Negative 👎)


### 1.2 Creating a Subset for Faster Training

The full IMDB dataset has 25,000 training examples. For learning purposes, we'll use a smaller subset to speed up training while still achieving good results.

In [6]:
# Create a manageable subset for training
NUM_TRAIN_SAMPLES = 3000
NUM_TEST_SAMPLES = 1000

# Shuffle and select subsets
train_subset = imdb_dataset['train'].shuffle(seed=42).select(range(NUM_TRAIN_SAMPLES))
test_subset = imdb_dataset['test'].shuffle(seed=42).select(range(NUM_TEST_SAMPLES))

# Combine into a DatasetDict
movie_dataset = DatasetDict({
    'train': train_subset,
    'test': test_subset
})

print(f"📊 Subset Dataset:")
print(movie_dataset)
print(f"\n✅ Using {NUM_TRAIN_SAMPLES} training and {NUM_TEST_SAMPLES} test samples")

📊 Subset Dataset:
DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 3000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1000
    })
})

✅ Using 3000 training and 1000 test samples


In [7]:
# Let's check the label distribution
train_labels = movie_dataset['train']['label']
positive_count = sum(train_labels)
negative_count = len(train_labels) - positive_count

print("📈 Label Distribution in Training Set:")
print(f"   Positive reviews: {positive_count} ({positive_count/len(train_labels)*100:.1f}%)")
print(f"   Negative reviews: {negative_count} ({negative_count/len(train_labels)*100:.1f}%)")

📈 Label Distribution in Training Set:
   Positive reviews: 1489 (49.6%)
   Negative reviews: 1511 (50.4%)


### 1.3 Tokenization - Converting Text to Numbers

Models don't understand text directly - they need numbers! **Tokenization** converts our text into tokens (think of them as word pieces) and then into numerical IDs.

We'll use **DistilBERT**, a distilled version of BERT that is about 40% smaller and 60% faster while retaining roughly 97% of BERT's language-understanding performance, according to the DistilBERT paper.

In [8]:
# Define the model checkpoint
MODEL_CHECKPOINT = "distilbert-base-uncased"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_CHECKPOINT)

print(f"✅ Loaded tokenizer for: {MODEL_CHECKPOINT}")
print(f"   Vocabulary size: {tokenizer.vocab_size:,} tokens")

✅ Loaded tokenizer for: distilbert-base-uncased
   Vocabulary size: 30,522 tokens


In [9]:
# Let's see tokenization in action
sample_text = "This movie was absolutely fantastic! I loved every moment."

# Tokenize the sample
tokens = tokenizer.tokenize(sample_text)
token_ids = tokenizer.encode(sample_text)

print("🔤 Tokenization Example:")
print(f"   Original: {sample_text}")
print(f"   Tokens: {tokens}")
print(f"   Token IDs: {token_ids}")

🔤 Tokenization Example:
   Original: This movie was absolutely fantastic! I loved every moment.
   Tokens: ['this', 'movie', 'was', 'absolutely', 'fantastic', '!', 'i', 'loved', 'every', 'moment', '.']
   Token IDs: [101, 2023, 3185, 2001, 7078, 10392, 999, 1045, 3866, 2296, 2617, 1012, 102]
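
The token list has 11 entries but the ID list has 13. The short sketch below uses only the values printed above to show where the two extra IDs come from: `encode` wraps the sequence in BERT's special tokens, `[CLS]` (ID 101) at the start and `[SEP]` (ID 102) at the end. The variable names `inner_ids` and `token_to_id` are illustrative and not part of the notebook.

```python
# Values copied from the tokenization output above.
tokens = ['this', 'movie', 'was', 'absolutely', 'fantastic', '!',
          'i', 'loved', 'every', 'moment', '.']
token_ids = [101, 2023, 3185, 2001, 7078, 10392, 999,
             1045, 3866, 2296, 2617, 1012, 102]

CLS_ID, SEP_ID = 101, 102  # [CLS] and [SEP] in the BERT uncased vocabulary

# Drop the special tokens that encode() added at both ends.
inner_ids = token_ids[1:-1]

# Pair each word piece with its vocabulary ID.
token_to_id = dict(zip(tokens, inner_ids))

print(f"Special tokens: first={token_ids[0]}, last={token_ids[-1]}")
for token, token_id in token_to_id.items():
    print(f"{token!r:>14} -> {token_id}")
```

`tokenizer.tokenize` returns only the word pieces, which is why the two lists differ in length.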


In [10]:
# Define the tokenization function
def tokenize_function(examples):
    """
    Tokenizes the text with padding and truncation.
    
    - padding='max_length': Pads shorter texts to max_length
    - truncation=True: Cuts longer texts to max_length
    - max_length=256: Maximum sequence length (reduced from 512 for speed)
    """
    return tokenizer(
        examples['text'],
        padding='max_length',
        truncation=True,
        max_length=256  # Shorter for faster training
    )

# Apply tokenization to the entire dataset
print("⏳ Tokenizing dataset...")
tokenized_movie_dataset = movie_dataset.map(tokenize_function, batched=True)
print("✅ Tokenization complete!")

# View the new structure
print("\n📊 Tokenized Dataset Structure:")
print(tokenized_movie_dataset)

⏳ Tokenizing dataset...
✅ Tokenization complete!

📊 Tokenized Dataset Structure:
DatasetDict({
    train: Dataset({
        features: ['text', 'label', 'input_ids', 'attention_mask'],
        num_rows: 3000
    })
    test: Dataset({
        features: ['text', 'label', 'input_ids', 'attention_mask'],
        num_rows: 1000
    })
})


In [11]:
# Examine a tokenized example
example = tokenized_movie_dataset['train'][0]

print("🔍 Tokenized Example:")
print(f"   Keys: {example.keys()}")
print(f"   Input IDs length: {len(example['input_ids'])}")
print(f"   Attention mask length: {len(example['attention_mask'])}")
print(f"   Label: {example['label']}")

🔍 Tokenized Example:
   Keys: dict_keys(['text', 'label', 'input_ids', 'attention_mask'])
   Input IDs length: 256
   Attention mask length: 256
   Label: 1
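
Every example ends up exactly 256 IDs long because `padding='max_length'` and `truncation=True` work together. The following is a simplified pure-Python sketch of that idea, not the tokenizer's actual implementation. The helper `pad_or_truncate` is hypothetical, and the pad ID of 0 is an assumption based on DistilBERT's `padding_idx=0`.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    if len(token_ids) > max_length:
        # Cut the middle but keep the final token ([SEP]) at the end.
        token_ids = token_ids[:max_length - 1] + token_ids[-1:]
    real_count = len(token_ids)
    padding = max_length - real_count
    input_ids = list(token_ids) + [pad_id] * padding
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [1] * real_count + [0] * padding
    return input_ids, attention_mask


# A short sequence gets padded on the right.
short_ids, short_mask = pad_or_truncate([101, 2023, 3185, 102], max_length=8)
print(short_ids)
print(short_mask)

# A long sequence gets truncated but still ends with [SEP] (102).
long_input = [101] + list(range(2000, 2020)) + [102]
long_ids, long_mask = pad_or_truncate(long_input, max_length=8)
print(long_ids)
print(long_mask)
```

The attention mask is what lets the model tell real tokens from padding, so both columns need to be passed to the model together.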


### 1.4 Setting Up the Pretrained Model

Now we load the pretrained DistilBERT model and configure it for our binary classification task. The `num_labels=2` parameter tells the model we have two classes (positive/negative).

In [12]:
# Load the model for sequence classification
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_CHECKPOINT,
    num_labels=2  # Binary classification: 0=negative, 1=positive
)

# Move model to the appropriate device
model.to(device)

print(f"✅ Model loaded and moved to {device}")
print(f"   Model type: {type(model).__name__}")
print(f"   Number of parameters: {model.num_parameters():,}")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

✅ Model loaded and moved to cuda
   Model type: DistilBertForSequenceClassification
   Number of parameters: 66,955,010


### 📖 Understanding the Model Architecture

When you load `AutoModelForSequenceClassification`, Hugging Face automatically:

1. **Loads the pretrained transformer** (DistilBERT) - This contains the general language understanding
2. **Adds a classification head** - For DistilBERT, a `pre_classifier` linear layer followed by a `classifier` linear layer that maps the transformer's output to your number of classes

The classification head is randomly initialized (this is what the "newly initialized" warning above refers to) and needs to be trained on your data!
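
To see how small the new head is compared with the pretrained body, the arithmetic sketch below rebuilds the parameter count from the layer sizes that `print(model)` reports for this checkpoint. The feed-forward width of 3072 does not appear in the truncated printout and is an assumption (4 × 768, the standard DistilBERT setting). The helper `linear_params` is illustrative.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


vocab_size, max_positions, hidden, num_layers = 30522, 512, 768, 6
ffn_hidden = 3072          # assumption: 4 * hidden, not shown in the printout
num_labels = 2
layer_norm = 2 * hidden    # one scale and one shift vector

# Pretrained body: embeddings plus six transformer blocks.
embeddings = vocab_size * hidden + max_positions * hidden + layer_norm
attention = 4 * linear_params(hidden, hidden)   # q_lin, k_lin, v_lin, out_lin
ffn = linear_params(hidden, ffn_hidden) + linear_params(ffn_hidden, hidden)
block = attention + ffn + 2 * layer_norm        # two LayerNorms per block
body = embeddings + num_layers * block

# Newly initialized head: pre_classifier, then classifier.
head = linear_params(hidden, hidden) + linear_params(hidden, num_labels)

total = body + head
print(f"Pretrained body: {body:,}")
print(f"Classification head: {head:,}")
print(f"Total: {total:,}")
print(f"Head share: {head / total:.2%}")
```

Under these assumptions the total matches the 66,955,010 parameters reported by `model.num_parameters()`, and the randomly initialized head accounts for under 1% of them.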

In [13]:
# View model architecture (optional - generates lengthy output)
print(model)

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)


### 1.5 Training Configuration

The `TrainingArguments` class allows us to configure all aspects of training. Let's set up our training parameters.

In [14]:
# Define training arguments
training_args = TrainingArguments(
    # Output settings
    output_dir="./movie_sentiment_model",  # Where to save the model
    
    # Training hyperparameters
    learning_rate=2e-5,              # Small LR to avoid destroying pretrained weights
    num_train_epochs=3,              # Number of complete passes through the dataset
    per_device_train_batch_size=16,  # Samples processed per training step
    per_device_eval_batch_size=16,   # Samples processed per evaluation step
    weight_decay=0.01,               # Regularization to prevent overfitting
    
    # Evaluation strategy
    eval_strategy="epoch",           # Evaluate after each epoch
    save_strategy="epoch",           # Save checkpoint after each epoch
    load_best_model_at_end=True,     # Load the best model when training ends
    
    # Logging
    logging_dir="./logs",
    logging_steps=50,
    
    # Other settings
    fp16=torch.cuda.is_available(),  # Use mixed precision if GPU available
)

print("✅ Training arguments configured!")
print(f"   Epochs: {training_args.num_train_epochs}")
print(f"   Batch size: {training_args.per_device_train_batch_size}")
print(f"   Learning rate: {training_args.learning_rate}")

✅ Training arguments configured!
   Epochs: 3
   Batch size: 16
   Learning rate: 2e-05


### 📖 Understanding Training Hyperparameters

| Parameter | Description | Typical Value |
|-----------|-------------|---------------|
| `learning_rate` | How much to update weights each step. Lower = slower but more stable | 1e-5 to 5e-5 |
| `num_train_epochs` | Complete passes through the training data | 2-5 |
| `per_device_train_batch_size` | Samples processed per device before updating weights | 8-32 |
| `weight_decay` | Regularization strength to prevent overfitting | 0.01-0.1 |
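
These settings together determine how many weight updates the run performs. The sketch below works that out for the values used in this notebook, assuming a single device and no gradient accumulation. Both are assumptions about the run rather than something the notebook states.

```python
import math

num_samples = 3000    # NUM_TRAIN_SAMPLES
batch_size = 16       # per_device_train_batch_size
num_epochs = 3        # num_train_epochs
logging_steps = 50    # logging_steps

# The last batch of each epoch may be smaller, so round up.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

# A training-loss entry is logged every `logging_steps` updates.
log_events = total_steps // logging_steps

print(f"Steps per epoch: {steps_per_epoch}")
print(f"Total optimization steps: {total_steps}")
print(f"Logged training-loss entries: {log_events}")
```

Halving the batch size doubles the number of updates per epoch, which is one reason batch size and learning rate are usually tuned together.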

### 1.6 Training the Model

Now we create the `Trainer` and start the fine-tuning process!

In [15]:
# Create the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_movie_dataset['train'],
    eval_dataset=tokenized_movie_dataset['test'],
)

print("✅ Trainer initialized!")
print(f"   Training samples: {len(tokenized_movie_dataset['train'])}")
print(f"   Evaluation samples: {len(tokenized_movie_dataset['test'])}")

✅ Trainer initialized!
   Training samples: 3000
   Evaluation samples: 1000


In [16]:
# Start training!
print("🚀 Starting fine-tuning...")
print("   This may take several minutes depending on your hardware.")
print("=" * 60)

train_result = trainer.train()

print("\n" + "=" * 60)
print("✅ Training complete!")

🚀 Starting fine-tuning...
   This may take several minutes depending on your hardware.

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1 | 0.3284 | 0.310028 |
| 2 | 0.2269 | 0.328088 |
| 3 | 0.1503 | 0.389217 |

✅ Training complete!
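
In the results above, training loss falls every epoch while validation loss rises after epoch 1, which is the usual sign of overfitting. Because `load_best_model_at_end=True` restores the checkpoint with the lowest validation loss by default, the model kept at the end should be the epoch 1 checkpoint. The sketch below does that selection by hand on the logged numbers. The `history` structure is illustrative, not something `Trainer` returns in this form.

```python
# Losses copied from the training log above.
history = [
    {"epoch": 1, "train_loss": 0.3284, "val_loss": 0.310028},
    {"epoch": 2, "train_loss": 0.2269, "val_loss": 0.328088},
    {"epoch": 3, "train_loss": 0.1503, "val_loss": 0.389217},
]

# The best checkpoint is the one with the lowest validation loss.
best = min(history, key=lambda row: row["val_loss"])

# Flag epochs where training loss improved but validation loss got worse.
overfit_epochs = [
    current["epoch"]
    for previous, current in zip(history, history[1:])
    if current["train_loss"] < previous["train_loss"]
    and current["val_loss"] > previous["val_loss"]
]

print(f"Best epoch: {best['epoch']} (val_loss={best['val_loss']:.4f})")
print(f"Epochs showing overfitting: {overfit_epochs}")
```

This is consistent with the evaluation loss of about 0.3100 that `trainer.evaluate()` reports in this notebook. With a dataset this small, one or two epochs may be enough.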


In [17]:
# Evaluate the model
print("📊 Evaluating the model...")
eval_results = trainer.evaluate()

print("\n📈 Evaluation Results:")
print(f"   Loss: {eval_results['eval_loss']:.4f}")
print(f"   Runtime: {eval_results['eval_runtime']:.2f} seconds")
print(f"   Samples/second: {eval_results['eval_samples_per_second']:.2f}")

📊 Evaluating the model...

📈 Evaluation Results:
   Loss: 0.3100
   Runtime: 0.39 seconds
   Samples/second: 2541.86
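
The evaluation above reports only the loss, which is hard to interpret on its own. One common addition is a `compute_metrics` function that turns logits into accuracy and macro-F1. The version below is a minimal dependency-free sketch. It assumes the argument can be unpacked into `(logits, labels)`, as the `Trainer` evaluation output can, and it accepts plain lists or NumPy arrays. The sample logits are made up for illustration.

```python
def compute_metrics(eval_pred):
    """Accuracy and macro-F1 from a (logits, labels) pair."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    labels = [int(label) for label in labels]

    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

    f1_scores = []
    for cls in sorted(set(labels) | set(predictions)):
        tp = sum(p == cls and y == cls for p, y in zip(predictions, labels))
        fp = sum(p == cls and y != cls for p, y in zip(predictions, labels))
        fn = sum(p != cls and y == cls for p, y in zip(predictions, labels))
        denominator = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denominator if denominator else 0.0)

    return {"accuracy": accuracy, "macro_f1": sum(f1_scores) / len(f1_scores)}


# Made-up logits for four reviews, two classes each.
fake_logits = [[2.0, -1.0], [0.1, 0.9], [1.5, 0.2], [-0.3, 0.4]]
fake_labels = [0, 1, 1, 1]
metrics = compute_metrics((fake_logits, fake_labels))
print(metrics)
```

To use it, pass `compute_metrics=compute_metrics` when constructing the `Trainer`. The values should then appear in the evaluation results with an `eval_` prefix.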


### 1.7 Saving the Fine-tuned Model

Let's save our model so we can use it later without retraining.

In [18]:
# Save the model and tokenizer
MODEL_SAVE_PATH = "./movie_sentiment_model/final"

model.save_pretrained(MODEL_SAVE_PATH)
tokenizer.save_pretrained(MODEL_SAVE_PATH)

print(f"✅ Model saved to: {MODEL_SAVE_PATH}")

✅ Model saved to: ./movie_sentiment_model/final


### 1.8 Using the Fine-tuned Model for Inference

Now let's test our model on some new movie reviews!

In [19]:
# Load the fine-tuned model
inference_model = AutoModelForSequenceClassification.from_pretrained(MODEL_SAVE_PATH)
inference_tokenizer = AutoTokenizer.from_pretrained(MODEL_SAVE_PATH)

inference_model.to(device)
inference_model.eval()  # Set to evaluation mode

print("✅ Model loaded for inference!")

✅ Model loaded for inference!


In [20]:
def predict_sentiment(text):
    """
    Predicts the sentiment of a movie review.
    
    Args:
        text: The movie review text
        
    Returns:
        A dictionary with sentiment and confidence
    """
    # Tokenize the input
    inputs = inference_tokenizer(
        text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=256
    )
    
    # Move inputs to device
    inputs = {key: value.to(device) for key, value in inputs.items()}
    
    # Get predictions
    with torch.no_grad():
        outputs = inference_model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class = torch.argmax(probabilities, dim=-1).item()
        confidence = probabilities[0][predicted_class].item()
    
    sentiment = "Positive 👍" if predicted_class == 1 else "Negative 👎"
    
    return {
        "sentiment": sentiment,
        "confidence": confidence,
        "class": predicted_class
    }

In [21]:
# Test reviews - a mix of positive and negative
test_reviews = [
    """This is hands down the best sci-fi movie I've seen in years! 
    The special effects were mind-blowing, the storyline kept me engaged 
    throughout, and the acting was superb. Highly recommend!""",
    
    """What a waste of two hours. The plot made no sense, the characters 
    were one-dimensional, and I found myself checking my phone multiple times. 
    Save your money and skip this one.""",
    
    """A decent thriller with some good moments. While the twist at the end 
    was predictable, the lead actor's performance made it worth watching. 
    Not great, not terrible.""",
    
    """Absolutely horrible! The CGI looked like it was from the 90s, 
    the dialogue was cringeworthy, and I actually fell asleep halfway through. 
    Worst movie of the year.""",
    
    """A masterpiece of modern cinema. Every frame is beautifully crafted, 
    the score is hauntingly beautiful, and the performances will stay with 
    you long after the credits roll. Oscar-worthy!"""
]

print("🎬 Movie Review Sentiment Analysis")
print("=" * 60)

for i, review in enumerate(test_reviews, 1):
    result = predict_sentiment(review)
    print(f"\nReview {i}:")
    print(f"   {review[:100]}...")
    print(f"   → {result['sentiment']} (Confidence: {result['confidence']:.2%})")

🎬 Movie Review Sentiment Analysis

Review 1:
   This is hands down the best sci-fi movie I've seen in years! 
    The special effects were mind-blow...
   → Positive 👍 (Confidence: 94.50%)

Review 2:
   What a waste of two hours. The plot made no sense, the characters 
    were one-dimensional, and I f...
   → Negative 👎 (Confidence: 95.22%)

Review 3:
   A decent thriller with some good moments. While the twist at the end 
    was predictable, the lead ...
   → Negative 👎 (Confidence: 55.35%)

Review 4:
   Absolutely horrible! The CGI looked like it was from the 90s, 
    the dialogue was cringeworthy, an...
   → Negative 👎 (Confidence: 95.97%)

Review 5:
   A masterpiece of modern cinema. Every frame is beautifully crafted, 
    the score is hauntingly bea...
   → Positive 👍 (Confidence: 95.82%)
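
The confidence values are softmax probabilities computed from the model's two output logits. The pure-Python sketch below shows how a wide gap between logits gives a confident prediction while nearly equal logits give something close to 50/50, as with the mixed Review 3. The logit values are hypothetical and are not taken from the model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical logits in [negative, positive] order.
confident = softmax([-1.5, 1.5])   # clear gap between the classes
uncertain = softmax([0.1, -0.1])   # nearly tied

for name, probs in [("confident", confident), ("uncertain", uncertain)]:
    predicted = probs.index(max(probs))
    print(f"{name}: class={predicted}, confidence={max(probs):.2%}")
```

A confidence near 50% is best read as "the model cannot decide", not as a weak vote for the printed class.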


---

## 📊 Part 2: Multiclass Classification - Food Review Rating Prediction

Now let's tackle a more complex task: predicting review ratings on a 1-5 star scale using the **Yelp Review Full dataset**. We'll filter for food-related reviews to create a focused food rating predictor!

In [22]:
# Load Yelp Review Full dataset for multiclass classification
# This dataset has 5 classes (1-5 star ratings)
print("📦 Loading Yelp Review Full dataset...")

yelp_full = load_dataset("yelp_review_full")
print(yelp_full)

📦 Loading Yelp Review Full dataset...

DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 650000
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 50000
    })
})


In [23]:
# Examine a sample
sample = yelp_full['train'][100]

print("📝 Sample Review:")
print(f"   Text: {sample['text'][:300]}...")
print(f"   Label: {sample['label']}")

# Label mapping explanation
label_names = {
    0: "⭐ (1 star - Very Negative)",
    1: "⭐⭐ (2 stars - Negative)",
    2: "⭐⭐⭐ (3 stars - Neutral)",
    3: "⭐⭐⭐⭐ (4 stars - Positive)",
    4: "⭐⭐⭐⭐⭐ (5 stars - Very Positive)"
}

print(f"\n   This means: {label_names[sample['label']]}")

📝 Sample Review:
   Text: My expectations for McDonalds are t rarely high. But for one to still fail so spectacularly...that takes something special!\nThe cashier took my friends's order, then promptly ignored me. I had to force myself in front of a cashier who opened his register to wait on the person BEHIND me. I waited ov...
   Label: 0

   This means: ⭐ (1 star - Very Negative)


In [24]:
# Create a subset for faster training
NUM_TRAIN = 2500
NUM_TEST = 500

# Filter for food-related reviews (our creative twist!)
def is_food_related(example):
    text = example['text'].lower()
    food_keywords = ['food', 'restaurant', 'dinner', 'lunch', 'breakfast', 'meal', 'dish', 'eat']
    return any(keyword in text for keyword in food_keywords)

print("⏳ Filtering for food-related reviews...")
food_train = yelp_full['train'].filter(is_food_related)
food_test = yelp_full['test'].filter(is_food_related)

print(f"   Found {len(food_train)} training and {len(food_test)} test reviews about food")

# Select subsets
train_subset = food_train.shuffle(seed=42).select(range(min(NUM_TRAIN, len(food_train))))
test_subset = food_test.shuffle(seed=42).select(range(min(NUM_TEST, len(food_test))))

# Create DatasetDict
food_reviews_dataset = DatasetDict({
    'train': train_subset,
    'test': test_subset
})

print(f"\n📊 Final Dataset:")
print(food_reviews_dataset)

⏳ Filtering for food-related reviews...
   Found 468163 training and 36037 test reviews about food

📊 Final Dataset:
DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 2500
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 500
    })
})
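
The filter keeps 468,163 of 650,000 training reviews, about 72%. One likely contributor is that plain substring matching also fires inside unrelated words: `'eat'` occurs in "great" and "weather", for example. The sketch below contrasts the notebook's substring test with a word-boundary regex. The functions here take a text string rather than a dataset example, and `is_food_related_word` is a hypothetical alternative, not what the notebook ran.

```python
import re

FOOD_KEYWORDS = ['food', 'restaurant', 'dinner', 'lunch',
                 'breakfast', 'meal', 'dish', 'eat']


def is_food_related_substring(text):
    """Same test as the notebook's filter: plain substring search."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in FOOD_KEYWORDS)


# \b marks a word boundary; the optional suffix accepts simple plurals.
FOOD_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FOOD_KEYWORDS)) + r")(?:s|es)?\b",
    re.IGNORECASE,
)


def is_food_related_word(text):
    """Match keywords only when they appear as whole words."""
    return FOOD_PATTERN.search(text) is not None


examples = [
    "Great haircut, and the weather was lovely.",
    "The dishes were creative and the meals filling.",
]
for text in examples:
    print(f"{text!r}: substring={is_food_related_substring(text)}, "
          f"word={is_food_related_word(text)}")
```

With `datasets`, the stricter version could be applied as `yelp_full['train'].filter(lambda ex: is_food_related_word(ex['text']))`. Note that it would miss inflected forms such as "eating", so the keyword list would need extending, and the subset sizes shown in this notebook would change.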


In [25]:
# Check label distribution
from collections import Counter

label_counts = Counter(food_reviews_dataset['train']['label'])

print("📊 Label Distribution:")
for label in sorted(label_counts.keys()):
    count = label_counts[label]
    bar = "█" * (count // 10)
    print(f"   {label_names[label]}: {count} {bar}")

📊 Label Distribution:
   ⭐ (1 star - Very Negative): 392 ███████████████████████████████████████
   ⭐⭐ (2 stars - Negative): 512 ███████████████████████████████████████████████████
   ⭐⭐⭐ (3 stars - Neutral): 536 █████████████████████████████████████████████████████
   ⭐⭐⭐⭐ (4 stars - Positive): 531 █████████████████████████████████████████████████████
   ⭐⭐⭐⭐⭐ (5 stars - Very Positive): 529 ████████████████████████████████████████████████████


### 2.1 Tokenization for Multiclass

In [26]:
# Create a fresh tokenizer for multiclass
multiclass_tokenizer = AutoTokenizer.from_pretrained(MODEL_CHECKPOINT)

def tokenize_for_multiclass(examples):
    return multiclass_tokenizer(
        examples['text'],
        padding='max_length',
        truncation=True,
        max_length=256
    )

print("⏳ Tokenizing multiclass dataset...")
tokenized_food_dataset = food_reviews_dataset.map(tokenize_for_multiclass, batched=True)
print("✅ Tokenization complete!")

⏳ Tokenizing multiclass dataset...
✅ Tokenization complete!


### 2.2 Setting Up the 5-Class Model

The key difference here is `num_labels=5` to handle the 5-star rating system.

In [27]:
# Load model for 5-class classification
multiclass_model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_CHECKPOINT,
    num_labels=5  # 5 classes: 1-5 stars (mapped to 0-4 internally)
)

multiclass_model.to(device)

print(f"✅ Multiclass model loaded!")
print(f"   Number of labels: 5")
print(f"   Device: {device}")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

✅ Multiclass model loaded!
   Number of labels: 5
   Device: cuda
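
With only `num_labels=5`, the model config names its classes `LABEL_0` through `LABEL_4`, and those generic names are what a `pipeline` would show later. A common remedy is to pass readable `id2label` and `label2id` mappings when loading the model. The sketch below builds the two dictionaries. The label strings are an illustrative choice, and the `from_pretrained` call is shown only as a comment because it needs the model download.

```python
# Readable names for class indices 0-4 (illustrative wording).
star_names = ["1 star", "2 stars", "3 stars", "4 stars", "5 stars"]

id2label = dict(enumerate(star_names))
label2id = {name: index for index, name in id2label.items()}

print(id2label)
print(label2id)

# Possible usage when loading the model (not executed here):
# multiclass_model = AutoModelForSequenceClassification.from_pretrained(
#     MODEL_CHECKPOINT,
#     num_labels=5,
#     id2label=id2label,
#     label2id=label2id,
# )
```

The mappings are stored in the saved `config.json`, so they travel with the model when it is saved or pushed to the Hub.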


In [28]:
# Training arguments for multiclass (similar but with adjusted settings)
multiclass_training_args = TrainingArguments(
    output_dir="./food_rating_model",
    learning_rate=3e-5,  # Slightly higher than the binary run; a starting point to tune, not a rule
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_steps=25,
    fp16=torch.cuda.is_available(),
)

# Create trainer
multiclass_trainer = Trainer(
    model=multiclass_model,
    args=multiclass_training_args,
    train_dataset=tokenized_food_dataset['train'],
    eval_dataset=tokenized_food_dataset['test'],
)

print("✅ Multiclass trainer configured!")

✅ Multiclass trainer configured!


In [29]:
# Train the multiclass model
print("🚀 Starting multiclass fine-tuning...")
print("=" * 60)

multiclass_trainer.train()

print("\n" + "=" * 60)
print("✅ Multiclass training complete!")

🚀 Starting multiclass fine-tuning...

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1 | 1.1235 | 1.07479 |
| 2 | 0.9555 | 1.032564 |
| 3 | 0.7405 | 1.036507 |

✅ Multiclass training complete!


In [30]:
# Evaluate
multiclass_eval = multiclass_trainer.evaluate()

print("📈 Multiclass Evaluation Results:")
print(f"   Loss: {multiclass_eval['eval_loss']:.4f}")
print(f"   Runtime: {multiclass_eval['eval_runtime']:.2f} seconds")

📈 Multiclass Evaluation Results:
   Loss: 1.0326
   Runtime: 0.20 seconds


In [31]:
# Save the multiclass model
MULTICLASS_SAVE_PATH = "./food_rating_model/final"

multiclass_model.save_pretrained(MULTICLASS_SAVE_PATH)
multiclass_tokenizer.save_pretrained(MULTICLASS_SAVE_PATH)

print(f"✅ Multiclass model saved to: {MULTICLASS_SAVE_PATH}")

✅ Multiclass model saved to: ./food_rating_model/final


### 2.3 Multiclass Inference

In [32]:
# Load the multiclass model for inference
rating_model = AutoModelForSequenceClassification.from_pretrained(MULTICLASS_SAVE_PATH)
rating_tokenizer = AutoTokenizer.from_pretrained(MULTICLASS_SAVE_PATH)

rating_model.to(device)
rating_model.eval()

print("✅ Multiclass model loaded for inference!")

✅ Multiclass model loaded for inference!


In [33]:
def predict_rating(text):
    """
    Predicts the star rating (1-5) for a review.
    
    Args:
        text: The review text
        
    Returns:
        A dictionary with predicted rating and confidence
    """
    inputs = rating_tokenizer(
        text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=256
    )
    
    inputs = {key: value.to(device) for key, value in inputs.items()}
    
    with torch.no_grad():
        outputs = rating_model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class = torch.argmax(probabilities, dim=-1).item()
        confidence = probabilities[0][predicted_class].item()
    
    # Convert 0-4 to 1-5 stars
    star_rating = predicted_class + 1
    stars = "⭐" * star_rating
    
    return {
        "stars": star_rating,
        "display": stars,
        "confidence": confidence,
        "probabilities": probabilities[0].cpu().numpy()
    }

In [34]:
# Test reviews for multiclass prediction
food_test_reviews = [
    """The food was absolutely incredible! Best Italian restaurant in the city. 
    The pasta was cooked to perfection and the service was impeccable.""",
    
    """Terrible experience. The food was cold, the waiter was rude, 
    and we waited over an hour for our order. Never coming back.""",
    
    """It was okay. The food was average, nothing special. 
    Portions were decent for the price. Might try again.""",
    
    """Good food but slow service. The burger was tasty but 
    we had to wait 40 minutes. The fries were a bit cold.""",
    
    """A culinary masterpiece! The chef's tasting menu was an 
    unforgettable journey. Worth every penny. 10/10 would recommend!"""
]

print("🍽️ Food Review Rating Predictions")
print("=" * 60)

for i, review in enumerate(food_test_reviews, 1):
    result = predict_rating(review)
    print(f"\nReview {i}: \"{review[:60]}...\"")
    print(f"   Predicted Rating: {result['display']} ({result['stars']} stars)")
    print(f"   Confidence: {result['confidence']:.2%}")

🍽️ Food Review Rating Predictions

Review 1: "The food was absolutely incredible! Best Italian restaurant ..."
   Predicted Rating: ⭐⭐⭐⭐⭐ (5 stars)
   Confidence: 81.77%

Review 2: "Terrible experience. The food was cold, the waiter was rude,..."
   Predicted Rating: ⭐ (1 stars)
   Confidence: 83.78%

Review 3: "It was okay. The food was average, nothing special. 
    Por..."
   Predicted Rating: ⭐⭐ (2 stars)
   Confidence: 48.08%

Review 4: "Good food but slow service. The burger was tasty but 
    we..."
   Predicted Rating: ⭐⭐⭐ (3 stars)
   Confidence: 53.70%

Review 5: "A culinary masterpiece! The chef's tasting menu was an 
    ..."
   Predicted Rating: ⭐⭐⭐⭐⭐ (5 stars)
   Confidence: 82.86%
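
`predict_rating` also returns the full probability vector, which the loop above does not use. Because star ratings are ordered, one optional summary is the probability-weighted average rating, which can sit between two stars when the model is torn. This is a sketch of that idea, not something the notebook computes. The probability vector below is hypothetical.

```python
def expected_stars(probabilities):
    """Probability-weighted mean rating; index 0 corresponds to 1 star."""
    return sum((index + 1) * p for index, p in enumerate(probabilities))


# Hypothetical probabilities for 1 to 5 stars from a lukewarm review.
probs = [0.10, 0.48, 0.30, 0.10, 0.02]

top_rating = probs.index(max(probs)) + 1   # what argmax would report
average_rating = expected_stars(probs)     # softer, ordinal-aware summary

print(f"Most likely rating: {top_rating} stars")
print(f"Expected rating: {average_rating:.2f} stars")
```

The same function works on the NumPy array stored under `result['probabilities']`.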


### 2.4 Pushing Model to Hugging Face Hub 🤗

You can share your fine-tuned model with the world by pushing it to the Hugging Face Hub. This makes it easy for others to use your model and for you to access it from anywhere!

**Prerequisites:**
- A Hugging Face account (free at [huggingface.co](https://huggingface.co))
- Hugging Face CLI login or access token

In [35]:
# First, login to Hugging Face Hub
# Option 1: Using notebook login (interactive)
from huggingface_hub import notebook_login
notebook_login()

# Option 2: Using token directly (uncomment if preferred)
# from huggingface_hub import login
# login(token="your_hf_token_here")

In [41]:
# Load the saved model from disk
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from huggingface_hub import create_repo, HfApi

MULTICLASS_SAVE_PATH = "./food_rating_model/final"

print("📂 Loading saved model from disk...")
saved_model = AutoModelForSequenceClassification.from_pretrained(MULTICLASS_SAVE_PATH)
saved_tokenizer = AutoTokenizer.from_pretrained(MULTICLASS_SAVE_PATH)
print("✅ Model loaded!")

# Define your Hugging Face username and model name
HF_USERNAME = "your-username"  # 👈 Replace with your HF username
MODEL_NAME = "food-rating-predictor-distilbert"  # Repo names cannot contain spaces

# Full repository ID
repo_id = f"{HF_USERNAME}/{MODEL_NAME}"

# Create repo if it doesn't exist (optional - push_to_hub does this automatically)
# Set private=True if you want a private repository
try:
    create_repo(repo_id, exist_ok=True, private=False)
    print(f"✅ Repository ready: {repo_id}")
except Exception as e:
    print(f"ℹ️ Note: {e}")

print(f"\n🚀 Pushing model to: {repo_id}")
print("=" * 50)

# Push the model and tokenizer to Hub
saved_model.push_to_hub(repo_id)
saved_tokenizer.push_to_hub(repo_id)

print("\n" + "=" * 50)
print(f"✅ Model successfully pushed!")
print(f"🔗 View your model at: https://huggingface.co/{repo_id}")

In [50]:
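# # Alternative: Push using the Trainer (includes training metrics)
# # This is useful if you want to include training logs and metrics
# # Uncomment to use:

# # Note: Trainer.push_to_hub() takes a commit message as its first argument, not a repo ID.
# # The target repository comes from `hub_model_id` in TrainingArguments, so set it there
# # before training, e.g. hub_model_id="your-username/food-rating-predictor-distilbert".

# # Full repository ID (used for the tokenizer push)
# HF_USERNAME = "your-username"  # 👈 Replace with your HF username
# MODEL_NAME = "food-rating-predictor-distilbert"
# repo_id = f"{HF_USERNAME}/{MODEL_NAME}"

# multiclass_trainer.push_to_hub(commit_message="End of training")  # use the trainer you defined
# multiclass_tokenizer.push_to_hub(repo_id)                         # use the tokenizer you defined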

### 2.5 Using Your Published Model

Once pushed, anyone can use your model with just a few lines of code:

```python
from transformers import pipeline

# Load your model from the Hub
classifier = pipeline("text-classification", model="your-username/food-rating-predictor-distilbert")

# Predict ratings!
reviews = [
    "The pizza was absolutely delicious!",
    "Worst restaurant experience ever.",
    "Food was okay, nothing special."
]

for review in reviews:
    result = classifier(review)
    print(f"Review: {review}")
    print(f"Predicted: {result}")
    print()
```

---

## 🎓 Key Takeaways

### What We Learned:

1. **Fine-tuning Process**
   - Load pretrained model → Prepare dataset → Tokenize → Train → Evaluate → Save

2. **Binary vs Multiclass Classification**
   - Binary: `num_labels=2` (positive/negative)
   - Multiclass: `num_labels=n` (e.g., 5 for star ratings)

3. **Key Classes from Hugging Face**
   - `AutoTokenizer`: Handles text-to-token conversion
   - `AutoModelForSequenceClassification`: Model with classification head
   - `TrainingArguments`: Training configuration
   - `Trainer`: Handles the training loop

4. **Best Practices**
   - Use small learning rates (2e-5 to 5e-5) to preserve pretrained knowledge
   - Filter and subset large datasets for manageable training
   - Save models after training for reuse
   - Use appropriate device (GPU/MPS/CPU) for efficiency

### Next Steps:

- Try fine-tuning on your own datasets
- Experiment with different base models (BERT, RoBERTa, etc.)
- Add metrics like accuracy, F1-score to evaluation
- Explore hyperparameter tuning for better results

---

## 📚 References & Resources

- [Hugging Face Transformers Documentation](https://huggingface.co/docs/transformers)
- [Hugging Face Datasets](https://huggingface.co/docs/datasets)
- [Fine-tuning Guide](https://huggingface.co/docs/transformers/training)
- [DistilBERT Paper](https://arxiv.org/abs/1910.01108)

---