# Fine-tune Your Own Model with Hugging Face

This notebook provides a comprehensive guide to fine-tuning pre-trained models using the Hugging Face ecosystem. We'll cover the entire workflow from understanding when to fine-tune, to deploying your model in production.

## Table of Contents

1. [Introduction and Motivation](#1-introduction-and-motivation)
2. [Data Collection and Preparation](#2-data-collection-and-preparation)
3. [Selecting a Base Model](#3-selecting-a-base-model)
4. [Training and Fine-tuning](#4-training-and-fine-tuning)
5. [Evaluation](#5-evaluation)
6. [Deployment](#6-deployment)

Let's start by installing the necessary libraries:

In [None]:
# Install required packages
!pip install transformers datasets evaluate accelerate scikit-learn pandas matplotlib pillow torch fastapi uvicorn

In [None]:
# Import common libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datasets import load_dataset, Dataset, DatasetDict
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoFeatureExtractor
from transformers import AutoModelForImageClassification, TrainingArguments, Trainer
from transformers import EarlyStoppingCallback
import torch
import evaluate
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from PIL import Image

# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

## 1. Introduction and Motivation

### Why Fine-tune Rather Than Train from Scratch?

Fine-tuning leverages pre-trained models that have already learned general patterns from large datasets, allowing you to adapt them to your specific task with relatively little data and computational resources.

Key advantages include:

- **Data Efficiency**: Requires significantly less task-specific data (often 10-100x less)
- **Computational Efficiency**: Training takes hours instead of weeks/months
- **Better Performance**: Especially with limited data, fine-tuned models usually outperform models trained from scratch
- **Knowledge Transfer**: Leverages general knowledge learned during pre-training

### Common Scenarios and Use Cases

Fine-tuning is particularly valuable in these scenarios:

1. **Domain Adaptation**: Adapting general models to specific domains (medical, legal, financial)
2. **Task Specialization**: Adapting a model trained for one task to a related task
3. **Low-Resource Settings**: When you have limited data or computational resources
4. **Multilingual Applications**: Adapting language models to specific languages
5. **Rapid Prototyping**: Quickly testing ideas and solutions

### When to Consider Fine-tuning

| Scenario | Recommendation |
|----------|----------------|
| Limited data (<10k examples) | **Fine-tune** - Training from scratch likely won't work well |
| Limited compute | **Fine-tune** - Significantly lower resource requirements |
| Tight timeline | **Fine-tune** - Faster to reach acceptable performance |
| Very different domain from available pre-trained models | **Consider domain-adaptive pre-training** before fine-tuning |
| Need for architectural innovation | **Train from scratch** if your architecture differs significantly |

Now that we understand why fine-tuning is valuable, let's dive into the most critical component of successful fine-tuning: your dataset.

## 2. Data Collection and Preparation

The quality, quantity, and relevance of your data are the most important factors in fine-tuning success. Even the most sophisticated models can't overcome fundamental data problems ("garbage in, garbage out").

### 2.1 Strategies for Data Collection

Let's explore different approaches to collecting task-specific data:

#### Using Existing Public Datasets

The simplest approach is to leverage existing datasets from the Hugging Face Hub. Let's see how to browse and load datasets:

In [None]:
# Example: Loading a text classification dataset for sentiment analysis
imdb_dataset = load_dataset("imdb")

# Examine the dataset structure
print(f"Dataset structure: {imdb_dataset}")
print(f"Available splits: {list(imdb_dataset.keys())}")
print(f"Number of examples in train split: {len(imdb_dataset['train'])}")
print(f"Features: {imdb_dataset['train'].features}")

# Look at a sample example
print("\nSample example:")
print(imdb_dataset['train'][0])

In [None]:
# Example: Loading an image classification dataset
cifar10_dataset = load_dataset("cifar10")

# Examine the dataset structure
print(f"Dataset structure: {cifar10_dataset}")
print(f"Available splits: {list(cifar10_dataset.keys())}")
print(f"Number of examples in train split: {len(cifar10_dataset['train'])}")
print(f"Features: {cifar10_dataset['train'].features}")

# Visualize a few examples
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
axes = axes.flatten()

for i in range(10):
    example = cifar10_dataset['train'][i]
    axes[i].imshow(example['img'])
    axes[i].set_title(f"Label: {example['label']}")
    axes[i].axis('off')

plt.tight_layout()
plt.show()

#### Creating Custom Datasets from Structured Sources

Often, you'll need to create a custom dataset from existing structured sources like CSV files or JSON data. Let's see how to create a custom dataset:

In [None]:
# Create a sample CSV file with product reviews
import pandas as pd
import numpy as np

# Generate synthetic product review data
np.random.seed(42)
n_samples = 1000

# Create product categories
categories = ['Electronics', 'Clothing', 'Home', 'Books', 'Beauty']

# Generate synthetic reviews
positive_templates = [
    "I love this {category} product! It's exactly what I needed.",
    "This {category} item exceeded my expectations. Highly recommend!",
    "Great quality {category} product, worth every penny.",
    "The best {category} purchase I've made this year.",
    "Excellent {category} item, works perfectly."
]

negative_templates = [
    "Disappointed with this {category} product. Would not recommend.",
    "Poor quality {category} item, broke after a few uses.",
    "This {category} product didn't meet my expectations at all.",
    "Waste of money for this {category} item.",
    "The {category} product has serious design flaws."
]

# Generate data
data = []
for i in range(n_samples):
    category = np.random.choice(categories)
    rating = np.random.randint(1, 6)  # 1-5 star rating
    
    if rating >= 4:  # Positive review
        template = np.random.choice(positive_templates)
        sentiment = 'positive'
    else:  # Negative review
        template = np.random.choice(negative_templates)
        sentiment = 'negative'
    
    review = template.format(category=category.lower())
    
    # Add some noise/variation
    if np.random.random() < 0.3:
        review = review + " " + np.random.choice(["Thank you!", "Never again.", "Will buy again.", "Save your money."])
    
    data.append({
        'review_id': i,
        'category': category,
        'rating': rating,
        'review_text': review,
        'sentiment': sentiment
    })

# Create DataFrame and save to CSV
df = pd.DataFrame(data)
csv_path = 'product_reviews.csv'
df.to_csv(csv_path, index=False)

# Display the first few rows
df.head()

In [None]:
# Convert the CSV to a Hugging Face dataset
from datasets import Dataset, DatasetDict

# Load the CSV data
df = pd.read_csv('product_reviews.csv')

# Create train/validation/test splits (80/10/10)
df = df.sample(frac=1, random_state=42).reset_index(drop=True)  # Shuffle
train_size = int(0.8 * len(df))
val_size = int(0.1 * len(df))

train_df = df[:train_size]
val_df = df[train_size:train_size+val_size]
test_df = df[train_size+val_size:]

# Convert to Hugging Face datasets
train_dataset = Dataset.from_pandas(train_df)
val_dataset = Dataset.from_pandas(val_df)
test_dataset = Dataset.from_pandas(test_df)

# Combine into a DatasetDict
product_reviews_dataset = DatasetDict({
    'train': train_dataset,
    'validation': val_dataset,
    'test': test_dataset
})

# Examine the dataset
print(f"Dataset structure: {product_reviews_dataset}")
print(f"Number of examples in train split: {len(product_reviews_dataset['train'])}")
print(f"Features: {product_reviews_dataset['train'].features}")
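
The split above is a plain random split. When classes are imbalanced (the synthetic reviews are roughly 40% positive), a stratified split keeps the label proportions similar across train, validation, and test. The following is a minimal standard-library sketch; `stratified_split` is a hypothetical helper, not part of `datasets` or pandas:

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists with similar label proportions."""
    rng = random.Random(seed)

    # Group example indices by their label
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = int(fractions[0] * len(indices))
        n_val = int(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test

# Toy labels: 40% positive, 60% negative, similar to the synthetic reviews
labels = ["positive"] * 400 + ["negative"] * 600
train_idx, val_idx, test_idx = stratified_split(labels)

positive_share = sum(labels[i] == "positive" for i in train_idx) / len(train_idx)
print(len(train_idx), len(val_idx), len(test_idx), round(positive_share, 3))
```

The returned index lists could then be passed to `df.iloc[...]` before converting each split with `Dataset.from_pandas`.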

### 2.2 Data Cleaning and Preprocessing

Once you've collected your data, the next critical step is cleaning and preprocessing. The specific techniques depend on the data modality (text, image, audio, etc.).

#### Text Data Preprocessing

For text data, common preprocessing steps include:
- Removing HTML tags and special characters
- Handling contractions and abbreviations
- Normalizing case (lowercase/uppercase)
- Removing or replacing numbers
- Handling punctuation

Let's implement a subset of these (lowercasing, HTML tag removal, contraction expansion, and whitespace cleanup) for our product reviews dataset:

In [None]:
# Text preprocessing function
import re

def preprocess_text(text):
    """Basic text preprocessing for fine-tuning"""
    # Convert to lowercase
    text = text.lower()
    
    # Remove HTML tags
    text = re.sub(r'<.*?>', '', text)
    
    # Replace contractions (naive: "'s" -> " is" also rewrites possessives such as "product's")
    contractions = {
        "won't": "will not",
        "can't": "cannot",
        "n't": " not",
        "'re": " are",
        "'s": " is",
        "'d": " would",
        "'ll": " will",
        "'ve": " have",
        "'m": " am"
    }
    for contraction, expansion in contractions.items():
        text = text.replace(contraction, expansion)
    
    # Remove excessive whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    
    return text

# Apply preprocessing to a sample
sample_text = df['review_text'].iloc[0]
print(f"Original text: {sample_text}")
print(f"Preprocessed text: {preprocess_text(sample_text)}")

In [None]:
# Apply preprocessing to the entire dataset
def preprocess_dataset(dataset):
    """Apply text preprocessing to a dataset"""
    return dataset.map(
        lambda example: {'processed_text': preprocess_text(example['review_text'])}
    )

processed_dataset = preprocess_dataset(product_reviews_dataset)
print(processed_dataset)
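
The preprocessing function above does not cover the "numbers" and "punctuation" items from the list. One possible way to handle them is sketched below; the helper name and the `NUM` placeholder are assumptions made for illustration. For pre-trained transformer tokenizers, such heavy normalization is often unnecessary, so treat it as optional:

```python
import re

def normalize_numbers_and_punctuation(text, number_token="NUM"):
    """Replace numbers with a placeholder and collapse repeated punctuation."""
    # Replace integers and decimals (e.g. 3, 19.99) with a placeholder token
    text = re.sub(r"\d+(?:\.\d+)?", number_token, text)
    # Collapse runs of the same punctuation mark: "!!!" becomes "!"
    text = re.sub(r"([!?.,])\1+", r"\1", text)
    # Remove excessive whitespace
    return re.sub(r"\s+", " ", text).strip()

example = "Paid 19.99 for this...   broke after 3 days!!!"
print(normalize_numbers_and_punctuation(example))
```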

#### Image Data Preprocessing

For image data, common preprocessing steps include:
- Resizing to a consistent dimension
- Normalization (scaling pixel values)
- Data augmentation (for training)

With Hugging Face, much of this is handled by the model's feature extractor or image processor. Let's see how this works with CIFAR-10:

In [None]:
# Load a pre-trained image processor for ViT
from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

# Function to preprocess images
def preprocess_images(examples):
    """Process images for the model"""
    # Process the images with the image processor
    # This handles resizing, normalization, etc.
    inputs = image_processor(examples["img"], return_tensors="pt")
    return inputs

# Process a sample image
sample_image = cifar10_dataset["train"][0]["img"]
processed_image = preprocess_images({"img": [sample_image]})

print(f"Original image shape: {np.array(sample_image).shape}")
print(f"Processed image shape: {processed_image['pixel_values'].shape}")
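
To make the normalization step concrete, the sketch below reproduces the arithmetic for a single pixel value. It assumes a rescale factor of 1/255 and a per-channel mean and standard deviation of 0.5, which is an assumption about this checkpoint; the actual values can be read from `image_processor.image_mean` and `image_processor.image_std`:

```python
def normalize_pixel(value, mean=0.5, std=0.5, rescale=1 / 255):
    """Rescale an 8-bit pixel value to [0, 1], then standardize it."""
    return (value * rescale - mean) / std

# With mean=0.5 and std=0.5, the 0-255 range maps onto roughly [-1, 1]
for pixel in (0, 128, 255):
    print(pixel, round(normalize_pixel(pixel), 3))
```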

### 2.3 Data Formatting for Hugging Face

To train models with Hugging Face, we need to format our data appropriately. This typically involves:

1. Converting labels to numeric format
2. Tokenizing text data
3. Creating train/validation/test splits

Let's prepare our text classification dataset:

In [None]:
# Load a tokenizer for text classification
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Function to tokenize text and prepare for the model
def tokenize_function(examples):
    """Tokenize text data for the model"""
    return tokenizer(
        examples["processed_text"],
        padding="max_length",
        truncation=True,
        max_length=128
    )

# Convert labels to numeric format
def convert_labels(examples):
    """Convert sentiment labels to numeric format"""
    label_map = {"negative": 0, "positive": 1}
    return {"label": label_map[examples["sentiment"]]}

# Apply both functions to the dataset
tokenized_dataset = processed_dataset.map(convert_labels)
tokenized_dataset = tokenized_dataset.map(tokenize_function, batched=True)

# Keep only the columns the model needs (token inputs and the label)
columns_to_keep = {"input_ids", "attention_mask", "label"}
tokenized_dataset = tokenized_dataset.remove_columns(
    [c for c in tokenized_dataset["train"].column_names if c not in columns_to_keep]
)

# Set the format for PyTorch
tokenized_dataset = tokenized_dataset.with_format("torch")

print(tokenized_dataset)
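
To illustrate what `padding="max_length"` and `truncation=True` do to each example, here is a simplified sketch. The token IDs are made up, and unlike a real tokenizer this toy version does not preserve the final special token when truncating:

```python
def pad_and_truncate(token_ids, max_length, pad_id=0):
    """Truncate to max_length, then right-pad and build an attention mask."""
    ids = token_ids[:max_length]
    padding = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * padding,
        # 1 marks real tokens, 0 marks padding the model should ignore
        "attention_mask": [1] * len(ids) + [0] * padding,
    }

short = pad_and_truncate([101, 2307, 4031, 102], max_length=6)
long = pad_and_truncate(list(range(1, 11)), max_length=6)
print(short)
print(long)
```

Padding every example to a fixed `max_length` is simple but can waste compute on short texts; padding per batch is a common alternative.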

Now let's prepare our image classification dataset:

In [None]:
# Prepare CIFAR-10 for training
def prepare_cifar10(examples):
    """Prepare CIFAR-10 images for training"""
    # Process images
    inputs = image_processor(examples["img"], return_tensors="pt")
    inputs["labels"] = examples["label"]
    return inputs

# Apply to a small subset for demonstration
small_cifar = cifar10_dataset["train"].select(range(1000))
small_cifar_val = cifar10_dataset["test"].select(range(200))

# Create a smaller dataset for demonstration
cifar_small = DatasetDict({
    "train": small_cifar,
    "validation": small_cifar_val
})

# Prepare the dataset
prepared_cifar = cifar_small.map(prepare_cifar10, batched=True)
prepared_cifar = prepared_cifar.remove_columns(["img", "label"])  # CIFAR-10 has no "id" column; "labels" replaces "label"
prepared_cifar = prepared_cifar.with_format("torch")

print(prepared_cifar)

## 3. Selecting a Base Model

Choosing the right pre-trained model is crucial for successful fine-tuning. The selection depends on your task, data, and computational constraints.

### 3.1 Criteria for Selecting Pre-trained Models

Consider these factors when selecting a base model:

1. **Task Compatibility**: Choose a model pre-trained on a task similar to yours
2. **Domain Relevance**: Consider models trained on data from your domain
3. **Model Size**: Larger models generally perform better but require more resources
4. **Computational Requirements**: Consider your hardware constraints
5. **Community Support**: Popular models have better documentation and resources
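
The model size and hardware criteria can be made concrete with a back-of-the-envelope estimate. The sketch below assumes fp32 weights (4 bytes per parameter) and roughly 16 bytes per parameter during training with Adam (weights, gradients, and two optimizer moments). Activations are ignored, and the parameter counts are approximate figures, so check the model cards before relying on them:

```python
def estimate_memory_gb(num_params, bytes_per_param):
    """Rough memory estimate in GB for a given per-parameter byte cost."""
    return num_params * bytes_per_param / 1024**3

# Approximate parameter counts (assumptions for illustration)
models = {
    "distilbert-base-uncased": 66_000_000,
    "google/vit-base-patch16-224": 86_000_000,
}

for name, n_params in models.items():
    inference = estimate_memory_gb(n_params, 4)   # fp32 weights only
    training = estimate_memory_gb(n_params, 16)   # weights + gradients + Adam moments
    print(f"{name}: ~{inference:.2f} GB weights, ~{training:.2f} GB training state")
```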

### 3.2 Model Architectures for Different Tasks

| Task | Common Architectures | Example Models |
|------|---------------------|----------------|
| Text Classification | BERT, RoBERTa, DistilBERT | distilbert-base-uncased, roberta-base |
| Named Entity Recognition | BERT, RoBERTa, LUKE | dslim/bert-base-NER, roberta-large |
| Question Answering | BERT, RoBERTa, T5 | deepset/roberta-base-squad2, t5-base |
| Text Generation | GPT-2, T5, BART | gpt2, t5-base, facebook/bart-large |
| Image Classification | ResNet, ViT, Swin | google/vit-base-patch16-224, microsoft/resnet-50 |
| Object Detection | DETR, Faster R-CNN | facebook/detr-resnet-50, torchvision `fasterrcnn_resnet50_fpn` |
| Image Segmentation | SegFormer, DeepLabV3 | nvidia/segformer-b0-finetuned-ade-512-512 |
| Speech Recognition | Wav2Vec2, HuBERT | facebook/wav2vec2-base-960h |

### 3.3 Browsing Models on the Hugging Face Hub

The Hugging Face Hub hosts thousands of pre-trained models. You can browse them at https://huggingface.co/models.

Let's see how to programmatically explore models for our tasks:

In [None]:
# Import the HfAPI to search for models
from huggingface_hub import HfApi

api = HfApi()

# Search for text classification models
text_models = api.list_models(filter="text-classification", sort="downloads", direction=-1, limit=5)

print("Top 5 Text Classification Models:")
for model in text_models:
    print(f"- {model.id} (Downloads: {model.downloads})")

# Search for image classification models
image_models = api.list_models(filter="image-classification", sort="downloads", direction=-1, limit=5)

print("\nTop 5 Image Classification Models:")
for model in image_models:
    print(f"- {model.id} (Downloads: {model.downloads})")

For our text classification task, we'll use `distilbert-base-uncased`, a smaller, faster version of BERT that still performs well. For image classification, we'll use `google/vit-base-patch16-224`, a Vision Transformer model.

## 4. Training and Fine-tuning

Now that we have our data prepared and have selected our base models, let's implement the fine-tuning process.

### 4.1 Text Classification Fine-tuning

We'll fine-tune DistilBERT on our product reviews dataset for sentiment analysis:

In [None]:
# Load the pre-trained model for sequence classification
model_name = "distilbert-base-uncased"
num_labels = 2  # binary classification (positive/negative)

model = AutoModelForSequenceClassification.from_pretrained(
    model_name, 
    num_labels=num_labels
)

In [None]:
# Define metrics for evaluation (computed with scikit-learn)

def compute_metrics(eval_pred):
    """Compute metrics for evaluation"""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    
    # Calculate multiple metrics
    accuracy = accuracy_score(labels, predictions)
    f1 = f1_score(labels, predictions, average="weighted")
    precision = precision_score(labels, predictions, average="weighted")
    recall = recall_score(labels, predictions, average="weighted")
    
    return {
        "accuracy": accuracy,
        "f1": f1,
        "precision": precision,
        "recall": recall
    }
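
To see what these metrics measure, the sketch below computes them by hand for the positive class from true/false positive and negative counts. It is a simplified illustration: the `compute_metrics` function above uses `average="weighted"`, which averages per-class scores weighted by class frequency, so its numbers can differ. The labels are made up:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```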

In [None]:
# Configure training arguments
training_args = TrainingArguments(
    output_dir="./results/sentiment_model",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    eval_strategy="epoch",  # named evaluation_strategy in transformers < 4.41
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False,  # Set to True if you want to upload to Hugging Face Hub
    report_to="none"  # Disable reporting to avoid dependencies
)
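
These arguments determine how many optimizer steps training takes and how the learning rate evolves. The sketch below assumes a single device, no gradient accumulation, and a linear decay schedule without warmup (believed to be the `Trainer` default); both helper functions are hypothetical and only mirror the arithmetic:

```python
import math

def total_training_steps(num_examples, batch_size, num_epochs):
    """Optimizer steps for a single device without gradient accumulation."""
    return math.ceil(num_examples / batch_size) * num_epochs

def linear_decay_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Linear warmup (optional) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# 800 training examples, batch size 16, 3 epochs, as configured above
steps = total_training_steps(num_examples=800, batch_size=16, num_epochs=3)
print(f"Total steps: {steps}")
for step in (0, steps // 2, steps):
    print(step, linear_decay_lr(step, steps))
```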

In [None]:
# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    processing_class=tokenizer,  # named tokenizer= in transformers < 4.46
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]
)
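
The patience logic behind early stopping can be sketched in a few lines. This is a simplified model of the idea, using evaluation loss as the tracked metric and made-up loss values, not the library's actual implementation:

```python
def early_stopping_epoch(eval_losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    bad_evals = 0  # consecutive evaluations without improvement
    for epoch, loss in enumerate(eval_losses, start=1):
        if loss < best:
            best = loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return epoch
    return None

stopped = early_stopping_epoch([0.60, 0.45, 0.47, 0.46, 0.44])
not_stopped = early_stopping_epoch([0.60, 0.45, 0.40])
print(stopped, not_stopped)
```

Note that with `num_train_epochs=3` and a patience of 2, the earliest possible stop coincides with the final epoch, so early stopping only has a practical effect if the number of epochs is increased.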

In [None]:
# Train the model
trainer.train()

In [None]:
# Evaluate on the test set
test_results = trainer.evaluate(tokenized_dataset["test"])
print(f"Test results: {test_results}")

In [None]:
# Save the model
trainer.save_model("./final_model/sentiment_model")
tokenizer.save_pretrained("./final_model/sentiment_model")

### 4.2 Image Classification Fine-tuning

Now let's fine-tune a Vision Transformer (ViT) model on the CIFAR-10 dataset:

In [None]:
# Load the pre-trained model for image classification
model_name = "google/vit-base-patch16-224"
num_labels = 10  # CIFAR-10 has 10 classes

# Get label names for CIFAR-10
label_names = cifar10_dataset["train"].features["label"].names
print(f"CIFAR-10 classes: {label_names}")

# Load the model with the correct number of labels
model = AutoModelForImageClassification.from_pretrained(
    model_name,
    num_labels=num_labels,
    ignore_mismatched_sizes=True  # Important when changing the number of labels
)

In [None]:
# Configure training arguments for image classification
training_args = TrainingArguments(
    output_dir="./results/cifar10_model",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    eval_strategy="epoch",  # named evaluation_strategy in transformers < 4.41
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False,
    report_to="none"
)

In [None]:
# Initialize the Trainer for image classification
image_trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=prepared_cifar["train"],
    eval_dataset=prepared_cifar["validation"],
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]
)

In [None]:
# Train the image classification model
image_trainer.train()

In [None]:
# Evaluate the image classification model
image_test_results = image_trainer.evaluate()
print(f"Image classification results: {image_test_results}")

In [None]:
# Save the image classification model
image_trainer.save_model("./final_model/cifar10_model")
image_processor.save_pretrained("./final_model/cifar10_model")

## 5. Evaluation

Now that we've trained our models, let's evaluate them more thoroughly and perform error analysis.

### 5.1 Text Classification Evaluation

Let's evaluate our sentiment analysis model on the test set and analyze its errors:

In [None]:
# Get predictions on the test set
predictions = trainer.predict(tokenized_dataset["test"])
preds = np.argmax(predictions.predictions, axis=-1)

# Get the original texts and true labels
test_df = df[train_size+val_size:].reset_index(drop=True)
test_texts = test_df["review_text"].tolist()
true_labels = [1 if s == "positive" else 0 for s in test_df["sentiment"].tolist()]

# Create a DataFrame with predictions
results_df = pd.DataFrame({
    "text": test_texts,
    "true_label": true_labels,
    "predicted_label": preds,
    "correct": np.equal(true_labels, preds)
})

# Calculate overall accuracy
accuracy = accuracy_score(true_labels, preds)
print(f"Overall accuracy: {accuracy:.4f}")

# Look at the confusion matrix
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(true_labels, preds)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["Negative", "Positive"])
disp.plot(cmap="Blues")
plt.title("Confusion Matrix")
plt.show()

In [None]:
# Error analysis: look at misclassified examples
errors_df = results_df[~results_df["correct"]].copy()
print(f"Number of errors: {len(errors_df)} out of {len(results_df)} examples")

# Add human-readable labels
errors_df["true_sentiment"] = errors_df["true_label"].map({0: "Negative", 1: "Positive"})
errors_df["predicted_sentiment"] = errors_df["predicted_label"].map({0: "Negative", 1: "Positive"})

# Display some misclassified examples
print("\nSample of misclassified examples:")
sample_errors = errors_df.sample(min(5, len(errors_df)))
for i, row in sample_errors.iterrows():
    print(f"\nText: {row['text']}")
    print(f"True sentiment: {row['true_sentiment']}")
    print(f"Predicted sentiment: {row['predicted_sentiment']}")

### 5.2 Image Classification Evaluation

Now let's evaluate our image classification model and analyze its errors:

In [None]:
# Get predictions on the validation set
image_predictions = image_trainer.predict(prepared_cifar["validation"])
image_preds = np.argmax(image_predictions.predictions, axis=-1)
true_image_labels = image_predictions.label_ids  # NumPy array of true labels

# Calculate accuracy
image_accuracy = accuracy_score(true_image_labels, image_preds)
print(f"Image classification accuracy: {image_accuracy:.4f}")

# Confusion matrix
image_cm = confusion_matrix(true_image_labels, image_preds)
image_disp = ConfusionMatrixDisplay(
    confusion_matrix=image_cm, 
    display_labels=label_names
)
fig, ax = plt.subplots(figsize=(10, 10))
image_disp.plot(ax=ax, cmap="Blues")
plt.title("CIFAR-10 Confusion Matrix")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()

In [None]:
# Visualize some misclassified images
misclassified_indices = np.where(image_preds != true_image_labels)[0]
print(f"Number of misclassified images: {len(misclassified_indices)} out of {len(true_image_labels)}")

# Display some misclassified examples
if len(misclassified_indices) > 0:
    num_examples = min(10, len(misclassified_indices))
    fig, axes = plt.subplots(2, 5, figsize=(15, 6))
    axes = axes.flatten()
    
    for i, idx in enumerate(misclassified_indices[:num_examples]):
        # Get the original image (the validation subset still holds the raw images)
        img = cifar_small["validation"][int(idx)]["img"]
        
        true_label = true_image_labels[idx]
        pred_label = image_preds[idx]
        
        axes[i].imshow(img)
        axes[i].set_title(f"True: {label_names[true_label]}\nPred: {label_names[pred_label]}")
        axes[i].axis('off')
    
    plt.tight_layout()
    plt.show()

### 5.3 Performance Comparison

Let's compare our fine-tuned models to some baselines to understand the benefits of fine-tuning:

In [None]:
# For text classification, compare to a simple baseline (TF-IDF + LogisticRegression)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Prepare data
train_texts = train_df["review_text"].tolist()
train_labels = [1 if s == "positive" else 0 for s in train_df["sentiment"].tolist()]

test_texts = test_df["review_text"].tolist()
test_labels = [1 if s == "positive" else 0 for s in test_df["sentiment"].tolist()]

# Create a baseline model
baseline_model = Pipeline([
    ("vectorizer", TfidfVectorizer(max_features=5000)),
    ("classifier", LogisticRegression(max_iter=1000))
])

# Train and evaluate
baseline_model.fit(train_texts, train_labels)
baseline_preds = baseline_model.predict(test_texts)
baseline_accuracy = accuracy_score(test_labels, baseline_preds)

print(f"Baseline (TF-IDF + LogReg) accuracy: {baseline_accuracy:.4f}")
print(f"Fine-tuned DistilBERT accuracy: {accuracy:.4f}")
print(f"Improvement: {(accuracy - baseline_accuracy) * 100:.2f} percentage points")

## 6. Deployment

Once you've fine-tuned and evaluated your model, the next step is to deploy it for inference. There are several options for deploying Hugging Face models.

### 6.1 Local Inference

The simplest deployment option is to load your saved model for local inference:

In [None]:
# Load the saved sentiment analysis model
from transformers import pipeline

# Create a sentiment analysis pipeline
sentiment_analyzer = pipeline(
    "text-classification",
    model="./final_model/sentiment_model",
    tokenizer="./final_model/sentiment_model"
)

# Test with some examples
test_examples = [
    "This product is amazing! I love it.",
    "Terrible quality, broke after one use.",
    "Average product, nothing special but works fine."
]

# Run inference
results = sentiment_analyzer(test_examples)

# Display results
for text, result in zip(test_examples, results):
    label = "Positive" if result["label"] == "LABEL_1" else "Negative"
    print(f"Text: {text}")
    print(f"Sentiment: {label} (confidence: {result['score']:.4f})\n")
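
The confidence reported by the pipeline is a probability derived from the model's raw logits. For a two-class model this is typically a softmax, which the sketch below reproduces. The logits are made up and the `id2label` mapping is an assumption that mirrors the `label_map` used during preparation:

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

id2label = {0: "Negative", 1: "Positive"}  # assumed mapping
logits = [-1.2, 2.3]                        # made-up values for illustration

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best], round(probs[best], 4))
```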

In [None]:
# Load the saved image classification model
image_classifier = pipeline(
    "image-classification",
    model="./final_model/cifar10_model",
    image_processor="./final_model/cifar10_model"
)

# Test with some examples from CIFAR-10
test_images = [cifar10_dataset["test"][i]["img"] for i in range(5)]

# Run inference
image_results = image_classifier(test_images)

# Display results
fig, axes = plt.subplots(1, 5, figsize=(15, 3))
for i, (img, result) in enumerate(zip(test_images, image_results)):
    axes[i].imshow(img)
    top_label = result[0]["label"]
    confidence = result[0]["score"]
    axes[i].set_title(f"{top_label}\n{confidence:.2f}")
    axes[i].axis('off')

plt.tight_layout()
plt.show()

### 6.2 Creating a Simple API with FastAPI

For production use, you might want to create an API. Here's a simple example using FastAPI:

In [None]:
%%writefile sentiment_api.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import pipeline
import uvicorn

# Define the request model
class SentimentRequest(BaseModel):
    text: str

# Initialize FastAPI
app = FastAPI(title="Sentiment Analysis API")

# Load the model at startup
@app.on_event("startup")
def load_model():
    global sentiment_analyzer
    sentiment_analyzer = pipeline(
        "text-classification",
        model="./final_model/sentiment_model",
        tokenizer="./final_model/sentiment_model"
    )

# Define the prediction endpoint
@app.post("/predict")
def predict_sentiment(request: SentimentRequest):
    try:
        # Run inference
        result = sentiment_analyzer(request.text)[0]
        
        # Convert to a more user-friendly format
        sentiment = "positive" if result["label"] == "LABEL_1" else "negative"
        confidence = result["score"]
        
        return {
            "text": request.text,
            "sentiment": sentiment,
            "confidence": confidence
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Health check endpoint
@app.get("/health")
def health_check():
    return {"status": "healthy"}

# Run the API server when the script is executed directly
# (reload=True is a development convenience; disable it in production)
if __name__ == "__main__":
    uvicorn.run("sentiment_api:app", host="0.0.0.0", port=8000, reload=True)

To run this API, you would execute:

```bash
python sentiment_api.py
```

Then you can make requests to `http://localhost:8000/predict` with JSON data like `{"text": "I love this product!"}`. The API also provides automatic documentation at `http://localhost:8000/docs`.
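
A client call to the endpoint might look like the following standard-library sketch. It assumes the server above is running on `localhost:8000`; `build_predict_request` is a hypothetical helper, and the actual network call is left commented out so the snippet runs without a server:

```python
import json
import urllib.request

def build_predict_request(text, base_url="http://localhost:8000"):
    """Build a POST request for the /predict endpoint defined above."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_predict_request("I love this product!")
print(request.get_method(), request.full_url)

# With the server running, send the request and decode the JSON response:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```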

### 6.3 Model Optimization

Before deploying to production, you might want to optimize your model for inference speed and memory usage. Common techniques include:

#### Quantization

Quantization reduces the precision of the model weights, typically from 32-bit floating point to 8-bit integers. This shrinks the model and usually speeds up CPU inference, often with only a small loss in accuracy.

In [None]:
# Example of dynamic quantization with PyTorch
import os
import torch

# Load the fine-tuned model
model_path = "./final_model/sentiment_model"
model = AutoModelForSequenceClassification.from_pretrained(model_path)
model.eval()

# Apply dynamic quantization to the Linear layers (targets CPU inference)
quantized_model = torch.quantization.quantize_dynamic(
    model, 
    {torch.nn.Linear}, 
    dtype=torch.qint8
)

# Quantized Linear layers hold packed int8 parameters, which save_pretrained
# may not serialize correctly, so save plain state dicts with torch.save instead
output_dir = "./final_model/sentiment_model_quantized"
os.makedirs(output_dir, exist_ok=True)
original_file = os.path.join(output_dir, "original_fp32.pt")
quantized_file = os.path.join(output_dir, "quantized_int8.pt")
torch.save(model.state_dict(), original_file)
torch.save(quantized_model.state_dict(), quantized_file)

# Compare file sizes
def get_file_size_mb(path):
    return os.path.getsize(path) / (1024 * 1024)  # Convert to MB

original_size = get_file_size_mb(original_file)
quantized_size = get_file_size_mb(quantized_file)

print(f"Original model size: {original_size:.2f} MB")
print(f"Quantized model size: {quantized_size:.2f} MB")
print(f"Size reduction: {(1 - quantized_size/original_size) * 100:.2f}%")
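
The arithmetic behind 8-bit quantization can be illustrated with a simplified symmetric, per-tensor scheme. This is a sketch of the general idea with made-up weights, not the exact scheme PyTorch applies internally:

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to the int8 range."""
    # One scale for the whole tensor; fall back to 1.0 if all weights are zero
    scale = max(abs(w) for w in weights) / 127 or 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Map int8 values back to approximate float weights."""
    return [q * scale for q in quantized]

weights = [0.82, -0.31, 0.05, -1.27]  # made-up values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte instead of four, and the rounding error per weight is bounded by half the scale.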

#### Pruning

Pruning removes less important weights from the model, making it smaller and faster. This is often combined with fine-tuning to recover any lost accuracy.
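
One common way to decide which weights are "less important" is by magnitude. The sketch below assumes that criterion (other criteria exist) and uses made-up weights; `magnitude_prune` is a hypothetical helper:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_prune = set(order[:n_prune])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```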

#### Knowledge Distillation

Knowledge distillation trains a smaller "student" model to mimic a larger "teacher" model, transferring the knowledge while reducing model size.
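
A typical way to make the student "mimic" the teacher is to minimize the KL divergence between their temperature-softened output distributions. The sketch below assumes that soft-target formulation, usually combined in practice with a standard loss on the true labels; the logits are made up:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student))
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures
    return kl * temperature ** 2

teacher_logits = [4.0, 1.0, -2.0]
matching = distillation_loss([4.0, 1.0, -2.0], teacher_logits)
uniform = distillation_loss([1.0, 1.0, 1.0], teacher_logits)
print(matching, uniform)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge.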

### 6.4 Hugging Face Inference Endpoints

For a fully managed solution, you can use Hugging Face Inference Endpoints, which deploy your model on dedicated, scalable infrastructure managed by Hugging Face.

To use this service:

1. Push your model to the Hugging Face Hub
2. Create an Inference Endpoint through the Hugging Face website
3. Make API requests to your endpoint

Here's how you would push your model to the Hub:

In [None]:
# Example of pushing a model to the Hugging Face Hub
# Note: You need to be logged in with `huggingface-cli login`

# Load the model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("./final_model/sentiment_model")
tokenizer = AutoTokenizer.from_pretrained("./final_model/sentiment_model")

# Push to the Hub (commented out as it requires authentication)
# model.push_to_hub("your-username/sentiment-analysis-model")
# tokenizer.push_to_hub("your-username/sentiment-analysis-model")

## Conclusion

In this notebook, we've covered the complete fine-tuning workflow using the Hugging Face ecosystem:

1. **Introduction and Motivation**: We learned when and why to fine-tune pre-trained models
2. **Data Collection and Preparation**: We explored different data sources and preprocessing techniques
3. **Selecting a Base Model**: We discussed criteria for choosing the right pre-trained model
4. **Training and Fine-tuning**: We implemented fine-tuning for both text and image classification
5. **Evaluation**: We evaluated our models and analyzed their errors
6. **Deployment**: We explored different deployment options, from local inference to APIs

Fine-tuning pre-trained models is a powerful technique that allows you to leverage state-of-the-art models for your specific tasks with relatively little data and computational resources. The Hugging Face ecosystem makes this process accessible and efficient.

Remember that the most critical component of successful fine-tuning is your dataset. Invest time in collecting and preparing high-quality, task-specific data, and you'll be rewarded with better model performance.