Team Number-8

Team Members-Chalani Kalpana,
             Nilmini Pusweli,
             Thilini Gamage

## Food Type Detection Using EfficientNet_B0 on Food101

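Step 1: Import Libraries and Select Device-
We import PyTorch and torchvision, including the constructor for EfficientNet_B0, whose ImageNet-pretrained weights are loaded in Step 3. Starting from pretrained weights gives the model strong visual features without training from scratch. We also select the GPU if one is available.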

In [2]:

# Step 1: Import Libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import Food101
from torchvision.models import efficientnet_b0
from torch.utils.data import DataLoader, random_split
import time
import os
import copy
import numpy as np

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


Using device: cuda


Step 2: Load and Prepare Dataset-
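We use the Food101 dataset, a large vision dataset of 101 food categories. Images are resized to 224x224, converted to tensors, and normalized with the ImageNet channel means and standard deviations. To keep run time short, we draw a random subset of 5,000 training images and 1,000 test images and wrap each subset in a data loader for training and evaluation.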

In [3]:

# Step 2: Load and Transform Food101 Dataset
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

food_train = Food101(root='./data', split='train', transform=transform, download=True)
food_test = Food101(root='./data', split='test', transform=transform, download=True)

# Subsample for faster training/testing
train_subset, _ = random_split(food_train, [5000, len(food_train) - 5000])
test_subset, _ = random_split(food_test, [1000, len(food_test) - 1000])

train_loader = DataLoader(train_subset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_subset, batch_size=32, shuffle=False)


100%|██████████| 5.00G/5.00G [03:40<00:00, 22.6MB/s]
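The subsampling above draws a different random subset on every run, which makes the reported accuracies hard to reproduce exactly. A minimal sketch of a seeded split follows. The helper name `seeded_subset`, the seed value, and the toy dataset are assumptions for illustration and are not part of the original notebook.

```python
import torch
from torch.utils.data import TensorDataset, random_split


def seeded_subset(dataset, n_keep, seed=0):
    """Return a random subset of n_keep items that is identical for a given seed."""
    gen = torch.Generator().manual_seed(seed)
    subset, _ = random_split(dataset, [n_keep, len(dataset) - n_keep], generator=gen)
    return subset


# Toy stand-in for Food101: 100 items whose "image" is just their index.
toy = TensorDataset(torch.arange(100))
a = seeded_subset(toy, 10, seed=0)
b = seeded_subset(toy, 10, seed=0)
print(a.indices == b.indices)  # the same seed selects the same items
```

In the notebook, the same idea would apply to `food_train` and `food_test` with sizes 5000 and 1000.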


Step 3: Replace the Classifier
The pretrained EfficientNet_B0 model ends in a classification layer with 1000 outputs, one per ImageNet class. We replace that layer with a new linear layer that has 101 outputs, one per Food101 class. The new layer starts from random weights, so it must be trained before the model can predict food classes.

In [4]:
# Step 3: Load Pretrained EfficientNet_B0 and Modify Output Layer
model = efficientnet_b0(weights="IMAGENET1K_V1")  # torchvision >= 0.13; replaces the deprecated pretrained=True
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 101)
model = model.to(device)


Downloading: "https://download.pytorch.org/models/efficientnet_b0_rwightman-7f5810bc.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b0_rwightman-7f5810bc.pth
100%|██████████| 20.5M/20.5M [00:00<00:00, 101MB/s]


Step 4: Evaluate Pretrained Model (Before Fine-Tuning)
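We evaluate the model on the test subset before any fine-tuning. Because the new classification layer is still randomly initialized, we expect accuracy close to chance (about 1% for 101 classes). This gives us a baseline against which to measure fine-tuning.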

In [6]:
# Step 4: Test Accuracy Before Fine-Tuning
def evaluate(model, dataloader, device):
    model.to(device)
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in dataloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, preds = torch.max(outputs, 1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return 100 * correct / total


baseline_acc = evaluate(model, test_loader, device)
print(f"Test Accuracy (before fine-tuning): {baseline_acc:.2f}%")



Test Accuracy (before fine-tuning): 0.80%
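A quick arithmetic check shows how close this baseline is to random guessing. The sketch assumes the 1,000-image test subset is roughly balanced across the 101 classes.

```python
num_classes = 101
num_test_images = 1000  # size of the test subset from Step 2

# Accuracy of uniform random guessing, in percent.
chance_accuracy = 100 / num_classes
# Number of images a random guesser would get right on average.
expected_correct = num_test_images / num_classes
# Number of images behind the reported 0.80%.
observed_correct = round(0.80 / 100 * num_test_images)

print(f"Chance accuracy: {chance_accuracy:.2f}%")          # 0.99%
print(f"Expected correct by chance: {expected_correct:.1f}")  # 9.9
print(f"Observed correct: {observed_correct}")              # 8
```

Eight correct images against roughly ten expected by chance means the untrained head is effectively guessing.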


Step 5: Fine-Tune the Model-
We train the model for one epoch on the 5,000-image training subset, updating all layers with Adam at a learning rate of 1e-4. This helps the model adapt its learned features to the specific task of food classification.

In [7]:
# Step 5: Fine-Tune for 1 Epoch
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

model.train()
for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    outputs = model(images)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()


finetuned_acc = evaluate(model, test_loader, device)
print(f"Test Accuracy (after fine-tuning): {finetuned_acc:.2f}%")



Test Accuracy (after fine-tuning): 20.70%
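The training loop above does not report the loss, so there is no way to see whether training is progressing within the epoch. The sketch below wraps the same loop in a function that returns the mean loss. The function name `train_one_epoch` and the toy data are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one pass over loader and return the sample-weighted mean loss."""
    model.train()
    total_loss, total_seen = 0.0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * labels.size(0)
        total_seen += labels.size(0)
    return total_loss / total_seen


# Toy stand-in for the Food101 loader: 64 random vectors with 3 classes.
torch.manual_seed(0)
toy_data = TensorDataset(torch.randn(64, 8), torch.randint(0, 3, (64,)))
toy_loader = DataLoader(toy_data, batch_size=16, shuffle=True)
toy_model = nn.Linear(8, 3)
toy_optimizer = optim.Adam(toy_model.parameters(), lr=1e-2)

losses = [
    train_one_epoch(toy_model, toy_loader, nn.CrossEntropyLoss(), toy_optimizer, "cpu")
    for _ in range(3)
]
print([f"{loss:.3f}" for loss in losses])
```

In the notebook, the call would take `model`, `train_loader`, `criterion`, `optimizer`, and `device` from the cells above.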


Step 6: Save Fine-Tuned Model and Measure Model Size
After fine-tuning, we save the updated model weights to disk and measure the size of the saved model file. This helps us understand the storage requirements before applying any compression or quantization techniques.

In [10]:
# Step 6: Save Model and Measure Size
torch.save(model.state_dict(), "efficientnet_finetuned.pth")
original_size = os.path.getsize("efficientnet_finetuned.pth") / 1e6  # MB
print(f"Model size before quantization: {original_size:.2f} MB")


Model size before quantization: 16.85 MB
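If writing a file is not needed, the serialized size can also be measured in memory. The sketch below saves a `state_dict` to a `BytesIO` buffer. The helper name `state_dict_size_mb` and the toy layer are assumptions for illustration.

```python
import io

import torch
import torch.nn as nn


def state_dict_size_mb(model):
    """Serialize the model's state_dict to memory and return its size in MB."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes / 1e6


# Toy layer: 1000*1000 weights + 1000 biases in FP32 is about 4.0 MB.
toy = nn.Linear(1000, 1000)
size = state_dict_size_mb(toy)
print(f"{size:.2f} MB")
```

The result includes a small serialization overhead on top of the raw parameter bytes, just as the on-disk file does.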


Step 7: Quantize Model and Measure Size After Quantization
We apply dynamic quantization to the fine-tuned model, save the quantized model, and measure its file size. This helps us understand the reduction in model size achieved through quantization.

In [9]:
# Step 7: Quantize Model
quantized_model = torch.quantization.quantize_dynamic(copy.deepcopy(model), {nn.Linear}, dtype=torch.qint8)
torch.save(quantized_model.state_dict(), "efficientnet_quantized.pth")
quantized_size = os.path.getsize("efficientnet_quantized.pth") / 1e6  # MB
print(f"Model size after quantization: {quantized_size:.2f} MB")



Model size after quantization: 16.46 MB
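The small reduction can be checked by hand: only the weights of the final linear layer move from 4 bytes to 1 byte each. The sketch below assumes the classifier input width is 1280, as in torchvision's EfficientNet_B0, and ignores the bias and the quantization metadata.

```python
in_features = 1280     # assumption: input width of torchvision's EfficientNet_B0 classifier
num_classes = 101
fp32_model_mb = 16.85  # file size measured in Step 6

head_weights = in_features * num_classes
fp32_head_mb = head_weights * 4 / 1e6  # 4 bytes per FP32 weight
int8_head_mb = head_weights * 1 / 1e6  # 1 byte per INT8 weight

saved_mb = fp32_head_mb - int8_head_mb
saved_pct = 100 * saved_mb / fp32_model_mb
print(f"Estimated saving: {saved_mb:.2f} MB ({saved_pct:.2f}%)")  # 0.39 MB (2.30%)
```

The estimate of about 0.39 MB matches the measured drop from 16.85 MB to 16.46 MB.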


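Step 8: Evaluate the Quantized Model-
We apply dynamic quantization again, this time on the CPU, and evaluate the quantized model on the test subset. Dynamic quantization converts the weights of the Linear layers to 8-bit integers (INT8), which reduces model size and may speed up inference, especially on CPU. The convolutional layers are not in the set of quantized layer types, so they stay in FP32.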

In [11]:
# Step 8: Apply Dynamic Quantization
from torch.quantization import quantize_dynamic

quantized_model = quantize_dynamic(
    copy.deepcopy(model).cpu(),  # quantize a CPU copy so the original model is not moved
    {torch.nn.Linear},           # specify layers to quantize
    dtype=torch.qint8            # use int8 for weights
)

quantized_model.eval()          # evaluation mode
quantized_model.to("cpu")       # make sure it's on CPU

quantized_acc = evaluate(quantized_model, test_loader, device="cpu")
print(f"Test Accuracy (after quantizing linear layers only): {quantized_acc:.2f}%")


Test Accuracy (after quantizing linear layers only): 21.10%
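To see which layers dynamic quantization actually touches, the sketch below applies the same call to a small stand-in network with one convolution and one linear layer. The toy architecture is an assumption for illustration; it is not EfficientNet.

```python
import torch
import torch.nn as nn

# Toy network: one convolution followed by one linear classifier.
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 4),
).eval()

# Same call as in the notebook: only nn.Linear modules are converted.
quantized_toy = torch.quantization.quantize_dynamic(toy, {nn.Linear}, dtype=torch.qint8)

# List each child module with the package it now comes from.
for name, module in quantized_toy.named_children():
    print(name, type(module).__module__, type(module).__name__)

with torch.no_grad():
    out = quantized_toy(torch.randn(2, 3, 16, 16))
print(tuple(out.shape))
```

The convolution keeps its original class, while the linear layer is replaced by a dynamically quantized version.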


Step 9: Measure Inference Latency-
We measure the time of a forward pass over batches of 32 images for the FP32 model on the GPU and for the quantized model on the CPU. Because the two models run on different devices, the numbers below reflect the hardware gap as much as the effect of quantization.

In [12]:
# Step 9: Inference Latency Comparison
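def measure_latency(model, loader, device='cpu', n_samples=10):
    """Return the mean forward-pass time in ms over n_samples batches."""
    model.to(device)
    model.eval()
    use_cuda = torch.device(device).type == 'cuda'
    times = []
    with torch.no_grad():
        for images, _ in loader:
            images = images.to(device)
            if use_cuda:
                torch.cuda.synchronize()  # finish pending GPU work before starting the clock
            start = time.perf_counter()
            _ = model(images)
            if use_cuda:
                torch.cuda.synchronize()  # CUDA kernels run asynchronously; wait for them
            end = time.perf_counter()
            times.append((end - start) * 1000)  # milliseconds
            if len(times) >= n_samples:
                break
    return np.mean(times)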

# Measure latency
latency_before = measure_latency(model, test_loader, device='cuda')  # original model on GPU
latency_after = measure_latency(quantized_model, test_loader, device='cpu')  # quantized model on CPU

print(f"Inference latency before quantization (GPU): {latency_before:.2f} ms")
print(f"Inference latency after quantization (CPU): {latency_after:.2f} ms")


Inference latency before quantization (GPU): 10.45 ms
Inference latency after quantization (CPU): 2142.26 ms
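A like-for-like comparison would time both models on the same device with the same input. The sketch below does this on the CPU for a small stand-in network, with a few warm-up passes before timing. The helper name `time_forward_ms`, the toy model, and the warm-up and repeat counts are assumptions for illustration.

```python
import time

import torch
import torch.nn as nn


def time_forward_ms(model, batch, device="cpu", warmup=2, repeats=5):
    """Return the mean forward-pass time in ms, after warm-up passes."""
    device = torch.device(device)
    model = model.to(device).eval()
    batch = batch.to(device)

    def sync():
        # CUDA kernels are asynchronous; wait for them before reading the clock.
        if device.type == "cuda":
            torch.cuda.synchronize()

    with torch.no_grad():
        for _ in range(warmup):
            model(batch)
        sync()
        start = time.perf_counter()
        for _ in range(repeats):
            model(batch)
        sync()
    return (time.perf_counter() - start) * 1000 / repeats


# Toy FP32 model and its dynamically quantized copy, both timed on the CPU.
toy_fp32 = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
toy_int8 = torch.quantization.quantize_dynamic(toy_fp32, {nn.Linear}, dtype=torch.qint8)
batch = torch.randn(32, 512)

fp32_ms = time_forward_ms(toy_fp32, batch, device="cpu")
int8_ms = time_forward_ms(toy_int8, batch, device="cpu")
print(f"FP32 on CPU: {fp32_ms:.3f} ms, INT8 on CPU: {int8_ms:.3f} ms")
```

Which of the two is faster depends on the machine and the layer sizes, so no expected timings are given here.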


Step 10: Report Summary Metrics-
We calculate and print:

Memory saving (%) after quantization

Accuracy drop caused by quantization

Test accuracies before and after fine-tuning and quantization

Inference latency (in milliseconds) before and after quantization

In [17]:
# Step 10: Report Summary Metrics

def get_model_size(model, filename="temp.pth"):
    torch.save(model.state_dict(), filename)
    size_mb = os.path.getsize(filename) / 1e6  # MB
    os.remove(filename)
    return size_mb

# Measure size of both models
original_size = get_model_size(model, "original.pth")
quantized_size = get_model_size(quantized_model, "quantized.pth")

# Accuracy metrics (already evaluated earlier)
# finetuned_acc: accuracy after fine-tuning
# quantized_acc: accuracy after quantization

# Memory savings and accuracy drop
memory_saved = ((original_size - quantized_size) / original_size) * 100
accuracy_drop = finetuned_acc - quantized_acc

print(f"Original Model Size: {original_size:.2f} MB")
print(f"Quantized Model Size: {quantized_size:.2f} MB")
print(f"Memory Saving after Quantization: {memory_saved:.2f}%")
print(f"Accuracy Drop after Quantization: {accuracy_drop:.2f}%")


Original Model Size: 16.85 MB
Quantized Model Size: 16.45 MB
Memory Saving after Quantization: 2.33%
Accuracy Drop after Quantization: -0.40%
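A negative accuracy drop means accuracy rose after quantization, but the change is small relative to the size of the test subset. The sketch below estimates the sampling error of an accuracy measured on 1,000 images, under the simplifying assumption that each image is an independent trial.

```python
import math


def accuracy_standard_error(accuracy_pct, num_images):
    """Binomial standard error of an accuracy, in percentage points."""
    p = accuracy_pct / 100
    return 100 * math.sqrt(p * (1 - p) / num_images)


finetuned_acc, quantized_acc, num_test_images = 20.70, 21.10, 1000

se = accuracy_standard_error(finetuned_acc, num_test_images)
gap = quantized_acc - finetuned_acc
images_changed = round(gap / 100 * num_test_images)

print(f"Standard error: about {se:.2f} points")  # about 1.28 points
print(f"Gap: {gap:.2f} points = {images_changed} images")  # 0.40 points = 4 images
```

A gap of 0.40 points is well inside one standard error, so it should not be read as a real improvement.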


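Step 11: Measure Per-Image Latency-
We repeat the latency measurement with single images (batch size 1) drawn at random from the test set, again with the FP32 model on the GPU and the INT8 model on the CPU.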

In [29]:
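# Step 11: Latency measurement function (single images, batch size 1)
import random

def measure_latency(model, device, dataset, num_samples=10):
    model.to(device)
    model.eval()
    use_cuda = torch.device(device).type == 'cuda'
    times = []
    indices = random.sample(range(len(dataset)), num_samples)

    with torch.no_grad():
        for idx in indices:
            img, _ = dataset[idx]
            img = img.unsqueeze(0).to(device)  # Add batch dimension
            if use_cuda:
                torch.cuda.synchronize()  # finish pending GPU work before timing
            start = time.perf_counter()
            _ = model(img)
            if use_cuda:
                torch.cuda.synchronize()  # wait for asynchronous CUDA kernels
            end = time.perf_counter()
            times.append((end - start) * 1000)  # Convert to ms

    return sum(times) / len(times)

# model_fp32 and model_int8 are not defined in the cells shown here; we assume they
# refer to the fine-tuned model (Step 5) and its quantized copy (Step 8).
model_fp32, model_int8 = model, quantized_model

fp32_latency = measure_latency(model_fp32, device, food_test)
int8_latency = measure_latency(model_int8, 'cpu', food_test)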

# Print results
print(f"Average Inference Latency (FP32) : {fp32_latency:.2f} ms")
print(f"Average Inference Latency (INT8) : {int8_latency:.2f} ms")


Average Inference Latency (FP32) : 8.84 ms
Average Inference Latency (INT8) : 39.43 ms
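An average over ten single-image timings can be pulled upward by one slow call, for example a first pass that includes one-time setup. Reporting the median next to the mean makes this visible. The helper name `summarize_latencies` and the timing values below are made up for illustration; they are not measurements from this notebook.

```python
import statistics


def summarize_latencies(times_ms):
    """Return the mean, median, and maximum of a list of timings in ms."""
    return {
        "mean": statistics.mean(times_ms),
        "median": statistics.median(times_ms),
        "max": max(times_ms),
    }


# Made-up timings: one slow first call followed by nine steady calls.
example_times_ms = [120.0, 9.1, 8.7, 8.9, 9.0, 8.8, 9.2, 8.6, 9.0, 8.9]
summary = summarize_latencies(example_times_ms)
print({key: round(value, 2) for key, value in summary.items()})
# {'mean': 20.02, 'median': 8.95, 'max': 120.0}
```

In the notebook, `measure_latency` could return the `times` list so that it can be summarized this way.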


In [1]:
# Summary of Metrics

print("Summary of Model Evaluation Metrics")

print("Using device: cuda")
print("Test Accuracy (before fine-tuning): 0.80%")
print("Test Accuracy (after fine-tuning): 20.70%")
print("Model size before quantization: 16.85 MB")
print("Model size after quantization: 16.46 MB")
print("Test Accuracy (after quantizing linear layers only): 21.10%")
print("Inference latency before quantization (GPU): 10.45 ms")
print("Inference latency after quantization (CPU): 2142.26 ms")
print("Original Model Size: 16.85 MB")
print("Quantized Model Size: 16.45 MB")
print("Memory Saving after Quantization: 2.33%")
print("Accuracy Drop after Quantization: -0.40%")
print("Average Inference Latency (FP32) : 8.84 ms")
print("Average Inference Latency (INT8) : 39.43 ms")

Summary of Model Evaluation Metrics
Using device: cuda
Test Accuracy (before fine-tuning): 0.80%
Test Accuracy (after fine-tuning): 20.70%
Model size before quantization: 16.85 MB
Model size after quantization: 16.46 MB
Test Accuracy (after quantizing linear layers only): 21.10%
Inference latency before quantization (GPU): 10.45 ms
Inference latency after quantization (CPU): 2142.26 ms
Original Model Size: 16.85 MB
Quantized Model Size: 16.45 MB
Memory Saving after Quantization: 2.33%
Accuracy Drop after Quantization: -0.40%
Average Inference Latency (FP32) : 8.84 ms
Average Inference Latency (INT8) : 39.43 ms
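The summary cell above prints hard-coded strings, which can drift from the measured values (the quantized size appears as both 16.46 MB and 16.45 MB). A minimal sketch of building the report from a dictionary follows. The helper name `format_summary` is an assumption, and the literal values are copied from the outputs above only to keep the example self-contained; in the notebook they would be the variables computed in earlier steps.

```python
def format_summary(metrics):
    """Format a {label: (value, unit)} mapping as a plain-text report."""
    lines = ["Summary of Model Evaluation Metrics"]
    for label, (value, unit) in metrics.items():
        lines.append(f"{label}: {value:.2f}{unit}")
    return "\n".join(lines)


# In the notebook, pass baseline_acc, finetuned_acc, original_size, etc.
metrics = {
    "Test Accuracy (before fine-tuning)": (0.80, "%"),
    "Test Accuracy (after fine-tuning)": (20.70, "%"),
    "Original Model Size": (16.85, " MB"),
    "Quantized Model Size": (16.45, " MB"),
}
report = format_summary(metrics)
print(report)
```

Each metric then has a single source, so the summary cannot disagree with the cells that measured it.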


Interpretation:
The results from the Food101 model analysis reveal several key insights:

Pre-trained Model Performance:
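Before fine-tuning, the model achieved only 0.80% accuracy on the Food101 test subset. This is essentially chance level for 101 classes (1/101 is about 0.99%), and it is expected: the ImageNet-pretrained backbone was kept, but the final classification layer had just been replaced with a randomly initialized one. The figure therefore says little about how well the pretrained features transfer to food images; it only confirms that the new classification layer must be trained.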

Impact of Fine-tuning:
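After one epoch of fine-tuning on a 5,000-image subset of Food101 (about 50 images per class), accuracy rose to 20.70%. Although this accuracy is still modest, it is roughly twenty times the chance level after a single short pass over a small subset, which shows that the pretrained features adapt quickly to the food domain. More epochs and more training data would likely be needed for competitive accuracy.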

Quantization Effects:
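The quantized model scored 21.10%, against 20.70% before quantization. The 0.40-point difference corresponds to 4 of the 1,000 test images and is within sampling noise, so the safe conclusion is that quantization did not measurably change accuracy. The model size fell by only about 2.33% (16.85 MB to 16.45 MB). The reduction is small because only the Linear layers were quantized, and in this model that means the classification layer alone; the convolutional layers, which hold most of the parameters, remain in FP32.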

Inference Latency:
In the batch measurement (32 images per forward pass), the FP32 model on the GPU took about 10 ms, whereas the quantized INT8 model on the CPU took about 2142 ms. In the single-image measurement, the averages were 8.84 ms for the FP32 model on the GPU and 39.43 ms for the INT8 model on the CPU. Both comparisons change two things at once, the numeric format and the device, so the gap mainly reflects GPU versus CPU speed rather than the effect of quantization. A like-for-like test would time the FP32 and INT8 models on the same CPU. Even then, a large speedup is unlikely here, because only the final Linear layer runs in INT8.

Summary:
These findings show that the new classification layer must be fine-tuned before the model is usable, and that even a short fine-tuning run lifts accuracy well above chance. Dynamic quantization of the Linear layers preserved accuracy but reduced model size by only about 2%, and its effect on inference speed could not be isolated because the two models ran on different hardware. For practical deployment, both the choice of quantization method and the target hardware must be considered.

Conclusion-
Fine-tuning EfficientNet-B0 for one epoch on a 5,000-image subset raised accuracy on the Food101 test subset from a chance-level 0.80% to 20.70%. This underscores the importance of adapting pre-trained models to the target dataset. Dynamic quantization of the Linear layers left accuracy essentially unchanged (21.10%) but reduced model size by only about 2.33%, because most of the model's parameters sit in convolutional layers that were not quantized. Measured latency was higher for the quantized model, but it ran on the CPU while the FP32 model ran on the GPU, so the result says more about the hardware than about quantization. Overall, fine-tuning was clearly effective, while the benefit of quantization in this setup was small; a larger size reduction would require quantizing the convolutional layers as well, and a fair speed comparison would require running both models on the same device.
