<a href="https://colab.research.google.com/github/Cado87/Fruit-classification/blob/main/Fruit_classifier_dataset_ethanjxu.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Download dataset

New dataset for fine-tuning: https://huggingface.co/datasets/ethanjxu/fruit_images

In [1]:
from datasets import load_dataset

ds = load_dataset("ethanjxu/fruit_images")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md: 0.00B [00:00, ?B/s]

data/train-00000-of-00002.parquet:   0%|          | 0.00/423M [00:00<?, ?B/s]

data/train-00001-of-00002.parquet:   0%|          | 0.00/425M [00:00<?, ?B/s]

data/test-00000-of-00001.parquet:   0%|          | 0.00/152M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/90580 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/15985 [00:00<?, ? examples/s]

**Reasoning**:
The dataset is now loaded into `ds`. Next, print the dataset object, the first training example, and the number of examples in each split to see which splits and features exist and how each example is stored.



In [2]:
print(ds)
print("\nExample from the 'train' split:")
print(ds['train'][0])
print(f"\nNumber of examples in the 'train' split: {len(ds['train'])}")
print(f"Number of examples in the 'test' split: {len(ds['test'])}")

DatasetDict({
    train: Dataset({
        features: ['image', 'label'],
        num_rows: 90580
    })
    test: Dataset({
        features: ['image', 'label'],
        num_rows: 15985
    })
})

Example from the 'train' split:
{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=100x100 at 0x7B5D023B1110>, 'label': 89}

Number of examples in the 'train' split: 90580
Number of examples in the 'test' split: 15985


## Prepare data for training

### Subtask:
Preprocess the images and labels for training. This may involve resizing images, normalizing pixel values, and encoding labels.


**Reasoning**:
Define the preprocessing function to resize, convert to tensor, and normalize images, then apply it to both train and test splits, remove the original image column, rename the label column, and set the format to torch.



In [3]:
import torchvision.transforms as transforms
import torch

# 1. Build the transform once, then define a batched preprocessing function
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # resize the 100x100 images to 224x224
    transforms.ToTensor(),          # PIL image -> float tensor in [0, 1], shape (C, H, W)
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # ImageNet stats for normalization
])

def preprocess_function(examples):
    # `examples` is a batch (dict of lists); convert each PIL image to a normalized tensor
    examples['pixel_values'] = [transform(image.convert("RGB")) for image in examples['image']]
    return examples

# 2. Apply the preprocessing function
processed_ds = ds.map(preprocess_function, batched=True, batch_size=32)

# 3. Remove the original 'image' column
processed_ds = processed_ds.remove_columns("image")

# 4. Rename the 'label' column
processed_ds = processed_ds.rename_column("label", "labels")

# 5. Set the format to 'torch'
processed_ds.set_format("torch")

print(processed_ds)
print("\nExample from the 'train' split after preprocessing:")
print(processed_ds['train'][0])

Map:   0%|          | 0/90580 [00:00<?, ? examples/s]

Map:   0%|          | 0/15985 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['labels', 'pixel_values'],
        num_rows: 90580
    })
    test: Dataset({
        features: ['labels', 'pixel_values'],
        num_rows: 15985
    })
})

Example from the 'train' split after preprocessing:
{'labels': tensor(89), 'pixel_values': tensor([[[2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         ...,
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489]],

        [[2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         [2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         [2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         ...,
         [2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         [2.4286

**Reasoning**:
The preprocessing worked as intended: the original `image` column has been removed, and the processed tensors are kept in the new `pixel_values` column alongside `labels`, so no change to the preprocessing function is needed.
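The repeated values in the printed tensor can be checked by hand. `ToTensor` scales each 8-bit channel to [0, 1], and `Normalize` then computes `(x - mean) / std` per channel, so a pure white pixel maps to fixed constants. A minimal sketch in plain Python (the helper name `normalize_pixel` is illustrative and not part of torchvision):

```python
# ImageNet channel statistics, as used in the preprocessing cell
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Mimic ToTensor (divide by 255) followed by Normalize for one RGB pixel."""
    return [(c / 255.0 - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]

# A pure white pixel, rounded to four decimals like the tensor printout
white = [round(v, 4) for v in normalize_pixel((255, 255, 255))]
print(white)  # [2.2489, 2.4286, 2.64]
```

The first two values match the red and green channel rows in the output above, which suggests that the border of this example image is plain white background.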



## Define model

### Subtask:
Choose and define a model architecture suitable for image classification. A pre-trained model fine-tuned on this dataset would be a good starting point.

**Reasoning**:
We will use a pre-trained ResNet-18 model from `torchvision.models`. We will replace the last fully connected layer to match the number of classes in our dataset. We will also freeze the weights of the pre-trained layers and only train the new classification layer.
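The freeze-and-replace pattern described here can be sketched on a small stand-in network, which avoids downloading any weights. `TinyNet` below is a hypothetical toy model, not ResNet-18, and `num_classes = 10` is an arbitrary placeholder; only the pattern carries over.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained network that ends in an `fc` layer
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        return self.fc(self.features(x))

model = TinyNet()
num_classes = 10  # placeholder value, not the fruit dataset's class count

# Freeze every existing parameter
for param in model.parameters():
    param.requires_grad = False

# Swap in a new head; freshly created layers have requires_grad=True by default
model.fc = nn.Linear(model.fc.in_features, num_classes)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} of {total}")

# Forward pass on a dummy batch to confirm the output width
logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)
```

Freezing before replacing the head means the new layer never has to be unfrozen explicitly. Replacing the head first, freezing everything, and then unfreezing `fc` gives the same result.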

In [4]:
import torch.nn as nn
import torchvision.models as models
import torch

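# Load a ResNet-18 model pre-trained on ImageNet
# (`weights=` replaces the deprecated `pretrained=True`; requires torchvision >= 0.13)
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)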

# Count the distinct labels in the training split.
# Note: this assumes the labels are the contiguous integers 0..num_classes-1.
num_classes = len(processed_ds['train'].unique('labels'))


# Replace the last fully connected layer
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, num_classes)

# Freeze the weights of the pre-trained layers
for param in model.parameters():
    param.requires_grad = False

# Unfreeze the weights of the last layer
for param in model.fc.parameters():
    param.requires_grad = True

# Print the model architecture
print(model)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 227MB/s]


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [None]:
import torch
from torch.utils.data import DataLoader

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)

# Create DataLoaders
train_dataloader = DataLoader(processed_ds['train'], batch_size=32, shuffle=True)
test_dataloader = DataLoader(processed_ds['test'], batch_size=32, shuffle=False)

# Train the model
num_epochs = 5  # You can adjust this

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("The selected device is:", device)
model.to(device)

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for i, data in enumerate(train_dataloader, 0):
        inputs, labels = data['pixel_values'].to(device), data['labels'].to(device)

        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 100 == 99:    # print every 100 mini-batches
            print(f'[{epoch + 1}, {i + 1}] loss: {running_loss / 100:.3f}')
            running_loss = 0.0

print('Finished Training')

The selected device is: cuda
[1, 100] loss: 0.014
[1, 200] loss: 0.012
[1, 300] loss: 0.012
[1, 400] loss: 0.018
[1, 500] loss: 0.012
[1, 600] loss: 0.008
[1, 700] loss: 0.013
[1, 800] loss: 0.008
[1, 900] loss: 0.011
[1, 1000] loss: 0.009
[1, 1100] loss: 0.010
[1, 1200] loss: 0.017
[1, 1300] loss: 0.008
[1, 1400] loss: 0.007
[1, 1500] loss: 0.012
[1, 1600] loss: 0.008
[1, 1700] loss: 0.008
[1, 1800] loss: 0.006
[1, 1900] loss: 0.007
[1, 2000] loss: 0.007
[1, 2100] loss: 0.008
[1, 2200] loss: 0.007
[1, 2300] loss: 0.006
[1, 2400] loss: 0.005
[1, 2500] loss: 0.005
[1, 2600] loss: 0.009
[1, 2700] loss: 0.004
[1, 2800] loss: 0.007
[2, 100] loss: 0.002
[2, 200] loss: 0.005
[2, 300] loss: 0.004
[2, 400] loss: 0.005
[2, 500] loss: 0.004
[2, 600] loss: 0.006
[2, 700] loss: 0.006
[2, 800] loss: 0.006
[2, 900] loss: 0.004
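Once training finishes, `test_dataloader` can be used to measure accuracy on the held-out split. A minimal sketch, assuming batches are dicts with `pixel_values` and `labels` keys as in the training loop. The `evaluate` helper is hypothetical, and the toy model and random data exist only so the example runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def evaluate(model, dataloader, device):
    """Return top-1 accuracy over a dataloader of {'pixel_values', 'labels'} batches."""
    model.eval()  # use running BatchNorm statistics and disable dropout
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for batch in dataloader:
            inputs = batch['pixel_values'].to(device)
            labels = batch['labels'].to(device)
            preds = model(inputs).argmax(dim=1)  # index of the highest logit per example
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Toy stand-ins so the sketch is self-contained (not the notebook's data or model)
torch.manual_seed(0)
toy_data = [
    {'pixel_values': torch.randn(3, 8, 8), 'labels': torch.tensor(i % 4)}
    for i in range(20)
]
toy_loader = DataLoader(toy_data, batch_size=5, shuffle=False)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4))

accuracy = evaluate(toy_model, toy_loader, torch.device("cpu"))
print(f"accuracy: {accuracy:.3f}")
```

With the notebook's own objects, the equivalent call would be `evaluate(model, test_dataloader, device)`.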
