## CIFAR 100 Classification

# Stage 1: Create Dataset and Dataloader

In this notebook we tackle CIFAR-100 classification using the CIFAR-100 dataset available in torchvision. We apply data augmentation to make the neural network more robust, and we use a batch size of 64.

This time I use 256x256 images (upscaled from the native 32x32 CIFAR-100 resolution) so that the neural network can learn better, which required some changes in Stage 2.

In [12]:
import torch,os
from torchvision import datasets, transforms
from torch.utils.data import random_split, ConcatDataset, DataLoader

# Data augmentation to make the model more robust
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=0.5),
    transforms.RandomResizedCrop(256),
    transforms.ToTensor(),  # Converts the image into a tensor
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) 
])

transform_test = transforms.Compose([
    transforms.Resize((256,256)),  # Resizes the image to 256x256
    transforms.ToTensor(),  # Converts the image into a tensor
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Normalizes the tensors (mean and std deviation for 3 color channels)
])

# Create Train dataset
train_dataset = datasets.CIFAR100(root = './dataset/train',download=True, train=True, transform=transform_train)
# Create Test dataset
test_dataset = datasets.CIFAR100(root = './dataset/test',download=True, train=False, transform=transform_test) 

#Create train loader
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
#Create test loader
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=True)

Files already downloaded and verified
Files already downloaded and verified
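
The `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` call in both transforms applies `(x - mean) / std` to each channel. Since `ToTensor` scales 8-bit pixel values to the range [0, 1], this should map them to [-1, 1]. A minimal plain-Python sketch of that arithmetic follows; the `normalize` helper is a hypothetical name for illustration and not part of torchvision:

```python
def normalize(pixel: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-channel formula applied by Normalize: (x - mean) / std."""
    return (pixel - mean) / std

# Darkest, middle and brightest values after ToTensor
for value in (0.0, 0.5, 1.0):
    print(value, "->", normalize(value))
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```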


# Stage 2: Build the neural network model

Stride: The stride refers to how many pixels the filter moves through the image or input volume at each step during the convolution operation. A stride of 1 means that the filter moves one pixel at a time. A stride of 2 means that the filter moves two pixels at a time, and so on. A larger stride will result in a lower spatial dimension output.

Padding: Padding refers to the addition of extra pixels around the input image or volume before applying the convolution operation. The purpose of padding is to control the spatial dimension of the output. It is especially useful when you want to keep the spatial dimensions of the input and output the same after the convolution operation.
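
To make the effect of stride and padding concrete, here is a small sketch of the usual output-size formula for one spatial axis, `floor((n + 2p - k) / s) + 1`, assuming a dilation of 1. The helper name `conv_output_size` is hypothetical. The last loop traces a 256x256 input through four blocks of a 3x3 convolution (padding 1) followed by 2x2 max pooling (stride 2), which is the layout the model in this stage uses:

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis for a conv or pooling window (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# 3x3 convolution, stride 1, padding 1: the spatial size is preserved
print(conv_output_size(256, kernel=3, stride=1, padding=1))  # 256
# Same kernel without padding: the output shrinks by 2
print(conv_output_size(256, kernel=3, stride=1, padding=0))  # 254
# Stride 2 halves the output here
print(conv_output_size(256, kernel=3, stride=2, padding=1))  # 128

# Four blocks of (3x3 conv, padding 1) followed by (2x2 max pool, stride 2)
size = 256
for block in range(1, 5):
    size = conv_output_size(size, kernel=3, stride=1, padding=1)  # conv keeps the size
    size = conv_output_size(size, kernel=2, stride=2)             # pooling halves it
    print(f"after block {block}: {size}x{size}")
# after block 1: 128x128
# after block 2: 64x64
# after block 3: 32x32
# after block 4: 16x16
```

The final 16x16 feature map, with 256 channels, is where the `256 * 16 * 16` input size of the first fully connected layer comes from.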

This time I use four convolutional layers, which lets the network learn more complex shapes.

In [13]:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # First convolutional layer: input channels = 3 (RGB), output channels = 32
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
        # Second convolutional layer: input channels = 32 (from previous layer), output channels = 64
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        # Third convolutional layer: input channels = 64 (from previous layer), output channels = 128
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        # Fourth convolutional layer: input channels = 128 (from previous layer), output channels = 256
        self.conv4 = nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1)
        # Max pooling layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # Dropout layer
        self.dropout = nn.Dropout(0.1)
        # First fully connected layer, input size should match the output size of the last conv layer
        self.fc1 = nn.Linear(256 * 16 * 16, 500)
        # Second fully connected layer, output size is the same as the number of classes
        self.fc2 = nn.Linear(500, 100)

    def forward(self, x):
        # Apply first conv layer, followed by ReLU, then max pooling
        x = self.pool(F.relu(self.conv1(x)))
        # Apply second conv layer, followed by ReLU, then max pooling
        x = self.pool(F.relu(self.conv2(x)))
        # Apply third conv layer, followed by ReLU, then max pooling
        x = self.pool(F.relu(self.conv3(x)))
        # Apply fourth conv layer, followed by ReLU, then max pooling
        x = self.pool(F.relu(self.conv4(x)))
        # Flatten the tensor output from the conv layers
        x = x.view(-1, 256 * 16 * 16)
        # Apply first fully connected layer with ReLU after applying dropout
        x = F.relu(self.fc1(self.dropout(x)))
        # Apply second fully connected layer after applying dropout
        x = self.fc2(self.dropout(x))
        return x
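
As a rough sanity check on the size of this model, the learnable parameters can be counted by hand from the layer definitions above. This is only a sketch: `conv2d_params` and `linear_params` are hypothetical helpers, and they assume every layer keeps its bias term, which is the PyTorch default.

```python
def conv2d_params(in_channels: int, out_channels: int, kernel: int) -> int:
    """Weights (in * out * k * k) plus one bias per output channel."""
    return in_channels * out_channels * kernel * kernel + out_channels

def linear_params(in_features: int, out_features: int) -> int:
    """Weights (in * out) plus one bias per output feature."""
    return in_features * out_features + out_features

layers = {
    "conv1": conv2d_params(3, 32, 3),
    "conv2": conv2d_params(32, 64, 3),
    "conv3": conv2d_params(64, 128, 3),
    "conv4": conv2d_params(128, 256, 3),
    "fc1": linear_params(256 * 16 * 16, 500),
    "fc2": linear_params(500, 100),
}

for name, count in layers.items():
    print(f"{name}: {count:,}")
total = sum(layers.values())
print(f"total: {total:,}")  # total: 33,207,016
```

Under these assumptions `fc1` alone holds 32,768,500 of the roughly 33.2 million parameters, so the flattened 256x16x16 feature map dominates the model size.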


# Stage 3: Train model

For this, we need to define a loss function and an optimiser. We will use Cross Entropy as our loss function, as it is a good choice for classification problems. For the optimiser, we will use Adam.
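
To give the loss values some scale, here is a plain-Python sketch of cross-entropy for a single sample, computed from raw scores (logits) as `log(sum(exp(z))) - z[target]`. `nn.CrossEntropyLoss` takes logits as well and, by default, averages this quantity over the batch. The `cross_entropy` function below is a hypothetical stand-in for illustration, not the PyTorch implementation:

```python
import math

def cross_entropy(logits: list[float], target: int) -> float:
    """Cross-entropy of one sample from raw logits, using a stable log-sum-exp."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# A model that guesses uniformly over the 100 classes
uniform_logits = [0.0] * 100
print(cross_entropy(uniform_logits, target=7))  # ln(100), about 4.605

# A model that strongly favours the correct class gets a loss near zero
confident_logits = [0.0] * 100
confident_logits[7] = 20.0
print(cross_entropy(confident_logits, target=7))
```

A uniform guess over 100 classes gives a loss of ln(100), about 4.605, so a per-batch loss below that value suggests the model is doing better than chance.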

Furthermore, we evaluate the model on held-out data after every epoch. The code below uses the test set (test_loader) as the validation set rather than splitting the training data. If the performance on the validation set improves, we save the model.
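
The code in this stage evaluates on `test_loader`. If a separate validation split is preferred, one possible approach is to hold out part of the 50,000 training images. The sketch below only produces the index lists, in plain Python; `split_indices`, `val_fraction` and `seed` are hypothetical names, and the 10% fraction is an arbitrary assumption. In PyTorch the resulting lists could be wrapped with `torch.utils.data.Subset`, or `random_split` (already imported in Stage 1) could be used directly.

```python
import random

def split_indices(n_samples: int, val_fraction: float = 0.1, seed: int = 0):
    """Shuffle sample indices and split them into (train, validation) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]

# The CIFAR-100 training set has 50,000 images
train_idx, val_idx = split_indices(50_000, val_fraction=0.1)
print(len(train_idx), len(val_idx))  # 45000 5000
```

Keeping the test set untouched until the end gives a less optimistic estimate than selecting the best checkpoint on the same data used for the final evaluation.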

So far the model has been trained for 42 epochs. The validation loss has improved from 5.7... to 2.622346130905637 and is still decreasing.

In [16]:
import torch.optim as optim
from torch.utils.data import random_split, DataLoader
from torchvision import transforms
from tqdm import tqdm
import torch

# Use CUDA if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("It is using: " + device.type)

# Initialize the network
model = Net().to(device)

# Path to save the model
model_path = 'best_model.pth'
if os.path.exists(model_path):
    # Resume from the best checkpoint saved by a previous run
    print("Previous mode was loaded.")
    model.load_state_dict(torch.load(model_path, map_location=device))
else:
    print("No previous model found.")
    
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)  # L2 regularization

# Define a number of training epochs
epochs = 1

# Best validation loss from previous runs, summed over the 157 test batches
# (about 2.62 per batch); a new checkpoint is saved only if this is beaten
best_val_loss = 411.70834255218506

# Training loop
for epoch in range(epochs):
    # Set the model to training mode
    model.train()
    
    # Create a progress bar
    progress_bar = tqdm(train_loader, desc=f'Epoch {epoch+1}')
    
    for inputs, labels in progress_bar:
        # Move data to the GPU if available
        inputs, labels = inputs.to(device), labels.to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        # Update the progress bar
        progress_bar.set_postfix({'training_loss': loss.item()})

    # Set the model to evaluation mode
    model.eval()
    val_loss = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            # Move data to the GPU if available
            inputs, labels = inputs.to(device), labels.to(device)

            # Forward pass
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            # Update the validation loss
            val_loss += loss.item()

    # Print epoch loss
    print(f'Epoch {epoch+1}, Validation Loss: {val_loss/len(test_loader)}')

    # Save the model if it has the best validation loss so far
    print(f'Current summed validation loss is {val_loss} and the best is {best_val_loss}')
    if val_loss < best_val_loss:
        print("model saved")
        best_val_loss = val_loss
        torch.save(model.state_dict(), 'best_model.pth')

print('Finished Training')

It is using: cuda
Previous mode was loaded.


Epoch 1: 100%|██████████| 782/782 [02:34<00:00,  5.08it/s, training_loss=2.71]


# Stage 4: My own tests

As you can see below, the neural network still needs more epochs to improve its results.
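
One way to put a number on this is top-1 accuracy, the fraction of samples whose predicted label matches the true one. The sketch below is plain Python with a hypothetical `top1_accuracy` helper; the two lists are the first eight predicted and true labels from the output of the cell that follows. Eight samples are far too few for a reliable estimate, so this only illustrates the computation.

```python
def top1_accuracy(predicted: list[str], true: list[str]) -> float:
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(true) or not true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(predicted, true))
    return correct / len(true)

predicted = ['caterpillar', 'chimpanzee', 'snake', 'cockroach',
             'motorcycle', 'train', 'dolphin', 'sweet_pepper']
true = ['caterpillar', 'shark', 'tank', 'cockroach',
        'cattle', 'bus', 'wolf', 'beetle']

print(top1_accuracy(predicted, true))  # 0.25
```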

In [8]:
import pickle
# Load the saved model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("It is using: " + device.type)
model = Net().to(device)
model.load_state_dict(torch.load('best_model.pth'))

# Set the model to evaluation mode
model.eval()

# Get a batch of validation data
inputs, labels = next(iter(test_loader))
inputs = inputs.to(device)

# Make predictions
with torch.no_grad():
    outputs = model(inputs)

probabilities = F.softmax(outputs, dim=1)

# The outputs are probabilities for each class. To get the predicted class, we take the index of the highest probability.
_, preds = torch.max(probabilities, 1)

# Load the CIFAR-100 label names
with open('./dataset/train/cifar-100-python/meta', 'rb') as file:
    data = pickle.load(file, encoding='bytes')
    fine_label_names = [t.decode('utf8') for t in data[b'fine_label_names']]

# Use the label names to print the predicted and true classes
print('Predicted:', [fine_label_names[i] for i in preds])
print('True:     ', [fine_label_names[i] for i in labels])


It is using: cuda
Predicted: ['caterpillar', 'chimpanzee', 'snake', 'cockroach', 'motorcycle', 'train', 'dolphin', 'sweet_pepper', 'bowl', 'shark', 'hamster', 'television', 'bee', 'sweet_pepper', 'maple_tree', 'bicycle', 'spider', 'hamster', 'cockroach', 'forest', 'lawn_mower', 'palm_tree', 'baby', 'motorcycle', 'rose', 'apple', 'streetcar', 'wardrobe', 'porcupine', 'tiger', 'pear', 'house', 'forest', 'tiger', 'orchid', 'bicycle', 'pear', 'cup', 'television', 'sunflower', 'bicycle', 'tulip', 'bed', 'couch', 'boy', 'bottle', 'wardrobe', 'bee', 'cockroach', 'cup', 'chimpanzee', 'ray', 'shark', 'kangaroo', 'pine_tree', 'sunflower', 'trout', 'trout', 'crab', 'tractor', 'motorcycle', 'sweet_pepper', 'sunflower', 'bicycle']
True:      ['caterpillar', 'shark', 'tank', 'cockroach', 'cattle', 'bus', 'wolf', 'beetle', 'bowl', 'worm', 'bed', 'telephone', 'bee', 'sweet_pepper', 'willow_tree', 'chimpanzee', 'spider', 'hamster', 'cockroach', 'forest', 'lawn_mower', 'beetle', 'girl', 'motorcycle', 'r