In [None]:
# Sumani
# 5-8-2024

# Building and Training a CNN model on CIFAR-10 Using PyTorch

This notebook demonstrates how to build a Convolutional Neural Network (CNN) using PyTorch to classify images from the CIFAR-10 dataset.

## CIFAR-10 Dataset

CIFAR-10 is a widely used benchmark dataset for image classification. It was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton and is named after the Canadian Institute For Advanced Research (CIFAR). New algorithms are often evaluated on it: each image must be labeled with exactly one of 10 classes.

Here are some key points about the dataset:

* Number of Images: The dataset contains 60,000 color images, 6,000 per class.
* Classes: There are 10 classes, each a distinct object category: Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, Truck.
* Training and Testing Split: The dataset is divided into two parts:
    * 50,000 training images
    * 10,000 test images
* Image Characteristics: All images are RGB (color) images of 32x32 pixels. This small size makes the dataset suitable for experimenting with image recognition algorithms without extensive computational resources.
* Normalization: It is common practice to normalize the images before training, which tends to speed up training and improve results. A simple choice is a mean and standard deviation of 0.5 for each RGB channel, which maps pixel values from [0, 1] to [-1, 1]. This notebook instead uses per-channel statistics for CIFAR-10.
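To make the "small dataset" point concrete, the figures above can be turned into a rough size estimate. This is only back-of-the-envelope arithmetic on the numbers quoted in this section. The float32 line assumes the whole dataset is held in memory as 4-byte floats, which the notebook itself never does.

```python
# Back-of-the-envelope size of CIFAR-10, using the figures quoted above.
num_train, num_test = 50_000, 10_000
height, width, channels = 32, 32, 3

num_images = num_train + num_test
values_per_image = height * width * channels   # one uint8 value per pixel per channel
raw_bytes = num_images * values_per_image      # uncompressed uint8 storage
float32_bytes = raw_bytes * 4                  # same data held as float32 (assumption)

print(f"images: {num_images}")
print(f"values per image: {values_per_image}")
print(f"raw uint8 size: {raw_bytes / 2**20:.1f} MiB")
print(f"float32 size: {float32_bytes / 2**20:.1f} MiB")
```

Even as float32 the full dataset stays well under a gigabyte, which is why it fits comfortably on modest hardware.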

## Import Libraries

We start by importing the necessary libraries. PyTorch is a popular deep learning framework, and torchvision provides utilities for working with image data.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np


# Check device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


## Load and Preprocess Data

We load the CIFAR-10 dataset and apply transformations to the images. These transformations include random horizontal flipping, random cropping, and normalization. Normalization scales the pixel values to a range that is suitable for training a neural network.

### Define transformations for training and testing datasets

In computer vision, it is standard practice to transform the dataset before training. The transformations augment the training data and standardize the input, which can lead to better model performance and generalization.

* *transforms.Compose*: transforms.Compose is a class provided by the torchvision.transforms module. It chains multiple image transformations together, so that each one is applied to the images in the order listed.

* *transforms.RandomHorizontalFlip()*: This transformation randomly flips the image horizontally with a probability of 0.5. Horizontal flipping is a common data augmentation technique that helps to make the model invariant to the horizontal orientation of objects in the image. This can improve the model's generalization to unseen data.

* *transforms.RandomCrop(32, padding=4)*: This transformation crops the image to a size of 32x32 pixels. The padding=4 argument specifies that 4 pixels of padding (with zeros) should be added to each side of the image before cropping. The padded image will have dimensions of 40x40 pixels, from which a 32x32 patch is randomly cropped. This data augmentation technique helps the model learn to be robust to slight shifts and translations of the objects within the images.

* *transforms.ToTensor()*: This transformation converts a PIL image or a NumPy array of shape (H, W, C) into a PyTorch tensor of shape (C, H, W), the layout PyTorch models expect. For 8-bit images, the pixel values are also rescaled from [0, 255] to [0.0, 1.0].

* *transforms.Normalize()*: This transformation normalizes the image tensor by subtracting the mean and dividing by the standard deviation of each color channel (RGB). Putting the channels on a similar scale tends to speed up convergence and can improve the performance of the neural network. The per-channel values used here are ones commonly used for CIFAR-10:
    * Mean: (0.4914, 0.4822, 0.4465)
    * Standard Deviation: (0.2023, 0.1994, 0.2010)
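The four steps above can be sketched by hand in NumPy to show what each one does to the array. This is an illustrative re-implementation, not torchvision's actual code. The helper names (`random_horizontal_flip`, `random_crop`, `to_tensor`, `normalize`) and the random stand-in image are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one CIFAR-10 image: height x width x channels, uint8 in [0, 255].
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

def random_horizontal_flip(img, rng, p=0.5):
    """Mirror the image left-to-right with probability p."""
    return img[:, ::-1, :] if rng.random() < p else img

def random_crop(img, rng, size=32, padding=4):
    """Zero-pad each side by `padding`, then cut a random size x size patch."""
    padded = np.pad(img, ((padding, padding), (padding, padding), (0, 0)))
    top = rng.integers(0, padded.shape[0] - size + 1)
    left = rng.integers(0, padded.shape[1] - size + 1)
    return padded[top:top + size, left:left + size, :]

def to_tensor(img):
    """Move channels first (C, H, W) and rescale uint8 [0, 255] to float32 [0, 1]."""
    return img.transpose(2, 0, 1).astype(np.float32) / 255.0

def normalize(chw, mean, std):
    """Subtract the per-channel mean and divide by the per-channel std."""
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (chw - mean) / std

mean = (0.4914, 0.4822, 0.4465)
std = (0.2023, 0.1994, 0.2010)

flipped = random_horizontal_flip(image, rng)
cropped = random_crop(flipped, rng)   # 40x40 padded image, 32x32 patch
scaled = to_tensor(cropped)
out = normalize(scaled, mean, std)

print(cropped.shape, scaled.shape, out.dtype)
```

The order matters: the flip and crop operate on the image itself, while normalization is applied last, once the values are floats in [0, 1].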

In [None]:
# Define transformations for training and testing datasets
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])
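The constants passed to transforms.Normalize are per-channel statistics of the pixel values after scaling to [0, 1]. As a rough sketch of how such statistics could be computed, the example below uses a random stand-in array, so its output will not match the constants above. With the real data, the uint8 array exposed by the torchvision CIFAR10 dataset (its `data` attribute, assumed here to have shape N x 32 x 32 x 3) could take its place.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the training images: N x H x W x C, uint8.
images = rng.integers(0, 256, size=(1000, 32, 32, 3), dtype=np.uint8)

# Match ToTensor's [0, 1] range before computing statistics.
scaled = images.astype(np.float64) / 255.0

# Reduce over images, rows, and columns, leaving one value per RGB channel.
channel_mean = scaled.mean(axis=(0, 1, 2))
channel_std = scaled.std(axis=(0, 1, 2))

print("mean:", channel_mean.round(4))
print("std: ", channel_std.round(4))
```

Statistics like these should be computed on the training split only and then reused unchanged for the test split.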

### Load training and testing datasets

In [None]:
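# Training data uses the augmenting pipeline defined above
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)

# Test data should be evaluated deterministically, so it skips the random flip and crop
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=test_transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)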

# Define classes
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
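With the batch sizes above, the number of iterations per epoch follows from a ceiling division, because DataLoader keeps the final partial batch by default (`drop_last=False`). The helper names below are hypothetical and only illustrate the arithmetic.

```python
import math

def batches_per_epoch(num_samples, batch_size):
    """Number of batches when the final partial batch is kept."""
    return math.ceil(num_samples / batch_size)

def last_batch_size(num_samples, batch_size):
    """Size of the final batch (equal to batch_size when it divides evenly)."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size

train_batches = batches_per_epoch(50_000, 128)
test_batches = batches_per_epoch(10_000, 100)

print(train_batches, last_batch_size(50_000, 128))  # 391 80
print(test_batches, last_batch_size(10_000, 100))   # 100 100
```

The last training batch holds 80 images rather than 128, which matters for any code that assumes a fixed batch dimension.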