# Image Classification with PyTorch

This Python notebook demonstrates how to build an image classification model using PyTorch. We will use a custom dataset, define a convolutional neural network (CNN) architecture, train the model, and evaluate its performance.

## Importing Required Libraries

We start by importing the necessary libraries for our project:

- `os` for file and directory operations
- `torch` and `torch.nn` for building and training the neural network
- `torch.optim` for optimization algorithms
- `torch.utils.data` for creating data loaders and custom datasets
- `torchvision.transforms` for image transformations
- `PIL` for image loading and manipulation
- `matplotlib.pyplot` for visualizing images and plots
- `random` for random number generation

In [1]:
import os
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
import random

## Custom Dataset Class

We define a custom `WasteDataset` class that inherits from PyTorch's `Dataset` class. This class is responsible for loading and preprocessing the images from the dataset.

### Initialization

The `__init__` method takes the following parameters:
- `root_dir`: The root directory containing one subdirectory per class.
- `split`: The dataset split: `'train'`, `'val'`, or `'test'` (any other value is treated as test).
- `transform`: Optional image transformations to be applied.

Inside the `__init__` method, we:
1. Store the `root_dir` and `transform` parameters.
2. Get the sorted list of class names by listing the entries in `root_dir`.
3. Initialize empty lists for `image_paths` and `labels`.
4. Iterate over each class directory and its subfolders ('default' and 'real_world').
5. Shuffle the image names in each subfolder.
6. Based on the `split` parameter, select a portion of the images (the first 60% for train, the next 20% for validation, the last 20% for test).
7. Append the image paths and corresponding labels to the respective lists.

### Length and Item Retrieval

The `__len__` method returns the total number of images in the dataset.

The `__getitem__` method takes an `index` and returns the image and its corresponding label at that index. It:
1. Retrieves the image path and label using the provided index.
2. Opens the image using `Image.open()` and converts it to RGB format.
3. Applies the specified image transformations, if any.
4. Returns the transformed image and its label.

This custom dataset class allows us to easily load and preprocess the waste images for training, validation, and testing.

In [2]:
# Define the dataset class (modified to include a split parameter)
class WasteDataset(Dataset):
    def __init__(self, root_dir, split, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.classes = sorted(os.listdir(root_dir))
        self.image_paths = []
        self.labels = []
        
        for i, class_name in enumerate(self.classes):
            class_dir = os.path.join(root_dir, class_name)
            for subfolder in ['default', 'real_world']:
                subfolder_dir = os.path.join(class_dir, subfolder)
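                # Sort, then shuffle with a fixed seed so every split sees the
                # same order; otherwise train/val/test instances can overlap.
                image_names = sorted(os.listdir(subfolder_dir))
                random.Random(0).shuffle(image_names)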
                
                if split == 'train':
                    image_names = image_names[:int(0.6 * len(image_names))]
                elif split == 'val':
                    image_names = image_names[int(0.6 * len(image_names)):int(0.8 * len(image_names))]
                else:  # split == 'test'
                    image_names = image_names[int(0.8 * len(image_names)):]
                
                for image_name in image_names:
                    self.image_paths.append(os.path.join(subfolder_dir, image_name))
                    self.labels.append(i)
    
    def __len__(self):
        return len(self.image_paths)
    
    def __getitem__(self, index):
        image_path = self.image_paths[index]
        label = self.labels[index]
        image = Image.open(image_path).convert('RGB')
        
        if self.transform:
            image = self.transform(image)
        
        return image, label
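
The 60/20/20 slicing in `__init__` only yields disjoint splits if every `WasteDataset` instance sees the image names in the same order. The sketch below isolates that logic in a hypothetical helper, `split_names` (not part of the notebook). It assumes the names are sorted and shuffled with a fixed seed, then checks that the three splits do not overlap.

```python
import random

def split_names(names, split, seed=0):
    """Return the slice of `names` that belongs to `split`.

    Hypothetical helper: mirrors the 60/20/20 slicing in WasteDataset,
    but sorts and seeds the shuffle so every call sees the same order.
    """
    names = sorted(names)
    random.Random(seed).shuffle(names)
    train_end = int(0.6 * len(names))
    val_end = int(0.8 * len(names))
    if split == 'train':
        return names[:train_end]
    if split == 'val':
        return names[train_end:val_end]
    return names[val_end:]  # anything else is treated as 'test'

# Made-up file names standing in for one subfolder's contents
names = [f"img_{i}.jpg" for i in range(10)]
train = split_names(names, 'train')
val = split_names(names, 'val')
test = split_names(names, 'test')

print(len(train), len(val), len(test))
# The splits are pairwise disjoint and together cover every file
assert set(train).isdisjoint(val)
assert set(train).isdisjoint(test)
assert set(val).isdisjoint(test)
assert sorted(train + val + test) == sorted(names)
```

Without the fixed seed, each call would shuffle differently and the same file could land in both the training and the test split.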

## CNN Model Architecture

We define a convolutional neural network (CNN) model called `CNN` that inherits from PyTorch's `nn.Module` class. This model architecture consists of convolutional layers, pooling layers, and fully connected layers.

### Initialization

The `__init__` method takes the following parameter:
- `num_classes`: The number of output classes in the classification task.

Inside the `__init__` method, we define the layers of the CNN:
1. `conv1`: A 2D convolutional layer with 3 input channels, 32 output channels, a kernel size of 3, stride of 1, and padding of 1.
2. `relu`: A ReLU activation function.
3. `maxpool`: A 2D max pooling layer with a kernel size of 2 and stride of 2.
4. `conv2`: Another 2D convolutional layer with 32 input channels, 64 output channels, a kernel size of 3, stride of 1, and padding of 1.
5. `fc1`: A fully connected layer that takes the flattened output of the second pooling step (64 × 56 × 56 values for a 224 × 224 input) and maps it to 512 features.
6. `fc2`: The final fully connected layer that takes the 512 features and maps them to the number of output classes.

### Forward Pass

The `forward` method defines the forward pass of the CNN model. It takes an input tensor `x` and applies the following operations:
1. Pass `x` through `conv1`, followed by `relu` activation and `maxpool`.
2. Pass the output through `conv2`, followed by `relu` activation and `maxpool`.
3. Flatten the pooled output of the `conv2` stage using `x.view(x.size(0), -1)`.
4. Pass the flattened tensor through `fc1`, followed by `relu` activation.
5. Pass the output of `fc1` through `fc2` to obtain the final output.

The output of the `forward` method represents the predicted class scores for each input sample.

This CNN architecture is designed to learn hierarchical features from the input images and make predictions based on those features. The convolutional layers capture local patterns, the pooling layers reduce spatial dimensions, and the fully connected layers perform the final classification.

In [3]:
# Define the CNN model
class CNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(64 * 56 * 56, 512)
        self.fc2 = nn.Linear(512, num_classes)
    
    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.conv2(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x
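
The `64 * 56 * 56` input size of `fc1` follows from the layer settings: each 3×3 convolution with padding 1 keeps the spatial size, and each 2×2 max pool halves it, so a 224×224 input shrinks to 112×112 and then 56×56. A minimal sketch that traces a dummy tensor through the same layer settings to confirm this, assuming 224×224 RGB inputs:

```python
import torch
import torch.nn as nn

# Same layer settings as the CNN above
conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

with torch.no_grad():
    x = torch.zeros(1, 3, 224, 224)      # one dummy RGB image
    x = maxpool(torch.relu(conv1(x)))    # -> (1, 32, 112, 112)
    stage1_shape = tuple(x.shape)
    x = maxpool(torch.relu(conv2(x)))    # -> (1, 64, 56, 56)
    stage2_shape = tuple(x.shape)
    flat_features = x.view(x.size(0), -1).size(1)

print(stage1_shape, stage2_shape, flat_features)
assert flat_features == 64 * 56 * 56     # matches fc1's in_features
```

If the resize in the preprocessing step is changed from 224×224, the first argument of `fc1` has to change accordingly.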

## Dataset Path and Hyperparameters

We set the following dataset path and hyperparameters:
- `dataset_path`: The path to the directory containing the dataset images.
- `batch_size`: The number of samples per batch during training and evaluation.
- `num_epochs`: The number of epochs to train the model.
- `learning_rate`: The learning rate for the optimizer.

These hyperparameters can be adjusted based on the specific requirements and available computational resources.

In [5]:
# Set the dataset path and hyperparameters
dataset_path = 'C:\\Users\\amitn\\Downloads\\Dataset for project\\Master Project\\Deep Learning\\CNN\\images'
batch_size = 32
num_epochs = 5
learning_rate = 0.001

## Data Preprocessing and Loaders

We define a composition of image transformations using `transforms.Compose`:
1. `transforms.Resize((224, 224))`: Resizes the images to a fixed size of (224, 224) pixels.
2. `transforms.ToTensor()`: Converts the images to PyTorch tensors.
3. `transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])`: Normalizes the image tensors using the specified mean and standard deviation values.

These transformations ensure that the images are preprocessed consistently before being fed into the model.

We create instances of the `WasteDataset` class for the train, validation, and test splits, passing the `dataset_path`, `split`, and `transform` parameters. This allows us to load the dataset images with the specified transformations for each split.

Finally, we create data loaders for each dataset using `DataLoader`:
- `train_dataloader`: Loads the training data in batches of size `batch_size` and shuffles the samples.
- `val_dataloader`: Loads the validation data in batches of size `batch_size` without shuffling.
- `test_dataloader`: Loads the test data in batches of size `batch_size` without shuffling.

The data loaders provide an efficient way to iterate over the dataset during training and evaluation, handling batching and shuffling as specified.

In [7]:
# Create the datasets and data loaders
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
# Split names must match the strings checked in WasteDataset: 'train', 'val', 'test'
train_dataset = WasteDataset(dataset_path, split='train', transform=transform)
val_dataset = WasteDataset(dataset_path, split='val', transform=transform)
test_dataset = WasteDataset(dataset_path, split='test', transform=transform)
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\amitn\\Downloads\\Dataset for project\\Master Project\\Deep Learning\\CNN\\images\\Check\\default'
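
`WasteDataset` assumes that every entry in `root_dir` is a class directory containing both a `default` and a `real_world` subfolder. The `FileNotFoundError` above is what `os.listdir` raises when one of those paths does not exist. A small pre-flight check can list the offending paths before the dataset is built. This is only a sketch: `find_missing_subfolders` is a hypothetical helper, and the temporary directory with made-up class names stands in for the real dataset.

```python
import os
import tempfile

def find_missing_subfolders(root_dir, subfolders=('default', 'real_world')):
    """Return paths of expected subfolders that do not exist under root_dir."""
    missing = []
    for class_name in sorted(os.listdir(root_dir)):
        for subfolder in subfolders:
            path = os.path.join(root_dir, class_name, subfolder)
            if not os.path.isdir(path):
                missing.append(path)
    return missing

# Demo on a throwaway layout: 'glass' is complete, 'paper' lacks 'real_world'
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'glass', 'default'))
    os.makedirs(os.path.join(root, 'glass', 'real_world'))
    os.makedirs(os.path.join(root, 'paper', 'default'))
    missing = find_missing_subfolders(root)
    missing_relative = [os.path.relpath(p, root) for p in missing]

print(missing_relative)
```

Running `find_missing_subfolders(dataset_path)` before creating the datasets would show whether the root contains entries that are not class directories in the expected layout.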

## Model, Loss Function, and Optimizer

### Model Initialization

We create an instance of the `CNN` model, passing the `num_classes` parameter. The value of `num_classes` is the number of unique classes in the dataset, obtained using `len(train_dataset.classes)`. This ensures that the model's output layer has one unit per class.

The model is then moved to the GPU using `.to('cuda')` to take advantage of GPU acceleration for faster training and inference.

### Loss Function

We define the loss function using `nn.CrossEntropyLoss()`. Cross-entropy loss is commonly used for multi-class classification tasks. It takes the model's raw class scores (logits), applies log-softmax internally, and measures the dissimilarity between the resulting class probabilities and the true class labels, so the model itself does not need a softmax layer.
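
As a small worked example, the sketch below compares `nn.CrossEntropyLoss` with a manual computation: the mean negative log-probability assigned to the correct class. The logits and labels are made-up values for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()

# Two samples, three classes; values are raw scores (logits), not probabilities
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])

loss = criterion(logits, labels)

# Manual equivalent: mean negative log-probability of the correct class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), labels].mean()

print(loss.item(), manual.item())
assert torch.allclose(loss, manual)
```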

### Optimizer

We create an optimizer using `optim.Adam()`, passing the model's parameters (`model.parameters()`) and the learning rate (`lr=learning_rate`). Adam (Adaptive Moment Estimation) is a popular optimization algorithm that adapts the learning rate for each parameter based on the first and second moments of the gradients. It combines the benefits of AdaGrad and RMSprop optimizers.

The optimizer is responsible for updating the model's parameters during the training process based on the computed gradients and the specified learning rate.
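
One way to see Adam's per-parameter scaling is to take a single step on a toy problem. On the very first step, the bias-corrected update reduces to roughly `lr * g / (|g| + eps)`, so the parameter moves by about `lr` whatever the gradient's magnitude. The sketch below uses a made-up one-parameter loss and Adam's default settings apart from `lr`.

```python
import torch
import torch.optim as optim

p = torch.tensor([3.0], requires_grad=True)
optimizer = optim.Adam([p], lr=0.001)

loss = (p ** 2).sum()   # gradient is 2 * p = 6
optimizer.zero_grad()
loss.backward()
optimizer.step()

# First Adam step: the update is about lr, not lr * gradient
print(p.item())
assert abs(p.item() - (3.0 - 0.001)) < 1e-5
```

Plain gradient descent with the same learning rate would have moved the parameter by `0.001 * 6 = 0.006` instead.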

With the model, loss function, and optimizer defined, we are ready to proceed with training the CNN model on the waste classification dataset.

In [None]:
# Create the model, loss function, and optimizer
num_classes = len(train_dataset.classes)
model = CNN(num_classes).to('cuda')
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
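
The cell above assumes a CUDA-capable GPU is available. On a machine without one, `.to('cuda')` raises an error. A common alternative, sketched here with a stand-in model rather than the notebook's `CNN`, is to pick the device at runtime and move both the model and the data to it:

```python
import torch
import torch.nn as nn

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stand-in model for the demo; in the notebook this would be CNN(num_classes)
model = nn.Linear(4, 2).to(device)
batch = torch.randn(8, 4).to(device)

outputs = model(batch)
print(device, outputs.shape)
```

Adopting this pattern in the notebook would mean replacing every `'cuda'` string with the `device` variable.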

## Training Loop

We define lists `train_losses` and `val_losses` to store the training and validation losses for each epoch, respectively. These lists will be used to monitor the model's performance during training.

The training loop iterates over the specified number of epochs (`num_epochs`). In each epoch, we perform the following steps:

### Training Phase

1. Set the model to training mode using `model.train()`. This switches layers such as dropout and batch normalization to their training behavior. The `CNN` defined above contains neither, but calling it is good practice.

2. Initialize the `train_loss` variable to keep track of the cumulative training loss for the current epoch.

3. Iterate over the training data using `train_dataloader`:
  - Move the images and labels to the GPU using `.to('cuda')`.
  - Forward pass: Pass the images through the model to obtain the predicted outputs.
  - Compute the loss using the defined criterion (`criterion(outputs, labels)`).
  - Backward pass: Zero the gradients using `optimizer.zero_grad()`, compute the gradients using `loss.backward()`, and update the model parameters using `optimizer.step()`.
  - Accumulate the training loss for the current batch.

4. Compute the average training loss for the epoch by dividing the cumulative loss by the total number of training samples.

5. Append the average training loss to the `train_losses` list.

### Validation Phase

1. Set the model to evaluation mode using `model.eval()`. This turns dropout off and makes batch normalization use its running statistics during inference.

2. Initialize the `val_loss` variable to keep track of the cumulative validation loss for the current epoch.

3. Disable gradient computation using `torch.no_grad()` to save memory and speed up the validation process.

4. Iterate over the validation data using `val_dataloader`:
  - Move the images and labels to the GPU using `.to('cuda')`.
  - Forward pass: Pass the images through the model to obtain the predicted outputs.
  - Compute the loss using the defined criterion (`criterion(outputs, labels)`).
  - Accumulate the validation loss for the current batch.

5. Compute the average validation loss for the epoch by dividing the cumulative loss by the total number of validation samples.

6. Append the average validation loss to the `val_losses` list.

After each epoch, we print the current epoch number, training loss, and validation loss to monitor the progress of the training process.

Once all epochs are completed, we print a message indicating that the training is finished.

In [None]:
# Lists to store the training and validation losses
train_losses = []
val_losses = []

# Training loop
for epoch in range(num_epochs):
    # Training
    model.train()
    train_loss = 0.0
    for images, labels in train_dataloader:
        images = images.to('cuda')
        labels = labels.to('cuda')
        
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        train_loss += loss.item() * images.size(0)
    
    train_loss /= len(train_dataset)
    train_losses.append(train_loss)
    
    # Validation
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for images, labels in val_dataloader:
            images = images.to('cuda')
            labels = labels.to('cuda')
            
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            val_loss += loss.item() * images.size(0)
    
    val_loss /= len(val_dataset)
    val_losses.append(val_loss)
    
    print(f"Epoch [{epoch+1}/{num_epochs}], Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}")

print("Training completed!")
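
The loop above tracks only the loss. To report classification accuracy, for example on the test split, a helper along the following lines could be used. This is a sketch: `evaluate_accuracy` is a hypothetical function that is not part of the notebook, and the toy identity model and one-hot inputs exist only to make the example self-contained.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_accuracy(model, dataloader, device):
    """Return the fraction of samples whose top-scoring class equals the label."""
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            predicted = model(inputs).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Toy demo: an identity "model" over one-hot inputs classifies everything correctly
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
inputs = torch.eye(3).repeat(2, 1)            # 6 samples, 3 features
labels = torch.tensor([0, 1, 2, 0, 1, 2])
loader = DataLoader(TensorDataset(inputs, labels), batch_size=4)
toy_model = nn.Identity().to(device)

accuracy = evaluate_accuracy(toy_model, loader, device)
print(accuracy)
assert accuracy == 1.0
```

With the notebook's objects, the call would look like `evaluate_accuracy(model, test_dataloader, 'cuda')`.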

## Results Visualization

In this section, we visualize the results from training and inference.

In [None]:
# Plot the training and validation losses
plt.figure(figsize=(10, 5))
plt.plot(train_losses, label='Training Loss')
plt.plot(val_losses, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

In [None]:
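# Perform sample inferences on random test images with different labels
# Normalization constants from the transform, used to undo Normalize for display
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
max_samples = min(9, len(train_dataset.classes))  # one image per class, at most 9

model.eval()
with torch.no_grad():
    indices = list(range(len(test_dataset)))
    random.shuffle(indices)

    selected_images = []
    selected_labels = []
    selected_predicted = []

    for index in indices:
        image, label = test_dataset[index]

        # Skip classes that already have a sample before running the model
        if label in selected_labels:
            continue

        output = model(image.unsqueeze(0).to('cuda'))
        _, predicted = torch.max(output, 1)

        selected_images.append(image)
        selected_labels.append(label)
        selected_predicted.append(predicted.item())

        if len(selected_labels) == max_samples:
            break

fig, axes = plt.subplots(3, 3, figsize=(12, 12))
axes = axes.flatten()

for ax in axes:
    ax.axis('off')  # also hides panels that stay empty

for i in range(len(selected_images)):
    # Undo the normalization and clamp to [0, 1] so imshow renders true colors
    display_image = (selected_images[i] * std + mean).clamp(0, 1)
    axes[i].imshow(display_image.permute(1, 2, 0).numpy())
    axes[i].set_title(f"True: {train_dataset.classes[selected_labels[i]]}\nPredicted: {train_dataset.classes[selected_predicted[i]]}")

plt.tight_layout()
plt.show()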