<a href="https://colab.research.google.com/github/JoshBoii/Convolutional-Neural-Network-/blob/main/CIFAR_10_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
#Import necessary libraries and modules
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from itertools import product
import matplotlib.pyplot as plt
import numpy as np
import pickle


In [2]:
!nvidia-smi

Tue Apr  4 21:49:19 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   29C    P0    45W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

`batch_size = 128`: try 64, 256, or 512 to see whether they improve model performance.

`num_epochs = 10`: more epochs give the model more time to learn; try 20, 50, or 100 later.

Comment out hyperparameter values that give poor accuracy.

`num_blocks` = number of blocks in the network, each block consists of several convolutional layers that extract features from the input.

Increasing the number of blocks can increase the model's capacity to learn complex features, but also increases the risk of **overfitting** the training data if the model becomes too complex.

`num_convs` = the number of convolutional layers within each block. 

Increasing the number of convolutions can also increase the model's capacity to learn complex features, but also increases the risk of **overfitting**.

`num_classes` = the number of output classes that the model can predict (10 for CIFAR-10).

Increasing the number of classes will make the problem more complex, and thus the model will require more capacity to learn the mapping between inputs and outputs. However, if the dataset is too small for the number of classes, the model might not be able to learn the necessary features to discriminate between the classes, which can lead to poor accuracy.
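
To make "capacity" concrete, the sketch below counts trainable parameters for the `Block`/`Model` architecture defined later in this notebook (In [8]), assuming 64 channels and 3x3 kernels as in that cell. The helper name `count_params` is hypothetical, and the function is hand-derived arithmetic rather than a PyTorch call, so treat it as an estimate to check against the real model.

```python
def count_params(num_blocks, num_convs, num_classes=10, channels=64):
    """Trainable parameters of the notebook's Model, derived by hand."""
    stem = 3 * channels * 9 + channels            # conv1: 3 -> 64 channels, 3x3 kernel, plus bias
    conv = channels * channels * 9 + channels     # one 3x3 conv inside a block, plus bias
    bn = 2 * channels                             # BatchNorm2d weight and bias
    gate = channels * num_convs + num_convs       # per-block Linear(channels, num_convs)
    block = num_convs * (conv + bn) + gate
    head = channels * num_classes + num_classes   # final Linear(channels, num_classes)
    return stem + num_blocks * block + head


for nb, nc in [(2, 1), (3, 2), (6, 4)]:
    print(f"{nb} blocks x {nc} convs -> {count_params(nb, nc):,} parameters")
```

The count grows roughly in proportion to `num_blocks * num_convs`, which is why both settings affect capacity and overfitting risk in a similar way.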

In [3]:
# Define hyperparameters
num_classes = 10  # CIFAR-10 has 10 classes
batch_size = 128 
num_epochs = 10 

num_blocks = 3
num_convs = 2


num_blocks_values = list(range(2, 7)) # 2 to 6 as (2, 7)
num_convs_values = list(range(1, 5)) # 1 to 4 as (1, 5)
learning_rate_values = [0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01]

Footnotes

[1] A trial run showed that 0.00005 and 0.01 give poor accuracy. A very low learning rate such as 0.00005 makes training slow and can leave the model far from converged after 10 epochs, while a high learning rate such as 0.01 may cause the optimizer to overshoot good weights.

[2] Too few blocks (2) do not provide enough depth for the model to learn complex features, and too many blocks (6) may cause the model to overfit the training data.
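
The effect described in footnote [1] can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w², which is an assumption made for illustration: the notebook itself trains with Adam, and the learning rates here are chosen for this toy function, not for CIFAR-10.

```python
def descend(lr, steps=50, w=1.0):
    """Plain gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w


for lr in (0.001, 0.1, 1.1):
    # Small lr: barely moves. Moderate lr: approaches 0. Large lr: overshoots and diverges.
    print(f"lr={lr}: w after 50 steps = {descend(lr):.4g}")
```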

In [4]:
# Define hyperparameters to tune
num_blocks_values = list(range(4, 6)) # 4 and 5 only, narrowed after the trial run
num_convs_values = list(range(1, 5)) # 1 to 4 as (1, 5)
learning_rate_values = [0.0005, 0.001, 0.005]

# Create a list of all hyperparameter combinations
hyperparams = list(product(num_blocks_values, num_convs_values, learning_rate_values))
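
As a quick check of what `product` generates, the snippet below rebuilds the same grid on its own and prints its size and first few entries. The variable name `grid` is only used here for illustration.

```python
from itertools import product

num_blocks_values = list(range(4, 6))          # [4, 5]
num_convs_values = list(range(1, 5))           # [1, 2, 3, 4]
learning_rate_values = [0.0005, 0.001, 0.005]

grid = list(product(num_blocks_values, num_convs_values, learning_rate_values))
print(len(grid))   # 2 * 4 * 3 combinations
print(grid[:4])    # the last argument (learning rate) varies fastest
```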

In [5]:
# Define the augmentation and normalization transforms for the train and test sets
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])
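
`ToTensor` scales 8-bit pixel values to the range [0, 1], and `Normalize` then computes `(x - mean) / std` per channel. With a mean and std of 0.5, that maps [0, 1] onto [-1, 1]. The plain-Python helper `normalize` below is a hypothetical stand-in that shows the arithmetic on single values.

```python
def normalize(x, mean=0.5, std=0.5):
    """Same per-value arithmetic as transforms.Normalize."""
    return (x - mean) / std


for x in (0.0, 0.5, 1.0):
    print(x, "->", normalize(x))
```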

In [6]:
# Load CIFAR-10 dataset
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:01<00:00, 106381870.41it/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


In [7]:
# Create dataloaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

In [8]:
# Define the model architecture 
class Block(nn.Module):
    def __init__(self, in_channels, K, dropout_rate=0.5):
        super(Block, self).__init__()
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(in_channels, K)
        self.convs = nn.ModuleList(
            [nn.Sequential(
                nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1),
                nn.BatchNorm2d(in_channels),
                nn.ReLU(inplace=True),
                nn.Dropout(dropout_rate)
            ) for _ in range(K)
            ])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        out = self.avgpool(x)
        out = torch.flatten(out, 1)
        a = self.fc(out)
        a = torch.softmax(a, dim=1).unsqueeze(2).unsqueeze(3)
        conv_outs = [a[:, i:i + 1] * conv(x) for i, conv in enumerate(self.convs)]
        out = sum(conv_outs)
        out += identity
        out = self.relu(out)
        return out

class Model(nn.Module):
    def __init__(self, num_blocks, K, num_classes, dropout_rate):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.blocks = nn.Sequential(*[Block(64, K, dropout_rate) for _ in range(num_blocks)])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(dropout_rate)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.blocks(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.dropout(x)  # Add dropout here
        x = self.fc(x)
        return x
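
The `forward` method of `Block` mixes its K convolution branches with softmax weights, adds the input back (a residual connection), and applies ReLU. The sketch below reproduces that arithmetic on single numbers. It is a simplification: in the real block the weights are computed per sample from the pooled input and broadcast over channels and pixels, and the names `softmax` and `mix` are hypothetical helpers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix(identity, branch_outputs, logits):
    """Softmax-weighted sum of branch outputs, residual add, then ReLU."""
    weights = softmax(logits)
    mixed = sum(w * b for w, b in zip(weights, branch_outputs))
    return max(0.0, mixed + identity)


print(softmax([0.0, 0.0]))                                    # equal logits give equal weights
print(mix(identity=0.5, branch_outputs=[1.0, 3.0], logits=[0.0, 0.0]))
print(mix(identity=1.0, branch_outputs=[2.0, -4.0], logits=[0.0, 0.0]))
```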



The line `device = torch.device("cuda" if torch.cuda.is_available() else "cpu")` checks if a GPU with CUDA support is available on your system. If it is available, it will use the GPU ("cuda") as the device. If not, it will fall back to using the CPU ("cpu"). This allows the code to run on systems with or without GPU support.

In [9]:
# Define helper functions that run one training epoch and one evaluation pass
def train(model, train_loader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    running_corrects = 0
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
        running_corrects += torch.sum(preds == targets.data)
    epoch_loss = running_loss / len(train_loader.dataset)
    epoch_acc = running_corrects.double() / len(train_loader.dataset)
    return epoch_loss, epoch_acc

def test(model, test_loader, criterion, device):
    model.eval()
    running_loss = 0.0
    running_corrects = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, targets)
            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == targets.data)
    epoch_loss = running_loss / len(test_loader.dataset)
    epoch_acc = running_corrects.double() / len(test_loader.dataset)
    return epoch_loss, epoch_acc
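
Both functions multiply each batch loss by `inputs.size(0)` before summing. `nn.CrossEntropyLoss` returns the mean over the batch by default, so this weighting recovers the per-sample mean even when the last batch is smaller. The sketch below shows the difference with made-up batch losses; `dataset_mean` is a hypothetical helper name.

```python
def dataset_mean(batch_means, batch_sizes):
    """Per-sample mean loss, weighting each batch mean by its batch size."""
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


batch_sizes = [128, 128, 80]    # hypothetical: the final batch is smaller
batch_means = [1.0, 1.0, 2.0]   # made-up per-batch mean losses, for illustration only

weighted = dataset_mean(batch_means, batch_sizes)
naive = sum(batch_means) / len(batch_means)
print(f"weighted: {weighted:.4f}  naive mean of batch means: {naive:.4f}")
```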


THE OUTPUT
A list of tuples, one per epoch, each containing 5 values: the epoch index (starting at 0), training loss, training accuracy, test loss, and test accuracy.
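
A short sketch of how such a list can be consumed. The three rows below are copied from the last three epochs of the training log shown further down in this notebook, with the epoch index stored zero-based as in `train_and_evaluate`.

```python
# (epoch index, train loss, train acc, test loss, test acc)
losses = [
    (7, 1.4485, 0.4676, 1.3551, 0.5127),
    (8, 1.4167, 0.4798, 1.2831, 0.5309),
    (9, 1.4028, 0.4857, 1.2160, 0.5483),
]

# Pick the row with the highest test accuracy and unpack its five fields.
best = max(losses, key=lambda row: row[4])
epoch, train_loss, train_acc, test_loss, test_acc = best
print(f"Best epoch: {epoch + 1}, test accuracy: {test_acc:.4f}")
```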

In [13]:
# Train a model for num_epochs with one hyperparameter combination and record per-epoch metrics
def train_and_evaluate(hyperparams):
    num_blocks, K, learning_rate = hyperparams
    model = Model(num_blocks, K, num_classes, dropout_rate).to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    losses = []
    for epoch in range(num_epochs):
        train_loss, train_acc = train(model, train_loader, criterion, optimizer, device)
        test_loss, test_acc = test(model, test_loader, criterion, device)

        print(f"Epoch {epoch + 1}/{num_epochs} - Loss: {train_loss:.4f} Acc: {train_acc:.4f} Test Loss: {test_loss:.4f} Test Acc: {test_acc:.4f}")

        # Store loss and accuracy at each epoch
        losses.append((epoch, train_loss, train_acc.item(), test_loss, test_acc.item()))

    return losses, test_acc, hyperparams

# Evaluate all hyperparameter combinations - grid search
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dropout_rate = 0.5  # Adjust this value as needed
results = []

for idx, combo in enumerate(hyperparams):
    num_blocks, num_convs, learning_rate = combo
    # train_and_evaluate builds its own model and returns (losses, test_acc, hyperparams);
    # formatting the whole tuple with :.4f raises a TypeError, so unpack it first
    losses, test_acc, _ = train_and_evaluate(combo)
    test_acc = float(test_acc)  # 0-dim tensor -> Python float for printing, pickling, and plotting
    print(f"Combination {idx + 1}/{len(hyperparams)}: {num_blocks} blocks, {num_convs} convs, {learning_rate} lr, Accuracy: {test_acc:.4f}")
    results.append((losses, test_acc, combo))

# Find the best accuracy and the corresponding hyperparameters
best = max(results, key=lambda r: r[1])
best_acc, best_params = best[1], best[2]

print("Best accuracy:", best_acc)
print("Best parameters:", best_params)

Epoch 1/10 - Loss: 2.0286 Acc: 0.2486 Test Loss: 1.7775 Test Acc: 0.3300
Epoch 2/10 - Loss: 1.7337 Acc: 0.3401 Test Loss: 1.5384 Test Acc: 0.4257
Epoch 3/10 - Loss: 1.6397 Acc: 0.3834 Test Loss: 1.4373 Test Acc: 0.4536
Epoch 4/10 - Loss: 1.5846 Acc: 0.4059 Test Loss: 1.3986 Test Acc: 0.4755
Epoch 5/10 - Loss: 1.5422 Acc: 0.4222 Test Loss: 1.3630 Test Acc: 0.4901
Epoch 6/10 - Loss: 1.5066 Acc: 0.4388 Test Loss: 1.3369 Test Acc: 0.5042
Epoch 7/10 - Loss: 1.4750 Acc: 0.4539 Test Loss: 1.2664 Test Acc: 0.5372
Epoch 8/10 - Loss: 1.4485 Acc: 0.4676 Test Loss: 1.3551 Test Acc: 0.5127
Epoch 9/10 - Loss: 1.4167 Acc: 0.4798 Test Loss: 1.2831 Test Acc: 0.5309
Epoch 10/10 - Loss: 1.4028 Acc: 0.4857 Test Loss: 1.2160 Test Acc: 0.5483

In [None]:
with open('results.pkl', 'wb') as f:
    pickle.dump(results, f)


After the training loop, save the results to a file named `results.pkl`:

`with open('results.pkl', 'wb') as f:`
    `pickle.dump(results, f)`

To load the results from the file later, you can use the following code:

`with open('results.pkl', 'rb') as f:`
    `loaded_results = pickle.load(f)`

Now `loaded_results` contains the results you saved previously, and you can use them for further analysis without rerunning the training process.
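
A minimal round-trip sketch of the save and load steps. It writes to a temporary directory so it does not touch a real `results.pkl`, and the single entry in `results` uses placeholder values (not real training results) in the same `(losses, test_acc, hyperparams)` shape the grid search produces.

```python
import os
import pickle
import tempfile

# Placeholder entry for illustration only: (per-epoch tuples, final test accuracy, hyperparameters)
results = [([(0, 0.0, 0.0, 0.0, 0.0)], 0.0, (4, 1, 0.0005))]

path = os.path.join(tempfile.mkdtemp(), "results.pkl")

with open(path, "wb") as f:
    pickle.dump(results, f)

with open(path, "rb") as f:
    loaded_results = pickle.load(f)

print(loaded_results == results)
```

Only unpickle files you created or trust, since loading a pickle can execute arbitrary code.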






The per-epoch accuracies were saved to a separate file so that the long grid search does not have to be rerun. The full grid has 120 hyperparameter combinations (5 num_blocks * 4 num_convs * 6 learning_rates), each trained for 10 epochs, so reusing the saved results saves considerable GPU time.
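
One way to avoid rerunning the search is to load the saved file when it exists and only compute otherwise. The sketch below assumes a hypothetical helper `load_or_compute` and a stand-in `fake_grid_search` function in place of the real training loop.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute):
    """Return cached results from path if present; otherwise compute and cache them."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    results = compute()
    with open(path, "wb") as f:
        pickle.dump(results, f)
    return results


calls = []


def fake_grid_search():
    # Stand-in for the expensive grid search; records that it was called.
    calls.append(1)
    return [("placeholder", 0.0)]


path = os.path.join(tempfile.mkdtemp(), "results.pkl")
first = load_or_compute(path, fake_grid_search)    # computes and writes the file
second = load_or_compute(path, fake_grid_search)   # reads the file, no recomputation
print(len(calls))
```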

Graph comparing hyperparameter combinations and accuracy

In [None]:
# Plot the accuracies for each hyperparameter combination
accuracies = [r[1] for r in results]
hyperparams_str = [f"{r[2][0]} blocks, {r[2][1]} convs, {r[2][2]} lr" for r in results]

# Create a dictionary to store hyperparameters and their corresponding accuracies
accuracy_dict = {hp_str: acc for hp_str, acc in zip(hyperparams_str, accuracies)}

# Create a line graph
plt.figure(figsize=(12, 6))
plt.plot(hyperparams_str, accuracies, marker='o', linestyle='-', markersize=6)

plt.xlabel("Hyperparameter Combination")
plt.ylabel("Accuracy")
plt.title("Accuracy for Different Hyperparameter Combinations")
plt.xticks(rotation=45)
plt.tight_layout()
plt.grid()
plt.show()



`rotation` is an optional keyword argument to `plt.xticks()` that rotates the x-axis tick labels. By default the labels are drawn horizontally. With many hyperparameter combinations or long labels, horizontal labels become cluttered and overlap, which makes them difficult to read.

To improve readability, set `rotation` to an angle in degrees. For instance, `rotation=45` rotates the labels 45 degrees counterclockwise:

`plt.xticks(rotation=45)`

Adjust the rotation value to achieve the best appearance and readability for your specific graph.

**END**