In [None]:
%matplotlib inline

Save the python notebook as lastname_firstname_hw5.ipynb

- Name: Dyavarashetty Peeyush
- UID: 120428104


Training a Classifier (50 points)
=====================

You'll be using CIFAR-10, a benchmark dataset widely used for image classification in machine learning and computer vision. It consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class, split into 50,000 training images and 10,000 test images.

Please write the code wherever """ write the code here""" is mentioned.



In [None]:
# Importing required methods and functions

import torch
import torchvision
import torchvision.transforms as transforms

The outputs of torchvision datasets are PIL images with pixel values in the range [0, 1].
We transform them to Tensors normalized to the range [-1, 1].

- 10 points
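
As a quick check of the arithmetic behind the normalization step, here is a minimal plain-Python sketch. It assumes the per-channel rule `output = (input - mean) / std` with `mean = std = 0.5`, as passed to `transforms.Normalize` in this notebook; the helper names `normalize` and `unnormalize` are made up for illustration and are not part of torchvision.

```python
# Per-channel normalization rule: output = (input - mean) / std
def normalize(value, mean=0.5, std=0.5):
    return (value - mean) / std


# Inverse of the rule above; equivalent to the `img / 2 + 0.5` used when displaying images
def unnormalize(value, mean=0.5, std=0.5):
    return value * std + mean


# The endpoints and midpoint of [0, 1] map to the endpoints and midpoint of [-1, 1]
darkest, middle, brightest = normalize(0.0), normalize(0.5), normalize(1.0)
print(darkest, middle, brightest)

# Round trip: undoing the normalization recovers the original pixel value
restored = unnormalize(normalize(0.25))
print(restored)
```

With a mean and standard deviation of 0.5, the inverse is simply `value / 2 + 0.5`, which is the expression used later when images are displayed.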



In [None]:
import torch.utils.data


transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)

# Use torch.utils.data.DataLoader to create a DataLoader instance for training data
# Set batch_size to control the number of samples in each batch
# Set shuffle=True to shuffle the training data for each epoch
# Set num_workers to specify the number of subprocesses to use for data loading

trainloader = torch.utils.data.DataLoader(dataset=trainset, 
                                          batch_size=20, shuffle=True, 
                                          num_workers=8) #"""Write the code here"""

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform) 

# Create a DataLoader instance for test data similar to trainloader
# Set shuffle=False since shuffling is not necessary for testing/validation

testloader = torch.utils.data.DataLoader(dataset=testset,
                                          batch_size=20, shuffle=False, 
                                          num_workers=8) #"""Write the code here"""

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Displaying some training images



In [None]:
import matplotlib.pyplot as plt
import numpy as np

# functions to show an image


def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(len(labels))))

2. Define a Convolutional Neural Network

- 10 points
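
The first fully connected layer in this step takes `16 * 5 * 5` inputs. The sketch below traces where that number comes from, assuming 32x32 inputs, 5x5 convolutions with stride 1 and no padding, and 2x2 max pooling with stride 2 (the settings used in `Net`). The helper names `conv_out` and `pool_out` are hypothetical and only illustrate the standard output-size formula.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Spatial output size of a convolution along one dimension
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    # Spatial output size of a pooling layer (no padding) along one dimension
    return (size - kernel) // stride + 1


size = 32                       # CIFAR-10 images are 32x32
size = conv_out(size, 5)        # conv1: 5x5 kernel
after_conv1 = size
size = pool_out(size, 2, 2)     # 2x2 max pooling, stride 2
size = conv_out(size, 5)        # conv2: 5x5 kernel
size = pool_out(size, 2, 2)     # 2x2 max pooling, stride 2

flat_features = 16 * size * size  # conv2 produces 16 channels
print(after_conv1, size, flat_features)
```

If the kernel size, padding, or input resolution changes, the input size of `fc1` and the reshape in `forward` have to change with it.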



In [None]:
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        """Write the code here"""
        # Perform first convolution operation followed by ReLU activation function
        x = self.conv1(x)
        x = F.relu(x)
        # Perform max pooling operation
        x = self.pool(x)
        # Perform second convolution operation followed by ReLU activation function
        x = self.conv2(x)
        x = F.relu(x)
        # Perform max pooling operation
        x = self.pool(x)
        # Reshape the tensor for fully connected layers
        x = torch.reshape(x, (-1, 16 * 5 * 5))
        # Perform first fully connected layer operation followed by ReLU activation function
        x = self.fc1(x)
        x = F.relu(x)
        # Perform second fully connected layer operation followed by ReLU activation function
        x = self.fc2(x)
        x = F.relu(x)
        # Perform the third fully connected layer operation
        x = self.fc3(x)
        return x


net = Net()

3. Define a Loss function and optimizer

- 10 points
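
To make the loss concrete, here is a minimal plain-Python sketch of the cross-entropy of a single sample computed from raw scores (logits) and an integer class index. The function name `cross_entropy` is an assumption for illustration. `nn.CrossEntropyLoss` applies the same log-softmax followed by negative log-likelihood, and by default averages the result over the batch.

```python
import math


def cross_entropy(logits, target):
    # Numerically stable log-sum-exp: subtract the max before exponentiating
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-probability of the target class
    return log_sum_exp - logits[target]


# An untrained network that scores all 10 classes equally has loss log(10)
uniform_loss = cross_entropy([0.0] * 10, target=3)

# Raising the score of the correct class lowers the loss
confident_logits = [0.0] * 10
confident_logits[3] = 5.0
confident_loss = cross_entropy(confident_logits, target=3)

print(uniform_loss, confident_loss)
```

A running loss near `log(10)`, about 2.3, at the start of training therefore means the network is still guessing uniformly.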



In [None]:
import torch.optim as optim

# Define a loss function
criterion = nn.CrossEntropyLoss()#"""Write the code here"""

# Define an optimizer, e.g. Adam, SGD, or RMSprop
optimizer = optim.SGD(params=net.parameters(), lr=5e-3, momentum=0.9) #"""Write the code here"""


4. Train the network
- 10 points
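
Each training iteration follows the same order: compute the gradient of the loss at the current parameters, then let the optimizer update them. The sketch below reproduces that update for SGD with momentum on a toy one-parameter loss, assuming the rule documented for `torch.optim.SGD` with default dampening (`velocity = momentum * velocity + grad`, then `w = w - lr * velocity`). The toy loss `(w - 3)^2` and the names `grad` and `sgd_momentum` are illustrative assumptions, not part of the assignment.

```python
def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)


def sgd_momentum(w, lr=5e-3, momentum=0.9, steps=500):
    velocity = 0.0
    for _ in range(steps):
        g = grad(w)                          # gradient at the current parameter
        velocity = momentum * velocity + g   # accumulate past gradients
        w = w - lr * velocity                # parameter update
    return w


w_final = sgd_momentum(w=0.0)
print(w_final)  # approaches the minimizer w = 3
```

The learning rate and momentum match the values passed to the optimizer in this notebook; in the real loop, `loss.backward()` plays the role of `grad` and `optimizer.step()` performs the update.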


In [None]:
for epoch in range(10):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad() #"""Write the code here"""

        # forward + backward + optimize
        # call the module itself rather than net.forward() so that any hooks run
        outputs = net(inputs) #"""Write the code here"""
        loss = criterion(outputs, labels.long()) #"""Write the code here"""
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

5. Test the network on the test data



In [None]:
dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(len(labels))))

Okay, now let us see what the neural network thinks these examples above are:



In [None]:
outputs = net(images)

The outputs are energies (scores) for the 10 classes.
The higher the energy for a class, the more the network
thinks that the image belongs to that class.
So, let's get the index of the highest energy:
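
Before running the cell, here is a plain-Python sketch of what "index of the highest energy" means for one image. The score values are made up for illustration, and `softmax` and `argmax` are hypothetical helpers. The sketch also shows that converting scores to probabilities does not change which class wins.

```python
import math

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


def softmax(scores):
    # Convert raw scores to probabilities (max subtracted for stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    # Index of the largest value
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical network output for a single image
scores = [0.1, -1.2, 0.3, 2.5, 0.0, 1.9, -0.4, 0.2, -2.0, -0.7]

predicted = argmax(scores)
probabilities = softmax(scores)

print(classes[predicted])
print(argmax(probabilities) == predicted)
```

`torch.max(outputs, 1)` does the same thing for a whole batch at once, returning the maximum value and its index along the class dimension.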



In [None]:
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(len(predicted))))

The results seem pretty good.

Let us look at how the network performs on the whole dataset.



In [None]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

That looks way better than chance, which is 10% accuracy (randomly picking
a class out of 10 classes).
Seems like the network learnt something.

Which classes performed well, and which did not?
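
The bookkeeping for per-class accuracy can be sketched in plain Python, independent of PyTorch. The function name `per_class_accuracy` and the toy labels are assumptions for illustration. The key point is that every sample contributes to the count of its true class.

```python
def per_class_accuracy(predictions, labels, num_classes):
    correct = [0] * num_classes
    total = [0] * num_classes
    for pred, label in zip(predictions, labels):
        total[label] += 1            # one more sample of this true class
        if pred == label:
            correct[label] += 1      # and it was classified correctly
    # Classes with no samples get None instead of a division by zero
    return [c / t if t else None for c, t in zip(correct, total)]


# Toy example with 3 classes and 6 samples
labels = [0, 0, 1, 1, 1, 2]
predictions = [0, 1, 1, 1, 0, 2]

acc = per_class_accuracy(predictions, labels, num_classes=3)
for class_index, value in enumerate(acc):
    print(class_index, value)
```

Counting only a fixed number of samples per batch would silently ignore the rest, so the loop should cover the full batch.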



In [None]:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels)
        for i in range(labels.size(0)):  # count every sample in the batch
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

Reflecting on your experience with training and evaluating a CIFAR-10 classification model, discuss one significant challenge you encountered during the process and how you addressed it. Additionally, explain why classifying images into these specific categories (airplane, automobile, bird, etc.) might be important in real-world applications.

- 10 points

Challenges encountered:

1) Understanding new data types such as tensors and `DataLoader`, and how to extract data from them.

2) Choosing the correct loss function and passing it the right inputs. I first used `nn.MSELoss()`, which expects the target to have the same shape as the output. Because `outputs` holds one score per class for each image while `labels` holds a single class index per image, their dimensions differ and training always failed with an error. Using `nn.CrossEntropyLoss()`, which accepts class indices as targets, fixed this.

3) Setting `batch_size`. With a `batch_size` of 10, 15, or 32, the model did not learn well in my runs. I learnt that the batch size has a noticeable effect on how the model trains.

4) Improving accuracy while trying SGD and Adam. In my experiments, SGD performed better.
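
The shape issue described in challenge 2 can be illustrated without PyTorch. A mean-squared-error loss compares two vectors element by element, so a class index first has to be expanded into a one-hot vector of the same length as the output. The helpers `one_hot` and `mse` below are hypothetical and only sketch the idea.

```python
def one_hot(label, num_classes):
    # Expand a class index into a vector with a single 1.0
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mse(output, target):
    # Element-wise comparison requires matching lengths
    if len(output) != len(target):
        raise ValueError("output and target must have the same length")
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)


label = 3
target = one_hot(label, num_classes=10)

all_zero_output = [0.0] * 10
loss_wrong = mse(all_zero_output, target)   # one element is off by 1
loss_right = mse(target, target)            # perfect prediction

print(loss_wrong, loss_right)
```

A cross-entropy loss avoids this extra step because it takes the class index directly as the target.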

Importance of classifying images in real-world applications:

1) Sorting images on the web, whether for display or for e-commerce. Because the internet contains a huge number of images, searching for a particular one is much harder when they are not classified. Without classification, it would be difficult to find pictures of animals (even a particular breed such as a Husky or German Shepherd), monuments (Taj Mahal, Eiffel Tower, Leaning Tower of Pisa, etc.), objects, or any other pictorial information.

2) In wildlife monitoring, classifying different animal species is important because it lets researchers keep track of population counts and study behaviour more easily than by searching for the animals and analyzing them manually.

3) For autonomous vehicles, detecting and classifying objects such as cars, people, animals, and birds is important. The same holds for aviation safety, because the chance of an accident is high when such objects go undetected.