# Image Classification using Convolutional Neural Networks in PyTorch

## The CIFAR10 Dataset

This notebook uses the CIFAR10 dataset, which consists of 60000 colour images of 32x32 px in 10 classes (50000 for training and 10000 for testing).

In [2]:
from pathlib import Path
import torch
import torchvision
import tarfile
from torchvision.datasets.utils import download_url
from torch.utils.data import random_split

## Data import 

Create one folder for the downloaded archive and one for the extracted data

In [13]:
CIFAR10_folder_archive = Path('.')/'data'/'CIFAR10'/'archive'
CIFAR10_folder_raw = Path('.')/'data'/'CIFAR10'/'raw'

# exist_ok=True makes the cell safe to re-run when the folders already exist
CIFAR10_folder_archive.mkdir(parents=True, exist_ok=True)
CIFAR10_folder_raw.mkdir(parents=True, exist_ok=True)

Download the dataset archive from the URL below

In [14]:
dataset_url = "https://s3.amazonaws.com/fast-ai-imageclas/cifar10.tgz"
download_url(
    url=dataset_url,
    root=str(CIFAR10_folder_archive)
)

Downloading https://s3.amazonaws.com/fast-ai-imageclas/cifar10.tgz to data\CIFAR10\archive\cifar10.tgz



Extract the data

In [15]:
# Extract from archive
with tarfile.open(CIFAR10_folder_archive/'cifar10.tgz', 'r:gz') as tar:
    tar.extractall(path=CIFAR10_folder_raw)
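
Before extracting, it can help to list what an archive contains. The sketch below is self-contained: it builds a tiny stand-in archive in a temporary directory (the names `demo.tgz` and `cifar10/train/cat/0001.png` are made up for illustration) and lists its members with `getnames()`. The same call should work on the downloaded `cifar10.tgz`.

```python
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)

    # Build a tiny stand-in archive (hypothetical content, not the real dataset)
    sample = root / 'cifar10' / 'train' / 'cat' / '0001.png'
    sample.parent.mkdir(parents=True)
    sample.write_bytes(b'not a real image')
    archive_path = root / 'demo.tgz'
    with tarfile.open(archive_path, 'w:gz') as tar:
        tar.add(str(root / 'cifar10'), arcname='cifar10')

    # List the members without extracting anything
    with tarfile.open(archive_path, 'r:gz') as tar:
        member_names = tar.getnames()

print(member_names)
```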

Check the folders of images

In [21]:
print([folder.name for folder in (CIFAR10_folder_raw/'cifar10').iterdir()])
print([folder.name for folder in (CIFAR10_folder_raw/'cifar10'/'test').iterdir()])
print([folder.name for folder in (CIFAR10_folder_raw/'cifar10'/'train').iterdir()])

['test', 'train']
['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']


Number of images per class in the train folder

In [26]:
for folder in (CIFAR10_folder_raw/'cifar10'/'train').iterdir():
    print(f'{folder.name} elements: {len(list(folder.iterdir()))}')

airplane elements: 5000
automobile elements: 5000
bird elements: 5000
cat elements: 5000
deer elements: 5000
dog elements: 5000
frog elements: 5000
horse elements: 5000
ship elements: 5000
truck elements: 5000


Number of images per class in the test folder

In [27]:
for folder in (CIFAR10_folder_raw/'cifar10'/'test').iterdir():
    print(f'{folder.name} elements: {len(list(folder.iterdir()))}')

airplane elements: 1000
automobile elements: 1000
bird elements: 1000
cat elements: 1000
deer elements: 1000
dog elements: 1000
frog elements: 1000
horse elements: 1000
ship elements: 1000
truck elements: 1000
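
The two loops above can be folded into one helper that returns the counts as a dictionary, which also makes it easy to check that the classes are balanced. This is a minimal sketch: the helper name `count_per_class` is made up here, and it runs on a toy directory with hypothetical class names and sizes rather than on the real dataset.

```python
import tempfile
from pathlib import Path

def count_per_class(split_dir):
    """Return {class_name: number_of_files} for a one-folder-per-class directory."""
    return {
        folder.name: sum(1 for item in folder.iterdir() if item.is_file())
        for folder in sorted(Path(split_dir).iterdir())
        if folder.is_dir()
    }

# Toy directory standing in for cifar10/train (hypothetical names and sizes)
with tempfile.TemporaryDirectory() as tmp_dir:
    for class_name, n_files in [('cat', 3), ('dog', 3)]:
        class_dir = Path(tmp_dir) / class_name
        class_dir.mkdir()
        for i in range(n_files):
            (class_dir / f'{i}.png').touch()
    counts = count_per_class(tmp_dir)

# The split is balanced when every class has the same number of files
is_balanced = len(set(counts.values())) == 1
print(counts, is_balanced)
```

On the real data, `count_per_class(CIFAR10_folder_raw/'cifar10'/'train')` should reproduce the per-class counts printed above.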


This directory structure (one folder per class) is widely used in computer vision datasets, and most deep learning libraries provide utilities for working with such datasets. In PyTorch, `torchvision` provides the `ImageFolder` class, which loads the images and, with the `ToTensor` transform, returns them as tensors.

In [29]:
from torchvision.datasets import ImageFolder
from torchvision.transforms import ToTensor

dataset = ImageFolder(str(CIFAR10_folder_raw/'cifar10'/'train'), transform=ToTensor())

Check one element

In [32]:
image_0, label_0 = dataset[0]
print(image_0.shape)
print(label_0)

torch.Size([3, 32, 32])
0


Notice that the class has been label-encoded: the label is an integer index, not the folder name. To see which class each index corresponds to, use the dataset's `.classes` attribute

In [33]:
dataset.classes[0]

'airplane'
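
`ImageFolder` derives this encoding from the folder names, sorted alphabetically, and also exposes it as the `class_to_idx` attribute. The sketch below reproduces that logic with the standard library only, on a toy directory. The helper name `folder_class_to_idx` is made up here and is not part of torchvision.

```python
import tempfile
from pathlib import Path

def folder_class_to_idx(root):
    """Map each sub-folder name to an integer index, in sorted order."""
    class_names = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return {name: idx for idx, name in enumerate(class_names)}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create the folders in non-alphabetical order on purpose
    for name in ['truck', 'airplane', 'bird']:
        (Path(tmp_dir) / name).mkdir()
    mapping = folder_class_to_idx(tmp_dir)

print(mapping)
```

The creation order of the folders does not matter: `airplane` gets index 0 because it sorts first, which matches the label seen for `dataset[0]`.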

Visualize the image using matplotlib

In [34]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

matplotlib.rcParams['figure.facecolor'] = '#ffffff'

In [None]:
def show_example(img, label_id):
    print('Label: ', dataset.classes[label_id], "("+str(label_id)+")")
    # imshow expects channels last (H, W, C); the tensor is (C, H, W)
    plt.imshow(img.permute(1, 2, 0))
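
The `permute(1, 2, 0)` call is needed because the image tensors are stored channels-first, `(C, H, W)`, while `plt.imshow` expects colour images channels-last, `(H, W, C)`. A minimal sketch on a made-up 3x2x4 tensor (not a CIFAR10 image) shows the effect on the shape, and that the values are only reordered:

```python
import torch

# Toy "image": 3 channels, height 2, width 4 (values are arbitrary)
chw = torch.arange(3 * 2 * 4).reshape(3, 2, 4)

# Move the channel axis to the end: (C, H, W) -> (H, W, C)
hwc = chw.permute(1, 2, 0)

print(tuple(chw.shape))  # (3, 2, 4)
print(tuple(hwc.shape))  # (2, 4, 3)

# The pixel at row 1, column 2 keeps its three channel values
same_pixel = hwc[1, 2].tolist() == chw[:, 1, 2].tolist()
print(same_pixel)
```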