# CIFAR-10


CIFAR-10 is a widely used benchmark dataset in machine learning and computer vision. It consists of $60,000$ color images of size $32 \times 32$ pixels in 10 classes, with $6,000$ images per class. The dataset is split into $50,000$ training images and $10,000$ test images.

Download and extract the dataset.

In [None]:
!wget https://hyperion.bbirke.de/data/datasets/cifar-10-python.tar.gz
!mkdir -p datasets/cifar-10
!tar -xzf cifar-10-python.tar.gz -C datasets/cifar-10

The archive contains the files `data_batch_1`, `data_batch_2`, ..., `data_batch_5`, as well as `test_batch`. Each of these files is a Python "pickled" object produced with `cPickle`. Here is a function which will open such a file and return a dictionary:

In [None]:
import pickle

def unpickle(file):
    with open(file, 'rb') as fo:
        batch_dict = pickle.load(fo, encoding='bytes')
    return {k.decode("utf-8"): v for k, v in batch_dict.items()}

Loaded in this way, each of the batch files contains a dictionary with the following elements:

* **data** - a $10000 \times 3072$ numpy array of `uint8` values. Each row of the array stores one $32 \times 32$ color image. The first $1024$ entries contain the red channel values, the next $1024$ the green, and the final $1024$ the blue. Each channel is stored in row-major order, so the first $32$ entries of a row are the red channel values of the first row of the image.
* **labels** - a list of $10000$ numbers in the range $[0, 9]$. The number at index $i$ is the label of the $i$th image in the array `data`.
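The row layout described above can be turned back into a displayable image with a reshape and a transpose. The following is a minimal sketch; `row_to_image` is a hypothetical helper name, and the example row is synthetic rather than taken from the dataset.

```python
import numpy as np

def row_to_image(row):
    """Convert one 3072-element CIFAR-10 row into a (32, 32, 3) image."""
    row = np.asarray(row, dtype=np.uint8)
    # The row is laid out channel by channel: 1024 red, 1024 green, 1024 blue.
    # Each channel is stored in row-major order, so this reshape yields (channel, height, width).
    chw = row.reshape(3, 32, 32)
    # Move the channel axis last (height, width, channel), the layout most plotting tools expect.
    return chw.transpose(1, 2, 0)

# Synthetic row for illustration only: red channel all 255, green and blue all 0.
example_row = np.concatenate([
    np.full(1024, 255, dtype=np.uint8),
    np.zeros(2048, dtype=np.uint8),
])
example_image = row_to_image(example_row)
print(example_image.shape)  # (32, 32, 3)
```

Applied to a real batch, `row_to_image(batch["data"][i])` should give the $i$th image, which can then be passed to a plotting function such as matplotlib's `imshow`.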

The dataset contains another file, called `batches.meta`. It also contains a Python dictionary, with the following entry:

* **label_names** - a $10$-element list which gives meaningful names to the numeric labels in the labels array described above. For example, `label_names[0] == "airplane"`, `label_names[1] == "automobile"`, etc.

In [None]:
label_names = unpickle("datasets/cifar-10/cifar-10-batches-py/batches.meta")['label_names']
val2label = {i: name.decode("utf-8") for i, name in enumerate(label_names)}

for i in range(len(val2label)):
    print(f"label idx: {i} label: {val2label[i]}")
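Since the training data is spread over five batch files, it is usually convenient to combine them into a single array before training. The sketch below is one possible way to do that; `load_training_set` is a hypothetical helper name, and the path assumes the archive was extracted as shown earlier. The `unpickle` helper is repeated so that the cell runs on its own.

```python
import os
import pickle
import numpy as np

def unpickle(file):
    # Same helper as defined earlier, repeated so this cell is self-contained.
    with open(file, 'rb') as fo:
        batch_dict = pickle.load(fo, encoding='bytes')
    return {k.decode("utf-8"): v for k, v in batch_dict.items()}

def load_training_set(batch_dir, num_batches=5):
    """Stack data_batch_1 .. data_batch_N into one data array and one label array."""
    data_parts, label_parts = [], []
    for i in range(1, num_batches + 1):
        batch = unpickle(os.path.join(batch_dir, f"data_batch_{i}"))
        data_parts.append(np.asarray(batch["data"], dtype=np.uint8))
        label_parts.append(np.asarray(batch["labels"], dtype=np.int64))
    return np.concatenate(data_parts), np.concatenate(label_parts)

# Assumed extraction path; only runs if the dataset is actually present.
batch_dir = "datasets/cifar-10/cifar-10-batches-py"
if os.path.isdir(batch_dir):
    X_train, y_train = load_training_set(batch_dir)
    print(X_train.shape, y_train.shape)
```

Given the batch sizes described above, `X_train` should end up with one row per training image and `y_train` with one label per row. The test set can be read the same way with `unpickle` on `test_batch`.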

## Your Task

Your task is to train a model on CIFAR-10 and (hopefully) have it classify an image of Lilo as a cat. You can download a $32 \times 32$ pixel color image of Lilo with the following command.

In [None]:
!wget https://raw.githubusercontent.com/bbirke/ml-python/main/images/lilo_small.png -O lilo_small.png
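Before a trained model can see the downloaded image, it typically has to be brought into the same layout as the training rows. The following is a minimal sketch, assuming the model was trained on flat 3072-element rows as stored in the batch files; `image_to_row` is a hypothetical helper name, and loading the PNG assumes that Pillow is installed.

```python
import os
import numpy as np

def image_to_row(image_hwc):
    """Flatten a (32, 32, 3) RGB image into the 3072-element CIFAR-10 row layout."""
    image_hwc = np.asarray(image_hwc, dtype=np.uint8)
    if image_hwc.shape != (32, 32, 3):
        raise ValueError(f"expected shape (32, 32, 3), got {image_hwc.shape}")
    # Put channels first, then flatten in row-major order:
    # 1024 red values, then 1024 green, then 1024 blue.
    return image_hwc.transpose(2, 0, 1).reshape(-1)

# Only runs if the image has been downloaded into the working directory.
if os.path.exists("lilo_small.png"):
    from PIL import Image  # Pillow, assumed to be available
    # convert("RGB") drops an alpha channel if the PNG has one.
    lilo = np.asarray(Image.open("lilo_small.png").convert("RGB"))
    lilo_row = image_to_row(lilo)
    print(lilo_row.shape)
```

Any preprocessing applied to the training data, such as scaling pixel values, would need to be applied to `lilo_row` as well before prediction. The predicted class index can then be mapped to a name with `val2label`.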