# MultiImagePlotter Demo with CIFAR10
This demo needs the CIFAR10 data organized in subfolders, one per class. With that layout we can use the standard Aggregator, which returns each image together with its filename.

## Prerequisites

In [1]:
import numpy as np
from pathlib import Path
from collections import Counter
import random

import torch
from torchvision.datasets import CIFAR10, SVHN, MNIST, EMNIST
from torchvision.transforms import Compose, ToTensor

from hyperpyper.utils import DataSetDumper, VisionDatasetDumper
from hyperpyper.utils import FolderScanner as fs
from hyperpyper.transforms import PILTranspose


In [2]:
ROOT_PATH = Path.home() / "Downloads" / "data"

DATA_PATH = ROOT_PATH / "CIFAR10"

DATA_PATH_TEST = Path(DATA_PATH, "test")
DATA_PATH_TRAIN = Path(DATA_PATH, "train")

## Create a CIFAR10 dataset organized in per-class subfolders
The VisionDatasetDumper downloads the dataset and writes its images into a folder structure with one subfolder per class. These files can then serve as the starting point for experiments. The dataset object it returns is only needed to look up the class labels, so that they can be matched with the class indices.
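
To make the resulting folder layout concrete, here is a minimal sketch of the dumping idea. It is not hyperpyper's implementation: the function name `dump_to_class_folders` is hypothetical, and it assumes each sample is an `(image, label)` pair whose image offers a PIL-style `save(path)` method. The `<label>/<index>.png` naming mirrors the file paths shown later in this notebook.

```python
from pathlib import Path


def dump_to_class_folders(samples, dst):
    """Save every (image, label) pair as dst/<label>/<index>.png.

    Hypothetical helper for illustration only. `samples` is any iterable
    of (image, label) pairs; each image must provide `save(path)`.
    """
    dst = Path(dst)
    written = []
    for index, (image, label) in enumerate(samples):
        # One subfolder per class index, created on first use.
        class_dir = dst / str(int(label))
        class_dir.mkdir(parents=True, exist_ok=True)
        target = class_dir / f"{index}.png"
        image.save(target)
        written.append(target)
    return written
```

A torchvision dataset created without a transform yields PIL images, so under that assumption it could be passed in as `samples` directly.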

In [3]:
train_dataset = VisionDatasetDumper(CIFAR10, root=DATA_PATH, dst=DATA_PATH_TRAIN, train=True).dump()

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to C:\Users\bernh\Downloads\data\CIFAR10\cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:14<00:00, 12153518.62it/s]


Extracting C:\Users\bernh\Downloads\data\CIFAR10\cifar-10-python.tar.gz to C:\Users\bernh\Downloads\data\CIFAR10
ToTensor()
dict_keys([6, 9, 4, 1, 2, 7, 8, 3, 5, 0])
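
The keys printed above are class indices, and the subfolders are named after them. As a sketch of how indices can be matched with class names, the snippet below inverts a name-to-index mapping. It assumes the returned `train_dataset` is a torchvision `CIFAR10` instance, which exposes such a mapping as `class_to_idx`. The dictionary here is a hard-coded stand-in so the example runs without downloading anything, and `index_to_class` is a hypothetical helper name.

```python
def index_to_class(class_to_idx):
    """Invert a name -> index mapping into index -> name."""
    return {index: name for name, index in class_to_idx.items()}


# Stand-in for `train_dataset.class_to_idx` of a torchvision CIFAR10 dataset.
cifar10_class_to_idx = {
    "airplane": 0, "automobile": 1, "bird": 2, "cat": 3, "deer": 4,
    "dog": 5, "frog": 6, "horse": 7, "ship": 8, "truck": 9,
}

idx_to_class = index_to_class(cifar10_class_to_idx)
print(idx_to_class[9])  # truck
```

Not every torchvision dataset provides `class_to_idx`, so it is worth checking for the attribute before relying on it.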


### Retrieve a list of files

In [4]:
train_files = fs.get_files(DATA_PATH_TRAIN, extensions='.png', recursive=True)

# Select some random items
selected_files = random.sample(train_files, 5)
selected_files

[WindowsPath('C:/Users/bernh/Downloads/data/CIFAR10/train/9/24693.png'),
 WindowsPath('C:/Users/bernh/Downloads/data/CIFAR10/train/7/35924.png'),
 WindowsPath('C:/Users/bernh/Downloads/data/CIFAR10/train/5/28336.png'),
 WindowsPath('C:/Users/bernh/Downloads/data/CIFAR10/train/3/14018.png'),
 WindowsPath('C:/Users/bernh/Downloads/data/CIFAR10/train/0/49615.png')]
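
Because each file sits in a folder named after its class index, the label can be recovered from the path alone. The following sketch shows this with the standard library only. The helper names `label_from_path` and `class_counts` are hypothetical, and the example paths are shortened versions of the ones listed above.

```python
from collections import Counter
from pathlib import Path


def label_from_path(path):
    """Return the class index encoded in the parent folder name."""
    return int(Path(path).parent.name)


def class_counts(files):
    """Count how many files fall into each class folder."""
    return Counter(label_from_path(f) for f in files)


example_files = [
    Path("train/9/24693.png"),
    Path("train/7/35924.png"),
    Path("train/5/28336.png"),
    Path("train/3/14018.png"),
    Path("train/0/49615.png"),
]

for label, count in sorted(class_counts(example_files).items()):
    print(label, count)
```

Applied to the full `train_files` list, `class_counts` gives a quick check that every class folder was filled as expected.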

In [5]:
DATA_PATH = ROOT_PATH / "SVHN"

DATA_PATH_TEST = Path(DATA_PATH, "test")
DATA_PATH_TRAIN = Path(DATA_PATH, "train")

train_dataset = VisionDatasetDumper(SVHN, root=DATA_PATH, dst=DATA_PATH_TRAIN, train=True).dump()

Downloading http://ufldl.stanford.edu/housenumbers/train_32x32.mat to C:\Users\bernh\Downloads\data\SVHN\train_32x32.mat


100%|██████████| 182040794/182040794 [00:39<00:00, 4595586.30it/s]


ToTensor()
dict_keys([1, 9, 2, 3, 5, 8, 7, 4, 6, 0])


In [6]:
DATA_PATH = ROOT_PATH / "MNIST"

DATA_PATH_TEST = Path(DATA_PATH, "test")
DATA_PATH_TRAIN = Path(DATA_PATH, "train")

train_dataset = VisionDatasetDumper(MNIST, root=DATA_PATH, dst=DATA_PATH_TRAIN, train=True).dump()

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to C:\Users\bernh\Downloads\data\MNIST\MNIST\raw\train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 11848614.84it/s]


Extracting C:\Users\bernh\Downloads\data\MNIST\MNIST\raw\train-images-idx3-ubyte.gz to C:\Users\bernh\Downloads\data\MNIST\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to C:\Users\bernh\Downloads\data\MNIST\MNIST\raw\train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 1930449.30it/s]


Extracting C:\Users\bernh\Downloads\data\MNIST\MNIST\raw\train-labels-idx1-ubyte.gz to C:\Users\bernh\Downloads\data\MNIST\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to C:\Users\bernh\Downloads\data\MNIST\MNIST\raw\t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 9184170.20it/s]


Extracting C:\Users\bernh\Downloads\data\MNIST\MNIST\raw\t10k-images-idx3-ubyte.gz to C:\Users\bernh\Downloads\data\MNIST\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to C:\Users\bernh\Downloads\data\MNIST\MNIST\raw\t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 4552097.67it/s]


Extracting C:\Users\bernh\Downloads\data\MNIST\MNIST\raw\t10k-labels-idx1-ubyte.gz to C:\Users\bernh\Downloads\data\MNIST\MNIST\raw

ToTensor()
dict_keys([5, 0, 4, 1, 9, 2, 3, 6, 7, 8])


In [7]:
DATA_PATH = ROOT_PATH / "EMNIST"

DATA_PATH_TEST = Path(DATA_PATH, "test")
DATA_PATH_TRAIN = Path(DATA_PATH, "train")

# Raw EMNIST images are stored transposed; PILTranspose is applied first to undo that.
transform = Compose([
    PILTranspose(),
    ToTensor(),
])

train_dataset = VisionDatasetDumper(
    EMNIST, root=DATA_PATH, dst=DATA_PATH_TRAIN,
    split='letters', train=True, transform=transform,
).dump()

Downloading https://www.itl.nist.gov/iaui/vip/cs_links/EMNIST/gzip.zip to C:\Users\bernh\Downloads\data\EMNIST\EMNIST\raw\gzip.zip


100%|██████████| 561753746/561753746 [00:45<00:00, 12408230.77it/s]


Extracting C:\Users\bernh\Downloads\data\EMNIST\EMNIST\raw\gzip.zip to C:\Users\bernh\Downloads\data\EMNIST\EMNIST\raw
Compose(
    PILTranspose()
    ToTensor()
)
dict_keys([23, 7, 16, 15, 17, 13, 11, 22, 24, 10, 14, 18, 21, 26, 19, 5, 2, 25, 9, 12, 1, 8, 4, 3, 20, 6])
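
Unlike the other datasets, the keys printed for the EMNIST `letters` split run from 1 to 26 rather than starting at 0. The sketch below turns such a label into a letter. It assumes that label 1 corresponds to "a" and label 26 to "z", which matches my understanding of torchvision's class list for this split but is not confirmed by the output above. `emnist_letter` is a hypothetical helper name.

```python
import string


def emnist_letter(label):
    """Map an EMNIST 'letters' label (assumed 1..26) to a lowercase letter."""
    if not 1 <= label <= 26:
        raise ValueError(f"label must be in 1..26, got {label}")
    return string.ascii_lowercase[label - 1]


print(emnist_letter(1), emnist_letter(26))
```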


In [8]:
train_dataset

Dataset EMNIST
    Number of datapoints: 124800
    Root location: C:\Users\bernh\Downloads\data\EMNIST
    Split: Train
    StandardTransform
Transform: Compose(
               PILTranspose()
               ToTensor()
           )