# Balancing a dataset

This notebook shows a simple approach to balancing a dataset while also applying static (offline) data augmentation.

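The dataset is expanded so that every class has a similar number of images: each class with fewer than `IMAGES_PER_CLASS` images is topped up to that number. New images are created by applying geometric operations (rotation, translation) and enhancement operations (sharpness, contrast) to images from the original training set.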
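As a rough sketch of the balancing arithmetic, the number of images to generate for a class is the gap between the target and the class's current size, or zero when the class is already large enough. The function name `images_to_generate` and the class sizes below are hypothetical, chosen only for illustration.

```python
def images_to_generate(class_counts, target):
    """Return how many new images each class needs to reach `target`."""
    return {cla: max(0, target - n) for cla, n in class_counts.items()}

# Made-up class sizes, for illustration only
example_counts = {'00000': 210, '00001': 2220, '00002': 12000}
needed = images_to_generate(example_counts, 10000)
print(needed)
```

Classes that already exceed the target are left as they are, so the result is "at least the target" rather than "exactly the target" for every class.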

In [1]:
from PIL import Image, ImageEnhance
import random
import os
import shutil

## Settings

In [2]:
IMAGES_PER_CLASS = 10000
data_path = 'd:/vcpi/gtsrb'

First, copy the original training set to a new folder.

In [3]:
shutil.copytree(f'{data_path}/train', f'{data_path}/train_balanced_{IMAGES_PER_CLASS}')

'd:/vcpi/gtsrb/train_balanced_10000'
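Note that `shutil.copytree` raises `FileExistsError` by default when the destination already exists, so running the copy cell a second time fails. A minimal sketch of one way to make the step re-runnable is shown below; the helper name `copy_once` is hypothetical, and the demonstration uses a temporary directory rather than the real dataset.

```python
import os
import shutil
import tempfile

def copy_once(src, dst):
    """Copy the tree at `src` to `dst` unless `dst` already exists."""
    if not os.path.exists(dst):
        shutil.copytree(src, dst)
    return dst

# Demonstration on a throwaway directory with a single, empty class folder
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'train')
    os.makedirs(os.path.join(src, '00000'))
    dst = os.path.join(tmp, 'train_balanced')
    copy_once(src, dst)
    copy_once(src, dst)  # the second call does nothing instead of failing
    copied_classes = os.listdir(dst)

print(copied_classes)
```

Passing `dirs_exist_ok=True` to `shutil.copytree` (Python 3.8+) is an alternative when copying over an existing folder is acceptable.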

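Balance and augment the dataset: for every class with fewer than `IMAGES_PER_CLASS` images, generate new images from randomly transformed originals until the target is reached.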

In [4]:
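root = f'{data_path}/train_balanced_{IMAGES_PER_CLASS}'
classes = os.listdir(root)

for cla in classes:
    # Original images of this class, in random order
    list_img = os.listdir(f'{root}/{cla}')
    random.shuffle(list_img)
    n_orig = len(list_img)

    # Skip empty class folders: there is nothing to augment from
    if n_orig == 0:
        continue

    # Generate images until the class holds IMAGES_PER_CLASS files;
    # classes that already have enough images are left untouched
    for k in range(n_orig, IMAGES_PER_CLASS):

        # Cycle through the shuffled originals so each is reused about equally often
        filename = f'{root}/{cla}/{list_img[(k - n_orig) % n_orig]}'
        im = Image.open(filename)

        # Random rotation of up to 10 degrees in either direction
        r = random.uniform(-10.0, 10.0)
        im = im.rotate(r)

        # Random translation of up to 3 pixels along each axis
        r1 = random.uniform(-3.0, 3.0)
        r2 = random.uniform(-3.0, 3.0)
        im = im.transform(im.size, Image.Transform.AFFINE, (1, 0, r1, 0, 1, r2))

        # Random sharpness increase (a factor of 1.0 leaves the image unchanged)
        r = random.uniform(1.0, 1.3)
        im = ImageEnhance.Sharpness(im).enhance(r)

        # Random contrast increase
        r = random.uniform(1.0, 1.3)
        im = ImageEnhance.Contrast(im).enhance(r)

        im = im.resize((32, 32))

        # The leading underscore marks generated images
        im.save(f'{root}/{cla}/_{k}.png')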
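After the cell above has run, it can be worth checking the resulting class sizes. The sketch below is one possible check; the helper name `count_per_class` is hypothetical, and the demonstration builds a small made-up folder tree in a temporary directory instead of reading the real dataset.

```python
import os
import tempfile

def count_per_class(root):
    """Map each class folder under `root` to the number of files it contains."""
    return {
        cla: len(os.listdir(os.path.join(root, cla)))
        for cla in sorted(os.listdir(root))
    }

# Demonstration on a throwaway directory with two small, made-up classes
with tempfile.TemporaryDirectory() as tmp:
    for cla, n in [('00000', 3), ('00001', 5)]:
        os.makedirs(os.path.join(tmp, cla))
        for i in range(n):
            # Empty placeholder files stand in for images
            open(os.path.join(tmp, cla, f'_{i}.png'), 'w').close()
    counts = count_per_class(tmp)

print(counts)
```

On the real data, `count_per_class(f'{data_path}/train_balanced_{IMAGES_PER_CLASS}')` would be expected to report at least `IMAGES_PER_CLASS` files for every non-empty class folder.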
