## Data completion | Augmentation
To train a network more effectively on a small dataset, you can use data augmentation: a technique that creates modified copies of the training images and adds them to the training set. The network then sees more varied examples during training and typically produces better results.


The algorithm applies the same random affine transformations to the i-th image of every subfolder in the dataset folder and saves the new images as PNG files in the augmented dataset folder.
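
The idea of "one random draw per image index, shared by all subfolders" can be sketched as follows. This is a minimal illustration, not the notebook's code: `draw_params` and `plan_augmentation` are hypothetical helpers, and the default ranges are only example values.

```python
import random


def draw_params(rng, zoom_range=0.1, shift_range=0.2, size=1024):
    """Draw one set of random transform parameters (hypothetical helper)."""
    return {
        "zoom": 1 + rng.uniform(-zoom_range, zoom_range),
        "dx": rng.uniform(-shift_range, shift_range) * size,
        "dy": rng.uniform(-shift_range, shift_range) * size,
    }


def plan_augmentation(folders, files_per_folder, seed=0):
    """Map (folder, index) to parameters: one draw per index, shared by all folders."""
    rng = random.Random(seed)
    plan = {}
    for i in range(files_per_folder):
        params = draw_params(rng)        # drawn once for index i
        for folder in folders:
            plan[(folder, i)] = params   # reused for the i-th image of every folder
    return plan


plan = plan_augmentation(["Expert", "sample_1"], files_per_folder=3)
print(plan[("Expert", 0)] == plan[("sample_1", 0)])  # True: same index, same transform
```

Because the parameters depend only on the index, images that occupy the same position in different subfolders receive identical changes.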


### Run:
0. Install requirements.
1. Put the folders with the images that you want to augment into the dataset folder (by default: "Dataset"). Each folder must contain the same number of images.
2. Open Data_Augmentation.ipynb in a program that can execute .ipynb files, for example Jupyter Notebook.
3. Optionally, tune the image transformation settings in the "Settings" section.
4. Run all cells in the notebook.
5. The augmented dataset is written to a new folder next to the notebook (by default: "AugmentedDataset"). Image names have the form "originalName_numberInAlphabeticalOrder_iterationNumber".
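
The naming pattern from step 5 can be illustrated with a small helper. `augmented_name` is a hypothetical function written for this example; it assumes zero-based counters and keeps everything before the last dot of the original file name.

```python
def augmented_name(original_filename, index, iteration):
    """Build an output name of the form originalName_index_iteration.png."""
    stem = original_filename.rsplit(".", 1)[0]  # drop only the last extension
    return f"{stem}_{index}_{iteration}.png"


print(augmented_name("cell_007.jpg", 6, 0))  # cell_007_6_0.png
```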


### Requirements:
- Python 3.6+ (the notebook uses f-strings)
- PIL library (provided by the Pillow package)


### Settings:
- Dataset folder: "Dataset" (can be changed in train_dir)
- Output folder: "AugmentedDataset" (can be changed in result_path)
- Dataset increase multiplier: 2 (can be changed in multiple_output_images)
- Output images size: 1024x1024 (can be changed in img_width, img_height)


- The transformation parameters (rotation, shift, zoom, mirroring) can be edited in the "Settings" section.

## Settings
Let's specify the paths to the image folders, the desired output size, and the transformation settings.

In [14]:
train_dir = 'Dataset'              # Path from the notebook to the folder that contains the image folders
result_path = 'AugmentedDataset'   # Output folder name
multiple_output_images = 20        # How many times to increase the dataset
img_width, img_height = 1024, 1024 # Output image size in pixels

rotation_range = 270               # Maximum image rotation in degrees
rotation_range_multiple = 90       # Rotation step: angles are multiples of this value, in degrees
width_shift_range = 0.2            # Maximum horizontal shift, as a fraction of img_width
height_shift_range = 0.2           # Maximum vertical shift, as a fraction of img_height
zoom_range = 0.1                   # Maximum zoom in / out, relative to a zoom factor of 1
horizontal_mirror = True           # Horizontal mirror
vertical_mirror = True             # Vertical mirror
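
To see what these values allow, the short sketch below works out the resulting ranges. It assumes that rotation angles are whole multiples of `rotation_range_multiple` up to `rotation_range`, and that shifts are fractions of the output size; the names `angles`, `zoom_bounds` and `max_shift_px` are introduced only for this illustration.

```python
rotation_range = 270
rotation_range_multiple = 90
width_shift_range = 0.2
zoom_range = 0.1
img_width = 1024

# Rotation angles that can be drawn, assuming whole multiples of the step
steps = rotation_range // rotation_range_multiple
angles = [k * rotation_range_multiple for k in range(steps + 1)]

# Zoom factors lie between 1 - zoom_range and 1 + zoom_range
zoom_bounds = (1 - zoom_range, 1 + zoom_range)

# Largest horizontal shift in pixels
max_shift_px = width_shift_range * img_width

print(angles)  # [0, 90, 180, 270]
print(zoom_bounds)
print(max_shift_px)
```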

## Importing libraries

In [15]:
import random
from os import listdir, makedirs
from shutil import rmtree
from PIL import Image # import the Python Image processing Library

## Making augmented dataset
Create a folder for the augmented dataset and fill it with images that are transformed identically across the dataset folders.

In [16]:
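# Zoom function: crop a window centered on (x, y), then resize it back to the original size
def zoom_at(img, x, y, zoom):
    w, h = img.size
    zoom2 = zoom * 2                                  # the crop window is w / zoom wide and h / zoom high
    img = img.crop((x - w / zoom2, y - h / zoom2,
                    x + w / zoom2, y + h / zoom2))    # for zoom < 1 the window extends past the image borders
    return img.resize((w, h), Image.LANCZOS)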
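
A quick way to check how `zoom_at` behaves is to run it on a small in-memory image. The function is repeated here so the snippet runs on its own; the 100x100 white test image and the zoom factors 2.0 and 0.5 are arbitrary choices for this sketch.

```python
from PIL import Image


def zoom_at(img, x, y, zoom):
    """Crop a window centered on (x, y) and resize it back to the original size."""
    w, h = img.size
    half_w, half_h = w / (2 * zoom), h / (2 * zoom)
    cropped = img.crop((x - half_w, y - half_h, x + half_w, y + half_h))
    return cropped.resize((w, h), Image.LANCZOS)


img = Image.new("RGB", (100, 100), "white")

zoomed_in = zoom_at(img, 50, 50, 2.0)    # keeps the central 50x50 region
zoomed_out = zoom_at(img, 50, 50, 0.5)   # window is larger than the image

print(zoomed_in.size, zoomed_out.size)   # both stay 100x100
print(zoomed_out.getpixel((0, 0)))       # corner lies outside the original image
```

With a zoom factor below 1, the area outside the original image is filled with black, so zooming out adds a dark border around the picture.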

In [17]:
dirs = sorted(listdir(path = train_dir))                  # list of dirs, sorted so the order is reproducible
print(f'Folders in dataset dir: {dirs}')

dirPath = f'{train_dir}/{dirs[0]}'                        # check correctness of data: number of files in dirs must be equal
numberFiles = len(listdir(path = dirPath))
print(f'Number of files in {dirPath}: {numberFiles}')
for dir_ in dirs:
    dirPath = f'{train_dir}/{dir_}'
    if len(listdir(path = dirPath)) != numberFiles:
        print(f'ERROR: number of files in {dirPath} differs from {numberFiles}')

# Create output dir (an existing one is removed first)
try:
    rmtree(result_path)
except OSError:
    print("Warning: Can't remove result dir")
try:
    makedirs(result_path)
except OSError:
    print("Warning: Can't create result dir")
    
for multiple in range(multiple_output_images):
    print(f'Progress: {multiple}/{multiple_output_images}')
    for i in range(numberFiles):                                                       # draw one set of random parameters per image index
        randRotate = random.randint(0, rotation_range // rotation_range_multiple) * rotation_range_multiple    # rotation: integer bounds, multiple of the step
        randZoom = 1 + (-1)**random.randint(1, 2) * random.random() * zoom_range                               # zoom
        randHorizontalShift = (-1)**random.randint(1, 2) * random.random() * width_shift_range * img_width     # horizontal shift
        randVerticalShift = (-1)**random.randint(1, 2) * random.random() * height_shift_range * img_height     # vertical shift
        
        for dir_ in dirs:                                                              # do same changes on photo[i] for all dirs
            dirPath = f'{train_dir}/{dir_}'
            fileName = sorted(listdir(path = dirPath))[i]                                                     # i-th file in alphabetical order
            img = Image.open(f'{dirPath}/{fileName}')                                                         # open image
            img = img.resize((img_width, img_height))                                                         # resize image
            if horizontal_mirror:
                img = img.transform(img.size, Image.AFFINE, (-1, 0, img_width, 0, 1, 0))                      # horizontal mirror image
            if vertical_mirror:
                img = img.transform(img.size, Image.AFFINE, (1, 0, 0, 0, -1, img_height))                     # vertical mirror image
            img = zoom_at(img, img_width / 2, img_height / 2, randZoom)                                       # zoom image
            img = img.transform(img.size, Image.AFFINE, (1, 0, randHorizontalShift, 0, 1, randVerticalShift)) # shift image
            img = img.rotate(randRotate)                                                                      # rotate image
            img.save(f'{result_path}/{fileName.rsplit(".", 1)[0]}_{i}_{multiple}.png', "PNG")                 # save image

print(f'\nDone. Augmented data in {result_path}')

Folders in dataset dir: ['Expert', 'sample_1', 'sample_2', 'sample_3']
Number of files in Dataset/Expert: 100
Progress: 0/20
Progress: 1/20
Progress: 2/20
Progress: 3/20
Progress: 4/20
Progress: 5/20
Progress: 6/20
Progress: 7/20
Progress: 8/20
Progress: 9/20
Progress: 10/20
Progress: 11/20
Progress: 12/20
Progress: 13/20
Progress: 14/20
Progress: 15/20
Progress: 16/20
Progress: 17/20
Progress: 18/20
Progress: 19/20

Done. Augmented data in AugmentedDataset
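
The equal-count check in the notebook prints a message and then continues. A stricter variant could stop before any images are written. The sketch below is one possible way to do that; `check_equal_counts` is a hypothetical helper, not part of the notebook.

```python
from os import listdir
from os.path import isdir, join


def check_equal_counts(dataset_dir):
    """Return the common file count of all subfolders, or raise ValueError."""
    folders = sorted(d for d in listdir(dataset_dir) if isdir(join(dataset_dir, d)))
    if not folders:
        raise ValueError(f"No subfolders found in {dataset_dir!r}")
    # Number of entries in each subfolder
    counts = {d: len(listdir(join(dataset_dir, d))) for d in folders}
    if len(set(counts.values())) != 1:
        raise ValueError(f"Unequal number of files per folder: {counts}")
    return counts[folders[0]]
```

Calling `check_equal_counts(train_dir)` before the augmentation loop would then abort with a message that lists the file count of each folder.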
