In [9]:
%load_ext autoreload
%autoreload 2

# Synthetic data generation

In this notebook, we call functions from *src/data/synthetic_data_generation.py* to create new images for our model to train on.

We first define the constants for our data directories.

In [None]:
from src.data.synthetic_data_generation import *
from pathlib import Path

# Images directories
SYNTH_DIR = Path('data/synthetic_train')
REFERENCE_DIR = Path('data/references')
DEFAULT_TRAIN_DIR = Path('data/train')
BACKGROUND_DIR = Path('data/backgrounds')
TRANSPARENCY_DIR = Path('data/alpha_references')

# CSV files
ORIGINAL_CSV = Path('data/train.csv')
SYNTH_CSV = Path('data/synthetic_train.csv')

# Image constants
IMG_SIZE = (300, 200)
NUM_SYNTH_IMAGES = 220 # this will be 4x due to the horizontal and vertical flips

# Check that all the directories exist (including the transparent references used later)
for directory in [SYNTH_DIR, REFERENCE_DIR, DEFAULT_TRAIN_DIR, BACKGROUND_DIR, TRANSPARENCY_DIR]:
    if not directory.exists():
        raise FileNotFoundError(f"Directory {directory} does not exist.")
# Check that the original CSV file exists
if not ORIGINAL_CSV.exists():
    raise FileNotFoundError(f"CSV file {ORIGINAL_CSV} does not exist. It is needed to generate the synthetic data.")
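
The checks above stop at the first missing path. As a minimal sketch of an alternative, the helpers below collect every missing path and report them in a single error. The names `find_missing` and `require_paths` are hypothetical and are not part of *src/data/synthetic_data_generation.py*.

```python
from pathlib import Path


def find_missing(paths):
    """Return the paths that do not exist, preserving input order."""
    return [Path(p) for p in paths if not Path(p).exists()]


def require_paths(paths):
    """Raise one FileNotFoundError that names every missing path."""
    missing = find_missing(paths)
    if missing:
        listing = ", ".join(str(p) for p in missing)
        raise FileNotFoundError(f"Missing required paths: {listing}")
```

With these helpers, a single call such as `require_paths([SYNTH_DIR, REFERENCE_DIR, ORIGINAL_CSV])` would cover both directories and files, since `Path.exists()` is true for either.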

## Resize our images

We start by resizing all the reference and default training images to a smaller size, which makes later computations cheaper.

First, we remove everything that was in the synthetic folder.

In [11]:
clear_output_dir(SYNTH_DIR)

All images in data\synthetic_train have been deleted.


In [12]:
to_be_resized = [REFERENCE_DIR, DEFAULT_TRAIN_DIR]
resize_training_images(IMG_SIZE, to_be_resized, SYNTH_DIR)

Resizing images in data\references...


100%|██████████| 13/13 [00:03<00:00,  3.88it/s]


Resizing images in data\train...


100%|██████████| 90/90 [00:24<00:00,  3.63it/s]

DONE! Resized images saved in data\synthetic_train.
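
For readers without access to the source module, a resizing step of this kind could look roughly like the sketch below. It is an illustration written with Pillow, not the actual body of `resize_training_images`. The function name `resize_images`, the `pattern` parameter, and the reading of `IMG_SIZE` as `(width, height)` are all assumptions.

```python
from pathlib import Path

from PIL import Image


def resize_images(size, input_dirs, output_dir, pattern="*.JPG"):
    """Resize every image matching `pattern` and save it under `output_dir`.

    `size` is (width, height), following Pillow's convention.
    Returns the list of files written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for input_dir in input_dirs:
        for src in sorted(Path(input_dir).glob(pattern)):
            with Image.open(src) as img:
                # Convert first so that every output file has the same mode
                resized = img.convert("RGB").resize(size)
            dst = output_dir / src.name
            resized.save(dst)
            written.append(dst)
    return written
```

Because the output keeps the source file name, two input directories that contain files with the same name would overwrite each other in this sketch.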





### Creating synthetic images
Now, we add hundreds of new images to the same folder. They are generated from the reference images with a transparent background. This step also creates our new CSV, which contains the additional annotations. Noise and rotations are applied randomly to individual chocolates, and the backgrounds, fetched from the 'data/backgrounds' directory, are distributed evenly across the generated images.
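
To make the idea concrete, here is a minimal Pillow sketch of pasting one transparent cutout onto a background at a random rotation and position. It is not the code used by `generate_synthetic_dataset`. The function name `paste_random` is hypothetical, and the noise step is left out for brevity.

```python
import random

from PIL import Image


def paste_random(background, cutout, rng):
    """Paste an RGBA cutout onto a copy of `background` at a random rotation and position."""
    canvas = background.convert("RGBA")  # convert() returns a new image
    # expand=True grows the canvas of the cutout so that no corner is cropped;
    # the newly exposed corners are fully transparent
    rotated = cutout.convert("RGBA").rotate(rng.uniform(0, 360), expand=True)
    # Keep the cutout inside the background whenever it fits
    max_x = max(canvas.width - rotated.width, 0)
    max_y = max(canvas.height - rotated.height, 0)
    position = (rng.randint(0, max_x), rng.randint(0, max_y))
    # Passing the cutout as the third argument uses its alpha channel as the mask
    canvas.paste(rotated, position, rotated)
    return canvas.convert("RGB")
```

Passing a seeded `random.Random` instance as `rng` makes the placement reproducible, which helps when comparing runs.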

In [13]:
generate_synthetic_dataset(NUM_SYNTH_IMAGES, SYNTH_DIR, TRANSPARENCY_DIR, SYNTH_DIR, ORIGINAL_CSV, SYNTH_CSV, IMG_SIZE, augment=True)

Generating images for background  1  of  103


100%|██████████| 1/1 [00:00<00:00,  4.00it/s]


...


Generating images for background  103  of  103


100%|██████████| 8/8 [00:02<00:00,  3.50it/s]
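
The progress output reports one image for each of the first 102 backgrounds and eight for the last, 110 in total. One allocation rule that reproduces exactly these numbers gives every background the integer share and hands the remainder to the last one. The sketch below illustrates that rule. It is inferred from the log only, and the function name `images_per_background` is hypothetical.

```python
def images_per_background(total_images, num_backgrounds):
    """Give each background an equal share; the last one also takes the remainder."""
    if num_backgrounds <= 0:
        raise ValueError("num_backgrounds must be positive")
    share, remainder = divmod(total_images, num_backgrounds)
    counts = [share] * num_backgrounds
    counts[-1] += remainder
    return counts


counts = images_per_background(110, 103)
print(counts[0], counts[-1], sum(counts))  # 1 8 110
```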


### Data flipping
We can now take all our synthetic and default images and apply three kinds of flips: vertical, horizontal, and both combined.
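
The three flips can be illustrated on a tiny grid of pixel values using plain Python lists. This is only a sketch of the geometry, not the implementation of `flip_images`, and the function names are hypothetical. With Pillow images, `ImageOps.mirror` and `ImageOps.flip` perform the horizontal and vertical flips respectively.

```python
def flip_horizontal(grid):
    """Mirror left-right: reverse the pixels inside each row."""
    return [row[::-1] for row in grid]


def flip_vertical(grid):
    """Mirror top-bottom: reverse the order of the rows."""
    return [row[:] for row in grid[::-1]]


def all_flips(grid):
    """Return the three flipped variants of `grid`, keyed by flip name."""
    return {
        "horizontal": flip_horizontal(grid),
        "vertical": flip_vertical(grid),
        "both": flip_horizontal(flip_vertical(grid)),
    }


grid = [[1, 2, 3],
        [4, 5, 6]]
variants = all_flips(grid)
print(variants["horizontal"])  # [[3, 2, 1], [6, 5, 4]]
print(variants["vertical"])    # [[4, 5, 6], [1, 2, 3]]
print(variants["both"])        # [[6, 5, 4], [3, 2, 1]]
```

Each image therefore yields four versions (the original plus three flips), which matches the 4x factor noted next to `NUM_SYNTH_IMAGES`.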

In [14]:
# First, merge the original CSV with the synthetic CSV; the result overwrites the synthetic CSV.
# Note: rerunning this cell merges the original rows into the synthetic CSV again, duplicating them.
# To start over, rerun the generation step above so that the synthetic CSV is recreated.
merge_csv_files(SYNTH_CSV, ORIGINAL_CSV, SYNTH_CSV)

flip_images(SYNTH_CSV, SYNTH_DIR, SYNTH_CSV)

Merged CSV files saved to data\synthetic_train.csv.
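
Since the merge writes back to one of its inputs, running it twice duplicates rows. A merge that keeps one row per key avoids this. The sketch below uses only the standard `csv` module. The function name `merge_csv_once` and the key column `id` are assumptions, because the real column names of the CSV files are not shown in this notebook.

```python
import csv


def merge_csv_once(first_csv, second_csv, output_csv, key="id"):
    """Concatenate two CSV files with identical headers, keeping one row per key.

    All rows are read before the output is opened, so `output_csv` may be
    the same file as one of the inputs. Returns the number of rows written.
    """
    rows, seen, fieldnames = [], set(), None
    for path in (first_csv, second_csv):
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            if fieldnames is None:
                fieldnames = reader.fieldnames
            elif reader.fieldnames != fieldnames:
                raise ValueError(f"Header mismatch in {path}")
            for row in reader:
                # Skip rows whose key was already seen (first occurrence wins)
                if row[key] not in seen:
                    seen.add(row[key])
                    rows.append(row)
    with open(output_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling it repeatedly with the same arguments leaves the output unchanged after the first run, which makes the cell safe to rerun.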


In [18]:
# Testing ColorJitter
from PIL import Image
from torchvision import transforms

img = Image.open(SYNTH_DIR / 'L1000900.JPG')

# Randomly perturb brightness, contrast, saturation and hue
color_jitter = transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2)
img_transformed = color_jitter(img)
img_transformed.show()