# A Sample of Data Augmentation
---
Zhiang Chen, Nov 2018 @ DREAMS

### 1. Generate Masks from XML
- Download the XML annotations and original images from LabelMe
- Create the dataset folders under the `datasets` directory,  
e.g., for a tornado dataset:
```
--data_augmentator
|--datasets
  |--tornado
    |--annotation: tornado1.xml, tornado2.xml, ...
    |--image: tornado1.jpg, tornado2.jpg, ...
```
- Generate masks from the XML files

```python
from xml2mask import polygonReader

objects = ['ndr', 'dr']  # list of the annotated object names
pr = polygonReader("tornado", objects)

# Save compact masks as .jpg files
# pr.saveMask(dim=(4000, 4000))

# Generate multi-channel images and save them as .npy files
pr.generateMask2(dim=(4000, 4000), resize=(4000, 4000), saveOnline=True)
```

260it [18:15,  3.92s/it]
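
For readers curious about what turning polygon annotations into masks involves, here is a minimal sketch. It is not the `xml2mask` implementation, which is not shown here. It assumes the LabelMe-style layout `<object><name>…</name><polygon><pt><x/><y/></pt>…</polygon></object>`, and it assumes one mask channel per entry in `objects`; the helper names `polygon_to_mask` and `xml_to_masks` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

import numpy as np

# Hypothetical annotation: one square labelled 'ndr', one triangle labelled 'dr'.
SAMPLE_XML = """<annotation>
  <object><name>ndr</name><polygon>
    <pt><x>1</x><y>1</y></pt><pt><x>5</x><y>1</y></pt>
    <pt><x>5</x><y>5</y></pt><pt><x>1</x><y>5</y></pt>
  </polygon></object>
  <object><name>dr</name><polygon>
    <pt><x>6</x><y>6</y></pt><pt><x>9</x><y>6</y></pt><pt><x>9</x><y>9</y></pt>
  </polygon></object>
</annotation>"""


def polygon_to_mask(points, height, width):
    """Rasterize one polygon with the even-odd rule, testing pixel centers."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5
    inside = np.zeros((height, width), dtype=bool)
    n = len(points)
    for k in range(n):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % n]
        if y0 == y1:
            continue  # a horizontal edge never crosses a horizontal ray
        # Rows whose center lies between the two edge endpoints.
        crosses = (y0 <= py) != (y1 <= py)
        # x coordinate where the edge meets each row's center line.
        x_at_py = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        # Toggle pixels lying to the left of the crossing point.
        inside ^= crosses & (px < x_at_py)
    return inside


def xml_to_masks(xml_text, objects, height, width):
    """Return a (height, width, len(objects)) uint8 array, one channel per class."""
    masks = np.zeros((height, width, len(objects)), dtype=np.uint8)
    root = ET.fromstring(xml_text)
    for obj in root.iter("object"):
        name = obj.findtext("name", default="").strip()
        if name not in objects:
            continue  # skip labels that were not requested
        points = [(float(pt.findtext("x")), float(pt.findtext("y")))
                  for pt in obj.iter("pt")]
        if len(points) < 3:
            continue  # not a valid polygon
        channel = objects.index(name)
        masks[..., channel] |= polygon_to_mask(points, height, width).astype(np.uint8)
    return masks


masks = xml_to_masks(SAMPLE_XML, ["ndr", "dr"], height=10, width=10)
print(masks.shape, masks[..., 0].sum(), masks[..., 1].sum())
```

Storing one channel per class keeps overlapping objects of different classes separable, which may be why a multi-channel `.npy` file is used rather than a single compact mask.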


### 2. Augment data
Before running the script, create an `npy` folder and move the generated .npy files into it. Also create an `aug` folder to hold the augmented data.
```
--data_augmentator
|--datasets
  |--tornado
    |--annotation: tornado1.xml, tornado2.xml, ...
    |--image: tornado1.jpg, tornado2.jpg, ...
    |--npy: tornado1.npy, tornado2.npy, ...
    |--aug: augmentation data
```
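
The folder setup can also be scripted. The sketch below is one possible way to do it with the standard library; `prepare_folders` is a hypothetical helper, and the location where the .npy files were originally written (`npy_source_dir`) is an assumption you need to adjust. The demo runs in a temporary directory so it does not touch real data.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_folders(dataset_dir, npy_source_dir):
    """Create 'npy' and 'aug' under dataset_dir and move *.npy files into 'npy'."""
    dataset_dir = Path(dataset_dir)
    npy_dir = dataset_dir / "npy"
    aug_dir = dataset_dir / "aug"
    npy_dir.mkdir(parents=True, exist_ok=True)
    aug_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for src in sorted(Path(npy_source_dir).glob("*.npy")):
        dst = npy_dir / src.name
        if src.resolve() == dst.resolve():
            continue  # already in place
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved


# Demo in a throwaway directory (paths are placeholders, not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "datasets" / "tornado"
    dataset.mkdir(parents=True)
    for name in ("tornado1.npy", "tornado2.npy"):
        (dataset / name).write_bytes(b"")
    moved_names = [p.name for p in prepare_folders(dataset, dataset)]
    aug_exists = (dataset / "aug").is_dir()
    print(moved_names, aug_exists)
```
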
Resize the data before augmentation; the `resize_dim` entry in the config below sets the target size.

```python
from augmentation import augmentor
import numpy as np
from tqdm import tqdm
import cv2
import uuid
```

```python
config = dict(
    mode=2,
    resize_dim=(800, 800),
    batch_number=2,
    rotation_min=-90,
    rotation_max=90,
    fliplr=True,
    flipud=True,
    zoom_min=0.8,
    zoom_max=1.2,
)

image_path = './datasets/tornado/image/'
annotation_path = './datasets/tornado/npy/'
aug_path = './datasets/tornado/aug/'
aug = augmentor(image_path, annotation_path, **config)

# Each item is used as (image, annotation, file-name prefix).
for image, mask, name in tqdm(aug):
    unid = uuid.uuid4().hex  # unique suffix so outputs never overwrite each other
    img_name = aug_path + name + unid + '.jpg'
    ann_name = aug_path + name + unid + '.npy'
    cv2.imwrite(img_name, image)
    np.save(ann_name, mask)
```

520it [45:04,  4.03s/it]
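
The key requirement in this step is that every geometric transform is applied identically to an image and its annotation, otherwise the masks no longer line up with the pixels. The sketch below illustrates that idea only; it is not the `augmentor` implementation. `augment_pair` is a hypothetical helper, and it simplifies the config above by using quarter-turn rotations instead of arbitrary angles in [-90, 90] and by leaving out zoom, both of which would need interpolation.

```python
import numpy as np


def augment_pair(image, mask, rng, fliplr=True, flipud=True):
    """Apply the same random flips and quarter-turn rotation to image and mask."""
    if fliplr and rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    if flipud and rng.random() < 0.5:
        image, mask = np.flipud(image), np.flipud(mask)
    k = int(rng.integers(-1, 2))  # -1, 0 or 1 quarter turns
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # Flips and rotations return views; make contiguous copies for saving.
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)


rng = np.random.default_rng(0)

# Toy two-channel mask and an image whose first two channels mirror it.
mask = np.zeros((4, 4, 2), dtype=np.uint8)
mask[0, 1, 0] = 1
mask[3, 2, 1] = 1
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[..., 0] = mask[..., 0] * 255
image[..., 1] = mask[..., 1] * 255

# Two augmented samples per input, similar in spirit to batch_number=2.
pairs = [augment_pair(image, mask, rng) for _ in range(2)]

# The image and mask must still agree pixel for pixel after augmentation.
aligned = all(np.array_equal(img[..., :2] > 0, msk > 0) for img, msk in pairs)
print(aligned)
```

The progress counts above are consistent with two outputs per input: 260 inputs in step 1 and 520 augmented samples here with `batch_number=2`.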


### 3. Rename images and annotations by numerical order

It is recommended to back up the data before renaming.
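
One simple way to take such a backup is to copy each folder to a timestamped sibling directory. This is a minimal sketch using only the standard library; `backup_folder` is a hypothetical helper, and the demo uses a temporary directory in place of the real data.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_folder(folder, backup_root=None):
    """Copy folder to '<name>_backup_<timestamp>' and return the new path."""
    folder = Path(folder)
    backup_root = Path(backup_root) if backup_root else folder.parent
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{folder.name}_backup_{stamp}"
    shutil.copytree(folder, target)  # raises if the target already exists
    return target


# Demo on a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "img"
    source.mkdir()
    (source / "sample.jpg").write_bytes(b"data")
    backup = backup_folder(source)
    backup_ok = (backup / "sample.jpg").read_bytes() == b"data"
    source_kept = (source / "sample.jpg").exists()
    print(backup.name.startswith("img_backup_"), backup_ok, source_kept)
```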

```python
from rename import renamer

rn = renamer('./datasets/tornado/img/', './datasets/tornado/ann/')
rn.rename(mode=1, image2png=True, annotation_suffix='.npy', annotation2suffix='.npy')
```
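
To make the idea of renaming by numerical order concrete, here is a minimal sketch of what such a step can look like. It is not the `renamer` implementation: `rename_pairs` is a hypothetical helper, it pairs files by identical stem, and it only renames them. It does not convert images to PNG as `image2png=True` suggests the real tool does.

```python
import tempfile
from pathlib import Path


def rename_pairs(image_dir, annotation_dir, image_suffix=".jpg", annotation_suffix=".npy"):
    """Rename matching image/annotation pairs to 0, 1, 2, ... in sorted stem order."""
    image_dir, annotation_dir = Path(image_dir), Path(annotation_dir)
    images = {p.stem: p for p in image_dir.glob(f"*{image_suffix}")}
    annotations = {p.stem: p for p in annotation_dir.glob(f"*{annotation_suffix}")}
    stems = sorted(images.keys() & annotations.keys())  # unpaired files are left alone

    # Pass 1: move to temporary names so no pair overwrites another.
    staged = []
    for index, stem in enumerate(stems):
        tmp_img = images[stem].with_name(f"__tmp_{index}{image_suffix}")
        tmp_ann = annotations[stem].with_name(f"__tmp_{index}{annotation_suffix}")
        images[stem].rename(tmp_img)
        annotations[stem].rename(tmp_ann)
        staged.append((tmp_img, tmp_ann))

    # Pass 2: assign the final numeric names.
    for index, (tmp_img, tmp_ann) in enumerate(staged):
        final_img = tmp_img.with_name(f"{index}{image_suffix}")
        final_ann = tmp_ann.with_name(f"{index}{annotation_suffix}")
        if final_img.exists() or final_ann.exists():
            raise FileExistsError(f"target name {index} is already taken")
        tmp_img.rename(final_img)
        tmp_ann.rename(final_ann)
    return len(staged)


# Demo: images and annotations share one throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for stem in ("b_x", "a_y"):
        (folder / f"{stem}.jpg").write_text(stem)
        (folder / f"{stem}.npy").write_text(stem)
    renamed = rename_pairs(folder, folder)
    result_names = sorted(p.name for p in folder.iterdir())
    first_image = (folder / "0.jpg").read_text()
    first_annotation = (folder / "0.npy").read_text()
    print(renamed, result_names)
```

The two-pass rename is a design choice: it avoids clobbering a file whose current name happens to equal another file's target name.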

Alright, data augmentation done.