# A Sample of Data Augmentation
---
Zhiang Chen, Nov 2018 @ DREAMS

### 1. Generate Masks from XML
- Download the XML annotations and original images from LabelMe
- Create dataset folders under the directory "datasets",  
e.g. a rocks dataset:
```
--data_augmentator
|--datasets  
    |--rocks
        |--annotation: rocks1.xml, rocks2.xml, ...
        |--image: rocks1.jpg, rocks2.jpg, ...
```
- Generate masks from XML files

In [1]:
from xml2mask import polygonReader
objects = ['rock'] # give a list of the annotated objects
pr = polygonReader("rocks", objects)

# save compact masks as .jpg files
#pr.saveMask(dim=(4000,4000))

# generate multi-channel masks and save them as .npy files
pr.generateMask2(dim=(400, 400), resize=(400, 400), saveOnline=True)

7it [00:01,  5.53it/s]
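To illustrate the idea behind this step, here is a minimal sketch of how LabelMe polygons could be turned into a multi-channel mask. It is not the code of `xml2mask`: the function names (`read_polygons`, `fill_polygon`, `polygons_to_mask`) are hypothetical, the XML layout (`object` / `name` / `deleted` / `polygon` / `pt` / `x`, `y`) is assumed to be the usual LabelMe export, `dim` is assumed to mean (height, width), and one channel per polygon is only one possible layout.

```python
import xml.etree.ElementTree as ET

import numpy as np


def read_polygons(xml_text, wanted):
    """Return {object name: [polygon, ...]}; each polygon is a list of (x, y)."""
    root = ET.fromstring(xml_text)
    polygons = {name: [] for name in wanted}
    for obj in root.iter("object"):
        name = (obj.findtext("name") or "").strip()
        if name not in polygons:
            continue
        # Assumption: deleted annotations are flagged with <deleted>1</deleted>.
        if (obj.findtext("deleted") or "0").strip() == "1":
            continue
        pts = [(float(pt.findtext("x")), float(pt.findtext("y")))
               for pt in obj.iter("pt")]
        if len(pts) >= 3:
            polygons[name].append(pts)
    return polygons


def fill_polygon(mask, pts):
    """Set to 1 every pixel whose centre lies inside the polygon (even-odd rule)."""
    height, width = mask.shape
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5
    inside = np.zeros((height, width), dtype=bool)
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        if y1 == y2:
            continue  # a horizontal edge never crosses a horizontal ray
        crosses = (y1 > py) != (y2 > py)
        x_at_y = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_at_y)
    mask[inside] = 1


def polygons_to_mask(polygons, dim):
    """Stack one binary channel per polygon into an (H, W, N) uint8 array."""
    height, width = dim
    channels = []
    for pts in polygons:
        channel = np.zeros((height, width), dtype=np.uint8)
        fill_polygon(channel, pts)
        channels.append(channel)
    if not channels:
        return np.zeros((height, width, 0), dtype=np.uint8)
    return np.stack(channels, axis=-1)
```

A mask built this way can be written with `np.save`, which matches the `.npy` annotations used in the next step. In practice a library routine such as OpenCV's `cv2.fillPoly` would be a faster way to rasterize the polygons.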


### 2. Augment data
Before running the script, create a folder for the .npy files and move them into it. Then create a second folder for the augmented data.
```
--data_augmentator
|--datasets
    |--rocks
        |--annotation: rocks1.xml, rocks2.xml, ...
        |--image: rocks1.jpg, rocks2.jpg, ...
        |--npy: rocks1.npy, rocks2.npy, ...
        |--aug: augmentation data
```
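The folder preparation above can also be scripted. The following is a minimal sketch using only the standard library; `prepare_folders` is a hypothetical helper, and `npy_source` (the place where the mask step left its .npy files) is an assumption, since that location depends on how the masks were saved.

```python
from pathlib import Path
import shutil


def prepare_folders(dataset_dir, npy_source):
    """Create npy/ and aug/ under dataset_dir and move *.npy files into npy/."""
    dataset_dir = Path(dataset_dir)
    npy_dir = dataset_dir / "npy"
    aug_dir = dataset_dir / "aug"
    npy_dir.mkdir(parents=True, exist_ok=True)
    aug_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    # Only files directly inside npy_source are moved; subfolders are ignored.
    for src in sorted(Path(npy_source).glob("*.npy")):
        shutil.move(str(src), str(npy_dir / src.name))
        moved.append(src.name)
    return moved
```

For the layout shown above, a call could look like `prepare_folders('./datasets/rocks', './datasets/rocks/image')`, where the second argument is only an example of where the .npy files might be.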
Resize the data before augmentation.

In [2]:
from augmentation import augmentor
import numpy as np
from tqdm import tqdm
import cv2
import uuid

In [6]:
config = dict(
            mode=2, 
            resize_dim=(400,400),
            batch_number=10,
            rotation_min=-90,
            rotation_max=90,
            fliplr=True,
            flipud=True,
            zoom_min=1.0,
            zoom_max=1.1)

image_path = './datasets/rocks/image/'
annotation_path = './datasets/rocks/npy/'
aug_path = './datasets/rocks/aug/'
aug = augmentor(image_path, annotation_path, **config)

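# each item yielded by aug is unpacked as (image, mask, file-name prefix)
for image, mask, name in tqdm(aug):
    unid = uuid.uuid4().hex  # unique suffix so augmented copies do not overwrite each other
    img_name = aug_path + name + unid + '.jpg'
    ann_name = aug_path + name + unid + '.npy'
    cv2.imwrite(img_name, image)
    np.save(ann_name, mask)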

690it [01:29,  4.94it/s]
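The key point of augmenting annotated data is that every geometric transform must be applied to the image and its mask together, otherwise the annotation no longer lines up with the pixels. Below is a minimal sketch of that idea for the `fliplr` / `flipud` options only. It is not the implementation of `augmentor`: `random_flip` is a hypothetical function, the 50% flip probability is an assumption, and rotation and zoom are left out.

```python
import numpy as np


def random_flip(image, mask, rng, fliplr=True, flipud=True):
    """Randomly flip an (H, W, C) image and its (H, W, N) mask in the same way."""
    if fliplr and rng.random() < 0.5:
        # Mirror columns of both arrays so objects and masks stay aligned.
        image, mask = image[:, ::-1], mask[:, ::-1]
    if flipud and rng.random() < 0.5:
        # Mirror rows of both arrays.
        image, mask = image[::-1], mask[::-1]
    # Return contiguous copies, which is the safer input for cv2.imwrite.
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)
```

A call such as `random_flip(image, mask, np.random.default_rng())` returns one augmented pair; calling it several times per input gives a batch of variants.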


### 3. Rename images and annotations by numerical order

It is recommended to back up the data before renaming.

In [2]:
from rename import renamer
rn = renamer('./datasets/rocks/image/', './datasets/rocks/annotation/')

In [3]:
rn.rename(mode=1, image2png=True, annotation_suffix='.npy', annotation2suffix='.npy')
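For readers who want to see what renaming by numerical order involves, here is a minimal standard-library sketch. It is not the `renamer` class: `rename_pairs` is a hypothetical function, the pairing rule (an image and an annotation sharing the same file stem) is an assumption, and it does not convert images to PNG.

```python
from pathlib import Path
import uuid


def rename_pairs(image_dir, annotation_dir,
                 image_suffix=".jpg", annotation_suffix=".npy"):
    """Rename matching image/annotation pairs to 0, 1, 2, ... in sorted order."""
    image_dir, annotation_dir = Path(image_dir), Path(annotation_dir)
    folders = ((image_dir, image_suffix), (annotation_dir, annotation_suffix))

    # Keep only images that have an annotation with the same stem.
    stems = sorted(
        p.stem for p in image_dir.glob("*" + image_suffix)
        if (annotation_dir / (p.stem + annotation_suffix)).exists()
    )

    # Refuse to overwrite an unpaired file that already uses a numeric name.
    for i in range(len(stems)):
        for folder, suffix in folders:
            if (folder / f"{i}{suffix}").exists() and str(i) not in stems:
                raise FileExistsError(folder / f"{i}{suffix}")

    # Pass 1: move every pair to a unique temporary name, so that a new
    # numeric name cannot clash with a pair that has not been renamed yet.
    tag = uuid.uuid4().hex
    for i, stem in enumerate(stems):
        for folder, suffix in folders:
            (folder / (stem + suffix)).rename(folder / f"{tag}_{i}{suffix}")

    # Pass 2: move the temporary names to their final numeric names.
    for i in range(len(stems)):
        for folder, suffix in folders:
            (folder / f"{tag}_{i}{suffix}").rename(folder / f"{i}{suffix}")

    return {stem: str(i) for i, stem in enumerate(stems)}
```

The returned dictionary maps each old stem to its new number, which is useful to keep alongside the backup recommended above.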

Alright, data augmentation done.