# A Sample of Data Augmentation
---
Zhiang Chen, Nov 2018 @ DREAMS

### 1. Generate Masks from XML
- Download the XML annotations and original images from LabelMe.
- Create dataset folders under the `datasets` directory, e.g. for a `rocks` dataset:
```
--data_augmentator
|--datasets  
    |--rocks
        |--annotation: rocks1.xml, rocks2.xml, ...
        |--image: rocks1.jpg, rocks2.jpg, ...
```
- Generate masks from the XML files.

```python
from xml2mask import polygonReader

objects = ['drone']  # list of the annotated object names
pr = polygonReader("drone", objects)

# Save compact masks as .jpg files
# pr.saveMask(dim=(4000, 4000))

# Generate and save multi-channel masks as .npy files
pr.generateMask2(dim=(1024, 768), resize=(768, 1024), saveOnline=True)
```

Output:

```
177it [00:01, 158.58it/s]
```
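The internals of `xml2mask.polygonReader` are not shown in this sample. As a rough, hypothetical sketch of what turning LabelMe XML polygons into a multi-channel mask can involve, the code below reads each `<object>` (its `<name>` and the `<pt>` vertices of its `<polygon>`), skips objects marked `<deleted>1</deleted>`, and rasterises each polygon with the even-odd rule in plain NumPy. The helper names `labelme_to_mask` and `polygon_mask`, and the one-channel-per-class layout, are assumptions rather than the module's actual API or output format.

```python
import xml.etree.ElementTree as ET

import numpy as np


def polygon_mask(points, height, width):
    """Rasterise one polygon with the even-odd (ray-casting) rule.

    points: list of (x, y) vertices in pixel coordinates.
    Returns a boolean array of shape (height, width).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5  # test each pixel at its centre
    inside = np.zeros((height, width), dtype=bool)
    n = len(points)
    for k in range(n):
        x1, y1 = points[k]
        x2, y2 = points[(k + 1) % n]
        # Does this edge cross the horizontal line through each pixel centre?
        crosses = (y1 > py) != (y2 > py)
        # x where the edge meets that line; horizontal edges never cross,
        # so their dummy denominator does not affect the result.
        denom = (y2 - y1) if y2 != y1 else 1.0
        x_hit = x1 + (py - y1) * (x2 - x1) / denom
        # Toggle pixels whose rightward ray crosses this edge.
        inside ^= crosses & (px < x_hit)
    return inside


def labelme_to_mask(xml_text, objects, height, width):
    """Build a (height, width, len(objects)) uint8 mask, one channel per class (assumed layout)."""
    root = ET.fromstring(xml_text)
    mask = np.zeros((height, width, len(objects)), dtype=np.uint8)
    for obj in root.iter("object"):
        if (obj.findtext("deleted") or "0").strip() == "1":
            continue  # LabelMe keeps deleted objects in the XML
        name = (obj.findtext("name") or "").strip()
        if name not in objects:
            continue
        pts = [(float(pt.findtext("x")), float(pt.findtext("y")))
               for pt in obj.iter("pt")]
        if len(pts) < 3:
            continue  # not a polygon
        channel = objects.index(name)
        mask[..., channel] |= polygon_mask(pts, height, width).astype(np.uint8)
    return mask


# Usage: a 4x3-pixel rectangle labelled "drone" in an 8x6 image
xml = """<annotation><object><name>drone</name><deleted>0</deleted><polygon>
<pt><x>1</x><y>1</y></pt><pt><x>5</x><y>1</y></pt>
<pt><x>5</x><y>4</y></pt><pt><x>1</x><y>4</y></pt>
</polygon></object></annotation>"""
m = labelme_to_mask(xml, ["drone"], height=6, width=8)
print(m.shape, int(m.sum()))  # (6, 8, 1) 12
```

Sampling at pixel centres means a vertex lying exactly on a pixel boundary never makes the inside/outside test ambiguous.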


### 2. Augment data
Before running the script, create an `npy` folder and move the generated .npy files into it, then create an `aug` folder for the augmented data.
```
--data_augmentator
|--datasets
    |--rocks
        |--annotation: rocks1.xml, rocks2.xml, ...
        |--image: rocks1.jpg, rocks2.jpg, ...
        |--npy: rocks1.npy, rocks2.npy, ...
        |--aug: augmentation data
```
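A minimal standard-library sketch of this folder setup. It assumes the .npy masks were written to some folder `npy_source_dir` (where `generateMask2` writes them is not documented here) and that each mask shares its file stem with its image; `prepare_folders` is a hypothetical helper. It also lists images that have no matching .npy file, which is the situation the augmentor later reports as a missing annotation.

```python
import shutil
from pathlib import Path


def prepare_folders(dataset_dir, npy_source_dir):
    """Create npy/ and aug/ in dataset_dir and move *.npy files into npy/.

    Returns the stems of .jpg images in dataset_dir/image that have no
    .npy file with the same stem (assumed naming convention).
    """
    dataset = Path(dataset_dir)
    npy_dir = dataset / "npy"
    aug_dir = dataset / "aug"
    npy_dir.mkdir(parents=True, exist_ok=True)
    aug_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical source folder: wherever the mask-generation step wrote its files
    if Path(npy_source_dir).resolve() != npy_dir.resolve():
        for f in list(Path(npy_source_dir).glob("*.npy")):
            shutil.move(str(f), str(npy_dir / f.name))

    image_stems = {p.stem for p in (dataset / "image").glob("*.jpg")}
    npy_stems = {p.stem for p in npy_dir.glob("*.npy")}
    return sorted(image_stems - npy_stems)
```

For example, `prepare_folders('./datasets/drone', npy_source_dir)` returns an empty list when every image has a mask; any stems it returns are worth fixing before augmentation.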
Resize the data before augmentation.
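The augmentor receives the target size through `resize_dim`; how it resizes internally is not shown. A minimal sketch of resizing an image and its mask together, using nearest-neighbour sampling in NumPy so that mask values stay exactly 0 or 1. The helper name `resize_nearest` and its (height, width) argument order are assumptions. With OpenCV, `cv2.resize(..., interpolation=cv2.INTER_NEAREST)` does the same for images, but note that it takes the target size as (width, height).

```python
import numpy as np


def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array to (out_h, out_w)."""
    in_h, in_w = arr.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return arr[rows[:, None], cols[None, :]]


# Resize an image and its mask with the same function so they stay aligned
image = np.random.default_rng(0).integers(0, 256, (1024, 768, 3), dtype=np.uint8)
mask = np.zeros((1024, 768, 1), dtype=np.uint8)
mask[100:200, 300:400] = 1
image_small = resize_nearest(image, 512, 384)
mask_small = resize_nearest(mask, 512, 384)
print(image_small.shape, mask_small.shape, np.unique(mask_small))  # (512, 384, 3) (512, 384, 1) [0 1]
```

Bilinear interpolation usually looks better for the image itself, but it would blend mask labels into fractional values, which is why masks are resized with nearest-neighbour sampling.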

```python
from augmentation import augmentor
import numpy as np
from tqdm import tqdm
import cv2
import uuid
```

```python
# Augmentation settings passed to augmentor
config = dict(
    mode=2,
    resize_dim=(768, 1024),
    batch_number=10,
    rotation_min=-90,
    rotation_max=90,
    fliplr=True,
    flipud=True,
    zoom_min=0.8,
    zoom_max=1.2)

image_path = './datasets/drone/image/'
annotation_path = './datasets/drone/npy/'
aug_path = './datasets/drone/aug/'
aug = augmentor(image_path, annotation_path, **config)

# Each iteration yields an augmented image, its mask, and the source file name
for img, mask, fname in tqdm(aug):
    unid = uuid.uuid4().hex  # unique suffix so augmented copies never overwrite each other
    img_name = aug_path + fname + "_" + unid + '.jpg'
    ann_name = aug_path + fname + "_" + unid + '.npy'
    cv2.imwrite(img_name, img)
    np.save(ann_name, mask)
```

Output (abridged; each of the two warnings below appears 10 times, interleaved with progress updates):

```
81it [00:04, 18.98it/s]
Cannot find the corresponding anntation file for img_20190510_141453868.jpg
145it [00:07, 19.35it/s]
Cannot find the corresponding anntation file for img_20190510_141308535.jpg
...
1640it [01:25, 19.34it/s]
```

The warnings mean that `img_20190510_141453868.jpg` and `img_20190510_141308535.jpg` have no matching annotation file in the `npy` folder.
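The augmentor's implementation is not part of this sample. The essential requirement is that every geometric transform applied to an image is applied identically to its mask. The sketch below is a simplified, hypothetical illustration in NumPy: it uses random left-right and up-down flips and rotations of -90, 0 or +90 degrees only (the real `rotation_min`/`rotation_max` and zoom settings presumably allow arbitrary angles and scales, which this sketch does not implement), and it visits the dataset `batch_number` times. That is consistent with each warning above appearing 10 times for `batch_number=10`. `augment_pair` and `augment_dataset` are illustrative names, not the `augmentation` module's API.

```python
import numpy as np


def augment_pair(image, mask, rng, fliplr=True, flipud=True):
    """Apply the same random flips and quarter-turn rotation to an image and its mask."""
    if fliplr and rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    if flipud and rng.random() < 0.5:
        image, mask = image[::-1], mask[::-1]
    k = int(rng.integers(-1, 2))  # -1, 0 or 1 quarter turns: -90, 0 or +90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # Flips and rot90 return strided views; copy them into contiguous arrays before saving
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)


def augment_dataset(pairs, batch_number, seed=None):
    """Yield (image, mask, name), visiting every (name, image, mask) pair batch_number times."""
    rng = np.random.default_rng(seed)
    for _ in range(batch_number):
        for name, image, mask in pairs:
            aug_image, aug_mask = augment_pair(image, mask, rng)
            yield aug_image, aug_mask, name


# Usage: the mask marks odd pixel values, so it must still match the image after augmentation
img = np.arange(12).reshape(3, 4)
msk = (img % 2).astype(np.uint8)
for aug_img, aug_msk, name in augment_dataset([("sample", img, msk)], batch_number=3, seed=0):
    print(name, aug_img.shape, np.array_equal(aug_msk, aug_img % 2))
```

Note that a ±90 degree rotation swaps height and width, so a 768×1024 input comes out as 1024×768; a training pipeline that needs a fixed size has to resize or pad afterwards.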


### 3. Rename images and annotations in numerical order

It is recommended to back up the data before renaming.

```python
from rename import renamer

rn = renamer('./datasets/rocks/image/', './datasets/rocks/annotation/')
rn.rename(mode=1, image2png=True, annotation_suffix='.npy', annotation2suffix='.npy')
```
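The options of `renamer` (`mode`, `image2png`, `annotation_suffix`, `annotation2suffix`) are specific to this repository. As a standard-library sketch of the idea only, the hypothetical `rename_numerically` below renames every image that has a matching annotation to `0`, `1`, `2`, and so on in sorted order. It keeps extensions unchanged (no PNG conversion) and renames in two phases through temporary names, so a new name never overwrites a file that has not been renamed yet. As recommended above, back up the folders first, e.g. with `shutil.copytree`.

```python
from pathlib import Path


def rename_numerically(image_dir, annotation_dir, image_ext=".jpg", ann_ext=".npy", start=0):
    """Rename matched image/annotation pairs to start, start+1, ... in sorted stem order.

    Images without an annotation of the same stem are left untouched.
    Returns a dict mapping old stem -> new stem.
    """
    image_dir, annotation_dir = Path(image_dir), Path(annotation_dir)
    stems = sorted(p.stem for p in image_dir.glob("*" + image_ext)
                   if (annotation_dir / (p.stem + ann_ext)).exists())
    mapping = {old: str(i) for i, old in enumerate(stems, start)}
    for folder, ext in ((image_dir, image_ext), (annotation_dir, ann_ext)):
        # Phase 1: move to temporary names so "1" -> "0" cannot overwrite an unrenamed "0"
        for old in stems:
            (folder / f"{old}{ext}").rename(folder / f"__tmp__{old}{ext}")
        # Phase 2: move from the temporary names to the final numbers
        for old, new in mapping.items():
            (folder / f"__tmp__{old}{ext}").rename(folder / f"{new}{ext}")
    return mapping
```

Returning the mapping makes it easy to trace a numbered file back to its original name. The sketch does not guard against an unmatched file that already has a numeric name.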

Data augmentation is now complete.