# Explore the dataset augmentation function

This notebook showcases how easy it is to use the `augment_dataset()` function, which is part of the `image_augmentator` library.

The `augment_dataset()` function takes as input a manifest file with annotation details about the images (a JSON Lines file, with one line per image), augments all the images, and generates a new manifest file for the augmented dataset.
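A JSON Lines manifest is simply a text file in which every line is a standalone JSON object. The sketch below parses such content from a string. The keys `source-ref` and `labels` are illustrative assumptions (`source-ref` is the key used in SageMaker Ground Truth manifests), not necessarily the schema that `augment_dataset()` expects.

```python
import json

# Hypothetical two-image manifest: one JSON object per line.
# The keys "source-ref" and "labels" are assumptions for illustration only.
MANIFEST_TEXT = (
    '{"source-ref": "images/img_001.jpg", "labels": [0, 4]}\n'
    '{"source-ref": "images/img_002.jpg", "labels": [1, 6, 8]}\n'
)


def read_manifest(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = read_manifest(MANIFEST_TEXT)
print(len(records))               # → 2 (one record per image)
print(records[0]["source-ref"])   # → images/img_001.jpg
```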

The manifest file can be located either locally or in S3. The `augment_dataset()` function supports three types of tasks:

- Object detection
- Image classification (single label)
- Image classification (multi-label)
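Supporting both manifest locations usually comes down to inspecting the URI scheme. The following is a minimal sketch of that dispatch using only the standard library; `split_manifest_uri` and the bucket name `my-bucket` are hypothetical, and this is not necessarily how `augment_dataset()` resolves its `uri_manifest_file` argument.

```python
from urllib.parse import urlparse


def split_manifest_uri(uri):
    """Classify a manifest URI as S3 or local (hypothetical helper).

    Returns ("s3", bucket, key) for s3:// URIs and ("local", None, path) otherwise.
    """
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        # netloc holds the bucket; the path without its leading slash is the object key
        return ("s3", parsed.netloc, parsed.path.lstrip("/"))
    return ("local", None, uri)


print(split_manifest_uri("s3://my-bucket/manifests/toy-dataset.manifest"))
# → ('s3', 'my-bucket', 'manifests/toy-dataset.manifest')
print(split_manifest_uri("images/toy-dataset.manifest"))
# → ('local', None, 'images/toy-dataset.manifest')
```

A real loader would then read the S3 object (for example with `boto3`) or open the local file.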


In [17]:

CLASS_NAMES = [  # names of the 10 products that we will be trying to detect
    'flakes',  
    'mm', 
    'coke', 
    'spam', 
    'nutella', 
    'doritos', 
    'ritz', 
    'skittles', 
    'mountaindew', 
    'evian'
]

In the following cell we define the augmentation parameters that will be applied to the whole dataset. Bear in mind that you don't necessarily need to provide values for all the parameters, only for the ones that are relevant to your use case. You can use the `explore-image-augmentations.ipynb` notebook to understand the impact and range of each parameter and decide whether it is applicable to your use case.

In [21]:
# Note: no trailing commas after the values; `x = 5,` would store the tuple (5,)
dc_param = {
    'max_number_of_classes': len(CLASS_NAMES),
    'how_many': 5,                  # augmented copies per original image
    'random_seed': 0,               # fixed seed for reproducible results
    # geometric transformations, given as (min, max) ranges
    'range_scale': (0.75, 1.2),
    'range_translation': (-50, 50),
    'range_rotation': (-5, 5),
    'range_sheer': (-5, 5),
    # noise and color transformations, given as (min, max) ranges
    'range_noise': (0, 0.001),
    'range_brightness': (0.8, 1.5),
    'range_colorfulness': (0, 2),
    'range_color_temperature': (-0.5, 1.5),
    # flips and enhancement
    'flip_lr': 'random',
    'flip_ud': None,
    'enhance': None,
    # bounding-box handling (object detection)
    'bbox_truncate': True,
    'bbox_discard_thr': 0.85,
}
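Because only a subset of the parameters is required, a small pre-flight check can catch mistakes before a long augmentation run. The sketch below is hypothetical (`check_augmentation_params` is not part of `image_augmentator`) and assumes that every `range_*` parameter is meant to be a `(low, high)` pair. It also flags values that a stray trailing comma has wrapped in a one-element tuple.

```python
def check_augmentation_params(params):
    """Return a list of warnings for a (possibly partial) parameter dict.

    Only keys that are present are checked, so the dict may contain just
    the parameters relevant to the use case.
    """
    warnings = []
    for name, value in params.items():
        # A trailing comma (x = 5,) silently turns a value into a 1-tuple.
        if isinstance(value, tuple) and len(value) == 1:
            warnings.append(f"{name}: 1-tuple {value!r}, probably a stray trailing comma")
            continue
        # Assumption: range_* parameters are (low, high) pairs.
        if name.startswith("range_"):
            if not (isinstance(value, tuple) and len(value) == 2):
                warnings.append(f"{name}: expected a (low, high) pair, got {value!r}")
            elif value[0] > value[1]:
                warnings.append(f"{name}: low {value[0]} is greater than high {value[1]}")
    return warnings


example = {
    "how_many": (5,),                # accidental 1-tuple
    "range_rotation": (-5, 5),       # fine
    "range_brightness": (1.5, 0.8),  # reversed bounds
}
for message in check_augmentation_params(example):
    print(message)
```

Running such a check on `dc_param` before calling `augment_dataset()` is cheap, and it returns an empty list when everything looks consistent.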

In [22]:
# create a destination folder where the augmented dataset will be saved
# (-p: no error if the folder already exists)
!mkdir -p augmented_dataset


In [23]:
%load_ext autoreload
%autoreload 2

from util.image_augmentator import augment_dataset

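stats = augment_dataset(
    uri_manifest_file='images/toy-dataset.manifest',  # input manifest (local path or S3)
    uri_destination='augmented_dataset',              # destination folder created above
    ls_class_names=CLASS_NAMES,
    filename_postfix='_augm_',                        # postfix for augmented file names
    include_original=True,                            # also keep the original images
    verbose=True,
    **dc_param                                        # augmentation parameters defined above
)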

Augmenting image 1 out of 2 [ 50.0 %]
Augmenting image 2 out of 2 [ 100.0 %]
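The `**dc_param` argument unpacks the dictionary into keyword arguments, which is why it can hold any subset of the augmentation parameters. The toy function below illustrates the mechanism; `augment_stub`, its parameters and its default values are hypothetical and say nothing about the real defaults of `augment_dataset()`.

```python
def augment_stub(uri_manifest_file, how_many=1, range_rotation=None, flip_lr=None, **other):
    """Hypothetical stand-in for augment_dataset(): echoes the arguments it receives."""
    return {
        "uri_manifest_file": uri_manifest_file,
        "how_many": how_many,
        "range_rotation": range_rotation,
        "flip_lr": flip_lr,
        "other": other,
    }


# Only the parameters relevant to the use case are provided.
params = {"how_many": 5, "range_rotation": (-5, 5), "bbox_truncate": True}
result = augment_stub("images/toy-dataset.manifest", **params)

print(result["how_many"])  # → 5 (taken from the dict)
print(result["flip_lr"])   # → None (not in the dict, so the default applies)
print(result["other"])     # → {'bbox_truncate': True} (collected by **other)
```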


In [24]:
stats

{'original': {'n_samples': 2,
  'class_hist': array([2, 3, 2, 1, 3, 2, 3, 1, 3, 2])},
 'augmentations': {'n_samples': 12,
  'class_hist': array([12, 18, 12,  6, 18, 12, 18,  6, 18, 12])}}
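The numbers in `stats` can be checked by hand. The sketch below reproduces them, assuming (consistently with the output above) that each original image produces `how_many` augmented copies and that `include_original=True` adds the original itself. The arrays are copied from the output rather than computed by the library.

```python
import numpy as np

# Values copied from the stats output above.
n_original = 2
how_many = 5
include_original = True
original_hist = np.array([2, 3, 2, 1, 3, 2, 3, 1, 3, 2])
augmented_hist = np.array([12, 18, 12, 6, 18, 12, 18, 6, 18, 12])

# Each source image yields `how_many` augmented copies, plus itself if included.
copies_per_image = how_many + int(include_original)
expected_samples = n_original * copies_per_image
print(expected_samples)  # → 12

# If no bounding box was discarded, every class count scales by the same factor.
print(np.array_equal(augmented_hist, original_hist * copies_per_image))  # → True
```

The exact 6× match suggests that `bbox_discard_thr` did not remove any object in this particular run.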

## Conclusion

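Starting from a toy dataset of 2 annotated images, a single call to `augment_dataset()` produced 12 samples (the 2 originals plus 5 augmented copies of each) and saved the augmented dataset, together with its new manifest, to the `augmented_dataset` folder. The returned `stats` dictionary summarizes the number of samples and the per-class histogram of annotations before and after augmentation.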