
Batch Level Augmentation


Advanced Tensorflow tutorials use tf.data.Dataset. After creating a batch of images, preprocessing factories unstack the images and apply a set of augmentation techniques per image. For those experienced with tf.data.Dataset, this approach is ideal. tf.data.Dataset brings Tensorflow capabilities like prefetching and parallelism, and it is easy to use for randomly sampling images/data. But things get tricky, though not impossible, when a custom sampling procedure is required. Semi-hard negative sampling, for triplet loss, is an example of a custom sampling procedure. Tensorflow regains flexibility through tf.py_func, which supports regular programming-language statements like if-statements, for-loops, etc. While tf.data.Dataset and tf.py_func are highly recommended for their performance efficiency, they add implementation complexity.
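
For illustration, here is a minimal sketch of wrapping a plain-Python sampling routine with tf.py_func. sample_triplets is a hypothetical helper, not part of this repository, and it draws random (not semi-hard) positives/negatives for brevity:

import numpy as np
import tensorflow as tf

def sample_triplets(labels):
    # Hypothetical Python-side sampler: free to use if-statements,
    # for-loops, numpy, etc. Returns anchor/positive/negative indices.
    anchors, positives, negatives = [], [], []
    for i, lbl in enumerate(labels):
        pos = np.where(labels == lbl)[0]
        neg = np.where(labels != lbl)[0]
        if len(pos) > 1 and len(neg) > 0:
            anchors.append(i)
            positives.append(np.random.choice(pos[pos != i]))
            negatives.append(np.random.choice(neg))
    return (np.asarray(anchors, np.int64),
            np.asarray(positives, np.int64),
            np.asarray(negatives, np.int64))

labels_batch = tf.placeholder(tf.int64, shape=(None,))
anc_idx, pos_idx, neg_idx = tf.py_func(sample_triplets, [labels_batch],
                                       [tf.int64, tf.int64, tf.int64])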

In contrast, tf.placeholder is employed in Tensorflow beginner tutorials for its simplicity. It is more intuitive and easy to integrate with any data sampling implementation. This promotes flexibility across different machine learning frameworks like PyTorch, Keras, and Caffe. Yet tf.placeholder is less efficient than tf.data.Dataset. This wiki explains a different approach to image augmentation. It is similar to the preprocessing-factories approach, yet it has extra merits and can be handy in certain situations. By disentangling the data sampling procedure from data augmentation, integration with either tf.data.Dataset or tf.placeholder becomes straightforward. The following function, from nn_utils.py, illustrates how to apply augmentation on a Tensor of images.

#===================Sample usage with tf.placeholder===================
self.images_tensor = tf.placeholder(tf.float32, shape=(batch_size, 256, 256, 3))
## To apply random cropping and horizontal flipping only
self.images_augmented = augment(self.images_tensor, resize=(224, 224), horizontal_flip=True)

#===================Sample usage with tf.data.Dataset===================
dataset = dataset.batch(batch_size) # dataset contains both images and labels
dataset = dataset.map(lambda im_batch, lbl_batch: (nn_utils.augment(im_batch, resize=(224, 224), horizontal_flip=True), lbl_batch))

#======================================
import math
import tensorflow as tf
# const, config, apply_with_random_selector, distort_color,
# denseNet_preprocess and inception_preprocessing are defined elsewhere in this repository.

def augment(images,                  ## Used during training
            resize=None,             # (width, height) tuple or None
            horizontal_flip=False,
            vertical_flip=False,
            rotate=0,                # Maximum rotation angle in degrees
            noise_probability=0,
            color_aug_probability=0,
            mixup=0):                # Mixup coefficient, see https://arxiv.org/abs/1710.09412

    ## image \in [0,255]^{256x256x3} 

    # Random crop at the batch level: one shared offset for all images
    max_offset = 256 - resize[0]  # assumes resize is provided and all images are 256x256x3
    rand = tf.random_uniform([2], minval=0, maxval=max_offset, dtype=tf.int32)
    height_offset = rand[0]
    width_offset = rand[1]
    images = tf.image.crop_to_bounding_box(images, height_offset, width_offset, const.frame_height, const.frame_width)


    # Color augmentation (applied with probability color_aug_probability)
    rand = tf.random_uniform([], minval=0, maxval=1, dtype=tf.float32)
    do_color_distortion = tf.less(rand, color_aug_probability)
    images = images / 255.0  # distort_color expects images in [0, 1]
    images = tf.cond(do_color_distortion,
                     lambda: apply_with_random_selector(
                         images, lambda x, ordering: distort_color(x, ordering), num_cases=4),
                     lambda: tf.identity(images))

    # Additive Gaussian noise (applied with probability noise_probability)
    noise = tf.random_normal(shape=tf.shape(images), mean=0.0, stddev=50.0 / 255.0, dtype=tf.float32)
    noisy_coin = tf.random_uniform([], minval=0, maxval=1, dtype=tf.float32)
    do_noise_img = tf.less(noisy_coin, noise_probability)
    images = tf.cond(do_noise_img, lambda: images + noise, lambda: tf.identity(images))
    images = tf.clip_by_value(images, 0.0, 1.0)
    images = images * 255
    
    
    ## Different nets expect different input ranges:
    ## DenseNet assumes [-1,1], while Inception assumes [0,1].
    if config.preprocess_func == 'densenet':
        print('DenseNet Format Augmentation')
        images = denseNet_preprocess(images)
    elif config.preprocess_func == 'inception_v1':
        print('Inception Format Augmentation')
        images = inception_preprocessing(images)
    else:
        raise NotImplementedError()

    with tf.name_scope('augmentation'):
        shp = tf.shape(images)
        batch_size, height, width = shp[0], shp[1], shp[2]
        width = tf.cast(width, tf.float32)
        height = tf.cast(height, tf.float32)

        # The list of affine transformations that our image will go under.
        # Every element is Nx8 tensor, where N is a batch size.
        transforms = []
        identity = tf.constant([1, 0, 0, 0, 1, 0, 0, 0], dtype=tf.float32)
        if horizontal_flip:
            coin = tf.less(tf.random_uniform([batch_size], 0, 1.0), 0.5)
            flip_transform = tf.convert_to_tensor(
                [-1., 0., width, 0., 1., 0., 0., 0.], dtype=tf.float32)
            transforms.append(
                tf.where(coin,
                         tf.tile(tf.expand_dims(flip_transform, 0), [batch_size, 1]),
                         tf.tile(tf.expand_dims(identity, 0), [batch_size, 1])))

        if vertical_flip:
            coin = tf.less(tf.random_uniform([batch_size], 0, 1.0), 0.5)
            flip_transform = tf.convert_to_tensor(
                [1., 0., 0., 0., -1., height, 0., 0.], dtype=tf.float32)
            transforms.append(
                tf.where(coin,
                         tf.tile(tf.expand_dims(flip_transform, 0), [batch_size, 1]),
                         tf.tile(tf.expand_dims(identity, 0), [batch_size, 1])))

        if rotate > 0:
            angle_rad = rotate / 180.0 * math.pi  # float division avoids Python 2 integer division
            angles = tf.random_uniform([batch_size], -angle_rad, angle_rad)
            transforms.append(
                tf.contrib.image.angles_to_projective_transforms(
                    angles, height, width))


        if transforms:
            images = tf.contrib.image.transform(
                images,
                tf.contrib.image.compose_transforms(*transforms),
                interpolation='BILINEAR')  # or 'NEAREST'

        def cshift(values):  # Circular shift in the batch dimension
            # Helper intended for mixup; mixup itself is not wired up in this snippet.
            return tf.concat([values[-1:, ...], values[:-1, ...]], 0)

    return images
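
The preprocessing branch near the end of augment dispatches on config.preprocess_func. A hedged sketch of what the two helpers might do, based only on the range comment in the code (the real denseNet_preprocess and inception_preprocessing live in this repository):

def inception_preprocessing_sketch(images):
    # Hypothetical: scale [0, 255] -> [0, 1], per the comment above.
    return images / 255.0

def denseNet_preprocess_sketch(images):
    # Hypothetical: scale [0, 255] -> [-1, 1], per the comment above.
    return (images / 127.5) - 1.0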

The augment method can be used with either tf.placeholder or tf.data.Dataset, which provides maximum flexibility in terms of data loading and sampling. During training, it supports a large set of augmentations. It supports batch-level augmentation, as in random cropping, i.e., there is no need to unstack images. It also supports per-image augmentation, as in color distortion. During inference, center_crop(images) crops the center of the input image.
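
For concreteness, a minimal sketch of driving the placeholder variant from the earlier snippet inside a session; np_batch is a stand-in for a batch produced by any custom sampler:

import numpy as np

# np_batch: raw images in [0, 255], shape (batch_size, 256, 256, 3)
np_batch = np.random.randint(0, 256, size=(batch_size, 256, 256, 3)).astype(np.float32)

with tf.Session() as sess:
    augmented = sess.run(self.images_augmented,
                         feed_dict={self.images_tensor: np_batch})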

The augment method supports

  • Random crop
  • Color distortion
  • Add Noise
  • Horizontal and vertical flipping
  • Rotation

augment employs the distort_color method from the TensorFlow-Slim inception_preprocessing.py. This emphasizes flexibility in terms of reusing existing augmentation libraries.
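
As a rough paraphrase of the Slim code (not verbatim), distort_color chains four random color ops and varies only their order across the num_cases=4 branches; one ordering looks roughly like this:

def distort_color_sketch(image, color_ordering=0):
    # Paraphrased from TF-Slim inception_preprocessing.distort_color; image in [0, 1].
    if color_ordering == 0:
        image = tf.image.random_brightness(image, max_delta=32. / 255.)
        image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
        image = tf.image.random_hue(image, max_delta=0.2)
        image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
    # ... orderings 1-3 permute the same four ops (see the Slim source)
    return tf.clip_by_value(image, 0.0, 1.0)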

The random cropping implementation extracts the same random crop from all images within a batch. This differs from distorted_bounding_box_crop in inception_preprocessing.py, which extracts a different random crop from each image within a batch. Yet distorted_bounding_box_crop can easily substitute for the current random cropping implementation, as sketched below.
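
If per-image crops are desired without pulling in distorted_bounding_box_crop, one alternative sketch uses tf.map_fn with tf.random_crop; crop_h and crop_w would play the role of const.frame_height and const.frame_width:

def per_image_random_crop(images, crop_h, crop_w):
    # Draws an independent random offset for every image in the batch,
    # unlike the shared-offset crop inside augment().
    return tf.map_fn(lambda img: tf.random_crop(img, [crop_h, crop_w, 3]), images)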

def center_crop(images):  ## Used during evaluation
    center_offset = (256 - const.frame_width) // 2  # all images are 256x256x3
    images = tf.image.crop_to_bounding_box(images, center_offset, center_offset, const.frame_height, const.frame_width)

    if config.preprocess_func == 'inception_v1':
        print('Inception Format Augmentation')
        images = inception_preprocessing(images)
    elif config.preprocess_func == 'densenet':
        print('DenseNet Format Augmentation')
        images = denseNet_preprocess(images)
    else:
        raise NotImplementedError()

    return images
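
Mirroring the training snippet at the top of this page, evaluation feeds the same placeholder through center_crop instead of augment (images_eval is an illustrative name):

#===================Sample usage during evaluation===================
self.images_tensor = tf.placeholder(tf.float32, shape=(batch_size, 256, 256, 3))
self.images_eval = center_crop(self.images_tensor)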

Sample Augmentation results

Credits

Credits are due to Sergey Arkhangelskiy for sharing his implementation. The current implementation is inspired by his original implementation and adds extra capabilities like random cropping and color augmentation.
