<a href="https://colab.research.google.com/github/zerotodeeplearning/ztdl-masterclasses/blob/master/notebooks/Data_Augmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Learn with us: www.zerotodeeplearning.com

Copyright © 2021: Zero to Deep Learning ® Catalit LLC.

In [None]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Data Augmentation

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import os
from tensorflow.keras.preprocessing import image

In [None]:
# sports_images_path = tf.keras.utils.get_file(
#     'sports_images',
#     'https://archive.org/download/ztdl_sports_images/sports_images.tgz',
#     untar=True)

In [None]:
![[ ! -f sports_images.tar.gz ]] && gsutil cp gs://ztdl-datasets/sports_images.tar.gz .
![[ ! -d sports_images ]] && echo "Extracting images..." && tar zxf sports_images.tar.gz
sports_images_path = './sports_images'

In [None]:
train_path = os.path.join(sports_images_path, 'train')
test_path = os.path.join(sports_images_path, 'test')

In [None]:
batch_size = 16
img_size = 224

In [None]:
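datagen = image.ImageDataGenerator(
    rescale=1./255.,          # scale pixel values from [0, 255] to [0, 1]
    rotation_range=15,        # random rotations of up to 15 degrees
    width_shift_range=0.2,    # random horizontal shifts of up to 20% of the width
    height_shift_range=0.2,   # random vertical shifts of up to 20% of the height
    shear_range=5,            # random shear angle of up to 5 degrees
    zoom_range=0.2,           # random zoom factor drawn from [0.8, 1.2]
    horizontal_flip=True,     # randomly mirror images left-right
    fill_mode='nearest')      # fill pixels exposed by a transform with the nearest edge value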
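To make a few of these parameters concrete, here is a minimal NumPy sketch of what `rescale`, `horizontal_flip` and a horizontal shift with `fill_mode='nearest'` do to a pixel array. It is an illustration under simplifying assumptions (a tiny single-channel array, an integer shift), not Keras's implementation, which combines the geometric transforms into one random affine transform. The helper names are hypothetical.

```python
import numpy as np

def rescale(img, factor=1.0 / 255.0):
    # Multiply every pixel by a constant factor: 0..255 becomes 0..1
    return img.astype(np.float32) * factor

def horizontal_flip(img):
    # Mirror the image left-to-right by reversing the width (column) axis
    return img[:, ::-1]

def shift_right_nearest(img, pixels):
    # Shift the image `pixels` columns to the right. The columns uncovered on
    # the left are filled by repeating the nearest edge column, which is the
    # effect of fill_mode='nearest' for a pure horizontal shift.
    if pixels == 0:
        return img.copy()
    shifted = np.empty_like(img)
    shifted[:, pixels:] = img[:, :-pixels]
    shifted[:, :pixels] = img[:, :1]
    return shifted

# A tiny 2x3 single-channel "image"
img = np.array([[0, 51, 102],
                [153, 204, 255]], dtype=np.uint8)

print(rescale(img))
print(horizontal_flip(img))
print(shift_right_nearest(img, 1))
```

With `width_shift_range=0.2` the real generator draws the shift at random, up to 20% of the width in either direction, instead of using a fixed number of pixels.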

In [None]:
input_path = os.path.join(train_path, 'Beach volleyball/1e9ce0e76695de2e5d1f6964ab8c538.jpg')
img = image.load_img(input_path, target_size=(img_size, img_size))
img

In [None]:
img_array = image.img_to_array(img)

In [None]:
img_tensor = np.expand_dims(img_array, axis=0)
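`datagen.flow` expects a batch of images, i.e. a rank-4 array of shape `(batch, height, width, channels)`, so `np.expand_dims(..., axis=0)` adds a leading batch axis to the single image. A quick shape check, using a zero array as a stand-in for the loaded 224×224 RGB image:

```python
import numpy as np

img_size = 224
# Stand-in for image.img_to_array(img): height x width x RGB channels
img_array = np.zeros((img_size, img_size, 3), dtype=np.float32)

# Add a leading batch axis of length 1
img_tensor = np.expand_dims(img_array, axis=0)

print(img_array.shape)   # (224, 224, 3)
print(img_tensor.shape)  # (1, 224, 224, 3)
```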

In [None]:
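plt.figure(figsize=(10, 10))

# datagen.flow yields an endless stream of randomly augmented copies of
# img_tensor, so stop after the first 16 and plot them on a 4x4 grid
for i, im in enumerate(datagen.flow(img_tensor, batch_size=1)):
    if i >= 16:
        break
    plt.subplot(4, 4, i + 1)
    plt.imshow(im[0])  # im has shape (1, 224, 224, 3); show its only image
    plt.axis('off')

plt.tight_layout()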

### Exercise 1

- Use the `flow_from_directory` method of the data generator to produce batches of sports images from the training directory `train_path`.
- Display the images of one batch with their labels.

Your code should look like:

```python
train_datagen = datagen.flow_from_directory(
    # YOUR CODE HERE
)

batch, labels = train_datagen.next()

plt.figure(figsize=(10, 10))
for i in range(len(batch)):
    # YOUR CODE HERE
```
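A hint for the labelling step: with its default `class_mode='categorical'`, `flow_from_directory` returns one-hot encoded labels, and the iterator's `class_indices` attribute maps each class (subdirectory) name to its column index. Inverting that dictionary and taking the `argmax` of each label row recovers the class name. The sketch below uses a hypothetical three-class `class_indices` and hand-made labels in place of the real iterator output; only `'Beach volleyball'` comes from this dataset.

```python
import numpy as np

# Hypothetical mapping; the real one comes from train_datagen.class_indices
class_indices = {'Beach volleyball': 0, 'Rowing': 1, 'Tennis': 2}

# Invert it: column index -> class name
index_to_class = {v: k for k, v in class_indices.items()}

# Hypothetical one-hot labels for a batch of three images
labels = np.array([[0., 0., 1.],
                   [1., 0., 0.],
                   [0., 1., 0.]])

names = [index_to_class[int(i)] for i in labels.argmax(axis=1)]
print(names)  # ['Tennis', 'Beach volleyball', 'Rowing']
```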

### TensorFlow Data Pipelines with `tf.data`

In [None]:
from imutils import paths

In [None]:
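def parse_images(im_path):
    # Read the raw bytes and decode them into a uint8 RGB tensor
    im = tf.io.read_file(im_path)
    im = tf.image.decode_jpeg(im, channels=3)
    # Convert to float32, which also rescales pixel values to [0, 1]
    im = tf.image.convert_image_dtype(im, tf.float32)
    im = tf.image.resize(im, [img_size, img_size])
    # The class label is the name of the image's parent directory
    label = tf.strings.split(im_path, os.path.sep)[-2]
    return im, label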
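`parse_images` takes the label from the name of the image's parent directory, so it relies on the one-folder-per-class layout of `train_path`. The same indexing in plain Python, on a hypothetical file path (the file name `example.jpg` is made up):

```python
import os

# Hypothetical path following the <split>/<class>/<file> layout
im_path = os.path.join('sports_images', 'train', 'Beach volleyball', 'example.jpg')

# The second-to-last path component is the class directory
label = im_path.split(os.path.sep)[-2]
print(label)  # Beach volleyball
```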

In [None]:
im_paths = list(paths.list_images(train_path))
path_ds = tf.data.Dataset.from_tensor_slices(im_paths)

In [None]:
AUTO = tf.data.experimental.AUTOTUNE

In [None]:
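train_ds = (
    path_ds
    # Shuffle the file paths before decoding: the buffer then holds short
    # strings instead of decoded 224x224x3 float32 images
    .shuffle(len(im_paths))
    .map(parse_images, num_parallel_calls=AUTO)
    .batch(batch_size)
    .prefetch(AUTO)
)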
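The position of `shuffle` in a `tf.data` pipeline affects memory use, because its buffer stores whatever elements reach it: after `map(parse_images)` each buffered element is a decoded float32 image, while before it each element is only a path string. A back-of-the-envelope estimate for a 10,000-element buffer of 224×224×3 float32 images, ignoring the label strings and any per-tensor overhead:

```python
img_size = 224
channels = 3
bytes_per_float32 = 4
buffer_size = 10_000

bytes_per_image = img_size * img_size * channels * bytes_per_float32
buffer_bytes = bytes_per_image * buffer_size

print(bytes_per_image)         # 602112
print(buffer_bytes / 1024**3)  # about 5.6 GiB
```

The buffer only fills up to the number of images in the dataset, so a smaller training set needs proportionally less memory.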

In [None]:
batch, labels = next(iter(train_ds))

In [None]:
plt.figure(figsize=(10, 10))
for i in range(len(batch)):
    plt.subplot(4, 4, i + 1)
    plt.imshow(batch[i])
    plt.title(labels[i].numpy().decode('utf-8'))
    plt.axis('off')

plt.tight_layout()

### Exercise 2: Data augmentation with Keras layers

Keras provides a few experimental preprocessing layers that let you include data augmentation in the model itself (in recent TensorFlow releases the same layers are also available directly under `tf.keras.layers`).

- Define a data augmentation model using a `Sequential` model with a few layers from the `tensorflow.keras.layers.experimental.preprocessing` submodule.
- Apply this model to the batch with the flag `training=True`: the random augmentation layers only transform their inputs in training mode and return them unchanged at inference.
- Visualize the augmented images as above.
- What are the advantages of including data augmentation in the model?
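The effect of the `training` flag can be mimicked with a toy layer. The hypothetical `ToyRandomFlip` class below is a plain NumPy illustration, not the Keras implementation: called with `training=True` it flips each image in the batch left-to-right with probability 0.5, and otherwise it returns the batch untouched, matching the documented behaviour of Keras's random augmentation layers.

```python
import numpy as np

class ToyRandomFlip:
    """Hypothetical stand-in for a random horizontal-flip layer."""

    def __init__(self, seed=None):
        self.rng = np.random.default_rng(seed)

    def __call__(self, batch, training=False):
        # Inference mode: return the batch unchanged
        if not training:
            return batch
        # Training mode: flip each image left-right with probability 0.5.
        # batch has shape (batch, height, width, channels)
        flip = self.rng.random(len(batch)) < 0.5
        out = batch.copy()
        out[flip] = out[flip][:, :, ::-1]
        return out

# Two tiny 1x3 single-channel images
batch = np.arange(6).reshape(2, 1, 3, 1)
layer = ToyRandomFlip(seed=0)

print(np.array_equal(layer(batch), batch))  # True: no augmentation at inference
augmented = layer(batch, training=True)     # each image flipped or left as-is
```

Because the augmentation lives inside the model, the same code path runs wherever the model runs, and it switches itself off at inference without any change to the input pipeline.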