<a href="https://colab.research.google.com/github/Aayushktyagi/DeepLearning_Resources/blob/master/tf_data_with_tf_keras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The purpose of this notebook is two-fold:

- Show how to use `tf.data.Dataset.from_generator` together with Keras' `ImageDataGenerator` to train Keras models.
- Compare the training time of a `tf.data.Dataset.from_generator` pipeline against using `ImageDataGenerator` directly.

A huge thanks to **Picsou Balthazar** for helping me out with this.

In [None]:
# Install TensorFlow 2.0 (GPU build) and imutils, which is used
# below to list the image files
!pip install tensorflow-gpu==2.0.0 imutils

In [None]:
# Import the packages for data loading, modelling, and timing
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD
from imutils import paths
import matplotlib.pyplot as plt
import tensorflow as tf
import pandas as pd
import numpy as np
import time

In [None]:
# Verify that the right version was installed
tf.__version__

'2.0.0'

In [None]:
# Download and extract the flowers dataset
flowers = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)

In [None]:
!ls {flowers}

daisy  dandelion  LICENSE.txt  roses  sunflowers  tulips


Learn more about the dataset here: https://www.tensorflow.org/tutorials/load_data/images#load_using_tfdata. Below is a glimpse of the dataset.

![](https://www.tensorflow.org/tutorials/load_data/images_files/output_suh6Sjv68rY3_0.png)

In [None]:
# Initialize the data augmentation object and set its mean to the
# mean of the ImageNet dataset; featurewise_center=True is required,
# otherwise ImageDataGenerator ignores the mean attribute
img_gen = ImageDataGenerator(rotation_range=20, featurewise_center=True)
mean = np.array([123.68, 116.779, 103.939], dtype="float32")
img_gen.mean = mean
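`ImageDataGenerator` only subtracts `mean` from each image when it is constructed with `featurewise_center=True`; otherwise the attribute is ignored. The NumPy sketch below shows what that per-channel subtraction amounts to for a single RGB image: the length-3 mean vector broadcasts across the height and width axes. The random image is a made-up stand-in for a real photo.

```python
import numpy as np

# ImageNet per-channel (R, G, B) means, as set on img_gen above
mean = np.array([123.68, 116.779, 103.939], dtype="float32")

# A stand-in 224x224 RGB image with pixel values in [0, 255]
rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(224, 224, 3)).astype("float32")

# Featurewise centering: broadcasting subtracts mean[c] from every pixel of channel c
centered = image - mean

print(centered.shape)  # (224, 224, 3)
# Each channel's average drops by (approximately) that channel's mean
print(image.mean(axis=(0, 1)) - centered.mean(axis=(0, 1)))
```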

In [None]:
# Wrap the generator with tf.data
ds = tf.data.Dataset.from_generator(
    lambda: img_gen.flow_from_directory(flowers,
            class_mode="categorical",
            target_size=(224, 224),
            color_mode="rgb",
            shuffle=True),
    output_types=(tf.float32, tf.float32),
    output_shapes = ([None,224,224,3],[None,5])
)

ds

<DatasetV1Adapter shapes: ((None, 224, 224, 3), (None, 5)), types: (tf.float32, tf.float32)>
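The `flow_from_directory` iterator loops over the images indefinitely, and each full pass over the 3,670 images ends with a smaller batch, because 3,670 is not a multiple of 32. That is why the batch dimension in `output_shapes` is `None`, and why training below has to pass `steps_per_epoch`. The pure-Python sketch below mimics the iterator's batching (ignoring shuffling) to show the batch sizes; `batch_sizes` is a hypothetical helper, not a Keras function.

```python
import itertools

def batch_sizes(num_samples, batch_size):
    """Hypothetical helper: endlessly yield the size of each batch,
    restarting from the first sample after every full pass."""
    while True:
        for start in range(0, num_samples, batch_size):
            yield min(batch_size, num_samples - start)

# Sizes of the first 116 batches for the 3,670 flower images at batch size 32
sizes = list(itertools.islice(batch_sizes(3670, 32), 116))

print(sizes[113], sizes[114], sizes[115])  # 32 22 32
print(3670 // 32)  # 114
```

With 32 images per batch there are 114 full batches and one batch of 22 per pass, so `steps_per_epoch=total_data//32` gives the 114 steps seen in the training log, slightly less than one full pass.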

In [None]:
# Verify the shapes yielded by the data generator
train_gen = img_gen.flow_from_directory(flowers,
            class_mode="categorical",
            target_size=(224, 224),
            color_mode="rgb",
            shuffle=True)
images, labels = next(train_gen)
images.shape, labels.shape

Found 3670 images belonging to 5 classes.


((32, 224, 224, 3), (32, 5))
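With `class_mode="categorical"`, each label is a one-hot vector of length 5. When no `classes` argument is given, `flow_from_directory` assigns label indices to the class subdirectories in alphanumeric order (the mapping is available afterwards as `train_gen.class_indices`). The NumPy sketch below reproduces that mapping for the five flower folders and one-hot encodes a few labels; the `example` list is made up for illustration.

```python
import numpy as np

# Class subdirectories under flower_photos (LICENSE.txt is a file, so it is not a class)
folders = ["tulips", "daisy", "sunflowers", "roses", "dandelion"]

# Alphanumeric order determines the label index of each class
class_indices = {name: idx for idx, name in enumerate(sorted(folders))}
print(class_indices)
# {'daisy': 0, 'dandelion': 1, 'roses': 2, 'sunflowers': 3, 'tulips': 4}

# One-hot encode hypothetical labels: row i has a 1 in column class_indices[label]
example = ["roses", "daisy", "tulips"]
one_hot = np.eye(len(class_indices), dtype="float32")[[class_indices[c] for c in example]]
print(one_hot.shape)  # (3, 5)
```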

In [None]:
# A helper function that initializes and compiles our model:
# a frozen VGG16 base (pre-trained on ImageNet) plus a new classification head
def get_training_model():
    # Load VGG16 without its fully connected top layers
    baseModel = VGG16(weights="imagenet", include_top=False,
        input_tensor=Input(shape=(224, 224, 3)))

    # Build a new head for the 5 flower classes
    headModel = baseModel.output
    headModel = Flatten(name="flatten")(headModel)
    headModel = Dense(512, activation="relu")(headModel)
    headModel = Dropout(0.5)(headModel)
    headModel = Dense(5, activation="softmax")(headModel)

    model = Model(inputs=baseModel.input, outputs=headModel)

    # Freeze the base so that only the head is trained
    for layer in baseModel.layers:
        layer.trainable = False

    # learning_rate replaces the deprecated lr argument
    opt = SGD(learning_rate=1e-4, momentum=0.9)
    model.compile(loss="categorical_crossentropy", optimizer=opt,
        metrics=["accuracy"])
    return model
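Because every VGG16 layer is frozen, only the new head is trainable. For a 224x224 input, the VGG16 base with `include_top=False` ends in a 7x7x512 feature map (its five max-pooling layers each halve the spatial size), so the trainable parameter count can be worked out by hand. The arithmetic below is a sketch; its result should agree with the trainable-parameter total that `model.summary()` prints for `get_training_model()`. `dense_params` is a hypothetical helper.

```python
# VGG16's convolutional base has five 2x2 max-pools: 224 / 2**5 = 7
feature_map = (224 // 32, 224 // 32, 512)                     # (7, 7, 512)
flattened = feature_map[0] * feature_map[1] * feature_map[2]  # output of Flatten

def dense_params(n_in, n_out):
    """Hypothetical helper: weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Dense(512) + Dense(5); Flatten and Dropout have no parameters
head_params = dense_params(flattened, 512) + dense_params(512, 5)
print(flattened, head_params)  # 25088 12848133
```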

In [None]:
# Get the total number of images present in the
# root dataset directory
total_data = len(list(paths.list_images(flowers)))
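`paths.list_images` from imutils walks the directory tree and yields the paths of files with common image extensions. If you would rather not depend on imutils, a standard-library sketch along these lines should give the same count for this dataset; `count_images` and the extension list are assumptions for illustration, not imutils' exact implementation.

```python
import os

# Assumed set of image extensions (compared case-insensitively)
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff")

def count_images(root):
    """Hypothetical helper: count image files anywhere under `root`."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(1 for f in filenames if f.lower().endswith(IMAGE_EXTS))
    return total

# Usage (assuming `flowers` is the dataset directory from above):
# total_data = count_images(flowers)
```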

In [None]:
# Kickstart model training with tf.data
model = get_training_model()
start = time.time()
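model.fit(ds,
         # The generator loops forever, so define an epoch explicitly
         # (32 is the default batch_size of flow_from_directory)
         steps_per_epoch=total_data//32,
         epochs=5)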
print("It took {} seconds".format(time.time() - start))

Train for 114 steps
Epoch 1/5
Found 3670 images belonging to 5 classes.
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
It took 399.8591330051422 seconds


Training for five epochs with the `tf.data` pipeline took **399.85 seconds**. Let's now see how `ImageDataGenerator` performs on its own.

In [None]:
# Kickstart model training with ImageDataGenerator
model = get_training_model()
start = time.time()
model.fit_generator(train_gen,
                   # Same number of steps as the tf.data run for a fair comparison
                   steps_per_epoch=total_data//32,
                   epochs=5)
print("It took {} seconds".format(time.time() - start))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
It took 685.5545630455017 seconds


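Training with `ImageDataGenerator` directly took **685.55 seconds**, about 1.7 times as long as the `tf.data` run (399.85 seconds) in this experiment.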

Links to `tf.data` guides: 
- https://www.tensorflow.org/guide/data
- https://www.tensorflow.org/guide/data_performance