# Using the TensorFlow DALI plugin: DALI tf.data.Dataset

### Overview

In this tutorial you will learn how to integrate a DALI pipeline with the `tf.data` API and use it for training with various TensorFlow APIs. We will use the well-known MNIST dataset converted to JPEGs, which is available ready to use in the DALI_extra repository.

Let's start with creating a pipeline to read MNIST images.

In [55]:
import nvidia.dali as dali
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

import os

# Path to MNIST dataset
data_path = os.path.join(os.environ['DALI_EXTRA_PATH'], 'db/MNIST/training/')


class MnistPipeline(Pipeline):
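    def __init__(self, device, device_id=0, num_threads=4, seed=0):
        # batch_size is a notebook-level variable defined in the next cell;
        # it must be set before the pipeline is instantiated
        super(MnistPipeline, self).__init__(
            batch_size, num_threads, device_id, seed)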
        self.device = device
        self.reader = ops.FileReader(file_root=data_path, random_shuffle=True)
        self.decode = ops.ImageDecoder(
            device='mixed' if device == 'gpu' else 'cpu',
            output_type=types.GRAY)
        self.cmn = ops.CropMirrorNormalize(
            device=device,
            output_dtype=types.FLOAT,
            image_type=types.GRAY,
            mean=[0.],
            std=[255.],
            output_layout=types.NCHW)

    def define_graph(self):
        inputs, labels = self.reader(name="Reader")
        images = self.decode(inputs)
        if self.device == 'gpu':
            labels = labels.gpu()
        images = self.cmn(images)

        return (images, labels)
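
The `CropMirrorNormalize` settings above (`mean=[0.]`, `std=[255.]`) scale 8-bit pixel values into the range [0, 1] by computing `(pixel - mean) / std`. Below is a minimal plain-Python sketch of that arithmetic, independent of DALI. The `normalize` helper and the tiny sample image are hypothetical and serve only as an illustration.

```python
def normalize(pixels, mean=0.0, std=255.0):
    """Apply (pixel - mean) / std to every value in a 2D list of pixels."""
    return [[(p - mean) / std for p in row] for row in pixels]


# A tiny 2x2 grayscale "image" with 8-bit values (made-up data)
image = [[0, 51],
         [102, 255]]

normalized = normalize(image)
print(normalized)
```

The darkest pixel maps to 0.0 and the brightest to 1.0, which is the input range the models below are trained on.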

Now we define some parameters of the training:

In [56]:
batch_size = 32
dropout = 0.2
image_size = 28
num_classes = 10
hidden_size = 128
epochs = 5
iterations = 100

Now, instead of the usual workflow of building a pipeline, we wrap it with a `DALIDataset` object from the DALI TensorFlow plugin. This class is compatible with `tf.data.Dataset`. Along with the pipeline, we need to pass the expected shapes and types of its outputs.

In [57]:
import nvidia.dali.plugin.tf as dali_tf
import tensorflow as tf

# Create pipeline
mnist_pipeline = MnistPipeline(device='cpu', device_id=0)

# Define shapes and types of the outputs
shapes = [
    (batch_size, image_size, image_size),
    (batch_size,)]
dtypes = [
    tf.float32,
    tf.int32]

# Create dataset
mnist_set = dali_tf.DALIDataset(
    pipeline=mnist_pipeline,
    batch_size=batch_size,
    shapes=shapes,
    dtypes=dtypes,
    device_id=0)
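
The shapes passed to `DALIDataset` are the per-sample shapes with the batch dimension prepended. A minimal sketch of deriving them follows; the `batched_shapes` helper is hypothetical and not part of the DALI plugin. Note that a one-element tuple is written with a trailing comma, since `(32)` is just the integer 32.

```python
def batched_shapes(batch_size, sample_shapes):
    """Prepend the batch dimension to each per-sample shape."""
    return [(batch_size,) + tuple(shape) for shape in sample_shapes]


batch_size = 32
image_size = 28

# One image_size x image_size image and one scalar label per sample
shapes = batched_shapes(batch_size, [(image_size, image_size), ()])
print(shapes)  # [(32, 28, 28), (32,)]
```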

We are ready to start the training. 

### Keras

First, we will pass `mnist_set` to a `tf.keras` model.

In [58]:
# Define the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(image_size, image_size), name='images'),
    tf.keras.layers.Flatten(input_shape=(image_size, image_size)),
    tf.keras.layers.Dense(hidden_size, activation='relu'),
    tf.keras.layers.Dropout(dropout),
    tf.keras.layers.Dense(num_classes, activation='softmax')])
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

# Train using DALI dataset
model.fit(
    mnist_set,
    epochs=epochs,
    steps_per_epoch=iterations)

Train on 100 steps
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f4dbc272e48>

As you can see, it was very easy to integrate a DALI pipeline with the `tf.keras` API.

The code above performed the training using the CPU. We can easily move the whole processing to the GPU. Both the DALI pipeline and the Keras model will then use the GPU, without any CPU buffer between them.

In [59]:
# Create pipeline
mnist_pipeline = MnistPipeline(device='gpu', device_id=0)

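# Rebuild the dataset from the GPU pipeline, then define the model;
# both are placed on the GPU
with tf.device('/gpu:0'):
    mnist_set = dali_tf.DALIDataset(
        pipeline=mnist_pipeline,
        batch_size=batch_size,
        shapes=shapes,
        dtypes=dtypes,
        device_id=0)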
    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(image_size, image_size), name='images'),
        tf.keras.layers.Flatten(input_shape=(image_size, image_size)),
        tf.keras.layers.Dense(hidden_size, activation='relu'),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(num_classes, activation='softmax')])
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    
# Train on the GPU. Data pipeline will be using the GPU as well.
model.fit(
    mnist_set,
    epochs=epochs,
    steps_per_epoch=iterations)

Train on 100 steps
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f4aa52bcbe0>

That is all that was needed to use the GPU as a training accelerator.

### Estimators

In the next section we will use the `tf.estimator` API instead of `tf.keras`.

In [66]:
# Disable eager execution before building the estimator
tf.compat.v1.disable_eager_execution()

# Define the feature columns
feature_columns = [tf.feature_column.numeric_column(
    "images", shape=[image_size, image_size])]

# And the run config
# run_config = tf.estimator.RunConfig(
#     model_dir='/tmp/tensorflow-checkpoints',
#     device_fn=lambda op: '/gpu:0')

# Finally create the model
model = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[hidden_size],
    n_classes=num_classes,
    dropout=dropout,
#     config=run_config,
    optimizer='Adam')

# In tf.estimator, data is passed through a function that returns the dataset
mnist_pipeline = MnistPipeline(device='gpu', device_id=0)

def train_data_fn():
#     with tf.device('/gpu:0'):
    mnist_set = dali_tf.DALIDataset(
        pipeline=mnist_pipeline,
        batch_size=batch_size,
        shapes=shapes,
        dtypes=dtypes,
        device_id=0)
    mnist_set = mnist_set.map(
        lambda features, labels: ({'images': features}, labels))
        
    return mnist_set

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpzd615gvd', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f4a84ae7550>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
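
The `map` call in `train_data_fn` wraps each batch of images in a dictionary because the estimator looks up features by name, and the key has to match the name given to `numeric_column` (`"images"`). Below is a plain-Python stand-in for that transformation, using an ordinary list in place of a `tf.data.Dataset`. The `to_estimator_format` helper and the `fake_batches` data are hypothetical.

```python
def to_estimator_format(features, labels):
    """Wrap features in a dict keyed by the feature column name."""
    return {'images': features}, labels


# Stand-in for a dataset yielding (features, labels) pairs (made-up data)
fake_batches = [([0.0, 0.5], 3), ([1.0, 0.25], 7)]

mapped = [to_estimator_format(f, l) for f, l in fake_batches]
for features, labels in mapped:
    print(features, labels)
```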


In [68]:
# Running the training
model.train(input_fn=train_data_fn, steps=epochs * iterations)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpzd615gvd/model.ckpt-500
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 500 into /tmp/tmpzd615gvd/model.ckpt.
INFO:tensorflow:loss = 0.5080383, step = 500
INFO:tensorflow:global_step/sec: 207.625
INFO:tensorflow:loss = 0.4004699, step = 600 (0.484 sec)
INFO:tensorflow:global_step/sec: 211.015
INFO:tensorflow:loss = 0.17886795, step = 700 (0.474 sec)
INFO:tensorflow:global_step/sec: 212.544
INFO:tensorflow:loss = 0.15201095, step = 800 (0.471 sec)
INFO:tensorflow:global_step/sec: 210.254
INFO:tensorflow:loss = 0.18331778, step = 900 (0.475 sec)
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/tmpzd615gvd/model.ckpt.
INFO:tensorflow:Loss for final step: 0.06891136.


<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7f4aa5572780>
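
Both training runs are bounded by step counts rather than by full passes over the data. Here is a short worked calculation using the parameters defined earlier. The 60,000-image figure is the size of the standard MNIST training set and is an assumption about the copy shipped in DALI_extra.

```python
batch_size = 32
epochs = 5
iterations = 100

total_steps = epochs * iterations        # value passed as steps to model.train
samples_seen = total_steps * batch_size  # images consumed over the whole run

# Assumption: the standard MNIST training set of 60,000 images
dataset_size = 60_000
passes = samples_seen / dataset_size

print(total_steps, samples_seen, round(passes, 3))
```

With these settings each run consumes 16,000 images, roughly a quarter of the assumed training set, so an "epoch" here is a block of 100 steps rather than a full pass over the data. The 500 steps also match the estimator log above, where the global step advances from 500 to 1000.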