## The speed of your input pipeline counts
Here's a quick tip if you're using [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/overview) to import your datasets: use ```.cache()``` when writing your input pipeline. Otherwise, you'll be reading data off disk batch by batch, which can increase the training time of your model by about **2x** (more if you're using GPUs).

In this notebook, you'll create two identical models:

* The first will train about **2x** slower than the second.
* The only difference will be the input pipeline.

At the end of the notebook are resources you can use to learn more about writing efficient input pipelines.  

### Background: you may be used to loading toy datasets into memory

If you've previously worked with toy datasets like MNIST and libraries like Keras or Scikit-learn, you may know they import small datasets into memory by default. For example, this code:

```
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
```

downloads MNIST and returns NumPy arrays, as you would expect.

### TensorFlow Datasets downloads data to disk by default

[TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/overview) downloads datasets to disk. This makes sense, since many of them are huge. After you've downloaded a dataset, you need to write an [input pipeline](https://www.tensorflow.org/beta/tutorials/load_data/images) to read it, preprocess it, and feed it to your model. You can do that using tf.data, which at its best can be significantly faster than NumPy alone, but it does have a learning curve. This doesn't have to be complicated, though. If you're working with small datasets that fit into memory, you can simply use ```.cache()``` for better performance, so they behave as you'd expect.


For example, this code downloads a dataset to disk, and then reads it back batch by batch. This is probably not what you want:

```
# download and prepare a dataset
# (tfds, format_example, shuffle_size, and batch_size are assumed to be defined already)
dataset = tfds.load(name='mnist', as_supervised=True)
train_ds = dataset['train'].map(format_example)
oops = train_ds.shuffle(shuffle_size).batch(batch_size)

# iterate over it
for i, batch in enumerate(oops):
  pass  # do something with the batch
```

Each batch in the example above is loaded off disk when needed. Even simply iterating over this dataset will take a few seconds.

On the other hand, this code:

```
# use caching
better = train_ds.cache().shuffle(shuffle_size).batch(batch_size)
```

will keep an in-memory cache of the data. The cache is filled during the first epoch, so every epoch after the first will run as fast as you'd expect. Note that the behavior of ```.cache()``` was just updated, which is why we'll install the nightly branch below.
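
As a rough mental model of what caching buys you, here is a sketch in plain Python rather than tf.data. The names `CachedSource` and `slow_read` are hypothetical and invented for this illustration, and this is not how tf.data implements ```.cache()```. It only mirrors the behavior described above: the first pass pays for every read, and later passes are served from memory.

```python
class CachedSource:
    """Toy stand-in for a cached dataset (illustration only, not tf.data)."""

    def __init__(self, read_fn, num_items):
        self._read_fn = read_fn      # hypothetical "expensive" reader, e.g. a disk read
        self._num_items = num_items
        self._cache = None           # filled after the first complete pass
        self.reads = 0               # counts calls to the expensive reader

    def __iter__(self):
        if self._cache is not None:
            # Later passes: serve everything from memory.
            yield from self._cache
            return
        # First pass: read each item and remember it.
        items = []
        for i in range(self._num_items):
            self.reads += 1
            item = self._read_fn(i)
            items.append(item)
            yield item
        self._cache = items  # only reached if the pass ran to completion


def slow_read(i):
    # Stand-in for reading one example off disk.
    return i * i


source = CachedSource(slow_read, num_items=5)
first_epoch = list(source)
second_epoch = list(source)
print(source.reads)  # 5: the second pass did not call slow_read again
```

Both passes return the same items, but the expensive reader only runs during the first one.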

For small datasets, you can also use a recently added ```in_memory``` flag, like this:

```
dataset = tfds.load(name='mnist', as_supervised=True, in_memory=True)
```

### Install the nightly branch of TF 2.0 beta

This notebook uses an update in the nightly branch. Notice that we're installing the build from a specific day rather than the latest (which you can install with ```!pip install tf-nightly-2.0-preview```). That's to make this demo reproducible. Also, we're using the CPU version here. If you want the GPU version, you can use ```!pip install tf-nightly-gpu-2.0-preview```.

In [1]:
!pip install tf-nightly-2.0-preview==2.0.0.dev20190815 -q

### Import TensorFlow and other libraries


In [2]:
import tensorflow as tf
print(tf.__version__)

import tensorflow_datasets as tfds
tfds.disable_progress_bar()

import time

BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 10000

2.0.0-dev20190815


In [0]:
# If this fails, you may need to restart your runtime after installing 
# a new version of TF above (Runtime -> restart)
assert tf.__version__ == "2.0.0-dev20190815" 

### Create a tiny model
We'll train two identical copies.

In [0]:
def tiny_model():
  model = tf.keras.models.Sequential([
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(10, activation='softmax')
  ])
  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
  return model

### Download a small dataset using TensorFlow Datasets

In [5]:
dataset, info = tfds.load(name='mnist', as_supervised=True, with_info=True)

Downloading and preparing dataset mnist (11.06 MiB) to /root/tensorflow_datasets/mnist/1.0.0...


W0816 11:31:04.871948 140158811002752 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_datasets/core/file_format_adapter.py:209: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`


Dataset mnist downloaded and prepared to /root/tensorflow_datasets/mnist/1.0.0. Subsequent calls will reuse this data.


### A little preprocessing function

In [0]:
def format_example(image, label):
  image = tf.cast(image, tf.float32)
  return image, label

In [0]:
train_ds = dataset['train'].map(format_example)

### A slow input pipeline

Oops! This pipeline reads data off disk batch by batch, which slows down training by about 2x (or more).

In [0]:
slow_ds = train_ds.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)

### A faster pipeline
This one uses ```.cache()``` to keep images in memory. The first epoch will run at about the same speed as the slow pipeline, because the cache is still being built. Afterwards, the model will train much faster.

In [0]:
fast_ds = train_ds.cache().shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
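
Before caching a dataset, it helps to check that it really fits in memory. The arithmetic below is a back-of-the-envelope sketch: `estimated_cache_mib` is a hypothetical helper written for this note, and it assumes the MNIST training split holds 60,000 images of shape 28x28x1, stored as float32 after `format_example`. Labels and any per-element overhead are ignored.

```python
def estimated_cache_mib(num_examples, example_shape, bytes_per_value):
    """Rough size of the cached image tensors in MiB (labels and overhead ignored)."""
    values_per_example = 1
    for dim in example_shape:
        values_per_example *= dim
    return num_examples * values_per_example * bytes_per_value / 2**20


# MNIST training split: 60,000 images of shape 28x28x1.
# format_example casts to float32 (4 bytes per value) before .cache() runs.
mnist_train_mib = estimated_cache_mib(60_000, (28, 28, 1), 4)
print(f"{mnist_train_mib:.1f} MiB")  # 179.4 MiB
```

Because ```.cache()``` comes after ```.map(format_example)``` here, the cache holds the float32 images. Caching before the map would store the original uint8 pixels (about 4x smaller), at the cost of repeating the cast every epoch.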

### Compare the two
Compare the time taken for the first and subsequent epochs with the slow and fast input pipelines. With the fast pipeline, the first epoch should take about as long as with the slow one (while the cache is being built), and the second and third will be much faster.

In [10]:
tiny_model().fit(slow_ds, epochs=3)

W0816 11:31:11.828142 140158811002752 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:466: BaseResourceVariable.constraint (from tensorflow.python.ops.resource_variable_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Apply a constraint manually following the optimizer update step.


Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7f7902fec828>

In [11]:
tiny_model().fit(fast_ds, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7f790293f588>

When using a GPU, the performance difference will be greater. 
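
To see the difference without training a model, you can also time a plain pass over each dataset. The helper below is a minimal sketch: `time_epochs` is a hypothetical name, not part of TensorFlow, and it is demonstrated on a plain Python list so that it runs anywhere. It should work the same way on `slow_ds` and `fast_ds`, on the assumption that tf.data datasets can be iterated directly in TF 2.x.

```python
import time


def time_epochs(dataset, epochs=3):
    """Return the seconds taken by each full pass over `dataset`."""
    durations = []
    for _ in range(epochs):
        start = time.perf_counter()
        for _batch in dataset:
            pass  # just pull each batch; no model involved
        durations.append(time.perf_counter() - start)
    return durations


# Demonstrated on a plain list so the sketch runs without TensorFlow.
# In the notebook, you could call time_epochs(slow_ds) and time_epochs(fast_ds).
durations = time_epochs([[1, 2], [3, 4]], epochs=3)
print(len(durations))  # 3
```

With the cached pipeline, the first entry should be noticeably larger than the rest, since that pass is the one that fills the cache.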


### Next steps

* To learn more about writing efficient input pipelines using tf.data, including how to cache expensive preprocessing and how to handle datasets that do not fit into memory, see this [guide](https://www.tensorflow.org/beta/tutorials/load_data/images). That guide also has a nice benchmark utility you can use to iterate over datasets (without training a model) to see the performance.

* For work in progress to make this all easier, see this [Request for Comments](https://github.com/keras-team/governance/pull/6) on the Keras Preprocessing updates.