<h3 align="center" style='color:blue'>Optimize TensorFlow pipeline performance with prefetch and caching</h3>

In [19]:
import tensorflow as tf
import time

In [20]:
tf.__version__

'2.11.0'

<h3 style='color:purple'>Prefetch</h3>

In [21]:
class FileDataset(tf.data.Dataset):
    def read_file_in_batches(num_samples):
        # Opening the file. This time corresponds to the "Open" time in the graph of the cache part shown below.
        time.sleep(0.03)

        for sample_idx in range(num_samples):
            # Reading data (line, record) from the file. This is the task that is typically executed by the CPU
            time.sleep(0.015)

            yield (sample_idx,)

    def __new__(cls, num_samples=3):
        return tf.data.Dataset.from_generator(
            cls.read_file_in_batches,
            output_signature = tf.TensorSpec(shape = (1,), dtype = tf.int64),
            args=(num_samples,)
        )

In [22]:
def benchmark(dataset, num_epochs=2):
    for epoch_num in range(num_epochs):
        for sample in dataset:
            # Performing a training step. This is the task that is typically executed by the GPU
            time.sleep(0.01)

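The `time.sleep` calls above stand in for the stages shown in the timelines below: opening the file (0.03 s), reading one sample (0.015 s), and one training step (0.01 s).

First, the execution of the tasks without the prefetch() function: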

![alt text](Imagen2.png "Title")

Now, the same tasks using the prefetch() function:

![alt text](Imagen1.png "Title")
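The two timelines above can be turned into a back-of-the-envelope estimate. The sketch below is an idealized model, not a measurement: it assumes only the three sleep durations used in `FileDataset` and `benchmark`, ignores all TensorFlow and Python overhead, and assumes that with prefetch the read of the next sample overlaps perfectly with the current training step. The function names are illustrative and do not come from TensorFlow.

```python
def naive_epoch_time(num_samples, open_s=0.03, read_s=0.015, train_s=0.01):
    """Idealized epoch time when reading and training strictly alternate."""
    return open_s + num_samples * (read_s + train_s)


def prefetched_epoch_time(num_samples, open_s=0.03, read_s=0.015, train_s=0.01):
    """Idealized epoch time when reading sample i+1 overlaps training on sample i."""
    if num_samples == 0:
        return open_s
    # The first read and the last training step have nothing to overlap with;
    # every step in between costs whichever of the two stages is slower.
    return open_s + read_s + (num_samples - 1) * max(read_s, train_s) + train_s


num_epochs, num_samples = 2, 3  # defaults used by benchmark() and FileDataset()
print(f"naive:      {num_epochs * naive_epoch_time(num_samples):.3f} s")       # 0.210 s
print(f"prefetched: {num_epochs * prefetched_epoch_time(num_samples):.3f} s")  # 0.170 s
```

Measured times will be higher than these lower bounds, since the model leaves out everything except the three sleeps.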

In [26]:
%%timeit
benchmark(FileDataset())

343 ms ± 16.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [27]:
%%timeit
benchmark(FileDataset().prefetch(1))

354 ms ± 13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [28]:
%%timeit
benchmark(FileDataset().prefetch(tf.data.AUTOTUNE))

353 ms ± 7.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


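**In the run recorded above, prefetch did not produce a measurable speedup: 343 ms without prefetch versus 354 ms with prefetch(1) and 353 ms with prefetch(tf.data.AUTOTUNE), which is within the reported run-to-run variation.** Prefetch helps when data loading can overlap with the training step, so the actual gain depends on the machine and the TensorFlow version.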

<h3 style='color:purple'>Cache</h3>

First, the execution of the tasks without the cache() function:

![alt text](Imagen3.png "Title")

Now, the same tasks using the cache() function:

![alt text](Imagen4.png "Title")

In [4]:
dataset = tf.data.Dataset.range(5)
dataset = dataset.map(lambda x: x**2)
dataset = dataset.cache("mycache.txt")
# The first time reading through the data will generate the data using
# `range` and `map`.
list(dataset.as_numpy_iterator())

Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089


[0, 1, 4, 9, 16]

In [None]:
# Subsequent iterations read from the cache.
list(dataset.as_numpy_iterator())

In [None]:
def mapped_function(s):
    # Do some hard pre-processing
    tf.py_function(lambda: time.sleep(0.03), [], ())
    return s

In [None]:
%%timeit -r1 -n1
benchmark(FileDataset().map(mapped_function), 5)

In [None]:
%%timeit -r1 -n1
benchmark(FileDataset().map(mapped_function).cache(), 5)
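The reason caching pays off in the two cells above can be illustrated without TensorFlow. The following is a toy model only, not how `tf.data` implements `cache()`: `ReplayCache`, `expensive`, and `pipeline` are hypothetical names introduced for this sketch. It counts how often the expensive step runs across three epochs when the results of the first complete pass are stored and replayed.

```python
class ReplayCache:
    """Toy stand-in for a cached pipeline stage (not tf.data's implementation)."""

    def __init__(self, make_iterable):
        self._make_iterable = make_iterable  # callable that rebuilds the upstream pipeline
        self._items = None                   # filled in after the first complete pass

    def __iter__(self):
        if self._items is not None:
            # Later epochs: replay stored results, upstream work is skipped.
            yield from self._items
            return
        items = []
        for item in self._make_iterable():
            items.append(item)
            yield item
        # Only a pass that ran to the end populates the cache.
        self._items = items


calls = {"count": 0}


def expensive(x):
    """Placeholder for costly preprocessing; counts how often it runs."""
    calls["count"] += 1
    return x ** 2


def pipeline():
    return (expensive(x) for x in range(5))


cached = ReplayCache(pipeline)
epochs = [list(cached) for _ in range(3)]
print(epochs[0], calls["count"])  # [0, 1, 4, 9, 16] 5
```

Without the cache, three epochs would call `expensive` 15 times; with it, only the 5 calls of the first epoch are made.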

<h3 style='color:purple'>About the shuffle and cache functions</h3>

In [5]:
tf_dataset = dataset.shuffle(3).cache()

In [29]:
for element in tf_dataset:
    print(element.numpy())

4
0
9
16
1


In [30]:
dataset2 = tf.data.Dataset.range(5)
dataset2 = dataset2.map(lambda x: x**2)
list(dataset2.as_numpy_iterator())


[0, 1, 4, 9, 16]

In [31]:
tf_dataset2 = dataset2.shuffle(3)

In [33]:
for element in tf_dataset2:
    print(element.numpy())

1
0
4
16
9
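Each cell above shows a single pass over a dataset, which does not reveal how cache() interacts with shuffle() across epochs. A minimal sketch follows, assuming the usual `tf.data` behavior that `shuffle()` reshuffles on every iteration by default and that `cache()` replays whatever its first complete pass produced. The names `base`, `frozen`, and `reshuffled` are illustrative only.

```python
import tensorflow as tf

base = tf.data.Dataset.range(5).map(lambda x: x ** 2)

# shuffle -> cache: the first full pass fixes one shuffled order,
# and every later pass replays that same order from the cache.
frozen = base.shuffle(5).cache()
frozen_epochs = [list(frozen.as_numpy_iterator()) for _ in range(3)]

# cache -> shuffle: the mapped values are cached once,
# and the shuffle is applied again on every pass.
reshuffled = base.cache().shuffle(5)
reshuffled_epochs = [list(reshuffled.as_numpy_iterator()) for _ in range(3)]

for epoch, (a, b) in enumerate(zip(frozen_epochs, reshuffled_epochs)):
    print(f"epoch {epoch}: shuffle->cache {a}   cache->shuffle {b}")
```

If a fresh order per epoch is wanted during training, placing `cache()` before `shuffle()` is usually the safer ordering.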


**Further reading** https://www.tensorflow.org/guide/data_performance#caching