The **tf.data.Dataset** API supports writing descriptive and efficient input pipelines. **Dataset** usage follows a common pattern:
1. Create a source dataset from your input data.
2. Apply dataset transformations to preprocess the data.
3. Iterate over the dataset and process the elements.
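
The three steps above can be sketched in a few lines. This is a minimal illustration only; the input values and the doubling transformation are arbitrary choices made for the example.

```python
import tensorflow as tf

# 1. Create a source dataset from in-memory data (values are arbitrary).
ds = tf.data.Dataset.from_tensor_slices([8, 3, 0, 8, 2, 1])

# 2. Apply a transformation to preprocess the data (here: double each element).
ds = ds.map(lambda x: x * 2)

# 3. Iterate over the dataset and process the elements.
result = [int(x.numpy()) for x in ds]
print(result)  # [16, 6, 0, 16, 4, 2]
```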

Some of the most popular and useful methods of tf.data objects are:
- as_numpy_iterator()
- cache()
- shuffle()
- batch()
- map()
- prefetch()
- zip()
- take()
- skip()
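
Of the methods listed above, `take()`, `skip()` and `zip()` are the simplest to try in isolation. The following is a small sketch on a toy range dataset; the variable names are chosen only for illustration.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(1, 11)  # elements 1..10

first_three = ds.take(3)  # keeps only the first 3 elements
rest = ds.skip(3)         # drops the first 3 elements, keeps the remainder

# zip pairs elements position by position and stops at the shorter dataset.
pairs = tf.data.Dataset.zip((first_three, rest))

taken = [int(x) for x in first_three.as_numpy_iterator()]
skipped = [int(x) for x in rest.as_numpy_iterator()]
zipped = [(int(a), int(b)) for a, b in pairs.as_numpy_iterator()]

print(taken)    # [1, 2, 3]
print(skipped)  # [4, 5, 6, 7, 8, 9, 10]
print(zipped)   # [(1, 4), (2, 5), (3, 6)]
```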

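In particular, the following methods are commonly chained, in roughly this order, when building a tf.data pipeline for model training. Placing shuffle() after cache() keeps the cached data from being frozen in a single shuffled order:
- map()
- cache()
- shuffle()
- batch()
- prefetch()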
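
A minimal sketch of such a training pipeline on a toy dataset is shown below. The ordering used here (map and cache before shuffle) is one common choice and is an assumption of this sketch, made so that the cached elements are not stored in a single fixed shuffled order. The doubling function and the batch size are arbitrary.

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE  # lets tf.data pick parallelism and buffer sizes

ds = tf.data.Dataset.from_tensor_slices(tf.range(1, 21))  # 20 elements

pipeline = (
    ds.map(lambda x: x * 2, num_parallel_calls=AUTOTUNE)  # preprocessing
      .cache()                  # keep the preprocessed elements in memory
      .shuffle(buffer_size=20)  # buffer covers the whole dataset
      .batch(4)                 # group 4 consecutive elements per batch
      .prefetch(AUTOTUNE)       # prepare later batches while the current one is used
)

batches = list(pipeline.as_numpy_iterator())
print(len(batches))      # 5
print(batches[0].shape)  # (4,)
```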




In [36]:
import tensorflow as tf
import numpy as np

In [37]:
ds=tf.data.Dataset.from_tensor_slices(tf.range(1,21))

# The dataset is created. `ds` is an iterable: each loop over it creates a fresh iterator.

In [38]:
for element in ds: 
    print(element.numpy()) 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20


In [39]:
# .as_numpy_iterator(): returns an iterator that converts all elements of the dataset to NumPy.

npy_iter = ds.as_numpy_iterator()
list(ds.as_numpy_iterator())  # We can also collect all the elements into a list

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

In [22]:
for arr in npy_iter:
    print(arr)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20


In [23]:
# Try to collect the elements of the same iterator into a list

print(list(npy_iter))  # Prints an empty list because the iterator has already been exhausted

[]
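
The empty result above comes from the iterator, not from the dataset: the dataset itself can be iterated again by requesting a new iterator. A short sketch of the difference, using a smaller toy dataset:

```python
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices(tf.range(1, 6))

it = ds.as_numpy_iterator()
first_pass = [int(x) for x in it]   # consumes the iterator
second_pass = [int(x) for x in it]  # the same iterator has nothing left

# Asking the dataset for a new iterator starts again from the first element.
fresh_pass = [int(x) for x in ds.as_numpy_iterator()]

print(first_pass)   # [1, 2, 3, 4, 5]
print(second_pass)  # []
print(fresh_pass)   # [1, 2, 3, 4, 5]
```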


In [27]:
# .batch()

# This method combines consecutive elements into batches, so each iteration yields one batch of elements

ds=tf.data.Dataset.from_tensor_slices(tf.range(1,21))

ds_batch=ds.batch(4)


for element in ds_batch:
    print(element)

print()
print(list(ds.batch(5).as_numpy_iterator()))

tf.Tensor([1 2 3 4], shape=(4,), dtype=int32)
tf.Tensor([5 6 7 8], shape=(4,), dtype=int32)
tf.Tensor([ 9 10 11 12], shape=(4,), dtype=int32)
tf.Tensor([13 14 15 16], shape=(4,), dtype=int32)
tf.Tensor([17 18 19 20], shape=(4,), dtype=int32)

[array([1, 2, 3, 4, 5]), array([ 6,  7,  8,  9, 10]), array([11, 12, 13, 14, 15]), array([16, 17, 18, 19, 20])]
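
In the outputs above the batch size divides the dataset size evenly. When it does not, the last batch is smaller unless `drop_remainder=True` is passed. A small sketch, with a batch size of 6 chosen only to force a remainder:

```python
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices(tf.range(1, 21))  # 20 elements

# 20 is not divisible by 6, so the final batch holds only 2 elements.
keep_last = [b.tolist() for b in ds.batch(6).as_numpy_iterator()]

# drop_remainder=True discards the incomplete final batch.
drop_last = [b.tolist() for b in ds.batch(6, drop_remainder=True).as_numpy_iterator()]

print([len(b) for b in keep_last])  # [6, 6, 6, 2]
print([len(b) for b in drop_last])  # [6, 6, 6]
```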


In [28]:
# .map(): similar to Python's built-in map; applies a function to each element

ds=tf.data.Dataset.from_tensor_slices(tf.range(1,21))
ds=ds.map(lambda x: x**2)

print(list(ds.as_numpy_iterator()))

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400]
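
`map()` also accepts a named function and can return several values per element, which is a common way to produce (feature, label) pairs. The sketch below uses a hypothetical `to_pair` helper and asks tf.data to choose the degree of parallelism with `tf.data.AUTOTUNE`; element order is preserved by default.

```python
import tensorflow as tf

def to_pair(x):
    # Hypothetical preprocessing step: return (feature, label) = (x, x squared).
    return x, x ** 2

ds = tf.data.Dataset.from_tensor_slices(tf.range(1, 6))
ds = ds.map(to_pair, num_parallel_calls=tf.data.AUTOTUNE)

pairs = [(int(a), int(b)) for a, b in ds.as_numpy_iterator()]
print(pairs)  # [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
```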


In [35]:

# .cache()
# This method caches the elements of the dataset. The first time the dataset is iterated over, its elements are cached either in the specified file or in memory.
# Subsequent iterations use the cached data.
# This method is especially useful when preprocessing takes a lot of time.

In [40]:
ds=tf.data.Dataset.from_tensor_slices(tf.range(1,21))
ds=ds.map(lambda x: x+3) # Apply mapping or other preprocessing before caching
ds=ds.cache() # With no filename argument, the elements are cached in memory. If the dataset is too large to fit in memory, pass a file path (in an existing directory) to cache it on disk instead


for element in ds:
    print(element)

tf.Tensor(4, shape=(), dtype=int32)
tf.Tensor(5, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(7, shape=(), dtype=int32)
tf.Tensor(8, shape=(), dtype=int32)
tf.Tensor(9, shape=(), dtype=int32)
tf.Tensor(10, shape=(), dtype=int32)
tf.Tensor(11, shape=(), dtype=int32)
tf.Tensor(12, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)
tf.Tensor(14, shape=(), dtype=int32)
tf.Tensor(15, shape=(), dtype=int32)
tf.Tensor(16, shape=(), dtype=int32)
tf.Tensor(17, shape=(), dtype=int32)
tf.Tensor(18, shape=(), dtype=int32)
tf.Tensor(19, shape=(), dtype=int32)
tf.Tensor(20, shape=(), dtype=int32)
tf.Tensor(21, shape=(), dtype=int32)
tf.Tensor(22, shape=(), dtype=int32)
tf.Tensor(23, shape=(), dtype=int32)
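
One way to see the effect of `cache()` is to count how often the preprocessing function actually runs. The sketch below wraps a Python function with `tf.py_function` so that a plain Python counter can observe the calls; the `slow_add` name and the counter are illustrative assumptions, not part of the tf.data API.

```python
import tensorflow as tf

calls = {"n": 0}  # counts how many times the preprocessing function runs

def slow_add(x):
    # Stand-in for an expensive preprocessing step.
    calls["n"] += 1
    return x + 3

ds = tf.data.Dataset.from_tensor_slices(tf.range(1, 6))
ds = ds.map(lambda x: tf.py_function(slow_add, [x], tf.int32))
ds = ds.cache()  # in-memory cache, filled during the first full iteration

first = [int(x) for x in ds.as_numpy_iterator()]   # runs slow_add 5 times
second = [int(x) for x in ds.as_numpy_iterator()]  # served from the cache

print(first)       # [4, 5, 6, 7, 8]
print(second)      # [4, 5, 6, 7, 8]
print(calls["n"])  # 5
```

Note that the cache is only complete once the dataset has been iterated to the end, which is why the first pass above consumes every element.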


In [46]:

"""
- .shuffle(buffer_size, seed=None, reshuffle_each_iteration=None, name=None): Randomly shuffles the elements of this dataset.

This dataset fills a buffer with buffer_size elements, then randomly samples elements from this buffer, replacing the selected elements with new elements.
For perfect shuffling, a buffer size greater than or equal to the full size of the dataset is required.

For instance, if your dataset contains 10,000 elements but buffer_size is set to 1,000, then shuffle will initially select a random element from only the first 1,000 elements
in the buffer. Once an element is selected, its space in the buffer is replaced by the next (i.e. 1,001-st) element, maintaining the 1,000 element buffer.

reshuffle_each_iteration controls whether the shuffle order should be different for each epoch. In TF 1.X, the idiomatic way to create epochs was through the repeat transformation:

""";


dataset = tf.data.Dataset.range(3)
dataset = dataset.shuffle(3, reshuffle_each_iteration=True)
dataset = dataset.repeat(2)
# Possible output: [1, 0, 2, 1, 2, 0] (each epoch is shuffled differently)

dataset = tf.data.Dataset.range(3)
dataset = dataset.shuffle(3, reshuffle_each_iteration=False)
dataset = dataset.repeat(2)
# Possible output: [1, 0, 2, 1, 0, 2] (both epochs share the same order)
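
The buffer behaviour described above can be checked on a small dataset. The sketch below makes no assumption about the exact shuffled order (it is random); it only checks properties that follow from the buffer mechanism. The dataset size and buffer sizes are arbitrary.

```python
import tensorflow as tf

buffer_size = 3
ds = tf.data.Dataset.range(10)

# A buffer of one element always returns that element, so the order is unchanged.
unshuffled = [int(x) for x in ds.shuffle(1).as_numpy_iterator()]

# With a small buffer, the element at output position i can only come from
# the first i + buffer_size input elements.
partial = [int(x) for x in ds.shuffle(buffer_size).as_numpy_iterator()]
within_bound = all(v < i + buffer_size for i, v in enumerate(partial))

# reshuffle_each_iteration=False reuses the same order on every pass.
fixed = ds.shuffle(10, reshuffle_each_iteration=False)
epoch_1 = [int(x) for x in fixed.as_numpy_iterator()]
epoch_2 = [int(x) for x in fixed.as_numpy_iterator()]

print(unshuffled)          # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(within_bound)        # True
print(epoch_1 == epoch_2)  # True
```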

In [None]:
"""
.prefetch(): Most dataset input pipelines should end with a call to prefetch. This allows later elements to be prepared while the current element is being processed.
This often improves latency and throughput, at the cost of using additional memory to store prefetched elements.

Note: Like other Dataset methods, prefetch operates on the elements of the input dataset. It has no concept of examples vs. batches. examples.prefetch(2) will prefetch two elements (2 examples), 
while examples.batch(20).prefetch(2) will prefetch 2 elements (2 batches, of 20 examples each).

""";
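
The difference between prefetching examples and prefetching batches can be sketched as follows. The `examples` name mirrors the note above. Prefetching changes only when elements are prepared, not what they contain, so the outputs are the same as without it.

```python
import tensorflow as tf

examples = tf.data.Dataset.range(100)

per_example = examples.prefetch(2)          # buffers up to 2 single examples
per_batch = examples.batch(20).prefetch(2)  # buffers up to 2 batches of 20 examples

# tf.data.AUTOTUNE lets the runtime choose the prefetch buffer size.
auto = examples.batch(20).prefetch(tf.data.AUTOTUNE)

example_values = [int(x) for x in per_example.as_numpy_iterator()]
batch_shapes = [b.shape for b in per_batch.as_numpy_iterator()]
auto_shapes = [b.shape for b in auto.as_numpy_iterator()]

print(example_values[:5])  # [0, 1, 2, 3, 4]
print(batch_shapes)        # [(20,), (20,), (20,), (20,), (20,)]
```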