The **tf.data.Dataset** API supports writing descriptive and efficient input pipelines. **Dataset** usage follows a common pattern:
1. Create a source dataset from your input data.
2. Apply dataset transformations to preprocess the data.
3. Iterate over the dataset and process the elements.
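
The three steps above can be sketched in a few lines. This is a minimal illustration, assuming TensorFlow 2.x with eager execution; the doubling transformation and the variable name `doubled` are chosen only for the example.

```python
import tensorflow as tf

# 1. Create a source dataset from in-memory data.
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])

# 2. Apply a transformation (here: double every element).
dataset = dataset.map(lambda x: x * 2)

# 3. Iterate over the dataset and process each element.
doubled = [int(x) for x in dataset.as_numpy_iterator()]
print(doubled)  # [2, 4, 6]
```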

In [2]:
import tensorflow as tf
import numpy as np

In [3]:
### Creating a dataset from a list
dataset=tf.data.Dataset.from_tensor_slices([1,2,3])

In [5]:
for element in dataset:
    print(element)

tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)


Methods of datasets

In [19]:
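# as_numpy_iterator(): returns an iterator that converts all elements of the dataset to NumPy.
d=dataset.as_numpy_iterator()
print('type(d) :', d)

## We can call next() to step through d.
## Note: judging by the In [ ] counters, this cell was run after the map(lambda x: x**2) cell shown further down,
## so the values printed are the squares (1, 4) rather than 1, 2.

print('d.next() :',d.next())
print('d.next() :',d.next())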


type(d) : <tensorflow.python.data.ops.dataset_ops._NumpyIterator object at 0x7f603ea719c0>
d.next() : 1
d.next() : 4


In [25]:
## batch(): combines consecutive elements into batches
dataset=tf.data.Dataset.from_tensor_slices(range(15))
dataset=dataset.batch(5) ## batch() returns a new dataset instead of modifying the original, so the result must be reassigned
iterator=dataset.as_numpy_iterator()
print('iterator next() value', iterator.next())


iterator next() value [0 1 2 3 4]
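
When the number of elements is not a multiple of the batch size, the last batch is smaller. The sketch below, assuming TensorFlow 2.x, shows this and how the `drop_remainder` argument of `batch()` discards the short final batch. The variable names are chosen only for the example.

```python
import tensorflow as tf

# 8 elements in batches of 3: the last batch holds only 2 elements.
batches = [b.tolist()
           for b in tf.data.Dataset.range(8).batch(3).as_numpy_iterator()]
print(batches)

# drop_remainder=True keeps only the full batches.
full_only = [b.tolist()
             for b in tf.data.Dataset.range(8)
             .batch(3, drop_remainder=True)
             .as_numpy_iterator()]
print(full_only)
```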


In [None]:
## take(count, name=None): creates a dataset with at most `count` elements from this dataset
dataset=tf.data.Dataset.range(100)
dataset1=dataset.take(5) ## dataset1 contains the first 5 elements of dataset
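
The cell above produces no output, so here is a small sketch, assuming TensorFlow 2.x, that lists what `take()` returns. It also shows that asking for more elements than the dataset holds simply yields all of them. The names `taken` and `all_of_small` are illustrative.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(100)

# take(5) keeps the first 5 elements.
taken = [int(x) for x in dataset.take(5).as_numpy_iterator()]
print(taken)

# A count larger than the dataset size returns every element.
all_of_small = [int(x)
                for x in tf.data.Dataset.range(3).take(10).as_numpy_iterator()]
print(all_of_small)
```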

In [27]:
## cache(): caches the elements in this dataset.

dataset=tf.data.Dataset.range(5)
dataset=dataset.map(lambda x: x+3)
dataset=dataset.cache()
## The first time the dataset is read, the elements are generated by range() and map()
print(list(dataset.as_numpy_iterator()))

## Subsequent iterations read from the cache (held in memory here, because no filename was passed to cache())

print(list(dataset.as_numpy_iterator()))


[3, 4, 5, 6, 7]
[3, 4, 5, 6, 7]
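
The two printed lists are identical, so the output alone does not show that the cache was used. One way to observe it is to count how often the source is actually read. The sketch below assumes TensorFlow 2.x; the generator `gen` and the `calls` counter are hypothetical helpers for this illustration, not part of the notebook above.

```python
import tensorflow as tf

calls = {"count": 0}  # how many times the source generator was started

def gen():
    calls["count"] += 1
    for i in range(5):
        yield i

ds = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=(), dtype=tf.int32)
).cache()

# First full pass: elements come from the generator and fill the cache.
first = [int(x) for x in ds.as_numpy_iterator()]
# Second pass: elements are expected to come from the in-memory cache.
second = [int(x) for x in ds.as_numpy_iterator()]

print(first, second, calls["count"])
```

If the cache is doing its job, the generator should have been started only once even though the dataset was read twice. Note that the cache is only complete after the first pass has run to the end.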


In [6]:
## Applying a transformation using map()
## (this cell was run on the [1, 2, 3] dataset created at the top of the notebook)

dataset=dataset.map(lambda x: x**2)


In [7]:
for element in dataset:
    print(element)

tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(4, shape=(), dtype=int32)
tf.Tensor(9, shape=(), dtype=int32)


In [9]:
## Listing the elements in a tf.data.Dataset
list(dataset.as_numpy_iterator())

[1, 4, 9]

Creating a dataset consisting of features and labels
 

In [33]:
features=tf.random.normal(shape=(2,5))
labels=tf.constant(['A','B'])

In [35]:
dataset=tf.data.Dataset.from_tensor_slices((features,labels))
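
Passing a tuple to `from_tensor_slices` slices both tensors along the first axis, so each element is a `(feature, label)` pair. A minimal sketch, assuming TensorFlow 2.x, that unpacks the pairs (the `pairs` list is only for illustration):

```python
import tensorflow as tf

features = tf.random.normal(shape=(2, 5))
labels = tf.constant(['A', 'B'])
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

pairs = []
for feature, label in dataset:
    # Each feature is one row of shape (5,); each label is a scalar string tensor.
    pairs.append((feature.shape.as_list(), label.numpy().decode('utf-8')))

print(pairs)
```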

prefetch()
>Most dataset input pipelines should end with a call to prefetch. This allows later elements to be prepared while the current element is being processed. This often improves latency and throughput, at the cost of using additional memory to store prefetched elements.

Note: Like other Dataset methods, prefetch operates on the elements of the input dataset. It has no concept of examples vs. batches. examples.prefetch(2) will prefetch two elements (2 examples), while examples.batch(20).prefetch(2) will prefetch 2 elements (2 batches, of 20 examples each).
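
The note above can be checked by comparing element shapes. This is a small sketch, assuming TensorFlow 2.x; `examples` mirrors the name used in the note, and the other variable names are illustrative.

```python
import tensorflow as tf

examples = tf.data.Dataset.range(100)

per_example = examples.prefetch(2)          # buffers 2 single examples
per_batch = examples.batch(20).prefetch(2)  # buffers 2 batches of 20 examples

# The element of the first dataset is a scalar, of the second a 1-D batch.
example_shape = per_example.element_spec.shape.as_list()
batch_shape = per_batch.element_spec.shape.as_list()

first_batch_len = int(next(iter(per_batch)).shape[0])
num_batches = sum(1 for _ in per_batch)

print(example_shape, batch_shape, first_batch_len, num_batches)
```

The batch dimension shows up as `None` in `element_spec` because `batch()` without `drop_remainder=True` cannot guarantee that every batch is full.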

In [42]:
dataset=tf.data.Dataset.from_tensor_slices(range(10))
dataset=dataset.take(3)
dataset.prefetch(tf.data.AUTOTUNE) ## returns a new dataset; assign the result to keep using it

<PrefetchDataset element_spec=TensorSpec(shape=(), dtype=tf.int32, name=None)>

Applying what we have learned so far to the MNIST dataset

In [43]:
(x_train, y_train), (x_test, y_test)=tf.keras.datasets.mnist.load_data()

In [85]:
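## Transformation function

def transform(x,y):
    ## Scale pixel values from [0, 255] to [0, 1]
    x=tf.cast(x,dtype=tf.float32)/255.0
    ## Flatten each 28x28 image into a 784-element vector
    x=tf.reshape(x,(784,))
    ## One-hot encode the label over the 10 digit classes
    y=tf.one_hot(y,10)
    print(x.shape) ## runs once, while map() traces the function, not once per element
    return x,y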


In [86]:
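ds_train=tf.data.Dataset.from_tensor_slices((x_train,y_train))

ds_train=ds_train.map(transform,num_parallel_calls=tf.data.AUTOTUNE) ## preprocess elements in parallel
ds_train=ds_train.batch(64) ## group into batches of 64 examples
ds_train=ds_train.cache() ## keep the preprocessed batches in memory after the first pass
ds_train=ds_train.prefetch(tf.data.AUTOTUNE) ## prepare upcoming batches while the current one is consumed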

(784,)
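
To confirm what the pipeline yields, it helps to look at the batch shapes. The sketch below repeats the same pipeline on synthetic data so that it runs without downloading MNIST. The arrays `x_fake` and `y_fake` are assumptions that only mimic the shapes and dtypes returned by `load_data()`; TensorFlow 2.4 or later is assumed for `tf.data.AUTOTUNE`.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for MNIST: 256 images of 28x28 uint8 pixels, labels 0-9.
x_fake = np.random.randint(0, 256, size=(256, 28, 28), dtype=np.uint8)
y_fake = np.random.randint(0, 10, size=(256,), dtype=np.uint8)

def transform(x, y):
    x = tf.cast(x, dtype=tf.float32) / 255.0  # scale to [0, 1]
    x = tf.reshape(x, (784,))                 # flatten the image
    y = tf.one_hot(y, 10)                     # one-hot encode the label
    return x, y

ds = (tf.data.Dataset.from_tensor_slices((x_fake, y_fake))
      .map(transform, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(64)
      .cache()
      .prefetch(tf.data.AUTOTUNE))

# Collect the shape of every (images, labels) batch.
shapes = [(x.shape.as_list(), y.shape.as_list()) for x, y in ds]
print(shapes)
```

With 256 examples and a batch size of 64, this yields four batches, each holding a `(64, 784)` image tensor and a `(64, 10)` label tensor.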
