Why is it necessary to build a tf.data pipeline? It is much easier to just train on a NumPy dataset, so why put in the extra effort?

==> Before answering this, we need to know what happens when we train on NumPy data: all of the data is loaded into memory at once.
> In this scenario, what happens if the training dataset is too large to fit into memory?

To handle this scenario, TensorFlow provides the **tf.data** API, which can handle large datasets. With tf.data, a dataset is consumed as an iterator, so the whole dataset never has to be loaded into memory at once. Besides handling large training datasets, a tf.data training pipeline has these other advantages:

1. On-the-fly preprocessing. tf.data provides a flexible way to preprocess the dataset while training. For example, if you are training on images, you can apply preprocessing steps such as rotation before the images reach the model.
2. tf.data provides a consistent interface for working with different data types such as CSV, images, video and so on.
3. tf.data also works well with distributed training, for example TensorFlow's tf.distribute API, which can help scale training across multiple GPUs or machines.
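The memory argument and point 1 can be illustrated with a minimal sketch. It assumes TensorFlow 2.4 or later (for the `output_signature` argument) and uses a hypothetical `sample_generator` that stands in for reading samples from disk one at a time; `preprocess` is likewise a placeholder transform, not a prescribed step.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data source: pretend each sample is read from disk one at a time.
# Only one (features, label) pair is produced per step, never the whole dataset.
def sample_generator(num_samples=100):
    rng = np.random.default_rng(seed=0)
    for i in range(num_samples):
        features = rng.normal(size=(3,)).astype(np.float32)
        label = np.int32(i % 2)
        yield features, label

# from_generator wraps the Python generator; output_signature describes one element.
streaming_ds = tf.data.Dataset.from_generator(
    sample_generator,
    output_signature=(
        tf.TensorSpec(shape=(3,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)

# Placeholder on-the-fly preprocessing: applied to each element as it is produced.
def preprocess(features, label):
    return features * 2.0, label

streaming_ds = streaming_ds.map(preprocess).batch(16)

for batch_features, batch_labels in streaming_ds.take(1):
    print(batch_features.shape, batch_labels.shape)  # (16, 3) (16,)
```

Because the generator is only advanced as batches are requested, the same code works whether `num_samples` is 100 or far larger than available RAM.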


 
We usually use the **tf.data.Dataset** class to work with tf.data. We will learn about it in the upcoming slides.


In [2]:
import tensorflow as tf

tf.data Dataset objects are iterable (similar to Python iterables and generators). The two most commonly used methods to create a tf.data dataset are:

- tf.data.Dataset.from_tensors: Creates a Dataset with a single element comprising the given tensors, so the resulting dataset always has length 1.

- tf.data.Dataset.from_tensor_slices: Creates a Dataset whose elements are slices of the given tensors. The given tensors are sliced along their first dimension. This operation preserves the structure of the input tensors, removing the first dimension of each tensor and using it as the dataset dimension. All input tensors must have the same size in their first dimensions.
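The iterable behaviour and the matching-first-dimension rule can both be checked with a short sketch (the small example tensors are arbitrary). It catches either `ValueError` or `tf.errors.InvalidArgumentError`, since the exact exception type raised for mismatched shapes is treated as an assumption here.

```python
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices(tf.constant([[1, 2, 3], [4, 5, 6]]))

# A Dataset follows the Python iterator protocol, just like a generator.
it = iter(ds)
first = next(it)
print(first)  # tf.Tensor([1 2 3], shape=(3,), dtype=int32)

# as_numpy_iterator() yields NumPy arrays instead of tf.Tensor objects.
rows = [row.tolist() for row in ds.as_numpy_iterator()]
print(rows)  # [[1, 2, 3], [4, 5, 6]]

# All input tensors must share the same first dimension (10 vs 5 here).
mismatch_error = None
try:
    tf.data.Dataset.from_tensor_slices((tf.zeros((10, 3)), tf.zeros((5,))))
except (ValueError, tf.errors.InvalidArgumentError) as err:
    mismatch_error = err
    print('Rejected:', type(err).__name__)
```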

In [3]:
# Creating tensors

tensor1=tf.constant([1,2,3])
tensor2=tf.constant([[1,2,3],[4,5,6]])
tensor3=tf.Variable(tf.random.normal(shape=(10,3)))
tensor4=tf.range(start=1,limit=11,delta=1)
tensor5=tf.constant([[1,2,3]])



In [5]:
# Creating different tf.data.Dataset objects using tf.data.Dataset.from_tensors

dataset_1=tf.data.Dataset.from_tensors(tensor1) # from_tensors produces a dataset containing only a single element. To slice the input tensor into multiple elements, use from_tensor_slices instead
dataset_2=tf.data.Dataset.from_tensors(tensor2)
dataset_3=tf.data.Dataset.from_tensors(tensor3)
dataset4=tf.data.Dataset.from_tensors(tensor4)
dataset5=tf.data.Dataset.from_tensors(tensor5)
labeled_dataset=tf.data.Dataset.from_tensors((tensor3,tensor4))



print()

print('len of dataset1 :',len(dataset_1))
print('len of dataset2 :',len(dataset_2))
print('len of the dataset3 :',len(dataset_3))
print('len of the dataset4 :',len(dataset4))
print('len of the dataset5 :',len(dataset5))
print('Len of labeled_dataset : ', len(labeled_dataset))

print()
print('dataset1')
for element in dataset_1:
    print(element)

print()
print('dataset2')
for element in dataset_2:
    print(element)

print()
print('dataset3')
for element in dataset_3:
    print(element)

print()
print('dataset4')
for element in dataset4:
    print(element)

print()
print('dataset5')
for element in dataset5:
    print(element)

print('Labeled Dataset')

for element in labeled_dataset:
    print(element)



len of dataset1 : 1
len of dataset2 : 1
len of the dataset3 : 1
len of the dataset4 : 1
len of the dataset5 : 1
Len of labeled_dataset :  1

dataset1
tf.Tensor([1 2 3], shape=(3,), dtype=int32)

dataset2
tf.Tensor(
[[1 2 3]
 [4 5 6]], shape=(2, 3), dtype=int32)

dataset3
tf.Tensor(
[[-1.2151529  -0.9854829   0.09837101]
 [-0.53210956 -0.04680683 -0.7644955 ]
 [-0.34821305  0.2469302  -0.07629711]
 [ 0.01787011  0.02813985 -1.6390443 ]
 [-0.5280993  -0.84917665 -1.5853513 ]
 [ 1.2583779   0.4072297  -1.7567949 ]
 [ 1.7766263   0.6179925   0.48123974]
 [-1.2589386   0.7698292  -0.08977935]
 [ 0.4528408   0.01507247  0.03878821]
 [ 0.76816046  0.58638066  0.36378568]], shape=(10, 3), dtype=float32)

dataset4
tf.Tensor([ 1  2  3  4  5  6  7  8  9 10], shape=(10,), dtype=int32)

dataset5
tf.Tensor([[1 2 3]], shape=(1, 3), dtype=int32)
Labeled Dataset
(<tf.Tensor: shape=(10, 3), dtype=float32, numpy=
array([[-1.2151529 , -0.9854829 ,  0.09837101],
       [-0.53210956, -0.04680683, -0.764495
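When a tensor has already been wrapped as a single element by `from_tensors`, it can be split into rows afterwards with `unbatch()`, which should yield the same elements as `from_tensor_slices`. A small sketch comparing the two (the `matrix` tensor is just an illustrative example):

```python
import tensorflow as tf

matrix = tf.constant([[1, 2, 3], [4, 5, 6]])

single = tf.data.Dataset.from_tensors(matrix)          # 1 element of shape (2, 3)
rows = tf.data.Dataset.from_tensors(matrix).unbatch()  # elements of shape (3,)
sliced = tf.data.Dataset.from_tensor_slices(matrix)    # 2 elements of shape (3,)

print(single.element_spec)
print(sliced.element_spec)

# unbatch() splits the single element along its first dimension.
print([r.tolist() for r in rows.as_numpy_iterator()])    # [[1, 2, 3], [4, 5, 6]]
print([r.tolist() for r in sliced.as_numpy_iterator()])  # [[1, 2, 3], [4, 5, 6]]
```

Note that `len()` is avoided on the unbatched dataset, because its cardinality may not be known statically.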

In [6]:
# Creating Dataset using tf.data.Dataset.from_tensor_slices

dataset_1=tf.data.Dataset.from_tensor_slices(tensor1) # from_tensor_slices slices the input along its first dimension, producing one element per slice
dataset_2=tf.data.Dataset.from_tensor_slices(tensor2)
dataset_3=tf.data.Dataset.from_tensor_slices(tensor3)
dataset4=tf.data.Dataset.from_tensor_slices(tensor4)
dataset5=tf.data.Dataset.from_tensor_slices(tensor5)
labeled_dataset=tf.data.Dataset.from_tensor_slices((tensor3,tensor4))
print()

print('len of dataset1 :',len(dataset_1))
print('len of dataset2 :',len(dataset_2))
print('len of the dataset3 :',len(dataset_3))
print('len of the dataset4 :',len(dataset4))
print('len of the dataset5 :',len(dataset5))
print('len of the labeled_dataset :',len(labeled_dataset))

print()
print('dataset1')
for element in dataset_1:
    print(element)

print()
print('dataset2')
for element in dataset_2:
    print(element)

print()
print('dataset3')
for element in dataset_3:
    print(element)

print()
print('dataset4')
for element in dataset4:
    print(element)

print()
print('dataset5')
for element in dataset5:
    print(element)

print()
print('labeled Dataset')

for element in labeled_dataset:
    print(element)


len of dataset1 : 3
len of dataset2 : 2
len of the dataset3 : 10
len of the dataset4 : 10
len of the dataset5 : 1
len of the labeled_dataset : 10

dataset1
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)

dataset2
tf.Tensor([1 2 3], shape=(3,), dtype=int32)
tf.Tensor([4 5 6], shape=(3,), dtype=int32)

dataset3
tf.Tensor([-1.2151529  -0.9854829   0.09837101], shape=(3,), dtype=float32)
tf.Tensor([-0.53210956 -0.04680683 -0.7644955 ], shape=(3,), dtype=float32)
tf.Tensor([-0.34821305  0.2469302  -0.07629711], shape=(3,), dtype=float32)
tf.Tensor([ 0.01787011  0.02813985 -1.6390443 ], shape=(3,), dtype=float32)
tf.Tensor([-0.5280993  -0.84917665 -1.5853513 ], shape=(3,), dtype=float32)
tf.Tensor([ 1.2583779  0.4072297 -1.7567949], shape=(3,), dtype=float32)
tf.Tensor([1.7766263  0.6179925  0.48123974], shape=(3,), dtype=float32)
tf.Tensor([-1.2589386   0.7698292  -0.08977935], shape=(3,), dtype=float32)
tf.Tensor([0.4528408  0.0

In [8]:
# Each element of the labeled dataset is a (feature, label) tuple of two tensors

for feature, label in labeled_dataset:
    print('feature', feature)
    print('label ',label)

    break

feature tf.Tensor([-1.2151529  -0.9854829   0.09837101], shape=(3,), dtype=float32)
label  tf.Tensor(1, shape=(), dtype=int32)
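Before a labeled dataset like this is fed to training, it is typically shuffled and batched. A minimal sketch, assuming TensorFlow 2.4+ for `tf.data.AUTOTUNE`; the buffer size, seed and batch size are arbitrary illustrative values, and the tensors mirror `tensor3` and `tensor4` above.

```python
import tensorflow as tf

features = tf.random.normal(shape=(10, 3))
labels = tf.range(start=1, limit=11, delta=1)
labeled_ds = tf.data.Dataset.from_tensor_slices((features, labels))

# shuffle -> batch -> prefetch is a common ordering for training pipelines.
train_ds = (
    labeled_ds
    .shuffle(buffer_size=10, seed=42)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

for batch_features, batch_labels in train_ds.take(1):
    print('batch features:', batch_features.shape)  # (4, 3)
    print('batch labels:', batch_labels.shape)      # (4,)
```

With 10 elements and a batch size of 4, the last batch holds only 2 elements; `batch(4, drop_remainder=True)` would drop it if fixed-size batches are required.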
