# TENSORFLOW INPUT PIPELINE
- A convenient way to handle huge datasets by loading, processing and manipulating them in chunks or batches

- Handles many file types and formats, including data stored in the cloud (S3)

- Supports distributed training

- Uses the ```tf.data.Dataset``` class to handle all of this; a dataset stores its data in the form of tensors.
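To make the file-handling point a little more concrete, here is a minimal sketch that reads a small text file with `tf.data.TextLineDataset`. The file name `amounts.txt` and its contents are made up for the example; other formats have their own readers (for example `tf.data.TFRecordDataset` for TFRecord files).

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical input file: one header line followed by one integer per line
path = os.path.join(tempfile.mkdtemp(), "amounts.txt")
with open(path, "w") as f:
    f.write("amount\n12\n15\n-56\n")

# Each element of a TextLineDataset is one line of the file as a tf.string tensor
lines = tf.data.TextLineDataset(path)

# Skip the header line, then parse each remaining line into an int32 tensor
amounts_ds = lines.skip(1).map(lambda line: tf.strings.to_number(line, out_type=tf.int32))

amounts = [int(v) for v in amounts_ds.as_numpy_iterator()]
print(amounts)  # [12, 15, -56]
```

The file is read lazily, line by line, which is what lets the same pattern scale to files much larger than memory.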


# Basic Usage

### Importing ```tensorflow```

In [None]:
import tensorflow as tf

### Creating Dataset
- Create a simple TensorFlow dataset object from a list/array using ```tf.data.Dataset.from_tensor_slices(list)```

In [None]:
dataset = [12,15,67,-56,78,90,25,-890,-45,67,90,45,34,-100,300]

tf_dataset = tf.data.Dataset.from_tensor_slices(dataset)
tf_dataset

<TensorSliceDataset shapes: (), types: tf.int32>
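`from_tensor_slices` slices its input along the first dimension, and it also accepts a tuple of arrays, which is the usual way to pair features with labels. A small sketch, assuming hypothetical toy arrays `features` and `labels`:

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 4 examples with 2 features each, plus one label per example
features = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=np.float32)
labels = np.array([0, 1, 0, 1], dtype=np.int64)

# Every component is sliced along its first dimension,
# so each dataset element is a (feature_row, label) pair
pairs = tf.data.Dataset.from_tensor_slices((features, labels))

# element_spec describes the shape and dtype of one element
print(pairs.element_spec)

for x, y in pairs.take(2):
    print(x.numpy(), y.numpy())
```

Both arrays must have the same length along the first dimension, otherwise `from_tensor_slices` raises an error.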

## Viewing Contents
There are several ways to view the contents:
1. Iterate over the dataset
2. Convert each tensor to a NumPy value using ```numpy()```
3. Use ```tf_dataset.as_numpy_iterator()```
4. To look at only the first few elements, like **df.head()** in pandas, use `take()`

In [None]:
# view the content by iterating
for i in tf_dataset:
    print(i)

tf.Tensor(12, shape=(), dtype=int32)
tf.Tensor(15, shape=(), dtype=int32)
tf.Tensor(67, shape=(), dtype=int32)
tf.Tensor(-56, shape=(), dtype=int32)
tf.Tensor(78, shape=(), dtype=int32)
tf.Tensor(90, shape=(), dtype=int32)
tf.Tensor(25, shape=(), dtype=int32)
tf.Tensor(-890, shape=(), dtype=int32)
tf.Tensor(-45, shape=(), dtype=int32)
tf.Tensor(67, shape=(), dtype=int32)
tf.Tensor(90, shape=(), dtype=int32)
tf.Tensor(45, shape=(), dtype=int32)
tf.Tensor(34, shape=(), dtype=int32)
tf.Tensor(-100, shape=(), dtype=int32)
tf.Tensor(300, shape=(), dtype=int32)


In [None]:
# Convert each tensor to a NumPy value using numpy()
for i in tf_dataset:
    print(i.numpy())

12
15
67
-56
78
90
25
-890
-45
67
90
45
34
-100
300


In [None]:
# Or use tf_dataset.as_numpy_iterator()
for i in tf_dataset.as_numpy_iterator():
    print(i)

12
15
67
-56
78
90
25
-890
-45
67
90
45
34
-100
300


In [None]:
# To look at the first few elements (like df.head()), use take()
for i in tf_dataset.take(3): 
    print(i.numpy())

12
15
67
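The loops above only print the elements. For a small dataset it is often handier to collect them into a Python list; this sketch does that for the first elements with `take()`, and uses `skip()`, which drops the first `n` elements, to look at the end instead.

```python
import tensorflow as tf

dataset = [12, 15, 67, -56, 78, 90, 25, -890, -45, 67, 90, 45, 34, -100, 300]
tf_dataset = tf.data.Dataset.from_tensor_slices(dataset)

# Materialise the first three elements as plain Python ints
head = [int(v) for v in tf_dataset.take(3).as_numpy_iterator()]
print(head)  # [12, 15, 67]

# skip(n) is the complement of take(n): it drops the first n elements
tail = [int(v) for v in tf_dataset.skip(12).as_numpy_iterator()]
print(tail)  # [34, -100, 300]
```

Building a full list only makes sense for small datasets; for huge ones, keep iterating lazily.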


## Filtering
- Use ```tf_dataset.filter(custom_fn)```, where `custom_fn` returns a boolean for each element
- Filter out invalid data points, here the negative values

In [None]:
# lambda x: x > 0 keeps only the positive values

tf_dataset = tf_dataset.filter(lambda x : x>0)

for i in tf_dataset.as_numpy_iterator():
    print('$ '+str(i))

$ 12
$ 15
$ 67
$ 78
$ 90
$ 25
$ 67
$ 90
$ 45
$ 34
$ 300
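The predicate passed to `filter` must return a scalar boolean tensor. When a validity rule has more than one condition, one explicit way to combine them is `tf.logical_and`. The rule below (keep values strictly between 0 and 100) is a hypothetical example, not part of the original pipeline.

```python
import tensorflow as tf

dataset = [12, 15, 67, -56, 78, 90, 25, -890, -45, 67, 90, 45, 34, -100, 300]
tf_dataset = tf.data.Dataset.from_tensor_slices(dataset)

# Hypothetical validity rule: keep values strictly between 0 and 100.
# The two comparisons are combined into one scalar tf.bool with tf.logical_and.
def is_valid(x):
    return tf.logical_and(x > 0, x < 100)

valid = [int(v) for v in tf_dataset.filter(is_valid).as_numpy_iterator()]
print(valid)  # [12, 15, 67, 78, 90, 25, 67, 90, 45, 34]
```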


## Mapping
- Map using `.map(custom_fn)`
- Applies the function to every element of the dataset
- Here, each amount is converted from dollars to rupees using the expression ```element * 72```

In [None]:
for i in tf_dataset.map(lambda x : x*72):
    print('Rs '+str(i.numpy()))

Rs 864
Rs 1080
Rs 4824
Rs 5616
Rs 6480
Rs 1800
Rs 4824
Rs 6480
Rs 3240
Rs 2448
Rs 21600
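A mapping function can also return several values per element, for example a tuple that keeps the original amount next to the converted one. The sketch below does that and passes `num_parallel_calls=tf.data.AUTOTUNE` so tf.data may apply the function to several elements in parallel; the constant name `RUPEES_PER_DOLLAR` is just a label for the notebook's rate of 72.

```python
import tensorflow as tf

RUPEES_PER_DOLLAR = 72  # conversion rate used in this notebook

amounts = tf.data.Dataset.from_tensor_slices([12, 15, 67])

# Return a tuple: (original amount, amount in rupees).
# num_parallel_calls lets tf.data run the function on several elements at once;
# the output order is still preserved by default.
converted = amounts.map(
    lambda x: (x, x * RUPEES_PER_DOLLAR),
    num_parallel_calls=tf.data.AUTOTUNE,
)

pairs = [(int(d), int(r)) for d, r in converted.as_numpy_iterator()]
print(pairs)  # [(12, 864), (15, 1080), (67, 4824)]
```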


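## Shuffling
- Shuffle the dataset using `.shuffle(buffer_size)`
- `buffer_size` is a free parameter: elements are drawn at random from a buffer holding that many elements, so a buffer at least as large as the dataset gives a full shuffle
- Useful when building pipelines for image data analysis, where the dataset should be shuffled randomly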

In [None]:
for i in tf_dataset.shuffle(3):
    print(i.numpy())

12
67
90
15
78
90
25
45
300
67
34
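With `buffer_size=3` above, an element can only move a few positions from where it started. This sketch instead uses a buffer as large as the dataset, a fixed `seed` for reproducibility, and `reshuffle_each_iteration=True` (already the default) so each pass over the data, i.e. each epoch, gets a new order. The seed value 42 is arbitrary.

```python
import tensorflow as tf

values = [12, 15, 67, 78, 90, 25, 67, 90, 45, 34, 300]
ds = tf.data.Dataset.from_tensor_slices(values)

# buffer_size >= number of elements -> every element can land anywhere.
# seed makes the shuffle reproducible; reshuffle_each_iteration=True
# draws a new order on every iteration over the dataset.
shuffled = ds.shuffle(buffer_size=len(values), seed=42, reshuffle_each_iteration=True)

epoch_1 = [int(v) for v in shuffled.as_numpy_iterator()]
epoch_2 = [int(v) for v in shuffled.as_numpy_iterator()]
print(epoch_1)
print(epoch_2)

# Shuffling only reorders: every element still appears exactly once per epoch
print(sorted(epoch_1) == sorted(values))  # True
```

A larger buffer gives a better shuffle but holds more elements in memory, which matters for image data.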


## Batching

- Split the data into batches using `batch(batch_size)`
- Batch the training dataset
- Distribute the batches across **multiple GPUs**
- Useful when the code runs in a multi-GPU environment (offices, data centers, etc.)

In [None]:
for i in tf_dataset.batch(3):
    print(i.numpy())

# the 11 filtered elements are split into batches of 3; the last batch holds the remaining 2

[12 15 67]
[78 90 25]
[67 90 45]
[ 34 300]
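As the batching output shows, the last batch can be smaller than the rest. When every batch must have exactly the same size (for example, so a batch can be split evenly across devices), `batch` accepts `drop_remainder=True`, which discards the incomplete final batch and makes the batch dimension static in `element_spec`. A minimal sketch:

```python
import tensorflow as tf

dataset = [12, 15, 67, -56, 78, 90, 25, -890, -45, 67, 90, 45, 34, -100, 300]
positives = tf.data.Dataset.from_tensor_slices(dataset).filter(lambda x: x > 0)

# drop_remainder=True discards the final incomplete batch ([34, 300] here),
# so every batch has exactly 3 elements
full_batches = positives.batch(3, drop_remainder=True)

# The batch dimension is now known statically (3 instead of None)
print(full_batches.element_spec)

batches = [b.tolist() for b in full_batches.as_numpy_iterator()]
print(batches)  # [[12, 15, 67], [78, 90, 25], [67, 90, 45]]
```

The trade-off is that the dropped elements are not seen in that epoch; with shuffling enabled, a different remainder is dropped each time.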


## Most Useful `.` Notation
Use it to chain all the functions covered earlier:

- Load
- Filter
- Shuffle
- Map
- Batch

In [None]:
#tf_dataset = tf.data.Dataset.from_tensor_slices(dataset)

#for i in tf_dataset.as_numpy_iterator():
#    print(i)

In [None]:
# one liner using '.'
# reading + filtering + mapping + shuffling + batching in one line
tf_dataset_new = tf.data.Dataset.from_tensor_slices(dataset).filter(lambda x: x>0).map(lambda a: a*72).shuffle(2).batch(2)

for i in tf_dataset_new.as_numpy_iterator():
    print(i)

[1080  864]
[5616 4824]
[6480 1800]
[4824 6480]
[3240 2448]
[21600]
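The one-liner above can be wrapped in a small function so the same pipeline is rebuilt consistently for different inputs. The sketch below is one possible arrangement: `make_pipeline` and its parameters are hypothetical names, and it adds `prefetch(tf.data.AUTOTUNE)` at the end, a real tf.data step that prepares upcoming batches while the current one is being consumed.

```python
import tensorflow as tf

def make_pipeline(values, rate=72, buffer_size=16, batch_size=2):
    """Hypothetical helper: load, filter, map, shuffle, batch, then prefetch."""
    return (
        tf.data.Dataset.from_tensor_slices(values)
        .filter(lambda x: x > 0)  # drop invalid (negative) values
        .map(lambda x: x * rate, num_parallel_calls=tf.data.AUTOTUNE)  # dollars -> rupees
        .shuffle(buffer_size)  # randomise order
        .batch(batch_size)  # group into batches
        .prefetch(tf.data.AUTOTUNE)  # overlap data preparation with consumption
    )

dataset = [12, 15, 67, -56, 78, 90, 25, -890, -45, 67, 90, 45, 34, -100, 300]
pipeline = make_pipeline(dataset)

# The batch order is random, but the set of values is fixed
flat = sorted(int(v) for batch in pipeline.as_numpy_iterator() for v in batch)
print(flat)  # [864, 1080, 1800, 2448, 3240, 4824, 4824, 5616, 6480, 6480, 21600]
```

Shuffling before batching, as here, mixes individual elements; shuffling after batching would only reorder whole batches.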
