# TensorFlow Pipeline

The pipeline loads the data in batches, or small chunks. Each batch is pushed through the pipeline and made ready for training. Building a pipeline is a good solution because it allows parallel computing: TensorFlow can spread the work across multiple CPUs. This speeds up computation and makes it practical to train powerful neural networks.
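As a rough illustration of what loading data "in batches" means, the sketch below splits a small list of rows into chunks in plain Python. It does not use TensorFlow and is not how `tf.data` is implemented; the helper name `iter_batches` and the sample values are made up for this example.

```python
from typing import Iterator, List, Sequence


def iter_batches(rows: Sequence[Sequence[float]],
                 batch_size: int) -> Iterator[List[Sequence[float]]]:
    """Yield consecutive chunks of at most `batch_size` rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


# Five hypothetical samples with two values each
rows = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]

batches = list(iter_batches(rows, batch_size=2))
batch_sizes = [len(batch) for batch in batches]  # the last batch may be smaller
```

A training loop would consume one such chunk at a time instead of holding the whole dataset in a single step.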

# Steps to create a pipeline

## Load the data

In [1]:
# Here we use NumPy to generate arbitrary data
import numpy as np

x_input = np.random.sample((3, 4))  # 3 rows x 4 columns, values in [0.0, 1.0)
print(x_input)

[[0.90019516 0.42533469 0.78767511 0.88946094]
 [0.29206842 0.7553205  0.58801091 0.40242626]
 [0.84887678 0.52333101 0.94105812 0.10045303]]


## Create placeholders

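Create a placeholder that will receive the data when the pipeline runs. Placeholders exist only in graph mode, so the code imports the TensorFlow 1.x compatibility API and disables eager execution.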

In [4]:
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()  # placeholders and sessions require graph mode

# Placeholder matching the 3x4 array generated above
x = tf.placeholder(tf.float32, shape=[3, 4], name='x')

## Define the dataset

Next, we define the Dataset that will be populated with the value of the placeholder x, using the method `tf.data.Dataset.from_tensor_slices`.<br>
<b>from_tensor_slices</b>: This method accepts a single NumPy array or tensor, or several of them. It slices its input along the zeroth dimension, so each row becomes one element of the dataset. To feed several objects, pass them as a tuple and make sure they all have the same size in the zeroth dimension.
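The tuple form described above can be sketched as follows. This is a minimal example under assumptions: the `features` and `labels` arrays are invented for illustration, and a one-shot iterator is used only to keep the example short.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Hypothetical data: 3 samples with 4 values each, plus one label per sample.
# Both arrays have size 3 in the zeroth dimension.
features = np.arange(12, dtype=np.float32).reshape(3, 4)
labels = np.array([0, 1, 0], dtype=np.int64)

# Passing a tuple pairs the i-th row of `features` with the i-th label
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
iterator = tf.data.make_one_shot_iterator(dataset)
next_features, next_label = iterator.get_next()

pairs = []
with tf.Session() as sess:
    try:
        while True:
            pairs.append(sess.run((next_features, next_label)))
    except tf.errors.OutOfRangeError:
        pass  # every (features, label) pair has been read
```

Each entry of `pairs` holds one feature row together with its label, which is the usual shape of input for supervised training.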

In [6]:
dataset = tf.data.Dataset.from_tensor_slices(x)

## Create the pipeline

We need to initialize the pipeline through which the data will flow. We create an iterator with `make_initializable_iterator` and name it `iterator`. We then call the iterator's `get_next` method to obtain the operation that returns the next element, and name the result `get_next`. Note that in our example the dataset is not batched, so each element is a single row of the input array.

TensorFlow provides four types of iterators (one-shot, initializable, reinitializable, and feedable), each with a specific purpose and use case.

Regardless of the iterator type, the iterator's `get_next` function creates an operation in the TensorFlow graph which, when run in a session, returns the next values from the Dataset. The iterator does not keep track of how many elements the Dataset contains. Hence, it is normal to keep running the `get_next` operation until TensorFlow raises a `tf.errors.OutOfRangeError` exception.

In [14]:
iterator = tf.data.make_initializable_iterator(dataset)
get_next = iterator.get_next()
print(get_next)

Tensor("IteratorGetNext_2:0", shape=(4,), dtype=float32)
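The tensor printed above has shape `(4,)` because each dataset element is one row of the 3x4 input. To group rows into batches, one option is `Dataset.batch`. The sketch below assumes a batch size of 2 and a made-up input array, and uses a one-shot iterator for brevity.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Hypothetical 3x4 input, standing in for the random array used earlier
data = np.arange(12, dtype=np.float32).reshape(3, 4)

# Group consecutive rows into batches of (at most) 2 rows
dataset = tf.data.Dataset.from_tensor_slices(data).batch(2)
iterator = tf.data.make_one_shot_iterator(dataset)
get_next = iterator.get_next()

batch_shapes = []
with tf.Session() as sess:
    try:
        while True:
            batch_shapes.append(sess.run(get_next).shape)
    except tf.errors.OutOfRangeError:
        pass  # dataset exhausted
```

With three rows and a batch size of 2, the final batch holds the single remaining row, since `batch` keeps the remainder by default.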


## Execute the Operation

We start a session and run the iterator's initializer operation, passing a feed_dict that maps the placeholder x to the array generated by NumPy. We then run get_next repeatedly and print each result until the data is exhausted.

In [13]:
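with tf.Session() as sess:
    # Bind the NumPy array to the placeholder and start the iterator
    sess.run(iterator.initializer, feed_dict={x: x_input})
    try:
        while True:
            # Each run returns the next row of the dataset
            print(sess.run(get_next))
    except tf.errors.OutOfRangeError:
        # Raised once every row has been consumed
        print('---Finished Execution---')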

[0.9001952 0.4253347 0.7876751 0.8894609]
[0.29206842 0.7553205  0.5880109  0.40242627]
[0.8488768  0.523331   0.9410581  0.10045303]
---Finished Execution---
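Because the data arrives through `feed_dict`, the same iterator can be initialized again with a different array. The sketch below assumes a placeholder whose first dimension is left as `None` so that arrays with different row counts fit; the helper `drain` and the two sample arrays are hypothetical.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# First dimension left open so arrays with any number of rows can be fed
x = tf.placeholder(tf.float32, shape=[None, 4], name="x")
dataset = tf.data.Dataset.from_tensor_slices(x)
iterator = tf.data.make_initializable_iterator(dataset)
get_next = iterator.get_next()


def drain(sess, array):
    """Initialize the iterator with `array` and collect every row it yields."""
    sess.run(iterator.initializer, feed_dict={x: array})
    rows = []
    try:
        while True:
            rows.append(sess.run(get_next))
    except tf.errors.OutOfRangeError:
        return rows


first = np.zeros((3, 4), dtype=np.float32)
second = np.ones((5, 4), dtype=np.float32)

with tf.Session() as sess:
    first_rows = drain(sess, first)    # rows from the first array
    second_rows = drain(sess, second)  # same graph, new data
```

Running the initializer a second time resets the iterator, so the graph is built once and reused for each array.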
