In this exercise, we practice using **'tf.data'**, TensorFlow's input pipeline API. The goal is to help readers become comfortable handling data with **'tf.data'**.  

First, let's import the Python packages that will be used in this exercise.

In [1]:
import numpy as np
import tensorflow as tf
tf.__version__

'1.13.1'

We will create a dataset with the **'tf.data'** module, which will return tensors (mini-batches). 
Let us assume that we want to create 100 one-dimensional data points in the half-open range [-5.0, 5.0), spaced 0.1 apart.

In [2]:
x_range = np.arange(-5.0, 5.0, 0.1)
x_range = list(x_range)
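
As a quick check of the step above, the following sketch (plain NumPy, not part of the original notebook) confirms that **np.arange** over a half-open interval yields 100 points, and shows that accumulated floating-point error leaves the 51st value slightly off zero. That is why a value such as -1.7763568e-14 appears in the printed batches later on. The integer-grid variant **x_exact** is only an illustrative alternative, not something the notebook uses.

```python
import numpy as np

x_range = np.arange(-5.0, 5.0, 0.1)   # half-open: 5.0 itself is excluded
print(len(x_range))                   # number of points
print(x_range[0], x_range[-1])        # first and last value
print(x_range[50])                    # close to, but not exactly, 0.0

# Illustrative alternative: build the grid from integers, then scale.
x_exact = np.arange(-50, 50) / 10.0
print(x_exact[50])                    # exactly 0.0
```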

We will create a dataset containing these x-values with **'tf.data.Dataset.from_tensor_slices'**. 

The function **from_tensor_slices** creates a dataset whose elements are slices of the given input tensor along its first dimension. 

In [3]:
x_ds = tf.data.Dataset.from_tensor_slices(x_range)
x_ds

<DatasetV1Adapter shapes: (), types: tf.float32>

To give labels to our model, we should also create a dataset of labels. 
For this purpose, we can use the **map** method of **tf.data.Dataset**, which plays the same role as the built-in **map** function in Python: it applies a function to every element. 

In [4]:
def map_label(x):
    # Linear label with additive noise: y = 2x - 3 + eps
    eps = np.random.normal(0.0, 1.0)
    return 2.0 * x - 3.0 + eps

In [5]:
y_ds = x_ds.map(map_label)
y_ds

<DatasetV1Adapter shapes: (), types: tf.float32>

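When we run the session, **y_ds** will return the label of each x-value, \\(y = 2x - 3 + \epsilon \\). Note that **np.random.normal** is called only once, when **map** traces the function into the graph, so the same \\(\epsilon\\) is added to every element. The printed labels further down all share one offset for this reason. 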
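
If independent noise per data point is wanted, the noise has to come from a TensorFlow op rather than from NumPy. The sketch below contrasts the two. It is a minimal illustration under assumptions: the helper **residuals**, the two label functions, and the use of the **tf.compat.v1** aliases are not from the original notebook.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # assumption: graph-mode aliases, present in TF 1.13 and later


def residuals(label_fn):
    """Return y - (2x - 3) for all 100 points, with labels made by label_fn."""
    with tf.Graph().as_default():
        x = np.arange(-50, 50, dtype=np.float32) / 10.0
        x_ds = tf.data.Dataset.from_tensor_slices(x)
        ds = tf.data.Dataset.zip((x_ds, x_ds.map(label_fn))).batch(100)
        iterator = tf1.data.make_initializable_iterator(ds)
        batch = iterator.get_next()
        with tf1.Session() as sess:
            sess.run(iterator.initializer)
            x_b, y_b = sess.run(batch)
    return y_b - (2.0 * x_b - 3.0)


def numpy_noise(x):
    eps = float(np.random.normal(0.0, 1.0))           # drawn once, at trace time
    return 2.0 * x - 3.0 + eps


def tf_noise(x):
    eps = tf.random.normal([], mean=0.0, stddev=1.0)  # drawn for every element
    return 2.0 * x - 3.0 + eps


r_np = residuals(numpy_noise)
r_tf = residuals(tf_noise)
print("spread of NumPy noise:", r_np.max() - r_np.min())  # near zero
print("spread of TF noise:   ", r_tf.max() - r_tf.min())  # clearly non-zero
```

With the NumPy version every residual is the same constant, whereas the **tf.random.normal** version gives each point its own draw.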

After preparing the datasets of x-values and labels, zip the two together into one dataset. 

In [6]:
total_ds = tf.data.Dataset.zip((x_ds, y_ds))

We can configure the batches with simple method calls.
 - dataset.batch(batch_size) : set the size of each mini-batch
 - dataset.shuffle(buffer_size) : shuffle the elements by drawing them at random from a buffer of buffer_size elements

In [7]:
total_ds = total_ds.batch(10)
total_ds = total_ds.shuffle(buffer_size=100)
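
The order of these two calls matters. Shuffling after batching, as above, reorders whole mini-batches, so each batch still holds ten consecutive x-values. Shuffling before batching mixes the individual points. The sketch below illustrates the difference on a simple integer dataset; the helper **first_batch** and the **tf.compat.v1** aliases are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # assumption: graph-mode aliases, present in TF 1.13 and later


def first_batch(shuffle_first):
    """Return the first mini-batch of a 0..99 dataset for either call order."""
    with tf.Graph().as_default():
        ds = tf.data.Dataset.range(100)
        if shuffle_first:
            ds = ds.shuffle(buffer_size=100).batch(10)  # mix points, then group
        else:
            ds = ds.batch(10).shuffle(buffer_size=100)  # group, then mix batches
        iterator = tf1.data.make_initializable_iterator(ds)
        batch = iterator.get_next()
        with tf1.Session() as sess:
            sess.run(iterator.initializer)
            return sess.run(batch)


batch_then_shuffle = first_batch(shuffle_first=False)
shuffle_then_batch = first_batch(shuffle_first=True)
print(batch_then_shuffle)  # ten consecutive integers
print(shuffle_then_batch)  # ten integers drawn from the whole range
```

Which order is preferable depends on the task; for stochastic gradient descent, shuffling before batching is the more common choice.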

After successfully preparing the dataset, create an iterator, which allows sequential access to the Dataset elements.

In [8]:
iterator = total_ds.make_initializable_iterator()
batches = iterator.get_next()

In [9]:
batches[0]

<tf.Tensor 'IteratorGetNext:0' shape=(?,) dtype=float32>

In [10]:
batches[1]

<tf.Tensor 'IteratorGetNext:1' shape=(?,) dtype=float32>

We can use those inputs and labels, which are ordinary **tf.Tensor** objects, directly for training our model. 
This lets us avoid passing data through **feed_dict**, which the TensorFlow documentation describes as the least efficient way to feed data.
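
To make this concrete, here is a minimal sketch of a linear model that consumes the iterator's tensors directly, with no placeholders and no **feed_dict**. It is an illustration under assumptions rather than the notebook's own training code: the variables **w** and **b**, the learning rate, the noise-free labels, and the **tf.compat.v1** aliases are all chosen here for the example.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # assumption: graph-mode aliases, present in TF 1.13 and later

with tf.Graph().as_default():
    x = np.arange(-50, 50, dtype=np.float32) / 10.0
    y = 2.0 * x - 3.0                        # noise-free labels, for a clear check
    ds = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(100).batch(10)
    iterator = tf1.data.make_initializable_iterator(ds)
    x_b, y_b = iterator.get_next()           # tensors, wired straight into the model

    w = tf1.Variable(0.0, name="w")
    b = tf1.Variable(0.0, name="b")
    loss = tf.reduce_mean(tf.square(w * x_b + b - y_b))
    train_op = tf1.train.GradientDescentOptimizer(0.01).minimize(loss)

    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        for epoch in range(50):
            sess.run(iterator.initializer)   # restart the dataset each epoch
            try:
                while True:
                    sess.run(train_op)       # each run pulls the next mini-batch
            except tf.errors.OutOfRangeError:
                pass
        w_val, b_val = sess.run([w, b])

print(w_val, b_val)  # should approach the true slope 2 and intercept -3
```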

Let's see how the dataset and the iterator work and check the values of mini-batches.
Just run the iterator with **tf.Session**.

In [11]:
with tf.Session() as sess:
    for i in range(3):  # number of epochs
        # Re-initialize the iterator to start a new pass over the dataset
        sess.run(iterator.initializer)
        try:
            while True:
                x_b, y_b = sess.run(batches)
                print(x_b)
                print(y_b)
        except tf.errors.OutOfRangeError:
            # Raised once the dataset is exhausted, i.e. the epoch is over
            print('End of epoch', i + 1)

[-1.  -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1]
[-6.2015324 -6.0015326 -5.8015323 -5.6015325 -5.401532  -5.2015324
 -5.001532  -4.8015323 -4.6015325 -4.401532 ]
[-4.  -3.9 -3.8 -3.7 -3.6 -3.5 -3.4 -3.3 -3.2 -3.1]
[-12.201532 -12.001533 -11.801533 -11.601532 -11.401532 -11.201532
 -11.001533 -10.801533 -10.601532 -10.401532]
[1.  1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9]
[-2.2015324  -2.001532   -1.8015321  -1.6015323  -1.4015323  -1.2015322
 -1.0015322  -0.80153215 -0.60153234 -0.4015323 ]
[-2.  -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1]
[-8.201532  -8.001533  -7.8015323 -7.6015325 -7.401532  -7.2015324
 -7.0015326 -6.8015323 -6.6015325 -6.401532 ]
[-1.7763568e-14  1.0000000e-01  2.0000000e-01  3.0000001e-01
  4.0000001e-01  5.0000000e-01  6.0000002e-01  6.9999999e-01
  8.0000001e-01  8.9999998e-01]
[-4.2015324 -4.001532  -3.8015323 -3.6015325 -3.4015322 -3.2015324
 -3.001532  -2.8015323 -2.6015322 -2.4015322]
[-3.  -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3 -2.2 -2.1]
[-10.201532 -10.001533

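To split the total dataset into training and validation sets, we can use the **take** and **skip** methods. Because **total_ds** is already grouped into 10 batches of 10 points, both methods count batches rather than individual points.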

In [12]:
train_ds = total_ds.take(8)  # 8 batches = 80 points
valid_ds = total_ds.skip(8)  # remaining 2 batches = 20 points
train_ds

<DatasetV1Adapter shapes: ((?,), (?,)), types: (tf.float32, tf.float32)>
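
One caveat: **take** and **skip** count elements of the dataset they are called on, which for a batched dataset means batches, and **total_ds** ends with a **shuffle** that by default reshuffles for every new iterator. The two splits can therefore overlap when they are read through separate iterators. A minimal sketch of one way to get a fixed, disjoint 80/20 split is to permute the points once in NumPy and to split before shuffling and batching. The names **point_ds** and **collect_x**, the fixed seed, the noise-free labels, and the **tf.compat.v1** aliases are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # assumption: graph-mode aliases, present in TF 1.13 and later


def collect_x(ds):
    """Run one pass over a batched (x, y) dataset and return all x-values."""
    iterator = tf1.data.make_initializable_iterator(ds)
    batch = iterator.get_next()
    xs = []
    with tf1.Session() as sess:
        sess.run(iterator.initializer)
        try:
            while True:
                x_b, _ = sess.run(batch)
                xs.extend(x_b.tolist())
        except tf.errors.OutOfRangeError:
            pass
    return xs


with tf.Graph().as_default():
    rng = np.random.RandomState(0)
    x = np.arange(-50, 50, dtype=np.float32) / 10.0
    x = x[rng.permutation(len(x))]   # shuffle once, before splitting
    y = 2.0 * x - 3.0                # noise left out for brevity
    point_ds = tf.data.Dataset.from_tensor_slices((x, y))

    # Split on individual points, then shuffle and batch only where needed.
    train_ds = point_ds.take(80).shuffle(buffer_size=80).batch(10)
    valid_ds = point_ds.skip(80).batch(10)

    train_x = collect_x(train_ds)
    valid_x = collect_x(valid_ds)

print(len(train_x), len(valid_x))     # sizes of the two splits
print(set(train_x) & set(valid_x))    # empty set: no shared points
```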

In [13]:
train_iter = train_ds.make_initializable_iterator()
batches = train_iter.get_next()

with tf.Session() as sess:
    sess.run(train_iter.initializer)
    try:
        while True:
            x_b, y_b = sess.run(batches)
            print(x_b)
            print(y_b)
    except tf.errors.OutOfRangeError:
        # There is no epoch loop here, so do not reuse the stale `i` from the earlier cell
        print('End of the training set')

[1.  1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9]
[-2.2015324  -2.001532   -1.8015321  -1.6015323  -1.4015323  -1.2015322
 -1.0015322  -0.80153215 -0.60153234 -0.4015323 ]
[-3.  -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3 -2.2 -2.1]
[-10.201532 -10.001533  -9.801533  -9.601532  -9.401532  -9.201532
  -9.001533  -8.801532  -8.601532  -8.401532]
[4.  4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9]
[3.7984676 3.9984674 4.1984673 4.398468  4.598468  4.7984676 4.9984674
 5.1984673 5.398468  5.598468 ]
[-5.  -4.9 -4.8 -4.7 -4.6 -4.5 -4.4 -4.3 -4.2 -4.1]
[-14.201532 -14.001533 -13.801533 -13.601532 -13.401532 -13.201532
 -13.001533 -12.801533 -12.601532 -12.401532]
[2.  2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9]
[-2.0153224e-01 -1.5324354e-03  1.9846785e-01  3.9846766e-01
  5.9846795e-01  7.9846776e-01  9.9846756e-01  1.1984679e+00
  1.3984677e+00  1.5984679e+00]
[-1.7763568e-14  1.0000000e-01  2.0000000e-01  3.0000001e-01
  4.0000001e-01  5.0000000e-01  6.0000002e-01  6.9999999e-01
  8.0000001e-01  8.9999998e-01]
[-4.2015324

You can find more details on the **tf.data API** in the official presentation:
https://docs.google.com/presentation/d/16kHNtQslt-yuJ3w8GIx-eEH6t_AvFeQOchqGRFpAD7U/edit#slide=id.g254d08e080_0_135