# Create a dataset using TensorFlow

In [2]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

### Yield
We should use yield when we want to iterate over a sequence but don't want to store the entire sequence in memory. The yield keyword is used in Python generators. A generator function is defined like a normal function, but whenever it needs to produce a value, it does so with the yield keyword rather than return.
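As a minimal sketch of that idea in plain Python (the `squares` function and the sizes used here are illustrative assumptions, not part of the original notebook), a generator hands out one value per request instead of building the whole sequence up front:

```python
import sys

def squares(n):
    """Yield the squares 0, 1, 4, ... one at a time (hypothetical example)."""
    for i in range(n):
        yield i * i  # execution pauses here until the next value is requested

gen = squares(5)
first = next(gen)   # runs the body up to the first yield
second = next(gen)  # resumes after the first yield
rest = list(gen)    # drains whatever values are left

print(first, second, rest)

# The generator object stays small no matter how many values it can produce,
# while the equivalent list has to hold every value at once.
lazy_size = sys.getsizeof(squares(100000))
eager_size = sys.getsizeof([i * i for i in range(100000)])
print(lazy_size < eager_size)
```

The generator version trades random access for memory: values can only be consumed in order, once.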

In [12]:
# Generator function
def gen_func():
    x = 12
    while x < 20:
        yield x
        x += 2

# Step 1: Create a dataset/iterator from a range of values
data1 = tf.data.Dataset.range(4)
iter1 = data1.make_one_shot_iterator()

# Step 2: Create a dataset/iterator from two tensors
t1 = tf.constant([4, 5])
t2 = tf.constant([6, 7])
ds2 = tf.data.Dataset.from_tensors([t1, t2])
iter2 = ds2.make_one_shot_iterator()

# Step 3: Create a dataset/iterator from rows of a tensor
t3 = tf.constant([[8], [9], [10], [11]])
ds3 = tf.data.Dataset.from_tensor_slices(t3)
iter3 = ds3.make_one_shot_iterator()

# Step 4: Create a dataset/iterator from a generator function
ds4 = tf.data.Dataset.from_generator(gen_func, output_types=tf.int64)
iter4 = ds4.make_one_shot_iterator()

# Step 5: Print the elements of each dataset
with tf.Session() as sess:
    for _ in range(4):
        print(sess.run(iter1.get_next()))
    print(sess.run(iter2.get_next()))
    
    for _ in range(4):
        print(sess.run(iter3.get_next()))
    print(sess.run(iter4.get_next()))

# No explicit sess.close() is needed: the with block closes the session on exit.

Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
    tf.py_function, which takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    
0
1
2
3
[[4 5]
 [6 7]]
[8]
[9]
[10]
[11]
12


In [4]:
data1

<DatasetV1Adapter shapes: (), types: tf.int64>

I'll create the session by calling tf.Session and access it as sess. Inside the session, I'll perform a series of print operations to display each dataset's contents.

The first dataset contains four values, so I'll create a for loop that iterates four times. Inside the loop, I'll call print and sess.run with the first iterator's get_next method. As I mentioned earlier, get_next returns the next element of the iterator, which in turn comes from the corresponding dataset.
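In graph mode, each call to get_next adds a new operation to the graph, so a commonly used variation is to build the get_next op once and run it repeatedly until the dataset is exhausted. The sketch below is one way to do that, not the notebook's own code; it assumes a TensorFlow build that provides the `tf.compat.v1` API, and the names `tf1`, `next_element`, and `values` are introduced here for illustration:

```python
import tensorflow as tf

tf1 = tf.compat.v1  # assumption: the installed TensorFlow ships compat.v1

values = []
with tf.Graph().as_default():
    dataset = tf1.data.Dataset.range(4)
    iterator = tf1.data.make_one_shot_iterator(dataset)
    next_element = iterator.get_next()  # built once, reused on every run

    with tf1.Session() as sess:
        try:
            while True:
                # Each run advances the iterator by one element.
                values.append(int(sess.run(next_element)))
        except tf.errors.OutOfRangeError:
            # Raised when the iterator has no more elements.
            pass

print(values)
```

Catching OutOfRangeError removes the need to know the dataset's length in advance, which is what the fixed `range(4)` loop relies on.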

The second dataset contains a single value, so I'll call print and sess.run with iter2.get_next. 

The third dataset contains four elements, so I can use code similar to what I used for the first dataset. In this case, I'll just pass iter3.get_next to sess.run, and this will print the contents of the third dataset.

The generator function behind the last dataset can produce four values (12, 14, 16, and 18), but the code calls print and sess.run with iter4.get_next only once, so only the first value is printed. Wrapping that call in a for loop, as with the first and third datasets, would print all four.
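To check what the generator function yields independently of TensorFlow, it can be run in plain Python. This is a small sketch that repeats gen_func from the cell above; the names `all_values`, `g`, and `first_only` are introduced here for illustration:

```python
def gen_func():
    # Same generator as in the notebook: even numbers from 12 up to 18.
    x = 12
    while x < 20:
        yield x
        x += 2

# Consuming the whole generator shows every value it can produce.
all_values = list(gen_func())
print(all_values)

# Requesting a single value, like a single get_next call, returns only the first.
g = gen_func()
first_only = next(g)
print(first_only)
```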

When I execute the module, it prints the results for each of the datasets.

1. The values zero through three are the contents of the first dataset, which I simply created from a range.

2. The second dataset contains a single element that combines the contents of the two tensors.

3. The third dataset contains each row of the tensor t3 as a separate element.

4. The last dataset is built from the generator function, gen_func, which can yield four values. Because the code calls get_next on it only once, only the first value, 12, is printed.

This video has demonstrated how datasets and iterators can be used in code.
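The difference between the second and third datasets can be pictured with a NumPy analogy. This is only an illustration of the element structure, not how TensorFlow implements from_tensors or from_tensor_slices; the names `single_element` and `row_elements` are assumptions made for this sketch:

```python
import numpy as np

t1 = np.array([4, 5])
t2 = np.array([6, 7])
t3 = np.array([[8], [9], [10], [11]])

# Analogy for from_tensors([t1, t2]): the inputs are packed into ONE element.
single_element = [np.stack([t1, t2])]
print(len(single_element), single_element[0].shape)

# Analogy for from_tensor_slices(t3): the first axis is split into elements.
row_elements = list(t3)
print(len(row_elements), row_elements[0].shape)
```

The shapes match what the notebook reports for ds2, (2, 2), and ds3, (1,): one dataset holds a single 2x2 element, the other holds four one-value rows.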

In [5]:
iter1

<tensorflow.python.data.ops.iterator_ops.Iterator at 0x253d7b63dd8>

In [6]:
t1

<tf.Tensor 'Const:0' shape=(2,) dtype=int32>

In [7]:
t2

<tf.Tensor 'Const_1:0' shape=(2,) dtype=int32>

In [8]:
ds2

<DatasetV1Adapter shapes: (2, 2), types: tf.int32>

In [9]:
iter2

<tensorflow.python.data.ops.iterator_ops.Iterator at 0x253d7b7ae48>

In [14]:
ds3

<DatasetV1Adapter shapes: (1,), types: tf.int32>

In [15]:
iter3

<tensorflow.python.data.ops.iterator_ops.Iterator at 0x253d5b11c18>

In [16]:
ds4

<DatasetV1Adapter shapes: <unknown>, types: tf.int64>

In [17]:
iter4

<tensorflow.python.data.ops.iterator_ops.Iterator at 0x253d7ba1978>

--------