# TensorFlow Queue Runner

- For very large datasets or many files, TensorFlow provides `QueueRunner`.
- Data is loaded on demand: background threads fill queues while the training loop consumes from them.
- Steps
  - Register the data files in a filename queue
  - Read data with a reader
  - Decode data
  - Batch data for training
  - Start the queue runners
  - Train and run inference
  - Stop the queue runners
    
<img src="../reports/QueueRunner.gif" align="center" height=100% width=60%/>
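The on-demand idea can be illustrated without TensorFlow. The sketch below is a pure-Python analogy, not TensorFlow's implementation: a background thread plays the role of a queue runner and fills a bounded `queue.Queue`, while the main thread dequeues items as a training loop would. Because the queue is bounded, the producer only loads a few items ahead of the consumer. The names `start_loader` and `consume` are hypothetical.

```python
import queue
import threading


def start_loader(items, capacity=4):
    """Start a background thread that feeds `items` into a bounded queue.

    Stand-in for a queue runner: the thread blocks whenever the queue is
    full, so at most `capacity` items are loaded ahead of the consumer.
    """
    q = queue.Queue(maxsize=capacity)
    done = object()  # sentinel marking the end of the data

    def producer():
        for item in items:
            q.put(item)  # blocks while the queue is full
        q.put(done)

    t = threading.Thread(target=producer, daemon=True)
    t.start()
    return q, done, t


def consume(q, done):
    """Dequeue items until the sentinel arrives (the training-loop side)."""
    out = []
    while True:
        item = q.get()
        if item is done:
            return out
        out.append(item)


q, done, t = start_loader(range(10), capacity=3)
print(consume(q, done))  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
t.join()
```

With a single producer the FIFO queue preserves order; TensorFlow's queue runners can run several enqueue threads at once, so their output order is not guaranteed in the same way.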

## Imports

In [4]:
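# Queue-based input pipelines are TF 1.x graph-mode APIs. Under TF 2.x they
# are only available through tf.compat.v1 with v2 behavior (eager execution) off.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()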

In [5]:
# 1. Register multiple data files
filename_queue = tf.train.string_input_producer(
    ['data-01-test-score.csv'], shuffle=False, name='filename_queue')

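Note: under TensorFlow 2.x with a plain `import tensorflow as tf`, this cell fails with `AttributeError: module 'tensorflow_core._api.v2.train' has no attribute 'string_input_producer'`. The queue-based input APIs are TF 1.x features; in TF 2.x they are reachable only through `tf.compat.v1` (with eager execution disabled) and are deprecated in favor of `tf.data`.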

In [None]:
# 2. Read data with a reader (one line of the file per read)
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
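Conceptually, `reader.read(filename_queue)` takes a file name from the queue and returns one record at a time: `key` identifies where the record came from and `value` is the raw line. The pure-Python generator below mimics that idea. To our understanding `TextLineReader` keys have the form `filename:line_number`, but treat that exact format as an assumption; `read_lines` and its `(filename, text)` input format are hypothetical.

```python
def read_lines(files):
    """Yield (key, value) pairs one line at a time, in the spirit of TextLineReader.

    `files` is a list of (filename, text) pairs standing in for files on disk
    (a hypothetical input format that keeps the sketch self-contained).
    The key is "filename:line_number" with 1-based line numbers; the value is
    the line itself without its trailing newline.
    """
    for filename, text in files:
        for line_number, line in enumerate(text.splitlines(), start=1):
            yield f"{filename}:{line_number}", line


# Illustrative rows only, not the contents of the real data file.
files = [("data-01-test-score.csv", "1,2,3,6\n4,5,6,15\n")]
for key, value in read_lines(files):
    print(key, value)
# data-01-test-score.csv:1 1,2,3,6
# data-01-test-score.csv:2 4,5,6,15
```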

In [None]:
# 3. Decode data
# record_defaults gives one default per column: it fills empty fields
# and also sets the type of each decoded column (float32 here).
# decode_csv() is used because each line of the file is in CSV format.
record_defaults = [[0.], [0.], [0.], [0.]]
xy = tf.decode_csv(value, record_defaults=record_defaults)
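To make the role of `record_defaults` concrete, here is a simplified pure-Python parser for a single line. It is a sketch, not TensorFlow's implementation: the real `decode_csv` also handles quoted fields, required columns, and several dtypes. Each default fills an empty column, and its type determines the type of that column. `decode_csv_line` is a hypothetical helper name.

```python
def decode_csv_line(line, record_defaults):
    """Parse one CSV line using record_defaults-style defaults.

    Each entry of record_defaults is a one-element list: its value fills an
    empty column, and its Python type (float here) sets the output type.
    """
    fields = line.split(",")
    if len(fields) != len(record_defaults):
        raise ValueError(
            f"expected {len(record_defaults)} columns, got {len(fields)}")
    row = []
    for field, (default,) in zip(fields, record_defaults):
        cast = type(default)  # e.g. float for a default of 0.
        row.append(cast(field) if field.strip() else default)
    return row


record_defaults = [[0.], [0.], [0.], [0.]]
print(decode_csv_line("1,2,,6", record_defaults))  # [1.0, 2.0, 0.0, 6.0]
```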


In [None]:

# 4. Batch data
# Split each decoded row into inputs (all columns but the last)
# and the target value (the last column), then collect them
# into batches of 10 rows.
train_x_batch, train_y_batch = \
    tf.train.batch([xy[0:-1], xy[-1:]], batch_size=10)

# Placeholders for tensors that are always fed at run time.
X = tf.placeholder(tf.float32, shape=[None, 3])
Y = tf.placeholder(tf.float32, shape=[None, 1])

# Weight and bias
W = tf.Variable(tf.random_normal([3, 1]), name="weight")
b = tf.Variable(tf.random_normal([1]), name="bias")

# Hypothesis
hypothesis = tf.matmul(X, W) + b

# Cost function: mean squared error
cost = tf.reduce_mean(tf.square(hypothesis - Y))
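The split `xy[0:-1], xy[-1:]` keeps the first three columns as inputs and the last column as a one-element target, which is why the placeholders have shapes `[None, 3]` and `[None, 1]`. The pure-Python sketch below shows that splitting plus batching. The helper `make_batches` is hypothetical; it drops a trailing partial batch, which matches `tf.train.batch` with its default `allow_smaller_final_batch=False`.

```python
def make_batches(rows, batch_size):
    """Group decoded rows into (x_batch, y_batch) pairs.

    Each row is split like xy[0:-1], xy[-1:]: all but the last column are
    inputs, the last column is a one-element target. A trailing partial
    batch is dropped.
    """
    batches = []
    for start in range(0, len(rows) - batch_size + 1, batch_size):
        chunk = rows[start:start + batch_size]
        x_batch = [row[:-1] for row in chunk]   # shape: batch_size x 3
        y_batch = [row[-1:] for row in chunk]   # shape: batch_size x 1
        batches.append((x_batch, y_batch))
    return batches


# Illustrative rows only.
rows = [[1., 2., 3., 6.], [4., 5., 6., 15.], [7., 8., 9., 24.]]
for x_batch, y_batch in make_batches(rows, batch_size=2):
    print(x_batch, y_batch)
# [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]] [[6.0], [15.0]]
```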




In [None]:
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)
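`optimizer.minimize(cost)` builds an op that, each time it runs, moves `W` and `b` a small step against the gradient of `cost`. For the mean squared error used here the gradients have a simple closed form, and the pure-Python sketch below spells out one such step. TensorFlow computes gradients by automatic differentiation instead; `gd_step` is a hypothetical name, and `Y` is taken as a flat list of targets for simplicity.

```python
def gd_step(X, Y, W, b, learning_rate):
    """One plain gradient-descent step on the mean squared error.

    hypothesis_i = sum_j X[i][j] * W[j] + b
    cost         = mean_i (hypothesis_i - Y[i]) ** 2
    d cost/d W_j = (2 / n) * sum_i (hypothesis_i - Y[i]) * X[i][j]
    d cost/d b   = (2 / n) * sum_i (hypothesis_i - Y[i])
    Returns the updated (W, b) and the cost *before* the update.
    """
    n = len(X)
    errors = [sum(x * w for x, w in zip(row, W)) + b - y
              for row, y in zip(X, Y)]
    cost = sum(e * e for e in errors) / n
    grad_W = [2.0 / n * sum(e * row[j] for e, row in zip(errors, X))
              for j in range(len(W))]
    grad_b = 2.0 / n * sum(errors)
    W = [w - learning_rate * g for w, g in zip(W, grad_W)]
    b = b - learning_rate * grad_b
    return W, b, cost


X = [[1., 2., 3.], [4., 5., 6.]]  # illustrative inputs
Y = [6., 15.]                      # illustrative targets
W, b = [0., 0., 0.], 0.
for step in range(201):
    W, b, cost = gd_step(X, Y, W, b, learning_rate=0.01)
    if step % 100 == 0:
        print("Step: {0}, Cost: {1}".format(step, cost))
```

The learning rate must be small enough for the step not to overshoot; too large a value makes the cost grow instead of shrink.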

In [None]:
# Launch the graph in a session
sess = tf.Session()
# Initializes global variables in the graph
sess.run(tf.global_variables_initializer())


In [None]:

# 5. Start Queue runner
# Start populating the filename queue
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

for step in range(2001):
    # 6. Train
    x_batch, y_batch = sess.run([train_x_batch, train_y_batch])
    cost_val, hy_val, _ = sess.run([cost, hypothesis, train], \
                                   feed_dict={X: x_batch, Y: y_batch})

    if step % 100 == 0:
        print("Trial: {0}, Cost: {1}".format(step, cost_val))



In [None]:
# 7. Stop Queue runner
coord.request_stop()
coord.join(threads)
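`tf.train.Coordinator` is essentially a shared stop flag plus a way to wait for threads to finish. The sketch below reproduces the same start, `request_stop()`, `join()` lifecycle with the standard `threading` module. `MiniCoordinator` and `worker` are hypothetical names; the real class additionally propagates exceptions from threads and supports a grace period when joining.

```python
import threading
import time


class MiniCoordinator:
    """A hypothetical, minimal stand-in for tf.train.Coordinator."""

    def __init__(self):
        self._stop = threading.Event()

    def should_stop(self):
        return self._stop.is_set()

    def request_stop(self):
        self._stop.set()

    def join(self, threads):
        for t in threads:
            t.join()


def worker(coord, counter, lock):
    # Loop until the coordinator asks to stop, the way queue-runner
    # threads keep enqueueing until request_stop() is called.
    while not coord.should_stop():
        with lock:
            counter[0] += 1
        time.sleep(0.001)


coord = MiniCoordinator()
counter, lock = [0], threading.Lock()
threads = [threading.Thread(target=worker, args=(coord, counter, lock))
           for _ in range(2)]
for t in threads:
    t.start()
time.sleep(0.05)       # the training loop would run here
coord.request_stop()   # ask all threads to finish
coord.join(threads)    # wait until they have exited
print(all(not t.is_alive() for t in threads), counter[0] > 0)
```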


For multiple files, add each file name to the list passed to `string_input_producer`.


In [None]:
filename_queue = tf.train.string_input_producer(
    ['data-01.csv', 'data-02.csv', ... ],
    shuffle=False, name='filename_queue')
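With several files, `string_input_producer` hands out file names epoch by epoch: with `shuffle=True` (its default) the order is shuffled within each epoch, and `num_epochs=None` (also the default) cycles indefinitely. A pure-Python sketch of that behavior, using the hypothetical helper `filename_stream`:

```python
import itertools
import random


def filename_stream(filenames, num_epochs=None, shuffle=True, seed=None):
    """Yield file names epoch by epoch, loosely mirroring string_input_producer.

    num_epochs=None cycles forever; shuffle=True reorders the names
    independently in every epoch.
    """
    rng = random.Random(seed)
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    for _ in epochs:
        order = list(filenames)
        if shuffle:
            rng.shuffle(order)
        yield from order


names = ["data-01.csv", "data-02.csv"]
print(list(filename_stream(names, num_epochs=2, shuffle=False)))
# ['data-01.csv', 'data-02.csv', 'data-01.csv', 'data-02.csv']
print(list(itertools.islice(filename_stream(names, seed=0), 5)))
```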

To shuffle the examples that make up each batch, use `tf.train.shuffle_batch` instead of `tf.train.batch`.


In [None]:
# min_after_dequeue defines how big a buffer we will randomly sample
#  from: bigger means better shuffling,
#  but slower start-up and more memory used.
# capacity must be larger than min_after_dequeue, and the amount larger
#  determines the maximum we will prefetch.
#  Recommendation:
#   min_after_dequeue + (num_threads + a small safety margin) * batch_size
batch_size = 10
# Assumed here: example/label are one decoded CSV row split as in step 4.
example, label = xy[0:-1], xy[-1:]
min_after_dequeue = 1000
capacity = min_after_dequeue + 3 * batch_size
example_batch, label_batch = tf.train.shuffle_batch(
    [example, label], batch_size=batch_size, capacity=capacity,
    min_after_dequeue=min_after_dequeue)
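The buffer logic behind `shuffle_batch` can be sketched in pure Python. TensorFlow implements it with a `RandomShuffleQueue` filled by background threads; the single-threaded sketch below only shows the sampling rule: a batch is drawn at random from the buffer once, after removing it, at least `min_after_dequeue` examples would remain to mix with later arrivals. `capacity` is omitted because it only bounds how far producer threads may prefetch; with a `batch_size` of 10 (as in `tf.train.batch` above), the recommended formula gives capacity = 1000 + 3 × 10 = 1030. `shuffle_batches` is a hypothetical name.

```python
import random


def shuffle_batches(examples, batch_size, min_after_dequeue, seed=None):
    """Single-threaded sketch of shuffle-buffer batching.

    Returns the batches drawn and the examples still left in the buffer.
    """
    rng = random.Random(seed)
    buffer, batches = [], []
    for example in examples:
        buffer.append(example)
        # Draw a batch only when at least min_after_dequeue examples
        # will stay behind after removing it.
        if len(buffer) >= min_after_dequeue + batch_size:
            batch = []
            for _ in range(batch_size):
                i = rng.randrange(len(buffer))
                # O(1) removal: swap the chosen item to the end, then pop.
                buffer[i], buffer[-1] = buffer[-1], buffer[i]
                batch.append(buffer.pop())
            batches.append(batch)
    return batches, buffer


batches, leftover = shuffle_batches(range(20), batch_size=4,
                                    min_after_dequeue=8, seed=1)
print(batches, len(leftover))
```

A larger `min_after_dequeue` mixes examples from further apart in the input, at the cost of a longer wait before the first batch and more memory, which is the trade-off described in the comments above.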
