# Reading Files with TensorFlow

To read the contents of a file, you can use standard Python file I/O or TensorFlow's built-in ops.

## Placeholders

The simplest approach is to read the file with standard Python and feed the data into placeholders.

First, build the computation graph: it takes the values from one row of the file and adds them up.

In [18]:
import tensorflow as tf

filename = "olympics2016.csv"

features = tf.placeholder(tf.int32, shape=[3], name="features")
country = tf.placeholder(tf.string, name="country")
total = tf.reduce_sum(features, name="total")

Next, we introduce a new op, `Print`, which prints the current value of a node.

In [19]:
printerop = tf.Print(total, [country, features, total], name="printer")

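What happens when `printerop` is evaluated? It prints the current values of the tensors listed in its second argument and returns its first argument unchanged. This graph contains only placeholders and no `Variable`, so the `tf.global_variables_initializer()` call below is not strictly required; it is harmless here.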
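
The pass-through behaviour of `Print` can be illustrated without TensorFlow. The following is only a plain-Python analogy, not how the op is implemented; `print_passthrough` is a hypothetical helper name. It writes to standard error because, as far as the TensorFlow 1.x documentation describes it, `tf.Print` also logs there, which is why its output may not appear inside a notebook cell.

```python
import sys


def print_passthrough(value, data):
    """Print `data` as a side effect, then return `value` unchanged."""
    print(data, file=sys.stderr)
    return value


medals = [46, 37, 38]
total_value = sum(medals)

# The caller receives the first argument back; printing is only a side effect.
result = print_passthrough(total_value, ["United States", medals, total_value])
```

Here `result` is the same value as `total_value`, just as `sess.run(printerop)` returns the value of `total`.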

In [21]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    with open(filename) as inf:
        # Skip the header row
        next(inf)
        for line in inf:
            # Parse the row in plain Python
            # (total_medals avoids shadowing the `total` tensor defined earlier)
            country_name, code, gold, silver, bronze, total_medals = line.strip().split(',')
            gold = int(gold)
            silver = int(silver)
            bronze = int(bronze)

            # Run the Print op
            result = sess.run(printerop, feed_dict={features: [gold, silver, bronze], country: country_name})

#             print(country_name, result, total_medals)
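
The loop above splits each line on commas, which would break if a field were ever quoted and contained a comma. A minimal sketch of the same parsing step using the standard `csv` module is shown below. The in-memory sample, its header names, and the helper `read_medal_rows` are assumptions made for illustration; only the column order (country, code, gold, silver, bronze, total) is taken from the loop above.

```python
import csv
import io

# Hypothetical two-row sample in the column order the loop above unpacks.
SAMPLE_CSV = """Country,Code,Gold,Silver,Bronze,Total
United States,USA,46,37,38,121
Great Britain,GBR,27,23,17,67
"""


def read_medal_rows(fileobj):
    """Yield (country_name, [gold, silver, bronze]) for each data row."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row
    for row in reader:
        country_name, _code, gold, silver, bronze, _total = row
        yield country_name, [int(gold), int(silver), int(bronze)]


rows = list(read_medal_rows(io.StringIO(SAMPLE_CSV)))
for country_name, medals in rows:
    # `medals` is the list that would be fed to the `features` placeholder.
    print(country_name, medals, sum(medals))
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(filename, newline="")`.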

## Reading CSV files in TensorFlow


Reference: http://learningtensorflow.com/ReadingFilesBasic/

TensorFlow can also read data directly into tensors. The process takes a few steps, so we will go through them one at a time.

The core idea is to create a queue (not a Python list) that holds the names of the files to read, and then create a reader op that performs the reads.

In [38]:
def create_file_reader_ops(filename_queue):
    reader = tf.TextLineReader(skip_header_lines=1)
    _, csv_row = reader.read(filename_queue)
    record_defaults = [[""], [""], [0], [0], [0], [0]]
    country, code, gold, silver, bronze, total = tf.decode_csv(csv_row, record_defaults=record_defaults)
    
    features = tf.stack([gold, silver, bronze])
    return features, country
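
The `record_defaults` argument does two jobs: each entry fixes the type of its column, and its value is used when a field is empty. The following plain-Python sketch imitates that behaviour in a simplified way. `decode_row` is a hypothetical helper, not part of TensorFlow, and it ignores quoting rules that a real CSV decoder handles.

```python
def decode_row(line, record_defaults):
    """Simplified stand-in for CSV decoding driven by per-column defaults."""
    fields = line.split(",")
    if len(fields) != len(record_defaults):
        raise ValueError(
            "expected %d fields, got %d" % (len(record_defaults), len(fields))
        )
    decoded = []
    for field, (default,) in zip(fields, record_defaults):
        if field == "":
            # Empty field: fall back to the column default.
            decoded.append(default)
        else:
            # Non-empty field: convert to the type of the default.
            decoded.append(type(default)(field))
    return decoded


record_defaults = [[""], [""], [0], [0], [0], [0]]
full_row = decode_row("United States,USA,46,37,38,121", record_defaults)
sparse_row = decode_row("United States,USA,46,,38,", record_defaults)
```

In `sparse_row` the empty silver and total fields come back as `0`, the default given for those columns.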

Note that the reader takes a queue object, not a Python list. So how do we create the queue?

In [39]:
filenames = [filename]
filename_queue = tf.train.string_input_producer(filenames, num_epochs=1, shuffle=False)
example, country = create_file_reader_ops(filename_queue)

That completes the graph. It cannot be run directly yet, though: queues behave differently from ordinary graph ops, so we also need a `Coordinator` to manage the threads that fill them. Each time `example` and `country` are evaluated, they pull the next row from the file, so repeated evaluation steps through the dataset.
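
The division of labour between the background threads and the consuming loop can be sketched with the standard library alone. This is only an analogy under stated assumptions, not TensorFlow's implementation: `start_producer`, the `_END` sentinel, and the sample rows are hypothetical, a `threading.Event` stands in for the coordinator's stop flag, and the sentinel stands in for `OutOfRangeError`.

```python
import queue
import threading

_END = object()  # sentinel: the producer has no more items


def start_producer(items, out_queue, stop_event):
    """Start a background thread that enqueues `items` until done or stopped."""
    def run():
        for item in items:
            if stop_event.is_set():
                break
            out_queue.put(item)
        out_queue.put(_END)

    thread = threading.Thread(target=run)
    thread.start()
    return thread


rows = ["United States,USA,46,37,38,121", "Great Britain,GBR,27,23,17,67"]
row_queue = queue.Queue()        # unbounded, so put() never blocks
stop_event = threading.Event()   # plays the role of the coordinator's stop flag
producer = start_producer(rows, row_queue, stop_event)

consumed = []
try:
    while True:
        item = row_queue.get()
        if item is _END:         # analogous to catching OutOfRangeError
            break
        consumed.append(item)
finally:
    stop_event.set()             # analogous to coord.request_stop()
    producer.join()              # analogous to coord.join(threads)
```

The consumer never opens the data itself; it only dequeues what the background thread has produced, which is the same relationship the session loop has with the queue runners.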



In [16]:
import tensorflow as tf 

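with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    # num_epochs creates a local epoch counter, so local variables must be initialized too
    session.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(session, coord=coord)

    try:
        while True:
            example_data, country_name = session.run([example, country])
            print(example_data, country_name)
    except tf.errors.OutOfRangeError:
        # Raised once the single epoch has been consumed
        pass
    finally:
        # Stop the queue-runner threads and wait for them to exit
        coord.request_stop()
        coord.join(threads)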

(array([46, 37, 38], dtype=int32), 'United States')
(array([27, 23, 17], dtype=int32), 'Great Britain')
(array([26, 18, 26], dtype=int32), 'China')
(array([19, 18, 19], dtype=int32), 'Russia')
(array([17, 10, 15], dtype=int32), 'Germany')
(array([12,  8, 21], dtype=int32), 'Japan')
(array([10, 18, 14], dtype=int32), 'France')
(array([9, 3, 9], dtype=int32), 'South Korea')
(array([ 8, 12,  8], dtype=int32), 'Italy')
(array([ 8, 11, 10], dtype=int32), 'Australia')
(array([8, 7, 4], dtype=int32), 'Netherlands')
(array([8, 3, 4], dtype=int32), 'Hungary')
(array([7, 6, 6], dtype=int32), 'Brazil')
(array([7, 4, 6], dtype=int32), 'Spain')
(array([6, 6, 1], dtype=int32), 'Kenya')
(array([6, 3, 2], dtype=int32), 'Jamaica')
(array([5, 3, 2], dtype=int32), 'Croatia')
(array([5, 2, 4], dtype=int32), 'Cuba')
(array([4, 9, 5], dtype=int32), 'New Zealand')
(array([ 4,  3, 15], dtype=int32), 'Canada')
(array([4, 2, 7], dtype=int32), 'Uzbekistan')
(array([3, 5, 9], dtype=int32), 'Kazakhstan')
(array([3