https://tensorflow.google.cn/api_docs/python/tf/data/Dataset

# Source Datasets

The simplest way to create a dataset is to build it from a Python list:

In [1]:
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3]) 
for element in dataset: 
  print(element) 

tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)
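
from_tensor_slices slices its input along the first dimension, so it also accepts a tuple of equally long sequences and then yields one tuple per element. The sketch below illustrates this with made-up data; the names features, labels, and pairs are assumptions for the example and do not appear in the original notebook.

```python
import tensorflow as tf

# Hypothetical toy data: three feature rows and three matching labels.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
labels = [0, 1, 0]

# Slicing a tuple along the first dimension yields one
# (feature_row, label) pair per dataset element.
pairs = tf.data.Dataset.from_tensor_slices((features, labels))

# Convert each element to plain Python values for inspection.
pair_list = [(f.numpy().tolist(), int(l.numpy())) for f, l in pairs]
print(pair_list)
```

Each element keeps the shape of one slice, so the feature component here has shape (2,) and the label component is a scalar.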


To process lines from files, use tf.data.TextLineDataset:

In [2]:
dataset = tf.data.TextLineDataset(["../../data/file1.txt", "../../data/file2.txt"]) 
for element in dataset: 
  print(element) 

tf.Tensor(b'', shape=(), dtype=string)
tf.Tensor(b'', shape=(), dtype=string)
tf.Tensor(b'', shape=(), dtype=string)
tf.Tensor(b'', shape=(), dtype=string)
tf.Tensor(b'', shape=(), dtype=string)
tf.Tensor(b'', shape=(), dtype=string)
tf.Tensor(b'\xe7\xac\xac\xe4\xb8\x80\xe7\xab\xa0 \xe5\xa4\xa9\xe6\xb6\xaf \xe6\x80\x9d\xe5\x90\x9b \xe4\xb8\x8d\xe5\x8f\xaf \xe5\xbf\x98 ', shape=(), dtype=string)
tf.Tensor(b'', shape=(), dtype=string)
tf.Tensor(b'', shape=(), dtype=string)
tf.Tensor(b'', shape=(), dtype=string)
tf.Tensor(b'\xe6\x98\xa5\xe6\xb8\xb8 \xe6\xb5\xa9\xe8\x8d\xa1 \xe6\x98\xaf \xe5\xb9\xb4 \xe5\xb9\xb4 \xe5\xaf\x92\xe9\xa3\x9f \xe6\xa2\xa8\xe8\x8a\xb1 \xe6\x97\xb6\xe8\x8a\x82 \xe7\x99\xbd\xe9\x94\xa6\xe6\x97\xa0\xe7\xba\xb9 \xe9\xa6\x99 \xe7\x83\x82\xe6\xbc\xab \xe7\x8e\x89\xe6\xa0\x91 \xe7\x90\xbc\xe8\x8b\x9e\xe5\xa0\x86 \xe9\x9b\xaa \xe9\x9d\x99\xe5\xa4\x9c \xe6\xb2\x89\xe6\xb2\x89 \xe6\xb5\xae\xe5\x85\x89 \xe9\x9c\xad\xe9\x9c\xad \xe5\x86\xb7\xe6\xb5\xb8 \xe6\xba\xb6\xe6\xba\x
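
The output above contains many empty lines, and the text appears as raw UTF-8 bytes. A minimal sketch of one way to handle both, assuming a small temporary file in place of the notebook's data files (the path, the sample text, and the variable names are hypothetical):

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical sample file standing in for the notebook's text files.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("\n\n第一章 天涯\n\nsecond line\n")

lines = tf.data.TextLineDataset([path])

# Drop lines that contain no bytes at all.
non_empty = lines.filter(lambda line: tf.strings.length(line) > 0)

# Each element is a byte string; decode it to get readable text.
decoded = [line.numpy().decode("utf-8") for line in non_empty]
print(decoded)
```

The filter runs inside the tf.data pipeline, while the decoding step here runs in Python and is only meant for inspection.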

To process records written in the TFRecord format, use tf.data.TFRecordDataset:

In [3]:
dataset = tf.data.TFRecordDataset(["file1.tfrecords", "file2.tfrecords"]) 
for element in dataset: 
  print(element) 

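NotFoundError: file1.tfrecords; No such file or directory [Op:IteratorGetNextSync]

The cell fails because file1.tfrecords does not exist in the working directory; the dataset only opens its files when it is iterated, so the error surfaces in the loop rather than in the constructor.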
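
To see TFRecordDataset return records, the file has to exist first. The following is a sketch that writes a small file with tf.io.TFRecordWriter and reads it back; the temporary path and the payloads are assumptions made for the example, not files from the original notebook.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.tfrecords")

# Each record is an opaque byte string; real pipelines usually
# store serialized tf.train.Example protos instead of raw text.
with tf.io.TFRecordWriter(path) as writer:
    for payload in [b"first", b"second", b"third"]:
        writer.write(payload)

# Reading yields the same byte strings, in the order written.
records = [record.numpy() for record in tf.data.TFRecordDataset([path])]
print(records)
```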

To create a dataset of all files matching a pattern, use tf.data.Dataset.list_files (by default the filenames come back in shuffled order):

In [4]:
dataset = tf.data.Dataset.list_files("../../data/*.txt")
for element in dataset: 
  print(element) 

tf.Tensor(b'../../data/text_corpus.txt', shape=(), dtype=string)
tf.Tensor(b'../../data/file2.txt', shape=(), dtype=string)
tf.Tensor(b'../../data/file1.txt', shape=(), dtype=string)
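
The filenames above are not in sorted order because list_files shuffles them by default. When a reproducible order matters, passing shuffle=False is one option. A small sketch, assuming a temporary directory with made-up file names:

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical directory holding two .txt files and one .csv file.
tmp_dir = tempfile.mkdtemp()
for name in ["b.txt", "a.txt", "c.csv"]:
    with open(os.path.join(tmp_dir, name), "w") as f:
        f.write("placeholder\n")

# shuffle=False disables the default random ordering.
pattern = os.path.join(tmp_dir, "*.txt")
files = tf.data.Dataset.list_files(pattern, shuffle=False)

# Elements are byte strings holding the matched paths.
names = [os.path.basename(f.numpy().decode("utf-8")) for f in files]
print(names)
```

Only the files matching the glob pattern are listed, so c.csv is left out.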


# Transformations

Once you have a dataset, you can apply transformations to prepare the data for your model:

In [7]:
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3]) 
dataset = dataset.map(lambda x: x*2) 
for element in dataset: 
  print(element) 
# list(dataset.as_numpy_iterator()) 

tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(4, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
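
Transformations return new datasets, so they can be chained. The sketch below combines map with filter and batch on a made-up range of integers; the particular numbers and the batch size are assumptions chosen only to keep the output small.

```python
import tensorflow as tf

# Elements 1 through 6 (Dataset.range produces int64 values).
dataset = tf.data.Dataset.range(1, 7)

# Double every element: 2, 4, 6, 8, 10, 12.
dataset = dataset.map(lambda x: x * 2)

# Keep only multiples of 4: 4, 8, 12.
dataset = dataset.filter(lambda x: x % 4 == 0)

# Group consecutive elements into batches of up to 2.
dataset = dataset.batch(2)

# as_numpy_iterator() yields NumPy arrays instead of tensors.
batches = [batch.tolist() for batch in dataset.as_numpy_iterator()]
print(batches)
```

The last batch holds a single element because batch keeps a partial final batch unless drop_remainder=True is passed.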
