# ANN Technologies Using TensorFlow 2 #

The canonical way to present data to a TensorFlow ANN, as recommended
by Google, is via a data pipeline composed of a `tf.data.Dataset` object and a
`tf.data.Iterator` object. A `tf.data.Dataset` consists of a sequence of
elements, each of which contains one or more tensors. A
`tf.data.Iterator` steps through a dataset so that its elements can be
accessed one at a time.


## Using `NumPy` arrays with datasets ##

In [1]:
import tensorflow as tf
import numpy as np

num_items = 11
num_list1 = np.arange(num_items)
num_list2 = np.arange(num_items, num_items**2)

# Create the dataset
num_list1_dataset = tf.data.Dataset.from_tensor_slices(num_list1)

print(num_list1_dataset)

# Create the iterator
iterator = tf.compat.v1.data.make_one_shot_iterator(num_list1_dataset)
print(iterator)

# Use the iterator to display the contents of the dataset
for item in num_list1_dataset:
    num = iterator.get_next().numpy()
    print(num)
    

<TensorSliceDataset shapes: (), types: tf.int64>
<tensorflow.python.data.ops.iterator_ops.OwnedIterator object at 0x1335b1450>
0
1
2
3
4
5
6
7
8
9
10
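The `tf.compat.v1` helper used above comes from the TensorFlow 1 API. As a minimal sketch, assuming eager execution (the TensorFlow 2 default), the same elements can also be read with Python's built-in `iter()` and `next()`, or with a plain `for` loop over the dataset. The variable names here are illustrative only.

```python
import numpy as np
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices(np.arange(11))

# iter() returns an iterator over the dataset; next() fetches one element.
it = iter(dataset)
first = next(it).numpy()
second = next(it).numpy()
print(first, second)

# A plain for loop creates its own iterator and starts again at the first element.
values = [int(item.numpy()) for item in dataset]
print(values)
```

Because the `for` loop builds a fresh iterator, it is independent of how far `it` has already advanced.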


In [2]:
# The data can also be accessed in batches
num_list2_dataset = tf.data.Dataset.from_tensor_slices(num_list2)\
    .batch(3, drop_remainder=False)
iterator = tf.compat.v1.data.make_one_shot_iterator(num_list2_dataset)

for item in num_list2_dataset:
    num = iterator.get_next().numpy()
    print(num)

[11 12 13]
[14 15 16]
[17 18 19]
[20 21 22]
[23 24 25]
[26 27 28]
[29 30 31]
[32 33 34]
[35 36 37]
[38 39 40]
[41 42 43]
[44 45 46]
[47 48 49]
[50 51 52]
[53 54 55]
[56 57 58]
[59 60 61]
[62 63 64]
[65 66 67]
[68 69 70]
[71 72 73]
[74 75 76]
[77 78 79]
[80 81 82]
[83 84 85]
[86 87 88]
[89 90 91]
[92 93 94]
[95 96 97]
[ 98  99 100]
[101 102 103]
[104 105 106]
[107 108 109]
[110 111 112]
[113 114 115]
[116 117 118]
[119 120]
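The last batch above holds only two values because 110 elements do not divide evenly by 3. The sketch below, which rebuilds the same range under an illustrative name, compares `drop_remainder=False` with `drop_remainder=True` to show how the incomplete batch is handled.

```python
import numpy as np
import tensorflow as tf

data = np.arange(11, 121)  # the same 110 values as num_list2

keep_last = tf.data.Dataset.from_tensor_slices(data).batch(3, drop_remainder=False)
drop_last = tf.data.Dataset.from_tensor_slices(data).batch(3, drop_remainder=True)

# Materialize the batches as plain Python lists for comparison.
keep_batches = [batch.numpy().tolist() for batch in keep_last]
drop_batches = [batch.numpy().tolist() for batch in drop_last]

print(len(keep_batches), keep_batches[-1])
print(len(drop_batches), drop_batches[-1])
```

Dropping the remainder is useful when later stages require every batch to have the same shape.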


Using the `zip` method

In [3]:
dataset1 = [1,2,3,4,5]
dataset2 = ['a', 'e', 'i', 'o', 'u']
dataset1 = tf.data.Dataset.from_tensor_slices(dataset1)
dataset2 = tf.data.Dataset.from_tensor_slices(dataset2)

zipped_datasets = tf.data.Dataset.zip((dataset1, dataset2))
iterator = tf.compat.v1.data.make_one_shot_iterator(zipped_datasets)

for item in zipped_datasets:
    num = iterator.get_next()[0].numpy()
    print(num)


1
2
3
4
5
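The cell above prints only the first component of each zipped element. As a small sketch with illustrative variable names, both components can be unpacked directly in a `for` loop, since each element of a zipped dataset is a tuple of tensors.

```python
import tensorflow as tf

numbers = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5])
vowels = tf.data.Dataset.from_tensor_slices(['a', 'e', 'i', 'o', 'u'])
zipped = tf.data.Dataset.zip((numbers, vowels))

pairs = []
for num, vowel in zipped:
    # String tensors hold bytes, so decode them to get Python strings.
    pairs.append((int(num.numpy()), vowel.numpy().decode('utf-8')))

print(pairs)
```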


Two datasets can be concatenated with `concatenate`

In [4]:
ds1 = tf.data.Dataset.from_tensor_slices([1,2,3,5,7,11,13,17])
ds2 = tf.data.Dataset.from_tensor_slices([19,23,29,31,37,41])
ds3 = ds1.concatenate(ds2)

for i in ds3:
    print(i.numpy())

# Now using the iterator
iterator = tf.compat.v1.data.make_one_shot_iterator(ds3)
for i in range(14):
    num = iterator.get_next().numpy()
    print(num)

1
2
3
5
7
11
13
17
19
23
29
31
37
41
1
2
3
5
7
11
13
17
19
23
29
31
37
41


Another example: iterating over the dataset for several epochs

In [5]:
epochs = 2
for e in range(epochs):
    for item in ds3:
        print(item.numpy())

1
2
3
5
7
11
13
17
19
23
29
31
37
41
1
2
3
5
7
11
13
17
19
23
29
31
37
41
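An alternative to the nested loop, sketched here on a shorter illustrative dataset, is `Dataset.repeat`, which makes the dataset itself yield its elements the requested number of times.

```python
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices([1, 2, 3])
epochs = 2

# repeat(epochs) yields the whole sequence `epochs` times in a row.
repeated = [int(x.numpy()) for x in ds.repeat(epochs)]
print(repeated)
```

With `repeat`, the epoch boundary is no longer visible in the loop, so the nested form remains the better fit when something has to happen at the end of each epoch.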


## Using CSV files ##
CSV files can be read with `tf.data.experimental.CsvDataset`.

In [10]:
filename = ['./Pasta1.csv']
record_defaults = [tf.string, tf.int32]
dataset_csv = tf.data.experimental.CsvDataset(
    filename,
    record_defaults,
    header=True,
    select_cols=[0,1]
)

for item in dataset_csv:
    print(item[0].numpy().decode('UTF-8'))
    print(item[1].numpy())

a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
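The cell above depends on a local file that is not included here. The following self-contained sketch writes a small temporary CSV first. Its contents, column names, and path are hypothetical stand-ins and not the original `Pasta1.csv`. It then reads both columns back with the same `CsvDataset` arguments.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-in data: a header row followed by (letter, number) rows.
csv_text = "letter,number\na,1\nb,2\nc,3\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write(csv_text)
    csv_path = f.name

dataset_csv = tf.data.experimental.CsvDataset(
    [csv_path],
    record_defaults=[tf.string, tf.int32],  # a dtype marks the column as required
    header=True,                            # skip the first line of the file
    select_cols=[0, 1],
)

# Each element is a tuple with one scalar tensor per selected column.
rows = [(s.numpy().decode("utf-8"), int(n.numpy())) for s, n in dataset_csv]
print(rows)

os.remove(csv_path)
```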


### One-hot Encoding ###
One-hot encoding represents a label as a tensor that has 1 at the index corresponding to the label's value and 0 everywhere else.
For example, encoding the label 5 with a depth of 10 (digits 0 to 9) gives:

0  1  2  3  4  5  6  7  8  9

0  0  0  0  0  1  0  0  0  0

In [2]:
y = 5
y_train_ohe = tf.one_hot(y, depth=10).numpy()
print(y_train_ohe)


[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
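`tf.one_hot` also accepts a whole vector of labels. The sketch below uses illustrative label values, encodes three labels at once, and recovers them with `tf.argmax`, which confirms that each row has its single 1 at the label's index.

```python
import tensorflow as tf

labels = [0, 5, 9]  # illustrative labels
encoded = tf.one_hot(labels, depth=10)

# One row per label, one column per class.
shape = tuple(encoded.shape)
print(shape)

# argmax along each row returns the position of the 1, i.e. the original label.
decoded = tf.argmax(encoded, axis=1).numpy().tolist()
print(decoded)
```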
