In this tutorial we will build a multilayer perceptron (MLP) network to classify the MNIST dataset, which we already used in the previous examples.

<h3>High-level MLP</h3>

The simplest way to build a neural network with TensorFlow is to use its high-level Estimator API (tf.estimator, the successor of TF.Learn), which is similar to Scikit-learn. The DNNClassifier class abstracts the whole process of creating and running the network's layers.
Suppose we want a network with 2 hidden layers, with 300 neurons in the first and 100 in the second:
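
To get a feel for the size of this architecture, the short sketch below counts its trainable parameters. It assumes plain fully connected layers with one bias per neuron, and the helper name `count_parameters` is introduced here only for illustration.

```python
# Layer sizes from the text: 28*28 inputs, hidden layers of 300 and 100 units,
# and 10 output classes (one per digit).
layer_sizes = [28 * 28, 300, 100, 10]


def count_parameters(sizes):
    """Count weights and biases of a fully connected network (hypothetical helper)."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total


n_params = count_parameters(layer_sizes)
print(n_params)  # 266610
```

Most of these parameters sit in the first hidden layer (784 x 300 weights), which is typical when the input is a flattened image.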

In [3]:
import tensorflow as tf
import numpy as np
import pandas as pd

In [109]:
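from sklearn.datasets import fetch_mldata
from sklearn.preprocessing import StandardScaler

# Load MNIST: each image is flattened into 28*28 = 784 features
scaler = StandardScaler()
mnist = fetch_mldata('MNIST original')
X, y = mnist["data"], mnist["target"]
# Standardize each pixel to zero mean and unit variance
X = scaler.fit_transform(X)
# The first 60,000 samples form the training set, the rest the test set
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]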



In [110]:
feature_columns = [tf.feature_column.numeric_column('x', shape=[28*28])]
dnn_clf = tf.estimator.DNNClassifier(feature_columns=feature_columns, hidden_units=[300,100], n_classes=10)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/bd/jbv_nt291b5c0hbqv43mynnc0000gn/T/tmp36w3kr19', '_tf_random_seed': 1, '_save_summary_steps': 100, '_save_checkpoints_secs': 600, '_save_checkpoints_steps': None, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100}


To train our classifier we need to create an input function, which defines how the data is fed to the model. In this function we can preprocess the data, build the batches we want, and shuffle the data if desired.

In [126]:
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x = {'x': X_train},
    y = y_train.astype(np.int32),
    num_epochs=None,
    batch_size=50,
    shuffle=True
)
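
To make the role of the input function more concrete, here is a rough NumPy sketch of the shuffling and batching it performs. This is not how `numpy_input_fn` is implemented: the function `batch_generator` and the toy arrays are assumptions made for illustration, and the sketch covers a single pass over the data, whereas `num_epochs=None` above repeats the data indefinitely.

```python
import numpy as np


def batch_generator(features, labels, batch_size, shuffle=True, seed=0):
    """Yield (features, labels) mini-batches for one pass over the data."""
    n_samples = len(features)
    indices = np.arange(n_samples)
    if shuffle:
        # Shuffle the indices so features and labels stay aligned
        rng = np.random.RandomState(seed)
        rng.shuffle(indices)
    for start in range(0, n_samples, batch_size):
        batch_idx = indices[start:start + batch_size]
        yield features[batch_idx], labels[batch_idx]


# Toy data standing in for X_train / y_train: row i is [2*i, 2*i + 1]
toy_X = np.arange(20, dtype=np.float32).reshape(10, 2)
toy_y = np.arange(10, dtype=np.int32)

batches = list(batch_generator(toy_X, toy_y, batch_size=4))
print([len(bx) for bx, _ in batches])  # [4, 4, 2]
```

The last batch is smaller because 10 samples do not divide evenly into batches of 4.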

In [127]:
dnn_clf.train(input_fn=train_input_fn, steps=40000)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from /var/folders/bd/jbv_nt291b5c0hbqv43mynnc0000gn/T/tmp36w3kr19/model.ckpt-4600
INFO:tensorflow:Saving checkpoints for 4601 into /var/folders/bd/jbv_nt291b5c0hbqv43mynnc0000gn/T/tmp36w3kr19/model.ckpt.
INFO:tensorflow:loss = 251.216, step = 4601
INFO:tensorflow:global_step/sec: 157.378
INFO:tensorflow:loss = 0.408161, step = 4701 (0.638 sec)
INFO:tensorflow:global_step/sec: 155.34
INFO:tensorflow:loss = 0.286941, step = 4801 (0.643 sec)
INFO:tensorflow:global_step/sec: 155.482
INFO:tensorflow:loss = 0.0410905, step = 4901 (0.643 sec)
INFO:tensorflow:global_step/sec: 187.895
INFO:tensorflow:loss = 0.0251613, step = 5001 (0.532 sec)
INFO:tensorflow:global_step/sec: 155.213
INFO:tensorflow:loss = 3.80276e-05, step = 5101 (0.644 sec)
INFO:tensorflow:global_step/sec: 181.656
INFO:tensorflow:loss = 0.00118507, step = 5201 (0.551 sec)
INFO:tensorflow:global_step/sec: 156.124
INFO:tensorflow:loss = 3.33943, step

<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x1228ae978>

In [128]:
evaluate_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_test},
    y=y_test.astype(np.int32),
    num_epochs=1,
    shuffle=False
)

In [129]:
dnn_clf.evaluate(input_fn=evaluate_input_fn)

INFO:tensorflow:Starting evaluation at 2018-07-03-00:11:21
INFO:tensorflow:Restoring parameters from /var/folders/bd/jbv_nt291b5c0hbqv43mynnc0000gn/T/tmp36w3kr19/model.ckpt-104600
INFO:tensorflow:Finished evaluation at 2018-07-03-00:11:22
INFO:tensorflow:Saving dict for global step 104600: accuracy = 0.9675, average_loss = 0.267023, global_step = 104600, loss = 33.8003


{'accuracy': 0.96749997,
 'average_loss': 0.26702258,
 'global_step': 104600,
 'loss': 33.800327}

<h4>Custom MLP</h4>

If we want more control over the architecture of our neural network, we can use TensorFlow's lower-level API. Let's implement an MLP by building its graph in TensorFlow.

In [131]:
n_inputs = 28*28
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

In [132]:
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='X')
y = tf.placeholder(tf.int64, shape=(None), name='y')

To make it easier to add each new layer to the network, let's define a function responsible for creating these layers.

In [134]:
def neuron_layer(X, n_neurons, name, activation=None):
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1])
        stddev = 2/np.sqrt(n_inputs)
        # Truncated normal initialization of the weight matrix
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)
        W = tf.Variable(init, name='weights')
        b = tf.Variable(tf.zeros([n_neurons]), name='biases')  # one bias per neuron
        z = tf.matmul(X, W) + b
        if activation == 'relu':
            return tf.nn.relu(z)
        else:
            return z  # linear output, used for the logits

To initialize the network weights we use a truncated normal distribution with standard deviation $2/\sqrt{n_{inputs}}$. Truncation avoids very large initial weights, and this choice of standard deviation helps training converge faster.

Note that the layer's activation is either ReLU or linear (no activation). When linear, the layer's output is used as the logits of the softmax.
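
The same computation can be sketched in plain NumPy, which makes the shapes and the truncation easy to inspect. This is an illustration under assumptions, not TensorFlow's implementation: `truncated_normal` and `dense_layer` are hypothetical helpers, and the truncation redraws any value that falls more than two standard deviations from the mean.

```python
import numpy as np


def truncated_normal(shape, stddev, rng):
    """Draw from N(0, stddev^2), redrawing values beyond 2 standard deviations."""
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        values[out_of_range] = rng.normal(0.0, stddev, size=out_of_range.sum())
        out_of_range = np.abs(values) > 2 * stddev
    return values


def dense_layer(inputs, n_neurons, rng, activation=None):
    """Fully connected layer: z = inputs @ W + b, optionally followed by ReLU."""
    n_inputs = inputs.shape[1]
    stddev = 2 / np.sqrt(n_inputs)
    W = truncated_normal((n_inputs, n_neurons), stddev, rng)
    b = np.zeros(n_neurons)
    z = inputs @ W + b
    return np.maximum(z, 0) if activation == "relu" else z


rng = np.random.RandomState(42)
batch = rng.rand(5, 28 * 28)               # 5 fake images
hidden = dense_layer(batch, 300, rng, activation="relu")
logits = dense_layer(hidden, 10, rng)       # linear output layer
print(hidden.shape, logits.shape)  # (5, 300) (5, 10)
```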

In [135]:
with tf.name_scope("dnn"):
    hidden1 = neuron_layer(X, n_hidden1, "hidden1", activation="relu")
    hidden2 = neuron_layer(hidden1, n_hidden2, "hidden2", activation="relu")
    logits = neuron_layer(hidden2, n_outputs, "outputs")

TensorFlow provides several functions for creating neural network layers, so we do not always need to define our own. With the fully_connected() function from tensorflow.contrib.layers we can keep the same structure as above, replacing neuron_layer() with fully_connected() and adapting the arguments (fully_connected() takes scope= and activation_fn=, with ReLU as the default activation). For teaching purposes, we will continue with our own function.

Now that the network structure is in place, we must define the cost function used to train it. Since this is a classification problem, we will use cross entropy. For that we use sparse_softmax_cross_entropy_with_logits(): softmax because this is a multiclass problem, and sparse because the labels are integer class indices rather than one-hot vectors. The function takes the logits instead of the softmax outputs to avoid numerical problems such as log(0): it applies the softmax and the logarithm together in a numerically stable way, so the cross entropy can be computed even when a softmax probability would round to zero.

In [136]:
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
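
The numerical argument for working with logits can be checked with a small NumPy experiment. The sketch below is not TensorFlow's implementation; `sparse_softmax_xent` is a hypothetical helper that uses the log-sum-exp trick, and the extreme logits are chosen on purpose to break the naive route.

```python
import numpy as np


def sparse_softmax_xent(logits, labels):
    """Cross entropy per example, computed from logits via log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


logits = np.array([[1000.0, 0.0, -1000.0],
                   [2.0, 1.0, 0.1]])
labels = np.array([2, 0])

# Naive route: softmax first, then log. The extreme logits overflow and underflow.
with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    naive = -np.log(probs[np.arange(len(labels)), labels])

stable = sparse_softmax_xent(logits, labels)
print(naive)   # first entry is inf
print(stable)  # first entry is 2000.0
```

For well-behaved logits (the second row) both routes agree. Only the stable version returns a finite loss for the first row.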

Defining the optimization method:

In [137]:
learning_rate = 0.01

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

The last step in building our model is to define the evaluation metric. We will use accuracy.

In [176]:
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    accuracy_train_summary = tf.summary.scalar('accuracy_train', accuracy)
    accuracy_eval_summary = tf.summary.scalar('accuracy_eval', accuracy)
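
With k = 1, the accuracy above amounts to checking whether the largest logit sits at the true class. A NumPy equivalent, as a sketch with made-up logits (the helper `accuracy_from_logits` is hypothetical, and exact ties between logits may be treated differently than in `in_top_k`):

```python
import numpy as np


def accuracy_from_logits(logits, labels):
    """Fraction of examples whose highest logit matches the label."""
    predictions = logits.argmax(axis=1)
    return (predictions == labels).mean()


logits = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.2, 0.3],
                   [0.0, 0.1, 3.0],
                   [2.2, 2.5, 0.4]])
labels = np.array([1, 0, 2, 0])
print(accuracy_from_logits(logits, labels))  # 0.75
```

No softmax is needed here, since softmax preserves the ordering of the logits.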

In [152]:
init = tf.global_variables_initializer()

If we want to save the model, we can create a saver:

In [153]:
saver = tf.train.Saver()

<h5>Execution</h5>

In [141]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("tmp/data")

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting tmp/data/t10k-labels-idx1-ubyte.gz


In [145]:
logdir = "tf_logs"
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

In [178]:
n_epochs = 100
batch_size = 50

In [179]:
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

        # Train accuracy is measured on the last mini-batch only, so it is noisy
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images,
                                            y: mnist.test.labels})
        # Log both accuracies for TensorBoard with the file_writer created above
        summary_train = accuracy_train_summary.eval(feed_dict={X: X_batch, y: y_batch})
        file_writer.add_summary(summary_train, epoch)
        summary_eval = accuracy_eval_summary.eval(feed_dict={X: mnist.test.images,
                                                             y: mnist.test.labels})
        file_writer.add_summary(summary_eval, epoch)

        print(epoch, 'Train accuracy:', acc_train, 'Test accuracy:', acc_test)
    file_writer.flush()  # make sure the summaries reach the disk
    save_path = saver.save(sess, "./my_model_final.ckpt")

0 Train accuracy: 0.9 Test accuracy: 0.9122
1 Train accuracy: 0.9 Test accuracy: 0.9282
2 Train accuracy: 0.96 Test accuracy: 0.9373
3 Train accuracy: 0.9 Test accuracy: 0.9442
4 Train accuracy: 0.96 Test accuracy: 0.95
5 Train accuracy: 0.96 Test accuracy: 0.9512
6 Train accuracy: 0.94 Test accuracy: 0.9546
7 Train accuracy: 0.98 Test accuracy: 0.958
8 Train accuracy: 0.94 Test accuracy: 0.9596
9 Train accuracy: 1.0 Test accuracy: 0.9605
10 Train accuracy: 1.0 Test accuracy: 0.9626
11 Train accuracy: 1.0 Test accuracy: 0.9648
12 Train accuracy: 0.98 Test accuracy: 0.9667
13 Train accuracy: 1.0 Test accuracy: 0.968
14 Train accuracy: 0.98 Test accuracy: 0.9685
15 Train accuracy: 1.0 Test accuracy: 0.9694
16 Train accuracy: 0.98 Test accuracy: 0.9702
17 Train accuracy: 0.98 Test accuracy: 0.9703
18 Train accuracy: 0.98 Test accuracy: 0.9705
19 Train accuracy: 0.94 Test accuracy: 0.9712
20 Train accuracy: 1.0 Test accuracy: 0.9728
21 Train accuracy: 0.94 Test accuracy: 0.9737
22 Train ac

<h3>Batch Normalization implementation</h3>

In [41]:
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.layers import batch_norm

In [42]:
n_inputs = 28*28
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

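In [43]:
tf.reset_default_graph()  # start from an empty graph before creating any placeholder

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")

In [44]:
is_training = tf.placeholder(tf.bool, shape=(), name='is_training')
bn_params = {
    'is_training': is_training,   # True during training, False for test/prediction
    'decay': 0.99,                # decay of the moving averages of mean and variance
    'updates_collections': None   # update the moving averages in place
}

In [45]: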
hidden1 = fully_connected(X, n_hidden1, scope='hidden1', 
                          normalizer_fn=batch_norm, normalizer_params=bn_params)

hidden2 = fully_connected(hidden1, n_hidden2, scope='hidden2', 
                          normalizer_fn=batch_norm, normalizer_params=bn_params)

logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope='outputs',
                          normalizer_fn=batch_norm, normalizer_params=bn_params)

A less tedious way to create these layers is to use arg_scope():

In [48]:
# reuse=True makes these layers share the variables already created above
# under the scopes 'hidden1', 'hidden2' and 'outputs'
with tf.contrib.framework.arg_scope(
        [fully_connected],
        normalizer_fn=batch_norm,
        normalizer_params=bn_params,
        reuse=True):
    hidden1 = fully_connected(X, n_hidden1, scope='hidden1')
    hidden2 = fully_connected(hidden1, n_hidden2, scope='hidden2')
    logits = fully_connected(hidden2, n_outputs, scope='outputs', activation_fn=None)

The rest of the network construction is identical to the previous process, but execution changes slightly. During training, the is_training placeholder must be set to True, and to False for testing/prediction:<br>
sess.run(training_op, feed_dict={is_training: True, X: X_batch, y: y_batch})
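
The reason the flag matters is that batch normalization behaves differently in the two modes: during training it normalizes with the statistics of the current batch and updates moving averages, while at test time it normalizes with those stored averages. The NumPy sketch below illustrates this under simplifying assumptions: the class `BatchNormSketch` is hypothetical, it omits the learnable scale and offset, and the `epsilon` value is an assumed constant.

```python
import numpy as np


class BatchNormSketch:
    """Minimal batch normalization without learnable scale/offset."""

    def __init__(self, n_features, decay=0.99, epsilon=1e-3):
        self.decay = decay
        self.epsilon = epsilon  # assumed small constant to avoid division by zero
        self.moving_mean = np.zeros(n_features)
        self.moving_var = np.ones(n_features)

    def __call__(self, x, is_training):
        if is_training:
            # Use the batch statistics and update the moving averages
            mean, var = x.mean(axis=0), x.var(axis=0)
            self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
            self.moving_var = self.decay * self.moving_var + (1 - self.decay) * var
        else:
            # Use the statistics accumulated during training
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.epsilon)


rng = np.random.RandomState(0)
bn = BatchNormSketch(n_features=3)
for _ in range(1000):
    batch = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
    bn(batch, is_training=True)

test_batch = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
normalized = bn(test_batch, is_training=False)
print(bn.moving_mean)           # each entry close to 5
print(normalized.mean(axis=0))  # each entry close to 0
```

With `decay=0.99`, as in `bn_params`, the moving averages need a few hundred training steps before they track the data statistics, so evaluating too early with `is_training` set to False can give misleading results.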