2.5.1 Implementing the first example

Implement the NaiveDense class, which creates two TensorFlow variables, W and b, and defines a __call__() method that computes the affine transformation of the inputs, applies the activation function, and returns the output.

NaiveDense = a naive dense (fully connected) layer

In [39]:
import tensorflow as tf

class NaiveDense:
    def __init__(self, input_size, output_size, activation):
        self.activation = activation

        # Create a matrix W with shape (input_size, output_size), filled with
        # small random values. The shape follows from the layer mapping each
        # input vector of input_size features to output_size outputs.
        w_shape = (input_size, output_size)
        w_initial_value = tf.random.uniform(w_shape,
                                            minval=0,
                                            maxval=1e-1)
        self.W = tf.Variable(w_initial_value)

        # Create a vector b with shape (output_size,), initialized to zeros.
        # The trailing comma matters: (output_size) is just an int, not a tuple.
        b_shape = (output_size,)
        b_initial_value = tf.zeros(b_shape)
        self.b = tf.Variable(b_initial_value)

    def __call__(self, inputs):
        # Apply the forward pass: activation(inputs @ W + b).
        return self.activation(tf.matmul(inputs, self.W) + self.b)

    # Convenience property for retrieving the layer's weights.
    @property
    def weights(self):
        return [self.W, self.b]
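
To make the shapes concrete, the same forward pass can be sketched in NumPy. This is only an illustrative stand-in for the TensorFlow class above: the sizes (4 inputs, 3 outputs, batch of 2), the fixed seed, and the `relu` helper are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

input_size, output_size, batch = 4, 3, 2

# Same initialization scheme as NaiveDense: small uniform weights, zero bias.
W = rng.uniform(0.0, 0.1, size=(input_size, output_size))
b = np.zeros((output_size,))

def relu(x):
    # Element-wise max(x, 0), standing in for tf.nn.relu.
    return np.maximum(x, 0.0)

inputs = rng.uniform(0.0, 1.0, size=(batch, input_size))

# (batch, input_size) @ (input_size, output_size) -> (batch, output_size);
# b is broadcast across the batch dimension.
outputs = relu(inputs @ W + b)

print(outputs.shape)  # (2, 3)
```

Each row of `outputs` is the layer's response to one input vector in the batch.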

A simple Sequential class


Next, create a NaiveSequential class to chain the layers. It wraps a list of layers and defines a __call__() method that calls the underlying layers on the inputs, in order. It also has a weights property that keeps track of the parameters of those layers.


In [38]:
class NaiveSequential:
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, inputs):
        x = inputs
        for layer in self.layers:
            x = layer(x)
        return x

    @property
    def weights(self):
        weights = []
        for layer in self.layers:
            weights += layer.weights
        return weights
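
The chaining logic does not depend on TensorFlow: any list of callables can be composed the same way. The sketch below uses a hypothetical `Chain` class and two toy functions (`double`, `add_one`), all invented for illustration, to show that the layers run in list order.

```python
class Chain:
    """Hypothetical stand-in for NaiveSequential that works on plain callables."""

    def __init__(self, layers):
        self.layers = layers

    def __call__(self, inputs):
        # Feed the output of each layer into the next one.
        x = inputs
        for layer in self.layers:
            x = layer(x)
        return x


def double(x):
    return x * 2


def add_one(x):
    return x + 1


chain = Chain([double, add_one])
result = chain(3)  # computed as add_one(double(3))
print(result)  # 7
```

Swapping the order of the two functions gives a different result, which is why the order of the list passed to NaiveSequential matters.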

With these two classes we can build a mock Keras model:

In [37]:
model = NaiveSequential([
    NaiveDense(input_size=28 * 28, output_size=512, activation=tf.nn.relu),
    NaiveDense(input_size=512, output_size=10, activation=tf.nn.softmax)
])
# The model should have 4 weights: 2 W matrices and 2 b vectors.
assert len(model.weights) == 4
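
Those four weight tensors hold many scalar parameters. As a quick worked check, the count can be computed from the layer sizes alone. The helper name `dense_param_count` is hypothetical, introduced only for this arithmetic.

```python
# (input_size, output_size) for each layer of the model above.
layer_sizes = [(28 * 28, 512), (512, 10)]


def dense_param_count(input_size, output_size):
    # One weight per (input, output) pair, plus one bias per output unit.
    return input_size * output_size + output_size


per_layer = [dense_param_count(i, o) for i, o in layer_sizes]
total = sum(per_layer)
print(per_layer)  # [401920, 5130]
print(total)      # 407050
```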

Now we will use MNIST to obtain batches of data and test the model, iterating over the data in mini-batches.

In [36]:
import math

class BatchGenerator:
    def __init__(self, images, labels, batch_size):
        assert len(images) == len(labels)
        self.index = 0
        self.images = images
        self.labels = labels
        self.batch_size = batch_size
        self.num_batches = math.ceil(len(images) / batch_size)

    def next(self):
        images = self.images[self.index : self.index + self.batch_size]
        labels = self.labels[self.index : self.index + self.batch_size]
        self.index += self.batch_size
        return images, labels
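
Because num_batches is rounded up with math.ceil, the last batch can be smaller than batch_size when the dataset size is not a multiple of it. A minimal sketch of that slicing behavior on a plain list, using a hypothetical generator function `iter_batches` (not part of the notebook's code):

```python
import math


def iter_batches(items, batch_size):
    # Yield consecutive slices; slicing past the end simply returns fewer items.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


items = list(range(10))
batches = list(iter_batches(items, batch_size=4))
sizes = [len(batch) for batch in batches]
print(sizes)  # [4, 4, 2]

# The number of batches matches the ceil-based formula used by BatchGenerator.
assert len(batches) == math.ceil(len(items) / 4)
```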

2.5.2 Running one training step

- 1 Compute the predictions of the model for the images in the batch. 
- 2 Compute the loss value for these predictions, given the actual labels. 
- 3 Compute the gradient of the loss with regard to the model’s weights.
- 4 Move the weights by a small amount in the direction opposite to the gradient.
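
The four steps above can be sketched on a one-parameter toy problem, with the gradient derived by hand instead of by GradientTape. The data, the mean-squared-error loss, and the learning rate of 0.1 are all assumptions chosen to keep the arithmetic simple; they are not the notebook's MNIST setup.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2 * x, so the ideal weight is 2.0
learning_rate = 0.1


def mean_squared_error(w):
    # Steps 1 and 2: predictions w * x, then the average loss.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w):
    # Step 3: derivative of the mean squared error with respect to w.
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.0
loss_before = mean_squared_error(w)
w = w - learning_rate * gradient(w)  # Step 4: move against the gradient.
loss_after = mean_squared_error(w)
print(loss_before, loss_after)
```

After the single update, w has moved from 0.0 toward 2.0 and the loss is lower than before.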

In [35]:
learning_rate = 1e-3
def update_weights(gradients, weights):
    for g, w in zip(gradients, weights):
        # Update each weight using its gradient. The learning rate is a
        # hyperparameter that controls the magnitude of the update.
        w.assign_sub(learning_rate * g)  # assign_sub is TensorFlow's equivalent of `-=`

def one_training_step(model, images_batch, labels_batch):
    # Run the forward pass (compute the model's predictions
    # within the scope of the GradientTape).
    with tf.GradientTape() as tape:
        predictions = model(images_batch)
        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(
            labels_batch, predictions
        )
        average_loss = tf.reduce_mean(per_sample_losses)
    # Compute the gradient of the loss with respect to the weights. The result
    # is a list in which each entry corresponds to a weight in model.weights.
    gradients = tape.gradient(average_loss, model.weights)
    update_weights(gradients, model.weights)
    return average_loss


In practice, the weight update step is done with a Keras optimizer rather than by hand:

In [34]:
from tensorflow.keras import optimizers

optimizer = optimizers.SGD(learning_rate=1e-3)

# def update_weights(gradients, weights):
#     optimizer.apply_gradients(zip(gradients, weights))

2.5.3 The full training loop


In [27]:
def fit(model, images, labels, epochs, batch_size=128):
    for epoch_counter in range(epochs):
        print(f"Epoch {epoch_counter}")
        batch_generator = BatchGenerator(images, labels, batch_size)
        for batch_counter in range(batch_generator.num_batches):
            images_batch, labels_batch = batch_generator.next()
            loss = one_training_step(model, images_batch, labels_batch)
            if batch_counter % 100 == 0:
                print(f"Batch {batch_counter}, loss: {loss.numpy()}")



In [40]:
from tensorflow.keras.datasets import mnist
# Load the MNIST dataset.
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Reshape to 2D: each image becomes a vector of 28 * 28 = 784 pixels.
train_images = train_images.reshape((60000, 28 * 28))
test_images = test_images.reshape((10000, 28 * 28))
# Normalize the pixel values to the range [0, 1].
train_images = train_images.astype("float32") / 255
test_images = test_images.astype("float32") / 255

fit(model, train_images, train_labels, epochs=5, batch_size=128)


Epoch 0
Batch 0, loss: 8.800809860229492
Batch 100, loss: 2.272515296936035
Batch 200, loss: 2.1893208026885986
Batch 300, loss: 2.112562656402588
Batch 400, loss: 2.2069334983825684
Epoch 1
Batch 0, loss: 1.918933391571045
Batch 100, loss: 1.907104253768921
Batch 200, loss: 1.808807611465454
Batch 300, loss: 1.7170875072479248
Batch 400, loss: 1.8184126615524292
Epoch 2
Batch 0, loss: 1.5861531496047974
Batch 100, loss: 1.605689525604248
Batch 200, loss: 1.492302656173706
Batch 300, loss: 1.4303542375564575
Batch 400, loss: 1.5052928924560547
Epoch 3
Batch 0, loss: 1.324166178703308
Batch 100, loss: 1.3665997982025146
Batch 200, loss: 1.234771966934204
Batch 300, loss: 1.2150312662124634
Batch 400, loss: 1.2753760814666748
Epoch 4
Batch 0, loss: 1.1244912147521973
Batch 100, loss: 1.1831228733062744
Batch 200, loss: 1.0375418663024902
Batch 300, loss: 1.0546451807022095
Batch 400, loss: 1.1107895374298096


We can evaluate the model by taking the argmax of its predictions over the test images and comparing it with the expected labels.

In [None]:
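predictions = model(test_images)
# argmax over the class axis gives the predicted digit for each image (int64).
predicted_labels = tf.argmax(predictions, axis=1)
# MNIST labels load as uint8, so cast them to int64 before comparing.
matches = tf.equal(predicted_labels, tf.cast(test_labels, tf.int64))
accuracy = tf.reduce_mean(tf.cast(matches, tf.float32))
print(f"Test set accuracy: {accuracy.numpy()}")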
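
The argmax-and-compare computation can be followed by hand on a tiny example. The sketch below uses NumPy and made-up probabilities for 4 samples and 3 classes; the numbers are illustrative only and are not model outputs.

```python
import numpy as np

# Made-up class probabilities for 4 samples and 3 classes (illustrative only).
predictions = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])
labels = np.array([1, 0, 1, 1])

# Index of the highest probability in each row: [1, 0, 2, 1].
predicted_labels = np.argmax(predictions, axis=1)

# Boolean array marking which predictions equal the true label.
matches = predicted_labels == labels

# The mean of a boolean array is the fraction of True values.
accuracy = matches.mean()
print(accuracy)  # 0.75
```

Three of the four predictions match their labels, so the accuracy is 0.75.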