<a href="https://colab.research.google.com/github/ProfAI/tf00/blob/master/3%20-%20Funzioni%20di%20costo%2C%20attivazione%20e%20ottimizzazione/learningrate_momentum_rmsprop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Learning Rate and Momentum
In this notebook we will see how to use Learning Rate and Momentum to improve the training phase of a Neural Network.

## Importing the Modules

In [None]:
import numpy as np

import tensorflow as tf
import tensorflow_datasets as tfds

from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

from time import time

## Preparing the Data

In [None]:
def load_data(dataset, num_samples=None):
  images = []
  labels = []

  for i, example in enumerate(tfds.as_numpy(dataset)):

    # Stop early once the requested number of samples has been collected
    if num_samples is not None and i >= num_samples:
      break

    images.append(example["image"])
    labels.append(example["label"])

  images = np.array(images)
  labels = np.array(labels)

  return images, labels


dataset = tfds.load('fashion_mnist', split='train', shuffle_files=True)
images, labels = load_data(dataset)
num_classes = np.unique(labels).shape[0]

X = images.reshape(images.shape[0], images.shape[1]*images.shape[2])
y = tf.one_hot(labels, num_classes).numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

X_train = X_train/255
X_test = X_test/255

## Creating the Neural Network
We define two functions: one to build the network's architecture and another to plot the Log Loss.

In [None]:
def build_model():
  model = tf.keras.models.Sequential()
  model.add(tf.keras.layers.Dense(512, activation='relu', input_shape=[X_train.shape[1]]))
  model.add(tf.keras.layers.Dense(256, activation='relu'))
  model.add(tf.keras.layers.Dense(128, activation='relu'))
  model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))

  return model


def plot_loss_chart(title, figsize=(14,10), validation=True):

  plt.figure(figsize=figsize)
  plt.title(title)
  plt.xlabel("Epoch")
  plt.ylabel("Log-Loss")
  
  plt.plot(model.history.history['loss'], label="Training Loss")

  if(validation):
    plt.plot(model.history.history['val_loss'], label="Validation Loss")

  plt.legend()

## Learning Rate
The Learning Rate lets us control the size of each Gradient Descent step. To set it, we can instantiate the optimization algorithm through one of the dedicated tf.keras classes and pass the *learning_rate* parameter.
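As a minimal sketch of what the learning rate controls, the following pure-Python example runs Gradient Descent on a toy one-dimensional loss, f(w) = (w - 3)^2. The function name `gradient_descent`, the toy loss, and the starting point are assumptions made for illustration; they are not part of the notebook's model.

```python
def gradient_descent(learning_rate, steps, w0=0.0):
    """Minimize the toy loss f(w) = (w - 3)**2 starting from w0 (illustrative only)."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)         # derivative of (w - 3)**2 with respect to w
        w = w - learning_rate * grad   # the learning rate scales the size of each step
    return w


# Same number of steps, two different learning rates
w_small = gradient_descent(learning_rate=0.001, steps=10)
w_larger = gradient_descent(learning_rate=0.1, steps=10)

print("lr=0.001 ->", w_small)
print("lr=0.1   ->", w_larger)
```

With the same number of steps, the run with the larger learning rate ends up closer to the minimum at w = 3, because every update moves further along the negative gradient.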

### Learning Rate too small
A Learning Rate that is too small makes the Gradient Descent steps tiny, slowing training down excessively.

In [None]:
model = build_model()

sgd = tf.keras.optimizers.SGD(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=X_train.shape[0],
          validation_data=(X_test, y_test), validation_batch_size = X_test.shape[0], 
          epochs=10, callbacks=[tf.keras.callbacks.History()])

### Learning Rate too large
A Learning Rate that is too large makes the Gradient Descent steps too wide, causing the model to diverge.
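To make the divergence concrete, here is a small sketch on the toy loss f(w) = w^2, where each update multiplies w by (1 - 2 * learning_rate). The helper `run` and the specific rates are hypothetical choices for this toy problem; the threshold at which a real network diverges depends on the model and the data.

```python
def run(learning_rate, steps=10, w0=1.0):
    """Return |w| after each Gradient Descent step on the toy loss f(w) = w**2."""
    w = w0
    trajectory = []
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w   # gradient of w**2 is 2*w
        trajectory.append(abs(w))
    return trajectory


diverging = run(learning_rate=1.1)    # each step overshoots the minimum and lands further away
converging = run(learning_rate=0.1)   # each step shrinks the distance from the minimum

print("lr=1.1:", diverging[-1])
print("lr=0.1:", converging[-1])
```

In the first run the distance from the minimum grows at every step, which is the same behavior that shows up as an exploding or NaN loss when training a network.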

In [None]:
model = build_model()

sgd = tf.keras.optimizers.SGD(learning_rate=1)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=X_train.shape[0],
          validation_data=(X_test, y_test), validation_batch_size = X_test.shape[0], 
          epochs=10, callbacks=[tf.keras.callbacks.History()])

### Optimal Learning Rate
The optimal Learning Rate varies from problem to problem and should be searched for on a logarithmic scale from 0.0001 to 10 (0.0001, 0.001, 0.01, 0.1, 1, 10).
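The search over a logarithmic scale can be sketched as follows. This example again uses the toy loss f(w) = w^2 rather than the network, and the names `final_loss`, `candidates`, and `best_lr` are assumptions for illustration; with a real model one would compare the validation loss reached by each candidate instead.

```python
def final_loss(learning_rate, steps=20, w0=1.0):
    """Loss reached on the toy problem f(w) = w**2 after a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
    return w * w


# Candidates spaced by powers of ten: 0.0001, 0.001, 0.01, 0.1, 1, 10
candidates = [10.0 ** exponent for exponent in range(-4, 2)]

losses = {lr: final_loss(lr) for lr in candidates}
best_lr = min(losses, key=losses.get)

for lr, loss in losses.items():
    print(f"lr={lr:g} -> final loss {loss:.3e}")
print("best:", best_lr)
```

Once the best order of magnitude is known, the search can be refined with a few values around it.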

In [None]:
model = build_model()

sgd = tf.keras.optimizers.SGD(learning_rate=0.1)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=X_train.shape[0],
          validation_data=(X_test, y_test), validation_batch_size = X_test.shape[0], 
          epochs=10, callbacks=[tf.keras.callbacks.History()])

## Dynamic Learning Rate
A smart choice is to vary the Learning Rate during training, making it smaller as the model starts to converge. This technique is known as **Learning Rate Decay**. There are several ways to implement Learning Rate Decay; let's look at two.

### Exponential Decay
This technique reduces the Learning Rate exponentially using the following formula:
<br><br>
`initial_learning_rate * decay_rate ^ (step / decay_steps)`
<br><br>
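The formula can be evaluated directly in plain Python to see how the schedule behaves. The function `exponential_decay` below is a hypothetical re-implementation of the formula, not the Keras class; its default values mirror the ones used in the next cell. Note that a step here is one optimizer update, so when the batch size equals the whole training set, each epoch counts as a single step.

```python
def exponential_decay(step, initial_learning_rate=0.1, decay_steps=10000, decay_rate=0.9):
    """Learning rate at a given optimizer step, following the formula above."""
    return initial_learning_rate * decay_rate ** (step / decay_steps)


for step in (0, 10, 10000, 20000):
    print(step, exponential_decay(step))
```

With these values the rate is multiplied by 0.9 every 10000 steps, so after only 10 steps it is still almost exactly 0.1. A smaller `decay_steps` would be needed for the decay to have a visible effect over a handful of full-batch epochs.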


In [None]:
model = build_model()

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=10000,
    decay_rate=0.9)

sgd = tf.keras.optimizers.SGD(learning_rate=lr_schedule)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=X_train.shape[0],
          validation_data=(X_test, y_test), validation_batch_size = X_test.shape[0], 
          epochs=10, callbacks=[tf.keras.callbacks.History()])

### Inverse Time Decay
An alternative to Exponential Decay is Inverse Time Decay, which adjusts the Learning Rate using the following formula:
<br><br>
`initial_learning_rate / (1 + decay_rate * step / decay_steps)`
<br><br>
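As a sketch, the same formula written as a plain Python function makes it easy to inspect the values it produces. `inverse_time_decay` is a hypothetical helper that follows the formula above, not the Keras class, and its defaults mirror the parameters used in the next cell.

```python
def inverse_time_decay(step, initial_learning_rate=0.1, decay_steps=10000, decay_rate=0.9):
    """Learning rate at a given optimizer step, following the formula above."""
    return initial_learning_rate / (1 + decay_rate * step / decay_steps)


for step in (0, 10, 10000, 100000):
    print(step, inverse_time_decay(step))
```

After `decay_steps` steps the rate has been divided by (1 + decay_rate), that is 0.1 / 1.9 with these values, and it keeps shrinking hyperbolically rather than geometrically as the step count grows.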

In [None]:
model = build_model()

lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=0.1,
    decay_steps=10000,
    decay_rate=0.9)

sgd = tf.keras.optimizers.SGD(learning_rate=lr_schedule)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=X_train.shape[0],
          validation_data=(X_test, y_test), validation_batch_size = X_test.shape[0], 
          epochs=10, callbacks=[tf.keras.callbacks.History()])

## RMSProp and Adam

In [None]:
model = build_model()

model.compile(loss='categorical_crossentropy', optimizer="rmsprop", metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=X_train.shape[0],
          validation_data=(X_test, y_test), validation_batch_size = X_test.shape[0], 
          epochs=10, callbacks=[tf.keras.callbacks.History()])

In [None]:
model = build_model()

model.compile(loss='categorical_crossentropy', optimizer="adam", metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=X_train.shape[0],
          validation_data=(X_test, y_test), validation_batch_size = X_test.shape[0], 
          epochs=10, callbacks=[tf.keras.callbacks.History()])

In [None]:
model = build_model()

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=10000,
    decay_rate=0.9)

adam = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=X_train.shape[0],
          validation_data=(X_test, y_test), validation_batch_size = X_test.shape[0], 
          epochs=10, callbacks=[tf.keras.callbacks.History()])
