
# Train and evaluate

In [3]:
%tensorflow_version 2.x

TensorFlow 2.x selected.


In [0]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import optimizers
from tensorflow.keras import losses

Let's build a simple model that will be trained on the MNIST dataset.

In [0]:
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)
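As a quick check on the architecture, the number of trainable parameters can be worked out by hand. A `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper name `dense_params` is hypothetical and only for illustration; the total should agree with what `model.summary()` reports for the model above.

```python
# Trainable parameters of a Dense layer: one weight per (input, unit) pair plus one bias per unit.
def dense_params(n_inputs, n_units):
    return n_inputs * n_units + n_units

# Layer widths of the model above: 784 inputs -> 64 -> 64 -> 10 outputs
layer_sizes = [784, 64, 64, 10]
per_layer = [dense_params(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [50240, 4160, 650]
print(total)      # 55050
```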

Let's load and preprocess the data.

In [0]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
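The preprocessing above can be tried on a tiny synthetic array to see the shapes involved without downloading MNIST. The random "images" below are a hypothetical stand-in, not real data; only the flattening, the scaling to [0, 1], and the hold-out slicing mirror the cell above.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 12 random 28x28 "images" with uint8 pixels, plus labels.
rng = np.random.default_rng(0)
x_fake = rng.integers(0, 256, size=(12, 28, 28)).astype(np.uint8)
y_fake = rng.integers(0, 10, size=12).astype('float32')

# Flatten each image into a 784-vector and scale pixel values into [0, 1].
x_flat = x_fake.reshape(len(x_fake), 784).astype('float32') / 255

# Hold out the last n_val samples for validation, as x_train[-10000:] does above.
n_val = 2
x_val_fake, y_val_fake = x_flat[-n_val:], y_fake[-n_val:]
x_train_fake, y_train_fake = x_flat[:-n_val], y_fake[:-n_val]

print(x_train_fake.shape, x_val_fake.shape)  # (10, 784) (2, 784)
```

With the real data the same slicing leaves 50,000 training and 10,000 validation samples.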

The two key arguments when compiling a model for training are the optimizer and the loss function.

In [0]:
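# Pass the optimizer by name: its default parameters will be used
model.compile(loss='mean_squared_error', optimizer='sgd')

# Or pass an optimizer instance to set its parameters explicitly.
# `lr` is deprecated in favor of `learning_rate`, and newer releases do not support `decay`,
# so the old decay=1e-6 (lr / (1 + decay * iterations)) is expressed as a schedule.
lr_schedule = optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=0.01, decay_steps=1, decay_rate=1e-6)
sgd = optimizers.SGD(learning_rate=lr_schedule, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)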
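The `decay` argument of older Keras optimizers applies time-based decay, `lr / (1 + decay * iterations)`, which is also what `InverseTimeDecay` computes with `decay_steps=1`. A small pure-Python sketch (the helper name is hypothetical) shows how slowly `decay=1e-6` shrinks the learning rate:

```python
# Time-based decay used by the legacy `decay` argument:
#   lr_t = lr / (1 + decay * iterations)
def legacy_decayed_lr(lr, decay, iterations):
    return lr / (1.0 + decay * iterations)

# lr=0.01 and decay=1e-6, as in the SGD example above
for step in (0, 10_000, 1_000_000):
    print(step, legacy_decayed_lr(0.01, 1e-6, step))
```

After 10,000 updates the rate has dropped by only about 1%, and it takes a million updates to halve it.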

## [Keras optimizers](https://keras.io/optimizers/)

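The `clipnorm` and `clipvalue` parameters can be passed to any optimizer to control [gradient clipping](https://deepai.org/machine-learning-glossary-and-terms/gradient-clipping): `clipnorm` rescales a gradient whose L2 norm exceeds the given value, and `clipvalue` clips each gradient element to the range [-clipvalue, clipvalue].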
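The two clipping modes can be sketched in NumPy on a single gradient vector. The helper names are hypothetical; in Keras you would simply pass the argument, e.g. `keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)`. To our understanding, tf.keras applies `clipnorm` to each variable's gradient separately (newer releases also offer `global_clipnorm` for clipping all gradients by their combined norm).

```python
import numpy as np

def clip_by_norm(grad, clipnorm):
    # Rescale the gradient so that its L2 norm is at most `clipnorm` (direction is preserved).
    norm = np.linalg.norm(grad)
    return grad * (clipnorm / norm) if norm > clipnorm else grad

def clip_by_value(grad, clipvalue):
    # Clip every element independently into [-clipvalue, clipvalue] (direction may change).
    return np.clip(grad, -clipvalue, clipvalue)

g = np.array([3.0, 4.0])      # L2 norm = 5
print(clip_by_norm(g, 1.0))   # [0.6 0.8]
print(clip_by_value(g, 1.0))  # [1. 1.]
```

Note the design difference: norm clipping keeps the gradient's direction, while value clipping can change it.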

### Base class

In [0]:
# This is the base class for building optimizers, not an actual optimizer that can be used to train models.

# keras.optimizers.Optimizer(**kwargs)

### [SGD](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD) Stochastic gradient descent 

In [0]:
keras.optimizers.SGD(learning_rate=0.01, momentum=0.0, nesterov=False)
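The update that `SGD` performs can be sketched in NumPy, following the update rules given in the tf.keras `SGD` documentation. `sgd_step` is a hypothetical helper, and the toy objective f(w) = w² (gradient 2w) is only for illustration.

```python
import numpy as np

def sgd_step(w, g, velocity, learning_rate=0.01, momentum=0.0, nesterov=False):
    """One SGD update, written after the update rules in the tf.keras SGD docs."""
    velocity = momentum * velocity - learning_rate * g
    if nesterov:
        w = w + momentum * velocity - learning_rate * g
    else:
        w = w + velocity  # with momentum=0 this is plain w - learning_rate * g
    return w, velocity

# Toy problem: minimize f(w) = w**2, whose gradient is 2 * w.
w, v = 1.0, 0.0
for _ in range(50):
    w, v = sgd_step(w, 2 * w, v, learning_rate=0.1, momentum=0.9)
print(w)  # close to the minimum at w = 0
```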

### [Adagrad](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adagrad)

In [0]:
keras.optimizers.Adagrad(learning_rate=0.01, epsilon=1e-08)

### [Adadelta](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adadelta)

In [0]:
keras.optimizers.Adadelta(learning_rate=1.0, rho=0.95, epsilon=1e-08)

### [Adam](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam)

In [0]:
keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
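A minimal NumPy sketch of one Adam step, in the textbook form of Kingma and Ba (2015). `adam_step` is a hypothetical helper, not the Keras implementation; tf.keras documents its `epsilon` as the "epsilon hat" of an algebraically rearranged update, so results can differ very slightly from this sketch.

```python
import numpy as np

def adam_step(w, g, m, v, t, learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """One Adam update (textbook form); t counts steps starting at 1."""
    m = beta_1 * m + (1 - beta_1) * g          # running mean of gradients (first moment)
    v = beta_2 * v + (1 - beta_2) * g ** 2     # running mean of squared gradients (second moment)
    m_hat = m / (1 - beta_1 ** t)              # bias correction for the zero initialization
    v_hat = v / (1 - beta_2 ** t)
    w = w - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return w, m, v

w, m, v = adam_step(w=1.0, g=0.5, m=0.0, v=0.0, t=1)
print(w)  # about 0.999
```

Because the step divides by the root of the squared-gradient estimate, the first update moves the weight by roughly `learning_rate` whatever the gradient's scale.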

### [Adamax](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adamax)

In [0]:
keras.optimizers.Adamax(learning_rate=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-08)

### [Ftrl](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Ftrl)

In [0]:
tf.keras.optimizers.Ftrl(
    learning_rate=0.001, learning_rate_power=-0.5, initial_accumulator_value=0.1,
    l1_regularization_strength=0.0, l2_regularization_strength=0.0, name='Ftrl',
    l2_shrinkage_regularization_strength=0.0
)

### [Nadam](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Nadam)

In [0]:
keras.optimizers.Nadam(learning_rate=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-08)

### [RMSprop](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/RMSprop)

In [0]:
keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-08)

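## [Loss functions](http://faroit.com/keras-docs/2.0.5/losses/)

Each loss is called as `loss(y_true, y_pred)`, where `y_true` holds the targets and `y_pred` the model's predictions; in the cells below both names are placeholders for your own tensors or arrays.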

In [0]:
#mean_squared_error
keras.losses.mean_squared_error(y_true, y_pred)

In [0]:
#mean_absolute_error
keras.losses.mean_absolute_error(y_true, y_pred)

In [0]:
#mean_absolute_percentage_error
keras.losses.mean_absolute_percentage_error(y_true, y_pred)

In [0]:
#mean_squared_logarithmic_error
keras.losses.mean_squared_logarithmic_error(y_true, y_pred)
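To make the regression losses above concrete, here is a NumPy sketch of what each one computes per sample (averaged over the last axis). It omits the small epsilon clipping Keras applies to avoid division by zero and the log of zero, so it is an approximation for illustration only; the sample values are made up.

```python
import numpy as np

targets = np.array([[1.0, 2.0, 4.0]])
preds = np.array([[1.0, 3.0, 2.0]])

mse = np.mean(np.square(preds - targets), axis=-1)                        # mean_squared_error
mae = np.mean(np.abs(preds - targets), axis=-1)                           # mean_absolute_error
mape = 100.0 * np.mean(np.abs((targets - preds) / targets), axis=-1)      # mean_absolute_percentage_error
msle = np.mean(np.square(np.log1p(preds) - np.log1p(targets)), axis=-1)   # mean_squared_logarithmic_error

print(mse, mae, mape, msle)
```

MSE punishes the error of 2 four times as much as the error of 1, MAE treats errors linearly, and MAPE and MSLE measure relative rather than absolute error.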

In [0]:
#squared_hinge
keras.losses.squared_hinge(y_true, y_pred)

In [0]:
#hinge
keras.losses.hinge(y_true, y_pred)

In [0]:
#categorical_hinge
keras.losses.categorical_hinge(y_true, y_pred)
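A NumPy sketch of the three hinge losses, assuming the usual Keras definitions: `hinge` and `squared_hinge` expect labels of -1 or 1 (tf.keras converts 0/1 labels to that form), while `categorical_hinge` expects one-hot labels. The scores are made-up illustrative values.

```python
import numpy as np

# hinge / squared_hinge: labels are -1 or 1, predictions are raw scores.
labels = np.array([[-1.0, 1.0, 1.0]])
scores = np.array([[0.5, 0.8, 2.0]])
margin = np.maximum(1.0 - labels * scores, 0.0)       # [[1.5, 0.2, 0.0]]
hinge = np.mean(margin, axis=-1)
squared_hinge = np.mean(np.square(margin), axis=-1)

# categorical_hinge: one-hot labels; compares the true-class score with the best wrong-class score.
onehot = np.array([[0.0, 1.0, 0.0]])
class_scores = np.array([[0.3, 0.6, 0.9]])
pos = np.sum(onehot * class_scores, axis=-1)           # score of the true class
neg = np.max((1.0 - onehot) * class_scores, axis=-1)   # highest score among the other classes
categorical_hinge = np.maximum(neg - pos + 1.0, 0.0)

print(hinge, squared_hinge, categorical_hinge)
```

A sample stops contributing once its score is on the correct side with a margin of at least 1, as the third element of `margin` shows.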

In [0]:
#logcosh
keras.losses.logcosh(y_true, y_pred)

#log(cosh(x)) is approximately equal to (x ** 2) / 2 for small x and to abs(x) - log(2) for large x. 
#This means that 'logcosh' works mostly like the mean squared error, 
#but will not be so strongly affected by the occasional wildly incorrect prediction.
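The two approximations in the comment above can be checked numerically. The sketch also uses the exact identity log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2), which avoids the overflow that `np.cosh` hits for large inputs; the function name `log_cosh_stable` is hypothetical.

```python
import numpy as np

def log_cosh_stable(x):
    # Exact rewrite of log(cosh(x)) that never evaluates cosh, so it does not overflow.
    ax = np.abs(x)
    return ax + np.log1p(np.exp(-2.0 * ax)) - np.log(2.0)

x = np.array([0.01, 0.1, 10.0, 20.0])
exact = np.log(np.cosh(x))
small_x_approx = x ** 2 / 2                 # good for small |x|
large_x_approx = np.abs(x) - np.log(2.0)    # good for large |x|

print(exact)
print(small_x_approx[:2], large_x_approx[2:])
```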

In [0]:
#categorical_crossentropy
keras.losses.categorical_crossentropy(y_true, y_pred)

#Note: when using the categorical_crossentropy loss, your targets should be in categorical format
#(e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector
#that is all zeros except for a 1 at the index corresponding to the class of the sample).
#In order to convert integer targets into categorical targets, you can use the Keras utility to_categorical:
categorical_labels = keras.utils.to_categorical(y_train, num_classes=10)
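The difference between `categorical_crossentropy` (one-hot targets) and `sparse_categorical_crossentropy` (integer targets) can be sketched in NumPy: both give the same loss, they just read the true class differently. `one_hot` mimics what `to_categorical` does; all helper names are hypothetical. The predictions are turned into probabilities with a softmax first, which is what `from_logits=True` asks Keras to do when the model outputs raw logits.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def one_hot(labels, num_classes):
    # Integer class -> one-hot row, the same idea as keras.utils.to_categorical.
    return np.eye(num_classes, dtype='float32')[labels]

def categorical_crossentropy(y_true_onehot, probs):
    return -np.sum(y_true_onehot * np.log(probs), axis=-1)

def sparse_categorical_crossentropy(y_true_int, probs):
    # Pick the predicted probability of the true class directly by index.
    return -np.log(probs[np.arange(len(y_true_int)), y_true_int])

labels = np.array([2, 0])
probs = softmax(np.array([[1.0, 2.0, 3.0], [2.0, 0.5, 0.1]]))
cce = categorical_crossentropy(one_hot(labels, 3), probs)
scce = sparse_categorical_crossentropy(labels, probs)
print(cce, scce)  # identical per-sample losses
```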

In [0]:
#sparse_categorical_crossentropy
keras.losses.sparse_categorical_crossentropy(y_true, y_pred)

In [0]:
#binary_crossentropy
keras.losses.binary_crossentropy(y_true, y_pred)

In [0]:
#kullback_leibler_divergence
keras.losses.kullback_leibler_divergence(y_true, y_pred)

In [0]:
#poisson
keras.losses.poisson(y_true, y_pred)

In [0]:
#cosine_proximity (available as cosine_similarity in TensorFlow 2.x)
keras.losses.cosine_similarity(y_true, y_pred)
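A NumPy sketch of the last three losses, assuming the usual Keras definitions. The clipping constant `EPS` is an assumption for this sketch (tf.keras uses its backend epsilon for the same purpose, to our understanding), and the function names are hypothetical. The tf.keras cosine loss returns the *negative* cosine similarity (exposed as `cosine_similarity` in tf.keras 2.x, formerly `cosine_proximity`), so that minimizing it makes the vectors more aligned.

```python
import numpy as np

EPS = 1e-7  # assumed small constant to avoid log(0) and division by zero

def kl_divergence(p_true, p_pred):
    # sum over classes of p_true * log(p_true / p_pred); 0 when the distributions match
    p_true = np.clip(p_true, EPS, 1.0)
    p_pred = np.clip(p_pred, EPS, 1.0)
    return np.sum(p_true * np.log(p_true / p_pred), axis=-1)

def poisson(y_true, y_pred):
    # Negative Poisson log-likelihood, up to a term that does not depend on y_pred
    return np.mean(y_pred - y_true * np.log(y_pred + EPS), axis=-1)

def cosine_similarity_loss(y_true, y_pred):
    # Negative cosine similarity: -1 for parallel vectors, 0 for orthogonal ones.
    t = y_true / np.linalg.norm(y_true, axis=-1, keepdims=True)
    p = y_pred / np.linalg.norm(y_pred, axis=-1, keepdims=True)
    return -np.sum(t * p, axis=-1)

print(kl_divergence(np.array([[0.5, 0.5]]), np.array([[0.9, 0.1]])))
print(poisson(np.array([[1.0, 2.0]]), np.array([[1.0, 2.0]])))
print(cosine_similarity_loss(np.array([[1.0, 0.0], [1.0, 0.0]]),
                             np.array([[2.0, 0.0], [0.0, 3.0]])))
```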

## [Metrics](http://faroit.com/keras-docs/2.0.5/metrics/)

A metric is a function used to judge the performance of your model. Metric functions are supplied in the `metrics` argument when a model is compiled.
A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model.

In [0]:
#binary_accuracy
keras.metrics.binary_accuracy(y_true, y_pred)

In [0]:
#categorical_accuracy
keras.metrics.categorical_accuracy(y_true, y_pred)

In [0]:
#sparse_categorical_accuracy
keras.metrics.sparse_categorical_accuracy(y_true, y_pred)

In [0]:
#top_k_categorical_accuracy
keras.metrics.top_k_categorical_accuracy(y_true, y_pred, k=5)
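A NumPy sketch of what the four accuracy metrics compute, averaged over the samples as in the training log. The helper names mirror the Keras metrics but are hypothetical re-implementations, and the scores are made-up illustrative values.

```python
import numpy as np

def binary_accuracy(y_true, y_pred, threshold=0.5):
    # A prediction counts as class 1 when it exceeds the threshold.
    return np.mean(y_true == (y_pred > threshold).astype(y_true.dtype))

def categorical_accuracy(y_true_onehot, y_pred):
    return np.mean(np.argmax(y_true_onehot, axis=-1) == np.argmax(y_pred, axis=-1))

def sparse_categorical_accuracy(y_true_int, y_pred):
    return np.mean(y_true_int == np.argmax(y_pred, axis=-1))

def top_k_categorical_accuracy(y_true_onehot, y_pred, k=5):
    true_idx = np.argmax(y_true_onehot, axis=-1)
    top_k = np.argsort(y_pred, axis=-1)[:, -k:]    # indices of the k highest scores per row
    return np.mean([t in row for t, row in zip(true_idx, top_k)])

scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 1, 2])
onehot = np.eye(3)[labels]

print(sparse_categorical_accuracy(labels, scores))      # 2 of 3 correct
print(categorical_accuracy(onehot, scores))             # same, from one-hot labels
print(top_k_categorical_accuracy(onehot, scores, k=2))  # true class is always in the top 2
print(binary_accuracy(np.array([1.0, 0.0, 1.0, 0.0]), np.array([0.9, 0.4, 0.3, 0.6])))
```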

In [0]:
#Custom metrics

#Custom metrics can be passed at the compilation step. The function needs to take (y_true, y_pred) as arguments and return a single tensor value.

def mean_pred(y_true, y_pred):
    # Mean of the model's predictions over the batch
    return tf.reduce_mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])

## Training process

In [0]:
#Specify the training configuration (optimizer, loss, metrics)

model.compile(optimizer=keras.optimizers.RMSprop(),  # Optimizer
              # Loss function to minimize
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              # List of metrics to monitor
              metrics=['sparse_categorical_accuracy'])

In [8]:
#Train the model by slicing the data into "batches" of size "batch_size",
#and repeatedly iterating over the entire dataset for a given number of "epochs"

print('# Fit model on training data')
history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=3,
                    # We pass validation data for
                    # monitoring validation loss and metrics
                    # at the end of each epoch
                    validation_data=(x_val, y_val))

print('\nhistory dict:', history.history)

# Fit model on training data
Train on 50000 samples, validate on 10000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

history dict: {'loss': [0.3298180997776985, 0.15687856449246407, 0.11534494979798794], 'sparse_categorical_accuracy': [0.90768, 0.95346, 0.96558], 'val_loss': [0.2099911789894104, 0.14184427043050526, 0.1117049465239048], 'val_sparse_categorical_accuracy': [0.9393, 0.9609, 0.9681]}
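`history.history` maps each metric name to a list with one value per epoch, which makes it easy to inspect training after the fact. The values below are copied from the dict printed above so the snippet runs on its own; in the notebook you would use `history.history` directly.

```python
# Per-epoch values copied from the `history.history` dict printed above.
history_dict = {
    'loss': [0.3298180997776985, 0.15687856449246407, 0.11534494979798794],
    'val_loss': [0.2099911789894104, 0.14184427043050526, 0.1117049465239048],
    'val_sparse_categorical_accuracy': [0.9393, 0.9609, 0.9681],
}

val_loss = history_dict['val_loss']
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1   # epochs are 1-based in the log

for epoch, (tr, va) in enumerate(zip(history_dict['loss'], val_loss), start=1):
    print(f'epoch {epoch}: loss={tr:.4f} val_loss={va:.4f}')
print('lowest val_loss at epoch', best_epoch)  # epoch 3
```

Validation loss is still falling at the last epoch, which suggests a few more epochs could help.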


In [9]:
# Evaluate the model on the test data using `evaluate`
print('\n# Evaluate on test data')
results = model.evaluate(x_test, y_test, batch_size=128)
print('test loss, test acc:', results)


# Evaluate on test data
test loss, test acc: [0.10941215894818306, 0.9657]


In [10]:
# Generate predictions (logits: the raw output of the last layer, which has no activation)
# on new data using `predict`
print('\n# Generate predictions for 3 samples')
predictions = model.predict(x_test[:3])
print('predictions shape:', predictions.shape)


# Generate predictions for 3 samples
predictions shape: (3, 10)


In [11]:
print(predictions)

[[ -3.4216528  -9.221339   -2.2195168  -1.696523  -11.821804   -3.8746653
  -15.412098    8.597394   -4.5410852  -2.1572285]
 [-11.423707   -2.8596866   8.363127   -3.1735256 -20.807539   -7.559353
   -9.677645  -18.588085   -5.8435307 -21.758343 ]
 [ -8.393426    4.232292   -3.068518   -3.7088594  -3.4746392  -4.9324656
   -4.3015184  -1.4798787  -2.0609794  -4.4231043]]
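Because the last layer has no activation, `predict` returns logits rather than probabilities (hence the negative values). Applying a softmax turns each row into probabilities, and `argmax` gives the predicted digit, which you could compare with `y_test[:3]`. The sketch uses NumPy on the printed values; `tf.nn.softmax(predictions)` would do the same in the notebook.

```python
import numpy as np

# Logits copied from the printed `predictions` array above (3 test samples x 10 classes).
logits = np.array([
    [-3.4216528, -9.221339, -2.2195168, -1.696523, -11.821804, -3.8746653,
     -15.412098, 8.597394, -4.5410852, -2.1572285],
    [-11.423707, -2.8596866, 8.363127, -3.1735256, -20.807539, -7.559353,
     -9.677645, -18.588085, -5.8435307, -21.758343],
    [-8.393426, 4.232292, -3.068518, -3.7088594, -3.4746392, -4.9324656,
     -4.3015184, -1.4798787, -2.0609794, -4.4231043],
])

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # shift for numerical stability; does not change the result
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

probs = softmax(logits)
predicted_classes = probs.argmax(axis=-1)
print(predicted_classes)            # [7 2 1]
print(probs.max(axis=-1).round(4))  # confidence of each predicted class
```

Softmax preserves the ordering of the logits, so the predicted class can also be read off the logits directly; the softmax is only needed when you want calibrated-looking probabilities.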


For later reuse, let's put the model definition and the compile step in functions.

In [0]:
def get_uncompiled_model():
  inputs = keras.Input(shape=(784,), name='digits')
  x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
  x = layers.Dense(64, activation='relu', name='dense_2')(x)
  outputs = layers.Dense(10, name='predictions')(x)
  model = keras.Model(inputs=inputs, outputs=outputs)
  return model

def get_compiled_model():
  model = get_uncompiled_model()
  model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
                loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['sparse_categorical_accuracy'])
  return model