# Training and Evaluation with TensorFlow Keras
Source: https://www.tensorflow.org/alpha/guide/keras/training_and_evaluation

This guide covers training, evaluation, and prediction (inference) with models in TensorFlow 2.0 in two broad situations:

- Using built-in training & evaluation loops
- Writing your own training & evaluation loops from scratch

## Setup

In [2]:
import tensorflow as tf

tf.keras.backend.clear_session()  # For easy reset of notebook state.

## Part I: Using built-in training & evaluation loops

### API overview: a first end-to-end example

In [3]:
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, activation='softmax', name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)

In [45]:
# Load a toy dataset for the sake of this example
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print('x_train.shape: ', x_train.shape)
print('y_train.shape: ', y_train.shape)
print('x_test.shape: ', x_test.shape)
print('y_test.shape: ', y_test.shape)
# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
print('x_train.shape: ',x_train.shape)


x_train.shape:  (60000, 28, 28)
y_train.shape:  (60000,)
x_test.shape:  (10000, 28, 28)
y_test.shape:  (10000,)
x_train.shape:  (50000, 784)


In [5]:
# Specify the training configuration (optimizer, loss, metrics)
model.compile(optimizer=keras.optimizers.RMSprop(),  # Optimizer
              # Loss function to minimize
              loss=keras.losses.SparseCategoricalCrossentropy(),
              # List of metrics to monitor
              metrics=[keras.metrics.SparseCategoricalAccuracy()])

print('# Fit model on training data')
history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=3,
                    # We pass some validation data for
                    # monitoring validation loss and metrics
                    # at the end of each epoch
                    validation_data=(x_val, y_val))

print('\nhistory dict:', history.history)

# Evaluate the model on the test data using `evaluate`
print('\n# Evaluate on test data')
results = model.evaluate(x_test, y_test, batch_size=128)
print('test loss, test acc:', results)

# Generate predictions (probabilities -- the output of the last layer)
# on new data using `predict`
print('\n# Generate predictions for 3 samples')
predictions = model.predict(x_test[:3])
print('predictions shape:', predictions.shape)

# Fit model on training data
Train on 50000 samples, validate on 10000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

history dict: {'loss': [0.3454093938779831, 0.1591050224161148, 0.11487513874411583], 'sparse_categorical_accuracy': [0.9026, 0.952, 0.96522], 'val_loss': [0.1798063812494278, 0.15136583090424538, 0.11329531900882721], 'val_sparse_categorical_accuracy': [0.9463, 0.9556, 0.9654]}

# Evaluate on test data
test loss, test acc: [0.11658219387680292, 0.9644]

# Generate predictions for 3 samples
predictions shape: (3, 10)


### Specifying a loss, metrics, and an optimizer

In [6]:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=[keras.metrics.SparseCategoricalAccuracy()])

**or**, equivalently, using string identifiers as a shortcut:

In [7]:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])

For later reuse, let's put our model definition and compile step in functions; we will call them several times across different examples in this guide.



In [8]:
def get_uncompiled_model():
  inputs = keras.Input(shape=(784,), name='digits')
  x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
  x = layers.Dense(64, activation='relu', name='dense_2')(x)
  outputs = layers.Dense(10, activation='softmax', name='predictions')(x)
  model = keras.Model(inputs=inputs, outputs=outputs)
  return model

def get_compiled_model():
  model = get_uncompiled_model()
  model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])
  return model

#### Many built-in optimizers, losses, and metrics are available
- Optimizers:
  - `SGD()` (with or without momentum)
  - `RMSprop()`
  - `Adam()`
  - etc.

- Losses:
  - `MeanSquaredError()`
  - `KLDivergence()`
  - `CosineSimilarity()`
  - etc.

- Metrics:
  - `AUC()`
  - `Precision()`
  - `Recall()`
  - etc.



### Writing custom losses and metrics
Create custom metrics by subclassing the **Metric** class. You will need to implement 4 methods:
- `__init__(self)`, in which you will create state variables for your metric.

- `update_state(self, y_true, y_pred, sample_weight=None)`, which uses the targets `y_true` and the model predictions `y_pred` to update the state variables.

- `result(self)`, which uses the state variables to compute the final results.

- `reset_states(self)`, which reinitializes the state of the metric.

Here's a simple example showing how to implement a `CatgoricalTruePositives` metric that counts how many samples were correctly classified as belonging to a given class:

In [9]:
class CatgoricalTruePositives(keras.metrics.Metric):

    def __init__(self, name='categorical_true_positives', **kwargs):
      super(CatgoricalTruePositives, self).__init__(name=name, **kwargs)
      # Running count of correctly classified samples.
      self.true_positives = self.add_weight(name='tp', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
      # Take the argmax over the class axis; the default (axis=0)
      # would reduce over the batch dimension instead.
      y_pred = tf.reshape(tf.argmax(y_pred, axis=-1), shape=(-1,))
      y_true = tf.reshape(y_true, shape=(-1,))
      values = tf.equal(tf.cast(y_true, 'int32'), tf.cast(y_pred, 'int32'))
      values = tf.cast(values, 'float32')
      if sample_weight is not None:
        sample_weight = tf.reshape(tf.cast(sample_weight, 'float32'), shape=(-1,))
        values = tf.multiply(values, sample_weight)
      self.true_positives.assign_add(tf.reduce_sum(values))

    def result(self):
      return self.true_positives

    def reset_states(self):
      # The state of the metric will be reset at the start of each epoch.
      self.true_positives.assign(0.)


In [10]:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=[CatgoricalTruePositives()])
model.fit(x_train, y_train,
          batch_size=64,
          epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x1580e5dd748>

#### Handling losses and metrics that don't fit the standard signature
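Most losses and metrics can be computed from `y_true` and `y_pred`, but not all of them. For instance, a regularization loss may only require the activation of a layer (there are no targets in this case), and this activation may not be a model output. In such cases, you can call `self.add_loss(loss_value)` from inside the `call` method of a custom layer: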

In [11]:
class ActivityRegularizationLayer(layers.Layer):
  
  def call(self, inputs):
    self.add_loss(tf.reduce_sum(inputs) * 0.1)
    return inputs  # Pass-through layer.
  
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)

# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)

x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, activation='softmax', name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)


In [12]:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss='sparse_categorical_crossentropy')

# The displayed loss will be much higher than before
# due to the regularization component.
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1)



<tensorflow.python.keras.callbacks.History at 0x1580e93bba8>

You can do the same for **logging metric** values:

In [13]:
class MetricLoggingLayer(layers.Layer):
  
  def call(self, inputs):
    # The `aggregation` argument defines
    # how to aggregate the per-batch values
    # over each epoch:
    # in this case we simply average them.
    self.add_metric(keras.backend.std(inputs),
                    name='std_of_activation',
                    aggregation='mean')
    return inputs  # Pass-through layer.

  
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)

# Insert std logging as a layer.
x = MetricLoggingLayer()(x)

x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, activation='softmax', name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)


In [14]:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss='sparse_categorical_crossentropy')
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1)



<tensorflow.python.keras.callbacks.History at 0x1580feb0668>

In the **Functional API**, you can also call `model.add_loss(loss_tensor)` or `model.add_metric(metric_tensor, name, aggregation)`.

Here's a simple example:

In [15]:
inputs = keras.Input(shape=(784,), name='digits')
x1 = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x2 = layers.Dense(64, activation='relu', name='dense_2')(x1)
outputs = layers.Dense(10, activation='softmax', name='predictions')(x2)
model = keras.Model(inputs=inputs, outputs=outputs)

model.add_loss(tf.reduce_sum(x1) * 0.1)

model.add_metric(keras.backend.std(x1),
                 name='std_of_activation',
                 aggregation='mean')

model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss='sparse_categorical_crossentropy')
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1)



<tensorflow.python.keras.callbacks.History at 0x15813490550>

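### Automatically setting apart a validation holdout set

The `validation_split` argument of `fit` reserves part of the training data for validation. For example, `validation_split=0.2` holds out 20% of the data: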

In [16]:
model = get_compiled_model()
model.fit(x_train, y_train, batch_size=64, validation_split=0.2, epochs=3)

Train on 40000 samples, validate on 10000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x1580cfa7128>
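
To make the arithmetic behind `validation_split` concrete, here is a minimal plain-Python sketch. It is not Keras's implementation, and the helper name `split_for_validation` is hypothetical. It assumes the holdout is taken from the end of the data, before any shuffling, which is consistent with the 40,000 / 10,000 split reported above for 50,000 training samples.

```python
def split_for_validation(samples, validation_split):
    """Hold out the last `validation_split` fraction of `samples`.

    Hypothetical helper for illustration only (not a Keras API).
    """
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be strictly between 0 and 1")
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


# Stand-in for the 50,000 training samples used in this guide.
samples = list(range(50000))
train_part, val_part = split_for_validation(samples, 0.2)
print(len(train_part), len(val_part))
```

Because the split is positional, data that is sorted by class should be shuffled before it is passed in.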

### Training & evaluation from tf.data Datasets
Let's now take a look at the case where your data comes in the form of a tf.data Dataset.

You can pass a Dataset instance directly to the methods `fit()`, `evaluate()`, and `predict()`:



In [17]:
model = get_compiled_model()

# First, let's create a training Dataset instance.
# For the sake of our example, we'll use the same MNIST data as before.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# Shuffle and batch the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Now we get a test dataset.
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_dataset = test_dataset.batch(64)

# Since the dataset already takes care of batching,
# we don't pass a `batch_size` argument.
model.fit(train_dataset, epochs=3)

# You can also evaluate or predict on a dataset.
print('\n# Evaluate')
model.evaluate(test_dataset)

Epoch 1/3
Epoch 2/3
Epoch 3/3

# Evaluate


[0.14607782095261393, 0.954]

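To run training on only a specific number of batches from this Dataset, you can pass the `steps_per_epoch` argument, which specifies how many training steps the model should run using this Dataset before moving on to the next epoch: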

In [18]:
model = get_compiled_model()

# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Only use 100 batches per epoch (that's 64 * 100 samples)
model.fit(train_dataset, epochs=3, steps_per_epoch=100)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x1581414c588>
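
As a quick sanity check on those numbers, the following plain-Python sketch compares a full pass over the data with a 100-step epoch. The variable names are illustrative; the sample count and batch size are the ones used in this guide.

```python
import math

num_samples = 50000   # size of x_train after the validation holdout
batch_size = 64

# A full pass over the data needs this many batches (the last one is partial).
batches_per_full_epoch = math.ceil(num_samples / batch_size)

# With steps_per_epoch=100, each "epoch" only draws 100 batches.
steps_per_epoch = 100
samples_per_short_epoch = steps_per_epoch * batch_size

print(batches_per_full_epoch, samples_per_short_epoch)
```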

#### Using a validation dataset
You can pass a Dataset instance as the `validation_data` argument in `fit`:

In [19]:
model = get_compiled_model()

# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Prepare the validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)

model.fit(train_dataset, epochs=3, validation_data=val_dataset)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x1581414cdd8>

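#### Specifying the number of validation batches
To run validation on only a specific number of batches from this Dataset, you can pass the `validation_steps` argument, which specifies how many validation steps the model should run with the validation Dataset before interrupting validation and moving on to the next epoch: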

In [20]:
model = get_compiled_model()

# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Prepare the validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)

model.fit(train_dataset, epochs=3,
          # Only run validation using the first 10 batches of the dataset
          # using the `validation_steps` argument
          validation_data=val_dataset, validation_steps=10)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x15816789c88>

### Using sample weighting and class weighting
Besides input data and target data, it is possible to pass sample weights or class weights to a model when using `fit`:

- When training from Numpy data: via the `sample_weight` and `class_weight` arguments.
- When training from Datasets: by having the Dataset return a tuple `(input_batch, target_batch, sample_weight_batch)`.


A "sample weights" array is an array of numbers that specify how much weight each sample in a batch should have in computing the total loss. It is commonly used in imbalanced classification problems (the idea being to give more weight to rarely-seen classes). When the weights used are ones and zeros, the array can be used as a mask for the loss function (entirely discarding the contribution of certain samples to the total loss).

A "class weights" dict is a more specific instance of the same concept: it maps class indices to the sample weight that should be used for samples belonging to this class. For instance, if class "0" is half as represented as class "1" in your data, you could use `class_weight={0: 1., 1: 0.5}`.
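
The relationship between the two kinds of weights can be sketched in plain Python. This is an illustration only: the helper names are hypothetical, the per-sample loss values are made up, and how Keras normalizes the weighted total is deliberately not shown.

```python
def class_weights_to_sample_weights(labels, class_weight):
    """Expand a {class_index: weight} dict into one weight per sample.

    Classes missing from the dict get weight 1.0 (an assumption of this sketch).
    """
    return [class_weight.get(label, 1.0) for label in labels]


def weighted_loss_sum(per_sample_losses, sample_weights):
    """Scale each per-sample loss by its weight and add them up."""
    return sum(l * w for l, w in zip(per_sample_losses, sample_weights))


labels = [0, 1, 1, 0, 1]
per_sample_losses = [0.5, 0.25, 0.25, 1.0, 0.5]  # made-up values

weights = class_weights_to_sample_weights(labels, {0: 1.0, 1: 0.5})
mask = [1.0, 1.0, 0.0, 1.0, 0.0]  # zeros discard those samples entirely

print(weights)
print(weighted_loss_sum(per_sample_losses, weights))
print(weighted_loss_sum(per_sample_losses, mask))
```

A class-weight dict is therefore just a compact way of writing a sample-weight array in which every sample of a class shares the same weight.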



Here's a Numpy example that uses class weights or sample weights to give more importance to the correct classification of class #5 (which is the digit "5" in the MNIST dataset).



In [21]:
import numpy as np

class_weight = {0: 1., 1: 1., 2: 1., 3: 1., 4: 1.,
                # Set weight "2" for class "5",
                # making this class 2x more important
                5: 2.,
                6: 1., 7: 1., 8: 1., 9: 1.}
model.fit(x_train, y_train,
          class_weight=class_weight,
          batch_size=64,
          epochs=4)

# Here's the same example using `sample_weight` instead:
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.

model = get_compiled_model()
model.fit(x_train, y_train,
          sample_weight=sample_weight,
          batch_size=64,
          epochs=4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<tensorflow.python.keras.callbacks.History at 0x15818684da0>

Here's a matching Dataset example:



In [22]:
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.

# Create a Dataset that includes sample weights
# (3rd element in the return tuple).
train_dataset = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train, sample_weight))

# Shuffle and batch the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

model = get_compiled_model()
model.fit(train_dataset, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x158180f6470>

### Passing data to multi-input, multi-output models

Consider the following model, which has an image input of shape `(32, 32, 3)` (that's `(height, width, channels)`) and a timeseries input of shape `(None, 10)` (that's `(timesteps, features)`). Our model will have two outputs computed from the combination of these inputs: a "score" (of shape `(1,)`) and a probability distribution over 5 classes (of shape `(5,)`).



In [23]:
from tensorflow import keras
from tensorflow.keras import layers

image_input = keras.Input(shape=(32, 32, 3), name='img_input')
timeseries_input = keras.Input(shape=(None, 10), name='ts_input')

x1 = layers.Conv2D(3, 3)(image_input)
x1 = layers.GlobalMaxPooling2D()(x1)

x2 = layers.Conv1D(3, 3)(timeseries_input)
x2 = layers.GlobalMaxPooling1D()(x2)

x = layers.concatenate([x1, x2])

score_output = layers.Dense(1, name='score_output')(x)
class_output = layers.Dense(5, activation='softmax', name='class_output')(x)

model = keras.Model(inputs=[image_input, timeseries_input],
                    outputs=[score_output, class_output])

At **compilation time**, we can specify different losses for different outputs by passing the loss functions as a list:



In [24]:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[keras.losses.MeanSquaredError(),
          keras.losses.CategoricalCrossentropy()])

Likewise for metrics:



In [25]:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[keras.losses.MeanSquaredError(),
          keras.losses.CategoricalCrossentropy()],
    metrics=[[keras.metrics.MeanAbsolutePercentageError(),
              keras.metrics.MeanAbsoluteError()],
             [keras.metrics.CategoricalAccuracy()]])

Or, since we gave names to our output layers, we can specify per-output losses and metrics via a dict:

In [26]:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={'score_output': keras.losses.MeanSquaredError(),
          'class_output': keras.losses.CategoricalCrossentropy()},
    metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(),
                              keras.metrics.MeanAbsoluteError()],
             'class_output': [keras.metrics.CategoricalAccuracy()]})

We recommend the use of explicit names and dicts if you have more than 2 outputs.
 
You can also give different weights to different output-specific losses. For instance, one might wish to privilege the "score" loss in our example by giving it 2x the importance of the class loss, using the `loss_weights` argument:

In [27]:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={'score_output': keras.losses.MeanSquaredError(),
          'class_output': keras.losses.CategoricalCrossentropy()},
    metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(),
                              keras.metrics.MeanAbsoluteError()],
             'class_output': [keras.metrics.CategoricalAccuracy()]},
    loss_weights={'score_output': 2., 'class_output': 1.})
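
The effect of per-output weights is a weighted sum of the individual losses. The plain-Python sketch below illustrates that arithmetic; `combine_losses` is a hypothetical helper and the per-output loss values are made up.

```python
def combine_losses(per_output_losses, loss_weights):
    """Weighted sum of per-output losses.

    Outputs without an explicit weight default to 1.0 (an assumption of this sketch).
    """
    return sum(loss_weights.get(name, 1.0) * value
               for name, value in per_output_losses.items())


# Made-up per-output loss values for one batch.
per_output_losses = {'score_output': 0.5, 'class_output': 1.25}
loss_weights = {'score_output': 2.0, 'class_output': 1.0}

total = combine_losses(per_output_losses, loss_weights)
print(total)  # 2.0 * 0.5 + 1.0 * 1.25
```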

You can also choose not to compute a loss for certain outputs, if these outputs are meant for prediction but not for training:

In [28]:
# List loss version
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[None, keras.losses.CategoricalCrossentropy()])

# Or dict loss version
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={'class_output': keras.losses.CategoricalCrossentropy()})

W0320 22:57:44.684697 29612 training_utils.py:1152] Output score_output missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to score_output.


In [29]:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[keras.losses.MeanSquaredError(),
          keras.losses.CategoricalCrossentropy()])

# Generate dummy Numpy data
img_data = np.random.random_sample(size=(100, 32, 32, 3))
ts_data = np.random.random_sample(size=(100, 20, 10))
score_targets = np.random.random_sample(size=(100, 1))
class_targets = np.random.random_sample(size=(100, 5))

# Fit on lists
model.fit([img_data, ts_data], [score_targets, class_targets],
          batch_size=32,
          epochs=3)

# Alternatively, fit on dicts
model.fit({'img_input': img_data, 'ts_input': ts_data},
          {'score_output': score_targets, 'class_output': class_targets},
          batch_size=32,
          epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x1581a3498d0>

The Dataset use case is similar to what we did for Numpy arrays: the Dataset should return a tuple of dicts.



In [30]:
train_dataset = tf.data.Dataset.from_tensor_slices(
    ({'img_input': img_data, 'ts_input': ts_data},
     {'score_output': score_targets, 'class_output': class_targets}))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

model.fit(train_dataset, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x158180f6ba8>

### Using callbacks
Callbacks in Keras are objects that are called at different points during training (at the start of an epoch, at the end of a batch, at the end of an epoch, etc.) and which can be used to implement behaviors such as:

- Doing validation at different points during training (beyond the built-in per-epoch validation)
- Checkpointing the model at regular intervals or when it exceeds a certain accuracy threshold
- Changing the learning rate of the model when training seems to be plateauing
- Doing fine-tuning of the top layers when training seems to be plateauing
- Sending email or instant message notifications when training ends or when a certain performance threshold is exceeded
- Etc.

Callbacks can be passed as a list to your call to `fit`:



In [31]:
model = get_compiled_model()

callbacks = [
    keras.callbacks.EarlyStopping(
        # Stop training when `val_loss` is no longer improving
        monitor='val_loss',
        # "no longer improving" being defined as "no better than 1e-2 less"
        min_delta=1e-2,
        # "no longer improving" being further defined as "for at least 2 epochs"
        patience=2,
        verbose=1)
]
model.fit(x_train, y_train,
          epochs=20,
          batch_size=64,
          callbacks=callbacks,
          validation_split=0.2)

Train on 40000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 00007: early stopping


<tensorflow.python.keras.callbacks.History at 0x1582a3a7fd0>
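
The interplay of `min_delta` and `patience` can be sketched in plain Python. This is a simplified illustration for a quantity that should decrease, such as `val_loss`; it is not the Keras implementation, and `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(val_losses, min_delta=0.0, patience=0):
    """Return the 0-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if the loss beats the best
    value seen so far by more than `min_delta`.
    """
    best = float('inf')
    wait = 0  # consecutive epochs without improvement
    for epoch, current in enumerate(val_losses):
        if current < best - min_delta:
            best = current
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: large gains at first, then a plateau.
plateau = [0.5, 0.4, 0.395, 0.394, 0.393]
print(early_stopping_epoch(plateau, min_delta=1e-2, patience=2))
print(early_stopping_epoch([0.5, 0.4, 0.3], min_delta=1e-2, patience=2))
```

In the first call, the small gains after the second epoch are below `min_delta`, so they do not reset the patience counter.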

#### Many built-in callbacks
- **ModelCheckpoint**: Periodically saves the model.
- **EarlyStopping**: Stops training when training is no longer improving the validation metrics.
- **TensorBoard**: Periodically writes model logs that can be visualized in TensorBoard (more details in the section "Visualization").
- **CSVLogger**: Streams loss and metrics data to a CSV file.
- etc.

#### Writing your own callback
Create a custom callback by extending the base class `keras.callbacks.Callback`.

A callback has access to its associated model through the class property `self.model`.

**Simple Example**: Saving a list of per-batch loss values during training:

In [32]:
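class LossHistory(keras.callbacks.Callback):

    def on_train_begin(self, logs=None):
        # Start with an empty list at the beginning of training.
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        # `logs` holds the metrics for the batch that just finished.
        logs = logs or {}
        self.losses.append(logs.get('loss'))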
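
To show when these hooks fire, here is a toy, TensorFlow-free sketch. Both `LossHistorySketch` and the driver function `toy_fit` are hypothetical stand-ins: the real `fit` loop does far more, and the loss values here are made up.

```python
class LossHistorySketch:
    """Plain-Python stand-in for a callback that records per-batch losses."""

    def on_train_begin(self, logs=None):
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        self.losses.append(logs.get('loss'))


def toy_fit(batch_losses, callbacks):
    """Hypothetical driver loop that calls the hooks in order."""
    for cb in callbacks:
        cb.on_train_begin()
    for batch, loss in enumerate(batch_losses):
        # A real training step would compute `loss` here.
        for cb in callbacks:
            cb.on_batch_end(batch, logs={'loss': loss})


history = LossHistorySketch()
toy_fit([0.9, 0.7, 0.6], callbacks=[history])
print(history.losses)
```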

### Checkpointing models
When you're training a model on relatively large datasets, it's crucial to save checkpoints of your model at frequent intervals.

The easiest way to achieve this is with the ModelCheckpoint callback:



In [33]:
model = get_compiled_model()

callbacks = [
    keras.callbacks.ModelCheckpoint(
        # Path where to save the model
        filepath='mymodel_{epoch}.h5',
        # The two parameters below mean that we will overwrite
        # the current checkpoint if and only if
        # the `val_loss` score has improved.
        save_best_only=True,
        monitor='val_loss',
        verbose=1)
]
model.fit(x_train, y_train,
          epochs=3,
          batch_size=64,
          callbacks=callbacks,
          validation_split=0.2)

Train on 40000 samples, validate on 10000 samples
Epoch 1/3
Epoch 00001: val_loss improved from inf to 0.23448, saving model to mymodel_1.h5
Epoch 2/3
Epoch 00002: val_loss improved from 0.23448 to 0.17551, saving model to mymodel_2.h5
Epoch 3/3
Epoch 00003: val_loss improved from 0.17551 to 0.16445, saving model to mymodel_3.h5


<tensorflow.python.keras.callbacks.History at 0x1582aafc860>
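
The save-best-only policy and the `{epoch}` placeholder in the file path can be sketched in plain Python. This is an illustration under simplifying assumptions, not the callback's actual code: `checkpoints_written` is a hypothetical helper that only lists file names and writes nothing to disk.

```python
def checkpoints_written(val_losses, filepath='mymodel_{epoch}.h5'):
    """List the files a save-best-only policy would write.

    Epoch numbers in file names are 1-based, matching the log output above.
    """
    best = float('inf')
    written = []
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best:  # only save when the monitored value improves
            best = val_loss
            written.append(filepath.format(epoch=epoch))
    return written


# The validation losses from the log above improve every epoch.
print(checkpoints_written([0.23448, 0.17551, 0.16445]))
# Made-up run where epoch 2 is worse, so no file is written for it.
print(checkpoints_written([0.30, 0.35, 0.25]))
```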

You can also write your own callback for saving and restoring models.

For a complete guide on serialization and saving, see Guide to Saving and Serializing Models.



### Using learning rate schedules
A common pattern when training deep learning models is to gradually reduce the learning rate as training progresses. This is generally known as "learning rate decay".

#### Passing a schedule to an optimizer
You can use a static learning rate decay schedule by passing a schedule object as the `learning_rate` argument in your optimizer:


In [34]:
initial_learning_rate = 0.1
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=100000,
    decay_rate=0.96,
    staircase=True)

optimizer = keras.optimizers.RMSprop(learning_rate=lr_schedule)

Several built-in schedules are available: ExponentialDecay, PiecewiseConstantDecay, PolynomialDecay, and InverseTimeDecay.
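
For intuition, the exponential decay rule can be written out in plain Python. The sketch below follows the documented formula `initial_learning_rate * decay_rate ^ (step / decay_steps)`, with the exponent rounded down when `staircase` is enabled; the function itself is a hypothetical stand-in, not the Keras schedule object.

```python
import math


def exponential_decay(initial_learning_rate, step, decay_steps,
                      decay_rate, staircase=False):
    """Learning rate after `step` optimizer steps under exponential decay."""
    exponent = step / decay_steps
    if staircase:
        # Decay in discrete jumps every `decay_steps` steps.
        exponent = math.floor(exponent)
    return initial_learning_rate * decay_rate ** exponent


# Same settings as the schedule above.
for step in (0, 50000, 100000, 250000):
    print(step, exponential_decay(0.1, step, 100000, 0.96, staircase=True))
```

With `staircase=True` the rate stays at 0.1 for the first 100,000 steps and then drops by a factor of 0.96 at each multiple of `decay_steps`.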

#### Using callbacks to implement a dynamic learning rate schedule
A dynamic learning rate schedule (for instance, decreasing the learning rate when the validation loss is no longer improving) cannot be achieved with these schedule objects since the optimizer does not have access to validation metrics.

However, callbacks do have access to all metrics, including validation metrics! You can thus achieve this pattern by using a callback that modifies the current learning rate on the optimizer. In fact, this is even built-in as the ReduceLROnPlateau callback.
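
The idea behind reducing the learning rate on a plateau can be sketched in plain Python. This is a simplified illustration, not the callback's implementation: it ignores refinements such as a cooldown period or a minimum improvement threshold, `reduce_lr_on_plateau` is a hypothetical helper, and the loss values are made up.

```python
def reduce_lr_on_plateau(val_losses, initial_lr, factor=0.5,
                         patience=2, min_lr=0.0):
    """Return the learning rate in effect after each epoch."""
    lr = initial_lr
    best = float('inf')
    wait = 0  # consecutive epochs without improvement
    lrs = []
    for current in val_losses:
        if current < best:
            best = current
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the rate, but not below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        lrs.append(lr)
    return lrs


lrs = reduce_lr_on_plateau([1.0, 0.9, 0.9, 0.9, 0.9], initial_lr=0.1)
print(lrs)
```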

### Visualizing loss and metrics during training

TensorBoard is launched from the command line. The command below is a shell command, not Python, so it must be run in a terminal rather than in a Python cell:

tensorboard --logdir=/full_path_to_your_logs

#### Using the TensorBoard callback

In [36]:
# `dataset` is a placeholder for your own training Dataset;
# it is not defined elsewhere in this guide.
tensorboard_cbk = keras.callbacks.TensorBoard(log_dir='/full_path_to_your_logs')
model.fit(dataset, epochs=10, callbacks=[tensorboard_cbk])

In [37]:
keras.callbacks.TensorBoard(
  log_dir='/full_path_to_your_logs',
  histogram_freq=0,  # How often to log histogram visualizations
  embeddings_freq=0,  # How often to log embedding visualizations
  update_freq='epoch')  # How often to write logs (default: once per epoch)

<tensorflow.python.keras.callbacks.TensorBoard at 0x1582740bc18>

## Part II: Writing your own training & evaluation loops from scratch

### Using the GradientTape: a first end-to-end example

In [38]:
# Get the model.
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, activation='softmax', name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy()

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

# Iterate over epochs.
for epoch in range(3):
  print('Start of epoch %d' % (epoch,))
  
  # Iterate over the batches of the dataset.
  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

    # Open a GradientTape to record the operations run
    # during the forward pass, which enables autodifferentiation.
    with tf.GradientTape() as tape:

      # Run the forward pass of the model.
      # The operations that the model applies
      # to its inputs are going to be recorded
      # on the GradientTape.
      logits = model(x_batch_train)  # Model outputs for this minibatch

      # Compute the loss value for this minibatch.
      loss_value = loss_fn(y_batch_train, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the loss with respect to the trainable weights.
    # This happens outside the `with` block, once recording is done.
    grads = tape.gradient(loss_value, model.trainable_variables)

    # Run one step of gradient descent by updating
    # the value of the weights to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # Log every 200 batches.
    if step % 200 == 0:
        print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
        print('Seen so far: %s samples' % ((step + 1) * batch_size))

Start of epoch 0
Training loss (for one batch) at step 0: 2.416548728942871
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.24124813079834
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.148831367492676
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.1140458583831787
Seen so far: 38464 samples
Start of epoch 1
Training loss (for one batch) at step 0: 2.0647637844085693
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.876824975013733
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.7586379051208496
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.7045397758483887
Seen so far: 38464 samples
Start of epoch 2
Training loss (for one batch) at step 0: 1.609616994857788
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.4744676351547241
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.3040378093719482
Seen so far: 25664
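
The structure of this loop (forward pass, loss, gradients, weight update) does not depend on TensorFlow. As a rough illustration, the plain-Python sketch below trains a one-parameter linear model with the same four steps. Note the substitution: the gradient is derived by hand here, whereas `GradientTape` computes it automatically. The function name and the toy data are hypothetical.

```python
def train_linear_model(xs, ys, learning_rate=0.01, epochs=50, batch_size=2):
    """Fit y = w * x with mini-batch gradient descent on mean squared error."""
    w = 0.0
    for epoch in range(epochs):
        for start in range(0, len(xs), batch_size):
            x_batch = xs[start:start + batch_size]
            y_batch = ys[start:start + batch_size]

            # Forward pass and loss for this batch.
            preds = [w * x for x in x_batch]
            loss = sum((p - y) ** 2
                       for p, y in zip(preds, y_batch)) / len(x_batch)

            # Hand-derived gradient d(loss)/dw
            # (this is the step GradientTape automates).
            grad = sum(2.0 * (p - y) * x
                       for p, y, x in zip(preds, y_batch, x_batch)) / len(x_batch)

            # One step of gradient descent.
            w -= learning_rate * grad
    return w, loss


# Toy data generated from y = 2 * x.
w, final_loss = train_linear_model([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(w, final_loss)
```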

### Low-level handling of metrics
Let's add metrics to the mix. You can readily reuse the built-in metrics (or custom ones you wrote) in such training loops written from scratch. Here's the flow:

- Instantiate the metric at the start of the loop
- Call `metric.update_state()` after each batch
- Call `metric.result()` when you need to display the current value of the metric
- Call `metric.reset_states()` when you need to clear the state of the metric (typically at the end of an epoch)
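
The flow above can be traced with a TensorFlow-free stand-in. `RunningAccuracy` below is a hypothetical class that only borrows the method names of a Keras metric. For simplicity it compares class indices directly rather than taking probabilities as input, and the labels are made up.

```python
class RunningAccuracy:
    """Plain-Python stand-in with the same method names as a Keras metric."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update_state(self, y_true, y_pred):
        # Accumulate counts across batches.
        self.correct += sum(int(t == p) for t, p in zip(y_true, y_pred))
        self.total += len(y_true)

    def result(self):
        return self.correct / self.total if self.total else 0.0

    def reset_states(self):
        self.correct = 0
        self.total = 0


metric = RunningAccuracy()
batches = [([1, 0, 1, 1], [1, 0, 0, 1]),   # 3 of 4 correct
           ([0, 0, 1, 1], [0, 1, 1, 0])]   # 2 of 4 correct
for y_true, y_pred in batches:
    metric.update_state(y_true, y_pred)

epoch_accuracy = metric.result()   # accuracy over both batches
metric.reset_states()              # clear state before the next epoch
print(epoch_accuracy, metric.result())
```

The point of keeping state is that `result()` reflects everything seen since the last reset, not just the most recent batch.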



Let's use this knowledge to compute SparseCategoricalAccuracy on validation data at the end of each epoch:



In [39]:
# Get model
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, activation='softmax', name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer to train the model.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy()

# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy() 
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)


# Iterate over epochs.
for epoch in range(3):
  print('Start of epoch %d' % (epoch,))
  
  # Iterate over the batches of the dataset.
  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
    with tf.GradientTape() as tape:
      logits = model(x_batch_train)
      loss_value = loss_fn(y_batch_train, logits)
    grads = tape.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
      
    # Update training metric.
    train_acc_metric(y_batch_train, logits)

    # Log every 200 batches.
    if step % 200 == 0:
        print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
        print('Seen so far: %s samples' % ((step + 1) * 64))

  # Display metrics at the end of each epoch.
  train_acc = train_acc_metric.result()
  print('Training acc over epoch: %s' % (float(train_acc),))
  # Reset training metrics at the end of each epoch
  train_acc_metric.reset_states()

  # Run a validation loop at the end of each epoch.
  for x_batch_val, y_batch_val in val_dataset:
    val_logits = model(x_batch_val)
    # Update val metrics
    val_acc_metric(y_batch_val, val_logits)
  val_acc = val_acc_metric.result()
  val_acc_metric.reset_states()
  print('Validation acc: %s' % (float(val_acc),))

Start of epoch 0
Training loss (for one batch) at step 0: 2.266350269317627
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.2238714694976807
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.2236430644989014
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.049403667449951
Seen so far: 38464 samples
Training acc over epoch: 0.2945399880409241
Validation acc: 0.45820000767707825
Start of epoch 1
Training loss (for one batch) at step 0: 1.8961845636367798
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.8781015872955322
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.892068862915039
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.605812430381775
Seen so far: 38464 samples
Training acc over epoch: 0.5465199947357178
Validation acc: 0.6521999835968018
Start of epoch 2
Training loss (for one batch) at step 0: 1.4581108093261719
Seen so far: 64 samples
Training
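
The loop above relies on the metric object accumulating state across batches and being reset at the end of each epoch. The following is a minimal sketch of that pattern in isolation, using two small hypothetical batches (the labels and probabilities are made up for illustration). It calls update_state() and result() explicitly, whereas the loop above calls the metric object directly. The reset lookup is an assumption about your installed version: this guide uses reset_states(), while newer releases name the method reset_state().

```python
import tensorflow as tf
from tensorflow import keras

metric = keras.metrics.SparseCategoricalAccuracy()

# Two hypothetical mini-batches: integer labels and predicted class probabilities.
y_batch_1 = tf.constant([0, 1])
p_batch_1 = tf.constant([[0.9, 0.1], [0.8, 0.2]])  # predicts 0, 0: 1 of 2 correct
y_batch_2 = tf.constant([1, 1])
p_batch_2 = tf.constant([[0.1, 0.9], [0.3, 0.7]])  # predicts 1, 1: 2 of 2 correct

# State accumulates over successive updates within an "epoch".
metric.update_state(y_batch_1, p_batch_1)
metric.update_state(y_batch_2, p_batch_2)
acc_two_batches = float(metric.result())  # 3 of 4 samples correct so far

# Pick whichever reset method this TensorFlow version provides.
reset = getattr(metric, 'reset_state', None) or metric.reset_states
reset()

# After the reset, only samples seen from now on are counted.
metric.update_state(y_batch_2, p_batch_2)
acc_after_reset = float(metric.result())

print(acc_two_batches, acc_after_reset)
```

Without the reset, the accuracy reported for a later epoch would be averaged together with the samples from earlier epochs.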

### Low-level handling of extra losses

In [40]:
class ActivityRegularizationLayer(layers.Layer):
  
  def call(self, inputs):
    self.add_loss(1e-2 * tf.reduce_sum(inputs))
    return inputs
  
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, activation='softmax', name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)
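
To make the size of this penalty concrete, the following minimal sketch calls the same kind of layer directly on a hypothetical tensor of ones (not part of the guide's data) and reads the resulting value from layer.losses. It assumes eager execution, which is the default in TensorFlow 2.

```python
import tensorflow as tf
from tensorflow.keras import layers


class ActivityRegularizationLayer(layers.Layer):

  def call(self, inputs):
    # Penalize large activations: 1e-2 times the sum of all input values.
    self.add_loss(1e-2 * tf.reduce_sum(inputs))
    return inputs


layer = ActivityRegularizationLayer()

# Hypothetical activations: a batch of 2 samples with 3 features, all 1.0.
activations = tf.ones((2, 3))
outputs = layer(activations)

# The layer passes its input through unchanged and records one scalar loss.
penalty = float(layer.losses[0])  # 1e-2 * 6.0
print(penalty)
```

Because the penalty is a sum rather than a mean, it grows with both the batch size and the layer width.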

When you call a model, for example with logits = model(x_train), the losses it creates during the forward pass are added to the model.losses attribute:

In [42]:
logits = model(x_train[:64])
print(model.losses)


[<tf.Tensor: id=1005933, shape=(), dtype=float32, numpy=6.8379745>]

The tracked losses are cleared at the start of each model call, so model.losses only holds the losses created during the most recent forward pass. Calling the model several times in a row therefore still leaves a single entry, from the last call:


In [43]:
logits = model(x_train[:64])
logits = model(x_train[64: 128])
logits = model(x_train[128: 192])
print(model.losses)

[<tf.Tensor: id=1005994, shape=(), dtype=float32, numpy=6.6601048>]
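
The single entry printed above suggests that model.losses reflects only the latest forward pass. The following minimal sketch checks this on a deliberately tiny model, using hypothetical constant batches so the expected values can be worked out by hand. The model here contains only the regularization layer, which is a simplification and not the model from this guide.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


class ActivityRegularizationLayer(layers.Layer):

  def call(self, inputs):
    self.add_loss(1e-2 * tf.reduce_sum(inputs))
    return inputs


inputs = keras.Input(shape=(4,))
outputs = ActivityRegularizationLayer()(inputs)
model = keras.Model(inputs=inputs, outputs=outputs)

# Hypothetical batches with known sums.
batch_a = tf.ones((2, 4))        # values sum to 8, so the penalty is 0.08
batch_b = 2.0 * tf.ones((2, 4))  # values sum to 16, so the penalty is 0.16

model(batch_a)
losses_after_a = [float(loss) for loss in model.losses]

model(batch_b)
losses_after_b = [float(loss) for loss in model.losses]

# Each list holds one value: the penalty from the most recent call only.
print(losses_after_a, losses_after_b)
```

This is why the training loop has to read model.losses inside the GradientTape block, right after the forward pass for the current batch.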


To take these losses into account during training, add sum(model.losses) to the total loss inside your training loop:



In [44]:
# This cell reuses loss_fn, batch_size and train_dataset from the previous training loop.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)

for epoch in range(3):
  print('Start of epoch %d' % (epoch,))

  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
    with tf.GradientTape() as tape:
      logits = model(x_batch_train)
      loss_value = loss_fn(y_batch_train, logits)

      # Add extra losses created during this forward pass:
      loss_value += sum(model.losses)
      
    grads = tape.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # Log every 200 batches.
    if step % 200 == 0:
      print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
      print('Seen so far: %s samples' % ((step + 1) * batch_size))

Start of epoch 0
Training loss (for one batch) at step 0: 9.203619956970215
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.4950966835021973
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.383124351501465
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.3560397624969482
Seen so far: 38464 samples
Start of epoch 1
Training loss (for one batch) at step 0: 2.3431715965270996
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.328341484069824
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.3178470134735107
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.3198299407958984
Seen so far: 38464 samples
Start of epoch 2
Training loss (for one batch) at step 0: 2.3188881874084473
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.3141603469848633
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.307861566543579
Seen so far: 256
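
The logged value in this loop is the sum of two parts, which can make it hard to read: the first logged loss of about 9.2 is presumably the cross-entropy plus an activity penalty of roughly the size printed earlier. The following minimal sketch runs a single training step and keeps the two parts separate for logging. The small model, the random batch, and the names main_loss and extra_loss are assumptions made for illustration, not part of the guide.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


class ActivityRegularizationLayer(layers.Layer):

  def call(self, inputs):
    self.add_loss(1e-2 * tf.reduce_sum(inputs))
    return inputs


# A small stand-in model with the same structure as the one in this section.
inputs = keras.Input(shape=(8,))
x = layers.Dense(4, activation='relu')(inputs)
x = ActivityRegularizationLayer()(x)
outputs = layers.Dense(3, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

loss_fn = keras.losses.SparseCategoricalCrossentropy()
optimizer = keras.optimizers.SGD(learning_rate=1e-3)

# Hypothetical random batch standing in for (x_batch_train, y_batch_train).
x_batch = tf.random.uniform((16, 8))
y_batch = tf.random.uniform((16,), maxval=3, dtype=tf.int32)

with tf.GradientTape() as tape:
  probs = model(x_batch)
  main_loss = loss_fn(y_batch, probs)  # cross-entropy on the predictions
  extra_loss = sum(model.losses)       # losses added during this forward pass
  total_loss = main_loss + extra_loss

# Differentiate the combined loss so the penalty also shapes the weights.
grads = tape.gradient(total_loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))

print('main: %.4f, extra: %.4f, total: %.4f'
      % (float(main_loss), float(extra_loss), float(total_loss)))
```

Logging the two parts separately makes it easier to tell whether a falling total comes from better predictions or only from smaller activations.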