# Keras API for model layers and training

### Acknowledgments & Credits

This lesson is adapted largely from the excellent curriculum materials by Cliburn Chan (2021) at https://github.com/cliburn/bios-823-2021/ under the MIT License.

### References

- Keras: https://keras.io/
- **Keras guides**: https://keras.io/guides/
- Keras API documentation: https://keras.io/api/


In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

In [None]:
import keras

## Keras

In [None]:
Dense = keras.layers.Dense

We can think of a DL model as a black box with a set of unknown parameters. For example, when the output is a Dense layer with a single node (and the default linear activation), the network as a whole performs a form of regression. If that single node uses a sigmoid activation function instead, the model is essentially doing logistic regression.
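To make the logistic regression claim concrete, here is a minimal sketch of what a single sigmoid node computes, written in plain Python rather than Keras. The function name, weights, and bias are hypothetical values chosen for illustration; in a real model they are learned during training.

```python
import math

def dense_sigmoid_node(features, weights, bias):
    """Output of one dense node with sigmoid activation: sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters, not taken from any trained model.
weights = [0.5, -1.0]
bias = 0.25

p = dense_sigmoid_node([2.0, 1.0], weights, bias)
log_odds = math.log(p / (1.0 - p))

print(round(p, 4))         # predicted probability of the positive class
print(round(log_odds, 4))  # the log-odds recover the linear term w . x + b
```

The log-odds of the output are linear in the inputs, which is exactly the logistic regression model.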

## Building blocks

A `keras` model is composed of **layers**. Each layer has its own **activation** function. Each layer also has its own biases and weights. To set initial random weights, there are several possible strategies known as **initializers**. To fit the model, you need to specify a **loss** function. During training, the **optimizer** finds biases and weights that minimize the loss function. Model performance is evaluated using **metrics**.

Commonly used versions of these classes or functions come built-in with `keras`.
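The roles of these building blocks can be sketched without Keras at all. The following toy example, which assumes made-up one-dimensional data and plain gradient descent, fits a single sigmoid node and labels where the initializer, loss, optimizer, and metric come in. It is an illustration of the concepts, not how Keras implements them.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D data, made up for illustration: the label is 1 when x > 0.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

# Initializer: a small random starting weight and a zero bias.
rng = random.Random(0)
w = rng.gauss(0.0, 0.1)
b = 0.0

def binary_crossentropy(w, b):
    # Loss: mean negative log-likelihood of the labels.
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

def accuracy(w, b):
    # Metric: fraction of points classified correctly at threshold 0.5.
    hits = sum((sigmoid(w * x + b) > 0.5) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)

initial_loss = binary_crossentropy(w, b)

# Optimizer: plain gradient descent with a fixed learning rate.
learning_rate = 0.5
for _ in range(200):
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = binary_crossentropy(w, b)
final_accuracy = accuracy(w, b)
print(initial_loss, final_loss, final_accuracy)
```

The loss drives the parameter updates, while the metric is only reported; that separation carries over to `model.compile` in Keras.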

![img](https://miro.medium.com/proxy/1*YL2a2dbDQ5754h_ktDj8mQ.png)

### Layers

In [None]:
for x in dir(keras.layers):
    if x[0].isupper() and not x.startswith('_'):
        print(x)

### Activations

In [None]:
for x in dir(keras.activations):
    if x[0].islower() and not x.startswith('_'):
        print(x)

#### Example

In [None]:
x = keras.ops.arange(-5, 5, 0.01)
y_sigmoid = keras.activations.sigmoid(x)
y_tanh = keras.activations.tanh(x)

In [None]:
plt.plot(x, y_sigmoid, label='sigmoid');
plt.plot(x, y_tanh, label='tanh');
plt.hlines(0, -5, 5, linestyles="--", color='black', lw=0.5);
plt.vlines(0, -1, 1, linestyles="--", color='black', lw=0.5);
plt.legend();

In [None]:
y_relu = keras.activations.relu(x)
y_gelu = keras.activations.gelu(x)
y_elu = keras.activations.elu(x)

In [None]:
plt.plot(x, y_relu, label='relu');
plt.plot(x, y_gelu, label='gelu');
plt.plot(x, y_elu, label='elu');
plt.hlines(0, -5, 5, linestyles="--", color='black', lw=0.5);
plt.vlines(0, -1, 2, linestyles="--", color='black', lw=0.5);
plt.ylim(-1.1, 2.1);
plt.legend();
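For reference, the activations plotted above have simple closed forms. The sketch below writes them as scalar Python functions; it assumes the default `alpha=1.0` for ELU and the exact (erf-based) form of GELU, so treat it as a reading aid rather than a copy of the Keras implementations.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    # Passes positive values through and clips negatives to zero.
    return max(0.0, x)

def elu(x, alpha=1.0):
    # Like ReLU for x > 0, but decays smoothly towards -alpha for x < 0.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def gelu(x):
    # Exact GELU: x times the standard normal CDF evaluated at x.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for value in (-2.0, 0.0, 2.0):
    print(value, sigmoid(value), tanh(value), relu(value), elu(value), gelu(value))
```

Note that `tanh` is a rescaled sigmoid, `tanh(x) = 2 * sigmoid(2x) - 1`, which is why the two curves have the same shape but different ranges.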

### Initializers

In [None]:
[x for x in dir(keras.initializers) if 
 x[0].isupper() and 
 not x.startswith('_')]

#### Example

In [None]:
init = keras.initializers.Identity()
init(shape=(3,3)).numpy()

In [None]:
init = keras.initializers.GlorotNormal(seed=0)
init(shape=(2,3)).numpy()

In [None]:
init = keras.initializers.LecunNormal(seed=0)
init(shape=(2,3)).numpy()
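The two normal initializers above differ only in how they scale the spread of the random weights. As a rough sketch, assuming the documented target standard deviations (`sqrt(2 / (fan_in + fan_out))` for Glorot and `sqrt(1 / fan_in)` for LeCun) and using an ordinary normal draw where Keras uses a truncated normal, the scales for a `(2, 3)` weight matrix can be computed by hand. The helper names are hypothetical.

```python
import math
import random

def glorot_normal_stddev(fan_in, fan_out):
    # Glorot/Xavier scaling balances the number of inputs and outputs.
    return math.sqrt(2.0 / (fan_in + fan_out))

def lecun_normal_stddev(fan_in):
    # LeCun scaling depends only on the number of inputs.
    return math.sqrt(1.0 / fan_in)

# For a weight matrix of shape (2, 3): 2 inputs, 3 outputs.
fan_in, fan_out = 2, 3
glorot_std = glorot_normal_stddev(fan_in, fan_out)
lecun_std = lecun_normal_stddev(fan_in)
print(round(glorot_std, 4), round(lecun_std, 4))

# Simplified draw: plain (untruncated) normal samples with the Glorot scale.
rng = random.Random(0)
weights = [[rng.gauss(0.0, glorot_std) for _ in range(fan_out)]
           for _ in range(fan_in)]
print(weights)
```

The sampled values will not match the Keras output, since the random number generators and the truncation differ; only the scale is comparable.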

### Losses

In [None]:
[x for x in dir(keras.losses) if
 x[0].isupper() and 
 not x.startswith('_')]

#### Example

In [None]:
loss = keras.losses.BinaryCrossentropy()

In [None]:
y_true = np.array([1,0,0,1])
y_pred = np.array([0.9, 0.2, 0.3, 0.8])
loss(y_true, y_pred).numpy()
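To see where this number comes from, the binary cross-entropy can be computed by hand as the mean of `-[y*log(p) + (1-y)*log(1-p)]`. The sketch below uses the same labels and predictions in plain Python; Keras additionally clips predictions by a small epsilon, so its result may differ in the last digits.

```python
import math

y_true = [1, 0, 0, 1]
y_pred = [0.9, 0.2, 0.3, 0.8]

def binary_crossentropy(y_true, y_pred):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    terms = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
             for y, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

bce = binary_crossentropy(y_true, y_pred)
print(round(bce, 4))  # 0.2271
```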

In [None]:
loss = keras.losses.CategoricalCrossentropy()
y_true = np.array([[0,1], [1,0], [1,0], [0,1]])
y_pred = np.array([[0.1, 0.9], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]])
loss(y_true, y_pred).numpy()

In [None]:
loss = keras.losses.SparseCategoricalCrossentropy()
y_true = np.array([1,0,0,1])
y_pred = np.array([[0.1, 0.9], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]])
loss(y_true, y_pred).numpy()
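The categorical and sparse categorical losses compute the same quantity and differ only in how the labels are encoded (one-hot vectors versus integer class indices). A hand-rolled sketch in plain Python, using the same data as the two cells above, shows the equivalence; the function names are illustrative and not part of Keras.

```python
import math

y_onehot = [[0, 1], [1, 0], [1, 0], [0, 1]]
y_index = [1, 0, 0, 1]
y_pred = [[0.1, 0.9], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]]

def categorical_crossentropy(y_onehot, y_pred):
    # One-hot labels: only the true class contributes to each sample's sum.
    per_sample = [-sum(t * math.log(p) for t, p in zip(ts, ps))
                  for ts, ps in zip(y_onehot, y_pred)]
    return sum(per_sample) / len(per_sample)

def sparse_categorical_crossentropy(y_index, y_pred):
    # Integer labels: pick the predicted probability of the true class directly.
    per_sample = [-math.log(ps[k]) for k, ps in zip(y_index, y_pred)]
    return sum(per_sample) / len(per_sample)

cce = categorical_crossentropy(y_onehot, y_pred)
scce = sparse_categorical_crossentropy(y_index, y_pred)
print(round(cce, 4), round(scce, 4))  # 0.2271 0.2271
```

The sparse form avoids building one-hot vectors, which matters when the number of classes is large.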

### Metrics

In [None]:
[x for x in dir(keras.metrics) if
 x[0].isupper() and 
 not x.startswith('_')]

#### Example

In [None]:
metric = keras.metrics.Accuracy()

In [None]:
metric.reset_state()

In [None]:
metric.update_state(
    [[1], [2], [3]],
    [[1], [1], [3]]
)

In [None]:
metric.result().numpy()
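The accuracy metric is simply the fraction of predictions that equal the true labels. The same calculation in plain Python, using the values passed to `update_state` above:

```python
y_true = [1, 2, 3]
y_pred = [1, 1, 3]

# Count exact matches and divide by the number of samples.
matches = [int(t == p) for t, p in zip(y_true, y_pred)]
accuracy = sum(matches) / len(matches)
print(accuracy)  # 2 of 3 predictions match
```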

### Optimizers

In [None]:
[x for x in dir(keras.optimizers) if
 x[0].isupper() and 
 not x.startswith('_')]

## Sequential and Functional APIs

In [None]:
import tensorflow_datasets as tfds

## Building models

### Prepare data

In [None]:
(ds_train, ds_val), info = tfds.load(
    'iris',
    split=[
       'train[:70%]',
       'train[70%:]'
    ],
    batch_size=8,
    with_info=True,
    as_supervised=True,
)

In [None]:
for row, label in ds_train.take(1):
    print(row.numpy())
    print(label.numpy())

In [None]:
import tensorflow as tf

In [None]:
shape = info.features.shape['features']
shape

In [None]:
info.features['label'].num_classes

### Pre-process data

If you need to pre-process your data, see https://keras.io/api/preprocessing/. We will not do any pre-processing for simplicity.

### Sequential API

If the entire pipeline is a single chain of layers, the Sequential API is the simplest to use.

In [None]:
from keras.models import Sequential, Model
from keras.layers import Dense

In [None]:
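model = Sequential()
# Declare the input shape with an explicit Input object;
# passing input_shape to a layer is discouraged in Keras 3.
model.add(keras.Input(shape=shape))
model.add(Dense(8))
model.add(Dense(4, activation='relu'))
model.add(Dense(3, activation='softmax'))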
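The final layer uses a softmax activation so that the three outputs can be read as class probabilities. A minimal sketch of softmax in plain Python, with made-up input scores, shows the normalization:

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the three classes.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(sum(probs))                    # the probabilities sum to 1
```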

In [None]:
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

In [None]:
model.summary()
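The parameter counts reported by `model.summary()` can be checked by hand: a Dense layer with `n_inputs` inputs and `n_units` nodes has `n_inputs * n_units` weights plus `n_units` biases. The sketch below assumes 4 input features for the iris data; the helper name is hypothetical.

```python
def dense_param_count(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit.
    return n_inputs * n_units + n_units

# Input features followed by the sizes of the three Dense layers.
layer_sizes = [4, 8, 4, 3]
counts = [dense_param_count(a, b)
          for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts)       # [40, 36, 15]
print(sum(counts))  # 91
```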

In [None]:
hist = model.fit(
    ds_train,
    validation_data=ds_val,
    epochs=50, 
    verbose=0)

In [None]:
hist.history.keys()

In [None]:
fig, axes = plt.subplots(1,2,figsize=(12, 4))
for ax, measure in zip(axes, ['loss', 'accuracy']):
    ax.plot(hist.history[measure], label=measure)
    ax.plot(hist.history['val_' + measure], label='val_' + measure)
    ax.set_xlabel('epoch')
    ax.legend()

In [None]:
from keras.utils import plot_model

In [None]:
plot_model(model)

### Saving and loading models

We'll first generate a set of predictions for the validation dataset so we can verify that the model is saved and then loaded correctly.

In [None]:
ds_val_len = ds_val.unbatch().reduce(0, lambda x, _: x + 1).numpy()
xs = ds_val.map(lambda x, y: x).unbatch().batch(ds_val_len)

In [None]:
y_pred1 = model.predict(xs)


In [None]:
y_pred1.shape

Saving the model structure and weights:

In [None]:
model.save('iris.keras')

Saving only the model weights:

In [None]:
model.save_weights('iris.weights.h5')

Loading the model structure and weights:

In [None]:
model1 = keras.models.load_model('iris.keras')


Check predictions are the same:

In [None]:
np.allclose(model1.predict(xs), y_pred1)  # compare floats with a tolerance

Loading only the model weights (we need to re-create the model structure first):

In [None]:
model3 = Sequential()
# Same architecture as the saved model, with an explicit Input object.
model3.add(keras.Input(shape=shape))
model3.add(Dense(8))
model3.add(Dense(4, activation='relu'))
model3.add(Dense(3, activation='softmax'))

In [None]:
model3.load_weights('iris.weights.h5')

Check predictions are the same:

In [None]:
np.allclose(model3.predict(xs), y_pred1)  # compare floats with a tolerance

### Functional API

In [None]:
from keras.layers import Dense, Input

In [None]:
# Named `inputs` to avoid shadowing the Python built-in `input`.
inputs = Input(shape=info.features.shape['features'])
x = Dense(8)(inputs)
x = Dense(4, activation='relu')(x)
output = Dense(3, activation='softmax')(x)
model = Model(inputs=[inputs], outputs=[output])

In [None]:
model.summary()

In [None]:
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

In [None]:
hist = model.fit(
    ds_train,
    validation_data=ds_val,
    epochs=50,
    verbose=0)

In [None]:
fig, axes = plt.subplots(1,2,figsize=(12, 4))
for ax, measure in zip(axes, ['loss', 'accuracy']):
    ax.plot(hist.history[measure], label=measure)
    ax.plot(hist.history['val_' + measure], label='val_' + measure)
    ax.legend()

In [None]:
plot_model(model)

#### Flexibility of the Functional API

The Functional API is more flexible than the Sequential API:
- Allows multiple inputs (in contrast to only one input for the Sequential API).
- Allows acyclic graphs of layers such as skip connections (in contrast to only linear stacks of layers for the Sequential API).
- Allows multiple outputs (in contrast to only one output for the Sequential API). Note that if you have multiple outputs, you probably also want multiple loss functions, given as a list in the compile step (unless the same loss function applies to every output).
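With several outputs and one loss per output, the quantity being minimized is a single scalar. As far as the Keras documentation describes it, this is a weighted sum of the per-output losses, with the weights set through the `loss_weights` argument of `compile`. The sketch below illustrates that arithmetic in plain Python; the two outputs, their values, and the weights are made up for illustration.

```python
import math

def total_loss(losses, loss_weights):
    # Weighted sum of the per-output losses.
    return sum(w * l for w, l in zip(loss_weights, losses))

# Hypothetical output 1: a classifier assigning probability 0.8 to the true class.
classification_loss = -math.log(0.8)

# Hypothetical output 2: a regression head predicting 2.5 for a target of 3.0.
regression_loss = (2.5 - 3.0) ** 2

total = total_loss([classification_loss, regression_loss], [1.0, 0.5])
print(round(total, 4))  # 0.3481
```

Choosing the weights is a modelling decision: a larger weight makes the optimizer prioritize that output.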

In [None]:
from keras.layers import Add

In [None]:
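# Named `inputs` to avoid shadowing the Python built-in `input`.
inputs = Input(shape=(4,))
x1 = Dense(8)(inputs)
x2 = Dense(4, activation='relu')(x1)
# Skip connection: add the original inputs to the output of the second layer.
x3 = Add()([inputs, x2])
output = Dense(3, activation='softmax')(x3)
model = Model(inputs=[inputs], outputs=[output])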

In [None]:
plot_model(model, show_shapes=True)