### This chapter covers:
- Understanding the tension between generalization and optimization, the fundamental issue in machine learning
- Evaluation methods for machine learning models
- Best practices to improve model fitting
- Best practices to achieve better generalization

### Weight regularization

In Keras, weight regularization is added by passing weight regularizer instances to layers as keyword arguments. Let’s add L2 weight regularization to our initial movie-review classification model.

In [3]:
import keras
from keras import layers, regularizers

model = keras.Sequential([
    layers.Dense(16,
                 kernel_regularizer=regularizers.l2(0.002),
                 activation="relu"),
    layers.Dense(16,
                 kernel_regularizer=regularizers.l2(0.002),
                 activation="relu"),
    layers.Dense(1, activation="sigmoid")
])

model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
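To make the argument `l2(0.002)` concrete: an L2 kernel regularizer adds the factor times the sum of the squared kernel weights to the loss during training. The following is a minimal pure-Python sketch of that arithmetic, not Keras's own implementation; the helper name `l2_penalty` and the sample `kernel` values are made up for illustration.

```python
def l2_penalty(weights, factor=0.002):
    """Return factor * sum of squared weights (the extra loss term from L2)."""
    return factor * sum(w * w for row in weights for w in row)

# A tiny, made-up 2x2 kernel standing in for a Dense layer's weight matrix.
kernel = [[0.5, -1.0],
          [2.0, 0.0]]

# Squared weights sum to 0.25 + 1.0 + 4.0 + 0.0 = 5.25.
penalty = l2_penalty(kernel)
print(f"L2 penalty added to the loss: {penalty:.4f}")
```

Because this term exists only while training, the training loss of a regularized model is typically higher than its loss at test time.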

In [6]:
from keras.datasets import imdb
import numpy as np

# Load the data
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

# Vectorize the sequences (convert to multi-hot encoding)
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        for j in sequence:
            if j < dimension:  # Only encode words in our vocabulary
                results[i, j] = 1
    return results

# Vectorize the data
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

# Convert labels to float32
y_train = np.asarray(train_labels).astype("float32")
y_test = np.asarray(test_labels).astype("float32")

# Now fit the model with vectorized data
history_l2_reg = model.fit(x_train, y_train,
    epochs=20, batch_size=512, validation_split=0.4)

Epoch 1/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 4s 39ms/step - accuracy: 0.7882 - loss: 0.5884 - val_accuracy: 0.8566 - val_loss: 0.4681
Epoch 2/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.8919 - loss: 0.3937 - val_accuracy: 0.8685 - val_loss: 0.4089
Epoch 3/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.9117 - loss: 0.3270 - val_accuracy: 0.8707 - val_loss: 0.3963
Epoch 4/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 1s 22ms/step - accuracy: 0.9271 - loss: 0.2889 - val_accuracy: 0.8860 - val_loss: 0.3668
Epoch 5/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 1s 23ms/step - accuracy: 0.9325 - loss: 0.2714 - val_accuracy: 0.8885 - val_loss: 0.3619
Epoch 6/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 0.9372 - loss: 0.2568 - val_accuracy: 0.8794 - val_loss: 0.3846
Epoch 7/20
30/30 ━━━━

As an alternative to L2 regularization, you can use one of the other Keras weight regularizers: L1 regularization (`regularizers.l1`) or simultaneous L1 and L2 regularization (`regularizers.l1_l2`).
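The alternative regularizers are passed as `kernel_regularizer` in the same way as `regularizers.l2`, for example `regularizers.l1(0.001)` or `regularizers.l1_l2(l1=0.001, l2=0.001)`. As a rough illustration of how their penalties differ, here is a pure-Python sketch; the helper names and the sample `kernel` are assumptions for this example, not Keras code.

```python
def l1_penalty(weights, factor):
    """L1: factor * sum of absolute weight values."""
    return factor * sum(abs(w) for row in weights for w in row)

def l1_l2_penalty(weights, l1, l2):
    """Combined L1 + L2: sum of the two penalty terms."""
    l1_term = l1 * sum(abs(w) for row in weights for w in row)
    l2_term = l2 * sum(w * w for row in weights for w in row)
    return l1_term + l2_term

# A tiny, made-up 2x2 kernel standing in for a Dense layer's weight matrix.
kernel = [[0.5, -1.0],
          [2.0, 0.0]]

l1_only = l1_penalty(kernel, factor=0.001)
combined = l1_l2_penalty(kernel, l1=0.001, l2=0.001)
print(f"L1 penalty:      {l1_only:.5f}")
print(f"L1 + L2 penalty: {combined:.5f}")
```

The L1 term grows linearly with each weight's magnitude, whereas the L2 term grows quadratically, so the two penalize large weights differently.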

### Dropout

In Keras, you can introduce dropout in a model via the Dropout layer, which is applied to the output of the layer right before it. Let’s add two Dropout layers in the IMDB model to see how well they do at reducing overfitting.

In [None]:
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid")
])

model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

history_dropout = model.fit(
    x_train, y_train,
    epochs=20, batch_size=512, validation_split=0.4)
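To illustrate what a `Dropout(0.5)` layer does to the output of the layer before it, here is a minimal pure-Python sketch of inverted dropout. At training time each value is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`; at inference time the values pass through unchanged. This follows the documented behavior of the Keras layer but is not its implementation; `dropout`, `layer_output`, and the seed are illustrative choices.

```python
import random

def dropout(activations, rate, training=True, rng=None):
    """Zero each value with probability `rate`; scale survivors by 1/(1-rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # inference: pass values through unchanged
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

# Made-up activations standing in for one sample's Dense layer output.
layer_output = [0.2, 0.5, 1.3, 0.8, 1.1, 0.0]

dropped = dropout(layer_output, rate=0.5, rng=random.Random(0))
at_inference = dropout(layer_output, rate=0.5, training=False)

print("training: ", dropped)
print("inference:", at_inference)
```

Scaling the surviving values during training keeps the expected magnitude of the layer's output the same in both modes, so no rescaling is needed at inference.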