## Accuracy Results
With BatchNormalization, training accuracy improves significantly (from 0.7071 to 0.8849), but test accuracy barely changes (from 0.6909 to 0.7005). The widening gap between train and test accuracy indicates overfitting.

| Dataset | Without BatchNormalization | With BatchNormalization |
|:----|:-------------|:-----------|
| train | 0.7071 | 0.8849 |
| test | 0.6909 | 0.7005 |
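To make the overfitting argument concrete, the train-test gap can be computed from the accuracies in the table above. A minimal sketch; the numbers are copied from the table and the `results`/`gaps` names are only illustrative.

```python
# Final-epoch accuracies copied from the table
results = {
    "without BatchNormalization": {"train": 0.7071, "test": 0.6909},
    "with BatchNormalization": {"train": 0.8849, "test": 0.7005},
}

# A larger train-test gap means the model fits the training set
# more closely than it generalizes to unseen data
gaps = {name: round(acc["train"] - acc["test"], 4) for name, acc in results.items()}
for name, gap in gaps.items():
    print(f"{name}: train-test gap = {gap:.4f}")
```

The gap grows from about 1.6 to about 18.4 percentage points once BatchNormalization is added.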

Next, we will look at regularization to reduce this overfitting.

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import cifar10

In [17]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape, y_train.shape)
print(x_train.dtype, y_train.dtype)

(50000, 32, 32, 3) (50000, 1)
uint8 uint8


In [18]:
# convert to "float32" and scale pixel values from [0, 255] to [0, 1], which helps training converge
# this only applies to the images (x_train, x_test); the labels (y_train, y_test) stay integer class ids
x_train = x_train.astype("float32") / 255.0
x_test  = x_test.astype("float32") / 255.0
print(x_train.shape, y_train.shape)
print(x_train.dtype, y_train.dtype)

(50000, 32, 32, 3) (50000, 1)
float32 uint8
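The scaling step can be checked on a small synthetic array. A minimal NumPy sketch; the random `fake_images` array is a hypothetical stand-in for CIFAR-10 pixels, not the real data.

```python
import numpy as np

# Hypothetical stand-in for a batch of CIFAR-10 images: uint8 pixels in [0, 255]
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Same transformation as in the notebook: cast to float32, then divide by 255
scaled = fake_images.astype("float32") / 255.0

print(fake_images.dtype, fake_images.min(), fake_images.max())
print(scaled.dtype, scaled.min(), scaled.max())
```

Only the dtype and value range change; the shape stays `(batch, 32, 32, 3)`, which is what `keras.Input(shape=(32, 32, 3))` expects.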


In [19]:
# Option A: Sequential API (very convenient, not very flexible)
model = keras.Sequential(
    [
        keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, 3, padding="valid", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10),        
    ]
)
model.summary()  # summary() prints the table itself and returns None

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_3 (Conv2D)            (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 4, 4, 128)         73856     
_________________________________________________________________
flatten_1 (Flatten)          (None, 2048)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 64)               
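The output shapes and `Param #` values in this summary follow from simple arithmetic: a 3x3 "valid" convolution shrinks each spatial side by 2, 2x2 max pooling halves it (rounding down), and a layer's parameter count is its weights plus one bias per output unit. A sketch that reproduces the rows shown above; the helper names are hypothetical, and the two Dense counts are computed here because the printed summary is cut off before them.

```python
def conv_out(side, kernel_size=3):
    # "valid" padding, stride 1
    return side - kernel_size + 1

def pool_out(side, pool_size=2):
    # MaxPooling2D defaults: pool_size=2, stride=2, "valid" padding
    return side // pool_size

def conv2d_params(kernel_size, in_channels, filters):
    # one kernel_size x kernel_size x in_channels kernel per filter, plus one bias per filter
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(in_features, units):
    # full weight matrix plus one bias per unit
    return in_features * units + units

side = conv_out(32)          # 30
side = pool_out(side)        # 15
side = conv_out(side)        # 13
side = pool_out(side)        # 6
side = conv_out(side)        # 4
flat = side * side * 128     # 2048, the Flatten output

params = [
    conv2d_params(3, 3, 32),    # 896
    conv2d_params(3, 32, 64),   # 18496
    conv2d_params(3, 64, 128),  # 73856
    dense_params(flat, 64),     # computed, cut off in the printed summary
    dense_params(64, 10),       # computed, not shown in the printed summary
]
print(flat, params, sum(params))
```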

In [20]:
model.compile(
    # CategoricalCrossentropy would require one-hot encoded labels;
    # SparseCategoricalCrossentropy works directly with integer labels like y_train
    # from_logits=True because the output layer has no "softmax" activation (it outputs raw logits)
    # set from_logits=False if the output layer applies "softmax"
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=3e-4),
    metrics=["accuracy"],
)
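With `from_logits=True`, the loss applies softmax to the raw outputs of `Dense(10)` and then takes the negative log-probability of the true class. A NumPy sketch of that computation for integer (sparse) labels; `sparse_ce_from_logits` is a hypothetical helper, and the log-sum-exp form is used for numerical stability, not as a claim about Keras's internal implementation. Labels are flattened to shape `(batch,)` here; `y_train` has shape `(50000, 1)`, which the Keras loss also accepts.

```python
import numpy as np

def sparse_ce_from_logits(logits, labels):
    # logits: (batch, num_classes) raw scores; labels: (batch,) integer class ids
    # log-softmax via the log-sum-exp trick (subtract the row max for stability)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # log-probability of the true class for each sample, averaged over the batch
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
labels = np.array([0, 2])  # integer labels, no one-hot encoding needed
print(sparse_ce_from_logits(logits, labels))
```

If the output layer already applied softmax, the values would be probabilities rather than logits, which is why `from_logits` must then be `False`.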

In [21]:
model.fit(x_train, y_train, batch_size=64, epochs=10, verbose=2)
model.evaluate(x_test, y_test, batch_size=64, verbose=2)

Epoch 1/10
782/782 - 23s - loss: 1.6789 - accuracy: 0.3878
Epoch 2/10
782/782 - 23s - loss: 1.3629 - accuracy: 0.5101
Epoch 3/10
782/782 - 21s - loss: 1.2467 - accuracy: 0.5578
Epoch 4/10
782/782 - 21s - loss: 1.1501 - accuracy: 0.5963
Epoch 5/10
782/782 - 23s - loss: 1.0788 - accuracy: 0.6220
Epoch 6/10
782/782 - 22s - loss: 1.0158 - accuracy: 0.6455
Epoch 7/10
782/782 - 22s - loss: 0.9636 - accuracy: 0.6655
Epoch 8/10
782/782 - 22s - loss: 0.9180 - accuracy: 0.6801
Epoch 9/10
782/782 - 21s - loss: 0.8763 - accuracy: 0.6957
Epoch 10/10
782/782 - 21s - loss: 0.8423 - accuracy: 0.7071
157/157 - 1s - loss: 0.8903 - accuracy: 0.6909


[0.8902608156204224, 0.6909000277519226]
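The `782/782` and `157/157` counters in the log are the number of batches per pass: the dataset size divided by `batch_size=64`, rounded up because the last batch is partial. A quick check, assuming the standard CIFAR-10 split of 50000 training and 10000 test images:

```python
import math

batch_size = 64
train_steps = math.ceil(50000 / batch_size)  # 781 full batches + 1 partial batch of 16
test_steps = math.ceil(10000 / batch_size)   # 156 full batches + 1 partial batch of 16
print(train_steps, test_steps)
```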

In [22]:
# Option B: Functional API (a bit more flexible)
def my_model():
    inputs = keras.Input(shape=(32, 32, 3))
    
    # no activation in Conv2D because batch normalization is applied before the activation
    # Conv2D -> BatchNormalization -> ReLU
    x = layers.Conv2D(32, 3)(inputs)
    x = layers.BatchNormalization()(x)
    x = keras.activations.relu(x)
    x = layers.MaxPooling2D()(x)
    
    x = layers.Conv2D(64, 3)(x)
    x = layers.BatchNormalization()(x)
    x = keras.activations.relu(x)
    x = layers.MaxPooling2D()(x)
    
    x = layers.Conv2D(128, 3)(x)
    x = layers.BatchNormalization()(x)
    x = keras.activations.relu(x)
    
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)
    
    outputs = layers.Dense(10)(x)
    
    model = keras.Model(inputs=inputs, outputs=outputs)
    
    return model
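During training, `BatchNormalization` on a channels-last `Conv2D` output normalizes each channel with the mean and variance computed over the batch and spatial axes, then applies a learned scale (`gamma`) and shift (`beta`). A simplified NumPy sketch of that training-mode forward pass; it omits the moving averages Keras keeps for inference, `epsilon=1e-3` mirrors the Keras default, and the random `conv_out` tensor is a hypothetical stand-in for a real layer output.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, epsilon=1e-3):
    # x: (batch, height, width, channels); statistics are per channel
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    # learned per-channel scale and shift, broadcast over the last axis
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
conv_out = rng.normal(loc=5.0, scale=3.0, size=(8, 30, 30, 32))  # hypothetical Conv2D output
gamma = np.ones(32)   # Keras initializes gamma to ones
beta = np.zeros(32)   # and beta to zeros

normalized = batch_norm_train(conv_out, gamma, beta)
activated = np.maximum(normalized, 0.0)  # ReLU after normalization, as in my_model()
print(normalized.mean(axis=(0, 1, 2))[:3], normalized.std(axis=(0, 1, 2))[:3])
```

Placing the activation after normalization means ReLU sees roughly zero-centered inputs in every channel, which is the ordering `my_model()` uses.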

In [24]:
model = my_model()
model.compile(
    # CategoricalCrossentropy would require one-hot encoded labels;
    # SparseCategoricalCrossentropy works directly with integer labels like y_train
    # from_logits=True because the output layer has no "softmax" activation (it outputs raw logits)
    # set from_logits=False if the output layer applies "softmax"
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=3e-4),
    metrics=["accuracy"],
)

In [25]:
model.fit(x_train, y_train, batch_size=64, epochs=10, verbose=2)
model.evaluate(x_test, y_test, batch_size=64, verbose=2)

Epoch 1/10
782/782 - 48s - loss: 1.3384 - accuracy: 0.5233
Epoch 2/10
782/782 - 46s - loss: 0.9736 - accuracy: 0.6576
Epoch 3/10
782/782 - 48s - loss: 0.8136 - accuracy: 0.7162
Epoch 4/10
782/782 - 45s - loss: 0.7108 - accuracy: 0.7524
Epoch 5/10
782/782 - 47s - loss: 0.6274 - accuracy: 0.7821
Epoch 6/10
782/782 - 46s - loss: 0.5545 - accuracy: 0.8094
Epoch 7/10
782/782 - 44s - loss: 0.4948 - accuracy: 0.8307
Epoch 8/10
782/782 - 45s - loss: 0.4365 - accuracy: 0.8506
Epoch 9/10
782/782 - 45s - loss: 0.3831 - accuracy: 0.8699
Epoch 10/10
782/782 - 45s - loss: 0.3397 - accuracy: 0.8849
157/157 - 2s - loss: 0.9601 - accuracy: 0.7005


[0.9601001739501953, 0.7005000114440918]