# Convolutional Neural Network with Sequential and Functional API

## 1. Imports and Configurations

#### Notice how in *neuralnetwork.ipynb* when inspecting GPU by nvtop or nvidia-smi, the GPU is not fully utilized. This is because TensorFlow is not configured to use the GPU efficiently.

- You need to configure TensorFlow to use GPU memory more efficiently.
- You need to do this before TensorFlow initializes the GPU (that is, before any tensors or models are created), as the setting cannot be changed afterwards in the middle of the program.
- Use `tf.config.experimental.set_memory_growth` to enable dynamic memory allocation for the GPU. The memory growth setting tells TensorFlow to allocate GPU memory incrementally as needed, rather than reserving nearly all of the GPU's memory upfront. This helps keep the TensorFlow process from occupying GPU memory that other processes or applications might need or already be using.

In [1]:
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

"""
This code is used to set up TensorFlow to work with the first available GPU on the system 
and to configure it to use memory more efficiently.
"""
# List all the GPUs available, if any. If you have only CPU, this will return an empty list.
physical_devices = tf.config.experimental.list_physical_devices('GPU')
# If you have more than one GPU, you can select which one you want to use.
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
# This line enables memory growth for the first GPU device.
config = tf.config.experimental.set_memory_growth(physical_devices[0], True)


## 2. Load Data
Import the CIFAR-10 dataset from Keras. CIFAR-10 consists of 32x32 color images labeled over 10 categories: 50,000 training images and 10,000 test images.

Notice that as we are using a convolutional neural network, we do not need to flatten the images.
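
As a rough illustration of that point, the sketch below uses a dummy NumPy batch with the CIFAR-10 image shape. The values are random rather than real data, and `batch` and `flat` are hypothetical names. A dense-only network would need each image reshaped into a 3,072-element vector, whereas a CNN consumes the (32, 32, 3) tensor directly.

```python
import numpy as np

# Dummy stand-in for a batch of 4 CIFAR-10 images (uint8 pixels in [0, 255]).
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Same scaling as used for x_train and x_test: float32 values in [0, 1].
batch = batch.astype("float32") / 255.0

# A fully connected network would need one flat vector per image.
flat = batch.reshape(len(batch), -1)

print(batch.shape)  # (4, 32, 32, 3): the layout Conv2D layers expect
print(flat.shape)   # (4, 3072): the layout a Dense-only model would need
```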

In [2]:
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import get_file

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Get the location of the dataset
cifar10_dataset_path = get_file(
    fname="cifar-10-batches-py",
    origin='https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz',
    untar=True
)

print("CIFAR-10 dataset is located at:", cifar10_dataset_path)

CIFAR-10 dataset is located at: /home/jay/.keras/datasets/cifar-10-batches-py


## 3. Define Model

- `keras.Input(shape=(32, 32, 3))` creates a tensor of shape (32, 32, 3) which is the shape of CIFAR-10 images. 32x32 is the size of the image, and 3 is the number of channels (RGB).

- `layers.Conv2D(32, 3, padding="valid", activation="relu")` creates a convolutional layer with 32 filters, each with a 3x3 kernel. Instead of `3` we can also write `(3, 3)`. `padding="valid"` applies no padding, so the output is smaller than the input (a 3x3 kernel trims one pixel from each border). `padding="same"` pads the input so that, with the default stride of 1, the output has the same height and width as the input. `activation="relu"` sets the activation function to ReLU (Rectified Linear Unit), defined as f(x) = max(0, x).

- `layers.MaxPooling2D(pool_size=(2, 2))` creates a pooling layer with a 2x2 pooling window, which halves the height and width of its input. This reduces the size of the feature maps and makes the model more robust to small changes in the position of features in the input image.
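
The size arithmetic behind these layers can be sketched in plain Python. The helpers `conv_out` and `pool_out` are hypothetical names written for this illustration and are not part of Keras. They assume square inputs, a stride of 1 for convolutions, and a pooling stride equal to the pool size (the Keras default).

```python
def conv_out(size, kernel=3, stride=1, padding="valid"):
    """Output size of a convolution along one spatial dimension."""
    if padding == "same":
        # "same" padding yields ceil(size / stride).
        return -(-size // stride)
    # "valid" padding adds no border pixels.
    return (size - kernel) // stride + 1


def pool_out(size, pool=2):
    """Output size of max pooling whose stride equals the pool size."""
    return size // pool


# Trace the spatial size through the layer stack used in this notebook.
size = 32
trace = [size]
for layer in ["conv", "pool", "conv", "pool", "conv"]:
    size = conv_out(size) if layer == "conv" else pool_out(size)
    trace.append(size)

flattened = size * size * 128  # the last Conv2D layer has 128 filters

print(trace)      # [32, 30, 15, 13, 6, 4]
print(flattened)  # 2048
```

With `padding="same"` the first convolution would keep the 32x32 size, which `conv_out(32, padding="same")` reproduces.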

In [3]:
# Sequential API (Very convenient, not very flexible)
model = keras.Sequential(
    [
        keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, 3, padding="valid", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10),
    ]
)

# On printing model.summary(), notice that this model has 225K parameters which is pretty small. 
# The first CNN that revolutionized computer vision, AlexNet, had 60M parameters.
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 128)         73856     
_________________________________________________________________
flatten (Flatten)            (None, 2048)              0         
_________________________________________________________________
dense (Dense)                (None, 64)                1
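
The parameter counts in the summary can be cross-checked by hand. The sketch below uses the standard formulas for convolutional and dense layers with bias terms; `conv2d_params` and `dense_params` are hypothetical helper names. The figures for the two dense layers are computed from the formula rather than copied from the notebook output.

```python
def conv2d_params(kernel, in_channels, filters):
    # Each filter has kernel * kernel * in_channels weights plus one bias.
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(in_features, units):
    # Each unit has one weight per input feature plus one bias.
    return (in_features + 1) * units


counts = {
    "conv2d": conv2d_params(3, 3, 32),
    "conv2d_1": conv2d_params(3, 32, 64),
    "conv2d_2": conv2d_params(3, 64, 128),
    "dense": dense_params(4 * 4 * 128, 64),
    "dense_1": dense_params(64, 10),
}
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name:10s} {count:>8,}")
print(f"{'total':10s} {total:>8,}")  # about 225K, as noted in the code comment
```

Most of the parameters sit in the first dense layer, because it connects all 2,048 flattened features to 64 units.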

Next, we use the Functional API to build the same architecture with Batch Normalization added.

- `keras.Model(inputs=inputs, outputs=outputs)` creates a model with the given inputs and outputs. This is the functional API of Keras. The functional API is used to create models that are more complex than the sequential API allows, such as models with multiple inputs, multiple outputs, shared layers, and residual connections.

- `keras.layers.Flatten()` flattens its input. Here it converts the 3D feature map produced by the last convolutional layer (4x4x128) into a 1D vector of 2,048 values per image. This is needed because the `Dense` layers that follow expect a 1D vector for each sample.
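
To show the kind of topology the Functional API allows and the Sequential API does not, here is a minimal sketch of a residual (skip) connection. It is not the model used in this notebook: the layer sizes, the `residual_model` name, and the use of `GlobalAveragePooling2D` are assumptions chosen only to keep the example short.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32, 32, 3))

# padding="same" keeps the 32x32 size so the two branches can be added.
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
skip = x  # keep a reference to this tensor for the skip connection

x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
x = layers.Add()([x, skip])  # element-wise sum of the two branches

x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10)(x)  # raw logits, one per class

residual_model = keras.Model(inputs=inputs, outputs=outputs)
print(residual_model.output_shape)  # (None, 10)
```

Because `skip` is used twice in the graph, this model cannot be written as a plain list of layers.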

#### Batch Normalization
- Batch Normalization normalizes the inputs to a layer so that, within each mini-batch, every channel has a mean of zero and a standard deviation of one; the layer then applies a learnable scale and shift. This is similar to how input data is often normalized, but batch normalization does this for the inputs to layers within the model. This has the effect of **stabilizing the learning process and often reducing the number of training epochs required to train deep networks.**
- It also speeds up training and reduces the sensitivity to network initialization. In this notebook, Batch Normalization is **used after the convolutional layer and before the activation function.**
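
The normalization step can be sketched in NumPy. This is a simplified version of what a batch normalization layer computes during training, not the Keras implementation: `batch_norm_train` is a hypothetical function, `gamma` and `beta` stand for the learnable scale and shift, and the moving averages that Keras uses at inference time are left out.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize a (batch, height, width, channels) array per channel."""
    # Statistics are taken over the batch and spatial axes, one per channel.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Scale and shift; with gamma=1 and beta=0 the output stays normalized.
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
# Fake activations with a mean near 5 and a standard deviation near 3.
x = rng.normal(loc=5.0, scale=3.0, size=(64, 8, 8, 32))

y = batch_norm_train(x, gamma=np.ones(32), beta=np.zeros(32))

print(np.abs(y.mean(axis=(0, 1, 2))).max())  # close to 0
print(y.std(axis=(0, 1, 2)).mean())          # close to 1
```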

In [4]:
def my_model():
    inputs = keras.Input(shape=(32, 32, 3))
    x = layers.Conv2D(32, 3)(inputs)
    x = layers.BatchNormalization()(x)
    x = keras.activations.relu(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3)(x)
    x = layers.BatchNormalization()(x)
    x = keras.activations.relu(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(128, 3)(x)
    x = layers.BatchNormalization()(x)
    x = keras.activations.relu(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(10)(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

model = my_model()

## 4. Compile Model

In [5]:
model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=3e-4),
    metrics=["accuracy"],
)
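
The model's last layer is `Dense(10)` with no activation, so it outputs raw logits, which is why the loss is created with `from_logits=True`. The NumPy sketch below shows roughly what that loss computes: a softmax over the logits followed by the negative log-probability of the true class, averaged over the batch. The function name `sparse_cce_from_logits` is hypothetical, and labels are assumed to be a 1D array of integer class indices.

```python
import numpy as np


def sparse_cce_from_logits(logits, labels):
    """Mean cross-entropy for integer labels, computed from raw logits."""
    # Subtract the row maximum for numerical stability (softmax is unchanged).
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability assigned to each sample's true class.
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()


labels = np.array([3, 7])

# All-zero logits mean a uniform guess over 10 classes: loss = ln(10).
uniform_loss = sparse_cce_from_logits(np.zeros((2, 10)), labels)

# A large logit on the correct class drives the loss towards zero.
confident_logits = np.zeros((2, 10))
confident_logits[np.arange(2), labels] = 20.0
confident_loss = sparse_cce_from_logits(confident_logits, labels)

print(uniform_loss, np.log(10))
print(confident_loss)
```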

## 5. Train and Evaluate Model

- When we use the model defined by the Functional API, we notice faster training and better training accuracy due to the use of Batch Normalization.
- However, the model overfits: training accuracy reaches about 88.7% after 10 epochs while test accuracy is only 70.0%, which is lower than the test accuracy of the Sequential model.

In [6]:
print("Training model...")
model.fit(x_train, y_train, batch_size=64, epochs=10, verbose=2)

print("\nEvaluating model...")
results = model.evaluate(x_test, y_test, batch_size=64, verbose=0)
print(f"Test loss: {results[0]:.4f}")
print(f"Test accuracy: {results[1]:.4f}")

Training model...


Epoch 1/10
782/782 - 24s - loss: 1.3364 - accuracy: 0.5221
Epoch 2/10
782/782 - 4s - loss: 0.9492 - accuracy: 0.6687
Epoch 3/10
782/782 - 4s - loss: 0.7976 - accuracy: 0.7232
Epoch 4/10
782/782 - 4s - loss: 0.7003 - accuracy: 0.7564
Epoch 5/10
782/782 - 4s - loss: 0.6159 - accuracy: 0.7875
Epoch 6/10
782/782 - 4s - loss: 0.5489 - accuracy: 0.8097
Epoch 7/10
782/782 - 4s - loss: 0.4917 - accuracy: 0.8311
Epoch 8/10
782/782 - 4s - loss: 0.4320 - accuracy: 0.8543
Epoch 9/10
782/782 - 4s - loss: 0.3853 - accuracy: 0.8689
Epoch 10/10
782/782 - 4s - loss: 0.3358 - accuracy: 0.8870

Evaluating model...
Test loss: 1.0047
Test accuracy: 0.7000
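
Two numbers in this log can be checked with simple arithmetic: the 782 steps per epoch follow from the training set size and the batch size, and the gap between the final training accuracy and the test accuracy quantifies the overfitting. The variable names below are hypothetical; the values are taken from the output above.

```python
import math

train_samples = 50_000
batch_size = 64

# The last batch is smaller, so the step count is rounded up.
steps_per_epoch = math.ceil(train_samples / batch_size)
last_batch_size = train_samples - (steps_per_epoch - 1) * batch_size

train_acc = 0.8870  # accuracy reported for epoch 10
test_acc = 0.7000   # accuracy reported by model.evaluate
gap = round(train_acc - test_acc, 4)

print(steps_per_epoch)  # 782
print(last_batch_size)  # 16
print(gap)              # 0.187
```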
