**Question 1 - Computing CNN Memory Usage**

Assume you have built a CNN with the following details.
It consists of three convolutional layers, each with 3 x 3 kernels,
a stride of 2, and SAME padding.

o The first layer outputs 100 feature maps

o The second layer outputs 200 feature maps

o The third layer outputs 400 feature maps

The input images are RGB images of 720 x 1280 pixels.

Questions

*What is the total number of parameters in the CNN?*


one filter: 3 * 3 * 3 = 27 (weights) + 1 (bias) = 28 parameters per output feature map, given 3 input channels

first layer: (3 * 3 * 3) * 100 = 2700 (weights) + 100 (one bias per feature map) = 2800

second layer: (3 * 3 * 100) * 200 = 180 000 (weights) + 200 (one bias per feature map) = 180 200

third layer: (3 * 3 * 200) * 400 = 720 000 (weights) + 400 (one bias per feature map) = 720 400

Total number of parameters: 2800 + 180 200 + 720 400 = 903 400
--------------------------------------------------------------
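
The per-layer arithmetic for the parameter count can be reproduced in a few lines of Python. This is only a sketch: the helper `conv_params` and the list `layer_channels` are names introduced here for illustration, and each output feature map is assumed to have exactly one bias.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus one bias per output feature map."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    biases = out_channels
    return weights + biases

# (input channels, output feature maps) for the three layers in the question
layer_channels = [(3, 100), (100, 200), (200, 400)]

per_layer = [conv_params(3, 3, c_in, c_out) for c_in, c_out in layer_channels]
total_params = sum(per_layer)

print(per_layer)      # [2800, 180200, 720400]
print(total_params)   # 903400
```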

*What is the minimum total RAM needed, if parameters are stored in 32-bit floats.
Assume making predictions for a single image.*

**This is the minimum, counting only model parameters for inference on one image:**

1 parameter = 32 bits = 4 bytes, so 903 400 * 4 = 3 613 600 bytes ~ 3.45 MB
---------------------------------------
*How does the answer to question 2 change if everything is stored in 8-bit floats.
Assume making predictions for a single image.*

1 parameter = 8 bits = 1 byte, so 903 400 * 1 = 903 400 bytes ~ 0.86 MB
--------------------------------------
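
The two conversions above can be checked with a small helper. This is a sketch under the same assumption as the answers, namely that only the parameters are counted. The "MB" figures correspond to units of 2**20 bytes, and `param_memory_mib` is a hypothetical helper name.

```python
TOTAL_PARAMS = 903_400  # total from the first question

def param_memory_mib(num_params, bytes_per_param):
    """Memory taken by the parameters alone, in units of 2**20 bytes."""
    return num_params * bytes_per_param / 2**20

print(round(param_memory_mib(TOTAL_PARAMS, 4), 2))  # 32-bit floats -> 3.45
print(round(param_memory_mib(TOTAL_PARAMS, 1), 2))  # 8-bit values  -> 0.86
```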
*How does the answer to question 2 change when training on a mini-batch of 20
images?*

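Parameters: 3.45 MB

Gradients: +3.45 MB (one gradient value per parameter)

Activations: the outputs of all three layers must be kept for the backward pass, and they scale with the batch size (20 times the per-image amount), so they are much larger than the parameters.

So RAM increases significantly: during training, activation memory dominates, not parameter memory.

Parameters alone do not change, but training memory grows because of gradients and activations.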
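
To put rough numbers on the training case, the feature-map sizes can be computed directly. This is a sketch under stated assumptions: with SAME padding and a stride of 2, each layer halves the height and width (rounding up); every layer's output is kept for backpropagation; and the input batch, optimizer state, and framework overhead are ignored. The helper name `same_padding_output` is introduced here for illustration.

```python
import math

def same_padding_output(size, stride):
    """Spatial output size of a convolution with SAME padding."""
    return math.ceil(size / stride)

height, width = 720, 1280
feature_maps = [100, 200, 400]
batch_size = 20
bytes_per_value = 4          # 32-bit floats
total_params = 903_400

activations_per_image = 0
for maps in feature_maps:
    height = same_padding_output(height, 2)
    width = same_padding_output(width, 2)
    activations_per_image += height * width * maps

activation_bytes = activations_per_image * batch_size * bytes_per_value
param_and_grad_bytes = 2 * total_params * bytes_per_value

print(activations_per_image)          # 40320000
print(activation_bytes / 2**30)       # roughly 3.0 (GiB)
print(param_and_grad_bytes / 2**20)   # roughly 6.9 (MiB)
```

Under these assumptions the activations for 20 images need about 3 GB, several hundred times more than the parameters and gradients combined.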


In [None]:
#import needed libraries
import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
from tensorflow.keras import datasets, layers, models

In [None]:
# load the dataset
fashion_mnist = keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()


In [None]:
print("x_train:", x_train.shape)
print("x_test:", x_test.shape)

x_train: (60000, 28, 28)
x_test: (10000, 28, 28)


In [None]:
# normalize the dataset
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

In [None]:
# reshape the dataset
x_train = x_train.reshape(-1, 28, 28, 1)
x_test  = x_test.reshape(-1, 28, 28, 1)

In [None]:
# helper for plotting learning curves
def plot_learning_curve(history, title):
    plt.figure()
    plt.plot(history.history['loss'], label='Training loss')
    plt.plot(history.history['val_loss'], label='Validation loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.title(title)
    plt.legend()
    plt.show()

**Data Augmentation** is a technique for artificially enlarging the training set by creating modified copies of existing samples (here, random horizontal flips and small rotations).

In [None]:
# data augmentation technique
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
])

**Learning Rate Scheduler** - automatically adjusts the learning rate during training

In [None]:
# define learning rate scheduler
lr_scheduler = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',  # reduce LR when validation loss stops improving
    factor=0.5,          # reduce LR by half
    patience=3,          # wait 3 epochs before reducing
    min_lr=1e-6,
    verbose=1
)
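
The effect of `factor=0.5`, `patience=3`, and `min_lr=1e-6` can be illustrated without TensorFlow. The function below is a simplified sketch of the reduce-on-plateau idea, not the Keras implementation: it ignores the `min_delta` and `cooldown` options, and the name `simulate_reduce_on_plateau` and the sample losses are made up for illustration.

```python
def simulate_reduce_on_plateau(val_losses, lr=1e-3, factor=0.5,
                               patience=3, min_lr=1e-6):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    wait = 0          # epochs since the last improvement
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # never go below min_lr
                wait = 0
        history.append(lr)
    return history

# validation loss stops improving after the second epoch
schedule = simulate_reduce_on_plateau([0.50, 0.40, 0.41, 0.42, 0.43, 0.39])
print(schedule)  # [0.001, 0.001, 0.001, 0.001, 0.0005, 0.0005]
```

After three epochs without improvement the learning rate is halved, and the later improvement resets the counter without restoring the old rate.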

In [None]:
model = models.Sequential()

# declare the input shape once, before any other layer
model.add(layers.Input(shape=(28, 28, 1)))

# data augmentation
model.add(data_augmentation)

# convolution layer
model.add(layers.Conv2D(64, (3, 3), padding="same", activation="relu"))
# batch normalization
model.add(layers.BatchNormalization())

# convolution layer
model.add(layers.Conv2D(64, (3, 3), padding="same", activation="relu"))
# batch normalization
model.add(layers.BatchNormalization())

# max pooling layer - for reducing image size
model.add(layers.MaxPooling2D((2, 2)))

# dropout - regularization technique to prevent overfitting
model.add(layers.Dropout(0.25))

# convert to one-dimensional arrays
model.add(layers.Flatten())

# dense layer or fully connected layer
model.add(layers.Dense(128, activation="relu"))
# batch normalization
model.add(layers.BatchNormalization())
# dropout - regularization technique to prevent overfitting
model.add(layers.Dropout(0.50))

# output layer
model.add(layers.Dense(10, activation="softmax"))
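
The same counting used in Question 1 can be applied to this model. The sketch below is a hand calculation, not output from Keras. It assumes that the augmentation, pooling, dropout, and flatten layers contribute no parameters and that each batch normalization layer stores four values per channel (scale, offset, moving mean, moving variance). The helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    """Square-kernel convolution: weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Fully connected layer: weights plus one bias per unit."""
    return inputs * units + units

def batchnorm_params(channels):
    """Scale, offset, moving mean, and moving variance per channel."""
    return 4 * channels

flattened = 14 * 14 * 64   # 28x28 halved by 2x2 max pooling, 64 channels

model_params = [
    conv2d_params(3, 1, 64),
    batchnorm_params(64),
    conv2d_params(3, 64, 64),
    batchnorm_params(64),
    dense_params(flattened, 128),
    batchnorm_params(128),
    dense_params(128, 10),
]
print(model_params)       # [640, 256, 36928, 256, 1605760, 512, 1290]
print(sum(model_params))  # 1645642
```

Almost all of the parameters sit in the first dense layer, because the flattened feature map has 12 544 values.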


In [None]:
# configure how the model should learn
model.compile(
    # The optimizer controls how the network updates its weights
    optimizer = tf.keras.optimizers.Adam(),
    # Loss function measures how far the model’s predictions are from the true labels
    loss = 'sparse_categorical_crossentropy',
    # Metrics are used to monitor the model’s performance.
    metrics = ['accuracy']
)
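
For a single sample, sparse categorical cross-entropy is the negative log of the probability the model assigns to the true class. The plain-Python sketch below illustrates that idea only. It is not the Keras implementation, which also handles batches and numerical clipping, and the probability vectors are made-up examples.

```python
import math

def sparse_categorical_crossentropy(probabilities, true_class):
    """Loss for one sample: negative log of the true class probability."""
    return -math.log(probabilities[true_class])

confident = [0.05, 0.90, 0.05]   # true class 1 gets high probability
unsure = [0.40, 0.30, 0.30]      # true class 1 gets low probability

print(sparse_categorical_crossentropy(confident, 1))  # about 0.105
print(sparse_categorical_crossentropy(unsure, 1))     # about 1.204
```

The labels stay as integer class indices, which is why the `y_train` values can be passed to training without one-hot encoding.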

In [None]:
# train the model
history = model.fit(
    x_train, y_train,
    epochs=5,
    batch_size=32,
    validation_split=0.2,
    callbacks = [lr_scheduler]
)

Epoch 1/5
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 332s 219ms/step - accuracy: 0.7045 - loss: 0.8830 - val_accuracy: 0.8288 - val_loss: 0.4628 - learning_rate: 0.0010
Epoch 2/5
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 383s 219ms/step - accuracy: 0.8178 - loss: 0.5092 - val_accuracy: 0.8582 - val_loss: 0.3920 - learning_rate: 0.0010
Epoch 3/5
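
The `1500/1500` step count in the training log follows from the dataset size, the validation split, and the batch size. A short check, assuming the validation samples are removed from the 60 000 training images before batching:

```python
import math

num_train_images = 60_000   # first dimension of x_train
validation_split = 0.2
batch_size = 32

num_fit_images = int(num_train_images * (1 - validation_split))
steps_per_epoch = math.ceil(num_fit_images / batch_size)

print(num_fit_images)    # 48000
print(steps_per_epoch)   # 1500
```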


Plot the model loss diagram

In [None]:
plot_learning_curve(history, "CNN Model")

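The training loss is approximately 0.36 or less.

The validation loss is approximately 0.30.

The validation loss is lower than the training loss by about 0.06. This is expected here, because dropout and data augmentation are active only during training.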

Evaluate the model accuracy and loss using test set

In [None]:
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print("Test loss:", test_loss)
print("Test accuracy:", test_accuracy)

Conclusion:

In Lab 6, we implemented a CNN model to classify the **Fashion MNIST dataset**, which has 10 classes of images, using convolution, max-pooling, and dense layers. To optimize the training process, we used the Adam optimizer and a learning rate scheduler that adjusts the learning rate during training. In addition, we used data augmentation to make the model more robust to variations in the images, such as flips and small rotations. The model shows high accuracy and low loss: about 88% accuracy and a loss of about 0.32.