In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
sns.set_style('darkgrid')
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import warnings
warnings.filterwarnings('ignore')
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

In [3]:
from keras.datasets import cifar10, mnist

In [4]:
# (X_train, y_train), (X_test, y_test) = cifar10.load_data()
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [5]:
# Hold out 10,000 training images for validation; keep y_test from mnist.load_data() intact
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=10000, random_state=42)

In [6]:
X_train.shape, X_val.shape, X_test.shape

((50000, 28, 28), (10000, 28, 28), (10000, 28, 28))

In [7]:
y_train.shape

(50000,)

In [8]:
class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [9]:
total_classes = len(np.unique(y_train))
total_classes

10

In [10]:
tf.keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

In [11]:
28*28

784

Height, width, and channel dimensions are required for a CNN model (they are not needed when the images are flattened for a fully connected model).

In [12]:
X_train.shape[1:]+(1,)

(28, 28, 1)
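
An input shape of (28, 28, 1) implies a trailing channel axis, while the MNIST arrays are shaped (N, 28, 28). One possible way to make the arrays match explicitly is `np.expand_dims`. This is a minimal sketch on a zero-filled stand-in array; the names `images` and `images_with_channel` are hypothetical and do not appear in the notebook.

```python
import numpy as np

# Stand-in for a small batch of grayscale images (hypothetical data, all zeros).
images = np.zeros((4, 28, 28), dtype=np.uint8)

# Append a channel axis so each image has shape (28, 28, 1).
images_with_channel = np.expand_dims(images, axis=-1)

print(images.shape)               # (4, 28, 28)
print(images_with_channel.shape)  # (4, 28, 28, 1)
```

The same call could be applied to the training, validation, and test arrays if the channel axis needs to be present in the data itself.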

In [13]:
from keras.layers import Activation  # public import path instead of the private keras.src module
inputs = keras.layers.Input(shape=X_train.shape[1:]+(1,))
# Hidden layer 1: 10 filters of size 5x5, followed by 2x2 max pooling
x = keras.layers.Conv2D(10, (5,5), activation='relu')(inputs)
x = keras.layers.MaxPooling2D((2,2))(x)
# Hidden layer 2: 20 filters of size 5x5, followed by 2x2 max pooling
x = keras.layers.Conv2D(20, (5,5), activation='relu')(x)
x = keras.layers.MaxPooling2D((2,2))(x)

x = keras.layers.Flatten()(x)
x = keras.layers.Dense(100, activation='relu')(x)

outputs = keras.layers.Dense(total_classes, activation='softmax')(x)

In [14]:
model_mnist = keras.models.Model(inputs=inputs, outputs=outputs)

In [15]:
model_mnist.summary()

The model summary lists each layer with its output shape and number of parameters. Here is how those values are calculated for this model:

Output Shape:

Input Layer: The output shape matches the input shape you provided, which is (None, 28, 28, 1).
The None indicates the batch size, which can vary.

Conv2D Layers: The output shape of a Conv2D layer is determined by the input shape, the filter size, the stride, and the padding. This model uses the Keras defaults: a stride of 1 and 'valid' padding (no padding), so each spatial dimension becomes input_size - filter_size + 1.

For the first Conv2D layer with 10 filters of size (5,5), the output shape is (None, 24, 24, 10).

The spatial dimensions shrink from 28x28 to 24x24 because 28 - 5 + 1 = 24.

The last dimension (10) is the number of filters.

The second Conv2D layer with 20 filters of size (5,5) operates on the output of the first pooling layer (None, 12, 12, 10). The output shape is (None, 8, 8, 20), because 12 - 5 + 1 = 8.

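MaxPooling2D Layers: A MaxPooling2D layer with a pool size of (2,2) and the default stride (equal to the pool size) halves each spatial dimension and leaves the number of channels unchanged.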

The first MaxPooling2D layer with a pool size of (2,2) reduces the 24x24 input to 12x12. The output shape is (None, 12, 12, 10).

The second MaxPooling2D layer with a pool size of (2,2) reduces the 8x8 input to 4x4. The output shape is (None, 4, 4, 20).

Flatten Layer: The Flatten layer reshapes the input into a 1D array. The output shape is (None, 320) because 4 * 4 * 20 = 320.

Dense Layers: The output shape of a Dense layer is (None, units), where units is the number of neurons in the layer.

The first Dense layer has 100 units, so the output shape is (None, 100).
The final Dense layer has total_classes (10) units, so the output shape is (None, 10).
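
The output shapes listed above can be checked with a little arithmetic. The sketch below assumes the Keras defaults used by this model ('valid' padding, convolution stride 1, pooling stride equal to the pool size). The helper names `conv_output_size` and `pool_output_size` are hypothetical and are not part of Keras.

```python
def conv_output_size(size, kernel, stride=1):
    """Output length along one axis for a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1


def pool_output_size(size, pool, stride=None):
    """Output length along one axis for pooling with 'valid' padding."""
    stride = pool if stride is None else stride
    return (size - pool) // stride + 1


conv1 = conv_output_size(28, kernel=5)      # 24
pool1 = pool_output_size(conv1, pool=2)     # 12
conv2 = conv_output_size(pool1, kernel=5)   # 8
pool2 = pool_output_size(conv2, pool=2)     # 4
flat = pool2 * pool2 * 20                   # 4 * 4 * 20 = 320

print(conv1, pool1, conv2, pool2, flat)     # 24 12 8 4 320
```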

Parameters:

Conv2D Layers: The number of parameters in a Conv2D layer is calculated as (filter_height * filter_width * input_channels + 1) * number_of_filters. The '+ 1' is for the bias term.
First Conv2D: (5 * 5 * 1 + 1) * 10 = 26 * 10 = 260
Second Conv2D: (5 * 5 * 10 + 1) * 20 = 251 * 20 = 5020
MaxPooling2D Layers: MaxPooling2D layers have no trainable parameters.
Flatten Layer: The Flatten layer has no trainable parameters.
Dense Layers: The number of parameters in a Dense layer is calculated as (input_units + 1) * output_units. The '+ 1' is for the bias term.
First Dense: (320 + 1) * 100 = 321 * 100 = 32100
Final Dense: (100 + 1) * 10 = 101 * 10 = 1010
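
The parameter formulas above can be reproduced in a few lines of plain Python. This is a sketch, not Keras code: the function names and the dictionary keys are hypothetical labels chosen for illustration, and the layer sizes are taken from the model definition.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units


counts = {
    "first_conv2d": conv2d_params(5, 5, 1, 10),    # 260
    "second_conv2d": conv2d_params(5, 5, 10, 20),  # 5020
    "first_dense": dense_params(320, 100),         # 32100
    "final_dense": dense_params(100, 10),          # 1010
}
total = sum(counts.values())

print(counts)
print(total)  # 38390
```

Summing the four layers gives 38,390 trainable parameters, which can be compared against the total reported by `model_mnist.summary()`.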