Layers of a CNN

**1. Input Layer**

Shape = [batch_size, image_height, image_width, channels]

batch_size - the number of examples in the random sample (mini-batch) drawn from the training set at each step of stochastic gradient descent.
channels - the number of color channels in the input images: 3 for RGB images or 1 for monochrome (grayscale) images.

If the dataset is composed of monochrome 28x28 pixel images, the desired shape for our input layer is:

In [None]:
[batch_size, 28, 28, 1]

To reshape the input features into this shape:

In [None]:
input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

# The batch size is given as -1, which tells tf.reshape to infer it from the total number of elements in the input.
# This lets the same model accept different batch sizes during training and inference without hard-coding a value.
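The `-1` placeholder behaves the same way in NumPy's `reshape`, which makes the shape logic easy to check without TensorFlow. A minimal sketch, assuming the raw features arrive as flattened rows of 784 values; `flat_features` is made-up random data, not the original dataset:

```python
import numpy as np

# Hypothetical input: 32 monochrome 28x28 images, each flattened to 784 values
flat_features = np.random.rand(32, 784).astype(np.float32)

# -1 asks reshape to infer the batch dimension from the total number of elements
input_layer = flat_features.reshape(-1, 28, 28, 1)
print(input_layer.shape)  # (32, 28, 28, 1)

# The same call works unchanged for a different batch size, e.g. a single image
single = flat_features[:1].reshape(-1, 28, 28, 1)
print(single.shape)  # (1, 28, 28, 1)
```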

**2. Convolutional Step**

The main purpose of these convolutional steps is to extract features from the input images and then feed them to a classifier. Because the same filter slides across the whole image, a feature can be detected wherever it appears; stacking convolutional steps lets later layers combine simple features (such as edges) into more complex ones.
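To see why a sliding filter finds a feature "anywhere", here is a minimal NumPy sketch of a single-channel convolution with no padding and stride 1. Like CNN libraries, it computes cross-correlation (the kernel is not flipped). The 3x3 vertical-edge filter is hand-picked for illustration; a trained CNN learns its filter values instead.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the window currently under the filter
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hand-picked vertical-edge detector: responds where brightness increases left to right
edge_filter = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)

# Two 6x6 images with the same edge at different horizontal positions
image_left = np.zeros((6, 6))
image_left[:, 2:] = 1.0
image_right = np.zeros((6, 6))
image_right[:, 4:] = 1.0

print(conv2d_valid(image_left, edge_filter))   # every row is [3, 3, 0, 0]
print(conv2d_valid(image_right, edge_filter))  # every row is [0, 0, 3, 3]
```

The same weights fire in both cases; only the location of the strong response moves with the edge.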

If we wanted to apply 20 filters, each of size 5x5, to the input layer with a ReLU activation function, we could write:

In [None]:
conv_layer_1 = tf.layers.conv2d(
  inputs=input_layer,
  filters=20,
  kernel_size=[5, 5],
  padding="same",
  activation=tf.nn.relu)

inputs - the input layer defined in the first step
filters - the number of filters applied to the input image. Each filter produces one feature map, so more filters let the layer extract more distinct features, at the cost of more parameters and computation.
kernel_size - the height and width of each filter (feature detector)
padding - "same" pads the borders of the input with zeros so that, with a stride of 1, the output has the same height and width as the input (28x28 here); the default, "valid", applies no padding, so the output shrinks.
activation - the function applied to the output of the convolution operation (here, ReLU)
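The effect of the padding choice and the size of this layer can be checked with a little arithmetic. A minimal sketch; `conv_output_size` and `conv_param_count` are hypothetical helpers (not TensorFlow functions) that follow TensorFlow's "same"/"valid" padding rules and count one bias per filter:

```python
import math

def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Spatial output size along one axis, following TensorFlow's padding rules."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")

def conv_param_count(kernel_h, kernel_w, in_channels, filters):
    """Trainable weights: one kernel_h x kernel_w x in_channels kernel per filter, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

print(conv_output_size(28, 5, padding="same"))   # 28: "same" keeps the 28x28 size
print(conv_output_size(28, 5, padding="valid"))  # 24: without padding the map shrinks
print(conv_output_size(28, 2, stride=2))         # 14: the same rule covers 2x2 pooling with stride 2
print(conv_param_count(5, 5, 1, 20))             # 520 parameters in conv_layer_1
```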

**Introducing Non-Linearity**

We said that the output of the convolution step is fed to an activation function, in this case ReLU.
ReLU replaces every negative value in the feature map with zero and leaves positive values unchanged, f(x) = max(0, x). This introduces non-linearity into the output, which is needed because the patterns in real data are usually non-linear.
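In NumPy, ReLU is a single element-wise call. A minimal sketch on a made-up 2x2 feature map:

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): negative values become 0, positive values pass through unchanged
    return np.maximum(0, x)

feature_map = np.array([[-2.0, 1.5],
                        [0.3, -0.7]])
print(relu(feature_map))  # -2.0 and -0.7 become 0; 1.5 and 0.3 are kept
```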

Without activation functions, a CNN (or any neural network) could only learn linear relationships between inputs and outputs, no matter how many layers it has, because a composition of linear functions is itself linear. Real-world data such as images, audio, and text, however, often involves complex, non-linear patterns. Activation functions allow the network to model these relationships.
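A tiny hand-built network makes the "no matter how many layers" point concrete. The weights below are invented for illustration (biases omitted): without an activation, the two layers collapse into the single matrix `W2 @ W1`, which here is zero; with ReLU between them, the same weights compute |x|, which no single matrix can represent.

```python
import numpy as np

# Hand-picked weights for a 1 -> 2 -> 1 network
W1 = np.array([[1.0], [-1.0]])  # hidden layer: computes [x, -x]
W2 = np.array([[1.0, 1.0]])     # output layer: sums the two hidden units

def linear_net(x):
    # No activation: W2 @ (W1 @ x) equals (W2 @ W1) @ x, a single linear map
    return W2 @ (W1 @ x)

def relu_net(x):
    # ReLU between layers: relu(x) + relu(-x) = |x|, a non-linear function
    return W2 @ np.maximum(0, W1 @ x)

for value in (-3.0, 2.0):
    x = np.array([value])
    print(value, linear_net(x), relu_net(x))  # linear_net gives 0, relu_net gives |value|
```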

**3. The Pooling Step**

This step mainly reduces dimensionality: it shrinks each feature map (the output of the convolutional step) while keeping the most important information. Max pooling, used here, keeps only the largest value in each window.
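A minimal NumPy sketch of 2x2 max pooling with stride 2, matching `pool_size=[2, 2]` and `strides=2` below. It assumes a single example laid out as (height, width, channels) with even height and width; the 4x4 input values are made up.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width, channels) array with even height/width."""
    h, w, c = feature_map.shape
    # Split each spatial axis into (blocks, 2), then take the max inside every 2x2 block
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]], dtype=float)[:, :, np.newaxis]  # one channel

print(max_pool_2x2(fmap)[:, :, 0])                  # [[4, 2], [2, 8]]: the max of each 2x2 block
print(max_pool_2x2(np.zeros((28, 28, 20))).shape)  # (14, 14, 20)
```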

In [None]:
# We can connect the output of the first convolutional layer to the pooling layer by using the following code:
pooling_layer_1 = tf.layers.max_pooling2d(
  inputs=conv_layer_1,
  pool_size=[2, 2],
  strides=2)
# Max pooling has no trainable weights of its own, but shrinking the spatial dimensions reduces
# computation and the number of parameters in the layers that follow (such as the fully connected layer).

# The pooling layer receives the output of the convolutional step, with shape
# [batch_size, image_height, image_width, channels]
# which, for 28x28 inputs and 20 filters with "same" padding, is
# [batch_size, 28, 28, 20]

The output of the pooling operation has the following shape:

In [None]:
[batch_size, 14, 14, 20]
# With pool_size=2, strides=2 and the default "valid" padding, each spatial dimension becomes (n - 2) // 2 + 1,
# so 28 becomes 14; the number of channels (20) is unchanged.

In this example, pooling halved the height and width of the convolution output (28x28 to 14x14), so each feature map now holds 75% fewer values. This is useful because it keeps the strongest response in each region while reducing the model's computation and the number of parameters in later layers, which can help reduce overfitting.

**4. Fully Connected Layer**

After stacking several convolution and pooling steps, we follow them with a fully connected layer. We feed it the high-level features extracted from the input image, and it uses them to do the actual classification.

For example, in the digit classification task, we can follow the convolution and pooling steps with a fully connected layer that has 1024 neurons and ReLU activation, which learns combinations of the extracted features; in a complete model it is usually followed by an output layer with one unit per class (10 for digits) that produces the final prediction. This fully connected layer accepts input in the following format:

In [None]:
[batch_size, features]

So we need to flatten (reshape) the feature map from pooling_layer_1 to match this format, using the following line of code:

In [None]:
pool1_flat = tf.reshape(pooling_layer_1, [-1, 14 * 14 * 20])
# Flattening turns each example's 14x14x20 stack of feature maps into a single vector that can be fed into the fully connected layer.

In this reshape call, -1 indicates that the batch size is determined dynamically, and each example in the pooling_layer_1 output has a height of 14, a width of 14, and 20 channels, giving 14 * 14 * 20 = 3920 values per example.

So the final output of the reshape operation will be:

In [None]:
[batch_size, 3920]
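The same flattening can be checked in NumPy. A minimal sketch, assuming a stand-in array `pooled` (made-up values, 8 examples) in place of the real `pooling_layer_1` output; it also shows that flattening only changes the shape, keeping each example's values in row-major order:

```python
import numpy as np

# Stand-in for the pooling output: 8 examples, each 14x14 with 20 channels
pooled = np.arange(8 * 14 * 14 * 20, dtype=np.float32).reshape(8, 14, 14, 20)

# Same layout change as tf.reshape(pooling_layer_1, [-1, 14 * 14 * 20])
flat = pooled.reshape(-1, 14 * 14 * 20)
print(flat.shape)  # (8, 3920)

# Each row is one example's feature maps, unrolled in row-major order
print(np.array_equal(flat[1], pooled[1].ravel()))  # True
```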

Finally, we use TensorFlow's tf.layers.dense() function to define our fully connected layer with the required number of neurons (units) and its activation function:

In [None]:
dense_layer_1 = tf.layers.dense(
  inputs=pool1_flat,
  units=1024,
  activation=tf.nn.relu)
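A dense layer computes activation(inputs @ kernel + bias). The NumPy sketch below mirrors that computation with randomly initialized weights (an assumption for illustration; a real model learns them) and counts the layer's parameters, which shows why shrinking the feature maps before this layer matters:

```python
import numpy as np

def dense_relu(x, weights, bias):
    # Fully connected layer: every input feature connects to every unit, followed by ReLU
    return np.maximum(0, x @ weights + bias)

rng = np.random.default_rng(0)
batch_size, in_features, units = 8, 14 * 14 * 20, 1024

x = rng.normal(size=(batch_size, in_features))               # stand-in for pool1_flat
weights = rng.normal(scale=0.01, size=(in_features, units))  # random init, illustration only
bias = np.zeros(units)

out = dense_relu(x, weights, bias)
print(out.shape)                         # (8, 1024)
print(weights.size + bias.size)          # 4015104 parameters with pooling (3920 inputs)
print((28 * 28 * 20) * units + units)    # 16057344 parameters if the 28x28 maps were not pooled
```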