<a href="https://colab.research.google.com/github/Redcoder815/Deep_Learning_TensorFlow/blob/main/15CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import tensorflow as tf

The Cross-Correlation Operation

In [2]:
def corr2d(X, K):
    """Compute 2D cross-correlation."""
    h, w = K.shape
    Y = tf.Variable(tf.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1)))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j].assign(tf.reduce_sum(
                X[i: i + h, j: j + w] * K))
    return Y

In [3]:
X = tf.constant([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
K = tf.constant([[0.0, 1.0], [2.0, 3.0]])
corr2d(X, K)

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[19., 25.],
       [37., 43.]], dtype=float32)>
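
Each entry of this output can be checked by hand: the kernel is laid over one 2x2 window of X and the elementwise products are summed. The following is a minimal pure-Python sketch of that arithmetic, using nested lists in place of tensors; the helper name window_sum is made up for illustration and is not part of the notebook.

```python
# Same values as X and K above, written as plain nested lists
X = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
K = [[0.0, 1.0], [2.0, 3.0]]

def window_sum(X, K, i, j):
    """Sum of elementwise products of K with the window of X at (i, j)."""
    return sum(X[i + a][j + b] * K[a][b]
               for a in range(len(K)) for b in range(len(K[0])))

# Visit every position where the kernel fits entirely inside X
Y = [[window_sum(X, K, i, j) for j in range(len(X[0]) - len(K[0]) + 1)]
     for i in range(len(X) - len(K) + 1)]
print(Y)
```

For the top-left entry this is 0*0 + 1*1 + 3*2 + 4*3 = 19.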

Dynamic Weight Creation: In Keras, the build method of a tf.keras.layers.Layer subclass is a special method where you typically define and create the layer's weights (like self.weight and self.bias here). It's called automatically the first time the layer is used (i.e., when its call method is invoked) and receives the input shape(s) as an argument.

Shape Inference: This allows the layer to infer the shapes of its weights from the actual input shape, rather than requiring you to specify them upfront in __init__. Note that Keras always passes the input shape to build, whatever the parameter is named. A kernel size that does not depend on the input is configuration, so it is normally supplied through __init__ and only used inside build.

Separation of Concerns: It helps separate the initialization logic (__init__) from the weight creation logic (build). __init__ is for setting up static configuration that doesn't depend on input shapes, while build is for creating variables that do depend on input shapes.

Defining in __init__:

If you define self.weight and self.bias directly in __init__, you can only use information that is available at construction time. The input shape is not known when __init__ is called, so a weight whose shape depends on the input cannot be created there. The __init__ method is for static configuration that doesn't depend on the input data's shape.

For example, in the Conv2D class, self.weight's shape is the kernel size. That value does not depend on the input, so it can be fixed at construction time. A weight whose shape depends on the input (for instance, on the number of input channels, as in the built-in tf.keras.layers.Conv2D) has to wait until build.

Defining in build (as in the current code):

Shape Inference: The build method is called once, the first time the layer is executed (when call is invoked), and it receives the input shape as an argument. This means you have the information needed to define the shapes of your weights and biases.

Dynamic Sizing: It allows the layer to be flexible. You can define a Conv2D layer without knowing the exact dimensions of the input tensor it will receive. The layer will adapt and build its weights correctly once it sees an input.

Best Practice: For Keras Layer subclasses, defining trainable variables (tf.Variable) or using self.add_weight() within the build method is the recommended practice for weights that depend on the input shape.

In summary, __init__ is for parameters that are independent of the input shape, while build is for creating parameters (like weights and biases) whose shapes depend on the input data that will flow through the layer.
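
The lifecycle described above can be sketched with a small layer whose weight shape really does depend on the input. ScaleByFeature is a hypothetical layer written only for this illustration (it is not part of the notebook); it keeps one learnable scale per input feature, so the weight cannot be created until the input shape is known.

```python
import tensorflow as tf

class ScaleByFeature(tf.keras.layers.Layer):
    """Hypothetical layer: one learnable scale per input feature."""
    def __init__(self):
        super().__init__()

    def build(self, input_shape):
        # Keras supplies input_shape on the first call to the layer
        self.scale = self.add_weight(name='scale',
                                     shape=(input_shape[-1],),
                                     initializer='ones')

    def call(self, inputs):
        return inputs * self.scale

layer = ScaleByFeature()
built_before = layer.built          # no weights exist yet
y = layer(tf.ones((2, 5)))          # first call triggers build((2, 5))
print(built_before, layer.built, layer.scale.shape)
```

The layer is constructed without any shape information, and the weight of length 5 appears only after the first input has been seen.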

Convolutional Layers

In [4]:
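class Conv2D(tf.keras.layers.Layer):
    def __init__(self, kernel_size):
        super().__init__()
        # Static configuration: the kernel shape does not depend on the input
        self.kernel_size = kernel_size

    def build(self, input_shape):
        # Keras calls build with the input shape on first use; the weights
        # are created here, sized by the configured kernel_size
        initializer = tf.random_normal_initializer()
        self.weight = self.add_weight(name='w', shape=self.kernel_size,
                                      initializer=initializer)
        self.bias = self.add_weight(name='b', shape=(1, ),
                                    initializer=initializer)

    def call(self, inputs):
        return corr2d(inputs, self.weight) + self.bias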

Object Edge Detection in Images

In [5]:
X = tf.Variable(tf.ones((6, 8)))
X[:, 2:6].assign(tf.zeros(X[:, 2:6].shape))
X

<tf.Variable 'Variable:0' shape=(6, 8) dtype=float32, numpy=
array([[1., 1., 0., 0., 0., 0., 1., 1.],
       [1., 1., 0., 0., 0., 0., 1., 1.],
       [1., 1., 0., 0., 0., 0., 1., 1.],
       [1., 1., 0., 0., 0., 0., 1., 1.],
       [1., 1., 0., 0., 0., 0., 1., 1.],
       [1., 1., 0., 0., 0., 0., 1., 1.]], dtype=float32)>

In [6]:
K = tf.constant([[1.0, -1.0]])

In [7]:
Y = corr2d(X, K)
Y

<tf.Variable 'Variable:0' shape=(6, 7) dtype=float32, numpy=
array([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],
       [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
       [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
       [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
       [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
       [ 0.,  1.,  0.,  0.,  0., -1.,  0.]], dtype=float32)>

In [8]:
corr2d(tf.transpose(X), K)

<tf.Variable 'Variable:0' shape=(8, 5) dtype=float32, numpy=
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]], dtype=float32)>

Learning a Kernel

In [13]:
# Construct a two-dimensional convolutional layer with 1 output channel and a
# kernel of shape (1, 2). For the sake of simplicity, we ignore the bias here
conv2d = tf.keras.layers.Conv2D(1, (1, 2), use_bias=False)

# The two-dimensional convolutional layer uses four-dimensional input and
# output in the format of (example, height, width, channel), where the batch
# size (number of examples in the batch) and the number of channels are both 1
X = tf.reshape(X, (1, 6, 8, 1))
Y = tf.reshape(Y, (1, 6, 7, 1))
lr = 3e-2  # Learning rate

Y_hat = conv2d(X)
for i in range(10):
    # The tape automatically records operations on the layer's trainable kernel
    with tf.GradientTape() as g:
        Y_hat = conv2d(X)
        l = (abs(Y_hat - Y)) ** 2
    # Compute the gradient outside the tape context and update the kernel
    gradient = g.gradient(l, conv2d.kernel)
    conv2d.kernel.assign_sub(lr * gradient)
    if (i + 1) % 2 == 0:
        print(f'epoch {i + 1}, loss {tf.reduce_sum(l):.3f}')

epoch 2, loss 2.259
epoch 4, loss 0.767
epoch 6, loss 0.287
epoch 8, loss 0.113
epoch 10, loss 0.046


In [14]:
tf.reshape(conv2d.get_weights()[0], (1, 2))

<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[ 0.9738485, -1.0177288]], dtype=float32)>

Padding

Let's break down the line tf.reshape(X, (1, ) + X.shape + (1, )) step by step. This operation is crucial for preparing a 2D tensor (X) to be processed by a Keras Conv2D layer, which typically expects a 4D input.

X: This is your input tensor, which in the example of comp_conv2d, starts as a 2D tensor (e.g., (8, 8)).

X.shape: This attribute returns the current shape of the tensor X. In TensorFlow it is a TensorShape, which can be concatenated with tuples here just like a tuple. So, if X is an (8, 8) tensor, X.shape will be (8, 8).

(1, ): This creates a tuple containing a single integer 1. The comma after the 1 is important; it tells Python that (1, ) is a tuple with one element, not just an integer 1 in parentheses.

+ (Tuple Concatenation): The + operator, when used with tuples, concatenates them. Let's see how the new shape tuple is formed:

First, (1, ) + X.shape combines the initial (1, ) with X.shape. If X.shape is (8, 8), this becomes (1, ) + (8, 8) = (1, 8, 8).
Next, this intermediate tuple (1, 8, 8) is concatenated with the final (1, ): (1, 8, 8) + (1, ) = (1, 8, 8, 1).

In summary, tf.reshape(X, (1, ) + X.shape + (1, )) takes the original 2D tensor X (height x width) and reshapes it into a 4D tensor with the following dimensions:

The first 1 adds a batch dimension. Convolutional layers process data in batches, even if you feed only one image at a time.
X.shape (e.g., 8, 8) provides the height and width dimensions.
The final 1 adds a channel dimension. For grayscale images or single-feature maps, the number of channels is 1.

So, an (8, 8) tensor X becomes a (1, 8, 8, 1) tensor, which is the (batch_size, height, width, channels) format that tf.keras.layers.Conv2D expects by default.
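
The tuple arithmetic can be tried on its own, without any tensors. This is a small pure-Python sketch; shape_2d is an assumed stand-in for X.shape of an 8x8 input.

```python
shape_2d = (8, 8)  # stand-in for X.shape of an 8x8 input

# The trailing comma is what makes a one-element tuple
one_tuple = (1, )
just_an_int = (1)
print(type(one_tuple).__name__, type(just_an_int).__name__)

# Tuple concatenation builds the 4D target shape step by step
intermediate = one_tuple + shape_2d   # batch dimension in front
shape_4d = intermediate + one_tuple   # channel dimension at the end
print(intermediate, shape_4d)
```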

In the comp_conv2d function, Y is the output of the conv2d layer. Keras Conv2D layers output a 4D tensor with the shape (batch_size, height, width, channels).

Let's assume Y has a shape like (1, 8, 8, 1) (one example in the batch, 8x8 spatial dimensions, one channel).

Y.shape: This will be the tuple (1, 8, 8, 1).

Y.shape[1:3]: This is a Python slice operation on the Y.shape tuple. It selects elements from index 1 (inclusive) up to index 3 (exclusive).

Index 0 is 1 (batch size)
Index 1 is 8 (height)
Index 2 is 8 (width)
Index 3 is 1 (channels)

So, Y.shape[1:3] extracts (8, 8).

tf.reshape(Y, (8, 8)): This takes the 4D tensor Y and reshapes it into a 2D tensor with the dimensions (height, width).

Why is this done?

The comp_conv2d helper wraps the process of applying a 2D convolution and returning a 2D result. The input X was originally 2D, and a 2D feature map is easier to display or to pass to later 2D operations. The tf.reshape(Y, Y.shape[1:3]) step therefore strips the batch dimension (index 0) and the channel dimension (index 3), leaving only the height and width of the feature map. This works only because both stripped dimensions have size 1.
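
The round trip from 2D to 4D and back can be sketched without a convolution in between. NumPy arrays are used here as an assumed stand-in for TensorFlow tensors, since their reshape and shape slicing behave the same way for this purpose.

```python
import numpy as np

x = np.arange(64, dtype=np.float32).reshape(8, 8)

# Add batch and channel dimensions of size 1
x4 = x.reshape((1, ) + x.shape + (1, ))

# Slice the 4D shape to recover (height, width), then reshape back
y = x4.reshape(x4.shape[1:3])

print(x4.shape, y.shape)
```

Because the two added dimensions have size 1, no values are lost or reordered and y holds the same entries as x.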

In [15]:
# We define a helper function to calculate convolutions. It initializes
# the convolutional layer weights and performs corresponding dimensionality
# elevations and reductions on the input and output
def comp_conv2d(conv2d, X):
    # Add a batch dimension in front and a channel dimension at the end,
    # both of size 1
    X = tf.reshape(X, (1, ) + X.shape + (1, ))
    Y = conv2d(X)
    # Strip the batch (first) and channel (last) dimensions
    return tf.reshape(Y, Y.shape[1:3])

# 1 row and column is padded on either side, so a total of 2 rows or columns
# are added
conv2d = tf.keras.layers.Conv2D(1, kernel_size=3, padding='same')
X = tf.random.uniform(shape=(8, 8))
comp_conv2d(conv2d, X).shape

TensorShape([8, 8])

When you set padding='same' in a tf.keras.layers.Conv2D layer, it means that the convolutional layer pads the input so that the output feature map has the same height and width as the input feature map, assuming a stride of 1.

Here's a breakdown:

Goal: Maintain the spatial dimensions (height and width) of the input. If your input is (H, W) and your kernel is (k_h, k_w), and you use padding='same' with a stride of 1, the output will also be (H, W).

How it works: Keras (and TensorFlow) automatically calculates the padding needed on each side of the input to achieve this output size and adds zeros around the borders of the input. If the total padding along a dimension is odd, TensorFlow places the extra row or column at the bottom or right.

In the notebook's example: You can see this in action in cells In [15] and In [16]:

In cell In [15], the input X is (8, 8). When conv2d = tf.keras.layers.Conv2D(1, kernel_size=3, padding='same') is applied, comp_conv2d(conv2d, X).shape returns TensorShape([8, 8]). The output size matches the input size.
Similarly, in cell In [16], with kernel_size=(5, 3) and padding='same', the (8, 8) input also results in a TensorShape([8, 8]) output.

This is useful because it simplifies network design: you can stack multiple convolutional layers without shrinking the feature map's spatial dimensions at every layer, which makes it easier to maintain resolution and context through the network.
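
The padding amounts can be worked out by hand along one dimension. The following is a pure-Python sketch of the rule as TensorFlow documents it for 'same' padding; the function name same_padding is made up for this illustration, and the code is not taken from TensorFlow itself.

```python
import math

def same_padding(size, kernel, stride=1):
    """Return (output_size, pad_before, pad_after) along one dimension."""
    out = math.ceil(size / stride)
    total = max((out - 1) * stride + kernel - size, 0)
    before = total // 2
    # Any odd leftover goes after (bottom or right)
    return out, before, total - before

print(same_padding(8, 3))            # kernel size 3
print(same_padding(8, 5))            # kernel height 5
print(same_padding(8, 2))            # even kernel: uneven padding
print(same_padding(8, 3, stride=2))  # stride 2 halves the output size
```

For an input of size 8, a kernel of size 3 needs one padded row or column on each side and a kernel of size 5 needs two, while the output size stays 8 at stride 1.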

In [16]:
# We use a convolution kernel with height 5 and width 3. The padding on
# either side of the height and width are 2 and 1, respectively
conv2d = tf.keras.layers.Conv2D(1, kernel_size=(5, 3), padding='same')
comp_conv2d(conv2d, X).shape

TensorShape([8, 8])

Stride

In [17]:
conv2d = tf.keras.layers.Conv2D(1, kernel_size=3, padding='same', strides=2)
comp_conv2d(conv2d, X).shape

TensorShape([4, 4])

When you set padding='valid' in a tf.keras.layers.Conv2D layer, it means that no padding is applied to the input. The convolution operation is performed only on the 'valid' parts of the input where the kernel fully overlaps with the input data. This almost always results in an output feature map that is smaller than the input feature map.

Here's a breakdown:

Goal: Perform convolution without adding extra zeros around the borders of the input.

How it works: The convolution kernel slides across the input. If the kernel cannot fully fit within the input dimensions at a given position, that position is skipped. This means that the output feature map will be smaller than the input. The reduction in size depends on the kernel_size and strides.

Formula for output size (for padding='valid'):

Output Height = floor((Input Height - Kernel Height) / Stride Height) + 1
Output Width = floor((Input Width - Kernel Width) / Stride Width) + 1

In the notebook's example in cell In [18]:

Input X has shape (8, 8).
conv2d = tf.keras.layers.Conv2D(1, kernel_size=(3, 5), padding='valid', strides=(3, 4))
kernel_size=(3, 5) means kernel height is 3, kernel width is 5.
strides=(3, 4) means stride height is 3, stride width is 4.

Let's calculate the output shape:

Height: floor((8 - 3) / 3) + 1 = floor(1.66...) + 1 = 1 + 1 = 2.
Width: floor((8 - 5) / 4) + 1 = floor(0.75) + 1 = 0 + 1 = 1.

Indeed, comp_conv2d(conv2d, X).shape returns TensorShape([2, 1]), matching these calculations. This shows a significant reduction in spatial dimensions compared to padding='same'.
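
The same calculation can be written as a small helper. This is a pure-Python sketch; the name valid_output_size is made up for illustration, and integer division plays the role of flooring.

```python
def valid_output_size(size, kernel, stride=1):
    """Output length along one dimension with padding='valid'."""
    return (size - kernel) // stride + 1

# Values from the example: 8x8 input, kernel (3, 5), strides (3, 4)
height = valid_output_size(8, kernel=3, stride=3)
width = valid_output_size(8, kernel=5, stride=4)
print(height, width)
```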

In [18]:
conv2d = tf.keras.layers.Conv2D(1, kernel_size=(3,5), padding='valid',
                                strides=(3, 4))
comp_conv2d(conv2d, X).shape

TensorShape([2, 1])

Multiple Input Channels

In [19]:
def corr2d_multi_in(X, K):
    # Iterate through the 0th dimension (channel) of K first, then add them up
    return tf.reduce_sum([corr2d(x, k) for x, k in zip(X, K)], axis=0)

In [20]:
X = tf.constant([[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
               [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]])
K = tf.constant([[[0.0, 1.0], [2.0, 3.0]], [[1.0, 2.0], [3.0, 4.0]]])

corr2d_multi_in(X, K)

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[ 56.,  72.],
       [104., 120.]], dtype=float32)>

Multiple Output Channels

In [21]:
def corr2d_multi_in_out(X, K):
    # Iterate through the 0th dimension of K, and each time, perform
    # cross-correlation operations with input X. All of the results are
    # stacked together
    return tf.stack([corr2d_multi_in(X, k) for k in K], 0)

In [22]:
K = tf.stack((K, K + 1, K + 2), 0)
K.shape

TensorShape([3, 2, 2, 2])

In [23]:
corr2d_multi_in_out(X, K)

<tf.Tensor: shape=(3, 2, 2), dtype=float32, numpy=
array([[[ 56.,  72.],
        [104., 120.]],

       [[ 76., 100.],
        [148., 172.]],

       [[ 96., 128.],
        [192., 224.]]], dtype=float32)>

1x1 Convolutional Layer

In [24]:
def corr2d_multi_in_out_1x1(X, K):
    c_i, h, w = X.shape
    c_o = K.shape[0]
    X = tf.reshape(X, (c_i, h * w))
    K = tf.reshape(K, (c_o, c_i))
    # Matrix multiplication in the fully connected layer
    Y = tf.matmul(K, X)
    return tf.reshape(Y, (c_o, h, w))

In [25]:
X = tf.random.normal((3, 3, 3), 0, 1)
K = tf.random.normal((2, 3, 1, 1), 0, 1)
Y1 = corr2d_multi_in_out_1x1(X, K)
Y2 = corr2d_multi_in_out(X, K)
assert float(tf.reduce_sum(tf.abs(Y1 - Y2))) < 1e-6

Maximum Pooling and Average Pooling

In [26]:
def pool2d(X, pool_size, mode='max'):
    p_h, p_w = pool_size
    Y = tf.Variable(tf.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w +1)))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            if mode == 'max':
                Y[i, j].assign(tf.reduce_max(X[i: i + p_h, j: j + p_w]))
            elif mode =='avg':
                Y[i, j].assign(tf.reduce_mean(X[i: i + p_h, j: j + p_w]))
    return Y

In [27]:
X = tf.constant([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
pool2d(X, (2, 2))

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[4., 5.],
       [7., 8.]], dtype=float32)>

In [28]:
pool2d(X, (2, 2), 'avg')

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[2., 3.],
       [5., 6.]], dtype=float32)>

Padding and Stride

In [29]:
X = tf.reshape(tf.range(16, dtype=tf.float32), (1, 4, 4, 1))
X

<tf.Tensor: shape=(1, 4, 4, 1), dtype=float32, numpy=
array([[[[ 0.],
         [ 1.],
         [ 2.],
         [ 3.]],

        [[ 4.],
         [ 5.],
         [ 6.],
         [ 7.]],

        [[ 8.],
         [ 9.],
         [10.],
         [11.]],

        [[12.],
         [13.],
         [14.],
         [15.]]]], dtype=float32)>

In [30]:
pool2d = tf.keras.layers.MaxPool2D(pool_size=[3, 3])
# Pooling has no model parameters, hence it needs no initialization
pool2d(X)

<tf.Tensor: shape=(1, 1, 1, 1), dtype=float32, numpy=array([[[[10.]]]], dtype=float32)>

paddings = tf.constant([[0, 0], [1,0], [1,0], [0,0]]):

This line defines the padding to be applied to the X tensor. The paddings tensor has a shape of (num_dimensions, 2), where each inner list [before, after] specifies the number of padding units to add before and after that dimension.
[0, 0] for the batch dimension (no padding).
[1, 0] for the height dimension (adds 1 row of zeros at the top).
[1, 0] for the width dimension (adds 1 column of zeros on the left).
[0, 0] for the channel dimension (no padding).

X_padded = tf.pad(X, paddings, "CONSTANT"):

This applies the defined paddings to the input tensor X (a 4x4 tensor reshaped to (1, 4, 4, 1) in the previous cell). The "CONSTANT" mode fills the padded areas with zeros.
After this, X_padded will have a shape of (1, 5, 5, 1) (1 batch, 5 height, 5 width, 1 channel).

pool2d = tf.keras.layers.MaxPool2D(pool_size=[3, 3], padding='valid', strides=2):

This initializes a MaxPool2D layer.
pool_size=[3, 3]: The pooling window is 3x3 pixels.
padding='valid': This is crucial. It means no additional padding will be applied by the pooling layer itself. The pooling operation will only be performed where the 3x3 window completely fits within the (already padded) X_padded tensor.
strides=2: The pooling window moves 2 steps at a time across both height and width dimensions.

pool2d(X_padded):

This applies the configured max pooling layer to the X_padded tensor.
Given X_padded is (1, 5, 5, 1), a 3x3 pool with strides=2 and padding='valid' results in an output shape of (1, 2, 2, 1), since floor((5 - 3) / 2) + 1 = 2 along each spatial dimension. The output values are the maximums found in each 3x3 window as it slides over the padded input with a stride of 2.
The output shows [[[[ 5.], [ 7.]], [[13.], [15.]]]], which corresponds to the maximum values extracted from each 3x3 pooling region within the padded input.

### Detailed Explanation of Max Pooling Output (5, 7, 13, 15)

Let's trace the `MaxPool2D` operation step-by-step to understand how the output values `5, 7, 13, 15` are derived.

First, recall the initial `X` tensor (before reshaping to `(1, 4, 4, 1)`):
```
[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [12. 13. 14. 15.]]
```

When `X` is reshaped to `(1, 4, 4, 1)` and then padded with `paddings = tf.constant([[0, 0], [1,0], [1,0], [0,0]])` (adding one row of zeros at the top and one column of zeros on the left), the `X_padded` tensor (simplified to 2D for clarity, ignoring batch and channel dimensions) looks like this:

`X_padded` (5x5 matrix):
```
[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  1.  2.  3.]
 [ 0.  4.  5.  6.  7.]
 [ 0.  8.  9. 10. 11.]
 [ 0. 12. 13. 14. 15.]]
```

Now, we apply `MaxPool2D(pool_size=[3, 3], padding='valid', strides=2)`:

1.  **First Output Value: 5**
    *   The 3x3 pooling window starts at `(0, 0)` of `X_padded`.
    *   It covers this region:
        ```
        [[ 0.  0.  0.]
         [ 0.  0.  1.]
         [ 0.  4.  5.]]
        ```
    *   The maximum value in this window is `5`.

2.  **Second Output Value: 7**
    *   The window moves 2 steps to the right (due to `strides=2`), starting at `(0, 2)` of `X_padded`.
    *   It covers this region:
        ```
        [[ 0.  0.  0.]
         [ 1.  2.  3.]
         [ 5.  6.  7.]]
        ```
    *   The maximum value here is `7`.

3.  **Third Output Value: 13**
    *   The window returns to the left edge and moves 2 rows down (due to `strides=2`), starting at `(2, 0)` of `X_padded`.
    *   It covers this region:
        ```
        [[ 0.  4.  5.]
         [ 0.  8.  9.]
         [ 0. 12. 13.]]
        ```
    *   The maximum value is `13`.

4.  **Fourth Output Value: 15**
    *   The window moves 2 steps to the right from the previous position, starting at `(2, 2)` of `X_padded`.
    *   It covers this region:
        ```
        [[ 5.  6.  7.]
         [ 9. 10. 11.]
         [13. 14. 15.]]
        ```
    *   The maximum value is `15`.

These four maximum values form the `(1, 2, 2, 1)` output tensor.
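
The trace above can be reproduced with a short loop. This is a NumPy sketch that stands in for the Keras layer, working on the 2D view only (batch and channel dimensions dropped); it illustrates the windowing and is not how Keras implements pooling.

```python
import numpy as np

x = np.arange(16, dtype=np.float32).reshape(4, 4)
# One row of zeros on top and one column of zeros on the left
x_padded = np.pad(x, ((1, 0), (1, 0)))

pool, stride = 3, 2
out_h = (x_padded.shape[0] - pool) // stride + 1
out_w = (x_padded.shape[1] - pool) // stride + 1
out = np.empty((out_h, out_w), dtype=np.float32)
for i in range(out_h):
    for j in range(out_w):
        r, c = i * stride, j * stride  # top-left corner of the window
        out[i, j] = x_padded[r:r + pool, c:c + pool].max()
print(out)
```

The window corners visited are `(0, 0)`, `(0, 2)`, `(2, 0)` and `(2, 2)`, matching the four steps listed above.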

In [32]:
X = tf.reshape(tf.range(16, dtype=tf.float32), (1, 4, 4, 1))
paddings = tf.constant([[0, 0], [1,0], [1,0], [0,0]])
X_padded = tf.pad(X, paddings, "CONSTANT")
pool2d = tf.keras.layers.MaxPool2D(pool_size=[3, 3], padding='valid',
                                   strides=2)
pool2d(X_padded)

<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[ 5.],
         [ 7.]],

        [[13.],
         [15.]]]], dtype=float32)>

In [33]:
paddings = tf.constant([[0, 0], [0, 0], [1, 1], [0, 0]])
X_padded = tf.pad(X, paddings, "CONSTANT")

pool2d = tf.keras.layers.MaxPool2D(pool_size=[2, 3], padding='valid',
                                   strides=(2, 3))
pool2d(X_padded)

<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[ 5.],
         [ 7.]],

        [[13.],
         [15.]]]], dtype=float32)>

Multiple Channels

In [34]:
# Concatenate along `dim=3` due to channels-last syntax
X = tf.concat([X, X + 1], 3)
X

<tf.Tensor: shape=(1, 4, 4, 2), dtype=float32, numpy=
array([[[[ 0.,  1.],
         [ 1.,  2.],
         [ 2.,  3.],
         [ 3.,  4.]],

        [[ 4.,  5.],
         [ 5.,  6.],
         [ 6.,  7.],
         [ 7.,  8.]],

        [[ 8.,  9.],
         [ 9., 10.],
         [10., 11.],
         [11., 12.]],

        [[12., 13.],
         [13., 14.],
         [14., 15.],
         [15., 16.]]]], dtype=float32)>

In [35]:
paddings = tf.constant([[0, 0], [1,0], [1,0], [0,0]])
X_padded = tf.pad(X, paddings, "CONSTANT")
pool2d = tf.keras.layers.MaxPool2D(pool_size=[3, 3], padding='valid',
                                   strides=2)
pool2d(X_padded)

<tf.Tensor: shape=(1, 2, 2, 2), dtype=float32, numpy=
array([[[[ 5.,  6.],
         [ 7.,  8.]],

        [[13., 14.],
         [15., 16.]]]], dtype=float32)>

The layer_summary method is a utility for inspecting the output shape of each layer in the model. It is particularly useful for:

Debugging Network Architecture: It helps to verify that the dimensions of your tensors change as expected after each convolutional, pooling, or dense layer. Incorrect dimensions can lead to errors later in the model.
Understanding Tensor Flow: By printing the output shape after each layer, you can see how the spatial dimensions (height and width) are reduced by pooling layers or by convolutional layers with strides, and how the number of channels changes.
Determining Flattening Size: Before a Flatten() layer, layer_summary shows the exact 4D shape (batch_size, height, width, channels). This tells you the size each example is flattened into (height x width x channels), which is the input size of the first Dense layer.

In [66]:
class LeNet(tf.keras.Model):
    """The LeNet-5 model."""
    def __init__(self, lr=0.1, num_classes=10):
        super().__init__()
        self.net = tf.keras.models.Sequential([
            tf.keras.layers.Conv2D(filters=6, kernel_size=5,
                                   activation='relu', padding='same'),
            tf.keras.layers.AvgPool2D(pool_size=2, strides=2),
            tf.keras.layers.Conv2D(filters=16, kernel_size=5,
                                   activation='relu'),
            tf.keras.layers.AvgPool2D(pool_size=2, strides=2),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(120, activation='relu'),
            tf.keras.layers.Dense(84, activation='relu'),
            tf.keras.layers.Dense(num_classes)])

    def call(self, X):
        return self.net(X)

    def layer_summary(self, X_shape):
        X = tf.random.normal(X_shape)
        for layer in self.net.layers:
            X = layer(X)
            print(layer.__class__.__name__, 'output shape:\t', X.shape)


In [67]:
model = LeNet()
model.layer_summary((1, 28, 28, 1))

Conv2D output shape:	 (1, 28, 28, 6)
AveragePooling2D output shape:	 (1, 14, 14, 6)
Conv2D output shape:	 (1, 10, 10, 16)
AveragePooling2D output shape:	 (1, 5, 5, 16)
Flatten output shape:	 (1, 400)
Dense output shape:	 (1, 120)
Dense output shape:	 (1, 84)
Dense output shape:	 (1, 10)
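
The spatial sizes printed above can be reproduced by hand from the layer settings. This is a pure-Python sketch; the helper name spatial_out is made up for illustration, and it covers only the two padding modes used in this model.

```python
def spatial_out(size, kernel, stride=1, padding='valid'):
    """Output length along one spatial dimension of a conv or pooling layer."""
    if padding == 'same':
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1

sizes = [28]
sizes.append(spatial_out(sizes[-1], kernel=5, padding='same'))  # Conv2D, 6 filters
sizes.append(spatial_out(sizes[-1], kernel=2, stride=2))        # AvgPool2D
sizes.append(spatial_out(sizes[-1], kernel=5))                  # Conv2D, 16 filters
sizes.append(spatial_out(sizes[-1], kernel=2, stride=2))        # AvgPool2D

# Flatten: height x width x channels of the last pooling output
flat = sizes[-1] * sizes[-1] * 16
print(sizes, flat)
```

The final 5x5 feature map with 16 channels gives the 400 features reported for the Flatten layer.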


In [68]:
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Preprocess data
# Reshape images to (num_samples, height, width, channels) and normalize to [0, 1]
# Fashion MNIST images are 28x28 and grayscale, so channels = 1
X_train = tf.cast(tf.reshape(X_train, (-1, 28, 28, 1)), tf.float32) / 255.0
X_test = tf.cast(tf.reshape(X_test, (-1, 28, 28, 1)), tf.float32) / 255.0

# Cast labels to int64 as expected by some TF operations
y_train = tf.cast(y_train, tf.int64)
y_test = tf.cast(y_test, tf.int64)

# Hyperparameters
batch_size = 256
lr = 0.1
num_epochs = 10

# Create tf.data.Dataset objects for efficient data pipeline
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train)).shuffle(buffer_size=1024).batch(batch_size)
val_dataset = tf.data.Dataset.from_tensor_slices((X_test, y_test)).batch(batch_size)

In [69]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr)

In [70]:
model.compile(optimizer=optimizer, loss=loss_fn, metrics=['accuracy'])

In [71]:
history = model.fit(train_dataset, epochs=num_epochs, validation_data=val_dataset)

Epoch 1/10
235/235 ━━━━━━━━━━━━━━━━━━━━ 30s 126ms/step - accuracy: 0.4819 - loss: 1.4285 - val_accuracy: 0.7197 - val_loss: 0.7756
Epoch 2/10
235/235 ━━━━━━━━━━━━━━━━━━━━ 30s 127ms/step - accuracy: 0.7674 - loss: 0.6175 - val_accuracy: 0.7461 - val_loss: 0.6810
Epoch 3/10
235/235 ━━━━━━━━━━━━━━━━━━━━ 29s 124ms/step - accuracy: 0.8072 - loss: 0.5142 - val_accuracy: 0.8107 - val_loss: 0.5046
Epoch 4/10
235/235 ━━━━━━━━━━━━━━━━━━━━ 29s 124ms/step - accuracy: 0.8290 - loss: 0.4583 - val_accuracy: 0.8201 - val_loss: 0.4860
Epoch 5/10
235/235 ━━━━━━━━━━━━━━━━━━━━ 30s 128ms/step - accuracy: 0.8414 - loss: 0.4247 - val_accuracy: 0.8237 - val_loss: 0.4731
Epoch 6/10
235/235 ━━━━━━━━━━━━━━━━━━━━ 29s 125ms/step - accuracy: 0.8507 - loss: 0.3977 - val_accuracy: 0.8251 - val_loss: 0.4607
Epoch 7/10

In [72]:
val_loss, val_acc = model.evaluate(val_dataset, verbose=0)
print(f"Final validation accuracy: {val_acc:.4f}")

Final validation accuracy: 0.8584
