In [3]:
!pip install tensorflow

Defaulting to user installation because normal site-packages is not writeable
Collecting tensorflow
  Downloading tensorflow-2.20.0-cp312-cp312-win_amd64.whl.metadata (4.6 kB)
Collecting absl-py>=1.0.0 (from tensorflow)
  Downloading absl_py-2.3.1-py3-none-any.whl.metadata (3.3 kB)
Collecting astunparse>=1.6.0 (from tensorflow)
  Downloading astunparse-1.6.3-py2.py3-none-any.whl.metadata (4.4 kB)
Collecting flatbuffers>=24.3.25 (from tensorflow)
  Downloading flatbuffers-25.9.23-py2.py3-none-any.whl.metadata (875 bytes)
Collecting gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 (from tensorflow)
  Downloading gast-0.7.0-py3-none-any.whl.metadata (1.5 kB)
Collecting google_pasta>=0.1.1 (from tensorflow)
  Downloading google_pasta-0.2.0-py3-none-any.whl.metadata (814 bytes)
Collecting libclang>=13.0.0 (from tensorflow)
  Downloading libclang-18.1.1-py2.py3-none-win_amd64.whl.metadata (5.3 kB)
Collecting opt_einsum>=2.3.2 (from tensorflow)
  Downloading opt_einsum-3.4.0-py3-none-any.whl.metadata (6.3

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
streamlit 1.37.1 requires protobuf<6,>=3.20, but you have protobuf 6.33.1 which is incompatible.


TOPIC 1: Understanding Pooling and Padding in CNN
1. Describe the purpose and benefits of pooling in CNN.
Explanation: Pooling (also called subsampling or downsampling) reduces the spatial dimensions (width and height) of the feature maps produced by convolutional layers.
Purpose: To progressively shrink the feature maps, which cuts the computation in later layers and the number of parameters in the layers that follow. The pooling layer itself has no learnable parameters.
Benefits:
Dimensionality Reduction: Smaller feature maps make the model lighter and faster to train.
Translation Invariance: Small shifts in the input often leave the pooled output unchanged, which helps the model recognize a feature (like a cat's ear) even when its position varies slightly.
Reduces Overfitting: By discarding precise spatial information, the model focuses on whether a feature is present rather than on its exact location.
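To put a number on the dimensionality reduction, the sketch below counts activations before and after a 2x2, stride-2 pooling step, along with the weights a following dense layer would need. The 28x28x6 feature map and the 120-unit dense layer are illustrative assumptions, not values taken from the answer above.

```python
def pooled_size(size, pool=2, stride=2):
    """Output width/height of a pooling layer with no padding."""
    return (size - pool) // stride + 1

# Hypothetical feature map: 28x28 spatial size, 6 channels.
height = width = 28
channels = 6
dense_units = 120  # hypothetical fully connected layer placed after pooling

before = height * width * channels
after = pooled_size(height) * pooled_size(width) * channels

print("activations before pooling:", before)  # 4704
print("activations after pooling: ", after)   # 1176
print("dense weights without pooling:", before * dense_units)
print("dense weights with pooling:   ", after * dense_units)
```

Halving the width and the height leaves a quarter of the activations, so any layer that reads the pooled map needs a quarter of the weights.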
2. Explain the difference between min pooling and max pooling.
Explanation:
Max Pooling: Selects the maximum value from each window (e.g., a 2x2 grid). It is the most common type because it keeps the strongest activations, which usually mark the most prominent features (edges, textures).
Min Pooling: Selects the minimum value from each window. It is rarely used, but it can help when the features of interest are dark on a bright background, or when bright noise should be suppressed.
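The difference is easiest to see on a small grid. The following is a minimal pure-Python sketch; `pool2d` and the 4x4 `feature_map` are hypothetical names and values made up for illustration, and a real network would use a framework layer instead.

```python
def pool2d(grid, size=2, stride=2, reduce_fn=max):
    """Apply reduce_fn over each size x size window of a 2D list."""
    rows = (len(grid) - size) // stride + 1
    cols = (len(grid[0]) - size) // stride + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [
                grid[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(reduce_fn(window))
        out.append(row)
    return out

feature_map = [
    [1, 3, 2, 9],
    [5, 6, 1, 7],
    [4, 2, 8, 0],
    [3, 1, 6, 5],
]

max_pooled = pool2d(feature_map, reduce_fn=max)
min_pooled = pool2d(feature_map, reduce_fn=min)
print(max_pooled)  # [[6, 9], [4, 8]]
print(min_pooled)  # [[1, 1], [1, 0]]
```

Both variants shrink the 4x4 grid to 2x2. Only the value kept from each window differs.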
3. Discuss the concept of padding in CNN and its significance.
Explanation: Padding adds extra pixels (usually with a value of 0) around the border of an input image or feature map before a convolution is applied.
Significance: Without padding, each convolution with a kernel larger than 1x1 shrinks the feature map (a k x k kernel at stride 1 removes k - 1 pixels per dimension). After enough layers the map becomes too small (eventually 1x1) to convolve further. Padding keeps the spatial dimensions constant, which makes deeper networks possible. It also lets the filter be centered on border pixels, so information at the edges of the image is not underused.
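The shrinkage follows from the standard output-size formula, floor((n + 2p - k) / s) + 1. The sketch below applies it to a stack of 5x5 convolutions. The 32x32 input and the seven-layer depth are assumptions chosen only to make the effect visible.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def sizes_through_layers(size, kernel, padding, num_layers):
    """Feature-map size after each of num_layers identical convolutions."""
    sizes = [size]
    for _ in range(num_layers):
        size = conv_output_size(size, kernel, padding)
        sizes.append(size)
    return sizes

# Seven stacked 5x5 convolutions on a hypothetical 32x32 input.
no_padding = sizes_through_layers(32, kernel=5, padding=0, num_layers=7)
with_padding = sizes_through_layers(32, kernel=5, padding=2, num_layers=7)
print(no_padding)    # [32, 28, 24, 20, 16, 12, 8, 4]
print(with_padding)  # [32, 32, 32, 32, 32, 32, 32, 32]
```

Without padding, an eighth 5x5 layer would no longer fit on the 4x4 map. With two pixels of padding per side, the size never changes.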
4. Compare and contrast zero-padding and valid-padding.
Explanation:
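Valid Padding (No Padding): No pixels are added. The filter only visits positions where it fits entirely inside the original image.
Effect: The output feature map is smaller than the input (by k - 1 pixels per dimension for a k x k kernel at stride 1).
Zero Padding ("Same" Padding): Rows and columns of zeros are added around the image borders. Strictly, "zero" names the fill value and "same" names the amount of padding (just enough to preserve the size), but the two terms are often used together.
Effect: With stride 1, the output feature map is the same size as the input, which preserves the spatial resolution. With a larger stride s, Keras-style "same" padding gives an output of the input size divided by s, rounded up.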
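The two modes can be compared by running the same filter with and without a zero border. This is a minimal pure-Python sketch at stride 1; `zero_pad` and `correlate2d` are hypothetical helper names, and the 5x5 image of ones with a 3x3 summing filter is an arbitrary test case.

```python
def zero_pad(image, pad):
    """Surround a 2D list with `pad` rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded

def correlate2d(image, kernel):
    """Valid cross-correlation (what CNN 'convolution' layers compute), stride 1."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

image = [[1] * 5 for _ in range(5)]   # 5x5 image of ones
kernel = [[1] * 3 for _ in range(3)]  # 3x3 summing filter

valid_out = correlate2d(image, kernel)              # valid: no padding
same_out = correlate2d(zero_pad(image, 1), kernel)  # same: pad = (3 - 1) // 2

print(len(valid_out), len(valid_out[0]))  # 3 3
print(len(same_out), len(same_out[0]))    # 5 5
print(same_out[0])                        # [4, 6, 6, 6, 4]
```

The border values of the "same" output are smaller because part of each border window covers padded zeros. This is the price of keeping the output at 5x5.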

TOPIC 2: Exploring LeNet
1. Provide a brief overview of LeNet-5 architecture.
Explanation: LeNet-5 is one of the earliest Convolutional Neural Networks, proposed by Yann LeCun and colleagues in 1998. It was designed to recognize handwritten digits and is commonly demonstrated on the MNIST dataset. It is a relatively small network consisting of 7 layers (excluding the input): C1, S2, C3, S4, C5, F6, and the output layer.
2. Describe the key components of LeNet-5 and their respective purposes.
Explanation:
Convolutional Layers (C1, C3, C5): Extract feature patterns from the input using learnable 5x5 filters. C5 sees a 5x5 input, so each of its 120 feature maps is 1x1 and the layer behaves like a fully connected layer.
Sub-sampling (Average Pooling) Layers (S2, S4): Reduce the size of the feature maps. Note that LeNet originally used a form of average pooling (with a trainable coefficient and bias per map), not Max Pooling.
Activation Function: LeNet used a scaled Tanh (a sigmoid-shaped function); modern networks typically use ReLU.
Fully Connected Layer (F6): Combines the 120 features from C5 into 84 units for the final classification.
Output Layer: Radial basis function (RBF) units, one per digit 0–9 (modern implementations use Softmax).
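The layer list above can be checked by tracing the feature-map size and counting weights. The sketch below assumes the simplified form used by most modern re-implementations: every C3 filter reads all six S2 maps, pooling has no trainable parameters, and the output is a plain 10-unit dense layer. The 1998 paper differs on all three points (for example, its sparsely connected C3 has 1,516 parameters), so the totals here describe the simplified variant only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """floor((n + 2p - k) / s) + 1"""
    return (size + 2 * padding - kernel) // stride + 1

size, channels = 32, 1  # 32x32 grayscale input
summary = []

# (name, kind, kernel, stride, output channels)
for name, kind, kernel, stride, out_ch in [
    ("C1", "conv", 5, 1, 6),
    ("S2", "pool", 2, 2, 6),
    ("C3", "conv", 5, 1, 16),
    ("S4", "pool", 2, 2, 16),
    ("C5", "conv", 5, 1, 120),
]:
    size = conv_out(size, kernel, stride)
    # Dense connectivity assumed: k*k weights per (input, output) channel pair, plus biases.
    params = kernel * kernel * channels * out_ch + out_ch if kind == "conv" else 0
    channels = out_ch
    summary.append((name, size, channels, params))

# Fully connected layers: weights plus biases.
summary.append(("F6", 1, 84, 120 * 84 + 84))
summary.append(("Output", 1, 10, 84 * 10 + 10))

for name, s, ch, params in summary:
    print(f"{name:6s} {s:2d}x{s:<2d} x {ch:3d}  params={params}")

total_params = sum(p for _, _, _, p in summary)
print("total:", total_params)  # 61706
```

The spatial size runs 32 -> 28 -> 14 -> 10 -> 5 -> 1, which is why C5 and a dense layer over the flattened 5x5x16 maps are interchangeable.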
3. Discuss the advantages and limitations of LeNet-5.
Explanation:
Advantages: It established the template for modern CNNs (Conv -> Pool -> FC structure). It is small and efficient for simple tasks like digit recognition.
Limitations: It struggles with complex, high-resolution color images. Because it uses saturating Sigmoid/Tanh activations, gradients shrink as they pass back through many layers (the "vanishing gradient" problem), which would make a deeper version hard to train.
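A rough way to see the vanishing gradient problem is to multiply activation derivatives across layers. The sketch below is a deliberate simplification: it ignores the weight matrices and assumes the same pre-activation value (2.0, chosen arbitrarily) in every layer. ReLU is included only for comparison.

```python
import math

def sigmoid_grad(x):
    """Derivative of the logistic sigmoid; its maximum is 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def tanh_grad(x):
    """Derivative of tanh; its maximum is 1.0 at x = 0."""
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_gradient(grad_fn, pre_activation, depth):
    """Product of `depth` identical activation derivatives (weights ignored)."""
    return grad_fn(pre_activation) ** depth

pre_activation = 2.0  # hypothetical value, the same in every layer
for depth in (2, 5, 10):
    print(
        f"depth={depth:2d}",
        f"sigmoid={chained_gradient(sigmoid_grad, pre_activation, depth):.2e}",
        f"tanh={chained_gradient(tanh_grad, pre_activation, depth):.2e}",
        f"relu={chained_gradient(relu_grad, pre_activation, depth):.2e}",
    )
```

Even in the best case the sigmoid derivative is 0.25, so ten layers scale the gradient by at most 0.25 ** 10, which is under one millionth. An active ReLU unit passes the gradient through unchanged.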

4. Implement LeNet-5 

In [3]:
import tensorflow as tf
from tensorflow.keras import layers, models

def LeNet5():
    model = models.Sequential()
    # Input: 32x32 grayscale image (an explicit Input layer avoids the
    # Keras warning about passing input_shape to a layer)
    model.add(layers.Input(shape=(32, 32, 1)))
    # Layer 1 (C1): Conv2D, 6 filters, 5x5 kernel, Tanh activation.
    # 'valid' padding turns the 32x32 input into 28x28 maps, as in the original LeNet-5.
    model.add(layers.Conv2D(6, kernel_size=(5, 5), strides=(1, 1), activation='tanh', padding='valid'))
    # Layer 2 (S2): Average Pooling -> 14x14
    model.add(layers.AveragePooling2D(pool_size=(2, 2), strides=(2, 2)))
    # Layer 3 (C3): Conv2D, 16 filters -> 10x10
    model.add(layers.Conv2D(16, kernel_size=(5, 5), strides=(1, 1), activation='tanh', padding='valid'))
    # Layer 4 (S4): Average Pooling -> 5x5
    model.add(layers.AveragePooling2D(pool_size=(2, 2), strides=(2, 2)))
    # Flatten the 5x5x16 maps into a 400-element vector for the FC layers
    model.add(layers.Flatten())
    # Layer 5 (C5): Fully Connected (120 nodes); equivalent to a 5x5 convolution on a 5x5 input
    model.add(layers.Dense(120, activation='tanh'))
    # Layer 6 (F6): Fully Connected (84 nodes)
    model.add(layers.Dense(84, activation='tanh'))
    # Output Layer (10 digits); Softmax replaces the original RBF units
    model.add(layers.Dense(10, activation='softmax'))
    return model

In [5]:
import tensorflow as tf
import numpy as np

# 1. Create the Model
model = LeNet5()

# Check if the model structure is correct
model.summary()

# 2. Compile the Model (Define how it learns)
model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy', 
              metrics=['accuracy'])

# 3. Load the MNIST Dataset
print("Loading data...")
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# 4. Preprocessing
# Normalize pixel values to be between 0 and 1
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Add a channel dimension (MNIST is grayscale, so we need 1 channel)
# Shape changes from (60000, 28, 28) to (60000, 28, 28, 1)
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Resize images from 28x28 to 32x32 (standard LeNet input size)
x_train = tf.image.resize(x_train, [32, 32])
x_test = tf.image.resize(x_test, [32, 32])

# 5. Train the Model
print("Starting training...")
history = model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

print("Training finished!")



Loading data...
Starting training...
Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 18s 8ms/step - accuracy: 0.9344 - loss: 0.2198 - val_accuracy: 0.9674 - val_loss: 0.1037
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 16s 8ms/step - accuracy: 0.9731 - loss: 0.0876 - val_accuracy: 0.9782 - val_loss: 0.0714
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 15s 8ms/step - accuracy: 0.9810 - loss: 0.0614 - val_accuracy: 0.9799 - val_loss: 0.0636
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 15s 8ms/step - accuracy: 0.9846 - loss: 0.0491 - val_accuracy: 0.9829 - val_loss: 0.0526
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 15s 8ms/step - accuracy: 0.9883 - loss: 0.0380 - val_accuracy: 0.9813 - val_loss: 0.0583
Training finished!


TOPIC 3: Analyzing AlexNet
1. Present an overview of the AlexNet architecture.
Explanation: AlexNet won the 2012 ImageNet challenge (ILSVRC 2012) and is widely credited with starting the modern "Deep Learning" era in computer vision. It is much deeper and wider than LeNet. It consists of 8 learned layers: 5 convolutional layers and 3 fully connected layers. It was designed to handle high-resolution color images (227x227 pixels; the original paper states 224x224, but 227x227 is the size that matches the layer arithmetic).
2. Explain the architectural innovations introduced in AlexNet.
Explanation:
ReLU Activation: It replaced Sigmoid/Tanh with ReLU (Rectified Linear Unit). ReLU does not saturate for positive inputs, which mitigates the vanishing gradient problem and speeds up training significantly.
Dropout: Used in the first two fully connected layers to randomly "turn off" neurons during training. This reduces overfitting (memorizing the training data).
Data Augmentation: The authors artificially expanded the dataset with random crops, horizontal flips, and RGB intensity perturbations to make the model more robust.
Overlapping Pooling: Unlike LeNet, AlexNet used pooling windows that overlap (3x3 windows with stride 2), which slightly reduced error rates.
GPU Utilization: It was one of the first CNNs of this size trained on GPUs, with the network split across two GPUs running in parallel.
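Of these innovations, dropout is the simplest to sketch in a few lines. The version below is "inverted" dropout, the form Keras's `Dropout` layer uses: surviving units are scaled up during training so nothing changes at inference. The original AlexNet paper instead multiplied outputs by 0.5 at test time. The function name and the all-ones activations are illustrative assumptions.

```python
import random

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` during training
    and rescale the survivors so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
acts = [1.0] * 10000    # hypothetical layer activations

train_out = dropout(acts, rate=0.5, training=True, rng=rng)
test_out = dropout(acts, rate=0.5, training=False, rng=rng)

print("fraction dropped in training:", train_out.count(0.0) / len(train_out))
print("mean activation in training: ", sum(train_out) / len(train_out))
print("unchanged at inference:      ", test_out == acts)
```

Roughly half the units are zeroed on each training pass, so no single neuron can be relied on, while the mean activation stays close to its original value.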
3. Discuss the role of convolutional layers, pooling layers, and fully connected layers in AlexNet.
Explanation:
Conv Layers (1-5): The early layers detect simple edges and color blobs. The deeper layers respond to more complex shapes and textures (like eyes, wheels, fur).
Pooling Layers: Max pooling aggressively downsamples the feature maps to reduce computation while keeping the strongest activations.
Fully Connected Layers: These act as the "classifier" part of the network. They take the high-level features extracted by the conv layers and determine which class (e.g., "Dog," "Car") the image belongs to.
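How these three layer types hand data to one another can be traced with the output-size formula floor((n + 2p - k) / s) + 1. The sketch below uses the commonly cited single-tower AlexNet settings (227x227x3 input; 96, 256, 384, 384, 256 filters). It tracks shapes only and ignores the two-GPU split and the normalization layers.

```python
def out_size(size, kernel, stride, padding=0):
    """floor((n + 2p - k) / s) + 1"""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding, output channels); None keeps the channel count.
ALEXNET_LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool3", 3, 2, 0, None),
]

def trace_shapes(input_size=227, input_channels=3):
    """Return (name, spatial size, channels) after every conv/pool layer."""
    size, channels = input_size, input_channels
    shapes = []
    for name, kernel, stride, padding, out_channels in ALEXNET_LAYERS:
        size = out_size(size, kernel, stride, padding)
        channels = out_channels or channels
        shapes.append((name, size, channels))
    return shapes

shapes = trace_shapes()
for name, size, channels in shapes:
    print(f"{name}: {size}x{size}x{channels}")

last_name, final_size, final_channels = shapes[-1]
flattened = final_size * final_size * final_channels
print("features passed to the fully connected layers:", flattened)  # 9216
```

The convolution and pooling stack reduces a 227x227 image to a 6x6x256 volume, and the fully connected layers classify from those 9,216 values.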

4. Implement AlexNet

In [7]:
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np

# 1. Load Data (MNIST)
# We reload it to make sure we have a fresh start for the new model
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# 2. Preprocessing
# Normalize pixel values
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Add channel dimension (MNIST is grayscale)
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# 3. Define AlexNet Architecture
# Note: Standard AlexNet is for 227x227 images.
# We use a Resizing layer to upscale our 28x28 images to 64x64 so the strided
# convolution and the pooling layers still have enough pixels to work with.
# Filter counts and Dense widths are reduced from the original to keep training fast.
def AlexNet_MNIST():
    model = models.Sequential()
    
    # Input & Resizing Layer
    model.add(layers.Input(shape=(28, 28, 1)))
    model.add(layers.Resizing(64, 64))

    # 1st Convolutional Layer
    model.add(layers.Conv2D(filters=48, kernel_size=(11,11), strides=(4,4), padding='valid', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='valid'))

    # 2nd Convolutional Layer
    model.add(layers.Conv2D(filters=128, kernel_size=(5,5), strides=(1,1), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='valid'))

    # 3rd, 4th, 5th Convolutional Layers
    model.add(layers.Conv2D(filters=192, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'))
    model.add(layers.Conv2D(filters=192, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'))
    model.add(layers.Conv2D(filters=128, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'))
    # Max Pooling
    model.add(layers.MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='same'))

    # Flatten
    model.add(layers.Flatten())

    # Fully Connected Layers with Dropout (Key feature of AlexNet!)
    model.add(layers.Dense(1024, activation='relu'))
    model.add(layers.Dropout(0.5))
    
    model.add(layers.Dense(1024, activation='relu'))
    model.add(layers.Dropout(0.5))

    # Output Layer
    model.add(layers.Dense(10, activation='softmax'))

    return model

# 4. Create and Compile
model_alex = AlexNet_MNIST()
model_alex.summary() # Check the output to see the bigger architecture

model_alex.compile(optimizer='adam', 
                   loss='sparse_categorical_crossentropy', 
                   metrics=['accuracy'])

# 5. Train
print("Starting AlexNet training...")
# We train for 3 epochs because AlexNet is larger and slower than LeNet
history = model_alex.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))

print("AlexNet implementation complete!")



Starting AlexNet training...
Epoch 1/3
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 82s 42ms/step - accuracy: 0.9074 - loss: 0.2958 - val_accuracy: 0.9772 - val_loss: 0.0968
Epoch 2/3
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 78s 42ms/step - accuracy: 0.9775 - loss: 0.1006 - val_accuracy: 0.9851 - val_loss: 0.0594
Epoch 3/3
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 74s 39ms/step - accuracy: 0.9825 - loss: 0.0766 - val_accuracy: 0.9874 - val_loss: 0.0504
AlexNet implementation complete!
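
The reason for upscaling MNIST to 64x64 can be checked by tracing spatial sizes through the kernel, stride, and padding settings used in AlexNet_MNIST. The sketch below is pure arithmetic and does not call Keras. The helper names are hypothetical, and a size of 0 is this sketch's own convention for "the window no longer fits".

```python
import math

def valid_out(size, kernel, stride):
    """Output size with padding='valid'; 0 means the window does not fit."""
    if size < kernel:
        return 0
    return (size - kernel) // stride + 1

def same_out(size, stride):
    """Output size with padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)

def trace_alexnet_mnist(input_size):
    """Spatial size after each block, using the layer settings from AlexNet_MNIST."""
    sizes = [("input", input_size)]
    s = valid_out(input_size, kernel=11, stride=4)
    sizes.append(("conv1", s))
    s = valid_out(s, kernel=3, stride=2)
    sizes.append(("pool1", s))
    s = same_out(s, stride=1)
    sizes.append(("conv2", s))
    s = valid_out(s, kernel=3, stride=2)
    sizes.append(("pool2", s))
    s = same_out(s, stride=1)  # conv3, conv4 and conv5 all keep the size
    sizes.append(("conv3-5", s))
    s = same_out(s, stride=2)
    sizes.append(("pool3", s))
    return sizes

print(trace_alexnet_mnist(64))  # resized input, as in the model
print(trace_alexnet_mnist(28))  # raw MNIST size, for comparison
```

With a 64x64 input the sizes run 14, 6, 6, 2, 2, 1, so Flatten receives 1x1x128 = 128 features. With the raw 28x28 input, the second 3x3 'valid' pooling would face a 2x2 map and have no valid position, which is why the resize step is needed.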
