## TOPIC: Understanding Pooling and Padding in CNN

### 1. Describe the purpose and benefits of pooling in CNNs

Pooling, also known as subsampling or downsampling, is a crucial operation in Convolutional Neural Networks (CNNs) that helps reduce the spatial dimensions of feature maps while retaining important information. The primary purpose of pooling is to reduce the computational complexity of the network, control overfitting, and increase the network's translation invariance.

The main benefits of pooling in CNNs are as follows:

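1. **Dimensionality Reduction**: Pooling reduces the height and width of feature maps (a 2x2 window with stride 2 keeps one value out of every four), so the layers that follow operate on smaller inputs. Pooling itself has no learnable parameters, but the smaller maps reduce the computation in every later layer and the number of weights in any fully connected layers that follow, making the network more efficient to train and deploy.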

2. **Translation Invariance**: Pooling gives the CNN approximate invariance to small translations, meaning that the network can recognize patterns even if they appear in slightly different locations within the input image. Pooling achieves this by aggregating each local neighborhood into a single summary value, so a feature that moves within the same pooling window produces the same output and the network becomes less sensitive to exact positional information.

3. **Feature Generalization**: By summarizing local features into more generalized representations, pooling helps prevent overfitting. Overfitting occurs when a model memorizes specific details of the training data, making it less capable of generalizing to unseen examples. Pooling enforces a form of data compression that discards fine-grained information, leading to a more robust and generalized model.

4. **Noise and Distortion Robustness**: Pooling can make CNNs more robust to small changes and distortions in the input data. It captures the most prominent features and minimizes the influence of minor variations, helping the network focus on the most relevant aspects of the input.

5. **Computational Efficiency**: Because later layers operate on smaller feature maps, pooling significantly reduces the number of computations in the layers that follow, which speeds up training and inference. This efficiency allows CNNs to handle larger datasets and models effectively.
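To make the dimensionality-reduction and translation-invariance points concrete, here is a minimal NumPy sketch of 2x2 max pooling with stride 2 on a single-channel feature map. The function name `max_pool2d` and the toy values are illustrative assumptions; deep learning frameworks provide optimized versions (for example Keras's `MaxPooling2D`, used later in this notebook). Note that the invariance shown only holds for shifts that stay inside the same pooling window.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling over a single-channel 2-D feature map, without padding."""
    h_out = (x.shape[0] - size) // stride + 1
    w_out = (x.shape[1] - size) // stride + 1
    out = np.empty((h_out, w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()  # keep only the strongest activation
    return out

# A 4x4 feature map whose strongest activation (9) sits in the top-left window.
fmap = np.array([
    [9, 1, 0, 2],
    [1, 0, 3, 1],
    [0, 2, 4, 0],
    [1, 0, 0, 5],
])
pooled = max_pool2d(fmap)
print(pooled.shape)  # (2, 2): 16 values summarized by 4
print(pooled)        # values [[9, 3], [2, 5]]

# Move the strong activation by one pixel, staying inside the same window.
shifted = fmap.copy()
shifted[0, 0], shifted[1, 1] = 0, 9
print(np.array_equal(max_pool2d(shifted), pooled))  # True
```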

### 2. Explain the difference between Min pooling and Max pooling


**Max Pooling** and **Min Pooling** are both types of pooling operations used in Convolutional Neural Networks (CNNs) to downsample feature maps and reduce their spatial dimensions. However, they differ in how they aggregate information within the pooling regions.

1. **Max Pooling**:

   - Max pooling takes the maximum value within each pooling region and discards the other values. The output value of each pooling region is the highest activation value found in that region.

   - Max pooling is commonly used because it captures the most salient features and is effective at preserving the most dominant information in the feature maps. It helps maintain robustness to translations and local variations.

   - For example, in a 2x2 max pooling operation, the maximum value among four neighboring elements is selected as the output for that region.

2. **Min Pooling**:

   - Min pooling, on the other hand, takes the minimum value within each pooling region and discards the other values. The output value of each pooling region is the lowest activation value found in that region.

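   - Min pooling is rarely used in CNNs and is generally less effective than max pooling for most computer vision tasks. One reason is that feature maps usually come from ReLU activations, which set all negative values to zero, so the minimum of a pooling region is often just zero and carries little information.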

   - Similar to max pooling, min pooling can also provide some form of translation invariance and noise robustness.

Here's a quick comparison:

- **Max Pooling**: Selects the highest activation value, preserves dominant features, widely used, provides translation invariance.
- **Min Pooling**: Selects the lowest activation value, less commonly used, provides some translation invariance.
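The contrast can be seen on a small example. This NumPy sketch (the helper `pool2d` and the input values are illustrative assumptions) applies ReLU to a 4x4 map and then pools it with both operations; it also checks that min pooling can be written as negated max pooling of the negated input.

```python
import numpy as np

def pool2d(x, size=2, stride=2, reducer=np.max):
    """Pool a 2-D feature map with the given reduction (np.max or np.min)."""
    h_out = (x.shape[0] - size) // stride + 1
    w_out = (x.shape[1] - size) // stride + 1
    return np.array([
        [reducer(x[i * stride:i * stride + size, j * stride:j * stride + size])
         for j in range(w_out)]
        for i in range(h_out)
    ])

# Pre-activation values, then ReLU as it would typically be applied before pooling.
pre_act = np.array([
    [ 2.0, -1.0,  0.5,  3.0],
    [-0.5,  4.0,  1.0, -2.0],
    [ 1.5,  0.2, -1.0, -3.0],
    [ 0.3,  0.8, -0.4,  6.0],
])
fmap = np.maximum(pre_act, 0.0)

max_pooled = pool2d(fmap, reducer=np.max)
min_pooled = pool2d(fmap, reducer=np.min)
print(max_pooled)  # strongest response in each 2x2 window: [[4, 3], [1.5, 6]]
print(min_pooled)  # mostly zeros, because ReLU zeroed the negative inputs

# Min pooling equals negated max pooling of the negated map: min(x) = -max(-x).
assert np.allclose(min_pooled, -pool2d(-fmap, reducer=np.max))
```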

### 3. Discuss the concept of padding in CNN and its significance.

Padding is a technique used in Convolutional Neural Networks (CNNs) to control, and most often preserve, the spatial dimensions of feature maps during convolution operations. It involves adding extra border pixels around the input data before applying the convolutional filters. These additional pixels are usually filled with zeros, hence the name "zero-padding." The padding size is controlled by a parameter, and the two most common settings are "same" padding, which adds just enough pixels for the output to match the input size at stride 1 ((k-1)/2 pixels on each side for an odd k x k kernel, i.e. one pixel for a 3x3 kernel), and "valid" padding, which adds no padding at all.

The significance of padding in CNNs lies in several key aspects:

1. **Preservation of Spatial Dimensions**: Convolutional layers without padding reduce the spatial dimensions of the feature maps. If no padding is applied, the output feature map is smaller than the input, and this reduction compounds with every additional layer in the network. Padding maintains the spatial dimensions, allowing the network to retain more spatial information and reduce information loss.

2. **Centering the Convolutional Kernels**: When applying convolutional filters, the center of the kernel is usually aligned with the pixels of the input image. Without padding, the kernel's center cannot be applied to the border pixels, leading to a decrease in feature map size. Padding ensures that all pixels of the input image can be processed by the kernel, leading to consistent feature map dimensions.

3. **Preserving Information at Edges**: In convolutional operations, the pixels near the edges of the input image have fewer neighboring pixels available for computation. Without padding, these edge pixels would be underrepresented in the output feature maps. Padding addresses this issue by creating a buffer zone around the input, allowing the convolution operation to capture information at the edges more effectively.

4. **Enabling Deeper Architectures**: Padding is sometimes credited with mitigating the vanishing gradient problem, in which gradients become too small as they are propagated back through many layers, but it does not address that problem directly. Vanishing gradients are tackled mainly by activation functions such as ReLU, careful weight initialization, normalization layers, and residual connections. Padding's contribution is indirect: because feature maps no longer shrink at every layer, many convolutional layers can be stacked without the spatial dimensions collapsing, which is a precondition for the deep architectures those techniques make trainable.

5. **Facilitating Design Choices**: Padding allows greater flexibility in designing CNN architectures. It enables the use of larger convolutional kernels, which can capture more complex patterns and context. Additionally, it allows stacking multiple convolutional layers while preserving the spatial dimensions, enabling the construction of deeper networks with larger receptive fields.
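The effect of padding on size follows from the standard output-size formula for one spatial dimension. Here n is the input size, k the kernel size, p the padding added on each side, s the stride, and o the output size; these symbols are introduced for illustration. The last line works through the stacking argument for five 3x3 layers on a 32x32 input.

```latex
o = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1

\text{same padding } (s = 1,\ k \text{ odd},\ p = \tfrac{k-1}{2}):\quad o = n + (k - 1) - k + 1 = n

\text{valid padding } (s = 1,\ p = 0):\quad o = n - k + 1

\text{five stacked } 3 \times 3 \text{ layers on } n = 32:\quad
\text{valid: } 32 \to 30 \to 28 \to 26 \to 24 \to 22, \qquad
\text{same } (p = 1)\text{: } 32 \text{ after every layer}
```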

### 4. Compare and contrast zero-padding and valid-padding in terms of their effects on the output feature map size.

Zero-padding and valid-padding are two different strategies used in Convolutional Neural Networks (CNNs) that have a significant impact on the size of the output feature maps produced by convolutional layers.

**1. Zero-padding:**
- Zero-padding involves adding extra border pixels (usually filled with zeros) around the input data before applying the convolutional filters.
- The padding size is controlled by a parameter. With "same" padding it is chosen so that the output matches the input size at stride 1: (k-1)/2 pixels on each side for an odd k x k kernel (one pixel for 3x3, two for 5x5). Larger amounts of padding can also be used.
- The purpose of zero-padding is to preserve the spatial dimensions of the feature maps during convolution operations.
- With "same" zero-padding and stride 1, the output feature maps have the same size as the input. The padding lets the kernel's center be placed on every input pixel, including those at the borders.
- Zero-padding reduces information loss at the edges of the input, so border content is better represented in the output.

**2. Valid-padding:**
- Valid-padding, on the other hand, involves no padding at all. It means that the convolutional filters are only applied to the valid region of the input, and no extra border pixels are added.
- Without padding, the convolutional filters can only be applied to pixels in the input image that fully overlap with the kernel.
- As a result, the output feature maps are smaller than the input. For stride 1, an n x n input and a k x k kernel give an (n - k + 1) x (n - k + 1) output; for example, a 28x28 input with a 5x5 kernel gives a 24x24 output.
- Each valid convolution removes k - 1 pixels from each spatial dimension, so the reduction accumulates with every additional convolutional layer in the network.
- Valid-padding is useful when you want to reduce the spatial dimensions of the feature maps, which is often desirable in deeper layers of the network to decrease the computational load and control overfitting.

**Comparison:**
- Zero-padding preserves the spatial dimensions of the feature maps, while valid-padding reduces the size of the feature maps.
- Zero-padding keeps the output feature map size the same as the input size, while valid-padding reduces the output size based on the convolutional kernel size and the number of layers.
- Zero-padding lets the kernel be centered on every input pixel, while valid-padding applies the filters only where they fit entirely inside the input, so border pixels contribute to fewer output positions than central ones.
- Zero-padding is useful when retaining spatial information and preventing information loss, while valid-padding is suitable for reducing spatial dimensions and controlling model complexity.
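The comparison can be checked with a minimal NumPy sketch of a single-channel, stride-1 convolution (implemented, as in CNN layers, as cross-correlation). The function `conv2d` and its `padding` argument are illustrative assumptions that mimic the "same"/"valid" naming used by frameworks. It shows the output sizes for a 28x28 input and a 5x5 kernel, and counts how many output positions a single corner pixel reaches under each scheme.

```python
import numpy as np

def conv2d(x, kernel, padding="valid"):
    """Single-channel 2-D cross-correlation with stride 1."""
    k = kernel.shape[0]
    if padding == "same":
        p = (k - 1) // 2  # assumes an odd kernel size
        x = np.pad(x, p, mode="constant", constant_values=0)  # zero-padding
    elif padding != "valid":
        raise ValueError("padding must be 'valid' or 'same'")
    h_out = x.shape[0] - k + 1
    w_out = x.shape[1] - k + 1
    out = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * kernel)
    return out

image = np.random.default_rng(0).random((28, 28))
kernel = np.ones((5, 5)) / 25.0  # a simple 5x5 averaging filter

print(conv2d(image, kernel, padding="same").shape)   # (28, 28)
print(conv2d(image, kernel, padding="valid").shape)  # (24, 24)

# An impulse in the corner: how many outputs does that single pixel influence?
corner = np.zeros((28, 28))
corner[0, 0] = 1.0
print(np.count_nonzero(conv2d(corner, kernel, padding="valid")))  # 1
print(np.count_nonzero(conv2d(corner, kernel, padding="same")))   # 9

# For comparison, a central pixel influences 5 x 5 = 25 outputs under "valid".
center = np.zeros((28, 28))
center[14, 14] = 1.0
print(np.count_nonzero(conv2d(center, kernel, padding="valid")))  # 25
```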

## TOPIC: Exploring LeNet

### 1. Present an overview of the AlexNet architecture

AlexNet is a deep Convolutional Neural Network (CNN) architecture that gained significant attention after winning the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. It was developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton and marked a breakthrough in the field of computer vision by demonstrating the power of deep learning on large-scale image classification tasks.

Here's an overview of the AlexNet architecture:

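1. **Input Layer**: The network takes a 224x224x3 RGB image as input, as stated in the original paper. Many implementations use 227x227x3 instead, because that size makes the first layer (11x11 kernels, stride 4) produce the 55x55 output the paper reports.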

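2. **Convolutional Layers**: The network consists of five convolutional layers, each followed by a ReLU (Rectified Linear Unit) activation function. The kernel sizes shrink with depth: 96 kernels of 11x11 with stride 4 in the first layer, 256 kernels of 5x5 in the second, and 3x3 kernels (384, 384, and 256 of them) in the last three. Because each layer builds on the previous one, the region of the input image that a neuron responds to (its receptive field) still grows with depth, so deeper layers capture larger and more complex patterns.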

3. **Max Pooling Layers**: Max-pooling layers follow the first, second, and fifth convolutional layers and reduce the spatial dimensions of the feature maps. Max pooling is performed using 3x3 windows with a stride of 2, so adjacent windows overlap.

4. **Normalization Layers**: Local Response Normalization (LRN) layers follow the first two convolutional layers. They normalize each neuron's response by the activity of neurons in neighboring feature maps at the same spatial position, and the authors reported that this reduced their error rates.

5. **Dropout**: Dropout with a rate of 0.5 is applied to the first two fully connected layers. During training it randomly deactivates half of their neurons, reducing overfitting and improving generalization.

6. **Fully Connected Layers**: The last part of the network consists of three fully connected layers. The first two have 4096 neurons each, and the third has 1000 neurons, one per class in the ImageNet dataset. The last layer is followed by a softmax function that converts its outputs into class probabilities.

7. **Output Layer**: The final output is a probability distribution over the 1000 classes of the ImageNet dataset.

8. **Training Details**: AlexNet was trained with stochastic gradient descent (SGD) with momentum 0.9, weight decay 0.0005, and a batch size of 128. It also used data augmentation to increase the diversity of the training data: random 224x224 crops of 256x256 images, horizontal reflections, and a PCA-based alteration of the RGB color intensities.
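The crop-and-flip part of this augmentation can be sketched in a few lines of NumPy. The function name `random_crop_and_flip` and the random stand-in image are illustrative assumptions, and the PCA color augmentation is not shown; frameworks offer ready-made equivalents, but this version makes the operation explicit.

```python
import numpy as np

def random_crop_and_flip(image, crop=224, rng=None):
    """Take a random crop x crop patch and mirror it horizontally with probability 0.5.

    `image` is assumed to be an H x W x C array with H, W >= crop
    (AlexNet used 256x256 images and 224x224 crops).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)   # upper bound is exclusive
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal reflection
    return patch

rng = np.random.default_rng(42)
image = rng.random((256, 256, 3))  # stand-in for a resized training image
patch = random_crop_and_flip(image, rng=rng)
print(patch.shape)  # (224, 224, 3)
```

Because each epoch sees a different crop and orientation of every image, the network rarely sees exactly the same input twice, which is what makes this an effective regularizer.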

### 2. Explain the architectural innovations introduced in AlexNet that contributed to its breakthrough performance.

AlexNet introduced several architectural innovations that contributed to its breakthrough performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. These innovations were instrumental in pushing the boundaries of deep learning and image classification tasks:

1. **Deep Architecture**: AlexNet was considerably deeper and larger than earlier CNNs such as LeNet-5, with five convolutional and three fully connected layers and about 60 million parameters. At the time, most competitive image classifiers combined hand-engineered features with a shallow classifier. By going deeper, AlexNet learned hierarchical features directly from raw pixels, capturing complex patterns and representations.

2. **ReLU Activation Function**: Instead of traditional activation functions such as sigmoid or tanh, AlexNet used Rectified Linear Units (ReLU) after every convolutional and fully connected layer. ReLU is non-linear but does not saturate for positive inputs, which mitigates the vanishing gradient problem and improves gradient flow during backpropagation. The authors reported that a network with ReLUs reached a given training error several times faster than an equivalent network with tanh units.

3. **Large Convolutional Filters**: AlexNet used large filters in its early layers: 11x11 with a stride of 4 in the first layer and 5x5 in the second. Combined with the large stride, this let the first layer cover a wide region of the input while quickly reducing its spatial resolution. Later architectures such as VGG moved in the opposite direction, replacing large filters with stacks of small 3x3 filters.

4. **Multiple GPUs for Training**: AlexNet was one of the first deep learning models to be trained across multiple GPUs. The network was split over two GTX 580 GPUs with 3 GB of memory each, partly because it did not fit on a single GPU, and training took five to six days.

5. **Overlapping Max Pooling**: AlexNet's max-pooling layers use 3x3 windows with a stride of 2, so neighboring pooling regions overlap; traditionally, pooling regions did not overlap. The authors reported that overlapping pooling slightly reduced the error rate compared with non-overlapping 2x2 pooling and made the model slightly harder to overfit.

6. **Local Response Normalization (LRN)**: LRN layers were used after the first two convolutional layers. LRN divides each activation by a term based on the summed squares of the activations in neighboring feature maps at the same spatial location. This creates competition between neurons computed by different kernels (a form of lateral inhibition), which the authors reported improved generalization. Later architectures largely dropped LRN in favor of batch normalization.

7. **Dropout Regularization**: AlexNet applied dropout, then a recently proposed regularization technique, to the first two fully connected layers. Dropout randomly drops out neurons during training, preventing the network from relying too much on any single feature and reducing overfitting.

8. **Data Augmentation**: AlexNet used data augmentation during training: random crops, horizontal flips, and a PCA-based alteration of color intensities. Data augmentation increases the diversity of the training data, making the model more robust and reducing overfitting.
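LRN is the least familiar of these components today, so here is a minimal NumPy sketch of the cross-channel normalization described in the AlexNet paper, using the paper's reported hyperparameters (k = 2, n = 5, alpha = 1e-4, beta = 0.75). The function name and the channels-first layout are assumptions for illustration. TensorFlow exposes a similar op as `tf.nn.local_response_normalization` (channels-last, with `depth_radius`, `bias`, `alpha`, `beta` arguments); how its parameters map onto the paper's is worth checking against the TensorFlow documentation before relying on it.

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Cross-channel LRN for activations of shape (channels, height, width).

    Each activation is divided by (k + alpha * sum of squares over the n
    neighboring channels, itself included) ** beta at the same spatial position.
    """
    a = np.asarray(a, dtype=float)
    channels = a.shape[0]
    half = n // 2
    squared = a ** 2
    out = np.empty_like(a)
    for i in range(channels):
        lo, hi = max(0, i - half), min(channels - 1, i + half)
        denom = (k + alpha * squared[lo:hi + 1].sum(axis=0)) ** beta
        out[i] = a[i] / denom
    return out

# Five channels at a single spatial position, all with activation 1.
base = np.ones((5, 1, 1))
# Same, but channel 2 fires strongly; its neighbors get suppressed.
strong = base.copy()
strong[2] = 100.0

print(local_response_norm(base)[1, 0, 0])    # about 0.59
print(local_response_norm(strong)[1, 0, 0])  # smaller: channel 1 is inhibited by channel 2
```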

### 3. Discuss the role of convolutional layers, pooling layers, and fully connected layers in AlexNet.

In AlexNet, a deep Convolutional Neural Network (CNN), each type of layer (convolutional, pooling, and fully connected) plays a crucial role in learning hierarchical representations from raw pixels and ultimately classifying images.

**1. Convolutional Layers:**
- The convolutional layers are the core building blocks of CNNs. In AlexNet, there are five convolutional layers.
- These layers consist of convolutional filters (also known as kernels) that slide over the input image, extracting local features at each position through element-wise multiplication and summation.
- The filters in the early layers capture simple patterns like edges and textures, while filters in deeper layers learn more complex and abstract representations.
- By stacking multiple convolutional layers, the network progressively learns hierarchical features, transforming the input image into more semantically meaningful representations.
- Convolutional layers enable the network to learn spatially local patterns, which is crucial for image recognition tasks.

**2. Pooling Layers:**
- Max-pooling layers follow the first, second, and fifth convolutional layers in AlexNet.
- Pooling layers help reduce the spatial dimensions of the feature maps, making the network more computationally efficient and less sensitive to the exact location of features in the input image.
- In AlexNet, max pooling takes the maximum value within each 3x3 window and discards the other values. With a stride of 2, this roughly halves the size of the feature maps along each spatial dimension (for example, from 55x55 to 27x27).
- Pooling layers introduce some invariance to small translations, so the network can recognize a pattern even if its position shifts slightly in the input.
- Moreover, pooling helps control overfitting by aggregating local features and extracting dominant information.

**3. Fully Connected Layers:**
- After the convolutional and pooling layers, there are three fully connected layers in AlexNet.
- These layers connect every neuron in the previous layer to every neuron in the next layer, forming a traditional multi-layer perceptron (MLP) architecture.
- The fully connected layers act as a classifier, taking the high-level representations learned by the preceding layers and mapping them to the output classes.
- The last fully connected layer outputs the probabilities for each class using the Softmax activation function.
- The number of neurons in the last fully connected layer corresponds to the number of classes in the classification task (e.g., 1000 classes in the ImageNet challenge).
- Together, the fully connected layers make the final decision, combining the features extracted by the convolutional layers (and downsampled by the pooling layers, which have no learnable parameters) into a class prediction.
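A quick parameter count shows how much of AlexNet's capacity sits in the fully connected layers. This sketch assumes the common single-GPU variant, in which the convolutions are not split into two groups (the original two-GPU model has fewer convolutional weights), and a 6x6x256 feature map entering the first fully connected layer.

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution."""
    return (k * k * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return (n_in + 1) * n_out

conv = [
    conv_params(11, 3, 96),    # conv1
    conv_params(5, 96, 256),   # conv2 (ungrouped)
    conv_params(3, 256, 384),  # conv3
    conv_params(3, 384, 384),  # conv4
    conv_params(3, 384, 256),  # conv5
]
fc = [
    dense_params(6 * 6 * 256, 4096),  # fc6: flattened 6x6x256 feature map
    dense_params(4096, 4096),         # fc7
    dense_params(4096, 1000),         # fc8: 1000 ImageNet classes
]
total = sum(conv) + sum(fc)
print(f"conv: {sum(conv):,}  fc: {sum(fc):,}  total: {total:,}")
print(f"share in fully connected layers: {sum(fc) / total:.1%}")  # 94.0%
```

Roughly 94% of the parameters are in the three fully connected layers, while most of the computation happens in the convolutional layers; this is one reason dropout was applied to the fully connected part.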

### 4. Implement LeNet-5 using a deep learning framework of your choice (e.g., TensorFlow, PyTorch) and train it on a publicly available dataset (e.g., MNIST). Evaluate its performance and provide insights.

In [1]:
pip install tensorflow


In [3]:
import tensorflow as tf
from tensorflow.keras import layers, models, datasets, utils

(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [4]:
train_images = train_images[..., tf.newaxis]
test_images = test_images[..., tf.newaxis]

In [5]:
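model = models.Sequential([
    # LeNet-5-style network adapted for 28x28 MNIST input. Differences from the
    # original LeNet-5: ReLU instead of tanh/sigmoid-type activations, max pooling
    # instead of trainable average subsampling, and no padding of the digits to
    # 32x32, so the second pooling layer outputs 4x4x16 rather than 5x5x16.
    layers.Conv2D(6, (5, 5), activation='relu', input_shape=(28, 28, 1)),  # -> 24x24x6
    layers.MaxPooling2D((2, 2)),                                           # -> 12x12x6
    layers.Conv2D(16, (5, 5), activation='relu'),                          # -> 8x8x16
    layers.MaxPooling2D((2, 2)),                                           # -> 4x4x16
    layers.Flatten(),                                                      # -> 256 features
    layers.Dense(120, activation='relu'),
    layers.Dense(84, activation='relu'),
    layers.Dense(10, activation='softmax')  # one probability per digit class
])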
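To see where this model's capacity sits, the following plain-Python sketch traces the spatial size and counts trainable parameters layer by layer for the Sequential model above. The helper name `lenet_trace` is an illustrative assumption; Keras's `model.summary()` should report the same totals. The second call applies the same layers to a 32x32 input, the size used by the original LeNet-5 (whose partially connected third layer would give a different count).

```python
def lenet_trace(input_size=28, in_channels=1):
    """Return (flattened features, trainable parameters) for the LeNet-5-style model.

    Mirrors the model above: two 'valid' 5x5 convolutions with 6 and 16 filters,
    each followed by 2x2 max pooling, then Dense layers of 120, 84 and 10 units.
    """
    size, channels, params = input_size, in_channels, 0
    for filters in (6, 16):
        params += (5 * 5 * channels + 1) * filters  # kernel weights + one bias per filter
        size = size - 5 + 1                         # 'valid' 5x5 convolution
        size //= 2                                  # 2x2 max pooling, stride 2
        channels = filters
    flat = size * size * channels
    for n_in, n_out in ((flat, 120), (120, 84), (84, 10)):
        params += (n_in + 1) * n_out                # weights + biases
    return flat, params

print(lenet_trace(28))  # (256, 44426): the MNIST model in this notebook
print(lenet_trace(32))  # (400, 61706): the same layers on a 32x32 input
```

Most of the parameters are in the first Dense layer (30,840 of 44,426 for the 28x28 input), which is typical of LeNet-style networks.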

In [6]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [7]:
model.fit(train_images, train_labels, epochs=10, batch_size=128, validation_split=0.1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

In [8]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print("Test accuracy:", test_acc)

Test accuracy: 0.9876999855041504
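The notebook reports only the overall test accuracy (about 98.8%). One way to provide the requested insights is a per-class breakdown showing which digits the model confuses. This NumPy sketch is self-contained and uses toy labels; the function name `per_class_report` is an assumption. In the notebook you would pass `test_labels` and `model.predict(test_images).argmax(axis=1)`.

```python
import numpy as np

def per_class_report(y_true, y_pred, num_classes=10):
    """Confusion matrix (rows = true class, columns = predicted) and per-class accuracy."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    per_class_acc = cm.diagonal() / cm.sum(axis=1).clip(min=1)  # avoid division by zero
    return cm, per_class_acc

# Toy labels to show the output format.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm, acc = per_class_report(y_true, y_pred, num_classes=3)
print(cm)   # [[1 1 0] [0 2 0] [1 0 1]]
print(acc)  # [0.5 1.  0.5]
```

Large off-diagonal entries point to specific digit pairs (for example, 4 versus 9) that are worth inspecting visually.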


## TOPIC: Analyzing AlexNet

### 1. Present an overview of the AlexNet architecture.

Here is an overview of the AlexNet architecture:

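1. **Input Layer**: The AlexNet architecture takes an input image of size 227x227x3 (RGB color channels). The images are preprocessed to this fixed size before being fed into the network. The original paper states 224x224x3, but 227x227 is the size that makes the first layer (11x11 kernels, stride 4) produce its 55x55 output: (227 - 11)/4 + 1 = 55.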

2. **Convolutional Layers**: The network consists of five convolutional layers, each followed by a Rectified Linear Unit (ReLU) activation function. The early convolutional layers use large kernels (11x11, then 5x5), while the deeper layers use smaller 3x3 kernels. This arrangement helps capture both local and more global patterns in the image.

3. **Max Pooling Layers**: Max-pooling layers follow the first, second, and fifth convolutional layers. Max pooling is applied with a 3x3 window and a stride of 2, reducing the spatial dimensions of the feature maps. Max pooling helps control overfitting and introduces some degree of translation invariance.

4. **Normalization Layers**: Local Response Normalization (LRN) layers are used after the first two convolutional layers. LRN normalizes each neuron's response by the activity of neurons in neighboring feature maps at the same spatial location. This local competition between neurons aids generalization and creates more robust representations.

5. **Dropout**: Dropout with a rate of 0.5 is applied to the first two fully connected layers. During training it randomly deactivates half of their neurons, reducing overfitting and improving generalization.

6. **Fully Connected Layers**: The last part of the network consists of three fully connected layers. The first two have 4096 neurons each, and the third has 1000 neurons, one per class in the ImageNet dataset. The last layer is followed by a softmax function that converts its outputs into class probabilities.

7. **Output Layer**: The final output is a probability distribution over the 1000 classes of the ImageNet dataset.

8. **Training Details**: AlexNet was trained using stochastic gradient descent (SGD) with momentum 0.9, weight decay 0.0005, and a batch size of 128. It also employed data augmentation (random crops, horizontal flips, and a PCA-based alteration of color intensities) to increase the diversity of the training data.
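The AlexNet paper gives its update rule explicitly: v ← 0.9·v − 0.0005·ε·w − ε·∂L/∂w, then w ← w + v, where ε is the learning rate (initialized to 0.01 and divided by 10 when the validation error stopped improving). The NumPy sketch below applies that rule to a toy quadratic loss; the function name and the toy problem are assumptions. Keras's `tf.keras.optimizers.SGD` with `momentum=0.9` uses the same velocity form, but without the weight-decay term shown here.

```python
import numpy as np

def alexnet_sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=5e-4):
    """One update in the form given in the AlexNet paper:
    v <- momentum * v - weight_decay * lr * w - lr * grad
    w <- w + v
    """
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v

# Toy problem: minimize L(w) = 0.5 * ||w - target||^2, whose gradient is w - target.
target = np.array([3.0, -2.0])
w, v = np.zeros(2), np.zeros(2)
for _ in range(2000):
    w, v = alexnet_sgd_step(w, v, grad=w - target)

# Converges to target / (1 + weight_decay): weight decay pulls w slightly toward zero.
print(w)
```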

### 2. Explain the architectural innovations introduced in AlexNet that contributed to its breakthrough performance.

AlexNet's breakthrough performance can be attributed to several key architectural innovations that pushed the boundaries of deep learning and image classification. Here are the main architectural innovations introduced in AlexNet:

1. **Deep Architecture**: One of the most significant contributions of AlexNet was its depth: five convolutional and three fully connected layers, making it deeper and much larger than earlier CNNs such as LeNet-5. At the time, most competitive image classifiers combined hand-engineered features with shallow classifiers. By going deeper, AlexNet was able to learn hierarchical features from raw pixels, capturing complex patterns and representations.

2. **ReLU Activation Function**: AlexNet used Rectified Linear Units (ReLU) as the activation function after every convolutional and fully connected layer, instead of traditional activation functions like sigmoid or tanh. ReLU is non-linear but does not saturate for positive inputs, which mitigates the vanishing gradient problem, improves gradient flow during backpropagation, and leads to faster convergence.

3. **Large Convolutional Filters**: AlexNet used large filters (11x11 and 5x5) in the early convolutional layers. Together with the stride of 4 in the first layer, these filters cover a wide region of the input while rapidly reducing its spatial resolution. Later architectures such as VGG instead favored stacks of small 3x3 filters.

4. **Multiple GPUs for Training**: AlexNet was one of the first deep learning models to leverage multiple GPUs for parallel processing during training. This allowed faster computation and reduced the training time significantly. The ability to distribute computations across multiple GPUs was a key factor in handling the computational demands of a deep network like AlexNet.

5. **Overlapping Max Pooling**: In the max-pooling layers, AlexNet used overlapping pooling regions (3x3 windows with a stride of 2), whereas pooling regions had traditionally not overlapped. The authors reported that overlapping pooling slightly reduced the error rate compared with non-overlapping 2x2 pooling and made the model slightly harder to overfit.

6. **Local Response Normalization (LRN)**: The Local Response Normalization (LRN) layers were used after the first two convolutional layers. LRN normalizes each neuron's response by the summed squared activity of neighboring feature maps at the same spatial location. It introduced local competition between neurons, which aids generalization and helps create more robust representations.

7. **Dropout Regularization**: AlexNet applied dropout, then a recently proposed regularization technique, to the first two fully connected layers. Dropout randomly drops out neurons during training, preventing the network from relying too much on any single feature and reducing overfitting.

8. **Data Augmentation**: AlexNet used data augmentation techniques such as random cropping, horizontal flipping, and color alteration during training. Data augmentation increases the diversity of the training data, making the model more robust and reducing overfitting.
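Dropout is simple to state in code. This NumPy sketch follows the scheme described in the AlexNet paper: during training each unit is zeroed with probability 0.5, and at test time all units are used but their outputs are multiplied by 0.5 so that their expected value matches training. The function names are assumptions. Modern frameworks, including Keras's `Dropout` layer, use the equivalent "inverted" form, which instead scales the kept units by 1/(1 - p) during training and leaves test-time outputs unchanged.

```python
import numpy as np

def dropout_train(x, p=0.5, rng=None):
    """Training time: zero each unit independently with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p  # keep a unit with probability 1 - p
    return x * mask

def dropout_test(x, p=0.5):
    """Test time: keep every unit but scale by (1 - p) to match the training expectation."""
    return x * (1.0 - p)

rng = np.random.default_rng(0)
activations = np.ones(100_000)
print(dropout_train(activations, rng=rng).mean())  # close to 0.5
print(dropout_test(activations).mean())            # 0.5
```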

### 3. Discuss the role of convolutional layers, pooling layers, and fully connected layers in AlexNet.

In AlexNet, convolutional layers, pooling layers, and fully connected layers play distinct roles in the process of learning hierarchical representations from raw image pixels and making predictions. Each type of layer contributes to the success of AlexNet in image classification tasks:

**1. Convolutional Layers:**
- Convolutional layers are the foundational building blocks of CNNs. In AlexNet, there are five convolutional layers.
- The role of convolutional layers is to apply convolutional filters (also known as kernels) to the input image. These filters slide over the input image, detecting local patterns such as edges, corners, and textures.
- Each filter learns to detect a specific feature, and the combination of multiple filters in different layers enables the network to learn hierarchical representations.
- In AlexNet, the initial convolutional layers use large kernels (11x11 and 5x5) and capture lower-level features. The deeper convolutional layers use smaller 3x3 kernels, but because each layer builds on the output of the previous ones, their neurons respond to larger regions of the input image and capture more complex and abstract features.

**2. Pooling Layers:**
- Max-pooling layers follow the first, second, and fifth convolutional layers in AlexNet.
- The primary role of pooling layers is to reduce the spatial dimensions of the feature maps produced by the convolutional layers. This reduces the computational complexity of the network and makes it more efficient.
- AlexNet uses max pooling with 3x3 windows and a stride of 2: each window's maximum is kept and the other values are discarded. This down-samples the feature maps and retains the most salient information.
- Pooling layers introduce a degree of translation invariance, meaning that the network can still recognize patterns when their positions shift slightly in the input.

**3. Fully Connected Layers:**
- After the convolutional and pooling layers, there are three fully connected layers in AlexNet.
- Fully connected layers act as a classifier, taking the high-level representations learned by the preceding layers and mapping them to the output classes.
- These layers are fully connected because each neuron in a fully connected layer is connected to all the neurons in the previous layer, forming a traditional multi-layer perceptron (MLP) architecture.
- The fully connected layers in AlexNet enable the network to learn global patterns and correlations across the extracted features from the earlier layers.
- The last fully connected layer in AlexNet outputs the probability distribution over the classes, and the class with the highest probability is considered the final prediction.
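The claim that deeper layers see more of the image, despite their smaller kernels, can be checked with the standard receptive-field recurrence r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where j is the spacing, in input pixels, between adjacent positions of the current feature map. This plain-Python sketch applies it to AlexNet's kernel sizes and strides; the function name is an assumption, and padding is ignored because it does not change the receptive field size.

```python
def receptive_fields(layers):
    """Receptive field (in input pixels) after each (name, kernel_size, stride) layer."""
    r, j, result = 1, 1, []
    for name, k, s in layers:
        r = r + (k - 1) * j  # each kernel tap adds j input pixels of context
        j = j * s            # striding spreads later positions further apart
        result.append((name, r))
    return result

alexnet = [
    ("conv1", 11, 4), ("pool1", 3, 2),
    ("conv2", 5, 1), ("pool2", 3, 2),
    ("conv3", 3, 1), ("conv4", 3, 1), ("conv5", 3, 1), ("pool5", 3, 2),
]
for name, r in receptive_fields(alexnet):
    print(f"{name}: {r}x{r}")
# conv1: 11x11 ... conv5: 163x163, pool5: 195x195
```

So although conv3 to conv5 use 3x3 kernels, each of their neurons responds to a region of roughly 100 to 160 pixels on a side of the input.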

### 4. Implement AlexNet using a deep learning framework of your choice and evaluate its performance on a dataset of your choice.

In [1]:
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load and preprocess CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)



In [2]:
def alexnet_model():
    # AlexNet-style network that reuses the ImageNet layer settings on 32x32
    # CIFAR-10 images. Unlike the original, it has no LRN layers and no pooling
    # after the fifth convolution. Comments give output sizes for a 32x32 input.
    model = models.Sequential()
    
    # 1st Convolutional Layer: 96 kernels of 11x11, stride 4 -> 6x6x96
    model.add(layers.Conv2D(96, (11, 11), strides=(4, 4), activation='relu', input_shape=(32, 32, 3)))
    model.add(layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))  # -> 2x2x96
    
    # 2nd Convolutional Layer -> 2x2x256
    model.add(layers.Conv2D(256, (5, 5), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same'))  # -> 1x1x256
    
    # 3rd Convolutional Layer (the feature map is already 1x1 here) -> 1x1x384
    model.add(layers.Conv2D(384, (3, 3), padding='same', activation='relu'))
    
    # 4th Convolutional Layer -> 1x1x384
    model.add(layers.Conv2D(384, (3, 3), padding='same', activation='relu'))
    
    # 5th Convolutional Layer -> 1x1x256
    model.add(layers.Conv2D(256, (3, 3), padding='same', activation='relu'))
    
    # Fully Connected Layers, with dropout after the first two as in AlexNet
    model.add(layers.Flatten())  # -> 256 features
    model.add(layers.Dense(4096, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(4096, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation='softmax'))  # one output per CIFAR-10 class
    
    return model

In [3]:
# Create and compile the model
model = alexnet_model()
model.compile(loss='categorical_crossentropy', optimizer=optimizers.Adam(learning_rate=0.001), metrics=['accuracy'])



In [4]:
# Train the model
model.fit(x_train, y_train, batch_size=128, epochs=4, validation_split=0.1)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4

In [5]:
# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=1)
print("Test accuracy:", test_accuracy)

Test accuracy: 0.4000999927520752
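A test accuracy of about 40% is low for CIFAR-10. The notebook does not diagnose this, and only 4 epochs were run, but one likely contributor is visible from the layer arithmetic: the first layer's 11x11 kernels with stride 4 were designed for 227x227 images, and on 32x32 inputs the feature maps collapse to 1x1 after the second pooling layer. The plain-Python sketch below traces the spatial sizes using the output-size rules Keras applies for `'valid'` and `'same'` padding; the helper names are assumptions, and the layer list mirrors `alexnet_model()` above.

```python
import math

def out_size(n, k, s, padding):
    """Spatial output size of a Keras Conv2D/MaxPooling2D layer."""
    if padding == "same":
        return math.ceil(n / s)
    return (n - k) // s + 1  # 'valid'

# (name, kernel, stride, padding) as in alexnet_model() above
layers_cfg = [
    ("conv1", 11, 4, "valid"), ("pool1", 3, 2, "valid"),
    ("conv2", 5, 1, "same"), ("pool2", 3, 2, "same"),
    ("conv3", 3, 1, "same"), ("conv4", 3, 1, "same"), ("conv5", 3, 1, "same"),
]

def trace(n):
    sizes = []
    for name, k, s, padding in layers_cfg:
        n = out_size(n, k, s, padding)
        sizes.append((name, n))
    return sizes

print(trace(32))   # CIFAR-10: 6, 2, 2, then 1x1 from pool2 onward
print(trace(227))  # same layers on a 227x227 input: 55, 27, 27, 14, 14, 14, 14
```

With a 1x1 feature map, the 3x3 kernels of conv3 to conv5 see only a single position surrounded by zero padding, so the network has almost no spatial structure left to learn from. Adapting the first layer (smaller kernels and stride) or upsampling the images are common ways to address this, though the notebook does not test them.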
