# TOPIC: Understanding Pooling and Padding in CNN


## 1. Describe the purpose and benefits of pooling in CNN.

In [None]:
'''Pooling, in Convolutional Neural Networks (CNNs), serves the purpose of reducing the spatial dimensions of the input volume, 
leading to a smaller representation while retaining important information. There are several types of pooling, with max pooling and average 
pooling being the most common.

Purpose:
    Dimensionality Reduction: 
        By downsampling the input feature maps, pooling reduces the computation in later layers and the number of parameters in any 
        fully-connected layers that follow (pooling itself has no learnable parameters). This helps control overfitting and computational cost.

    Translation Invariance: 
        Pooling helps in achieving a degree of translation invariance. It means that even if the position of a feature in the input changes 
        slightly, the network can still recognize it due to pooling summarizing the presence of features in a local neighborhood.

Benefits:
    Computational Efficiency: 
        Pooling reduces the computational load by decreasing the spatial dimensions, making subsequent operations faster and more manageable.

    Feature Generalization: 
        It extracts the most important features while discarding less relevant ones, aiding the network in focusing on significant patterns.

    Robustness to Variations: 
        Pooling helps in making the network less sensitive to small variations in the input, enhancing its ability to recognize features 
        regardless of their exact location.

    Memory Efficiency: 
        Smaller feature maps post-pooling reduce memory requirements, facilitating easier storage and manipulation of data.

However, it's worth noting that some newer architectures rely less on pooling layers. ResNet, for example, performs most of its downsampling 
with strided convolutions and keeps only an initial max pooling layer and a final global average pooling layer, while Inception-style networks 
combine pooling branches with strided convolutions. These designs still reduce spatial dimensions, but they let the network learn how to 
downsample instead of applying a fixed rule, potentially losing less information than traditional pooling layers.'''
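
The downsampling described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a framework implementation: the helper name `pool2d` and the 4x4 example values are made up for this sketch, and it assumes non-overlapping windows (stride equal to the window size).

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling (stride == window size) over a 2D array."""
    h, w = feature_map.shape
    # Crop so the height and width are exact multiples of the window size.
    h_out, w_out = h // size, w // size
    cropped = feature_map[:h_out * size, :w_out * size]
    # Split into (h_out, size, w_out, size) blocks, then reduce each block.
    blocks = cropped.reshape(h_out, size, w_out, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical 4x4 feature map used only for illustration.
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
], dtype=float)

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="avg")
print(max_pooled)  # 2x2 result: each entry summarizes one 2x2 block
print(avg_pooled)
```

A 4x4 map becomes a 2x2 map, so every later layer processes a quarter of the values.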

## 2. Explain the difference between min pooling and max pooling.

In [None]:

'''Min pooling and max pooling are both types of pooling operations used in Convolutional Neural Networks (CNNs) to downsample feature maps, 
reducing their spatial dimensions. However, they operate differently in terms of how they aggregate information within the pooling window.

Max Pooling:
    Operation: 
        Max pooling takes the maximum value from each sub-region of the input feature map within a defined window (often 2x2 or 3x3) and outputs 
        only the maximum value.

    Purpose: 
        It captures the most active feature within the window, emphasizing the presence of specific features. Max pooling is effective in 
        preserving prominent features, enhancing robustness to translations and variations in the input data.

    Example: 
        Given a 2x2 pooling window, if the values within that window are [3, 5, 2, 4], max pooling would output the value 5 (the maximum value
        in the window).

Min Pooling:
    Operation: 
        Min pooling, on the other hand, takes the minimum value from each sub-region of the input feature map within the defined window and 
        outputs only the minimum value.

    Purpose: 
        It focuses on capturing the least active feature within the window, highlighting the absence or minimum presence of specific features.

    Example: 
        Using the same 2x2 pooling window as before with values [3, 5, 2, 4], min pooling would output the value 2 (the minimum value in the 
        window).

Differences:
    Aggregation: 
        Max pooling captures the most prominent features, while min pooling focuses on the least active or minimum features within the pooling 
        window.
 
    Use Cases: 
        Max pooling is commonly used in CNN architectures due to its ability to capture strong, salient features, whereas min pooling is less 
        frequently used and might be more suitable for specific applications where the absence of certain features is crucial.

While max pooling is more prevalent in CNN architectures due to its effectiveness in capturing important features, the choice between max and min
pooling (or using average pooling) often depends on the specific requirements and characteristics of the task at hand.'''
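
The window example above, values [3, 5, 2, 4], can be checked directly. The sketch below also shows one common workaround, stated here as an assumption about typical usage rather than something from the text: frameworks that only ship a max pooling layer can emulate min pooling by negating the input, max pooling, and negating the result.

```python
import numpy as np

# The 2x2 window from the example above.
window = np.array([[3, 5],
                   [2, 4]])

max_value = int(window.max())   # max pooling keeps the strongest activation
min_value = int(window.min())   # min pooling keeps the weakest activation

# min(x) == -max(-x), so a max pooling layer can be reused for min pooling.
min_via_max = int(-(-window).max())

print(max_value, min_value, min_via_max)
```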


## 3. Discuss the concept of padding in CNN and its significance.

In [None]:
'''Padding in Convolutional Neural Networks (CNNs) refers to the technique of adding extra bordering pixels around the input data before applying
convolution operations. This additional border of pixels is typically filled with zeros (zero-padding), although other padding methods like
reflection padding or replication padding can also be used.

Significance of Padding:
    Preservation of Spatial Information:
        Prevents Information Loss: Without padding, the spatial dimensions of the input feature maps reduce with each convolutional layer. 
        Padding helps retain the spatial dimensions, especially at deeper layers, preserving more spatial information.
    Control over Output Size:
        Desired Output Dimensions: Padding allows control over the spatial dimensions of the output feature maps after convolution. By adjusting
        the amount of padding, it's possible to obtain specific output dimensions.
    Mitigating Border Effects:
        Reducing Border Effects: Without padding, pixels near the border are covered by fewer kernel positions than pixels in the center, so 
        information at the edges is under-represented. Padding lets the kernel be centered on border pixels as well, which reduces this bias 
        and the resulting artifacts.
    Facilitating Feature Learning:
        Enabling Convolutional Operations: Padding ensures that features at the borders of the input are fully considered during convolution, 
        enabling the network to learn features at different positions effectively.
    Compatibility with Stride and Kernel Sizes:
        Flexible Use of Stride and Kernel: Padding enables the use of larger kernel sizes or strides without drastically reducing the spatial 
        dimensions of the feature maps. This flexibility in architectural choices aids in learning hierarchical representations of data.
    
Types of Padding:
    Valid (No Padding): No padding is added, and convolution is performed only where the input and the kernel fully overlap, leading to reduction
    in spatial dimensions.

    Same Padding: It adds enough padding to the input so that, with stride 1, the output feature map has the same spatial dimensions as the 
    input. For an odd kernel size k, this means padding (k - 1) / 2 pixels on each side, e.g. 1 pixel for a 3x3 kernel.

Padding is a crucial concept in CNNs, as it impacts the network's ability to learn features effectively, control output dimensions, and mitigate 
issues related to border effects, ultimately contributing to the network's performance and robustness.'''
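
The three fill strategies mentioned above (zero, reflection, and replication padding) can be compared with `np.pad`. This is a small sketch on a made-up 3x3 input; the mode names `constant`, `reflect`, and `edge` are NumPy's, and deep learning frameworks may name the same ideas differently.

```python
import numpy as np

# Hypothetical 3x3 input used only for illustration.
x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# One pixel of padding on every side turns the 3x3 input into 5x5.
zero_padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)
reflect_padded = np.pad(x, pad_width=1, mode="reflect")   # mirror without repeating the edge
replicate_padded = np.pad(x, pad_width=1, mode="edge")    # repeat the edge values

print(zero_padded)
print(reflect_padded)
print(replicate_padded)
```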

## 4. Compare and contrast zero-padding and valid-padding in terms of their effects on the output feature map size.

In [None]:

'''Zero-padding and valid-padding are two common techniques used in Convolutional Neural Networks (CNNs) that have contrasting effects on the output
feature map size.

Zero-padding:
    Effect on Output Size:
        Preserves Output Size: Zero-padding adds extra rows and columns of zeros around the input feature map.
        Maintains Spatial Dimensions: With stride 1 and a padding of (k - 1) / 2 pixels per side for a kxk kernel, the output feature map 
        has the same spatial dimensions as the input.
    
    Example:
        Consider an input feature map of size 5x5 and a convolutional kernel of size 3x3. Applying zero-padding of size 1 (one pixel wide on each
        side) would result in a padded input size of 7x7.
        Convolution with a 3x3 kernel on this padded input would yield a 5x5 output feature map, maintaining the spatial dimensions of the input.

Valid-padding:
    Effect on Output Size:
        Reduces Output Size: Valid-padding (also called no padding) does not add any extra bordering pixels to the input.
        Decreases Spatial Dimensions: Without any padding, the spatial dimensions of the output feature map reduce compared to the input.

    Example:
        For the same example of a 5x5 input feature map and a 3x3 kernel, if valid-padding is used (no padding), the convolution operation would 
        be performed without adding any extra padding.
        Applying the 3x3 kernel on the 5x5 input would result in a 3x3 output feature map due to the reduction in spatial dimensions caused by the
        absence of padding.

Comparison:
    Output Size Maintenance: 
        Zero-padding maintains the spatial dimensions of the input in the output, while valid-padding reduces the output size by performing 
        convolution without any additional padding.

    Control over Output Size: 
        Zero-padding allows for controlling the output size by adjusting the amount of padding added, while valid-padding results in a reduced 
        output size compared to the input.

In summary, zero-padding is often used when preservation of spatial dimensions in the output feature maps is desired, while valid-padding is used 
when reducing spatial dimensions is acceptable or desired for downsampling or extracting key features.'''
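
The two examples above follow from the standard output-size formula, floor((n + 2p - k) / s) + 1, for input size n, kernel size k, padding p, and stride s. The helper below is a minimal sketch; the function name `conv_output_size` is made up for illustration.

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Output length along one spatial dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1

# 5x5 input, 3x3 kernel, as in the examples above.
same_size = conv_output_size(5, 3, padding=1)    # zero-padding of 1 pixel per side
valid_size = conv_output_size(5, 3, padding=0)   # valid (no padding)

print(same_size, valid_size)
```

With padding 1 the 5x5 input stays 5x5; without padding it shrinks to 3x3.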


In [1]:
import numpy as np

# Define a 1D input feature map
input_feature_map = [1, 2, 3]

# Define the 1D filter (note: np.convolve flips the filter before sliding it,
# unlike the cross-correlation used by CNN layers)
filters = [0, 1, 0.5]

# 'same' mode zero-pads the input so the output has the same length as the input
output_feature_map_zero_padding = np.convolve(input_feature_map, filters, mode='same')

# Print the output feature map (3 values, same length as the input)
print(output_feature_map_zero_padding)

# 'valid' mode adds no padding, so only fully overlapping positions are kept
output_feature_map_valid_padding = np.convolve(input_feature_map, filters, mode='valid')

# Print the output feature map (3 - 3 + 1 = 1 value)
print(output_feature_map_valid_padding)

[1.  2.5 4. ]
[2.5]


# TOPIC: Exploring LeNet



## 1. Provide a brief overview of LeNet-5 architecture.

In [None]:
'''LeNet-5 is a pioneering Convolutional Neural Network (CNN) architecture developed by Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner
in the 1990s. It was one of the earliest successful CNN models and played a significant role in advancing deep learning and computer
vision. LeNet-5 was primarily designed for handwritten digit recognition, for example reading digits on bank checks and postal mail.

Structure:

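    Input: 32x32 grayscale image
    Layers:
        3 convolutional layers (C1, C3, C5): 6, 16, and 120 feature maps, each with 5x5 filters. The original paper uses tanh activations; 
        ReLU is a common modern substitute. C5 sees a 5x5 input, so it behaves like a fully-connected layer with 120 units.
        2 subsampling layers (S2, S4): Each using 2x2 average pooling with stride 2
        1 flattening layer: Converts the feature maps to a vector
        Fully-connected layers: 84 neurons (F6); counting C5 as a dense layer gives the familiar 120 and 84 neuron layers
        Output: 10 neurons, one per digit (RBF units in the original paper, softmax in modern reimplementations)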

Key Points:

    Small and efficient: Suitable for limited computational resources compared to modern CNNs
    Emphasis on local features: Achieved through small receptive fields of 5x5 filters
    Adaptive feature extraction: Utilized subsampling to reduce dimensionality and capture global features
    Fully-connected layers: Performed final classification based on extracted features

Significance:

    Demonstrated the effectiveness of CNNs for image recognition tasks
    Paved the way for the development of more complex and powerful CNN architectures
    Remains a valuable tool for understanding the basic principles of CNNs

Applications:

    Handwritten digit recognition
    Image classification
    Feature extraction

Limitations:

    Limited to small input sizes
    Less powerful than modern CNNs for complex tasks

Overall, LeNet-5 represents a significant milestone in the history of deep learning and laid the groundwork for the widespread adoption of CNNs 
in various applications.'''

## 2. Describe the key components of LeNet-5 and their respective purposes.


In [None]:
'''Key Components of LeNet-5 and their Purposes:

1. Input Layer:

    Purpose: This layer accepts the input image, typically a 32x32 grayscale image representing a handwritten digit.
    Function: It preprocesses the input image and prepares it for subsequent processing by the network.

2. Convolutional Layers:

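    Number: LeNet-5 has 3 convolutional layers (C1, C3, and C5 in the original paper). C5 receives a 5x5 input, so it acts as a 
    fully-connected layer and is usually implemented as a 120-unit dense layer.
    Purpose: Extract local features from the input image.
    Function: Each layer applies a set of filters (5x5 in this case) that slide across the image, extracting features like edges, corners, and 
    textures.
    Activation: The original LeNet-5 uses a scaled tanh activation in these layers; modern reimplementations often substitute ReLU. Either 
    way, the activation introduces non-linearity and helps the network learn complex features.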

3. Subsampling Layers:

    Number: LeNet-5 has 2 subsampling layers.
    Purpose: Reduce the dimensionality of the feature maps and capture global features.
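    Function: These layers use 2x2 average pooling with stride 2, which halves the height and width of each feature map while preserving 
    important information. In the original paper the pooled value is also scaled by a trainable coefficient, offset by a trainable bias, and 
    passed through a squashing function.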

4. Flattening Layer:

    Purpose: Convert the feature maps into a single vector
    Function: This layer transforms the multi-dimensional feature maps (e.g., 5x5x16) into a one-dimensional vector (e.g., 400) suitable for
    input to fully-connected layers.

5. Fully-Connected Layers:

    Number: LeNet-5 has two fully-connected layers with 120 and 84 neurons, respectively (the 120-neuron layer is the convolutional layer C5
    in the original paper, which is equivalent to a fully-connected layer because its input is 5x5).
    Purpose: Combine features extracted by convolutional layers and make the final classification decision.
    Function: These layers apply learned linear transformations followed by a non-linear activation, allowing the network to learn complex 
    relationships between features and the output classes (10 digits in this case).
    Activation: The original LeNet-5 uses tanh in both layers; modern reimplementations typically use ReLU in both, with softmax applied
    in the output layer.

6. Output Layer:

    Number: LeNet-5 has a single output layer with 10 neurons.
    Purpose: Predict the class label of the input image.
    Function: Each neuron in the output layer represents a digit (0-9). The network calculates the probability that the input image belongs to 
    each class and outputs the class with the highest probability.
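    Activation: Modern reimplementations of LeNet-5 use the softmax activation function in this layer, which normalizes the outputs into 
    probabilities that sum to 1. The original paper instead used Euclidean radial basis function (RBF) output units.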

Summary:

    Each component of LeNet-5 plays a crucial role in the overall process of recognizing handwritten digits. The convolutional layers extract 
    local features, subsampling layers capture global information, fully-connected layers combine features and learn complex relationships, and 
    finally, the output layer makes the final prediction. This architecture highlights the fundamental principles of CNNs and demonstrates their 
    effectiveness in image recognition tasks.'''
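
To make the flow through these components concrete, the sketch below traces feature map sizes for a 32x32 input. It is an illustration under stated assumptions: the layer names C1, S2, C3, S4, C5 follow the original paper's naming, no padding is used, and the 120-unit layer is written as a 5x5 convolution (which on a 5x5 input is equivalent to a fully-connected layer). The helper `out_size` is made up for this example.

```python
def out_size(n, k, stride=1, padding=0):
    """Output length along one spatial dimension."""
    return (n + 2 * padding - k) // stride + 1

# (name, kernel size, stride, output channels)
lenet_layers = [
    ("C1 conv 5x5", 5, 1, 6),
    ("S2 pool 2x2", 2, 2, 6),
    ("C3 conv 5x5", 5, 1, 16),
    ("S4 pool 2x2", 2, 2, 16),
    ("C5 conv 5x5", 5, 1, 120),
]

size = 32
trace = [("input", size, 1)]
for name, kernel, stride, channels in lenet_layers:
    size = out_size(size, kernel, stride)
    trace.append((name, size, channels))

for name, side, channels in trace:
    print(f"{name:12s} -> {side}x{side}x{channels}")
```

The S4 output is 5x5x16, so flattening at that point yields a 400-element vector.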

## 3. Discuss the advantages and limitations of LeNet-5 in the context of image classification tasks.

In [None]:
'''Advantages of LeNet-5:
    
    Simplicity and efficiency: 
        LeNet-5 has a relatively small number of parameters and layers, making it computationally efficient and suitable for limited resources 
        compared to modern CNNs.
    Focus on local features: 
        The use of small receptive fields in the convolutional layers allows LeNet-5 to effectively capture local features in images, which are 
        crucial for tasks like handwritten digit recognition.
    Adaptive feature extraction: 
        Subsampling layers downsample the feature maps while preserving important information, allowing LeNet-5 to capture both local and global 
        features.
    Demonstrated effectiveness: 
        LeNet-5 achieved state-of-the-art results on handwritten digit recognition tasks, paving the way for the development of more complex CNNs.
    Educational value: 
        Due to its simplicity, LeNet-5 remains a valuable tool for understanding the basic principles of CNNs and serves as a foundational model 
        for further research.
    Potential for transfer learning: 
        The pre-trained features extracted by LeNet-5 can be used as input to other models, improving their performance on similar tasks.

Limitations of LeNet-5:

    Limited to small input sizes:
        LeNet-5 is only suitable for processing small images (e.g., 32x32) and cannot handle high-resolution images without significant 
        modifications.
    Less powerful than modern CNNs: 
        Due to its limited complexity, LeNet-5 performs poorly on complex image classification tasks compared to modern architectures like VGG or
        ResNet.
    Overfitting risk: 
        The relatively small number of parameters in LeNet-5 can lead to overfitting, especially when trained on small datasets.
    Limited representational capacity: 
        The shallow architecture of LeNet-5 restricts its ability to learn complex relationships between features, which can hinder performance on
        tasks with high intra-class variability.
    Not suitable for object detection or segmentation: 
        LeNet-5 was designed for digit recognition and cannot be directly applied to tasks like object detection or segmentation without 
        significant modifications.

Conclusion:
    Despite its limitations, LeNet-5 remains a significant milestone in the history of CNNs. Its simplicity, efficiency, and effectiveness in 
    specific tasks make it a valuable tool for learning and understanding the fundamentals of these powerful models. However, it is important to 
    acknowledge that LeNet-5 is no longer competitive for most modern image classification tasks. More complex and advanced architectures have 
    emerged that offer significantly improved performance on a wider range of applications.'''
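
The "small number of parameters" claim can be checked with simple arithmetic. The sketch below counts weights and biases for a common simplified LeNet-5 variant. Its assumptions are labeled loudly: C3 is connected to all six input maps (the original paper used a sparse connection table), the pooling layers carry no trainable parameters, and the output is a plain dense layer. The helper names are made up for this example.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus one bias per output channel."""
    return (kernel * kernel * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return (n_in + 1) * n_out

# Assumed simplified variant: full C3 connectivity, parameter-free pooling.
counts = {
    "C1": conv_params(5, 1, 6),
    "C3": conv_params(5, 6, 16),
    "C5": conv_params(5, 16, 120),
    "F6": dense_params(120, 84),
    "output": dense_params(84, 10),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:6s} {n:6d}")
print("total ", total)
```

Under these assumptions the total is on the order of 60 thousand parameters, most of them in C5.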

## 4. Implement LeNet-5 using a deep learning framework of your choice (e.g., TensorFlow, PyTorch) and train it on a publicly available dataset (e.g., MNIST). Evaluate its performance and provide insights.

In [3]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras import layers, models

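# Load MNIST: 60,000 training and 10,000 test images of size 28x28
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Add a channel axis so each image has shape (28, 28, 1)
train_images = train_images.reshape(-1, 28, 28, 1)
test_images = test_images.reshape(-1, 28, 28, 1)
# Scale pixel values from [0, 255] to [0, 1]
train_images, test_images = train_images / 255.0, test_images / 255.0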

In [4]:
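# LeNet-5-style model adapted to 28x28 MNIST inputs (the original expects 32x32 inputs and uses tanh)
model = models.Sequential([
    # 28x28x1 -> 24x24x6 (5x5 kernel, no padding)
    layers.Conv2D(filters=6, kernel_size=(5, 5), activation='relu', input_shape=(28, 28, 1)),
    # 24x24x6 -> 12x12x6
    layers.AveragePooling2D(pool_size=(2, 2)),
    # 12x12x6 -> 8x8x16
    layers.Conv2D(filters=16, kernel_size=(5, 5), activation='relu'),
    # 8x8x16 -> 4x4x16
    layers.AveragePooling2D(pool_size=(2, 2)),
    # 4x4x16 -> vector of length 256
    layers.Flatten(),
    layers.Dense(120, activation='relu'),
    layers.Dense(84, activation='relu'),
    # One probability per digit class
    layers.Dense(10, activation='softmax')
])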

In [5]:
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x294afee90>

In [6]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test loss:', test_loss)
print('Test accuracy:', test_acc)


Test loss: 0.03432998061180115
Test accuracy: 0.9889000058174133


In [None]:
'''Insights:

This implementation achieves a test accuracy of about 98.9% after 10 epochs. This demonstrates the effectiveness of LeNet-5 for simple image 
classification tasks like MNIST digit recognition. However, this performance is expected and not particularly remarkable compared to
modern CNNs, which can often achieve 99.7% or higher accuracy on MNIST.

Here are some additional insights:

    LeNet-5's simplicity makes it ideal for educational purposes and understanding CNN fundamentals.
    Its computational efficiency allows it to be deployed on limited resources.
    However, its limitations in representing complex relationships and handling high-resolution images restrict its applicability to many modern 
    tasks.

Overall, LeNet-5 serves as a valuable historical reference and foundational model for CNNs. While not the most powerful architecture today, it 
offers valuable lessons and provides a basis for understanding more complex and effective models.'''

# TOPIC: Analyzing AlexNet

## 1. Present an overview of the AlexNet architecture.

In [None]:
'''AlexNet Architecture Overview:
A Milestone in CNN History:
    AlexNet, developed by Alex Krizhevsky et al. in 2012, was a groundbreaking convolutional neural network (CNN) that revolutionized the field of 
    image recognition. Its introduction at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 marked a significant performance 
    jump, winning the competition by a large margin and sparking a resurgence of interest in deep learning.

Key Features:

    Increased depth: 
        AlexNet boasted a significantly deeper architecture compared to previous CNNs, with 8 layers including 5 convolutional layers and 3 
        fully-connected layers. This increased depth allowed the network to learn more complex relationships between features.
    Rectified Linear Unit (ReLU) activation: 
        AlexNet popularized the use of ReLU activation in place of the traditional tanh or sigmoid functions, which sped up training 
        considerably.
    Max pooling: 
        Overlapping max pooling layers (3x3 windows with stride 2) were used to downsample feature maps, reducing computational cost and 
        improving feature generalization.
    Dropout layers: 
        Dropout randomly sets neuron outputs to zero during training, which reduces overfitting and improves generalization.
    Larger image size: 
        AlexNet used larger input images (227x227) compared to previous models, allowing it to capture more context and detail.
    Data augmentation: 
        AlexNet employed data augmentation techniques like random cropping and flipping to artificially increase the training data size, improving
        robustness.
    GPU utilization: 
        AlexNet was one of the first CNNs to utilize GPUs for training, significantly accelerating the process and making large-scale training 
        feasible.

Impact:

    Performance boost: 
        AlexNet's performance on the ILSVRC competition surpassed previous methods by a significant margin, demonstrating the potential of deep 
        learning for image recognition.
    Increased research: 
        AlexNet's success sparked a surge of interest in deep learning research, leading to the development of numerous advanced CNN architectures
        and applications.
    Foundation for future models: 
        AlexNet's key features and design principles served as a foundation for many subsequent successful CNN architectures.

Limitations:

    Computational cost: 
        Training AlexNet required significant computational resources compared to previous models, limiting its accessibility to some researchers.
    Overfitting risk:
        Despite utilizing dropout, AlexNet could still suffer from overfitting with limited training data.
    Limited task applicability: 
        While excelling at image classification, AlexNet's architecture was not directly applicable to other tasks like object detection or 
        segmentation.

Overall:

    Despite its limitations, AlexNet remains a landmark achievement in the history of computer vision and deep learning. Its innovative 
    architecture and impressive performance paved the way for the development of more powerful and versatile CNNs, shaping the modern landscape 
    of image recognition and artificial intelligence.'''
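
The dropout mechanism listed among the key features above can be sketched in NumPy. Note the assumption: this is the "inverted" dropout used by most current frameworks, which rescales the surviving activations during training. The AlexNet paper instead left training activations unscaled and multiplied outputs by 0.5 at test time. The function name `dropout` and the toy input are made up for illustration.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return activations  # inference: pass activations through unchanged
    keep_prob = 1.0 - rate
    # Boolean mask: True for units that are kept in this training step.
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((4, 1000))          # toy activations, all equal to 1
dropped = dropout(x, rate=0.5, rng=rng)

print(np.unique(dropped))       # surviving units are scaled up, dropped units are 0
print(dropped.mean())           # stays close to the original mean of 1
```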

## 2. Explain the architectural innovations introduced in AlexNet that contributed to its breakthrough performance.

In [None]:
'''AlexNet's breakthrough performance on the ILSVRC competition can be attributed to several key architectural innovations:

Increased Depth:
    Compared to previous CNNs with just a few layers, AlexNet used 8 layers, including 5 convolutional layers and 3 fully-connected layers. 
    This significant increase in depth allowed the network to learn more complex and hierarchical representations of visual features, leading to
    improved image recognition accuracy.

Rectified Linear Unit (ReLU) Activation:
    AlexNet replaced the traditional tanh or sigmoid activations with the ReLU function, max(0, x). ReLU is cheap to compute and does not 
    saturate for positive inputs, so gradients there do not shrink toward zero. This mitigates the vanishing gradient problem, which can 
    hinder training in deep networks, and the AlexNet authors reported several-fold faster training with ReLU than with tanh units in a 
    small test network.

Max Pooling:
    Max pooling layers were used to downsample feature maps, reducing the number of parameters and computational cost. This allowed AlexNet to 
    process larger images while maintaining efficiency. Additionally, max pooling helps improve feature generalization by making the network less
    sensitive to small variations in the input image.

Dropout Layers:
    AlexNet applied dropout, at the time a recently proposed technique, in its first two fully-connected layers: each neuron's output is set 
    to zero with probability 0.5 during training. This prevents neurons from co-adapting too closely, reducing overfitting and improving 
    the network's ability to generalize to unseen data.

Larger Image Size:
    While previous models used small input images, AlexNet utilized larger 227x227 images. This allows the network to capture more context and
    detail in the input, leading to better feature extraction and recognition performance.

Data Augmentation:
    AlexNet employed data augmentation techniques such as random cropping, horizontal flipping, and PCA-based alteration of RGB intensities 
    to artificially increase the size and diversity of the training data. This helps prevent overfitting and improves the network's 
    robustness to variations in real-world data.

GPU Utilization:
    AlexNet was one of the first CNNs to effectively utilize GPUs for training; the original model was split across two GTX 580 GPUs. GPUs 
    offer significantly higher throughput than CPUs for the dense arithmetic in deep networks, which made it practical to train AlexNet's 
    larger architecture on a large dataset.

Combination of Innovations:
    The combined effect of these innovations was a significant leap in performance compared to previous models. Each innovation played a crucial
    role in enabling AlexNet to learn more complex features and achieve superior image recognition accuracy.

Overall:

AlexNet's architectural innovations demonstrated the potential of deep learning for complex tasks like image recognition. These innovations paved 
the way for further research and development in the field, leading to the emergence of even more powerful and versatile CNN architectures that 
continue to shape the landscape of artificial intelligence today.'''
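
The non-saturation argument for ReLU can be illustrated by comparing derivatives. The sketch below uses only the standard library; the helper names `tanh_grad` and `relu_grad` and the sample inputs are made up for illustration.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which shrinks toward 0 as |x| grows."""
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

for x in (0.5, 2.0, 5.0):
    print(f"x={x}: tanh'={tanh_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```

For large positive inputs the tanh derivative becomes tiny, while the ReLU derivative stays at 1, so gradients passing through many ReLU layers are not repeatedly scaled down by the activation itself.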

## 3. Discuss the role of convolutional layers, pooling layers, and fully connected layers in AlexNet.

In [None]:
'''AlexNet's architecture relies on three critical types of layers, each playing a distinct role in the network's functionality:

1. Convolutional Layers:

    Function: Extract local features from the input image.
    Mechanism: Apply filters (kernels) that slide across the image, detecting specific patterns and extracting features like edges, corners, and textures.
    Impact:
        Capture local information crucial for understanding the content of the image.
        Increase the depth of the network, allowing for learning complex hierarchical representations.

2. Pooling Layers:

    Function: Reduce the dimensionality of feature maps while preserving important information.
    Mechanism: Apply functions like max pooling or average pooling to aggregate values in small regions of the feature maps.
    Impact:
        Reduce computational cost by processing smaller feature maps.
        Improve feature generalization by making the network less sensitive to small variations in the input.
        Introduce invariance to spatial translations of features.

3. Fully-Connected Layers:

    Function: Combine features extracted by convolutional layers and learn complex relationships between them.
    Mechanism: Perform linear transformations on the flattened feature maps to connect all neurons and learn global features.
    Impact:
        Enable the network to combine local features into higher-level representations for recognition.
        Learn complex relationships between features that cannot be captured by convolutional layers alone.
        Perform the final classification based on the learned features.

Interplay of Layers:

    Convolutional layers extract local features, while pooling layers summarize and downsample them.
    Fully-connected layers combine these summarized features and learn complex relationships, eventually leading to the final classification decision.
    This interplay between layers with different functionalities allows AlexNet to effectively learn hierarchical representations of visual features,
    achieving superior performance in image recognition tasks.

Overall:

    Each type of layer in AlexNet plays a crucial role in the network's performance. Convolutional layers extract local features, pooling layers reduce 
    dimensionality and introduce invariance, and fully-connected layers combine features and learn complex relationships, ultimately enabling AlexNet to 
    achieve its groundbreaking performance in image recognition.'''
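
The interplay of the three layer types shows up clearly when tracing feature map sizes. The sketch below assumes the commonly cited single-stream AlexNet configuration with a 227x227 input (kernel sizes, strides, and paddings as listed in the code). It is an illustration of the arithmetic, not a reproduction of the original two-GPU implementation, and the helper `out_size` is made up for this example.

```python
def out_size(n, k, stride=1, padding=0):
    """Output length along one spatial dimension."""
    return (n + 2 * padding - k) // stride + 1

# (name, kernel size, stride, padding, output channels)
alexnet_layers = [
    ("conv1 11x11/4", 11, 4, 0, 96),
    ("pool1 3x3/2", 3, 2, 0, 96),
    ("conv2 5x5", 5, 1, 2, 256),
    ("pool2 3x3/2", 3, 2, 0, 256),
    ("conv3 3x3", 3, 1, 1, 384),
    ("conv4 3x3", 3, 1, 1, 384),
    ("conv5 3x3", 3, 1, 1, 256),
    ("pool3 3x3/2", 3, 2, 0, 256),
]

size = 227
channels = 3
sizes = []
for name, kernel, stride, padding, channels in alexnet_layers:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)
    print(f"{name:14s} -> {size}x{size}x{channels}")

# The fully-connected layers receive the flattened output of the last pooling layer.
flattened = size * size * channels
print("flattened length:", flattened)
```

The convolutional and pooling stages shrink the input to a 6x6x256 volume, which the fully-connected layers then consume as a single vector.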

## 4. Implement AlexNet using a deep learning framework of your choice and evaluate its performance on a dataset of your choice.

In [16]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras import layers, models

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Add an explicit channel axis to match input_shape=(28, 28, 1), then scale pixels to [0, 1]
train_images = train_images.reshape(-1, 28, 28, 1).astype('float32') / 255.0
test_images = test_images.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# Standardize with the training-set mean and standard deviation (reused for the test set)
mean = train_images.mean(axis=(0, 1, 2), keepdims=True)
std = train_images.std(axis=(0, 1, 2), keepdims=True)
train_images = (train_images - mean) / std
test_images = (test_images - mean) / std

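# AlexNet-style stack scaled down to 28x28 MNIST inputs. With 'same' padding the spatial size
# goes 28 -> 7 after the stride-4 convolution, then 4 -> 2 -> 1 after the three pooling layers.
model = models.Sequential([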
    layers.Conv2D(filters=96, kernel_size=(11, 11), strides=4, padding='same', activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=(3, 3), strides=2, padding='same'),
    layers.Conv2D(filters=256, kernel_size=(5, 5), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(3, 3), strides=2, padding='same'),
    layers.Conv2D(filters=384, kernel_size=(3, 3), padding='same', activation='relu'),
    layers.Conv2D(filters=384, kernel_size=(3, 3), padding='same', activation='relu'),
    layers.Conv2D(filters=256, kernel_size=(3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(3, 3), strides=2, padding='same'),
    layers.Flatten(),
    layers.Dense(4096, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(4096, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))

test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test loss:', test_loss)
print('Test accuracy:', test_acc)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test loss: 0.09113667905330658
Test accuracy: 0.9793999791145325
