In [1]:
#Q1. Descrirbe the purpose and  benefits of pooling in CNN.
#Ans:-Purpose of Pooling in CNN:
#Pooling, specifically Max Pooling and Average Pooling, is a crucial operation in Convolutional Neural Networks (CNNs).
#The primary purposes of pooling in CNNs are:
#1.Spatial Reduction:Pooling reduces the spatial dimensions of the input feature maps, effectively downsampling them.
#This reduction helps manage computational complexity and memory requirements.

#02.Translation Invariance:
#Pooling introduces a degree of translation invariance.
# By summarizing local information (max or average values) in small regions, the network becomes less sensitive to the exact position of features within the receptive field.

#03.Feature Retention:While reducing spatial dimensions, pooling retains the most salient features.
# This ensures that important information is preserved, facilitating the learning of high-level features in deeper layers of the network.

#Benefits of Pooling in CNN:
#01.Computational Efficiency:Pooling reduces the amount of computation required in subsequent layers by downsampling the input.
# This is especially important in deep networks where the spatial dimensions can become large.

#02.Parameter Reduction:Fewer parameters are needed in the network after pooling, leading to a more compact model.
# This is beneficial for preventing overfitting, especially when dealing with limited training data.

#03.Increased Receptive Field:Pooling allows the network to capture a larger receptive field by summarizing information from a local region.
# This enables the network to recognize more global patterns and complex features.

#04.Improved Translation Invariance:Pooling enhances the network's ability to detect features regardless of their exact position within the receptive field.
# This is crucial for recognizing objects or patterns regardless of their location in an image.

#05.Hierarchical Feature Learning:Pooling contributes to the hierarchical learning of features.
#As the network progresses through layers, pooling layers help abstract away fine-grained spatial information, focusing on more high-level features.

#06.Noise Reduction:Pooling helps filter out noise or irrelevant details from the input, focusing on the most relevant information.
#This is beneficial for improving the robustness of the model.

#07.Memory Efficiency:The reduced spatial dimensions after pooling lead to lower memory requirements during training and inference, making the model more memory-efficient.

In [2]:
#Q2. Explain the difference between min pooling and max pooling.
#Ans:-Min Pooling and Max Pooling in Convolutional Neural Networks (CNNs):
#Min pooling and max pooling are two types of pooling operations commonly used in Convolutional Neural Networks (CNNs) for down-sampling feature maps.
#Both operations aim to reduce the spatial dimensions of the input while retaining important features.
# The key difference lies in how they aggregate information within the pooling regions.
#Max Pooling:
#Operation:Max pooling involves dividing the input feature map into non-overlapping regions and selecting the maximum value from each region.
#Selection Criteria:The maximum value represents the most activated feature in that region.
#Benefits:Max pooling is particularly effective in capturing the most prominent features and highlighting the presence of specific patterns within the local regions.
#Illustration:If a max pooling operation is applied to a set of values [2, 5, 1, 8], the output will be the maximum value, which is 8.

#Min Pooling:
#Operation:Min pooling is similar to max pooling but involves selecting the minimum value from each pooling region.
#Selection Criteria:The minimum value represents the least activated feature in that region.
#Benefits:Min pooling can be useful in scenarios where the goal is to capture the least prominent features or suppress the impact of noise in the input.
#Illustration:If a min pooling operation is applied to a set of values [2, 5, 1, 8], the output will be the minimum value, which is 1.

In [3]:
#Q3. Discuss the concept of padding in CNN and its significance.
#Ans:-Padding in Convolutional Neural Networks (CNNs):
#In the context of CNNs, padding refers to the process of adding extra pixels (or zeros) around the input data before applying convolutional or pooling operations.
# Padding is often applied to maintain the spatial dimensions of the input feature maps and to address several issues associated with the convolutional and pooling layers.

#Significance of Padding:
#Preservation of Spatial Information:Padding helps preserve the spatial information at the edges of the input feature maps.
# Without padding, the convolutional operation progressively reduces the spatial dimensions, potentially leading to a loss of information near the borders.

#Prevention of Information Loss:Without padding, the convolutional layers tend to lose information from the edges of the input feature maps.
# Padding ensures that the outer pixels are given the same consideration as the central pixels during convolution, preventing information loss.

#Facilitation of Centered Convolution:Padding enables centered convolution, where the center of the convolutional kernel aligns with the center of the input feature map.
# This is important for preserving the spatial relationships between features.

#Handling Various Input Sizes:Padding is beneficial when dealing with input images of various sizes.
# It allows the network to process inputs of different dimensions consistently and helps in creating a more robust and flexible model.

#Mitigation of Border Effects:Without padding, the convolutional operation near the borders can result in fewer interactions with neighboring pixels.
# Padding helps mitigate border effects by providing a buffer zone for convolutional operations.

#Control over Output Size:Padding allows control over the size of the output feature maps.
#By adjusting the amount of padding, one can control the spatial dimensions of the feature maps, influencing the downsampling and upsampling characteristics of the network.

#Padding Types:
#Valid (No Padding):
#Also known as "no padding" or "valid convolution."
#No padding is added, resulting in reduced spatial dimensions after convolution.
#Suitable when the input size is not a concern, and information loss near the borders is acceptable.

#Same (Zero Padding):
#Padding is added symmetrically to the input so that the output size remains the same.
#Commonly used to maintain spatial information and simplify network architectures.

#Full Padding:
#Padding is added to ensure that every pixel in the input has the same impact on the output.
#Less commonly used in practice due to computational requirements.

In [6]:
#Q4.Compare and contrast zero-padding and valid-padding in terms of their effects on the output feature map size.
#Ans:-Zero-padding (Same Padding) vs. Valid Padding in Convolutional Neural Networks (CNNs):
#1. Zero-padding (Same Padding):
#Effect on Output Feature Map Size:
#Zero-padding involves adding zeros to the input feature map symmetrically, allowing the output feature map to have the same spatial dimensions as the input.
#It keeps the spatial information intact and helps mitigate issues related to information loss at the edges.

#Advantages:
#Preserves spatial information at the borders.
#Simplifies network architectures as it keeps the output size consistent.

#Use Cases:Commonly used in practice when maintaining spatial information is crucial, such as in image classification tasks.

#2. Valid Padding (No Padding):
#Effect on Output Feature Map Size:
#Valid padding, also known as "no padding," involves not adding any extra padding to the input feature map.
#The output feature map size is reduced compared to the input due to the absence of padding.

#Advantages:
#Reduces computational requirements as no additional operations are performed at the borders.
#Suitable when the exact spatial dimensions are not critical, and information loss near the edges is acceptable.

#Use Cases:Used when computational efficiency is a priority, and slight information loss at the edges is tolerable.

#Comparison:Preservation of Spatial Information:
#Zero-padding preserves spatial information, making it suitable for tasks where maintaining accurate spatial relationships is crucial.
#Valid padding sacrifices some spatial information at the edges but may be preferred in cases where computational efficiency is prioritized.

#Network Architectures:
#Zero-padding often simplifies network architectures by keeping the output size consistent.
#Valid padding may require additional adjustments in network architecture to account for reduced spatial dimensions.

#Use Case Considerations:
#Zero-padding is commonly used in tasks like image classification where precise spatial information is essential.
#Valid padding is used when computational efficiency is a priority, and a slight reduction in spatial information near the edges is acceptable.

Topic : Exploring LeNet

In [7]:
#Q1. Provide a brief overview of LeNet-5 architecture.
#Ans:-LeNet-5, designed by Yann LeCun and his collaborators in 1998, is one of the pioneering convolutional neural network (CNN) architectures.
# It was primarily developed for handwritten digit recognition tasks, specifically for recognizing digits in postal codes and checks.
# LeNet-5 played a crucial role in demonstrating the effectiveness of deep learning in computer vision.
#Key Components of LeNet-5:
#01.Input Layer:LeNet-5 takes as input grayscale images of size 32x32 pixels.

#02.First Convolutional Layer (C1):
#Convolutional layer with a kernel size of 5x5.
#Output channels: 6.
#Activation function: Sigmoid.
#Subsampling (Pooling): Average pooling with a 2x2 kernel and a stride of 2.

#03.Second Convolutional Layer (C3):
#Convolutional layer with a kernel size of 5x5.
#Input channels: 6.
#Output channels: 16.
#Activation function: Sigmoid.
#Subsampling (Pooling): Average pooling with a 2x2 kernel and a stride of 2.

#04.Third Convolutional Layer (C5):
#Convolutional layer with a kernel size of 5x5.
#Input channels: 16.
#Output channels: 120.
#Activation function: Sigmoid.

#05.Fully Connected Layers (F6 and Output Layer):
#F6: Fully connected layer with 84 neurons.
#Output layer: Fully connected layer with 10 neurons (for 10 output classes in digit recognition).
#Activation function: Sigmoid for F6, Softmax for the output layer.

#06.Flatten Layers:Flatten layers are used to convert the 3D feature maps into a 1D vector before the fully connected layers.

#07.Activation Function:Sigmoid activation functions are used throughout the network except for the output layer, which uses the Softmax activation for multi-class classification.

#08.Loss Function:Cross-entropy loss is typically used as the loss function for the classification task.

#Advantages and Contributions:
#First Convolutional Neural Network:
#LeNet-5 was one of the earliest CNN architectures and demonstrated the effectiveness of convolutional layers in feature extraction and spatial hierarchies.
#Parameter Sharing:Parameter sharing in convolutional layers reduces the number of parameters, making the model more efficient.
#Pooling Layers:LeNet-5 introduced the concept of pooling layers for spatial down-sampling, which aids in creating hierarchical and translation-invariant features.
#Demonstrated Success in Handwriting Recognition:LeNet-5 achieved high accuracy in handwritten digit recognition tasks and laid the foundation for subsequent advancements in CNNs for image recognition.

In [8]:
#Q2. Describe the key components of LeNet-5 and their respective purposes.
#Ans:-Key Components of LeNet-5 and Their Purposes:
#01.Input Layer:
#Purpose: Accepts grayscale images as input with dimensions 32x32 pixels.
#Details: The input layer represents the raw pixel values of the input images.

#First Convolutional Layer (C1):
#Purpose: Performs the initial feature extraction.
#Details:
#Convolution with a 5x5 kernel.
#Input channels: 1 (grayscale).
#Output channels: 6.
#Activation function: Sigmoid.
#Subsampling (Pooling): Average pooling with a 2x2 kernel and a stride of 2.
#Significance: C1 extracts basic features from the input images.

#Second Convolutional Layer (C3):
#Purpose: Further refines features and extracts higher-level representations.
#Details:
#Convolution with a 5x5 kernel.
#Input channels: 6.
#Output channels: 16.
#Activation function: Sigmoid.
#Subsampling (Pooling): Average pooling with a 2x2 kernel and a stride of 2.
#Significance: C3 builds upon the features extracted by C1, creating more complex representations.

#Third Convolutional Layer (C5):
#Purpose: Continues feature extraction and prepares for fully connected layers.
#Details:
#Convolution with a 5x5 kernel.
#Input channels: 16.
#Output channels: 120.
#Activation function: Sigmoid.
#Significance: C5 further abstracts features and prepares for the transition to fully connected layers.

#Fully Connected Layers (F6 and Output Layer):
#Purpose: Make predictions based on the learned features.
#Details:
#F6: Fully connected layer with 84 neurons and a Sigmoid activation function.
#Output layer: Fully connected layer with 10 neurons (for digit classes) and a Softmax activation function.
#Significance: These layers combine the abstracted features for classification.

#Flatten Layers:
#Purpose: Reshape the 3D feature maps into 1D vectors.
#Details: Flatten layers precede the fully connected layers.

In [9]:
#Q3. Discuss the advantages and limitations of LeNet-5 and their context of image classification tasks.
#Ans:Advantages of LeNet-5 in Image Classification:
#01.Pioneering CNN Architecture:LeNet-5 was one of the pioneering architectures that demonstrated the effectiveness of convolutional neural networks (CNNs) for image classification tasks.
# It laid the foundation for subsequent developments in deep learning.

#02.Feature Hierarchy:
#LeNet-5 introduced the concept of a feature hierarchy through convolutional and pooling layers.
# This hierarchical representation allows the network to progressively extract complex features.

#03.Parameter Sharing:Parameter sharing in convolutional layers reduces the number of parameters, making the model more efficient and reducing the risk of overfitting, especially in the presence of limited training data.

#04.Translation Invariance:The use of pooling layers in LeNet-5 contributes to translation invariance, enabling the network to recognize features regardless of their exact position in the input.

#05.Demonstrated Success in Handwriting Recognition:LeNet-5 achieved high accuracy in handwritten digit recognition tasks, showcasing its effectiveness in recognizing patterns and shapes.

#06.Sigmoid Activation:The use of the sigmoid activation function in LeNet-5's layers allows the model to learn non-linear mappings, enhancing its capability to capture complex relationships in the data.

#Limitations of LeNet-5 in Image Classification:
#01.Sigmoid Activation:The use of the sigmoid activation function in the layers can lead to the vanishing gradient problem,
# making it challenging for the model to learn and update parameters effectively, especially in deeper networks.

#02.Limited Capacity:Compared to modern architectures, LeNet-5 has a relatively limited capacity to capture complex patterns and variations in large datasets.
# Deeper and more complex architectures have since been developed to handle the challenges of diverse image datasets.

#03.Small Input Size:LeNet-5 was designed for small input images (32x32 pixels).
# While suitable for certain tasks, it may struggle with more detailed or high-resolution images commonly encountered in modern computer vision applications.

#04.Sigmoid Saturation:Sigmoid activation functions can saturate, leading to issues like the vanishing gradient problem, where gradients become very small during backpropagation, hindering effective weight updates.

#05.Limited Non-Linearity:The overall architecture of LeNet-5 may exhibit limited non-linearity compared to more recent architectures, potentially restricting its ability to model highly complex relationships in data.

#06.Sensitivity to Initialization:LeNet-5 and similar architectures can be sensitive to weight initialization, and the effectiveness of training may depend on the chosen initialization strategy.

#07.Not Suited for Large and Diverse Datasets:LeNet-5 may struggle to handle the diversity and complexity present in large datasets, limiting its applicability to more challenging image classification tasks.

In [10]:
#Q4. Implement LeNet-5 using a deep learning framework of your choice (e.g., TensorFlow, PyTorch) and train it on a publicly available dataset (e.g., MNIST). Evaluate its performance and provide insights.
#Ans:

In [11]:
pip install torch torchvision



In [12]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# LeNet-5 Architecture
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5, stride=1, padding=2)
        self.act1 = nn.Sigmoid()
        self.pool1 = nn.AvgPool2d(kernel_size=2, stride=2)

        self.conv2 = nn.Conv2d(6, 16, kernel_size=5, stride=1)
        self.act2 = nn.Sigmoid()
        self.pool2 = nn.AvgPool2d(kernel_size=2, stride=2)

        self.conv3 = nn.Conv2d(16, 120, kernel_size=5, stride=1)
        self.act3 = nn.Sigmoid()

        self.flatten = nn.Flatten()

        self.fc1 = nn.Linear(120, 84)
        self.act4 = nn.Sigmoid()

        self.fc2 = nn.Linear(84, 10)
        self.act5 = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.pool1(self.act1(self.conv1(x)))
        x = self.pool2(self.act2(self.conv2(x)))
        x = self.act3(self.conv3(x))
        x = self.flatten(x)
        x = self.act4(self.fc1(x))
        x = self.act5(self.fc2(x))
        return x

# Load MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root="./data", train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root="./data", train=False, transform=transform, download=True)

# Set up data loaders
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

# Initialize the model, optimizer, and loss function
model = LeNet5()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {running_loss / len(train_loader)}")

# Evaluation on the test set
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
print(f"Test Accuracy: {accuracy * 100:.2f}%")

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 138084457.57it/s]

Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw






Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 86897915.23it/s]


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 40213580.71it/s]


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 14688148.63it/s]


Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw

Epoch 1/10, Loss: 1.9387245119761811
Epoch 2/10, Loss: 1.6223456747750484
Epoch 3/10, Loss: 1.5340590540534143
Epoch 4/10, Loss: 1.5115797218483393
Epoch 5/10, Loss: 1.501482479607881
Epoch 6/10, Loss: 1.4952360626731092
Epoch 7/10, Loss: 1.4908081546012781
Epoch 8/10, Loss: 1.4883036397413405
Epoch 9/10, Loss: 1.4865467439073998
Epoch 10/10, Loss: 1.4840894350365026
Test Accuracy: 98.06%


TOPIC: Analyzing AlexNet

In [13]:
#Q1. Provide an overview of the AlexNet architecture.
#Ans:-AlexNet Architecture Overview:
#AlexNet, introduced by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012, was a groundbreaking deep convolutional neural network (CNN) that significantly advanced the field of computer vision.
# AlexNet was designed to participate in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and achieved a substantial improvement in image classification accuracy compared to previous methods.
# Here's an overview of the key components of the AlexNet architecture:
#01.Input Layer:
#Accepts RGB images with a fixed size of 224x224 pixels.

#02.Convolutional Layers (Conv1 to Conv5):
#Conv1:
#96 filters with a kernel size of 11x11.
#Stride of 4 pixels.
#Rectified Linear Unit (ReLU) activation.
#Local Response Normalization (LRN).

#Conv2:
#256 filters with a kernel size of 5x5.
#Stride of 1 pixel.
#ReLU activation.
#LRN.
#Max-pooling with a 3x3 kernel and a stride of 2 pixels.

#Conv3:
#384 filters with a kernel size of 3x3.
#Stride of 1 pixel.
#ReLU activation.

#Conv4:
#384 filters with a kernel size of 3x3.
#Stride of 1 pixel.
#ReLU activation.

#Conv5:
#256 filters with a kernel size of 3x3.
#Stride of 1 pixel.
#ReLU activation.
#Max-pooling with a 3x3 kernel and a stride of 2 pixels.

#03.Fully Connected Layers (FC6 to FC8):
#FC6:
#4096 neurons.
#ReLU activation.
#Dropout for regularization.

#FC7:
#4096 neurons.
#ReLU activation.
#Dropout.

#FC8:
#1000 neurons (output classes for ILSVRC).
#Softmax activation for classification.

#04.Normalization and Pooling:
#Local Response Normalization (LRN) is applied in Conv1 and Conv2 to enhance generalization.
#Max-pooling is used in Conv1, Conv2, and Conv5 layers.

#05.Activation Function:
#Rectified Linear Unit (ReLU) is used as the activation function throughout the convolutional and fully connected layers, except for the output layer where Softmax is applied for classification.

#06.Dropout:
#Dropout is used in FC6 and FC7 layers during training for regularization, preventing overfitting.

#07.Output Layer:
#The output layer has 1000 neurons corresponding to the 1000 classes in the ImageNet dataset.
#Softmax activation is applied for multi-class classification.

#Contributions and Impact:
#AlexNet significantly advanced the field of deep learning and played a crucial role in popularizing deep CNNs for image classification tasks.
#It demonstrated the effectiveness of deep neural networks, particularly CNNs, in handling complex visual data.
#The use of ReLU activation, local response normalization, and dropout contributed to improved training convergence and generalization.

In [14]:
#Q2. Explain the architertural innovations introduced in AlexNet that contriruted to its breakthrough performance.
#Ans:-AlexNet introduced several architectural innovations that contributed to its breakthrough performance in image classification tasks.
#These innovations addressed challenges in training deep neural networks and significantly improved the model's ability to learn and generalize.
# Here are the key architectural innovations in AlexNet:
#01.Deep Architecture:
#AlexNet was one of the first deep convolutional neural networks (CNNs) with a significantly deep architecture, consisting of eight layers, including five convolutional layers and three fully connected layers.
# This depth allowed the network to learn hierarchical features of increasing complexity.

#02.Rectified Linear Units (ReLU):AlexNet replaced traditional activation functions like sigmoid or hyperbolic tangent with Rectified Linear Units (ReLU) in the hidden layers.
# ReLU introduces non-linearity by activating neurons with a simple threshold function (outputting the input for positive values and zero for negative values).
#ReLU helps mitigate the vanishing gradient problem and accelerates convergence during training.

#03.Local Response Normalization (LRN):LRN was applied after the first and second convolutional layers (Conv1 and Conv2).
#LRN normalizes the responses within a local neighborhood, enhancing the contrast between activated and non-activated neurons.
# This kind of normalization acts as a form of lateral inhibition and helps improve the generalization of the model.

#04.Overlapping Pooling:AlexNet introduced overlapping max-pooling operations, which allowed pooling regions to overlap, unlike traditional non-overlapping pooling.
# Overlapping pooling helps capture more spatial information and improves the model's robustness to spatial translations.

#05.Data Augmentation and Dropout:
#To address overfitting, AlexNet employed two regularization techniques: data augmentation and dropout.
#Data Augmentation: The training set was artificially expanded by applying random transformations (e.g., rotations, flips, and crops) to the input images.
# This helped the model generalize better to variations in the input data.
#Dropout: Dropout was applied to the fully connected layers (FC6 and FC7) during training.
# Dropout randomly drops a fraction of neurons during each forward and backward pass, preventing co-adaptation of neurons and improving model generalization.

In [15]:
#Q3. Discuss the role of convolutional layers, pooling layers, and fully connected layers in AlexNet.
#Ans:-Role of Convolutional Layers, Pooling Layers, and Fully Connected Layers in AlexNet:
#01.Convolutional Layers:
#Role:
#Convolutional layers are fundamental in capturing local patterns and spatial hierarchies within the input image. They learn to detect features such as edges, textures, and simple shapes.
#Implementation in AlexNet:
#AlexNet includes five convolutional layers (Conv1 to Conv5) that use filters of varying sizes (e.g., 11x11, 5x5, and 3x3) to capture features at different scales.
#ReLU activation functions are applied after each convolutional operation, introducing non-linearity.

#02.Pooling Layers:
#Role:
#Pooling layers down-sample the spatial dimensions of the feature maps, reducing computational complexity and making the model more robust to variations in object position and scale.
#Implementation in AlexNet:
#AlexNet employs max-pooling in Conv1, Conv2, and Conv5. Max-pooling retains the most prominent features within each pooling region.
#Overlapping pooling (pooling regions with overlap) is used to capture more spatial information.

#03.Fully Connected Layers:
#Role:
#Fully connected layers at the end of the network are responsible for high-level reasoning and decision-making based on the hierarchical features learned by the convolutional layers.
#They combine local features from convolutional layers across the entire input space.

#04.Implementation in AlexNet:
#AlexNet has three fully connected layers (FC6 to FC8) at the end of the architecture.
#FC6 and FC7 have 4096 neurons each with ReLU activation and dropout for regularization.
#FC8 is the output layer with 1000 neurons (representing the classes in ImageNet) and softmax activation for classification.

#05.Role of Data Augmentation and Dropout:
#Data Augmentation:
#Data augmentation, although not a specific layer, is a crucial part of the training process in AlexNet.
# It involves applying random transformations (rotations, flips, crops) to the input images during training, effectively expanding the dataset and improving generalization.

#Dropout:
#Dropout is applied to the fully connected layers (FC6 and FC7) during training.
#It randomly drops a fraction of neurons during each forward and backward pass, preventing overfitting and encouraging the network to be more robust.

#05.Overall Flow:
#The convolutional layers capture low to mid-level features in the input images.
#The pooling layers down-sample the spatial dimensions, preserving essential information.
#Fully connected layers combine the high-level features and make final predictions.

#06.Softmax Activation in Output Layer:
#The softmax activation in the output layer (FC8) converts the raw scores into class probabilities, facilitating multi-class classification.

In [1]:
#Q4.Implement AlexNet using a deep learning framewock of your choice and evaluate its perfocmance on a dataset of your choice.

In [2]:
pip install torch torchvision



In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import torchvision.models as models

# Load CIFAR-10 dataset
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_dataset = datasets.CIFAR10(root="./data", train=True, transform=transform, download=True)
test_dataset = datasets.CIFAR10(root="./data", train=False, transform=transform, download=True)

train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

# Initialize AlexNet model
alexnet = models.alexnet()

# Modify the last fully connected layer for CIFAR-10
num_classes = 10
alexnet.classifier[6] = nn.Linear(4096, num_classes)

# Set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
alexnet = alexnet.to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(alexnet.parameters(), lr=0.001, momentum=0.9)

# Training loop
num_epochs = 5
for epoch in range(num_epochs):
    alexnet.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = alexnet(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {running_loss / len(train_loader)}")

# Evaluation on the test set
alexnet.eval()
correct = 0
total = 0

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)

        outputs = alexnet(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
print(f"Test Accuracy: {accuracy * 100:.2f}%")

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:01<00:00, 90660265.64it/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
