<a href="https://colab.research.google.com/github/amritanshukumar2006-collab/Math-for-Deep-Learning-250120-Amritanshu-Kumar/blob/main/End_Term_Assignment_Prcatical.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# End Term Assignment Practical
# **Convolutional Neural Network** by using PyTorch


To build a neural network from scratch, we first import the required modules.
Let's import PyTorch.



In [1]:
import torch

In [2]:
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# The Perceptron
Now we create a single neuron. This neuron applies a non-linear transformation to a weighted sum of its inputs using an activation function (the sigmoid function).

Here,

1.   w stands for the weights.
2.   b stands for the bias.

We define neural network components by subclassing `nn.Module`.


In [3]:
class Perceptron(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.w = nn.Parameter(torch.randn(input_size) * 0.01)  # one small random weight per input feature
        self.b = nn.Parameter(torch.zeros(1))  # scalar bias, initialized to zero

    def forward(self, x):
        return torch.sigmoid(torch.matmul(x, self.w) + self.b)  # sigmoid activation in (0, 1), not raw logits
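
As a quick sanity check, the sketch below repeats the `Perceptron` definition and feeds it a small random batch. The batch size of 5 and the 4 input features are illustrative assumptions, not values used elsewhere in the notebook.

```python
import torch
import torch.nn as nn


class Perceptron(nn.Module):
    """Same single-neuron definition as in the cell above."""

    def __init__(self, input_size):
        super().__init__()
        self.w = nn.Parameter(torch.randn(input_size) * 0.01)
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return torch.sigmoid(torch.matmul(x, self.w) + self.b)


torch.manual_seed(0)
neuron = Perceptron(input_size=4)  # 4 input features (illustrative)
x = torch.randn(5, 4)              # batch of 5 samples (illustrative)
y = neuron(x)

print(y.shape)                     # one activation per sample
print(bool(((y > 0) & (y < 1)).all()))  # sigmoid keeps every output in (0, 1)
```

The output has one value per sample, and each value lies strictly between 0 and 1 because of the sigmoid.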

# The DenseLayer
***The Dense layer is the decision-making part.***
The DenseLayer combines all the detected features, and the neural network then predicts the most suitable output from this combination.
Here, we create a layer by stacking multiple `Perceptron` instances. The output is a vector where each element comes from one `Perceptron`.

In [4]:
class DenseLayer(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.layer = nn.ModuleList(
            [Perceptron(input_size) for _ in range(output_size)]
        )

    def forward(self, x):
        return torch.stack([p(x) for p in self.layer], dim=1)
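
Stacking the per-neuron outputs along `dim=1` should be equivalent to a single matrix multiplication followed by a sigmoid. The sketch below checks this with plain tensors instead of the classes above; the sizes and the names `ws`, `bs`, `per_neuron`, and `batched` are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)
N, in_features, out_features = 4, 6, 3  # illustrative sizes

x = torch.randn(N, in_features)

# One weight vector and one bias per neuron, initialized as in Perceptron.
ws = [torch.randn(in_features) * 0.01 for _ in range(out_features)]
bs = [torch.zeros(1) for _ in range(out_features)]

# Neuron by neuron, then stacked along dim=1 (what DenseLayer.forward does).
per_neuron = torch.stack(
    [torch.sigmoid(torch.matmul(x, w) + b) for w, b in zip(ws, bs)], dim=1
)

# The same computation as one matrix multiply.
W = torch.stack(ws)  # (out_features, in_features)
b = torch.cat(bs)    # (out_features,)
batched = torch.sigmoid(x @ W.T + b)

print(per_neuron.shape)
print(torch.allclose(per_neuron, batched, atol=1e-6))
```

The list-of-neurons form mirrors the mathematical definition, while the matrix form is what a layer such as `nn.Linear` followed by a sigmoid would compute in one step.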

# The Convolution Layer
Here, we define the convolution layer.
We store the learnable kernels/filters as an `nn.Parameter`.

The input has shape (N, C, H_in, W_in), where:

N is the batch size
C is the number of input channels
H_in is the input height
W_in is the input width

In [5]:
class ConvLayer(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super().__init__()

        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding

        # These are the learnable filters
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size)
        )
        self.bias = nn.Parameter(torch.zeros(out_channels))  # one learnable bias per output channel

    def forward(self, x):
        """
        Convolution via im2col (unfold)
        """
        N, C, H_in, W_in = x.shape

        # Ensuring that H_in and W_in are pure Python integers
        H_in = int(H_in)
        W_in = int(W_in)

        K = self.kernel_size # Defining Kernel Size

        # Extract sliding windows
        # The shape is: (N, C*K*K, L)
        x_unfold = F.unfold(
            x,
            kernel_size=K, #Making the size of kernel k x k
            stride=self.stride,
            padding=self.padding
        )

        # Reshape weights: (out_channels, C*K*K)
        W = self.weight.view(self.weight.size(0), -1) # Each row of W is a flattened convolution filter.

        # Doing the Matrix multiplication
        # (N, out_channels, L)
        out = torch.matmul(W, x_unfold) + self.bias.view(1, -1, 1)

        # Compute output spatial size
        H_out = (H_in + 2*self.padding - K) // self.stride + 1  # output height
        W_out = (W_in + 2*self.padding - K) // self.stride + 1  # output width

        # Reshape to feature maps
        out = out.view(N, -1, H_out, W_out)

        return out
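
One way to gain confidence in the im2col approach is to compare it with PyTorch's built-in `F.conv2d`. The sketch below repeats the same steps as `ConvLayer.forward` on plain tensors; the input size, channel counts, and random weights are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, C, H, W = 2, 3, 8, 8                        # illustrative input size
out_channels, K, stride, padding = 4, 3, 1, 1  # illustrative layer settings

x = torch.randn(N, C, H, W)
weight = torch.randn(out_channels, C, K, K)
bias = torch.randn(out_channels)

# im2col: each column is one flattened K x K patch across all channels.
cols = F.unfold(x, kernel_size=K, stride=stride, padding=padding)  # (N, C*K*K, L)
flat_w = weight.view(out_channels, -1)                             # (out_channels, C*K*K)
out = torch.matmul(flat_w, cols) + bias.view(1, -1, 1)             # (N, out_channels, L)

# Fold the L patch positions back into a 2-D feature map.
H_out = (H + 2 * padding - K) // stride + 1
W_out = (W + 2 * padding - K) // stride + 1
out = out.view(N, out_channels, H_out, W_out)

reference = F.conv2d(x, weight, bias, stride=stride, padding=padding)
print(out.shape)
print(torch.allclose(out, reference, atol=1e-4))
```

If the two results agree up to floating-point tolerance, the unfold-and-multiply formulation is computing the same cross-correlation as the built-in layer.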

# The MaxPoolLayer
The MaxPoolLayer is an important layer that reduces the spatial size of the feature maps. It keeps only the largest activation in each window and discards the rest, which makes the computation easier and faster.


---
As before, the layer is defined by subclassing `nn.Module`.


---



```
# x_unfold = F.unfold(x, kernel_size=K, stride=S)
```
Here, `F.unfold` extracts K × K patches,
flattens them and then stores them as columns.


After this, the shape is (N, C*K*K, L), where L is the number of patches.

This needs to be reshaped to (N, C, K*K, L) so that the maximum can be taken over each pooling window.


In [6]:
class MaxPoolLayer(nn.Module):
    def __init__(self, kernel_size, stride=None):
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride if stride else kernel_size
    def forward(self, x):
        """
        Pooling via unfold
        """
        N, C, H, W = x.shape
        K = self.kernel_size
        S = self.stride

        # Unfold
        x_unfold = F.unfold(x, kernel_size=K, stride=S)
        # Shape: (N, C*K*K, L)

        # Reshape to (N, C, K*K, L)
        x_unfold = x_unfold.view(N, C, K*K, -1)

        # Max over pooling window
        out, _ = torch.max(x_unfold, dim=2)

        # Output size
        H_out = (H - K) // S + 1  # output height
        W_out = (W - K) // S + 1  # output width

        return out.view(N, C, H_out, W_out)
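
The unfold-based pooling can be checked against `F.max_pool2d` in the same way. This is a minimal sketch on a random tensor, with the sizes chosen only for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, C, H, W = 2, 3, 8, 8  # illustrative input size
K = S = 2                # 2 x 2 window, stride 2

x = torch.randn(N, C, H, W)

cols = F.unfold(x, kernel_size=K, stride=S)  # (N, C*K*K, L)
cols = cols.view(N, C, K * K, -1)            # (N, C, K*K, L): one window per column
pooled, _ = torch.max(cols, dim=2)           # max over each window -> (N, C, L)

H_out = (H - K) // S + 1
W_out = (W - K) // S + 1
pooled = pooled.view(N, C, H_out, W_out)

reference = F.max_pool2d(x, kernel_size=K, stride=S)
print(pooled.shape)
print(torch.allclose(pooled, reference))
```

The reshape to (N, C, K*K, L) works because `F.unfold` lays out its second dimension channel by channel, so each channel's K*K window values stay contiguous.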

# **The Convolutional Neural Network**
Here we put all the layers together to make the final neural network.

In [7]:
class CustomCNN(nn.Module):
    def __init__(self):
        super().__init__()

        self.conv1 = ConvLayer(1, 8, 3, padding=1)
        self.pool1 = MaxPoolLayer(2)

        self.conv2 = ConvLayer(8, 16, 3, padding=1)
        self.pool2 = MaxPoolLayer(2)

        self.fc = DenseLayer(16 * 7 * 7, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x)) # Here, we are using relu function as the activation function
        x = self.pool1(x)

        x = torch.relu(self.conv2(x)) # Here, we are using relu function as the activation function
        x = self.pool2(x)

        x = x.view(x.size(0), -1)
        return self.fc(x)
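
The input size `16 * 7 * 7` of the dense layer follows from the output-size formulas used in `ConvLayer` and `MaxPoolLayer`. The sketch below traces the spatial size through the network, assuming 28 × 28 MNIST inputs; the helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride=None):
    """Output size of max pooling along one spatial dimension."""
    stride = stride or kernel
    return (size - kernel) // stride + 1


size = 28                                   # MNIST images are 28 x 28
size = conv_out(size, kernel=3, padding=1)  # conv1 keeps 28
size = pool_out(size, kernel=2)             # pool1 halves to 14
size = conv_out(size, kernel=3, padding=1)  # conv2 keeps 14
size = pool_out(size, kernel=2)             # pool2 halves to 7

flattened = 16 * size * size                # 16 channels after conv2
print(size, flattened)                      # 7 784
```

With `padding=1` and a 3 × 3 kernel the convolutions preserve the spatial size, so only the two pooling layers shrink the image.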

Now we load the MNIST dataset and serve it in batches of 32.

In [8]:
transform = transforms.ToTensor()

train_data = datasets.MNIST("./data", train=True, download=True, transform=transform)
test_data = datasets.MNIST("./data", train=False, download=True, transform=transform)

train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
test_loader = DataLoader(test_data, batch_size=32)
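
To see what a `DataLoader` yields without downloading anything, the sketch below uses a synthetic stand-in for MNIST. The 100 random images and the names `fake_data` and `loader` are assumptions for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST (assumption: 100 random 1 x 28 x 28 "images").
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
fake_data = TensorDataset(images, labels)

loader = DataLoader(fake_data, batch_size=32, shuffle=True)

batch_sizes = [x.shape[0] for x, _ in loader]
first_x, first_y = next(iter(loader))

print(len(loader))                   # number of batches per epoch
print(batch_sizes)                   # the last batch holds the remainder
print(first_x.shape, first_y.shape)  # (N, C, H, W) images and (N,) labels
```

Each batch arrives in the (N, C, H_in, W_in) layout that `ConvLayer` expects, and the final batch is smaller when the dataset size is not a multiple of 32.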




# **Finally, Training the Model on the MNIST Dataset**


In [9]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = CustomCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)

for epoch in range(10):
    model.train()
    total_loss = 0

    for x, y in train_loader:
        x, y = x.to(device), y.to(device)

        optimizer.zero_grad()
        out = model(x)
        loss = criterion(out, y)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    print(f"Epoch {epoch+1} | Loss: {total_loss/len(train_loader):.4f}")

Epoch 1 | Loss: 1.6168
Epoch 2 | Loss: 1.5771
Epoch 3 | Loss: 1.5702
Epoch 4 | Loss: 1.5668
Epoch 5 | Loss: 1.5642
Epoch 6 | Loss: 1.5254
Epoch 7 | Loss: 1.4844
Epoch 8 | Loss: 1.4825
Epoch 9 | Loss: 1.4810
Epoch 10 | Loss: 1.4800
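
One possible reading of the plateau near 1.48: `nn.CrossEntropyLoss` applies a softmax to whatever it receives, and here it receives sigmoid outputs that are confined to (0, 1). Under that assumption the loss has a lower bound well above zero, which the sketch below computes for 10 classes.

```python
import math

num_classes = 10

# Best case for outputs bounded in (0, 1): the true class approaches 1
# and the other nine approach 0. Softmax then gives e / (e + 9).
best_loss = -math.log(math.exp(1.0) / (math.exp(1.0) + (num_classes - 1) * math.exp(0.0)))

# For comparison: all ten outputs equal, i.e. a uniform prediction.
uniform_loss = math.log(num_classes)

print(f"loss floor with sigmoid outputs: {best_loss:.3f}")    # about 1.461
print(f"loss of a uniform prediction:    {uniform_loss:.3f}")  # about 2.303
```

The reported losses sit just above this floor, so the small numerical improvement in later epochs does not by itself mean the model stopped learning. Feeding raw pre-activation scores to the loss would remove the floor, but that would be a change to the model as written.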


# Finally Calculating the **Test Accuracy**

In [10]:
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        out = model(x)
        pred = out.argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.size(0)

print(f"Test Accuracy: {100 * correct / total:.2f}%")

Test Accuracy: 98.13%
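
The accuracy computation relies only on which output is largest, not on its scale. A toy example with made-up scores (not taken from the model) shows the `argmax` and counting steps:

```python
import torch

# Made-up scores for 3 samples and 2 classes, for illustration only.
out = torch.tensor([[0.1, 0.9],
                    [0.8, 0.2],
                    [0.3, 0.7]])
y = torch.tensor([1, 0, 0])           # true labels

pred = out.argmax(dim=1)              # index of the largest score per row
correct = (pred == y).sum().item()    # number of matching predictions
accuracy = 100 * correct / y.size(0)

print(pred.tolist())                  # [1, 0, 1]
print(f"{accuracy:.2f}%")             # 66.67%
```

Because `argmax` is unaffected by a monotonic function such as the sigmoid, the predicted class is the one whose neuron has the largest pre-activation value.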


# **Comparing the Performance of CNN with DNN**
**Convolutional Neural Network**

A `Convolutional Neural Network` works better than a fully connected `Deep Neural Network` on structured spatial data with complex patterns, such as images, video and audio.

The CNN has this advantage because it exploits local spatial correlations through `Convolutional filters` and `Weight sharing`, which reduces the number of trainable parameters. This makes the computation cheaper and more efficient.

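A CNN enforces an inductive bias, translation equivariance: shifting the input shifts the feature maps in the same way,

*f(T(x))* = *T(f(x))*

where *f* is a convolution layer and *T* is a translation.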
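
The equivariance relation can be checked numerically. The sketch below assumes circular padding and a circular shift (`torch.roll`) so that the identity holds exactly everywhere; with the zero padding used in `ConvLayer` it would hold only away from the image borders. The names `f` and `T` mirror the symbols in the formula.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 8, 8)       # one single-channel 8 x 8 input (illustrative)
kernel = torch.randn(1, 1, 3, 3)  # one random 3 x 3 filter


def f(inp):
    """Convolution with circular padding, so the output keeps the input size."""
    padded = F.pad(inp, (1, 1, 1, 1), mode="circular")
    return F.conv2d(padded, kernel)


def T(inp):
    """Circular translation by 2 rows and 3 columns."""
    return torch.roll(inp, shifts=(2, 3), dims=(2, 3))


print(torch.allclose(f(T(x)), T(f(x)), atol=1e-5))
```

Shifting first and convolving second gives the same feature map as convolving first and shifting second, which is what lets one filter detect a pattern wherever it appears.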

**Deep Neural Network**

DNNs connect every input feature to every neuron regardless of where it sits in the image, so they lack an inherent spatial inductive bias (*unlike the CNN*). A `Deep Neural Network` therefore requires more trainable parameters to learn complex data patterns, making the computation more expensive and less efficient than a CNN.
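
To make the parameter comparison concrete, the sketch below counts the trainable parameters of `CustomCNN` and of a hypothetical fully connected network with one hidden layer of 128 units. The 784 → 128 → 10 network is an assumption chosen for illustration; it is not trained anywhere in this notebook.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels


def dense_params(n_in, n_out):
    """Weights plus one bias per output neuron."""
    return n_in * n_out + n_out


# CustomCNN as defined above: conv1, conv2, then the dense layer.
cnn_total = (
    conv_params(1, 8, 3)
    + conv_params(8, 16, 3)
    + dense_params(16 * 7 * 7, 10)
)

# Hypothetical fully connected network: 784 -> 128 -> 10.
dnn_total = dense_params(28 * 28, 128) + dense_params(128, 10)

print(cnn_total)  # 9098
print(dnn_total)  # 101770
```

Under these assumptions the fully connected network has roughly eleven times as many parameters, and most of the CNN's parameters sit in its final dense layer rather than in the shared convolution filters.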