# Convolutional Neural Network (CNN)

SimpleCNN class:

The network consists of two convolutional layers, one max pooling layer (applied twice, once after each convolution), and two fully connected layers.

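- conv1: First convolutional layer with 1 input channel (grayscale image), 32 output channels, a 3x3 kernel, and a padding of 1, so the 28x28 spatial size is preserved.

- conv2: Second convolutional layer with 32 input channels, 64 output channels, a 3x3 kernel, and a padding of 1.

- pool: Max pooling layer with a 2x2 window and a stride of 2, which halves the height and width of its input.

- fc1: First fully connected layer with 64 * 7 * 7 = 3136 input features (64 channels on a 7x7 map after two rounds of pooling: 28 → 14 → 7) and 128 output features.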

- fc2: Second fully connected layer with 128 input features and 10 output features (for 10 digit classes).

- relu: ReLU activation applied after each convolutional layer and after the first fully connected layer.
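The 64 * 7 * 7 input size of fc1 follows from the spatial size after each layer. The arithmetic below is a quick sanity check, assuming a 28x28 input and the standard output-size formula floor((n + 2p - k) / s) + 1 for convolution and pooling. The helper `out_size` is a hypothetical name used only for illustration; the parameter counts are weights plus biases for each layer.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

size = 28                                   # MNIST images are 28x28
size = out_size(size, kernel=3, padding=1)  # conv1: stays 28
size = out_size(size, kernel=2, stride=2)   # pool: 28 -> 14
size = out_size(size, kernel=3, padding=1)  # conv2: stays 14
size = out_size(size, kernel=2, stride=2)   # pool: 14 -> 7
fc1_in = 64 * size * size
print(size, fc1_in)  # 7 3136

# Learnable parameters (weights + biases) per layer
params = {
    "conv1": 32 * 1 * 3 * 3 + 32,
    "conv2": 64 * 32 * 3 * 3 + 64,
    "fc1": fc1_in * 128 + 128,
    "fc2": 128 * 10 + 10,
}
print(params)
print(sum(params.values()))  # 421642
```

Most of the parameters sit in fc1, which is typical for small CNNs that flatten a feature map straight into a dense layer.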

Forward Pass:

- The input goes through conv1, ReLU activation, and max pooling.

- Then it passes through conv2, ReLU activation, and max pooling.

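- The output is flattened into a vector of 64 * 7 * 7 = 3136 features per image and passed through fc1 (followed by ReLU) and then fc2.

- The final output has 10 values per image, one raw score (logit) per digit class. No softmax is applied here, because CrossEntropyLoss expects raw logits.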
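To make the ReLU, pooling, and flatten steps concrete, here is a minimal NumPy sketch on a single 4x4 feature map. It only illustrates the arithmetic of these operations and is not how PyTorch implements them internally; the function names `relu` and `max_pool_2x2` are hypothetical.

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x)
    return np.maximum(x, 0)

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (channels, H, W) array; H and W must be even."""
    c, h, w = x.shape
    # Split each non-overlapping 2x2 window into its own pair of axes, then take the max
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

feature_map = np.array([[[ 1., -2.,  3.,  0.],
                         [-1.,  5., -3.,  2.],
                         [ 0., -1., -4., -2.],
                         [ 2.,  1., -1., -5.]]])  # shape (1, 4, 4)

pooled = max_pool_2x2(relu(feature_map))  # shape (1, 2, 2)
flat = pooled.reshape(-1)                 # flatten before a fully connected layer
print(pooled.shape)  # (1, 2, 2)
print(flat)          # [5. 3. 2. 0.]
```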


In [None]:
%pip install torchvision

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

In [None]:
# Define the CNN architecture
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # (batch, 64, 7, 7) -> (batch, 3136)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Data Preparation:

- We use the MNIST dataset, which contains 28x28 grayscale images of handwritten digits.

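- Each image is converted with ToTensor, which scales pixel values from [0, 255] to [0, 1], and then normalized with a mean of 0.5 and a standard deviation of 0.5, which maps the values to [-1, 1] via (x - 0.5) / 0.5.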
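The two preprocessing steps amount to simple arithmetic. The NumPy sketch below reproduces it on a few sample 8-bit pixel values, assuming ToTensor's division by 255 and Normalize's (x - mean) / std with the values used in this notebook; the sample pixels are made up for illustration.

```python
import numpy as np

pixels = np.array([0, 64, 128, 255], dtype=np.uint8)  # sample 8-bit grayscale values
scaled = pixels.astype(np.float32) / 255.0            # ToTensor step: [0, 255] -> [0, 1]
mean, std = 0.5, 0.5                                  # values passed to Normalize
normalized = (scaled - mean) / std                    # Normalize step: [0, 1] -> [-1, 1]
print(normalized)
```

Black pixels (0) end up at -1 and white pixels (255) at 1, with everything else in between.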

In [None]:
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load and preprocess the MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)

Initialize the model, loss function, and optimizer.

In [None]:
model = SimpleCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training Process:

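- We use CrossEntropyLoss as the loss function. It applies log-softmax to the raw logits and takes the negative log-likelihood of the correct class, which makes it suitable for multi-class classification.

- The Adam optimizer, with a learning rate of 0.001, updates the model parameters.

- The training loop runs for 5 epochs; each epoch is one full pass over the training set in mini-batches of 64. Every 100 mini-batches, the average loss over those batches is printed.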
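The loss described above can be written out by hand. The NumPy sketch below computes mean cross-entropy directly from raw logits (log-softmax, then the negative log-probability of the correct class, averaged over the batch), which is what CrossEntropyLoss computes with its default mean reduction. The function name and the toy logits are hypothetical, and the sketch covers only the loss, not the Adam update.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch, computed from raw logits (no softmax beforehand)."""
    # Subtracting the row max keeps exp() from overflowing and does not change the softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class for each sample, negate, and average
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])  # batch of 2 samples, 3 classes
labels = np.array([0, 2])
print(cross_entropy(logits, labels))
```

A useful reference point: with all-zero logits over 10 classes the loss is log(10), about 2.30, which is roughly where an untrained MNIST classifier starts.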

In [None]:
num_epochs = 5
for epoch in range(num_epochs):
    model.train()  # training mode (matters for layers such as dropout or batch norm)
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)
        
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        if i % 100 == 99:
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 100:.3f}')
            running_loss = 0.0

print('Finished Training')


# Evaluation:

After training, we evaluate the model on the test set to measure its accuracy.
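Accuracy is the fraction of test images whose highest-scoring class matches the label. The NumPy sketch below shows that computation on a made-up batch of four logit vectors; the values are hypothetical and chosen only so the result is easy to check by hand.

```python
import numpy as np

logits = np.array([[ 0.1, 2.3, -0.5],
                   [ 1.5, 0.2,  0.3],
                   [-0.2, 0.4,  0.9],
                   [ 2.0, 1.0,  0.1]])  # 4 samples, 3 classes
labels = np.array([1, 0, 1, 0])

predicted = logits.argmax(axis=1)       # index of the highest score = predicted class
correct = (predicted == labels).sum()
accuracy = 100 * correct / len(labels)
print(predicted, f"{accuracy:.1f}%")  # [1 0 2 0] 75.0%
```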

In [None]:
model.eval()  # switch to evaluation mode before testing
correct = 0
total = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # class with the highest logit
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy on the test set: {100 * correct / total:.2f}%')


# Conclusion

This CNN learns to recognize patterns in the input images through its convolutional layers. The first layer might detect simple features like edges, while the second layer can combine these to detect more complex patterns. The fully connected layers at the end interpret these features to classify the digit.
The max pooling layers reduce the spatial dimensions of the feature maps (28x28 to 14x14 to 7x7), which cuts the computation in later layers and makes the network somewhat robust to small shifts of the input, as long as a feature stays within the same pooling window.
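The translation robustness of max pooling is only local. The NumPy sketch below (a hypothetical helper, not part of the model) places a single activation in a 4x4 map: a one-pixel shift inside the same 2x2 window leaves the pooled output unchanged, while a shift into the neighboring window changes it.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4)); a[0, 0] = 1.0  # a single strong activation
b = np.zeros((4, 4)); b[1, 1] = 1.0  # shifted by one pixel, still in the same 2x2 window
c = np.zeros((4, 4)); c[0, 2] = 1.0  # shifted by two pixels, into the neighboring window

print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(c)))  # False
```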

This is a simple CNN architecture suitable for a task like MNIST digit recognition. More complex tasks might require deeper networks with more convolutional layers, different types of layers (like batch normalization), or advanced architectures like ResNet or Inception.