## Convolutional Neural Networks

CNNs are neural networks that are commonly used to classify images. They do so by using `filters` and `convolutions`. `filters` are the weights in CNNs. In an ordinary NN the weights feeding a node form a column vector (e.g. 3 weights for a node fed by a hidden layer with 3 nodes), whereas in a CNN they form a small matrix (e.g. a 3x3 filter). CNNs also differ from ordinary NNs in that they use the idea of shared weights. 

### Shared Weights
CNNs achieve translational invariance using shared weights. The basic idea is: if a filter detects a horizontal line, it should detect that line anywhere in the image, irrespective of its location. Hence, there is no need to learn how to detect a line at each location separately. Also: __it enormously decreases the number of parameters to learn__. A fully connected layer in a normal NN needs one weight for every pixel in the image for each of its nodes, which is far too many parameters.  
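To make the parameter saving concrete, here is a minimal sketch that counts the learnable values in a fully connected layer and in a convolutional layer. The image size (64x64 RGB), the 32 outputs, and the helper name `count_parameters` are illustrative assumptions, not values taken from the notes above.

```python
import torch.nn as nn

# Assumed setup: a 64x64 RGB image, 32 output nodes / 32 filters
height, width, channels = 64, 64, 3

# Fully connected layer: every node has its own weight for every input value
dense = nn.Linear(in_features=height * width * channels, out_features=32)

# Convolutional layer: 32 filters of size 3x3, each reused at every location
conv = nn.Conv2d(in_channels=channels, out_channels=32, kernel_size=3)

def count_parameters(layer):
    """Total number of learnable values (weights and biases) in a layer."""
    return sum(p.numel() for p in layer.parameters())

dense_params = count_parameters(dense)  # 64*64*3*32 weights + 32 biases
conv_params = count_parameters(conv)    # 32*3*3*3 weights + 32 biases
print(dense_params, conv_params)
```

Under these assumptions the convolutional layer needs several hundred times fewer parameters, because its weights do not depend on the image size.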

The steps in a typical CNN are:
1. Convolution
2. Max Pooling
3. ReLU
4. Flattening
5. Full Connection


### Convolution layer
For an image, we create a filter/kernel, a small matrix of weights, and slide it across the image. At each position the filter is multiplied element-wise with the image patch under it and the products are summed (convolution). Using this we can detect edges and various other information about the image. The filter is not derived from the image: it is initialised with random zero-centered numbers, and during training it automatically learns to pick out various aspects of the image. 
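The sliding multiply-and-sum can be written out by hand. The sketch below uses a made-up 5x5 image and a hand-written horizontal-edge filter (a trained network would learn its filters instead), and compares the result with `F.conv2d`. Note that what deep learning libraries call convolution is technically cross-correlation: the filter is not flipped.

```python
import torch
import torch.nn.functional as F

# Toy 5x5 grayscale image: dark top, bright bottom (a horizontal edge)
image = torch.tensor([
    [0., 0., 0., 0., 0.],
    [0., 0., 0., 0., 0.],
    [1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1.],
])

# Hand-written 3x3 filter that responds to dark-above, bright-below patterns
kernel = torch.tensor([
    [-1., -1., -1.],
    [ 0.,  0.,  0.],
    [ 1.,  1.,  1.],
])

def slide_filter(img, k):
    """Multiply the filter element-wise with each patch and sum the products."""
    out_h = img.shape[0] - k.shape[0] + 1
    out_w = img.shape[1] - k.shape[1] + 1
    out = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            patch = img[i:i + k.shape[0], j:j + k.shape[1]]
            out[i, j] = (patch * k).sum()
    return out

manual = slide_filter(image, kernel)
# F.conv2d expects (batch, channels, height, width) and (out, in, kH, kW)
builtin = F.conv2d(image.view(1, 1, 5, 5), kernel.view(1, 1, 3, 3)).view(3, 3)
print(manual)
print(torch.equal(manual, builtin))
```

The response is large where the patch straddles the edge and zero where the patch is uniformly bright.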

An important thing to remember: the depth of a filter in the current layer equals the number of channels output by the previous layer. 
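One way to see this rule is to inspect the weight shapes of two stacked layers. The channel counts (3, 16, 32) and the 28x28 input in this sketch are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

# Two stacked layers: conv2 consumes the 16 channels produced by conv1
conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)

# PyTorch stores weights as (out_channels, in_channels, kernel_h, kernel_w),
# so each of conv2's 32 filters has depth 16
print(conv1.weight.shape)
print(conv2.weight.shape)

x = torch.randn(1, 3, 28, 28)
y = conv2(conv1(x))
print(y.shape)
```

If `conv2` were declared with a different `in_channels`, the forward pass would fail with a channel mismatch error.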

### Interpreting Convolutions
The initial convolution layers learn basic things like horizontal lines, vertical lines, and small shapes. Deeper layers learn progressively higher-level features. For example, a facial recognition model learns basic lines in the initial layers, then noses, eyes, etc. in the next layers, then whole faces in the final layers. 

### Filters
Filters are the weights in CNNs. One filter is a 2D array (per input channel) which can be interpreted as something that learns a shape. We can create an array of such filters, thus adding a depth to the layer; this depth, the number of filters, is another hyperparameter. Each filter learns a different element: one might learn horizontal lines, one might learn a basic circle shape, etc. 
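As a rough illustration of depth, the sketch below stacks two hand-written filters (a real network would learn them) into one filter bank and applies it to a made-up image containing a vertical edge. Each filter produces its own output channel.

```python
import torch
import torch.nn.functional as F

# Two hand-written 3x3 filters, used only for illustration
horizontal = torch.tensor([[-1., -1., -1.],
                           [ 0.,  0.,  0.],
                           [ 1.,  1.,  1.]])
vertical = horizontal.t()  # the transpose responds to vertical edges

# Filter bank of depth 2: shape (out_channels=2, in_channels=1, 3, 3)
filters = torch.stack([horizontal, vertical]).unsqueeze(1)

# 6x6 image: left half dark, right half bright (a vertical edge)
image = torch.zeros(1, 1, 6, 6)
image[:, :, :, 3:] = 1.0

feature_maps = F.conv2d(image, filters)
print(feature_maps.shape)  # one output channel per filter
print(feature_maps[0, 0])  # horizontal-edge filter: no response
print(feature_maps[0, 1])  # vertical-edge filter: responds at the edge
```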


### Padding
In padding we add a border of 0s around the input (extra rows and columns on every side) so that stacking multiple convolutions won't shrink the dimensions quickly. 
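The effect is easy to check on shapes alone. This sketch passes an assumed 10x10 input through three 3x3 convolutions, once without padding and once with one ring of zeros. The same layer is reused three times purely to illustrate the shapes.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 10, 10)

no_pad = nn.Conv2d(1, 1, kernel_size=3, padding=0)
zero_pad = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # one ring of zeros

shrinking, preserved = x, x
for _ in range(3):
    shrinking = no_pad(shrinking)    # loses 2 pixels per dimension each time
    preserved = zero_pad(preserved)  # keeps the spatial size

print(shrinking.shape)
print(preserved.shape)
```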


### Output Dimensions
The output dimensions after padding are:

$$W_o = \frac{W_i - F + 2P}{S} + 1$$


$$H_o = \frac{H_i - F + 2P}{S} + 1$$

Where, 
W_i is the input width, H_i is the input height, F is the filter size (it's symmetric), P is the padding, and S is the stride. When the division is not exact, the quotient is rounded down. 
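As a worked example, the sketch below computes the output size with the usual rule, floor((input - F + 2P) / S) + 1, where S is the stride, and checks it against an actual `nn.Conv2d` layer. The helper name `conv_output_size` is hypothetical; the numbers (10x10 input, 5x5 filter, padding 2, stride 2) match the PyTorch cell later in this document.

```python
import torch
import torch.nn as nn

def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Output width or height of a convolution (or pooling) layer."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# 10x10 input, 5x5 filter, padding 2, stride 2
width_out = conv_output_size(10, 5, padding=2, stride=2)

# Cross-check against a real layer
layer = nn.Conv2d(3, 64, kernel_size=5, stride=2, padding=2)
actual = tuple(layer(torch.randn(1, 3, 10, 10)).shape)
print(width_out, actual)
```

The same rule applies to pooling layers, with the pooling window as the filter size.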

### Structuring (selecting filter size and depth)
It is suggested that the filter size be bigger in the initial layers and the depth smaller. In subsequent layers, reduce the filter size and increase the depth. That's because the final layers hold high-level representations, so increasing the depth there means more high-level features can be learned. The depth is generally chosen as a power of 2 (e.g. 32, 64, 128), while filter sizes are usually odd (e.g. 3x3, 5x5) so that the filter has a center pixel. 
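A minimal sketch of such a layout, assuming a 64x64 RGB input: the filter size goes from 5x5 to 3x3 while the depth grows from 16 to 64. The specific sizes are illustrative choices, not a recommendation from these notes.

```python
import torch
import torch.nn as nn

features = nn.Sequential(
    # Early layer: larger filter, small depth
    nn.Conv2d(3, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    # Later layers: smaller filters, growing depth
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)

x = torch.randn(1, 3, 64, 64)
out = features(x)
print(out.shape)  # spatial size halves at each pooling step, depth grows
```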

### Pooling
Pooling is used for the following reasons:
1. Translational invariance: we don't care where the face is for a facial classifier. 
2. Reduction in the size of the feature maps, and hence in the number of parameters in later layers, while respecting the spatial aspect. 
3. Reduced overfitting. 

There are a couple of ways of pooling: mean and max. Max pooling is the most commonly used, typically with a 2x2 kernel size (ksize) and a stride of 2. Pooling windows can also overlap, which happens when the stride is smaller than the kernel size. 
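The difference between the variants is easiest to see on a tiny input. The sketch below applies max pooling, mean pooling, and overlapping max pooling to a made-up 4x4 feature map.

```python
import torch
import torch.nn.functional as F

# A 4x4 feature map, reshaped to (batch, channels, height, width)
x = torch.tensor([[ 1.,  2.,  5.,  6.],
                  [ 3.,  4.,  7.,  8.],
                  [ 9., 10., 13., 14.],
                  [11., 12., 15., 16.]]).view(1, 1, 4, 4)

# 2x2 window, stride 2: non-overlapping windows, output is 2x2
max_pooled = F.max_pool2d(x, kernel_size=2, stride=2)
mean_pooled = F.avg_pool2d(x, kernel_size=2, stride=2)

# 2x2 window, stride 1: windows overlap, output is 3x3
overlapping = F.max_pool2d(x, kernel_size=2, stride=1)

print(max_pooled.view(2, 2))
print(mean_pooled.view(2, 2))
print(overlapping.shape)
```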

More recently, pooling has fallen somewhat out of favor because:
1. Datasets are so diverse, we're more concerned about underfitting, than overfitting. 
2. Dropout is a much better regularizer. 
3. Downsampling image results in information loss. 

### ReLU (Activation Function)
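This operation adds non-linearity to the network. ReLU outputs max(0, x): it sets negative values to zero and passes positive values through unchanged. It is computationally efficient and, compared with sigmoid or tanh, suffers less from vanishing gradients. 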
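A short sketch of ReLU on a few made-up values. It also checks that applying ReLU before or after max pooling gives the same result, which is why the order of those two steps in the list above does not change the output.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
activated = F.relu(x)  # negatives become 0, positives are unchanged
print(activated)

# ReLU never reorders values, so it commutes with max pooling
torch.manual_seed(0)
feature_map = torch.randn(1, 4, 6, 6)
pool_then_relu = F.relu(F.max_pool2d(feature_map, kernel_size=2))
relu_then_pool = F.max_pool2d(F.relu(feature_map), kernel_size=2)
print(torch.equal(pool_then_relu, relu_then_pool))
```

Pooling first is slightly cheaper, since ReLU then runs on a quarter as many values.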

### Flattening
This operation simply flattens the feature maps, that is, converts them into a single column by appending all the rows below one another. 
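In PyTorch this is usually done per image, keeping the batch dimension intact. A minimal sketch with made-up feature maps (batch of 2, 3 channels, 2x2 each):

```python
import torch

# Values 0..23 arranged as (batch=2, channels=3, height=2, width=2)
feature_maps = torch.arange(24, dtype=torch.float32).view(2, 3, 2, 2)

# Flatten everything except the batch dimension
flat = torch.flatten(feature_maps, start_dim=1)
print(flat.shape)
print(flat[0])  # rows and channels of the first image laid end to end

# Equivalent formulation using view()
same = feature_maps.view(feature_maps.size(0), -1)
```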

### Fully connected layer
Once we have the flattened input, this step is an ordinary neural network: one or more fully connected hidden layers followed by an output layer. 


## Convolution Layer in PyTorch

In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# output depth
k_output = 64

# Image properties
image_width = 10
image_height = 10
color_channels = 3

# convolutional filter
filter_width = 5
filter_height = 5

# input image (a batch containing a single image)
# PyTorch uses (batch_size, channels, height, width)
input_image = torch.randn((1, color_channels, image_height, image_width))

# Define the convolution layer (in_channels, out_channels, kernel_size)
conv_layer = nn.Conv2d(in_channels=color_channels, 
                       out_channels=k_output, 
                       kernel_size=(filter_height, filter_width), 
                       stride=(2, 2), 
                       padding=2)  # "SAME" padding equivalent in PyTorch

# Apply convolution
conv = conv_layer(input_image)

# The bias is already applied inside conv_layer (bias=True by default),
# so it must not be added a second time
bias = conv_layer.bias  # shape: (k_output,), kept here only for inspection

# Apply max pooling (kernel_size, stride, padding)
conv = F.max_pool2d(conv, kernel_size=2, stride=2, padding=0)

# Apply ReLU activation
conv = F.relu(conv)

print(conv.shape)  # torch.Size([1, 64, 2, 2]): the convolution gives 5x5, pooling gives 2x2

## Image Classifier in PyTorch


In [9]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        
        # 1. Convolution
        # (in_channels, out_channels, kernel_size, stride)
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=3)
        
        # 2. Max Pooling
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # 3. Flattening is done in forward pass with view()
        
        # 4. Fully connected layer (the flattened size depends on the input size)
        # For a 64x64 input: conv1 gives 21x21, pooling gives 10x10, so 32 * 10 * 10 features
        self.fc1 = nn.Linear(32 * 10 * 10, 128)
        
        # 5. Output layer
        self.fc2 = nn.Linear(128, 1)

    def forward(self, x):
        # Apply convolution + ReLU activation
        x = F.relu(self.conv1(x))
        
        # Apply max pooling
        x = self.pool(x)
        
        # Flatten the tensor for the fully connected layer
        x = x.view(x.size(0), -1)  # Flatten to (batch_size, num_features), keeping the batch dimension fixed
        
        # Fully connected layer + ReLU activation
        x = F.relu(self.fc1(x))
        
        # Output layer with sigmoid activation
        x = torch.sigmoid(self.fc2(x))
        
        return x

# Initialize the model
model = Classifier()

# Define the optimizer and loss function (PyTorch has no separate compile step)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.BCELoss()  # Binary Cross-Entropy for binary classification

# Example input (batch_size=1, channels=3, height=64, width=64)
input_data = torch.randn(1, 3, 64, 64)

# Forward pass
output = model(input_data)
print(output)
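The hard-coded `32 * 10 * 10` only holds for 64x64 inputs. One way to avoid working it out by hand is to push a dummy tensor through the convolution and pooling layers and read off the flattened size. This is a sketch: `feature_extractor` and `flattened_size` are hypothetical names, and the layer settings are copied from `conv1` and `pool` in the `Classifier` above.

```python
import torch
import torch.nn as nn

# Same settings as conv1 and pool in the Classifier
feature_extractor = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=3),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

def flattened_size(module, input_shape):
    """Number of features per image after `module`, for a (C, H, W) input."""
    with torch.no_grad():
        dummy = torch.zeros(1, *input_shape)
        return module(dummy).flatten(1).shape[1]

n_features = flattened_size(feature_extractor, (3, 64, 64))
print(n_features)  # value to use as in_features of the first Linear layer
```

The result can then be passed to `nn.Linear(n_features, 128)` instead of a hand-computed constant.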


In [None]:
import torch
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

# 1. Data Augmentation for Training Set
train_transform = transforms.Compose([
    transforms.Resize((64, 64)),         # Resize to match the input size expected by the model
    transforms.RandomHorizontalFlip(),   # Horizontal flipping
    transforms.RandomResizedCrop(64),    # Random cropping
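    # Shearing and random zoom in one transform. torchvision has no RandomZoom,
    # so zoom is expressed through the scale range of RandomAffine.
    # Note: shear is an angle range in degrees, so 0.2 is a very small shear.
    transforms.RandomAffine(degrees=0, shear=0.2, scale=(0.8, 1.2)),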
    transforms.ToTensor(),               # Convert the image to a tensor
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # Normalization (mean, std for each channel)
])

# 2. Data Preprocessing for Test Set (resizing and normalization only, no augmentation)
test_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Load the datasets
train_dataset = ImageFolder(root='dataset/training_set', transform=train_transform)
test_dataset = ImageFolder(root='dataset/test_set', transform=test_transform)

# Data Loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# Assuming you have a model defined as 'model'
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

# Loss function and optimizer
criterion = torch.nn.BCELoss()  # or CrossEntropyLoss for multi-class classification
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training loop
epochs = 25
steps_per_epoch = len(train_loader)
validation_steps = len(test_loader)

for epoch in range(epochs):
    model.train()  # Set the model to training mode
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(train_loader):
        inputs, labels = inputs.to(device), labels.to(device).float().view(-1, 1)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)

        # Compute the loss
        loss = criterion(outputs, labels)
        
        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    print(f"Epoch {epoch+1}/{epochs}, Loss: {running_loss/steps_per_epoch:.4f}")

    # Validation
    model.eval()
    validation_loss = 0.0
    with torch.no_grad():
        for i, (inputs, labels) in enumerate(test_loader):
            inputs, labels = inputs.to(device), labels.to(device).float().view(-1, 1)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            validation_loss += loss.item()

    print(f"Validation Loss: {validation_loss/validation_steps:.4f}")
