<a href="https://colab.research.google.com/github/Redcoder815/Deep_Learning_PyTorch/blob/main/19NiN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import torch
from torch import nn
import torchvision.transforms as transforms
from torch.utils import data
from torchvision import datasets
import torch.optim as optim

In [2]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

device(type='cuda', index=0)

In [3]:
def nin_block(out_channels, kernel_size, strides, padding):
    return nn.Sequential(
        nn.LazyConv2d(out_channels, kernel_size, strides, padding), nn.ReLU(),
        nn.LazyConv2d(out_channels, kernel_size=1), nn.ReLU(),
        nn.LazyConv2d(out_channels, kernel_size=1), nn.ReLU())

The nn.AdaptiveAvgPool2d layer in PyTorch is a pooling operation that chooses its kernel size and stride automatically so that the output has a specified spatial size, regardless of the input size. It is commonly placed at the end of a convolutional network to reduce each feature map to a fixed size.

Specifically, nn.AdaptiveAvgPool2d((1, 1)) accepts any input size and produces an output whose height and width are both 1. It averages each feature map over all of its spatial positions, leaving a single value per channel (global average pooling). In NiN the last block outputs one channel per class, so after nn.Flatten() these averages serve directly as the class scores and no fully connected layer is needed.
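
As a minimal sketch of the behaviour described above, the snippet below checks that nn.AdaptiveAvgPool2d((1, 1)) matches a plain mean over the height and width dimensions, and that the output size does not depend on the input size. The tensor sizes are arbitrary values chosen for illustration, not taken from the notebook's data.

```python
import torch
from torch import nn

# Illustrative input: batch of 2, 10 channels, 6x6 feature maps (sizes are assumptions).
x = torch.randn(2, 10, 6, 6)

pool = nn.AdaptiveAvgPool2d((1, 1))
pooled = pool(x)                           # shape (2, 10, 1, 1)

# Global average pooling is the mean over the spatial dimensions.
manual = x.mean(dim=(2, 3), keepdim=True)
same = torch.allclose(pooled, manual, atol=1e-6)

# Flatten drops the two singleton spatial dimensions: one score per channel.
flat = nn.Flatten()(pooled)                # shape (2, 10)

# A different spatial size still yields a 1x1 output.
other = pool(torch.randn(2, 10, 13, 9))    # shape (2, 10, 1, 1)

print(pooled.shape, flat.shape, other.shape, same)
```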

In [4]:
class NiN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nin_block(96, kernel_size=11, strides=4, padding=0),
            nn.MaxPool2d(3, stride=2),
            nin_block(256, kernel_size=5, strides=1, padding=2),
            nn.MaxPool2d(3, stride=2),
            nin_block(384, kernel_size=3, strides=1, padding=1),
            nn.MaxPool2d(3, stride=2),
            nn.Dropout(0.5),
            nin_block(num_classes, kernel_size=3, strides=1, padding=1),
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten())
    def forward(self, X):
        return self.net(X)
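
To see how the spatial size shrinks through the network, one option is to push a dummy tensor through the layers one at a time and record each output shape. The sketch below rebuilds the same layer stack so that it runs on its own. The single-channel 227x227 input is an assumption that matches the Resize transform and the grayscale Fashion-MNIST images used later in this notebook.

```python
import torch
from torch import nn

def nin_block(out_channels, kernel_size, strides, padding):
    # Same block as in the notebook: one spatial conv followed by two 1x1 convs.
    return nn.Sequential(
        nn.LazyConv2d(out_channels, kernel_size, strides, padding), nn.ReLU(),
        nn.LazyConv2d(out_channels, kernel_size=1), nn.ReLU(),
        nn.LazyConv2d(out_channels, kernel_size=1), nn.ReLU())

net = nn.Sequential(
    nin_block(96, kernel_size=11, strides=4, padding=0),
    nn.MaxPool2d(3, stride=2),
    nin_block(256, kernel_size=5, strides=1, padding=2),
    nn.MaxPool2d(3, stride=2),
    nin_block(384, kernel_size=3, strides=1, padding=1),
    nn.MaxPool2d(3, stride=2),
    nn.Dropout(0.5),
    nin_block(10, kernel_size=3, strides=1, padding=1),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten())
net.eval()

# Assumed dummy input: one grayscale 227x227 image.
X = torch.randn(1, 1, 227, 227)
shapes = []
with torch.no_grad():
    for layer in net:
        X = layer(X)  # the first call also initialises the lazy conv layers
        shapes.append(tuple(X.shape))
        print(f"{layer.__class__.__name__:<18} {tuple(X.shape)}")
```

The final shape is (1, 10): one score per class for the single input image.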

In [5]:
model = NiN()
model.to(device)

NiN(
  (net): Sequential(
    (0): Sequential(
      (0): LazyConv2d(0, 96, kernel_size=(11, 11), stride=(4, 4))
      (1): ReLU()
      (2): LazyConv2d(0, 96, kernel_size=(1, 1), stride=(1, 1))
      (3): ReLU()
      (4): LazyConv2d(0, 96, kernel_size=(1, 1), stride=(1, 1))
      (5): ReLU()
    )
    (1): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): Sequential(
      (0): LazyConv2d(0, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (1): ReLU()
      (2): LazyConv2d(0, 256, kernel_size=(1, 1), stride=(1, 1))
      (3): ReLU()
      (4): LazyConv2d(0, 256, kernel_size=(1, 1), stride=(1, 1))
      (5): ReLU()
    )
    (3): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): LazyConv2d(0, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): ReLU()
      (2): LazyConv2d(0, 384, kernel_size=(1, 1), stride=(1, 1))
      (3): ReLU()
      (4): LazyConv2d(0, 384, kernel

In [6]:
batch_size = 256

In [7]:
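Transform = transforms.Compose([
    # Upscale the 28x28 images to the 227x227 input size used with this network.
    transforms.Resize((227, 227)),
    # Convert the PIL image to a float tensor with values in [0, 1].
    transforms.ToTensor(),
    # Per-channel (mean, std). Note: 0.1307 and 0.3081 are the values commonly
    # quoted for MNIST digits; Fashion-MNIST has different statistics.
    transforms.Normalize((0.1307,), (0.3081,))
])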
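
transforms.Normalize subtracts a per-channel mean and divides by a per-channel standard deviation. As a rough sketch of where such constants can come from, the snippet below computes them from a batch of images. The helper name channel_stats is hypothetical, and the random tensor only stands in for real images already scaled to [0, 1].

```python
import torch

def channel_stats(images):
    """Per-channel mean and std of a float tensor shaped (N, C, H, W)."""
    mean = images.mean(dim=(0, 2, 3))
    std = images.std(dim=(0, 2, 3))
    return mean, std

torch.manual_seed(0)
# Stand-in data (assumption): 64 random single-channel 28x28 "images" in [0, 1].
fake_images = torch.rand(64, 1, 28, 28)

mean, std = channel_stats(fake_images)

# The same arithmetic that Normalize applies, written out by hand.
normalized = (fake_images - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)
print(mean, std, normalized.mean().item(), normalized.std().item())
```

After this step the data has approximately zero mean and unit standard deviation per channel.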

The num_workers parameter of PyTorch's DataLoader specifies how many subprocesses to use for data loading.

When num_workers is 0 (the default), data is loaded in the main process. This can be slow for large datasets or expensive transformations, because the main process must both load the data and run the training step.
When num_workers is greater than 0, the DataLoader starts that many worker processes to fetch and preprocess batches in parallel. This can speed up training considerably, because the workers prepare the next batches while the GPU (or CPU) trains the model on the current one.
With num_workers=2, as used in this notebook, two worker processes load data concurrently.
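
A small sketch of how the setting is passed, using a synthetic TensorDataset so that it runs without downloading anything. The helper make_loader and the tensor sizes are illustrative assumptions. The example itself runs with num_workers=0 because, on platforms that start workers with spawn (such as Windows and macOS), script code that uses num_workers > 0 should be placed under an if __name__ == "__main__": guard.

```python
import torch
from torch.utils import data

def make_loader(dataset, batch_size, num_workers=0, shuffle=False):
    """Hypothetical helper that builds a DataLoader with a configurable worker count."""
    return data.DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        num_workers=num_workers,
        # Keeping workers alive between epochs only applies when workers exist.
        persistent_workers=num_workers > 0)

# Synthetic stand-in data: 100 samples of shape (1, 8, 8) with labels in [0, 10).
features = torch.randn(100, 1, 8, 8)
targets = torch.randint(0, 10, (100,))
dataset = data.TensorDataset(features, targets)

# num_workers=0 loads in the main process; pass num_workers=2 to use two workers.
loader = make_loader(dataset, batch_size=32, num_workers=0)
num_batches = sum(1 for _ in loader)
print(num_batches)
```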

In [8]:
mnist_train = datasets.FashionMNIST(root="../data", train=True, transform=Transform, download=True)
mnist_val = datasets.FashionMNIST(root="../data", train=False, transform=Transform, download=True)

train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True, num_workers=2)
val_iter = data.DataLoader(mnist_val, batch_size, shuffle=False, num_workers=2)

In [9]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [10]:
max_epochs = 3

In [11]:
for epoch in range(max_epochs):
    model.train()
    train_loss_sum, train_correct, n = 0.0, 0, 0
    for images, labels in train_iter:
        images, labels = images.to(device), labels.to(device)
        y_pred = model(images)
        loss = criterion(y_pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # .item() stores a plain float, so no autograd history is kept across batches.
        # criterion returns the batch mean, so train_loss_sum / n below is the
        # per-sample loss scaled down by roughly batch_size.
        train_loss_sum += loss.item()
        predicted_labels = torch.argmax(y_pred, dim=1)
        train_correct += (predicted_labels == labels).sum().item()
        n += labels.numel()

    # Evaluate on the validation split with dropout disabled and no gradient tracking.
    model.eval()
    val_correct, val_n = 0, 0
    with torch.no_grad():
        for images, labels in val_iter:
            images, labels = images.to(device), labels.to(device)
            y_pred = model(images)
            predicted_labels = torch.argmax(y_pred, dim=1)
            val_correct += (predicted_labels == labels).sum().item()
            val_n += labels.numel()
    val_accuracy = val_correct / val_n
    print(f'Epoch {epoch + 1}, Loss: {train_loss_sum / n:.4f}, Train Accuracy: {train_correct / n:.4f}, Validation Accuracy: {val_accuracy:.4f}')

Epoch 1, Loss: 0.0065, Train Accuracy: 0.3827, Validation Accuracy: 0.6803
Epoch 2, Loss: 0.0028, Train Accuracy: 0.7430, Validation Accuracy: 0.7836
Epoch 3, Loss: 0.0021, Train Accuracy: 0.7994, Validation Accuracy: 0.8220
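
The loss values printed above are sums of per-batch mean losses divided by the number of samples, so they are smaller than the per-sample loss by roughly the batch size. One common alternative, sketched here on random logits rather than on the NiN model, is to weight each batch mean by its batch size before dividing. The batch sizes and tensor shapes are arbitrary assumptions for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Stand-in predictions and labels (assumption): 10 samples, 3 classes.
logits = torch.randn(10, 3)
labels = torch.randint(0, 3, (10,))
criterion = nn.CrossEntropyLoss()  # default reduction is the batch mean

weighted_sum, n, start = 0.0, 0, 0
for bs in [4, 4, 2]:  # uneven batches, as with a smaller final batch
    batch_loss = criterion(logits[start:start + bs], labels[start:start + bs])
    weighted_sum += batch_loss.item() * bs  # undo the per-batch mean
    n += bs
    start += bs

per_sample_loss = weighted_sum / n
full_loss = criterion(logits, labels).item()  # mean loss over all 10 samples at once
print(per_sample_loss, full_loss)
```

The two printed values agree up to floating-point error, which is why the weighted form reports a per-sample loss regardless of batch size.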
