# ResNeXt [7 points]
Based on your ResNet implementation in Part I, extend it to ResNeXt. Your accuracy is expected to be higher than with ResNet. Compare the results with your VGG and ResNet implementations.

## Step 1: Implement the ResNeXt architecture
Pay close attention to the grouped convolutions and the cardinality parameter. Using a built-in ResNeXt model won’t be considered for evaluation.

In [3]:
!unzip -q cnn_dataset.zip -d /content/

In [None]:
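from sklearn.model_selection import train_test_split
from torch.utils.data import Subset
import torch.nn as nn
import torch.nn.functional as function
from torch.utils.data import TensorDataset, DataLoader
import torch
from torchvision import datasets, transforms
import pandas as pd
from torchsummary import summary

# Select the GPU when available; `device` is used by the model and training cells below.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")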

In [5]:
dataset_path = "cnn_dataset"

transform = transforms.Compose([
    transforms.Resize((96, 96)),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder(root = dataset_path, transform = transform)
num_classes = len(dataset.classes)
num_images = len(dataset)
class_names = dataset.classes
class_distribution = {cls: 0 for cls in class_names}
class_sums = {cls: torch.zeros((3, 96, 96)) for cls in class_names}


for img, label in dataset:
    class_distribution[class_names[label]] += 1
    class_sums[class_names[label]] += img

dataset_stats = pd.DataFrame({
    "Category": list(class_distribution.keys()),
    "Image Count": list(class_distribution.values())
})

print(dataset_stats)

print("classes:", class_names)

   Category  Image Count
0      dogs        10000
1      food        10000
2  vehicles        10000
classes: ['dogs', 'food', 'vehicles']


In [6]:
def create_one_hot_dataset(original_dataset, num_classes):
    images = []
    one_hot_labels = []

    for img, label in original_dataset:
        images.append(img)
        label_long = torch.tensor(label, dtype=torch.long)
        one_hot = function.one_hot(label_long, num_classes=num_classes)
        one_hot_labels.append(one_hot)

    images_tensor = torch.stack(images)
    labels_tensor = torch.stack(one_hot_labels)

    return TensorDataset(images_tensor, labels_tensor)


dataset = create_one_hot_dataset(dataset, num_classes=num_classes)

In [7]:
indices = list(range(len(dataset)))
train_indices, temp_indices = train_test_split(indices, test_size = 0.30, random_state=42)
val_indices, test_indices = train_test_split(temp_indices, test_size = 0.50, random_state=42)

train_dataset = Subset(dataset, train_indices)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)
val_dataset = Subset(dataset, val_indices)
val_loader = DataLoader(val_dataset, batch_size=64,shuffle=False, num_workers=4)
test_dataset = Subset(dataset, test_indices)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=4)

print(f"Training set size: {len(train_dataset)}")
print(f"Validation set size: {len(val_dataset)}")
print(f"Test set size: {len(test_dataset)}")

Training set size: 21000
Validation set size: 4500
Test set size: 4500


In [9]:
class ResNeXtBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None, cardinality=32, base_width=4, dropout_val=0.0):
        super(ResNeXtBlock, self).__init__()
        D = cardinality * base_width
        self.conv1 = nn.Conv2d(in_channels, D, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(D)
        self.conv2 = nn.Conv2d(D, D, kernel_size=3, stride=stride, padding=1, groups=cardinality, bias=False)
        self.bn2 = nn.BatchNorm2d(D)
        self.conv3 = nn.Conv2d(D, out_channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.dropout = nn.Dropout(p=dropout_val) if dropout_val > 0 else None

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        if self.dropout is not None:
            out = self.dropout(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        out += identity
        out = self.relu(out)
        return out

class ResNeXt(nn.Module):
    def __init__(self, num_classes, cardinality=32, base_width=4, dropout_val=0.0):
        super(ResNeXt, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(64, 3, stride=1, cardinality=cardinality, base_width=base_width, dropout_val=dropout_val)
        self.layer2 = self._make_layer(128, 4, stride=2, cardinality=cardinality, base_width=base_width, dropout_val=dropout_val)
        self.layer3 = self._make_layer(256, 6, stride=2, cardinality=cardinality, base_width=base_width, dropout_val=dropout_val)
        self.layer4 = self._make_layer(512, 3, stride=2, cardinality=cardinality, base_width=base_width, dropout_val=dropout_val)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(p=dropout_val) if dropout_val > 0 else None
        self.fc = nn.Linear(512, num_classes)

    def _make_layer(self, out_channels, blocks, stride, cardinality, base_width, dropout_val):
        downsample = None
        if stride != 1 or self.in_channels != out_channels:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        layers = []
        layers.append(ResNeXtBlock(self.in_channels, out_channels, stride, downsample, cardinality, base_width, dropout_val))
        self.in_channels = out_channels
        for _ in range(1, blocks):
            layers.append(ResNeXtBlock(self.in_channels, out_channels, 1, None, cardinality, base_width, dropout_val))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        if self.dropout is not None:
            x = self.dropout(x)
        x = self.fc(x)
        return x

In [None]:
model = ResNeXt(num_classes=num_classes, cardinality=32, base_width=4, dropout_val=0.0).to(device)
summary(model, input_size=(3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5          [-1, 128, 56, 56]           8,192
       BatchNorm2d-6          [-1, 128, 56, 56]             256
              ReLU-7          [-1, 128, 56, 56]               0
            Conv2d-8          [-1, 128, 56, 56]           4,608
       BatchNorm2d-9          [-1, 128, 56, 56]             256
             ReLU-10          [-1, 128, 56, 56]               0
           Conv2d-11           [-1, 64, 56, 56]           8,192
      BatchNorm2d-12           [-1, 64, 56, 56]             128
             ReLU-13           [-1, 64, 56, 56]               0
     ResNeXtBlock-14           [-1, 64,

In [None]:
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride = 1, downsample = None, dropout_val = 0.5):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size = 3, stride=stride, padding = 1, bias = False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size = 3, stride = 1, padding = 1, bias = False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample
        self.dropout = nn.Dropout(p = dropout_val)

    def forward(self, x):
        identity = x

        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = function.relu(out, inplace = True)
        out = self.dropout(out)
        out = self.conv2(out)

        out = self.bn2(out)
        out += identity
        out = function.relu(out, inplace = True)

        return out

class ResNet18(nn.Module):
    def __init__(self, num_classes, dropout_size = 0.5):
        super(ResNet18, self).__init__()
        self.in_channels = 64
        self.dropout_size = dropout_size

        self.conv1 = nn.Conv2d(3, 64, kernel_size = 7, stride = 2, padding = 3, bias = False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride = 2, padding = 1)

        self.layer1 = self.make_layer(64, 2, stride = 1, dropout_size = self.dropout_size)
        self.layer2 = self.make_layer(128, 2, stride = 2, dropout_size = self.dropout_size)
        self.layer3 = self.make_layer(256, 2, stride = 2, dropout_size = self.dropout_size)
        self.layer4 = self.make_layer(512, 2, stride = 2, dropout_size = self.dropout_size)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(p = dropout_size)
        self.fc = nn.Linear(512, num_classes)

    def make_layer(self, out_channels, blocks, stride, dropout_size):
        downsample = None
        if stride != 1 or self.in_channels != out_channels:
            downsample = nn.Sequential(nn.Conv2d(self.in_channels, out_channels, kernel_size = 1, stride = stride, bias = False), nn.BatchNorm2d(out_channels))

        layers = []
        layers.append(ResidualBlock(self.in_channels, out_channels, stride, downsample, dropout_val = dropout_size))

        self.in_channels = out_channels

        for _ in range(1, blocks):
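            # Pass the configured dropout rate so every block in the stage uses the same value.
            layers.append(ResidualBlock(out_channels, out_channels, dropout_val = dropout_size))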

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.dropout(x)
        x = self.fc(x)

        return x


model = ResNet18(num_classes=num_classes).to(device)
summary(model, input_size=(3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
           Dropout-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
    ResidualBlock-10           [-1, 64, 56, 56]               0
           Conv2d-11           [-1, 64, 56, 56]          36,864
      BatchNorm2d-12           [-1, 64, 56, 56]             128
          Dropout-13           [-1, 64, 56, 56]               0
           Conv2d-14           [-1, 64,

## Step 2: Train and evaluate your ResNeXt model
Train and evaluate your ResNeXt model on the same dataset used in Part I.

In [15]:
def train_one_epoch(model, dataloader, criterion, optimizer):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)
        labels = labels.argmax(dim=1).long()
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc


def validate(model, dataloader, criterion):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for images, labels in dataloader:
            images, labels = images.to(device), labels.to(device)
            labels = labels.argmax(dim = 1).long()
            outputs = model(images)
            loss = criterion(outputs, labels)
            running_loss += loss.item() * images.size(0)
            _, predicted = torch.max(outputs, 1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc


def train_model(model, train_loader, val_loader, optim, optim_str, scheduler = None, num_epochs = 10, patience = 3, init_type = 'he', batch_size = 32):
    criterion = nn.CrossEntropyLoss()
    best_val_loss = float('inf')
    best_model_weights = None
    epochs_no_improve = 0
    history = {'train_loss': [], 'train_acc': [], 'val_loss': [],'val_acc': []}

    for epoch in range(num_epochs):
        train_loss, train_acc = train_one_epoch(model, train_loader, criterion, optim)
        val_loss, val_acc = validate(model, val_loader, criterion)

        if scheduler is not None:
            scheduler.step()

        history['train_loss'].append(train_loss)
        history['train_acc'].append(train_acc)
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)


        print(f"Epoch [{epoch + 1} of {num_epochs}]: " f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f} | "f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")

        if val_loss < best_val_loss:
            best_val_loss = val_loss
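            # state_dict() returns references to the live tensors, so clone them to keep a true snapshot.
            best_model_weights = {k: v.detach().clone() for k, v in model.state_dict().items()}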
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1
            if epochs_no_improve >= patience:
                print("Early stopping")
                break

    if best_model_weights is not None:
        model.load_state_dict(best_model_weights)


    return model, history

In [18]:
def experiment_for_resnext_improved(num_classes, optimizer_name='SGD', batch_size=32, num_epochs=20):
    print(f"ResNeXT = {optimizer_name}, batch_size = {batch_size} ---")
    model = ResNeXt(num_classes=num_classes, cardinality=64, base_width=4, dropout_val=0.1).to(device)
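    # Note: optimizer_name and batch_size are only used for logging here;
    # SGD and the globally defined data loaders are always used.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)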
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

    model, history = train_model(model, train_loader, val_loader, scheduler=scheduler,
                                 num_epochs=num_epochs, patience=5, optim=optimizer,
                                 optim_str=optimizer_name, batch_size=batch_size)
    test_loss, test_acc = validate(model, test_loader, nn.CrossEntropyLoss())
    print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {test_acc:.4f}")

    return model, history

model_resnext_improved, history_resnext_improved = experiment_for_resnext_improved(num_classes=num_classes, optimizer_name='SGD', batch_size=64, num_epochs=20)
torch.save(model_resnext_improved.state_dict(), "resnext_improved_best_weights.pth")
print("Model saved")

ResNeXT = SGD, batch_size = 64 ---
Epoch [1 of 20]: Train Loss: 0.9341, Train Acc: 0.6817 | Val Loss: 0.5709, Val Acc: 0.7858
Epoch [2 of 20]: Train Loss: 0.4752, Train Acc: 0.8185 | Val Loss: 0.4709, Val Acc: 0.8244
Epoch [3 of 20]: Train Loss: 0.4033, Train Acc: 0.8473 | Val Loss: 0.3870, Val Acc: 0.8433
Epoch [4 of 20]: Train Loss: 0.3664, Train Acc: 0.8625 | Val Loss: 0.4725, Val Acc: 0.8220
Epoch [5 of 20]: Train Loss: 0.3254, Train Acc: 0.8787 | Val Loss: 0.6021, Val Acc: 0.7591
Epoch [6 of 20]: Train Loss: 0.2996, Train Acc: 0.8898 | Val Loss: 0.3928, Val Acc: 0.8678
Epoch [7 of 20]: Train Loss: 0.2631, Train Acc: 0.9019 | Val Loss: 0.5207, Val Acc: 0.8129
Epoch [8 of 20]: Train Loss: 0.2471, Train Acc: 0.9092 | Val Loss: 0.3123, Val Acc: 0.8836
Epoch [9 of 20]: Train Loss: 0.2223, Train Acc: 0.9169 | Val Loss: 0.2898, Val Acc: 0.8993
Epoch [10 of 20]: Train Loss: 0.2100, Train Acc: 0.9231 | Val Loss: 0.2685, Val Acc: 0.8991
Epoch [11 of 20]: Train Loss: 0.1905, Train Acc: 0.928

## Step 3: Compare the performance of your ResNeXt model
Compare the performance of your ResNeXt model against your previous ResNet and VGG models. Provide a detailed analysis of the results.

### a. A table summarizing the performance metrics (accuracy, loss, etc.) for all three models.

<table border="1" cellspacing="0" cellpadding="5">
  <thead>
    <tr>
      <th>Model</th>
      <th>Validation Loss</th>
      <th>Validation Accuracy</th>
      <th>Test Loss</th>
      <th>Test Accuracy</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>ResNet18</td>
      <td>0.2326</td>
      <td>91.27%</td>
      <td>0.2194</td>
      <td>91.64%</td>
    </tr>
    <tr>
      <td>ResNeXt</td>
      <td>0.2243</td>
      <td>93.84%</td>
      <td>0.2050</td>
      <td>93.96%</td>
    </tr>
    <tr>
      <td>VGG16-C</td>
      <td>0.2538</td>
      <td>90.78%</td>
      <td>0.2348</td>
      <td>91.40%</td>
    </tr>
  </tbody>
</table>


### b. Discussion of the observed differences in performance.
Explain why ResNeXt might be outperforming ResNet and VGG. Consider factors like cardinality, grouped convolutions, and the overall architecture.

ResNeXt outperforms ResNet and VGG primarily because it introduces an additional dimension, called cardinality, to the network structure. Cardinality is the number of parallel transformation pathways per block. Instead of stacking deeper or wider layers, ResNeXt aggregates a group of transformations in parallel. With this addition, the network can learn richer features without a proportional increase in parameters, which leads to better use of model capacity.
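The "parallel pathways" view can be checked numerically. The sketch below assumes small, hypothetical values for `cardinality` and `base_width` (not the ones used in training) and shows that a single grouped convolution gives the same output as splitting the channels, running one small convolution per path, and concatenating the results.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Small hypothetical sizes chosen only to keep the check fast.
cardinality, base_width = 4, 2
width = cardinality * base_width  # total channels D, as in ResNeXtBlock

# One grouped 3x3 convolution, as used for conv2 in the block.
grouped = nn.Conv2d(width, width, kernel_size=3, padding=1,
                    groups=cardinality, bias=False)

x = torch.randn(2, width, 8, 8)

# Split the input into `cardinality` chunks and convolve each chunk with the
# matching slice of the grouped weight: one independent path per group.
input_chunks = torch.chunk(x, cardinality, dim=1)
weight_chunks = torch.chunk(grouped.weight, cardinality, dim=0)
paths = [F.conv2d(c, w, padding=1) for c, w in zip(input_chunks, weight_chunks)]

# Merge the paths by concatenating along the channel dimension.
merged = torch.cat(paths, dim=1)

same = torch.allclose(grouped(x), merged, atol=1e-5)
print("grouped conv equals parallel paths:", same)
```

Writing the paths as one grouped convolution is what keeps the block compact: a single layer stands in for `cardinality` separate branches.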

Another significant advantage is the use of grouped convolutions. In ResNeXt, the second convolution of each block is replaced by a grouped convolution: the input channels are split into several groups, and the convolution is applied independently to each group. This not only reduces computational cost but also encourages the network to learn distinct feature representations in each group. The combined group outputs retain a diverse set of features, which helps the network generalize better on complex visual tasks.
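The cost saving from grouping can be worked out by hand. The helper below is a hypothetical illustration (`conv2d_weight_count` is not part of the notebook) that applies the standard weight-count formula for a bias-free 2D convolution to the 128-channel 3x3 layer of the first block, with and without 32 groups.

```python
def conv2d_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a bias-free 2D convolution with square kernels."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channels must be divisible by groups")
    # Each output channel only sees in_channels / groups input channels.
    return out_channels * (in_channels // groups) * kernel_size * kernel_size


dense = conv2d_weight_count(128, 128, 3)               # ordinary 3x3 convolution
grouped = conv2d_weight_count(128, 128, 3, groups=32)  # cardinality = 32

print(dense, grouped, dense // grouped)
```

The grouped count agrees with the 4,608 parameters reported for `Conv2d-8` in the ResNeXt model summary, and the reduction factor equals the number of groups.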

We also observed that our ResNeXt uses far fewer parameters than VGG and ResNet, so it reaches higher accuracy with a smaller parameter budget.
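A per-block comparison gives a rough feel for where the savings come from. The sketch below is an assumption-laden illustration, not a count of the full models: it rebuilds only the convolution and batch-norm layers of one last-stage block (512 channels) in each style, using the `cardinality=64`, `base_width=4` setting from the training run, and counts trainable parameters with a hypothetical `count_parameters` helper.

```python
import torch.nn as nn


def count_parameters(module):
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


channels, cardinality, base_width = 512, 64, 4
width = cardinality * base_width  # 256 inner channels

# Layers of one ResNeXt-style block: 1x1 reduce, grouped 3x3, 1x1 expand.
resnext_style = nn.Sequential(
    nn.Conv2d(channels, width, 1, bias=False), nn.BatchNorm2d(width),
    nn.Conv2d(width, width, 3, padding=1, groups=cardinality, bias=False),
    nn.BatchNorm2d(width),
    nn.Conv2d(width, channels, 1, bias=False), nn.BatchNorm2d(channels),
)

# Layers of one ResNet18-style basic block: two dense 3x3 convolutions.
resnet_style = nn.Sequential(
    nn.Conv2d(channels, channels, 3, padding=1, bias=False), nn.BatchNorm2d(channels),
    nn.Conv2d(channels, channels, 3, padding=1, bias=False), nn.BatchNorm2d(channels),
)

resnext_params = count_parameters(resnext_style)
resnet_params = count_parameters(resnet_style)
print(f"ResNeXt-style block: {resnext_params:,}")
print(f"ResNet-style block:  {resnet_params:,}")
```

The same helper can be called on the full `ResNeXt` and `ResNet18` instances from the cells above to compare whole-model totals.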



### c. Analysis of any challenges encountered during the implementation or training process.

We faced some difficulties during development. Hyperparameter tuning was sensitive: slight variations in dropout or learning rate had a significant effect on performance, so finding the right balance took some trial and error. The added complexity of grouped convolutions and multiple branches also made it harder to keep track of dimensions and verify that everything worked correctly, compared with more straightforward architectures. We also saw some volatility in validation accuracy, which had to be watched carefully and countered with techniques like early stopping to avoid overfitting. Overall, although these problems slowed us down a bit, they gave us greater insight into how to get the best out of the ResNeXt architecture.
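The interaction between validation volatility and early stopping can be replayed on the logged losses. The sketch below is a standalone re-implementation of the patience counter used in `train_model`; the helper name `early_stop_epoch` is hypothetical, and only the ten validation losses visible in the training log are used.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    epochs_no_improve = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1
            if epochs_no_improve >= patience:
                return epoch
    return None


# Validation losses for epochs 1-10, copied from the training log.
logged_val_losses = [0.5709, 0.4709, 0.3870, 0.4725, 0.6021,
                     0.3928, 0.5207, 0.3123, 0.2898, 0.2685]

stop_with_patience_3 = early_stop_epoch(logged_val_losses, patience=3)
stop_with_patience_5 = early_stop_epoch(logged_val_losses, patience=5)
print(stop_with_patience_3, stop_with_patience_5)
```

On these values a patience of 3 would have stopped at epoch 6, before the later improvements, whereas the patience of 5 used in the experiment does not trigger within the first ten epochs.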

### d. Provide detailed analysis of the results.

The training logs show a consistent decrease in training loss and a matching increase in training accuracy over the 20 epochs, peaking at 99.4% at epoch 19. Validation accuracy improves to a high of about 93.8% in the later epochs while validation loss settles at around 0.22, suggesting that the model is learning solid features without severe overfitting. There are some oscillations, such as temporary dips in validation accuracy around epochs 5 and 15, but the overall trend is clearly positive, with the final model reaching a test accuracy of about 93.96% and a test loss of 0.2050. The early stopping condition appears to have worked as intended once performance converged, which indicates that the best ResNeXt configuration with the tuned hyperparameters is efficient and generalizes well to new data.
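One way to back up this kind of reading is to summarize the `history` dictionary returned by `train_model`. The sketch below uses a hypothetical helper, `summarize_history`, and fills the dictionary with only the first ten epochs that are visible in the training log, so its numbers describe that partial log rather than the final model.

```python
def summarize_history(history):
    """Find the epoch with the lowest validation loss and its train/val gap."""
    val_losses = history["val_loss"]
    best_idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return {
        "best_epoch": best_idx + 1,  # 1-based, matching the printed log
        "val_loss": val_losses[best_idx],
        "val_acc": history["val_acc"][best_idx],
        # A large positive gap would hint at overfitting.
        "acc_gap": history["train_acc"][best_idx] - history["val_acc"][best_idx],
    }


# Epochs 1-10 as printed in the training log (later epochs are not shown there).
partial_history = {
    "train_acc": [0.6817, 0.8185, 0.8473, 0.8625, 0.8787,
                  0.8898, 0.9019, 0.9092, 0.9169, 0.9231],
    "val_loss": [0.5709, 0.4709, 0.3870, 0.4725, 0.6021,
                 0.3928, 0.5207, 0.3123, 0.2898, 0.2685],
    "val_acc": [0.7858, 0.8244, 0.8433, 0.8220, 0.7591,
                0.8678, 0.8129, 0.8836, 0.8993, 0.8991],
}

summary_stats = summarize_history(partial_history)
print(summary_stats)
```

Passing the full `history_resnext_improved` object instead of the partial dictionary would give the corresponding figures for the complete run.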

### 4. References

https://arxiv.org/abs/1611.05431 - ResNeXt paper (Aggregated Residual Transformations for Deep Neural Networks)

Assignment 1, Part 1

Assignment 1, Part 2

https://youtu.be/l7CK-u8InsA?si=jNtLY8Tkaj8DRWLZ - ResNeXt tutorial