# Homework 2 Solution

Both the given model and the modified one reach around 95% accuracy on the training data. However, the given model reaches only around 75% accuracy on the validation data, while the modified model reaches around 85% on the same split, an improvement of roughly 10 percentage points.
Two changes were made. First, the number of stages (`n_stages`) was reduced from 4 to 2, which may help prevent overfitting to the training data. Second, downsampling was made less frequent: it now happens after every 4 ConvNeXt blocks (`blocks_per_stage=4`) instead of after every 2. Downsampling is a useful technique for handling large inputs, but it also discards spatial resolution. Downsampling less often preserves more spatial detail in the feature maps, which likely contributes to the better validation results.
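
To make the resolution argument concrete, the following sketch lists the channel count and feature-map size at each stage, mirroring the stage loop of the `Classifier` defined later in this notebook. The helper name `stage_plan` is hypothetical, and the four-stage variant uses an initial width of 64 only for comparison; the baseline's actual width is not stated here.

```python
def stage_plan(n_initial_filters, n_stages, input_size=32):
    """Return (channels, spatial_size) per stage.

    Mirrors the Classifier loop: a downsampler sits between stages
    (not after the last one), doubling channels and halving resolution.
    """
    channels, size = n_initial_filters, input_size
    plan = []
    for i in range(n_stages):
        plan.append((channels, size))
        if i != n_stages - 1:
            channels, size = 2 * channels, size // 2
    return plan


two_stage = stage_plan(64, 2)   # configuration used in this notebook
four_stage = stage_plan(64, 4)  # assumed four-stage variant, for comparison only

print("2 stages:", two_stage)
print("4 stages:", four_stage)
```

With two stages the last feature map is still 16x16, whereas four stages would shrink it to 4x4 before the global average pool.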

In [1]:
import torch, torchvision

In [None]:
from torchvision.transforms import v2
training_data = torchvision.datasets.CIFAR10(
    root="/lus/eagle/projects/datasets/CIFAR-10/",
    train=True,
    download=True,
    transform=v2.Compose([
        v2.ToTensor(),
        v2.RandomHorizontalFlip(),
        v2.RandomResizedCrop(size=32, scale=[0.85,1.0], antialias=False),
        v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    ])
)

test_data = torchvision.datasets.CIFAR10(
    root="/lus/eagle/projects/datasets/CIFAR-10/",
    train=False,
    download=False,
    transform=torchvision.transforms.ToTensor()
)

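# NOTE: both subsets wrap the same dataset object, so the validation split
# also passes through the random augmentations defined above.
training_data, validation_data = torch.utils.data.random_split(training_data, [0.8, 0.2], generator=torch.Generator().manual_seed(55))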

batch_size = 32
# batch_size = 128

# The dataloader makes our dataset iterable
train_dataloader = torch.utils.data.DataLoader(training_data,
    batch_size=batch_size,
    pin_memory=True,
    shuffle=True,
    num_workers=4)
val_dataloader = torch.utils.data.DataLoader(validation_data,
    batch_size=batch_size,
    pin_memory=True,
    shuffle=False,
    num_workers=4)
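
As a quick sanity check on the split and batch size above, the number of batches per epoch can be worked out by hand. This is a plain-Python sketch; the variable names are illustrative, and 50,000 is the standard size of the CIFAR-10 training set.

```python
import math

n_train_full = 50_000             # CIFAR-10 training images
n_train = n_train_full * 8 // 10  # 80% split
n_val = n_train_full - n_train    # remaining 20%
batch_size = 32

train_batches = math.ceil(n_train / batch_size)
val_batches = math.ceil(n_val / batch_size)
# Size of the final, partial validation batch:
last_val_batch = n_val - (val_batches - 1) * batch_size

print(n_train, n_val, train_batches, val_batches, last_val_batch)
```

The training split divides evenly into batches of 32, but the validation split does not, so its last batch holds only 16 samples.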

In [3]:
dev = torch.device(
    "cuda") if torch.cuda.is_available() else torch.device("cpu")


def preprocess(x, y):
    # CIFAR-10 is *color* images so 3 layers!
    return x.view(-1, 3, 32, 32).to(dev), y.to(dev)


class WrappedDataLoader:
    def __init__(self, dl, func):
        self.dl = dl
        self.func = func

    def __len__(self):
        return len(self.dl)

    def __iter__(self):
        for b in self.dl:
            yield (self.func(*b))


train_dataloader = WrappedDataLoader(train_dataloader, preprocess)
val_dataloader = WrappedDataLoader(val_dataloader, preprocess)

In [4]:
from torch import nn


class Downsampler(nn.Module):

    def __init__(self, in_channels, out_channels, shape, stride=2):
        super(Downsampler, self).__init__()

        self.norm = nn.LayerNorm([in_channels, *shape])

        self.downsample = nn.Conv2d(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size = stride,
            stride = stride,
        )

    def forward(self, inputs):


        return self.downsample(self.norm(inputs))



class ConvNextBlock(nn.Module):
    """This block of operations is loosely based on this paper:

    """


    def __init__(self, in_channels, shape):
        super(ConvNextBlock, self).__init__()

        # Depthwise convolution with a large 7x7 kernel (groups=in_channels, one filter per channel):
        self.conv1 = nn.Conv2d(in_channels=in_channels,
                                     out_channels=in_channels,
                                     groups=in_channels,
                                     kernel_size=[7,7],
                                     padding='same' )

        self.norm = nn.LayerNorm([in_channels, *shape])

        # Two more convolutions:
        self.conv2 = nn.Conv2d(in_channels=in_channels,
                                     out_channels=4*in_channels,
                                     kernel_size=1)

        self.conv3 = nn.Conv2d(in_channels=4*in_channels,
                                     out_channels=in_channels,
                                     kernel_size=1
                                     )


    def forward(self, inputs):
        x = self.conv1(inputs)

        # The normalization layer:
        x = self.norm(x)

        x = self.conv2(x)

        # The non-linear activation layer:
        x = torch.nn.functional.gelu(x)

        x = self.conv3(x)

        # This makes it a residual network:
        return x + inputs


class Classifier(nn.Module):


    def __init__(self, n_initial_filters, n_stages, blocks_per_stage):
        super(Classifier, self).__init__()

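        # With kernel_size=1 and stride=1 the stem does not downsample: it projects
        # the 3 color channels to n_initial_filters at every pixel.
        # A patchifying stem, similar to how vision transformers tokenize images,
        # would use kernel_size == stride > 1.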
        self.stem = nn.Conv2d(in_channels=3,
                                    out_channels=n_initial_filters,
                                    kernel_size=1,
                                    stride=1)

        current_shape = [32, 32]

        self.norm1 = nn.LayerNorm([n_initial_filters,*current_shape])
        # self.norm1 = WrappedLayerNorm()

        current_n_filters = n_initial_filters

        self.layers = nn.Sequential()
        for i in range(n_stages):
            # Add a convnext block series:
            for _ in range(blocks_per_stage):
                self.layers.append(ConvNextBlock(in_channels=current_n_filters, shape=current_shape))
            # Add a downsampling layer:
            if i != n_stages - 1:
                # Skip downsampling if it's the last layer!
                self.layers.append(Downsampler(
                    in_channels=current_n_filters,
                    out_channels=2*current_n_filters,
                    shape = current_shape,
                    )
                )
                # Double the number of filters:
                current_n_filters = 2*current_n_filters
                # Cut the shape in half:
                current_shape = [ cs // 2 for cs in current_shape]



        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LayerNorm(current_n_filters),
            nn.Linear(current_n_filters, 10)
        )
        # self.norm2 = nn.InstanceNorm2d(current_n_filters)
        # # This brings it down to one channel / class
        # self.bottleneck = nn.Conv2d(in_channels=current_n_filters, out_channels=10,
        #                                   kernel_size=1, stride=1)

    def forward(self, inputs):

        x = self.stem(inputs)
        # Apply a normalization after the initial patching:
        x = self.norm1(x)

        # Apply the main chunk of the network:
        x = self.layers(x)

        # Normalize and readout:
        x = nn.functional.avg_pool2d(x, x.shape[2:])
        x = self.head(x)

        return x



        # x = self.norm2(x)
        # x = self.bottleneck(x)

        # # Average pooling of the remaining spatial dimensions (and reshape) makes this label-like:
        # return nn.functional.avg_pool2d(x, kernel_size=x.shape[-2:]).reshape((-1,10))
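
The readout at the end of `Classifier.forward` can be tried in isolation. The sketch below assumes a feature map of shape `[32, 128, 16, 16]`, which is what the two-stage model with 64 initial filters produces; the random input and the standalone `head` are for illustration only.

```python
import torch
from torch import nn

# Stand-in for the output of self.layers: batch 32, 128 channels, 16x16 grid.
features = torch.randn(32, 128, 16, 16)

# Global average pooling: the kernel covers the whole spatial extent,
# leaving one value per channel.
pooled = nn.functional.avg_pool2d(features, features.shape[2:])

# Same structure as the model's head: flatten, normalize, project to 10 classes.
head = nn.Sequential(
    nn.Flatten(),
    nn.LayerNorm(128),
    nn.Linear(128, 10),
)
logits = head(pooled)

print(pooled.shape, logits.shape)
```

Because the pooling kernel matches the feature-map size, the result equals the per-channel mean over height and width.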

In [5]:
def evaluate(dataloader, model, loss_fn, val_bar):
    # Set the model to evaluation mode - some NN pieces behave differently during training
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader)
    num_batches = len(dataloader)
    loss, correct = 0, 0

    # We can save computation and memory by not calculating gradients here - we aren't optimizing
    with torch.no_grad():
        # loop over all of the batches
        for X, y in dataloader:

            pred = model(X)
            loss += loss_fn(pred, y).item()
            # how many are correct in this batch? Tracking for accuracy
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
            val_bar.update()

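    loss /= num_batches
    # NOTE: size is the number of batches, so size*batch_size assumes every batch
    # is full. When the last batch is partial (e.g. 313 batches of 32 cover 10,016
    # slots for 10,000 samples), the reported accuracy is slightly underestimated.
    correct /= (size*batch_size)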

    accuracy = 100*correct
    return accuracy, loss
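
If exact accuracy matters, one option is to count the samples actually seen instead of multiplying the batch count by the batch size. The following is a minimal sketch of that idea, not the version that produced the numbers in this notebook; `evaluate_exact` is a hypothetical name, and the progress-bar argument is dropped for brevity.

```python
import torch
from torch import nn


def evaluate_exact(dataloader, model, loss_fn):
    """Return (accuracy in percent, mean per-batch loss), counting samples explicitly."""
    model.eval()
    total_loss, correct, n_samples, n_batches = 0.0, 0, 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            total_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).sum().item()
            n_samples += y.shape[0]  # works for partial batches too
            n_batches += 1
    return 100 * correct / n_samples, total_loss / n_batches


# Toy check: inputs are already logits (Identity model) and the last batch is partial.
toy_batches = [
    (torch.tensor([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]), torch.tensor([0, 1, 1])),
    (torch.tensor([[0.0, 2.0]]), torch.tensor([1])),
]
acc, loss = evaluate_exact(toy_batches, nn.Identity(), nn.CrossEntropyLoss())
print(f"accuracy: {acc:.1f}")
```

Here 3 of 4 samples are classified correctly, so the exact accuracy is 75%. Normalizing by batches times batch size (2 * 3 = 6) would report 50% for the same data.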

In [6]:
def train_one_epoch(dataloader, model, loss_fn, optimizer, progress_bar):
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # forward pass
        pred = model(X)
        loss = loss_fn(pred, y)

        # backward pass calculates gradients
        loss.backward()

        # take one step with these gradients
        optimizer.step()

        # resets the gradients
        optimizer.zero_grad()

        progress_bar.update()
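
The four steps in `train_one_epoch` (forward pass, backward pass, optimizer step, gradient reset) can be seen on a toy problem. This sketch uses a one-parameter linear fit with plain SGD purely for illustration; none of these choices come from the homework setup.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy regression: learn y = 2x with a single linear layer.
model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.linspace(-1, 1, 16).unsqueeze(1)
y = 2 * X

# forward pass
loss_before = loss_fn(model(X), y)
# backward pass calculates gradients
loss_before.backward()
# take one step with these gradients
optimizer.step()
# reset the gradients so they do not accumulate into the next step
optimizer.zero_grad()

loss_after = loss_fn(model(X), y).item()
print(f"loss before: {loss_before.item():.4f}, after: {loss_after:.4f}")
```

For this small learning rate on a quadratic loss, a single step is enough to lower the loss.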

In [9]:
from torchinfo import summary

model = Classifier(64, 2, 4)
model.to(dev)  # use the device selected earlier, so this also runs without a GPU

print(summary(model, input_size=(batch_size, 3, 32, 32)))

Layer (type:depth-idx)                   Output Shape              Param #
Classifier                               [32, 10]                  --
├─Conv2d: 1-1                            [32, 64, 32, 32]          256
├─LayerNorm: 1-2                         [32, 64, 32, 32]          131,072
├─Sequential: 1-3                        [32, 128, 16, 16]         --
│    └─ConvNextBlock: 2-1                [32, 64, 32, 32]          --
│    │    └─Conv2d: 3-1                  [32, 64, 32, 32]          3,200
│    │    └─LayerNorm: 3-2               [32, 64, 32, 32]          131,072
│    │    └─Conv2d: 3-3                  [32, 256, 32, 32]         16,640
│    │    └─Conv2d: 3-4                  [32, 64, 32, 32]          16,448
│    └─ConvNextBlock: 2-2                [32, 64, 32, 32]          --
│    │    └─Conv2d: 3-5                  [32, 64, 32, 32]          3,200
│    │    └─LayerNorm: 3-6               [32, 64, 32, 32]          131,072
│    │    └─Conv2d: 3-7                  [32, 256, 32, 
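
The parameter counts in the summary can be reproduced by hand, which is a useful check on how each layer is configured. The helpers below are an illustrative sketch (the names `conv2d_params` and `layernorm_params` are not part of PyTorch); they assume convolutions with bias and layer norms with a learnable weight and bias per element, which are the defaults.

```python
def conv2d_params(in_ch, out_ch, kernel, groups=1):
    """Weights plus biases of a 2D convolution."""
    kh, kw = kernel
    return (in_ch // groups) * kh * kw * out_ch + out_ch


def layernorm_params(shape):
    """One weight and one bias per normalized element."""
    n = 1
    for s in shape:
        n *= s
    return 2 * n


c = 64  # n_initial_filters
counts = {
    "stem 1x1 (3 -> 64)": conv2d_params(3, c, (1, 1)),
    "depthwise 7x7": conv2d_params(c, c, (7, 7), groups=c),
    "layer norm [64, 32, 32]": layernorm_params([c, 32, 32]),
    "pointwise expand (64 -> 256)": conv2d_params(c, 4 * c, (1, 1)),
    "pointwise project (256 -> 64)": conv2d_params(4 * c, c, (1, 1)),
}
for name, n in counts.items():
    print(f"{name:32s} {n:>8,}")
```

Note that each `LayerNorm` over `[64, 32, 32]` holds far more parameters than all three convolutions of a block combined, because it learns a separate scale and shift for every channel and pixel.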

In [90]:
from tqdm.notebook import tqdm

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)

epochs = 30
for j in range(epochs):
    with tqdm(total=len(train_dataloader), position=0, leave=True, desc=f"Train Epoch {j}") as train_bar:
        train_one_epoch(train_dataloader, model, loss_fn, optimizer, train_bar)

    # checking on the training loss and accuracy once per epoch

    with tqdm(total=len(train_dataloader), position=0, leave=True, desc=f"Validate (train) Epoch {j}") as train_eval:
        acc, loss = evaluate(train_dataloader, model, loss_fn, train_eval)

        print(f"Epoch {j}: training loss: {loss:.3f}, accuracy: {acc:.3f}")
    with tqdm(total=len(val_dataloader), position=0, leave=True, desc=f"Validate Epoch {j}") as val_bar:

        acc_val, loss_val = evaluate(val_dataloader, model, loss_fn, val_bar)
        print(f"Epoch {j}: validation loss: {loss_val:.3f}, accuracy: {acc_val:.3f}")


Epoch 0: training loss: 1.436, accuracy: 48.042
Epoch 0: validation loss: 1.423, accuracy: 48.193
Epoch 1: training loss: 1.163, accuracy: 58.620
Epoch 1: validation loss: 1.161, accuracy: 58.636
Epoch 2: training loss: 0.921, accuracy: 67.575
Epoch 2: validation loss: 0.951, accuracy: 66.254
Epoch 3: training loss: 0.819, accuracy: 71.367
Epoch 3: validation loss: 0.870, accuracy: 69.339
Epoch 4: training loss: 0.690, accuracy: 76.278
Epoch 4: validation loss: 0.777, accuracy: 73.023
Epoch 5: training loss: 0.557, accuracy: 80.805
Epoch 5: validation loss: 0.648, accuracy: 77.416
Epoch 6: training loss: 0.556, accuracy: 80.588
Epoch 6: validation loss: 0.675, accuracy: 76.168
Epoch 7: training loss: 0.445, accuracy: 84.328
Epoch 7: validation loss: 0.577, accuracy: 79.583
Epoch 8: training loss: 0.415, accuracy: 85.513
Epoch 8: validation loss: 0.586, accuracy: 79.762
Epoch 9: training loss: 0.359, accuracy: 87.578
Epoch 9: validation loss: 0.558, accuracy: 80.931
Epoch 10: training loss: 0.312, accuracy: 89.195
Epoch 10: validation loss: 0.512, accuracy: 82.758
Epoch 11: training loss: 0.310, accuracy: 89.192
Epoch 11: validation loss: 0.524, accuracy: 82.428
Epoch 12: training loss: 0.288, accuracy: 89.910
Epoch 12: validation loss: 0.522, accuracy: 82.788
Epoch 13: training loss: 0.277, accuracy: 90.362
Epoch 13: validation loss: 0.543, accuracy: 82.308
Epoch 14: training loss: 0.231, accuracy: 91.960
Epoch 14: validation loss: 0.503, accuracy: 83.526
Epoch 15: training loss: 0.198, accuracy: 93.252
Epoch 15: validation loss: 0.496, accuracy: 84.056
Epoch 16: training loss: 0.182, accuracy: 93.750
Epoch 16: validation loss: 0.512, accuracy: 84.365
Epoch 17: training loss: 0.170, accuracy: 94.117
Epoch 17: validation loss: 0.498, accuracy: 84.874
Epoch 18: training loss: 0.182, accuracy: 93.498
Epoch 18: validation loss: 0.510, accuracy: 83.437
Epoch 19: training loss: 0.162, accuracy: 94.412
Epoch 19: validation loss: 0.510, accuracy: 84.315
Epoch 20: training loss: 0.147, accuracy: 95.017
Epoch 20: validation loss: 0.481, accuracy: 85.094
Epoch 21: training loss: 0.140, accuracy: 94.910
Epoch 21: validation loss: 0.497, accuracy: 84.994
Epoch 22: training loss: 0.126, accuracy: 95.608
Epoch 22: validation loss: 0.472, accuracy: 85.144
Epoch 23: training loss: 0.126, accuracy: 95.720
Epoch 23: validation loss: 0.506, accuracy: 84.934
Epoch 24: training loss: 0.126, accuracy: 95.678
Epoch 24: validation loss: 0.514, accuracy: 85.264
Epoch 25: training loss: 0.116, accuracy: 96.035
Epoch 25: validation loss: 0.498, accuracy: 84.954
Epoch 26: training loss: 0.112, accuracy: 96.028
Epoch 26: validation loss: 0.512, accuracy: 85.313
Epoch 27: training loss: 0.134, accuracy: 95.457
Epoch 27: validation loss: 0.548, accuracy: 84.345
Epoch 28: training loss: 0.094, accuracy: 96.838
Epoch 28: validation loss: 0.499, accuracy: 85.693
Epoch 29: training loss: 0.104, accuracy: 96.295
Epoch 29: validation loss: 0.526, accuracy: 84.984
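
The log can be summarized with a few lines of Python. The validation accuracies below are copied from the output above; the variable names are illustrative, and the snippet only reads the numbers back rather than re-running anything.

```python
# Validation accuracy per epoch (epochs 0..29), copied from the training log.
val_acc = [
    48.193, 58.636, 66.254, 69.339, 73.023, 77.416, 76.168, 79.583, 79.762, 80.931,
    82.758, 82.428, 82.788, 82.308, 83.526, 84.056, 84.365, 84.874, 83.437, 84.315,
    85.094, 84.994, 85.144, 84.934, 85.264, 84.954, 85.313, 84.345, 85.693, 84.984,
]
final_train_acc = 96.295  # training accuracy at epoch 29

# Epoch with the highest validation accuracy.
best_epoch = max(range(len(val_acc)), key=lambda e: val_acc[e])
# Gap between training and validation accuracy at the last epoch.
final_gap = final_train_acc - val_acc[-1]

print(f"best validation accuracy: {val_acc[best_epoch]:.3f} at epoch {best_epoch}")
print(f"train/validation gap at epoch 29: {final_gap:.3f} points")
```

Validation accuracy levels off at around 85% from epoch 20 onward while training accuracy keeps rising, so a gap of about 11 points remains at the end of training.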
