# CIFAR-10 Preprocessing

This notebook covers the preprocessing for the CIFAR-10 dataset.

In [21]:
from cifar10 import get_cifar10_data
import torchvision.transforms as T
from timm import create_model
from timm.data.transforms_factory import create_transform
from transformers import AutoModelForImageClassification
import torch
import PIL

We start by loading the data and running a transformed image through each model. We use timm's create_transform function to build the transform. We try the transformation suggested by BiT, which resizes images smaller than 96x96 to 128x128. We suspect that this transformation will not work with the Swin transformer, but we need to try it to be sure.

In [22]:
train_data, test_data = get_cifar10_data()

Files already downloaded and verified
Files already downloaded and verified


In [23]:
example_image, example_label = train_data[0]
print(example_image.size)

(32, 32)


In [24]:
cifar10_transform = create_transform(224, is_training=False)
convnext_model = create_model("convnext_base_in22k", pretrained=True, num_classes=10)
swin_model = AutoModelForImageClassification.from_pretrained(
        "microsoft/swin-base-patch4-window7-224-in22k",
        num_labels=10,
        ignore_mismatched_sizes=True
    )

Some weights of SwinForImageClassification were not initialized from the model checkpoint at microsoft/swin-base-patch4-window7-224-in22k and are newly initialized because the shapes did not match:
- classifier.weight: found shape torch.Size([21841, 1024]) in the checkpoint and torch.Size([10, 1024]) in the model instantiated
- classifier.bias: found shape torch.Size([21841]) in the checkpoint and torch.Size([10]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [25]:
model_input = cifar10_transform(example_image)

In [26]:
convnext_output = convnext_model(model_input.unsqueeze(0))
convnext_output

tensor([[ 0.1638,  0.0234,  0.1389,  0.5137,  0.5145, -0.3699,  0.3191,  0.1650,
          0.0802,  0.0093]], grad_fn=<AddmmBackward0>)

In [27]:
swin_output = swin_model(model_input.unsqueeze(0))
swin_output.logits

tensor([[ 0.3066,  0.3679,  0.3508,  0.3193,  0.3030,  0.2363,  0.0645,  0.5388,
         -0.5157, -0.3150]], grad_fn=<AddmmBackward0>)
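
As a rough way to read logits like the ones printed above, they can be passed through a softmax to obtain per-class probabilities. The sketch below does this in plain Python for the ConvNeXt logits copied from the output above; the helper name `softmax` is ours and is not part of the notebook. Since both models were loaded with a newly initialized 10-class head, the probabilities are not expected to be informative before fine-tuning, and here no class receives more than about 14%.

```python
import math

# ConvNeXt logits copied from the cell output above.
convnext_logits = [0.1638, 0.0234, 0.1389, 0.5137, 0.5145,
                   -0.3699, 0.3191, 0.1650, 0.0802, 0.0093]


def softmax(logits):
    """Numerically stable softmax over a list of floats (illustrative helper)."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(convnext_logits)

# Index of the most probable class.
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print(predicted_class)
print(round(max(probs), 3))
```

A single forward pass like this mostly confirms that the transform and the model agree on input and output shapes; it says little about accuracy.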

We see that this does not work well. We therefore decide to use 128x128 for ConvNeXt and 224x224 for Swin. As for CIFAR-10 augmentations, we choose not to use any random cropping: since we upscale the images anyway, random cropping is moot, because we could simply upscale to a lower resolution instead and keep all of the information. With this decided, it turns out that our predefined transformation for the Swin model works best for both models. It is defined below and is used in the cifar10 file.
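
The argument against random cropping can be made concrete with a little arithmetic. The sketch below assumes a hypothetical resize-then-crop scheme (upscale to 160x160, then take a random 128x128 crop; these sizes are illustrative and are not taken from this notebook) and compares it with resizing the whole 32x32 image directly to the target size.

```python
# Illustrative sizes only: the 160 -> 128 resize-then-crop scheme is an assumption.
source_side = 32      # CIFAR-10 images are 32x32 (see the size printed earlier)
resized_side = 160    # hypothetical intermediate size before cropping
crop_side = 128       # hypothetical random-crop size

# Fraction of the image area that survives the random crop.
kept_area_fraction = (crop_side / resized_side) ** 2

# Side length of the crop window, measured in original 32x32 pixels.
crop_side_in_source_pixels = source_side * crop_side / resized_side

print(f"crop keeps {kept_area_fraction:.0%} of the image area")
print(f"crop window covers {crop_side_in_source_pixels:.1f} source pixels per side")
```

Under these assumed sizes the crop discards roughly a third of each image, whereas resizing directly to the crop size gives the model the same input resolution with every source pixel included.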

In [28]:
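def get_cifar10_feature_extractor(image_size=(224, 224)):
    """Build the CIFAR-10 preprocessing pipeline for a given target size."""
    return T.Compose([
        # PIL image -> uint8 tensor of shape (C, H, W), values left in 0-255.
        T.PILToTensor(),
        # Upscale to the model's input size with bilinear interpolation.
        T.Resize(image_size, T.InterpolationMode.BILINEAR, antialias=False),
        # uint8 -> float32, which also rescales values to [0, 1].
        T.ConvertImageDtype(torch.float32),
        # Normalize with the ImageNet channel means and standard deviations.
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])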
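
To make the last two steps of the pipeline concrete, the sketch below reproduces their arithmetic in plain Python: a uint8 pixel value is scaled to [0, 1] and then normalized per channel with the mean and standard deviation used above (the standard ImageNet statistics). The helper name `normalize_pixel` is ours and is for illustration only.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(value, channel):
    """Map a uint8 value (0-255) to its normalized float (illustrative helper)."""
    scaled = value / 255.0  # the rescaling applied when converting uint8 to float
    return (scaled - MEAN[channel]) / STD[channel]


# Range of values each channel can take after normalization.
channel_ranges = [
    (normalize_pixel(0, c), normalize_pixel(255, c)) for c in range(3)
]

for name, (low, high) in zip("RGB", channel_ranges):
    print(f"{name}: [{low:.3f}, {high:.3f}]")
```

The resulting ranges are roughly centered on zero, which is the input scale that ImageNet-pretrained models were trained on.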