# Code challenge: self-supervised learning and embeddings generation
Import the necessary libraries and modules:

In [84]:
import torch # load pytorch for machine learning
import torch.nn as nn # load module for neural networks
import torchvision.transforms as tvtran # load module for transformations to transform / augment data 
import torchvision.datasets as tvdat # load module for handling datasets
import torchvision.models as tvmod
from torchvision.models import resnet18, ResNet18_Weights

## Load the dataset of images
Load images from the root folder, transform to pytorch tensors, resize to 256 square pixels, then crop to 224x224 pixels:

In [111]:
transforms = tvtran.Compose([
    tvtran.ToTensor(),
    tvtran.Resize(256),
    tvtran.CenterCrop(224),
    ])

x_raw = tvdat.ImageFolder('../CodeChallenge/data/', transform=transforms) 

## Image transformations
### Check to see if image sizes are the same
Logical test to see if all the images in the dataset are the same size. If not, they will need rescaling to same size:

In [104]:
im_sz_same = all([x_raw[i][0].shape[1]==x_raw[0][0].shape[1] and x_raw[i][0].shape[2]==x_raw[0][0].shape[2] for i in range(len(raw_im_sizes))])

print("Images are the same size" if im_sz_same else "N.B. Images are not the same size, therefore will need rescaling")

N.B. Images are not the same size, therefore will need rescaling


### Resize images

In [107]:
x_scale = [tvtran.Resize(100)(xx) for xx in x_raw]

TypeError: Unexpected type <class 'tuple'>

Create a data loader for inputting images into the model:

In [None]:
trainloader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=2)

## Load a pre-trained computer vision model
Use a slimline pre-trained computer vision model (ResNet18) due to lack of compute resources. Sufficient for proof-of-concept. This produces a 512-dim feature vector for each image:

In [28]:
base = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1) # load ResNet18 as the base encoder

This model has 1000 classes in its output layer. However, we want the feature vector output of the penultimate layer, so we truncate the base model to remove the final layer. 

In [29]:
base_trunc = nn.Sequential(*list(base.children())[:-1]) # remove final classification layer

In [None]:
decode_im

In [30]:
# run base model on image
base_trunc(my_data_raw[0][0])

TypeError: conv2d() received an invalid combination of arguments - got (Image, Parameter, NoneType, tuple, tuple, tuple, int), but expected one of:
 * (Tensor input, Tensor weight, Tensor bias = None, tuple of ints stride = 1, tuple of ints padding = 0, tuple of ints dilation = 1, int groups = 1)
      didn't match because some of the arguments have invalid types: ([31;1mImage[0m, [31;1mParameter[0m, [31;1mNoneType[0m, [31;1mtuple of (int, int)[0m, [31;1mtuple of (int, int)[0m, [31;1mtuple of (int, int)[0m, [31;1mint[0m)
 * (Tensor input, Tensor weight, Tensor bias = None, tuple of ints stride = 1, str padding = "valid", tuple of ints dilation = 1, int groups = 1)
      didn't match because some of the arguments have invalid types: ([31;1mImage[0m, [31;1mParameter[0m, [31;1mNoneType[0m, [31;1mtuple of (int, int)[0m, [31;1mtuple of (int, int)[0m, [31;1mtuple of (int, int)[0m, [31;1mint[0m)


## Projection head
Define the projection head (multi-layer perceptron) to add to the base encoder. Contains one hidden layer (512-->512) and outputs 128-dim vector 

In [11]:
proj_head = nn.Sequential(
    nn.Linear(512, 512), # hidden layer
    nn.ReLU(), # activation function
    nn.Linear(512, 128) # output layer
)

## Combined model
Combine the base encoder with the projection head:

In [13]:
def CombinedModel(x):
    return proj_head(base_trunc(x))