In [1]:
from fastai.vision.all import *
from self_supervised.augmentations import *
from self_supervised.layers import *
from self_supervised.swav import *

## Embedding Extraction

In this tutorial we will look at how to extract embeddings using encoders trained with any of the self-supervised learning algorithms in this repo. Here, we will use the SWAV algorithm as an example. The code below shows how to create a `Learner` for training, which you might have already done.

In [2]:
sqr_mom=0.99
mom=0.95
beta=0.
eps=1e-4
opt_func = partial(ranger, mom=mom, sqr_mom=sqr_mom, eps=eps, beta=beta)

In [3]:
def get_dls(size, bs, workers=None):
    path = URLs.IMAGEWANG_160 if size <= 160 else URLs.IMAGEWANG
    source = untar_data(path)
    
    files = get_image_files(source)
    tfms = [[PILImage.create, ToTensor, RandomResizedCrop(size, min_scale=1.)], 
            [parent_label, Categorize()]]
    
    dsets = Datasets(files, tfms=tfms, splits=RandomSplitter(valid_pct=0.1)(files))
    
    batch_tfms = [IntToFloatTensor]
    dls = dsets.dataloaders(bs=bs, num_workers=workers, after_batch=batch_tfms)
    return dls

In [4]:
bs=96
resize, size = 256, 224

In [5]:
arch = "xresnet34"
encoder = create_encoder(arch, pretrained=False, n_in=3)

In [None]:
dls = get_dls(resize, bs)
model = create_swav_model(encoder, n_in=3)
learn = Learner(dls, model, SWAVLoss(),
                cbs=[SWAV(aug_func=get_batch_augs,
                          # The large-to-small crop size ratio (4/3 here) is an important hyperparameter for optimization.
                          # Keeping the small crop at 96 while increasing the large crop from 128 to 192 made training struggle.
                          crop_sizes=[size,int(3/4*size)],
                          num_crops=[2,6],
                          min_scales=[0.25,0.2],
                          max_scales=[1.0,0.35],                
                          rotate=True,
                          rotate_deg=10,
                          jitter=True,
                          bw=True,
                          blur=False
                          ),
                     TerminateOnNaNCallback(),
                     ])

I already had a model trained so for demonstration purposes I will just load it. 

In [7]:
epochs=100
load_name = f'swav_iwang_sz{size}_epc{epochs}'
learn.load(load_name)

<fastai.learner.Learner at 0x7f074d24dd90>

### Saving Your Model

Once a model is trained you can save the `model weights`, the `encoder weights`, or the `Learner` itself. We will use fastai and PyTorch for this. You can skip this part if you have already saved one of them.

In [9]:
save_name = f'swav_iwang_sz{size}_epc{epochs}' 

In [15]:
learn.save(save_name) # saves whole model weights with_opt=True by default
torch.save(learn.model.encoder.state_dict(), learn.path/learn.model_dir/f'{save_name}_encoder.pth') # saves only the encoder state dict
learn.export(learn.path/learn.model_dir/f'{save_name}_export.pkl') # saves whole Learner

We can check that all three files were saved successfully.

In [10]:
(learn.path/learn.model_dir).ls().filter(lambda o: save_name in o.stem)

(#3) [Path('models/swav_iwang_sz224_epc100_encoder.pth'),Path('models/swav_iwang_sz224_epc100.pth'),Path('models/swav_iwang_sz224_epc100_export.pkl')]

### Extracting with Learner

If your input items are compatible with the `DataLoaders` (`dls`) you used during training, then using the saved `Learner` object is the most straightforward way to extract embeddings. For example, the `dls` we used for training apply the following tfms to prepare the input, so they expect a filename.

```[PILImage.create, ToTensor, RandomResizedCrop(size, min_scale=1.)]```

First, let's take a look at how to embed everything we have in an efficient manner.

In [11]:
path = URLs.IMAGEWANG_160 if size <= 160 else URLs.IMAGEWANG
source = untar_data(path)
embedding_files = get_image_files(source)

Next, let's create a dataframe to use later during visualization, with some neat functional programming using the `L` object from fastcore.

In [12]:
df = pd.DataFrame({"filename":embedding_files})

df['split'] = (embedding_files).map(lambda o: o.parent.parent.name)
df['split'] = df['split'].replace(to_replace='imagewang', value='unsup')
df['label'] = (embedding_files).map(lambda o: o.parent.name)

In [13]:
df

Unnamed: 0,filename,split,label
0,/root/.fastai/data/imagewang/train/n02096294/n02096294_2289.JPEG,train,n02096294
1,/root/.fastai/data/imagewang/train/n02096294/ILSVRC2012_val_00037999.JPEG,train,n02096294
2,/root/.fastai/data/imagewang/train/n02096294/n02096294_4119.JPEG,train,n02096294
3,/root/.fastai/data/imagewang/train/n02096294/n02096294_6689.JPEG,train,n02096294
4,/root/.fastai/data/imagewang/train/n02096294/n02096294_4839.JPEG,train,n02096294
...,...,...,...
26343,/root/.fastai/data/imagewang/unsup/n02105641_10903.JPEG,unsup,unsup
26344,/root/.fastai/data/imagewang/unsup/n02093754_898.JPEG,unsup,unsup
26345,/root/.fastai/data/imagewang/unsup/n02111889_17867.JPEG,unsup,unsup
26346,/root/.fastai/data/imagewang/unsup/n02111889_5998.JPEG,unsup,unsup


Now we can load the learner directly and create a `test_dl` for embedding extraction. Don't forget to disable CPU loading (`cpu=False`) if you have a GPU!

In [14]:
learn = load_learner("./models/swav_iwang_sz224_epc100_export.pkl", cpu=False)

In [15]:
learn.swav.augs[0]

Pipeline: RandomResizedCrop -> RandomHorizontalFlip -> ColorJitter -> RandomGrayscale -> Rotate -- {'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 1.0} -> Normalize -- {'mean': tensor([[[[0.4850]],

         [[0.4560]],

         [[0.4060]]]], device='cuda:0'), 'std': tensor([[[[0.2290]],

         [[0.2240]],

         [[0.2250]]]], device='cuda:0'), 'axes': (0, 2, 3)}

At this point you can either add a `Hook` to `learn.model.encoder` or overwrite `learn.model=learn.model.encoder`. I will overwrite it since I won't need the remaining part of the model in this notebook.

In [16]:
learn.model = learn.model.encoder; learn.model[-2:]

Sequential(
  (8): AdaptiveConcatPool2d(
    (ap): AdaptiveAvgPool2d(output_size=1)
    (mp): AdaptiveMaxPool2d(output_size=1)
  )
  (9): Flatten(full=False)
)
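As an alternative to overwriting the model, the hook approach can be sketched with plain PyTorch. This is only an illustration: it uses `register_forward_hook` rather than fastai's `Hook` wrapper, and `ToyModel`, `store_output` and the tensor sizes are hypothetical stand-ins, not part of this repo.

```python
import torch
import torch.nn as nn

# Hypothetical toy model: an `encoder` followed by a projection head,
# standing in for a self-supervised model with an `encoder` attribute.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(8, 4)

    def forward(self, x):
        return self.head(self.encoder(x))

model = ToyModel().eval()
captured = []

def store_output(module, inputs, output):
    # Called after every forward pass of `model.encoder`.
    captured.append(output.detach().cpu())

handle = model.encoder.register_forward_hook(store_output)
with torch.no_grad():
    for _ in range(2):                      # two fake batches of 5 images
        model(torch.randn(5, 3, 16, 16))
handle.remove()                             # always detach the hook when done

embeddings = torch.cat(captured)            # encoder outputs, head output ignored
```

With a hook the full model stays intact, which can be useful if you still need the projection head afterwards.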

We also need to disable the SWAV callback and add back normalization (`Normalize`), if it was used during training.

**Important:** All of the augmentations, including `Normalize`, happen inside the self-supervised learning callbacks. By default `imagenet_stats` is used, but it can be different in your case. In this learner example, images were first resized to 256 for batching, and 224 crops were then used during SWAV training. You can try both resolutions during embedding extraction to see which one works better.
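To make the normalization step concrete, here is a minimal sketch of the arithmetic that scaling to floats followed by ImageNet normalization performs on a batch. It is written in plain PyTorch as an illustration, not as fastai's implementation; `normalize_batch` is a hypothetical helper and the batch is random data.

```python
import torch

# ImageNet channel statistics (RGB), shaped to broadcast over (B, C, H, W).
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def normalize_batch(uint8_batch):
    """Scale a uint8 image batch to [0, 1], then standardize each channel."""
    x = uint8_batch.float() / 255.0
    return (x - mean) / std

batch = torch.randint(0, 256, (4, 3, 8, 8), dtype=torch.uint8)  # fake images
normed = normalize_batch(batch)
```

If the encoder was trained with different statistics, the same statistics should be used here, otherwise the embeddings are computed on inputs from a shifted distribution.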

In [17]:
learn.cbs = L([cb for cb in learn.cbs if cb.name != 'swav']); learn.cbs

(#4) [TrainEvalCallback,Recorder,ProgressCallback,TerminateOnNaNCallback]

Make sure to use the correct tfms `Pipeline`.

In [50]:
emb_dl = learn.dls.test_dl(embedding_files, after_batch=[IntToFloatTensor(), Normalize.from_stats(*imagenet_stats)])

We can see that images will be resized to 256x256. If you want to change this, or any data tfms in general, you can skip to the **Extracting with Custom Dataset** section below. `tfms` is used for creating data from the source (e.g. a filename), `after_item` then applies transforms to each created sample individually, and finally `after_batch` applies transforms to collated batches.

In [54]:
emb_dl.tfms, emb_dl.after_item, emb_dl.after_batch

(Pipeline: PILBase.create -> RandomResizedCrop -- {'size': (256, 256), 'min_scale': 1.0, 'ratio': (0.75, 1.3333333333333333), 'resamples': (2, 0), 'val_xtra': 0.14, 'p': 1.0} -> ToTensor,
 Pipeline: ,
 Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Normalize -- {'mean': tensor([[[[0.4850]],
 
          [[0.4560]],
 
          [[0.4060]]]], device='cuda:0'), 'std': tensor([[[[0.2290]],
 
          [[0.2240]],
 
          [[0.2250]]]], device='cuda:0'), 'axes': (0, 2, 3)})
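The order in which the three pipelines printed above are applied can be sketched in pure Python. This is a conceptual toy, not fastai code: `create_item`, `item_tfm`, `batch_tfm` and `load_batches` are hypothetical names, and "collation" is just a Python list here.

```python
def create_item(source):
    # Plays the role of `tfms`: turn a source (e.g. a filename) into a sample.
    return [float(len(source))] * 3

def item_tfm(sample):
    # Plays the role of `after_item`: applied to each sample individually.
    return sample[:2]

def batch_tfm(batch):
    # Plays the role of `after_batch`: applied once to the collated batch.
    return [[value / 10 for value in sample] for sample in batch]

def load_batches(sources, bs):
    for start in range(0, len(sources), bs):
        chunk = sources[start:start + bs]
        items = [item_tfm(create_item(src)) for src in chunk]  # per item
        yield batch_tfm(items)                                 # per batch

batches = list(load_batches(["a.jpg", "bb.jpg", "ccc.jpg"], bs=2))
```

In the notebook's `emb_dl`, `after_item` is empty because the crop already lives in `tfms`, while `IntToFloatTensor` and `Normalize` run per batch.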

We need to set the activation function for predictions to a no-op (`act=noop`), because by default it is set up for classification output.

In [24]:
embeddings, _ = learn.get_preds(dl=emb_dl, act=noop) 

Check the embeddings, then save them.

In [25]:
embeddings,  embeddings.shape, embeddings.norm(dim=1)

(TensorBase([[ 0.9732,  5.4093,  1.0725,  ...,  0.2657,  0.2560,  1.6736],
         [ 2.0293,  3.4570,  0.4585,  ...,  0.1125,  0.5919,  1.3086],
         [ 4.1430,  5.6190, 18.5848,  ...,  0.5565,  0.6212,  1.8319],
         ...,
         [ 2.3026,  1.6281,  3.4819,  ...,  0.7759,  0.5221,  0.3157],
         [ 5.6842,  2.0450,  0.6188,  ...,  0.2059,  0.1493,  0.4630],
         [ 3.0677,  4.3904,  2.5808,  ...,  0.3809,  0.4606,  0.9369]]),
 torch.Size([26348, 1024]),
 TensorBase([102.1530, 123.4563, 147.9856,  ..., 202.9648, 139.4411, 107.1795]))

In [26]:
emb_dir = Path("./embeddings")
if not emb_dir.exists(): emb_dir.mkdir()
torch.save(embeddings, emb_dir/'swav_iwang_embeddings.pth')
df.to_csv(emb_dir/'iwang.csv',index=False)

### Extracting with Custom Dataset

Sometimes you might have inputs which are not compatible with the training dls. In those cases, you can create your own custom dls, making sure to use the same item tfms you used during training to get meaningful representations, such as `RandomResizedCrop`, image resolution and `Normalize`. I will add a dummy function to my tfms list to mimic your own custom scenario for reading and generating inputs. Let's also assume this dataset is unlabeled, so for `y_tfms` we will pass an empty list.

In [55]:
def dummy_func(x):
    # your own transform for reading
    return x

In [56]:
tfms = [[dummy_func, PILImage.create, ToTensor, RandomResizedCrop(size=256, min_scale=1.)], []]
dsets = Datasets(embedding_files, tfms=tfms, splits=RandomSplitter(valid_pct=0.1)(embedding_files))
batch_tfms = [IntToFloatTensor, Normalize.from_stats(*imagenet_stats)]
dls = dsets.dataloaders(bs=bs, num_workers=4, after_batch=batch_tfms)
emb_dl = dls.test_dl(embedding_files)

Could not do one pass in your dataloader, there is something wrong in it
Could not do one pass in your dataloader, there is something wrong in it


In [57]:
emb_dl.tfms, emb_dl.after_item, emb_dl.after_batch

(Pipeline: dummy_func -> PILBase.create -> RandomResizedCrop -- {'size': (256, 256), 'min_scale': 1.0, 'ratio': (0.75, 1.3333333333333333), 'resamples': (2, 0), 'val_xtra': 0.14, 'p': 1.0} -> ToTensor,
 Pipeline: ,
 Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Normalize -- {'mean': tensor([[[[0.4850]],
 
          [[0.4560]],
 
          [[0.4060]]]], device='cuda:0'), 'std': tensor([[[[0.2290]],
 
          [[0.2240]],
 
          [[0.2250]]]], device='cuda:0'), 'axes': (0, 2, 3)})

We get the same embeddings as with the learner's own dls, but using a custom data Pipeline that you can modify for your own needs.

In [58]:
embeddings, _ = learn.get_preds(dl=emb_dl, act=noop) # we need to set activation as no operation because it's set for classification output

In [59]:
embeddings,  embeddings.shape, embeddings.norm(dim=1)

(TensorBase([[ 0.9732,  5.4093,  1.0725,  ...,  0.2657,  0.2560,  1.6736],
         [ 2.0293,  3.4570,  0.4585,  ...,  0.1125,  0.5919,  1.3086],
         [ 4.1430,  5.6190, 18.5848,  ...,  0.5565,  0.6212,  1.8319],
         ...,
         [ 2.3026,  1.6281,  3.4819,  ...,  0.7759,  0.5221,  0.3157],
         [ 5.6842,  2.0450,  0.6188,  ...,  0.2059,  0.1493,  0.4630],
         [ 3.0677,  4.3904,  2.5808,  ...,  0.3809,  0.4606,  0.9369]]),
 torch.Size([26348, 1024]),
 TensorBase([102.1530, 123.4563, 147.9856,  ..., 202.9648, 139.4411, 107.1795]))

In [60]:
emb_dir = Path("./embeddings")
if not emb_dir.exists(): emb_dir.mkdir()
torch.save(embeddings, emb_dir/'swav_iwang_embeddings.pth')
df.to_csv(emb_dir/'iwang.csv',index=False)

### Extracting with PyTorch Boilerplate

If you've only saved the model/encoder weights and not the learner, you need to follow an approach like the one below.

In [65]:
arch = "xresnet34"
encoder = create_encoder(arch, pretrained=False, n_in=3)

In [76]:
encoder.load_state_dict(torch.load( learn.path/learn.model_dir/f'{save_name}_encoder.pth'))
encoder.eval().cuda();

In [77]:
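embeddings = []
with torch.no_grad():  # inference only, so skip building the autograd graph
    for xb in progress_bar(emb_dl):
        xb = xb[0]  # each test_dl batch is a tuple holding only the inputs
        embeddings += [to_detach(encoder(xb))]
embeddings = torch.cat(embeddings)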

In [81]:
embeddings,  embeddings.shape, embeddings.norm(dim=1)

(TensorBase([[ 0.9732,  5.4093,  1.0725,  ...,  0.2657,  0.2560,  1.6736],
         [ 2.0293,  3.4570,  0.4585,  ...,  0.1125,  0.5919,  1.3086],
         [ 4.1430,  5.6190, 18.5848,  ...,  0.5565,  0.6212,  1.8319],
         ...,
         [ 2.3026,  1.6281,  3.4819,  ...,  0.7759,  0.5221,  0.3157],
         [ 5.6842,  2.0450,  0.6188,  ...,  0.2059,  0.1493,  0.4630],
         [ 3.0677,  4.3904,  2.5808,  ...,  0.3809,  0.4606,  0.9369]]),
 torch.Size([26348, 1024]),
 TensorBase([102.1530, 123.4563, 147.9856,  ..., 202.9648, 139.4411, 107.1795]))
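The save, reload and extract steps of this section can also be exercised end to end on a toy encoder, which is handy for checking the workflow before running it on real weights. This is a sketch under assumptions: `make_encoder` is a hypothetical stand-in for `create_encoder`, the batches are random tensors, and an in-memory buffer replaces the `.pth` file.

```python
import io
import torch
import torch.nn as nn

def make_encoder():
    # Hypothetical toy encoder; the notebook uses create_encoder(...) instead.
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

# "Training" side: save only the encoder's state dict.
trained = make_encoder()
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Extraction side: rebuild the same architecture, then load the weights.
encoder = make_encoder()
encoder.load_state_dict(torch.load(buffer))
encoder.eval()  # put norm/dropout layers (if any) in inference mode

batches = [torch.randn(4, 3, 16, 16) for _ in range(3)]  # fake input batches
with torch.no_grad():
    embeddings = torch.cat([encoder(xb) for xb in batches])
```

The key constraint is that the architecture built at extraction time must match the one whose state dict was saved, otherwise `load_state_dict` raises an error about mismatched keys or shapes.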

### Visualize Embeddings

One kind of qualitative analysis you can do is to visualize the learned representations/embeddings. The [Nvidia Rapids cuML](https://docs.rapids.ai/api/cuml/stable/) library can be used for this purpose; it's very fast thanks to GPU support.