In [1]:
from fastai.vision.all import *
from self_supervised.augmentations import *
from self_supervised.layers import *
from self_supervised.swav import *

## Embedding Extraction

In this tutorial we will look at how to extract embeddings using encoders trained with any of the self-supervised learning algorithms in this repo. Here we use the SwAV algorithm as an example. The code below shows how to create a `Learner` for training, which is something you may have already done.

In [2]:
sqrmom=0.99
mom=0.95
beta=0.
eps=1e-4
opt_func = partial(ranger, mom=mom, sqr_mom=sqrmom, eps=eps, beta=beta)

In [3]:
def get_dls(size, bs, workers=None):
    path = URLs.IMAGEWANG_160 if size <= 160 else URLs.IMAGEWANG
    source = untar_data(path)
    
    files = get_image_files(source)
    tfms = [[PILImage.create, ToTensor, RandomResizedCrop(size, min_scale=1.)], 
            [parent_label, Categorize()]]
    
    dsets = Datasets(files, tfms=tfms, splits=RandomSplitter(valid_pct=0.1)(files))
    
    batch_tfms = [IntToFloatTensor]
    dls = dsets.dataloaders(bs=bs, num_workers=workers, after_batch=batch_tfms)
    return dls

In [4]:
bs=96
resize, size = 256, 224

In [5]:
arch = "xresnet34"
encoder = create_encoder(arch, pretrained=False, n_in=3)

In [6]:
dls = get_dls(resize, bs)
model = create_swav_model(encoder, n_in=3)
learn = Learner(dls, model, SWAVLoss(),
                cbs=[SWAV(aug_func=get_batch_augs,
                          # The 4/3 large-to-small crop size ratio is an important hyperparameter for optimization:
                          # keeping the small crop at 96 while increasing the large crop from 128 to 192 made training struggle.
                          crop_sizes=[size,int(3/4*size)],
                          num_crops=[2,6],
                          min_scales=[0.25,0.2],
                          max_scales=[1.0,0.35],                
                          rotate=True,
                          rotate_deg=10,
                          jitter=True,
                          bw=True,
                          blur=False
                          ),
                     TerminateOnNaNCallback(),
                     ])

Pipeline: RandomResizedCrop -> RandomHorizontalFlip -> ColorJitter -> RandomGrayscale -> Rotate -- {'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 1.0} -> Normalize -- {'mean': tensor([[[[0.4850]],

         [[0.4560]],

         [[0.4060]]]], device='cuda:0'), 'std': tensor([[[[0.2290]],

         [[0.2240]],

         [[0.2250]]]], device='cuda:0'), 'axes': (0, 2, 3)}

I already have a trained model, so for demonstration purposes I will just load it.

In [10]:
epochs=100
load_name = f'swav_iwang_sz{size}_epc{epochs}'
learn.load(load_name)

<fastai.learner.Learner at 0x7f1fa53d36d0>

### Saving Your Model

Once a model is trained you can save the `model weights`, the `encoder weights`, or the whole `Learner`. We will use fastai and PyTorch for this. You can skip this part if you already have one of them saved.

In [15]:
save_name = f'swav_iwang_sz{size}_epc{epochs}' 
learn.save(save_name) # saves whole model weights with_opt=True by default
torch.save(learn.model.encoder.state_dict(), learn.path/learn.model_dir/f'{save_name}_encoder.pth') # saves only the encoder state dict
learn.export(learn.path/learn.model_dir/f'{save_name}_export.pkl') # saves whole Learner

We can verify that all three files were saved successfully.

In [19]:
(learn.path/learn.model_dir).ls().filter(lambda o: save_name in o.stem)

(#3) [Path('models/swav_iwang_sz224_epc100_encoder.pth'),Path('models/swav_iwang_sz224_epc100.pth'),Path('models/swav_iwang_sz224_epc100_export.pkl')]
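
If you only kept the encoder state dict, it can be loaded back into a freshly built encoder of the same architecture. The snippet below is a minimal sketch of that round trip using plain PyTorch; `make_encoder` is a hypothetical stand-in for `create_encoder(arch, pretrained=False, n_in=3)`, and the file name is made up for the example.

```python
import tempfile
from pathlib import Path

import torch
import torch.nn as nn


def make_encoder():
    # Hypothetical stand-in for create_encoder(arch, pretrained=False, n_in=3).
    return nn.Sequential(nn.Conv2d(3, 4, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten())


trained = make_encoder().eval()

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_encoder.pth"
    # Save only the weights, as done for the encoder above.
    torch.save(trained.state_dict(), path)

    # Rebuild the same architecture, then load the saved weights into it.
    restored = make_encoder().eval()
    restored.load_state_dict(torch.load(path, map_location="cpu"))

# Both encoders should now produce identical outputs for the same input.
x = torch.randn(2, 3, 8, 8)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x))
print(same)
```

The architecture has to match the one used during training, otherwise `load_state_dict` will complain about missing or unexpected keys.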

### Extracting with Learner

If your input items are compatible with the `DataLoaders` (`dls`) you used during training, then using the saved `Learner` object is the most straightforward way to extract embeddings. For example, the `dls` we used for training apply the following tfms to get the input ready, so they expect a filename.

```[PILImage.create, ToTensor, RandomResizedCrop(size, min_scale=1.)]```

First, let's take a look at how to embed everything we have in an efficient manner.

In [7]:
path = URLs.IMAGEWANG_160 if size <= 160 else URLs.IMAGEWANG
source = untar_data(path)
embedding_files = get_image_files(source)

Next, let's create a dataframe to use later during visualization, with some neat functional programming using the `L` object from fastcore.

In [11]:
df = pd.DataFrame({"filename":embedding_files})

df['split'] = (embedding_files).map(lambda o: o.parent.parent.name)
df['split'] = df['split'].replace(to_replace='imagewang', value='unsup')
df['label'] = (embedding_files).map(lambda o: o.parent.name)

In [12]:
df

| | filename | split | label |
|---|---|---|---|
| 0 | /root/.fastai/data/imagewang/train/n02096294/n02096294_2289.JPEG | train | n02096294 |
| 1 | /root/.fastai/data/imagewang/train/n02096294/ILSVRC2012_val_00037999.JPEG | train | n02096294 |
| 2 | /root/.fastai/data/imagewang/train/n02096294/n02096294_4119.JPEG | train | n02096294 |
| 3 | /root/.fastai/data/imagewang/train/n02096294/n02096294_6689.JPEG | train | n02096294 |
| 4 | /root/.fastai/data/imagewang/train/n02096294/n02096294_4839.JPEG | train | n02096294 |
| ... | ... | ... | ... |
| 26343 | /root/.fastai/data/imagewang/unsup/n02105641_10903.JPEG | unsup | unsup |
| 26344 | /root/.fastai/data/imagewang/unsup/n02093754_898.JPEG | unsup | unsup |
| 26345 | /root/.fastai/data/imagewang/unsup/n02111889_17867.JPEG | unsup | unsup |
| 26346 | /root/.fastai/data/imagewang/unsup/n02111889_5998.JPEG | unsup | unsup |
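
To make the `split` and `label` columns less magical, here is a small standard-library sketch of the same path logic. The helper name `split_and_label` is hypothetical; the two example paths are taken from the dataframe output.

```python
from pathlib import PurePosixPath


def split_and_label(path, root_name="imagewang"):
    """Derive (split, label) from an Imagewang file path."""
    p = PurePosixPath(path)
    split = p.parent.parent.name  # grandparent folder: train / val / dataset root
    label = p.parent.name         # parent folder: class id, or "unsup"
    # Unlabeled files sit one level higher, so their grandparent is the dataset root.
    if split == root_name:
        split = "unsup"
    return split, label


examples = [
    "/root/.fastai/data/imagewang/train/n02096294/n02096294_2289.JPEG",
    "/root/.fastai/data/imagewang/unsup/n02105641_10903.JPEG",
]
rows = [split_and_label(e) for e in examples]
print(rows)
```

This is why the dataframe code replaces `imagewang` with `unsup` in the `split` column.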


Now we can load the learner directly and create a `test_dl` for embedding extraction. Don't forget to disable CPU loading if you have a GPU!

In [180]:
learn = load_learner("./models/swav_iwang_sz224_epc100_export.pkl", cpu=False)

In [181]:
learn.swav.augs[0]

Pipeline: RandomResizedCrop -> RandomHorizontalFlip -> ColorJitter -> RandomGrayscale -> Rotate -- {'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 1.0} -> Normalize -- {'mean': tensor([[[[0.4850]],

         [[0.4560]],

         [[0.4060]]]], device='cuda:0'), 'std': tensor([[[[0.2290]],

         [[0.2240]],

         [[0.2250]]]], device='cuda:0'), 'axes': (0, 2, 3)}

At this point you can either add a `Hook` to `learn.model.encoder` or overwrite `learn.model=learn.model.encoder`. I will overwrite it since I won't need the remaining part of the model in this notebook.

In [182]:
learn.model = learn.model.encoder; learn.model[-2:]

Sequential(
  (8): AdaptiveConcatPool2d(
    (ap): AdaptiveAvgPool2d(output_size=1)
    (mp): AdaptiveMaxPool2d(output_size=1)
  )
  (9): Flatten(full=False)
)
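
For completeness, here is a rough sketch of the hook alternative mentioned earlier, which keeps the full model intact and captures the encoder output on the side. It uses PyTorch's `register_forward_hook` rather than fastai's own hook helpers, and `ToyModel` and `store_output` are hypothetical names standing in for your model and hook function.

```python
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    """Toy stand-in for a self-supervised model: an encoder plus a projection head."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(12, 8), nn.ReLU())
        self.projector = nn.Linear(8, 4)

    def forward(self, x):
        return self.projector(self.encoder(x))


model = ToyModel().eval()
captured = []


def store_output(module, inputs, output):
    # Called after every forward pass of `model.encoder`.
    captured.append(output.detach().cpu())


handle = model.encoder.register_forward_hook(store_output)
with torch.no_grad():
    for xb in torch.randn(5, 3, 2, 2).split(2):  # batches of 2, 2 and 1 items
        model(xb)  # the model output is ignored, only the hook matters
handle.remove()  # always remove the hook once you are done

embeddings = torch.cat(captured)
print(embeddings.shape)
```

Overwriting `learn.model` is simpler when the rest of the model is not needed, while a hook is handy if you want both the final outputs and the intermediate embeddings.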

We also need to disable the SWAV callback and add back `Normalize` if it was used during training.

**Important:** All of the augmentations, including `Normalize`, happen inside the self-supervised learning callbacks. By default `imagenet_stats` are used, but make sure to check which stats your own training run used.
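
As a quick sanity check on what `Normalize` does to a batch, the arithmetic can be reproduced by hand. The sketch below assumes the ImageNet mean and std shown in the pipeline printouts. The raw 8-bit values 16 and 7 are inferred from the printed tensors in this notebook (0.0627 and 0.0275 after division by 255), and `normalize_pixel` is a hypothetical helper.

```python
# ImageNet statistics, as shown in the Normalize printouts (RGB order).
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)


def normalize_pixel(value_uint8, channel):
    """Apply IntToFloatTensor (divide by 255), then Normalize, to one pixel value."""
    scaled = value_uint8 / 255.0
    return (scaled - mean[channel]) / std[channel]


red = normalize_pixel(16, 0)    # 16/255 ≈ 0.0627 in the red channel
green = normalize_pixel(7, 1)   # 7/255 ≈ 0.0275 in the green channel
print(round(red, 4), round(green, 4))  # -1.8439 -1.9132
```

If the values coming out of your embedding dataloader are still in the 0 to 1 range, `Normalize` has not been applied yet.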

In [183]:
learn.cbs = L([cb for cb in learn.cbs if cb.name != 'swav']); learn.cbs

(#4) [TrainEvalCallback,Recorder,ProgressCallback,TerminateOnNaNCallback]

In [184]:
learn.dls.after_batch

Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}

In [185]:
emb_dl1 = learn.dls.test_dl(embedding_files); emb_dl1.after_batch

Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}

In [186]:
xb = emb_dl1.one_batch(); xb[0][0]

TensorImage([[[0.0627, 0.0627, 0.0627,  ..., 0.0275, 0.0275, 0.0275],
         [0.0627, 0.0627, 0.0627,  ..., 0.0275, 0.0275, 0.0275],
         [0.0627, 0.0627, 0.0627,  ..., 0.0275, 0.0275, 0.0275],
         ...,
         [0.1686, 0.1686, 0.1686,  ..., 0.1373, 0.1373, 0.1373],
         [0.1725, 0.1686, 0.1686,  ..., 0.1373, 0.1373, 0.1373],
         [0.1725, 0.1686, 0.1686,  ..., 0.1373, 0.1373, 0.1373]],

        [[0.0275, 0.0275, 0.0275,  ..., 0.0275, 0.0275, 0.0275],
         [0.0275, 0.0275, 0.0275,  ..., 0.0275, 0.0275, 0.0275],
         [0.0275, 0.0275, 0.0275,  ..., 0.0275, 0.0275, 0.0275],
         ...,
         [0.0588, 0.0549, 0.0549,  ..., 0.0510, 0.0510, 0.0510],
         [0.0588, 0.0549, 0.0549,  ..., 0.0510, 0.0510, 0.0510],
         [0.0588, 0.0549, 0.0549,  ..., 0.0510, 0.0510, 0.0510]],

        [[0.0392, 0.0392, 0.0392,  ..., 0.0275, 0.0275, 0.0275],
         [0.0392, 0.0392, 0.0392,  ..., 0.0275, 0.0275, 0.0275],
         [0.0392, 0.0392, 0.0392,  ..., 0.0275, 0.027

In [187]:
emb_dl2 = learn.dls.test_dl(embedding_files); emb_dl2.after_batch

Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}

In [188]:
emb_dl2.after_batch.add(Normalize.from_stats(*imagenet_stats)); emb_dl2.after_batch

Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Normalize -- {'mean': tensor([[[[0.4850]],

         [[0.4560]],

         [[0.4060]]]], device='cuda:0'), 'std': tensor([[[[0.2290]],

         [[0.2240]],

         [[0.2250]]]], device='cuda:0'), 'axes': (0, 2, 3)}

In [189]:
xb = emb_dl2.one_batch(); xb[0][0]

TensorImage([[[-1.8439, -1.8439, -1.8439,  ..., -1.9980, -1.9980, -1.9980],
         [-1.8439, -1.8439, -1.8439,  ..., -1.9980, -1.9980, -1.9980],
         [-1.8439, -1.8439, -1.8439,  ..., -1.9980, -1.9980, -1.9980],
         ...,
         [-1.3815, -1.3815, -1.3815,  ..., -1.5185, -1.5185, -1.5185],
         [-1.3644, -1.3815, -1.3815,  ..., -1.5185, -1.5185, -1.5185],
         [-1.3644, -1.3815, -1.3815,  ..., -1.5185, -1.5185, -1.5185]],

        [[-1.9132, -1.9132, -1.9132,  ..., -1.9132, -1.9132, -1.9132],
         [-1.9132, -1.9132, -1.9132,  ..., -1.9132, -1.9132, -1.9132],
         [-1.9132, -1.9132, -1.9132,  ..., -1.9132, -1.9132, -1.9132],
         ...,
         [-1.7731, -1.7906, -1.7906,  ..., -1.8081, -1.8081, -1.8081],
         [-1.7731, -1.7906, -1.7906,  ..., -1.8081, -1.8081, -1.8081],
         [-1.7731, -1.7906, -1.7906,  ..., -1.8081, -1.8081, -1.8081]],

        [[-1.6302, -1.6302, -1.6302,  ..., -1.6824, -1.6824, -1.6824],
         [-1.6302, -1.6302, -1.6302,  ..

In [191]:
emb_dl3 = learn.dls.test_dl(embedding_files); emb_dl3.after_batch

Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Normalize -- {'mean': tensor([[[[0.4850]],

         [[0.4560]],

         [[0.4060]]]], device='cuda:0'), 'std': tensor([[[[0.2290]],

         [[0.2240]],

         [[0.2250]]]], device='cuda:0'), 'axes': (0, 2, 3)}

In [192]:
xb = emb_dl3.one_batch(); xb[0][0]

TensorImage([[[-1.8439, -1.8439, -1.8439,  ..., -1.9980, -1.9980, -1.9980],
         [-1.8439, -1.8439, -1.8439,  ..., -1.9980, -1.9980, -1.9980],
         [-1.8439, -1.8439, -1.8439,  ..., -1.9980, -1.9980, -1.9980],
         ...,
         [-1.3815, -1.3815, -1.3815,  ..., -1.5185, -1.5185, -1.5185],
         [-1.3644, -1.3815, -1.3815,  ..., -1.5185, -1.5185, -1.5185],
         [-1.3644, -1.3815, -1.3815,  ..., -1.5185, -1.5185, -1.5185]],

        [[-1.9132, -1.9132, -1.9132,  ..., -1.9132, -1.9132, -1.9132],
         [-1.9132, -1.9132, -1.9132,  ..., -1.9132, -1.9132, -1.9132],
         [-1.9132, -1.9132, -1.9132,  ..., -1.9132, -1.9132, -1.9132],
         ...,
         [-1.7731, -1.7906, -1.7906,  ..., -1.8081, -1.8081, -1.8081],
         [-1.7731, -1.7906, -1.7906,  ..., -1.8081, -1.8081, -1.8081],
         [-1.7731, -1.7906, -1.7906,  ..., -1.8081, -1.8081, -1.8081]],

        [[-1.6302, -1.6302, -1.6302,  ..., -1.6824, -1.6824, -1.6824],
         [-1.6302, -1.6302, -1.6302,  ..

In [195]:
type(learn.dls)

fastai.data.core.DataLoaders

In [196]:
type(emb_dl1)

fastai.data.core.TfmdDL

In [206]:
hex(id(learn.dls.valid.after_batch))

'0x7f4a2cde2350'

In [201]:
id(emb_dl1.after_batch), id(emb_dl2.after_batch), id(emb_dl3.after_batch), id(learn.dls.after_batch)

(139956557032656, 139956338041104, 139956568997712, 139956564855888)

In [103]:
embeddings, _ = learn.get_preds(dl=emb_dl2, act=noop) # use the normalized test dl; act=noop because the default activation is meant for classification outputs

Check the embeddings and, finally, save them.

In [83]:
embeddings,  embeddings.shape, embeddings.norm(dim=1)

(TensorBase([[ 0.9732,  5.4093,  1.0725,  ...,  0.2657,  0.2560,  1.6736],
         [ 2.0293,  3.4570,  0.4585,  ...,  0.1125,  0.5919,  1.3086],
         [ 4.1430,  5.6190, 18.5848,  ...,  0.5565,  0.6212,  1.8319],
         ...,
         [ 2.3026,  1.6281,  3.4819,  ...,  0.7759,  0.5221,  0.3157],
         [ 5.6842,  2.0450,  0.6188,  ...,  0.2059,  0.1493,  0.4630],
         [ 3.0677,  4.3904,  2.5808,  ...,  0.3809,  0.4606,  0.9369]]),
 torch.Size([26348, 1024]),
 TensorBase([102.1530, 123.4563, 147.9856,  ..., 202.9648, 139.4411, 107.1795]))

In [84]:
emb_dir = Path("./embeddings")
emb_dir.mkdir(exist_ok=True)
torch.save(embeddings, emb_dir/'swav_iwang_embeddings.pth')
df.to_csv(emb_dir/'iwang.csv',index=False)

### Extracting with PyTorch

Sometimes your inputs are not compatible with the training `dls`. In those cases you can create your own custom `dls`, as long as you keep the same preprocessing you used during training: the item tfms (such as `RandomResizedCrop`), the image resolution, and `Normalize`. I will add a dummy function to my tfms list to mimic your own custom logic for reading and generating inputs.

In [41]:
def dummy_func(x):
    # your own transform for reading
    return x

In [149]:
files = get_image_files(source)
tfms = [[dummy_func, PILImage.create, ToTensor, RandomResizedCrop(size, min_scale=1.)], 
        [parent_label, Categorize()]]

dsets = Datasets(files, tfms=tfms, splits=RandomSplitter(valid_pct=0.1)(files))
batch_tfms = [IntToFloatTensor, Normalize.from_stats(*imagenet_stats)]
dls = dsets.dataloaders(bs=bs, num_workers=4, after_batch=batch_tfms)
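
As an illustration of what could replace `dummy_func`, the sketch below maps a metadata record to a file path, which `PILImage.create` can then open. The record layout, the `data` root folder and the name `record_to_path` are all assumptions made up for this example.

```python
from pathlib import Path


def record_to_path(record, root=Path("data")):
    """Hypothetical reader: turn a metadata record into an image file path."""
    return root / record["folder"] / f'{record["image_id"]}.JPEG'


records = [
    {"folder": "train", "image_id": "img_001"},
    {"folder": "unsup", "image_id": "img_002"},
]
paths = [record_to_path(r) for r in records]
print(paths)
```

With a reader like this, the items passed to `Datasets` and `test_dl` would be the records themselves rather than filenames.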

In [150]:
emb_dl = dls.test_dl(embedding_files)

In [151]:
emb_dl.after_batch

Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Normalize -- {'mean': tensor([[[[0.4850]],

         [[0.4560]],

         [[0.4060]]]], device='cuda:0'), 'std': tensor([[[[0.2290]],

         [[0.2240]],

         [[0.2250]]]], device='cuda:0'), 'axes': (0, 2, 3)}

In [90]:
embeddings, _ = learn.get_preds(dl=emb_dl, act=noop) # act=noop because the default activation is meant for classification outputs

In [91]:
embeddings,  embeddings.shape, embeddings.norm(dim=1)

(TensorBase([[ 0.9235,  6.0359,  1.4319,  ...,  0.2793,  0.2447,  1.6558],
         [ 1.0574,  1.2625,  0.1994,  ...,  0.1196,  0.6004,  1.2695],
         [ 4.5818,  4.8277, 19.6919,  ...,  0.5332,  0.7392,  1.7014],
         ...,
         [ 1.8617,  1.8730,  4.1641,  ...,  0.8961,  0.5835,  0.2820],
         [ 5.7593,  1.8655,  0.4892,  ...,  0.2413,  0.1738,  0.4212],
         [ 2.1063,  3.4830,  2.9620,  ...,  0.3681,  0.4959,  0.9483]]),
 torch.Size([26348, 1024]),
 TensorBase([103.5237, 113.3170, 138.3017,  ..., 181.9094, 127.4829, 102.1634]))