# 🚀 Finetuning ResNet for Totally Looks Like Dataset

💡 Totally-Looks-Like is a benchmarking dataset for image similarity detection. Today, we have deep learning models that can determine whether two images are similar or not with a certain level of accuracy. 

❓ If you want to deploy these models in the real world, their accuracy needs to be on par with human perception of image similarity. But how would you increase the accuracy of pre-trained deep learning models? 👉 That's where Finetuner comes in! 

🧠 Finetuner lets you tune the weights of any deep neural network for better embeddings on search tasks.

🎨 In this example, we will finetune ResNet50 on the [Totally Looks Like dataset](https://sites.google.com/view/totally-looks-like-dataset) for similar image detection and see how it affects the accuracy of the model.

### ⏰ Installing & Importing Dependencies

We will start this tutorial by installing the necessary ***pip*** dependencies. 

In [None]:
!pip install gdown
!pip install finetuner
!pip install torchvision

We will import the necessary dependencies.

In [2]:
from os import path
import finetuner as ft
import torchvision
import torch.nn as nn
from finetuner.tuner.pytorch.losses import TripletLoss
from finetuner.tuner.pytorch.miner import TripletEasyHardMiner

### 🔨 Data Preparation 

In this step, we will download the data using the ***gdown*** library. 

We will download the data as two archives: `left.zip` and `right.zip`.

Each archive contains *6016 images*. An image in `left` and an image in `right` that share the same file name form a pair.

In [None]:
!gdown https://drive.google.com/uc?id=1jvkbTr_giSP3Ru8OwGNCg6B4PvVbcO34
!gdown https://drive.google.com/uc?id=1EzBZUb_mh_Dp_FKD0P4XiYYSd0QBH5zW

We will then unzip the data to be used for further pre-processing.

In [None]:
if path.exists("/content/left") and path.exists("/content/right"):
  print("File directory already exists")
else:
  !unzip left.zip
  !unzip right.zip
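
Outside a notebook, where the `!unzip` shell escape is not available, the same "extract only if missing" step can be sketched with the standard library. This is a minimal sketch: the helper name `extract_if_missing` is hypothetical, and it assumes that each archive unpacks into a folder named after the archive (`left.zip` into `left/`), as the check above implies.

```python
import zipfile
from pathlib import Path


def extract_if_missing(archive, target_dir):
    """Extract `archive` into `target_dir` unless its folder already exists.

    Returns True if the archive was extracted, False if it was skipped.
    """
    archive = Path(archive)
    target_dir = Path(target_dir)
    # Assumption: left.zip unpacks into a folder called "left", and so on.
    expected_folder = target_dir / archive.stem
    if expected_folder.exists():
        return False
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    return True


# Example usage (paths are illustrative):
# for name in ("left.zip", "right.zip"):
#     extract_if_missing(name, ".")
```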

### 🕹 Data Pre-Processing & Manipulation (Using [DocArray](https://docarray.jina.ai/))

We will perform the following pre-processing tasks to prepare the data for model input:

1. We will load all images from the unzipped `left` and `right` folders into Jina `DocumentArray`s and sort each one by file name, so that paired images end up at the same index.
2. After that, we will do a train/test split (Training Data - 80%, Test Data - 20%)
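
To see why sorting comes before splitting, here is a small sketch of the two steps on plain lists of file paths instead of `DocumentArray`s. The function name `paired_split` and the toy file names are made up for illustration.

```python
def paired_split(left_uris, right_uris, ratio=0.8):
    """Sort both sides, then cut them at the same index so pairs stay together."""
    left_sorted = sorted(left_uris)
    right_sorted = sorted(right_uris)
    if len(left_sorted) != len(right_sorted):
        raise ValueError("left and right must contain the same number of images")

    train_size = int(ratio * len(left_sorted))
    # Training data mixes both sides; test data keeps them apart for matching.
    train = left_sorted[:train_size] + right_sorted[:train_size]
    test_left = left_sorted[train_size:]
    test_right = right_sorted[train_size:]
    return train, test_left, test_right


# Toy input in shuffled order, as a directory listing might return it.
left = [f"left/{i:05d}.jpg" for i in (3, 1, 5, 2, 4)]
right = [f"right/{i:05d}.jpg" for i in (2, 5, 1, 4, 3)]

train, test_left, test_right = paired_split(left, right)
print(len(train))   # 8 training images (4 pairs)
print(test_left)    # ['left/00005.jpg']
print(test_right)   # ['right/00005.jpg']
```

With the 6016 pairs of this dataset and `ratio = 0.8`, the same arithmetic gives `int(0.8 * 6016) = 4812` training pairs and 1204 test pairs.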

In [None]:
from docarray import DocumentArray

left_da = DocumentArray.from_files('left/*.jpg')
right_da = DocumentArray.from_files('right/*.jpg')
# Sort both sides by URI so that paired images share the same index.
left_da.sort(key=lambda x: x.uri)
right_da.sort(key=lambda x: x.uri)

# Use 80% of the pairs for training; the rest is held out for testing.
ratio = 0.8
train_size = int(ratio * len(left_da))

train_da = left_da[:train_size] + right_da[:train_size]

### 💻 Preparing the training data (Using [Finetuner](https://finetuner.jina.ai/))

After loading the data into a Jina `DocumentArray`, we can prepare the Documents for training with Finetuner, which makes the entire process a breeze. We just have to do the following:

1. Assign each Document a label, stored under the tag `finetuner_label`, that serves as its class name. Here the label is the image's file name, so the two images of a pair share a class.

2. Pre-process each document:

  *  Load the image from the URI
  *  Normalize the image and reshape it from `H, W, C` to `C, H, W`, where `C` is the color channel of the image.
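
The normalization and axis move in step 2 can be illustrated on a tiny nested-list "image". This is only a sketch: it assumes the commonly used ImageNet channel mean and standard deviation, and the values DocArray applies by default may differ. The helper names are hypothetical.

```python
# Assumption: standard ImageNet statistics, a common choice for ResNet inputs.
IMG_MEAN = (0.485, 0.456, 0.406)
IMG_STD = (0.229, 0.224, 0.225)


def normalize_hwc(image, mean=IMG_MEAN, std=IMG_STD):
    """Scale 0-255 values to [0, 1], then standardize each color channel."""
    return [
        [
            [(value / 255.0 - mean[c]) / std[c] for c, value in enumerate(pixel)]
            for pixel in row
        ]
        for row in image
    ]


def hwc_to_chw(image):
    """Move the channel axis from last (H, W, C) to first (C, H, W)."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# A 2 x 2 RGB image: red, green, blue and white pixels.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

chw = hwc_to_chw(normalize_hwc(image))
print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 2 2
```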





In [None]:
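def assign_label_and_preprocess(doc):
    # Use the file name as the class label, so the left/right images of a pair share a label.
    doc.tags['finetuner_label'] = doc.uri.split('/')[-1]
    # Load the image, normalize it, and move the channel axis from last to first (H, W, C -> C, H, W).
    return doc.load_uri_to_image_blob().set_image_blob_normalization().set_image_blob_channel_axis(-1, 0)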

In [None]:
train_da.apply(assign_label_and_preprocess)

### ☁ Loading the Pre-trained Model

We will load the pre-trained ResNet50 model from torchvision. Since we want to learn a better `embedding`, the first thing is to see which layer is suitable for use as an `embedding layer`. 

You can call `finetuner.display(model, input_size)` to plot the model architecture.

In [None]:
resnet = torchvision.models.resnet50(pretrained=True)
ft.display(resnet, (3, 224, 224))

Since the model is pre-trained on ImageNet data for a classification task, the output `fc` layer should not be used as the embedding layer. 

Instead, we can use the pooling layer `adaptiveavgpool2d_173` as the output of the embedding model. This layer generates a `2048` dimensional dense embedding as output.

To make the model produce the desired output, you can use the [Tailor](https://finetuner.jina.ai/components/tailor/) component of Finetuner. 
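
Conceptually, turning a classifier into an embedding model means cutting it off at a named layer and using that layer's output. The toy sketch below shows the idea on a list of named functions. It is not how Tailor is implemented, and the layer names and helper functions are invented for illustration.

```python
def truncate_after(layers, layer_name):
    """Keep every layer up to and including `layer_name`."""
    names = [name for name, _ in layers]
    if layer_name not in names:
        raise ValueError(f"unknown layer: {layer_name}")
    return layers[: names.index(layer_name) + 1]


def run(layers, x):
    """Feed `x` through the layers in order."""
    for _, fn in layers:
        x = fn(x)
    return x


# A toy "classifier": feature extraction, pooling, then a class decision.
toy_model = [
    ("conv", lambda img: [[2 * v for v in row] for row in img]),
    ("avgpool", lambda fmap: [sum(row) / len(row) for row in fmap]),
    ("fc", lambda feats: feats.index(max(feats))),
]

embedding_model = truncate_after(toy_model, "avgpool")

image = [[1.0, 3.0], [5.0, 7.0]]
print(run(toy_model, image))        # 1 (a class id)
print(run(embedding_model, image))  # [4.0, 12.0] (a dense vector)
```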


### ⏳ Model Finetuning

To finetune any pre-trained model 👉 

*   Plug in the pre-trained model
*   Plug in the training data
*   Configure the hyperparameters





The code below combines the [Tailor](https://finetuner.jina.ai/components/tailor/) and [Tuner](https://finetuner.jina.ai/components/tuner/) interfaces for model fine-tuning.

We will save the returned embedding model as `tuned_model`. At inference time, given an input image, this model generates a ***2048-dimensional vector*** representation of the image.

To understand the usage of each parameter in detail, refer to this [tutorial](https://finetuner.jina.ai/get-started/totally-looks-like/) and the [documentation](https://finetuner.jina.ai/).
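
As background for the `loss` argument in the call below, here is a minimal sketch of a triplet loss with a margin and of picking the hardest negative. It uses Euclidean distance for simplicity, which may differ from the distance Finetuner's `TripletLoss` applies, and the function names and toy vectors are assumptions for illustration only.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Zero once the negative is farther than the positive by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hardest_negative(anchor, negatives):
    """The hardest negative is the one closest to the anchor."""
    return min(negatives, key=lambda n: euclidean(anchor, n))


anchor = [0.0, 0.0]
positive = [0.6, 0.8]                  # distance 1.0 from the anchor
negatives = [[0.0, 1.2], [3.0, 4.0]]   # distances 1.2 and 5.0

hard = hardest_negative(anchor, negatives)
loss_hard = triplet_loss(anchor, positive, hard)
loss_easy = triplet_loss(anchor, positive, [3.0, 4.0])
print(hard)       # [0.0, 1.2]
print(loss_easy)  # 0.0
```

The easy negative already satisfies the margin and contributes no loss, while the hard one yields a loss of about 0.1, which is why mining hard negatives tends to give a more useful training signal.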



In [None]:
tuned_model = ft.fit(
    model=resnet,
    train_data=train_da,
    epochs=6,
    batch_size=128,
    loss=TripletLoss(miner=TripletEasyHardMiner(neg_strategy='hard'), margin=0.3), 
    learning_rate=1e-5,
    device='cuda',
    to_embedding_model=True,
    input_size=(3, 224, 224),
    layer_name='adaptiveavgpool2d_173',
    num_items_per_class=2,
    freeze=['conv2d_1', 'batchnorm2d_2', 'conv2d_5', 'batchnorm2d_6', 'conv2d_8', 'batchnorm2d_9', 'conv2d_11', 'batchnorm2d_12'],
)

### 🔎 Evaluating the embedding quality

We will use the ***hit@10*** metric to compare the quality of the fine-tuned embeddings with that of the pre-trained embeddings. 

***hit@10*** measures, across all the test data, how often the positive match is ranked within the top 10 matches for a `query` Document.
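
A worked toy example may make the metric concrete. The sketch below computes hit@k on hand-written rankings, where a hit means that the query's own file name appears among its top k results. The function name `hit_at_k` and the rankings are made up for illustration.

```python
def hit_at_k(ranked_results, k):
    """Fraction of queries whose true match appears in the top k results."""
    hits = sum(1 for query, ranked in ranked_results.items() if query in ranked[:k])
    return hits / len(ranked_results)


# Each query maps to its ranked candidates, best match first.
ranked_results = {
    "00001.jpg": ["00001.jpg", "00007.jpg", "00003.jpg"],  # hit at rank 1
    "00002.jpg": ["00009.jpg", "00002.jpg", "00004.jpg"],  # hit at rank 2
    "00003.jpg": ["00008.jpg", "00005.jpg", "00006.jpg"],  # never found
    "00004.jpg": ["00002.jpg", "00001.jpg", "00004.jpg"],  # hit at rank 3
}

for k in (1, 2, 3):
    print(f"hit@{k}: {hit_at_k(ranked_results, k):.2f}")
# hit@1: 0.25
# hit@2: 0.50
# hit@3: 0.75
```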

`train_da` is already prepared; now we will apply the same preprocessing to the test `DocumentArray`s.

In [None]:
def preprocess(doc):
    return doc.load_uri_to_image_blob().set_image_blob_normalization().set_image_blob_channel_axis(-1, 0)

In [None]:
test_left_da = left_da[train_size:]
test_right_da = right_da[train_size:]

test_left_da.apply(preprocess)
test_right_da.apply(preprocess)

We create the embeddings for the test set using the fine-tuned model.

In [None]:
# Use the fine-tuned model to create embeddings for the test data only
test_left_da.embed(tuned_model, device='cuda')
test_right_da.embed(tuned_model, device='cuda')

We will then match `test_left_da` against `test_right_da`.

You can consider `test_left_da` as user queries, while `test_right_da` is our indexed document collection. 

For each Document in `test_left_da`, the `match` function finds the ***top-10*** nearest embeddings in `test_right_da`. We then evaluate the result with ***hit@10***.
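
The matching step amounts to a nearest-neighbor search over embeddings. The sketch below shows a brute-force version using cosine distance, which is assumed here as the metric; check the DocArray documentation for the metric `match` actually uses. The function names and the two-dimensional toy embeddings are invented for illustration.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity; smaller means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def top_k_matches(query, index, k=10):
    """Return the names of the k indexed embeddings closest to `query`."""
    ranked = sorted(index.items(), key=lambda item: cosine_distance(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Toy index: file name -> embedding.
index = {
    "a.jpg": [1.0, 0.0],
    "b.jpg": [0.0, 1.0],
    "c.jpg": [1.0, 1.0],
}

matches = top_k_matches([0.9, 0.1], index, k=2)
print(matches)  # ['a.jpg', 'c.jpg']
```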

In [None]:
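def hit_rate(da, topk=1):
    """Fraction of Documents whose true match (same file name) is in the top-k matches."""
    hit = 0
    for d in da:
        for m in d.matches[:topk]:
            # A match counts as a hit when its file name equals the query's file name.
            if d.uri.split('/')[-1] == m.uri.split('/')[-1]:
                hit += 1
    return hit/len(da)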

In [None]:
test_left_da.match(test_right_da, limit=10)

In [None]:
for k in range(1, 11):
    print(f'hit@{k}:  finetuned: {hit_rate(test_left_da, k):.3f}')

### ⚡ Comparison with pre-trained model

In this section, we will load the pre-trained model, evaluate its embeddings on the test data using the same ***hit@10*** method, and compare the result with the fine-tuned model.

First, we replace the final classification layer with an identity layer so that the model acts as a feature extractor (for creating embeddings).

In [None]:
resnet_pretrained = torchvision.models.resnet50(pretrained=True)
resnet_pretrained.fc = nn.Identity()

In [None]:
# Load and pre-process the test data
test_left_da_pretrained = left_da[train_size:]
test_right_da_pretrained = right_da[train_size:]

test_left_da_pretrained.apply(preprocess)
test_right_da_pretrained.apply(preprocess)

In [None]:
# Use the pre-trained model to create embeddings for the test data only
test_left_da_pretrained.embed(resnet_pretrained, device='cuda')
test_right_da_pretrained.embed(resnet_pretrained, device='cuda')

In [None]:
test_left_da_pretrained.match(test_right_da_pretrained, limit=10)

In [None]:
for k in range(1, 11):
    print(f'hit@{k}:  pre-trained: {hit_rate(test_left_da_pretrained, k):.3f}')

### ✨ Conclusion
Here, you can compare the hit rates of the fine-tuned and pre-trained models. The fine-tuned model performs much better at finding the right match. 

Let's look at some results from the fine-tuned and pre-trained models side by side. The fine-tuned model clearly does a better job of finding similar images.

![](https://finetuner.jina.ai/_images/result-final1.png)


### ✈ Next Steps

Read the Finetuner [tutorials](https://finetuner.jina.ai/get-started/totally-looks-like/) and [documentation](https://finetuner.jina.ai/) to understand how it works in detail. 