# 1. Image search using [SIFT](https://www.thepythoncode.com/article/sift-feature-extraction-using-opencv-in-python)

Let's think about information retrieval in the context of image search. How can we quickly find images similar to a query, faster than comparing it pairwise with every image in the database? How can we identify the same object photographed in slightly different contexts?

One way to do this is to find special points of interest in every image, so-called keypoints, which characterize the image and are more or less invariant to scaling, orientation, illumination changes, and some other distortions. Each keypoint is summarized by a numeric vector called a descriptor. Several algorithms detect and describe such keypoints; today we will focus on [SIFT](https://en.wikipedia.org/wiki/Scale-invariant_feature_transform).
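To make "descriptor" concrete, here is a heavily simplified, SIFT-style descriptor for a single 16 x 16 grayscale patch. It is a sketch, not OpenCV's implementation: it skips scale-space detection, Gaussian weighting, interpolation between bins, and rotation to the dominant orientation. It only shows the core layout of 4 x 4 spatial cells, each holding an 8-bin histogram of gradient orientations (4 * 4 * 8 = 128 values), followed by normalization and clipping at 0.2 as in Lowe's SIFT. The function name `sift_like_descriptor` is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Simplified SIFT-style descriptor of a 16x16 grayscale patch (sketch only)."""
    patch = np.asarray(patch, dtype=np.float64)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    gy, gx = np.gradient(patch)                      # image gradients
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % (2 * np.pi)   # angle in [0, 2*pi)
    bins = (orientation / (2 * np.pi) * 8).astype(int) % 8  # 8 orientation bins

    # 4x4 grid of cells, each accumulating a magnitude-weighted orientation histogram
    hist = np.zeros((4, 4, 8))
    for y in range(16):
        for x in range(16):
            hist[y // 4, x // 4, bins[y, x]] += magnitude[y, x]

    vec = hist.ravel()                       # 4 * 4 * 8 = 128 values
    vec /= np.linalg.norm(vec) + 1e-12       # removes a global contrast factor
    vec = np.minimum(vec, 0.2)               # damp very large gradients
    vec /= np.linalg.norm(vec) + 1e-12
    return vec
```

Because gradients ignore a constant brightness offset and the normalization removes a global contrast factor, `sift_like_descriptor(p)` and `sift_like_descriptor(2 * p + 10)` come out numerically the same: a small-scale illustration of illumination invariance.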

Your task is to apply SIFT to a dataset of images and enable similar images search.

## Get dataset

We will use the `Caltech 101` dataset, which you can download from [here](https://data.caltech.edu/records/mzrjq-6wc02). It contains pictures of objects belonging to 101 categories, with about 40 to 800 images per category (most categories have about 50). Each image is roughly 300 x 200 pixels.

## SIFT example

Below is an example of SIFT keypoint extraction using `opencv`. [This](https://docs.opencv.org/trunk/da/df5/tutorial_py_sift_intro.html) is a dedicated tutorial, and [this](https://docs.opencv.org/master/dc/dc3/tutorial_py_matcher.html) is another tutorial you may need for finding matches between two images (in your code, use the `cv.drawMatches()` function to display keypoint matches).

In [1]:
!pip install opencv-python opencv-contrib-python
# or use https://huggingface.co/datasets/will33am/Caltech101
!pip install datasets



In [2]:
from datasets import load_dataset
ds = load_dataset("will33am/Caltech101")


In [None]:
test, train = ds['test'], ds['train']

CLASSES = sorted(list(set([row['label'] for row in train])))
print(CLASSES)

strawberries = [row for row in train if row['label'] == 'strawberry']
wrenches = [row for row in train if row['label'] == 'wrench']
wrenches[17]

In [None]:
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

# img_dir = '../../101_ObjectCategories'
img = np.array(wrenches[17]['image'])
# PIL images are stored in RGB order, so convert from RGB (not BGR)
gray = cv.cvtColor(img, cv.COLOR_RGB2GRAY)

# older versions of OpenCV
# sift = cv.xfeatures2d.SIFT_create()
sift = cv.SIFT_create()

kp = sift.detect(gray, None)
# use detectAndCompute(...) to get the descriptors themselves

print(f"1st keypoint location: ({kp[0].pt[0]:.2f}, {kp[0].pt[1]:.2f})")
print(f"1st keypoint size: {kp[0].size:.2f}; angle: {kp[0].angle:.2f}")
img = cv.drawKeypoints(gray, kp, img, flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
plt.imshow(img)
plt.show()

### Discussion

**Q**: Discuss what you see here. What is the meaning of circle diameter? Of the angle?

## Index of keypoints

Suppose we have found the image descriptors. How can we use them to find similar images? In our case, each keypoint is described by a 128-dimensional vector, and an image can have hundreds of keypoints. To enable fast search of similar images, you will index the descriptors of all images using a data structure for approximate nearest neighbor search, such as Navigable Small World or Annoy. Then, for a (new) query image, you will generate its descriptors and, for each of them, find its nearest neighbors (using Euclidean or cosine distance, whichever you prefer). Finally, you will rank the candidate images (the images that own the neighbor descriptors) by how often they appear among the nearest neighbors: the more matches, the higher the rank.
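The ranking step can be sketched with exact nearest neighbors in numpy. This is a hypothetical helper, not the required solution: `rank_by_votes`, `owner` and `labels` are illustrative names, and the brute-force distance computation stands in for the approximate index (Annoy or NSW), which is what makes the real search fast.

```python
from collections import Counter
import numpy as np

def rank_by_votes(query_des, db_des, owner, labels, n_neighbors=5, k=3):
    """Rank classes by how often they own the query descriptors' nearest neighbors.

    query_des: (q, d) descriptors of the query image
    db_des:    (n, d) descriptors of all indexed images, stacked
    owner:     owner[j] is the index of the image that db_des[j] came from
    labels:    labels[i] is the class of image i
    """
    # Exact Euclidean distances between every query and every database descriptor
    dists = np.linalg.norm(query_des[:, None, :] - db_des[None, :, :], axis=2)
    neighbors = np.argsort(dists, axis=1)[:, :n_neighbors]
    # Each neighbor casts one vote for the class of the image it belongs to
    votes = Counter(labels[owner[j]] for row in neighbors for j in row)
    return votes.most_common(k)
```

Brute force costs one distance computation per (query descriptor, database descriptor) pair, which is exactly the cost an ANN index avoids on a full dataset.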

### Build an index

Read all images, saving category information. For every image generate SIFT descriptors and index them.

In [None]:
# read all images and add their descriptors to index
import glob
import numpy as np
from tqdm import tqdm

def generate_sift_descriptors(img):
    # TODO: return keypoints and their descriptors.
    # Note: `img` is a PIL image, so convert it to a grayscale numpy array
    # before calling sift.detectAndCompute(...).
    # YOUR CODE HERE

    return kp, des


def get_top_descriptors(kp, des, top_k):
    # Keep the descriptors of the top_k keypoints with the strongest response
    order = np.argsort([-k.response for k in kp], kind='stable')
    return np.take(des, order[:top_k], axis=0)
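`generate_sift_descriptors` receives PIL images, and Caltech 101 also contains some grayscale pictures, on which `cv.cvtColor(..., cv.COLOR_RGB2GRAY)` raises an error. One possible workaround is a conversion helper that accepts grayscale, RGB and RGBA input. The helper `to_gray_uint8` is hypothetical; it uses the standard luma weights 0.299, 0.587, 0.114 (the ones OpenCV documents for RGB to gray), so its output may differ from OpenCV's by rounding.

```python
import numpy as np

def to_gray_uint8(image):
    """Convert a PIL image or array (gray, RGB or RGBA) to a 2-D uint8 array."""
    arr = np.asarray(image)
    if arr.ndim == 2:                       # already grayscale
        return arr.astype(np.uint8)
    if arr.shape[2] == 1:                   # single channel stored as H x W x 1
        return arr[..., 0].astype(np.uint8)
    rgb = arr[..., :3].astype(np.float64)   # drop the alpha channel if present
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```

With such a helper, `sift.detectAndCompute(to_gray_uint8(img), None)` receives a valid 8-bit single-channel image regardless of the source format.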

In [None]:
# test
kp, des = generate_sift_descriptors(wrenches[3]['image'])
print("Keypoints:", len(kp), len(des))
print("Center:", kp[4].pt)
print("Vector (16 cells x 8 orientation bins):\n", des[4].reshape(16, -1))

In [None]:
vectors = []  # top-32 SIFT descriptors of each indexed image
labels = []   # category of each indexed image, aligned with `vectors`
dataset = train
categories = sorted(set(row['label'] for row in dataset))
cat_lookup = {c: i for i, c in enumerate(categories)}

for row in tqdm(dataset):
    img, label = row['image'], row['label']
    keypoints, descriptors = generate_sift_descriptors(img)
    if descriptors is None:  # SIFT found no keypoints in this image
        continue
    vectors.append(get_top_descriptors(keypoints, descriptors, 32))
    labels.append(label)

In [None]:
vectors[10].shape

In [None]:
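%%time
from annoy import AnnoyIndex

annoy = AnnoyIndex(128, 'euclidean')
# Annoy stores one vector per integer id, so every descriptor needs its own id;
# descriptor_owner maps a descriptor id back to the index of its image in `vectors`
descriptor_owner = []
for i, vectors32 in enumerate(vectors):
    for vec in vectors32:
        annoy.add_item(len(descriptor_owner), vec)
        descriptor_owner.append(i)

annoy.build(100, n_jobs=-1)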

### Implement search function

Implement a function that returns the `k` best-matching classes (names).

In [None]:
from collections import Counter

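def classifier(image, index, k):
    keypoints, descriptors = generate_sift_descriptors(image)
    vecs = get_top_descriptors(keypoints, descriptors, 32)

    # TODO: for every descriptor in `vecs`, find its nearest neighbors in `index`,
    # map them back to class names and count them in `counter` (a Counter).
    # Return a list of ordered pairs (class name, number of matches),
    # best match first.

    return counter.most_common(k)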



print("STRAWBERRY")
result = classifier(strawberries[22]['image'], annoy, 10)
print(*result, sep='\n')
print()
print("WRENCH")
result = classifier(wrenches[11]['image'], annoy, 10)
print(*result, sep='\n')

**Q**: Before going further, discuss why SIFT is unsuitable for searching for *similar* things. What is its application area? Is it reliable in this area, and to what extent?


# 2. Deep classifiers and Embeddings

Based on:
- https://www.analyticsvidhya.com/blog/2020/08/top-4-pre-trained-models-for-image-classification-with-python-code/
- https://github.com/christiansafka/img2vec
- https://github.com/ultralytics/yolov5

### Obtain a single label for the image

In [None]:
!pip install torch torchvision imageio

In [None]:
import torch
# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # or yolov5m, yolov5l, yolov5x, custom

In [None]:
from imageio import imread

url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/5a/Ferry_in_Istanbul_01.JPG/1200px-Ferry_in_Istanbul_01.JPG'
im = imread(url)
results = model(im)
pandas_detections_df = results.pandas().xyxy[0]
pandas_detections_df

In [None]:
results.print()

### Compute the classes for the dataset

In [None]:
%matplotlib inline

for i, row in enumerate(dataset):
    if i % 213 != 0: continue  # inspect only every 213th image
    results = model(row['image'])
    tag = results.pandas().xyxy[0]['name']
    tag = tag.iloc[0] if len(tag) else None  # first detection, or None if nothing was found
    cat = row['label']
    print(f"{cat}\t{tag}")
    plt.figure(figsize=(3,2))
    plt.imshow(np.array(row['image']))
    plt.show()
    print()
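The loop above keeps whichever detection comes first. A slightly more robust way to reduce several detections to one tag is to sum confidences per class, returning `None` when nothing is confident enough. This is a hypothetical helper: `dominant_label` and the 0.25 threshold are illustrative choices; its inputs correspond to the `name` and `confidence` columns of `results.pandas().xyxy[0]`.

```python
from collections import defaultdict

def dominant_label(names, confidences, min_conf=0.25):
    """Reduce a list of detections to a single label.

    Sums detection confidences per class name and returns the class with the
    largest total, or None if no detection reaches `min_conf`.
    """
    totals = defaultdict(float)
    for name, conf in zip(names, confidences):
        if conf >= min_conf:
            totals[name] += conf
    if not totals:
        return None
    return max(totals, key=totals.get)
```

For a YOLOv5 result it could be called as `dominant_label(df['name'], df['confidence'])` with `df = results.pandas().xyxy[0]`.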

**Discuss:** 
- Look at the results. 
- Can we use this for retrieval in the same way as we used SIFT features? 
- What if the labels are different from original? What if there are multiple or no labels?

## Vector embedding for image

In [None]:
!pip install img2vec_pytorch Pillow

In [None]:
from img2vec_pytorch import Img2Vec
from PIL import Image

url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/5a/Ferry_in_Istanbul_01.JPG/1200px-Ferry_in_Istanbul_01.JPG'
img = imread(url)

# Initialize Img2Vec on the CPU
img2vec = Img2Vec(cuda=False)
# Img2Vec works on PIL images: wrap the numpy array and pass a one-element list
vector = img2vec.get_vec([Image.fromarray(img)]).reshape(-1)
vector.shape

In [None]:
def get_vectors(images):

    # TODO
    # return an np.array of shape (n_images, 512)

    # Workaround for Pillow issues: convert each image explicitly, e.g.
    # img2vec.get_vec([Image.fromarray(np.array(image))])

    return ...

# Sort by label so that images of the same class are adjacent in the distance matrix
order = np.argsort([row['label'] for row in dataset])[:200]
images_sample = [dataset[int(j)]['image'] for j in order]
embedding_vectors = get_vectors(images_sample)

In [None]:
embedding_vectors.shape

In [None]:
from sklearn.metrics import pairwise_distances
d = pairwise_distances(embedding_vectors, metric='cosine')

In [None]:
plt.figure(figsize=(10, 10))
plt.imshow(d, cmap='RdBu', vmin=0, vmax=1)
plt.show()
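With embeddings, retrieval no longer needs keypoint voting: the query vector is compared directly with every image vector. A minimal sketch of cosine-similarity search over `embedding_vectors` follows; `top_k_similar` is a hypothetical helper, and for a large collection the exhaustive matrix product would be replaced by an approximate nearest neighbor index.

```python
import numpy as np

def top_k_similar(embeddings, query_idx, k=5):
    """Indices of the k rows most cosine-similar to embeddings[query_idx], query excluded."""
    X = np.asarray(embeddings, dtype=np.float64)
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)  # unit-length rows
    sims = X @ X[query_idx]          # cosine similarity of every row to the query
    sims[query_idx] = -np.inf        # never return the query itself
    return np.argsort(-sims)[:k]     # most similar first
```

For example, `top_k_similar(embedding_vectors, 0, k=5)` returns the five sampled images closest to the first one. Since the sample is sorted by label, a good embedding should tend to return neighbors of the same class, matching the block structure in the distance matrix.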