# 1. Image search using [SIFT](https://www.thepythoncode.com/article/sift-feature-extraction-using-opencv-in-python)

Let's think about information retrieval in the context of image search. How can we find images similar to a query quickly, that is, faster than comparing the query pairwise with every image in the database? How can we identify the same object photographed in slightly different contexts?

One way to do this is to find special points of interest in every image, so-called keypoints, and to describe each of them with a descriptor vector. Together they characterize the image and are more or less invariant to scaling, orientation, illumination changes, and some other distortions. Several algorithms identify such keypoints; today we will focus on [SIFT](https://en.wikipedia.org/wiki/Scale-invariant_feature_transform).

Your task is to apply SIFT to a dataset of images and enable search for similar images.

## Get dataset

We will use the `Caltech 101` dataset; download it from [here](http://www.vision.caltech.edu/Image_Datasets/Caltech101/). It consists of pictures of objects belonging to 101 categories, with about 40 to 800 images per category; most categories have about 50 images. The size of each image is roughly 300 x 200 pixels.

## SIFT example

Below is an example of SIFT keypoint extraction using `opencv`. [This](https://docs.opencv.org/trunk/da/df5/tutorial_py_sift_intro.html) is a dedicated tutorial, and [this](https://docs.opencv.org/master/dc/dc3/tutorial_py_matcher.html) is another tutorial you may need to find matches between two images (use the `cv.drawMatches()` function in your code to display keypoint matches).

In [None]:
!pip install opencv-python opencv-contrib-python

In [None]:
import cv2 as cv
from matplotlib import pyplot as plt

img_dir = '../../101_ObjectCategories'
img = cv.imread(img_dir + '/gramophone/image_0018.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# older versions of OpenCV
# sift = cv.xfeatures2d.SIFT_create()
sift = cv.SIFT_create()

kp = sift.detect(gray, None)
# use detectAndCompute(...) to get descriptors themselves

print(f"Location ({kp[0].pt[0]:.2f}, {kp[0].pt[1]:.2f})")
print(f"Radius: {kp[0].size};  angle:{kp[0].angle}")
img = cv.drawKeypoints(gray, kp, img, flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
plt.imshow(img)
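
The matcher tutorial linked above pairs descriptors from two images by nearest-neighbor distance and filters ambiguous pairs with a ratio test. As a rough sketch of that idea, the snippet below does brute-force matching in plain NumPy on random stand-in descriptors. The helper `ratio_test_matches`, the toy arrays, and the 0.75 threshold are illustrative assumptions, not part of OpenCV. With real images you would take the descriptor arrays from `sift.detectAndCompute(...)` and could use `cv.BFMatcher().knnMatch(des1, des2, k=2)` instead.

```python
import numpy as np

def ratio_test_matches(des1, des2, ratio=0.75):
    """Return (i, j) pairs where des1[i] matches des2[j] unambiguously."""
    # Pairwise Euclidean distances, shape (len(des1), len(des2))
    dists = np.linalg.norm(des1[:, None, :] - des2[None, :, :], axis=2)
    # Indices of the two closest des2 rows for every des1 row
    nearest = np.argsort(dists, axis=1)[:, :2]
    matches = []
    for i, (best, second) in enumerate(nearest):
        # Keep the pair only if the best candidate is clearly closer than the runner-up
        if dists[i, best] < ratio * dists[i, second]:
            matches.append((i, int(best)))
    return matches

rng = np.random.default_rng(0)
# Stand-in for the descriptors of image A (hypothetical data)
des_a = rng.random((20, 128)).astype(np.float32)
# Image B: noisy copies of the first 5 descriptors of A, then 15 unrelated ones
des_b = np.vstack([
    des_a[:5] + 0.01 * rng.random((5, 128)),
    rng.random((15, 128)),
]).astype(np.float32)

matches = ratio_test_matches(des_a, des_b)
print(matches)
```

Only the five perturbed descriptors should survive the test, because unrelated random vectors have a best and a second-best neighbor at nearly the same distance.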

### Discussion

Discuss what you see here. What does the diameter of each circle mean? What about the angle?

## Index of keypoints

Let's suppose we've found image descriptors. How do we find similar images with this information? In our case each keypoint has a 128-dimensional descriptor vector, and an image can have hundreds of keypoints. To enable fast search for similar images, you will index the descriptors of all images in a data structure for approximate nearest neighbor search, such as Navigable Small World or Annoy. Then, for a new (query) image, you will generate descriptors and find the nearest neighbors of each one (using Euclidean or cosine distance, whichever you prefer). Finally, you will rank candidate images (those that own the neighbor descriptors) by how often they appear among the nearest neighbors: the more matches, the higher the rank.
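
To make the voting step concrete, here is a toy sketch. Clustered random 128-dimensional vectors stand in for SIFT descriptors, and a brute-force NumPy search stands in for the approximate index. The image names, the helper `rank_by_votes`, and the choice of 3 neighbors per descriptor are made up for illustration.

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(42)

# Toy database: 3 images with 4 descriptors each, clustered around one center per image
names = ["a.jpg", "b.jpg", "c.jpg"]
centers = rng.random((3, 128))
db_vectors = np.repeat(centers, 4, axis=0) + 0.05 * rng.random((12, 128))
# owner[i] is the image that descriptor i came from (the role `lookup` plays in the index)
owner = [name for name in names for _ in range(4)]

def rank_by_votes(query_des, n_neighbors=3):
    """Count how often each image owns a nearest neighbor of a query descriptor."""
    votes = Counter()
    for q in query_des:
        dists = np.linalg.norm(db_vectors - q, axis=1)
        for idx in np.argsort(dists)[:n_neighbors]:
            votes[owner[idx]] += 1
    # Images with the most votes come first
    return votes.most_common()

# Query: three perturbed descriptors of "b.jpg" and one of "c.jpg"
query = db_vectors[[4, 5, 6, 8]] + 0.01 * rng.random((4, 128))
ranking = rank_by_votes(query)
print(ranking)
```

Each query descriptor casts 3 votes, so the image that shares three descriptors with the query collects 9 votes and the other one 3.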

### Build an index

Read all images, saving category information. For every image generate SIFT descriptors and index them.

In [None]:
# read all images and add their descriptors to index
import glob
import numpy as np
from tqdm import tqdm

def generate_sift_descriptors(img_path):    

    #TODO return keypoints and their descriptors

    return kp, des


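def read_dataset(img_dir):
    # images are stored as <img_dir>/<category>/<image>.jpg
    for filename in glob.iglob(img_dir + '/*/*.jpg'):
        # normalize Windows separators so the split below works on any OS
        parts = filename.replace('\\', '/').split('/')
        category = parts[-2]
        fn = "/".join(parts[-2:])
        kp, des = generate_sift_descriptors(filename)
        yield fn, kp, des, category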
        

def get_top_descriptors(kp, des, top_k):
    # indices of keypoints ordered by detector response, strongest first
    response_sort_indices = sorted(range(len(kp)), key=lambda i: kp[i].response, reverse=True)
    # keep the descriptors of the top_k strongest keypoints
    top_des = np.take(des, response_sort_indices[:top_k], axis=0)
    return top_des

In [None]:
categories = {}
vectors = {}
filenames = []

for filename, keypoints, descriptors, category in tqdm(read_dataset(img_dir), total=9144):
    categories[filename] = category
    vectors[filename] = get_top_descriptors(keypoints, descriptors, 32)
    filenames.append(filename)

In [None]:
%%time
from annoy import AnnoyIndex

lookup = []
annoy = AnnoyIndex(128, 'euclidean')

for filename in filenames:
    for i, v in enumerate(vectors[filename]):
        annoy.add_item(len(lookup), v)
        lookup.append([filename, i])

annoy.build(100, n_jobs=-1)

### Implement search function

Implement a function that, for a given image name, returns its `k` nearest neighbours (image names), sorted with the best matches first.

In [None]:
from collections import Counter

def anns(imagename, k):
    vecs = vectors[imagename]
    # TODO
    # return the list of ordered pairs (similarity, image name),
    # best matches first
    return [(-1, imagename)]

# finds query image in the result, as it is indexed
filename = 'strawberry/image_0022.jpg'
result = anns(filename, 10)
assert any([f[1] == filename for f in result]), "Should return a duplicate"

print(*result, sep='\n')

## Estimate the quality

Build a bucket from these images.
```
accordion/image_0043.jpg
laptop/image_0052.jpg
pagoda/image_0038.jpg
revolver/image_0043.jpg
rhino/image_0040.jpg
sea_horse/image_0038.jpg
soccer_ball/image_0057.jpg
starfish/image_0011.jpg
strawberry/image_0022.jpg
wrench/image_0013.jpg
```
Consider a result `relevant` if **the class of the query and the class of the result match**. Compute `DCG` for every query and the average over the bucket.
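
For reference, one common definition with binary relevance is `DCG@k = sum(rel_i / log2(i + 1))` over ranks `i = 1..k`. The sketch below applies it to a hand-written result list. The helper `dcg` and the sample results are illustrative assumptions, not output of the search function, and other DCG variants (for example with gain `2**rel - 1`) exist.

```python
import math

def dcg(relevances):
    """DCG with ranks starting at 1: sum of rel_i / log2(i + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

# Hypothetical search output for one query (file names invented for illustration)
query_class = "strawberry"
results = [
    "strawberry/image_0022.jpg",
    "lotus/image_0003.jpg",
    "strawberry/image_0007.jpg",
]
# A result is relevant when its class (the folder name) matches the query class
relevances = [1 if r.split("/")[0] == query_class else 0 for r in results]
score = dcg(relevances)
print(relevances, score)
```

Here the relevant results sit at ranks 1 and 3, so the score is `1/log2(2) + 1/log2(4) = 1.5`. Averaging such scores over the ten queries gives the bucket-level value.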

In [None]:
## write your code here

## Deep classifiers and Embeddings

Based on:
- https://www.analyticsvidhya.com/blog/2020/08/top-4-pre-trained-models-for-image-classification-with-python-code/
- https://github.com/christiansafka/img2vec
- https://github.com/ultralytics/yolov5

### Obtain a single label for the image

In [None]:
!pip install torch torchvision

In [None]:
import torch
# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # or yolov5m, yolov5l, yolov5x, custom

In [None]:
i = 'http://www.vision.caltech.edu/Image_Datasets/Caltech101/SamplePics/image_0022.jpg'
results = model(i)
pandas_detections_df = results.pandas().xyxy[0]
pandas_detections_df

In [None]:
results.print()

### Compute the classes for the dataset

In [None]:
%matplotlib inline

for filename in filenames[::400]:
    results = model(img_dir + "/" + filename)
    tag = results.pandas().xyxy[0]['name']
    tag = tag[0] if len(tag) else None
    cat = categories[filename]
    print(f"{filename:25}\t{cat}\t{tag}")
    plt.figure(figsize=(3,2))
    plt.imshow(cv.imread(img_dir + "/" + filename)[:, :, ::-1])
    plt.show()

**Discuss:** 
- Look at the results. 
- Can we use this for retrieval in the same way as we used SIFT features? 
- What if the labels differ from the original categories? What if there are multiple labels or none?
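
One way to reason about these questions is to treat detector labels as index terms. The sketch below builds an inverted index from label to files. The `detections` dictionary is invented for illustration and does not come from running the model; `search_by_labels` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical detector output: file name -> list of predicted labels
detections = {
    "laptop/image_0052.jpg": ["laptop", "keyboard"],  # multiple labels
    "laptop/image_0001.jpg": ["laptop"],
    "rhino/image_0040.jpg": ["elephant"],             # label differs from the dataset class
    "wrench/image_0013.jpg": [],                      # nothing detected
}

# Inverted index: label -> set of files carrying that label
label_index = defaultdict(set)
for fname, labels in detections.items():
    for label in labels:
        label_index[label].add(fname)

def search_by_labels(query_file):
    """Return files sharing at least one label with the query, excluding the query."""
    hits = set()
    for label in detections[query_file]:
        hits |= label_index[label]
    hits.discard(query_file)
    return sorted(hits)

laptop_hits = search_by_labels("laptop/image_0052.jpg")
wrench_hits = search_by_labels("wrench/image_0013.jpg")
print(laptop_hits)
print(wrench_hits)
```

In this toy setup an image without detections retrieves nothing, and a mislabeled image can only match others that received the same wrong label. There is also no notion of "how similar" beyond sharing a label, unlike the vote counts used with SIFT.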

## Vector embedding for an image

In [None]:
!pip install img2vec_pytorch Pillow

In [None]:
from img2vec_pytorch import Img2Vec
from PIL import Image

# Initialize Img2Vec
img2vec = Img2Vec(cuda=False)

# Read in an image (rgb format)

img_file = img_dir + '/gramophone/image_0018.jpg'
img = Image.open(img_file)
vector = img2vec.get_vec([img]).reshape(-1)
vector.shape

In [None]:
MAX = 914 # 9144

embedding_vectors = []
borders = list(range(0, MAX, 100)) + [MAX]
print(borders)

def get_vectors(filenames):
    # TODO
    # return the np.array with the shape of (files x 512)
    return ev

for i in range(len(borders) - 1):
    embedding_vectors += [get_vectors(filenames[borders[i]:borders[i+1]])]

embedding_vectors = np.vstack(embedding_vectors)

In [None]:
from sklearn.metrics import pairwise_distances
d = pairwise_distances(embedding_vectors, metric='cosine')

In [None]:
plt.figure(figsize=(10, 10))
plt.imshow(d, cmap='RdBu', vmin=0, vmax=1)
plt.show()
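
The distance matrix can also drive retrieval directly. As a minimal sketch, assuming a small random matrix in place of `embedding_vectors`, the snippet below computes cosine distances with NumPy and lists the rows closest to a query row. `nearest_by_cosine` and `toy_vectors` are hypothetical names used only here.

```python
import numpy as np

def nearest_by_cosine(vectors, query_idx, k=3):
    """Return indices of the k rows closest to vectors[query_idx], excluding itself."""
    # Normalize rows so that a dot product equals cosine similarity
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    dist = 1.0 - normed @ normed[query_idx]
    dist[query_idx] = np.inf  # never return the query itself
    return np.argsort(dist)[:k].tolist()

rng = np.random.default_rng(1)
# Stand-in for image embeddings: 6 images, 512 dimensions each
toy_vectors = rng.random((6, 512))
# Make row 5 a near-duplicate of row 0
toy_vectors[5] = toy_vectors[0] + 0.001 * rng.random(512)

neighbors = nearest_by_cosine(toy_vectors, 0)
print(neighbors)
```

The near-duplicate row comes first in the result. With the real embeddings, comparing the categories of the returned files with the query's category allows the same relevance judgment as in the SIFT experiment.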