<h1 align="center"> Content Based Image Retrieval
 </h1>

$$
\text{CSE2230 Multimedia Analysis}\\
\text{2021-2022 Q3 Week 2}\\
\textbf{Deadline:}\text{ 22 February 2022}
$$

### How to submit your work:
After completing this Jupyter notebook, go to Brightspace, where you can find the file YOURNAME_LAB1.docx under Labs -> Lab 2. Fill in this document and save it as a PDF or Docx file. Under Assignments in Brightspace there is an assignment called **"Lab 2"**, where you can hand in the PDF or Docx file with your answers.

### Prerequisites:
During today's lab, you will be using an extra set of images. Before getting started, check whether there is a folder called `studentimages` in the same folder as this notebook. If not, create it. Then download the images from here: https://drive.google.com/drive/folders/1tvvere7UX07dpWJY0SXbnf0Fv6HOZmlZ?usp=sharing and put them in this folder. You will notice that the images are named using the following convention:

    y_tag.jpg

- y = ewi for “Photos of EWI”
- y = d for the “Direction photos”
- y = b|w|l  for the “Delft Campus Visitor photos” (according to whether it’s a photo of the bike, water, or streetlight)
- tag = a photo tag, which differs per photo type: usually a descriptive adjective, but for “Direction photos” tag=N|S|E|W|NE|SE|NW|SW
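The naming convention above can be decoded programmatically, which is handy when you later want to check whether a retrieved image belongs to the same category as the query. The following is a minimal sketch: the helper `parse_image_name` and the `CATEGORIES` table are illustrative names that are not part of the lab code, and the sketch assumes that the first underscore separates `y` from `tag`.

```python
from pathlib import Path

# Hypothetical lookup table: maps the prefix y to a human-readable category.
CATEGORIES = {
    "ewi": "Photos of EWI",
    "d": "Direction photos",
    "b": "Delft Campus Visitor photos (bike)",
    "w": "Delft Campus Visitor photos (water)",
    "l": "Delft Campus Visitor photos (streetlight)",
}

def parse_image_name(filename):
    """Split a name of the form y_tag.jpg into (y, tag, category)."""
    stem = Path(filename).stem          # "d_NW.jpg" -> "d_NW"
    y, _, tag = stem.partition("_")     # split at the first underscore only
    return y, tag, CATEGORIES.get(y, "unknown")

print(parse_image_name("d_NW.jpg"))
```

Prefixes that are not in the table are reported as `"unknown"` rather than raising an error, so the helper can be run over a folder that contains other files as well.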

After this, do the same for the `Images/joint` folder using this link: https://drive.google.com/drive/folders/1IDm27SqMYBuqBGWRB-aI0UDRxgAzkf9O. Here you will also find the databases we are going to use. These databases should be placed in the `Databases` folder. 

### Our goals today:
#### Part I: OpenCV Continued
After this lab, you should be comfortable with OpenCV. You should also have gained (more) practical experience with SIFT keypoints and descriptors. It is important to refresh SIFT descriptors so that you understand how SIFT descriptors are used to create visual words in Part II of today’s lab.

#### Part II: Content Based Image Retrieval
After this lab, you will be able to build your own content-based image retrieval system using color histograms and visual words. You should understand how CBIR can be used to find near-duplicate matches and semantic matches. You also should be able to discuss the difference between the algorithm perspective and the user perspective on CBIR.

By using the given images of Delft, you gain a hands-on understanding of the sensitivity of image analysis and retrieval to the wide variability of real-world images. 

Multimedia research aims to close the semantic gap. The semantic gap is defined as, “the difference between the information that a machine can extract from the (perceptual) data and the interpretation that a user in a given situation has for the same data.” We want to comprehend the full implications of the semantic gap.

### 1 OpenCV Continued 
In this part of today’s lab, we will be using SIFT to match near-duplicate images. This section helps you to review SIFT and also gives you more experience with OpenCV.

For a small summary of SIFT with OpenCV visit: 
https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.html

### 1.1 SIFT function in OpenCV
Tip: remember to activate your virtual environment before working with Python.
If you open Python in your shell window you can use the following code to read the documentation on SIFT:

    import cv2
    help(cv2.xfeatures2d.SIFT_create())

The command-line documentation is a bit dry and is more useful if you already know a bit about the functionality. Online sources are typically better for more descriptive information. Note that in a lot of online documentation you will find `cv2.SIFT` instead of `cv2.xfeatures2d.SIFT_create()`. This is because the latter notation has only been in use since OpenCV 3.0.
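Because the location of the SIFT constructor depends on the OpenCV version, code copied from online sources may need a small adjustment. The sketch below shows one way to pick whichever constructor is available. The helper name `get_sift_factory` is made up for this example, and the `fake_cv2` object is only a stand-in so that the sketch runs without OpenCV installed. It assumes the three known locations: `cv2.SIFT_create` (OpenCV 4.4 and later), `cv2.xfeatures2d.SIFT_create` (contrib builds of OpenCV 3.x and early 4.x) and `cv2.SIFT` (OpenCV 2.4).

```python
from types import SimpleNamespace

def get_sift_factory(cv2_module):
    """Return the SIFT constructor that this OpenCV build exposes."""
    # OpenCV >= 4.4 exposes SIFT in the main module.
    if hasattr(cv2_module, "SIFT_create"):
        return cv2_module.SIFT_create
    # Contrib builds of OpenCV 3.x and early 4.x keep it in xfeatures2d.
    xf = getattr(cv2_module, "xfeatures2d", None)
    if xf is not None and hasattr(xf, "SIFT_create"):
        return xf.SIFT_create
    # OpenCV 2.4 used the class cv2.SIFT directly.
    if hasattr(cv2_module, "SIFT"):
        return cv2_module.SIFT
    raise RuntimeError("This OpenCV build does not provide SIFT")

# Stand-in object that mimics an OpenCV 3.x contrib build.
fake_cv2 = SimpleNamespace(
    xfeatures2d=SimpleNamespace(SIFT_create=lambda: "sift object")
)
print(get_sift_factory(fake_cv2)())
```

With a real installation you would call `get_sift_factory(cv2)()` after `import cv2`. In this course the pinned version uses `cv2.xfeatures2d.SIFT_create()`, so the helper is only needed when you run the notebook with a different OpenCV version.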

If you get an error saying that `xfeatures2d` does not exist, open the Anaconda prompt and type the following:
    `pip3 install numpy opencv-python==3.4.2.16 opencv-contrib-python==3.4.2.16`
This should install the OpenCV functions that we use with Python 3 during this course.

In [1]:
import cv2
help(cv2.xfeatures2d.SIFT_create())

Help on xfeatures2d_SIFT object:

class xfeatures2d_SIFT(Feature2D)
 |  Method resolution order:
 |      xfeatures2d_SIFT
 |      Feature2D
 |      Algorithm
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  create(...) from builtins.type
 |      create([, nfeatures[, nOctaveLayers[, contrastThreshold[, edgeThreshold[, sigma]]]]]) -> retval
 |      .   @param nfeatures The number of best features to retain. The features are ranked by their scores
 |      .   (measured in SIFT algorithm as the local contrast)
 |      .   
 |      .   @param nOctaveLayers The number of layers in each octave. 3 is the value used in D. Lowe paper. The
 |      .   number of octaves is computed automatically from the image resolution.
 |      .   
 |      .   @param contrastThreshold The contrast threshold used

### 1.2 Testing SIFT module
Previous steps have provided us with some information about the function parameters. Calling this function returns an `xfeatures2d_SIFT` object. The square brackets around parameters indicate that the parameters are optional. This means we can either just call `cv2.xfeatures2d.SIFT_create()` in which case the optional parameters are assigned default values, or we can assign them with specific values like this:

    sift = cv2.xfeatures2d.SIFT_create(nfeatures = 10, sigma=3)

The help text provided by OpenCV does not explain the type of the parameters and what the different parameters are used for. To learn more about the OpenCV functions we have to consult the online documentation.

#### ***Answer the following question:***
Find the online documentation and explain the parameters nfeatures and sigma of
the SIFT function shown above in your own words. Note: some parts of the OpenCV
documentation are only available for C++, however the parameters of the functions
are similar.

#### Write your answer here:

<font color='red'>
    ANSWER
</font>

### 1.3 Extracting SIFT keypoints
With our newfound knowledge about the `cv2.xfeatures2d.SIFT_create()` function we can put it to work. However, this function is just a constructor. Besides returning a SIFT object, it does nothing. To do the actual SIFT transform we first have to feed our target image to the SIFT object to obtain a list of keypoints.

First, create the SIFT object and load an image.

    sift = cv2.xfeatures2d.SIFT_create()
    im = cv2.imread('path/to/image.jpg', cv2.IMREAD_GRAYSCALE)

$\textbf{Tip:}$ make sure the path to the image is correct. Remember that the path is relative to your working directory (i.e., where you run the code from).

Figure out how to compute the SIFT keypoints and add this step to your code. Search for a function that outputs the keypoints (there is helpful information for this online). Note: you will need the SIFT keypoints in the following exercises.

In [None]:
def return_keypoints(img):
    keypoints = None
    # Start answer here
    
    # End answer here
    return keypoints

bookshelf_image = None
bookshelf_keypoints = None
sift = None
# Start answer here
    
# End answer here

### 1.4 Displaying keypoints
OpenCV has a nice helper function that draws the keypoints for you. It is called as follows:

    k_im = cv2.drawKeypoints(im, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

Now plot the image like this:

    import matplotlib.cm as cm
    import matplotlib.pyplot as plt
    plt.imshow(k_im, cmap=cm.Greys_r)
    plt.show()


In [None]:
# Start answer here
    
# End answer here

#### ***Answer the following questions:***

1. What do the centre points of the circles correspond to?
2. And the line inside the circles?
3. And the size of the circles?
4. In what kind of image regions (homogeneous, edge, corner, etc) does the SIFT detector find keypoints?
5. What happens at the borders of the image?

#### Write your answers here:

<font color='red'>
    ANSWER
</font>

### 1.5  SIFT Descriptors 
We now would like to compare different images based on the SIFT descriptors. 

We can compute the SIFT keypoints and descriptors using:

    kp, desc = sift.detectAndCompute(im, None)

Here is an example descriptor for a single keypoint:

    [ 114.   16.    1.    0.    0.    2.   39.  114.    8.    7.    3.    0.
     1.   72.  114.   47.    5.    2.    0.    0.    0.   38.   91.   19.
    15.   11.    1.    0.    0.    7.   26.    9.  114.  114.   17.    0.
     0.    1.   15.   62.  114.  107.   29.    5.    4.   46.   61.   52.
    14.    6.    9.   32.   56.  114.   95.   23.    2.    2.    1.    3.
     7.   23.   69.   17.   83.   23.    1.    0.    0.    1.   47.   93.
     114.   95.   73.   19.    6.   15.   33.   35.   10.   28.  114.  114.
    36.   16.    8.    6.   15.   71.   30.    8.    4.    3.   20.   13.
    77.   50.   10.    0.    0.    2.   11.   35.   11.   30.   60.   18.
    13.   26.   17.    6.   11.    7.   24.    8.    4.   10.   30.   22.
     6.   29.    7.    0.    0.    4.   48.   21.]

It is a vector with 128 dimensions. Do you remember what the individual components of this vector represent? (Not necessary to write the answer here. Just make sure that you remember.)


### 1.6 SIFT Matching 
Next, we will match two images using the SIFT keypoints and their descriptors. Find the images nieuwekerk1.jpg and nieuwekerk2.jpg or notreDame1.jpg and notreDame2.jpg in the `Images` folder. 

Attempt to show the matching between them using the following code:

    # create BFMatcher object
    bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)

    # Match descriptors.
    matches = bf.match(des1,des2)

    # Sort them in the order of their distance.
    matches = sorted(matches, key = lambda x:x.distance)

    # Draw first 10 matches.
    img3 = cv2.drawMatches(img1,kp1,img2,kp2,matches[:10], None, flags=2)

    plt.imshow(img3)
    plt.show()

Source:
https://docs.opencv.org/3.4/dc/dc3/tutorial_py_matcher.html

Your result should resemble:
![Example](Images/Example1-6.PNG)
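To make clear what the matcher in the code above does, the following sketch reproduces the idea of brute-force matching with the L2 norm and cross-checking in plain NumPy. It is an illustration under simplifying assumptions, not OpenCV's implementation: the function name `cross_check_match` is made up, and tiny 2-D vectors stand in for real 128-D SIFT descriptors. A pair is kept only if each descriptor is the other's nearest neighbour, and the result is sorted by distance just like `matches` above.

```python
import numpy as np

def cross_check_match(des1, des2):
    """Return (i, j, distance) for mutual nearest neighbours under the L2 norm."""
    # Pairwise Euclidean distances, shape (len(des1), len(des2)).
    diff = des1[:, None, :] - des2[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    best_for_1 = dist.argmin(axis=1)   # nearest des2 row for each des1 row
    best_for_2 = dist.argmin(axis=0)   # nearest des1 row for each des2 row
    matches = [
        (i, int(j), float(dist[i, j]))
        for i, j in enumerate(best_for_1)
        if best_for_2[j] == i          # keep only mutual best matches
    ]
    return sorted(matches, key=lambda m: m[2])  # smallest distance first

# Toy 2-D "descriptors" instead of real 128-D SIFT vectors.
des1 = np.array([[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]])
des2 = np.array([[0.0, 1.0], [10.0, 12.0]])
matches = cross_check_match(des1, des2)
print(matches)
```

In this toy example the third row of `des1` is dropped: its nearest neighbour in `des2` already has a closer partner, so the match is not mutual.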

In [None]:
# Start answer here
    
# End answer here

#### ***Answer the following questions:***

1. Do the points in the photos connected by the match lines correspond to the same physical point in the real-world scene depicted by the photo (i.e., to the same part of the building)? 
2. Do they correspond to points that have the same shape on the real-world building?
3. What do you expect happens when you rotate one of the images by 90 degrees? Are the same physical points in the real-world scene matched?
4. Now, in the code cell below, create another version of this image pair that depicts the next 10 best matches, i.e., `[10:20]`.
5. Do you notice a difference between the top 10 best matches and the next 10 best matches in terms of their ability to connect points in both photos corresponding to the same real-world point? Did you observe what you expected to observe? Try pushing the limit further, what would be a good cutoff to use? 


#### Write your answers here (except 4):

<font color='red'>
    ANSWER
</font>

In [None]:
# For question 4 from 1.6
# Start answer here
    
# End answer here

## 2 CBIR
CBIR stands for Content-Based Image Retrieval. CBIR techniques retrieve images based on their content (pixels). In this lab, we look at Query-by-image, which means that the query is itself an image. 

When studying CBIR it is important to differentiate the Algorithm perspective from the User perspective.

1. **From the Algorithm perspective:** Images are represented by features derived from pixels. CBIR compares images on the basis of these features. Typically, a feature vector is built for the query image and is compared to the feature vectors of each image in the collection. We distinguish local features, which represent certain image regions, and global features, which represent the image as a whole. Visual words built from SIFT descriptors are an example of local features. Features based on the color histogram are an example of global features. Good visual features are good at capturing the sorts of similarities between images that human users see.

2. **From the User perspective:** The goal of the CBIR system is to return images that are relevant to the user information need behind the query. Users with different information needs might submit the same query to an information retrieval system. For example, in the case of images depicting an object, a user might want to find images depicting the same object instance, or the same object class. The system should try to satisfy the users as well as possible.

To carry out a formal evaluation of a CBIR system it is necessary to have ground truth that specifies which images are relevant to which queries. There are two main ways of creating ground truth: first, recording it at the time at which the image is captured (for example, using the camera GPS or adding a tag) and, second, creating it by having a large number of people look at images and judge whether or not they are relevant to a specific information need. Which method you use depends on your application, and also on the resources available to you (whether you have a large number of people willing to judge images for you). True evaluation requires a test set consisting of a list of queries. For each query, you test the system and record how well the system scores with respect to a specific metric (e.g., precision or recall). The final score is usually reported as the average score over all queries. Remember that in this lab you are not carrying out a formal evaluation, but rather trying to get an informal feel for how the system works.
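As an illustration of how such a score is computed, the sketch below calculates precision and recall per query and averages them over a small test set. The file names, the ground truth and the function name `precision_recall` are invented for this example and do not come from the lab data.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of one result list against its ground truth."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical test set: query -> (images returned by the system, relevant images).
test_set = {
    "d_W.jpg": (["d_W2.jpg", "ewi_1.jpg", "d_W3.jpg", "b_1.jpg"],
                ["d_W2.jpg", "d_W3.jpg"]),
    "b_red.jpg": (["b_1.jpg", "w_1.jpg"],
                  ["b_1.jpg", "b_2.jpg", "b_3.jpg", "b_4.jpg"]),
}

scores = [precision_recall(ret, rel) for ret, rel in test_set.values()]
mean_precision = sum(p for p, _ in scores) / len(scores)
mean_recall = sum(r for _, r in scores) / len(scores)
print(mean_precision, mean_recall)
```

The first query finds both relevant images but also returns two irrelevant ones (precision 0.5, recall 1.0). The second returns one of four relevant images among two results (precision 0.5, recall 0.25). The averages over both queries are therefore 0.5 and 0.625.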


### 2.1 Representing images as vectors

In order to carry out Content Based Image Retrieval, we need to be able to compare images using feature vectors. 

First, let’s consider the case of the color histogram. The color histogram has three series of 256 bins, which we can easily convert to a single vector with normalized bin heights. At that point, we can pick and choose from a plethora of similarity measures.
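The sketch below shows one possible way to turn an image into such a vector and to compare two vectors with histogram intersection, one of the many available similarity measures. It is not the code used by the lab tools: the function names are made up, and synthetic arrays stand in for real photos.

```python
import numpy as np

def color_histogram_vector(image, bins=256):
    """Concatenate the per-channel histograms of an H x W x 3 uint8 image."""
    channels = [
        np.histogram(image[:, :, c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    vector = np.concatenate(channels).astype(np.float64)
    return vector / vector.sum()       # normalise so that the bins sum to 1

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalised histograms."""
    return float(np.minimum(h1, h2).sum())

# Synthetic images stand in for real photos.
rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # random colors
img_b = np.full((32, 32, 3), 200, dtype=np.uint8)               # uniformly bright

vec_a = color_histogram_vector(img_a)
vec_b = color_histogram_vector(img_b)
print(vec_a.shape)
print(histogram_intersection(vec_a, vec_a), histogram_intersection(vec_a, vec_b))
```

The resulting vector has 3 x 256 = 768 entries. Note that it contains no information about where in the image a color occurs, which is why the color histogram is a global feature.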

In contrast, if we consider SIFT descriptors, it is not immediately obvious how we should proceed. One image might have more SIFT descriptors than the next image. Even assuming that we can magically enforce how many SIFT descriptors there are, it is still not clear how to calculate the similarity between two different sets of SIFT descriptors. Fortunately, however, there is another technique for approaching these issues.

In the lectures, we have discussed the Bag of Words approach. Here, we apply the same principle of representing an item as a set of disassociated elements. In the fields of Computer Vision and Multimedia, this is referred to as “Bag of Visual Words” (BOVW). Here we will be doing retrieval, but it helps to look at this Wikipedia page, which describes classification:
http://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision
For a small summary of BOW please visit:  https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb

Each image can be explained as a bag of visual words. Each visual word is a cluster of SIFT visual descriptors. 
![visualwords](Images/bagoffeatures_visualwordsoverview.png)
**source:** Mathworks
http://nl.mathworks.com/help/vision/ug/image-classification-with-bag-of-visual-words.html

Under the BOVW approach, we create visual words by clustering the SIFT descriptors that we have extracted from the images in our data set using k-means clustering, a commonly used clustering algorithm. We recommend you have a look at the Wikipedia page: http://en.wikipedia.org/wiki/K-means_clustering

Applying k-means clustering to the SIFT descriptors results in a set of sets of descriptors (i.e., a set of clusters of descriptors that are similar to each other). The center of each of the clusters is used to represent the cluster. It is treated as a codeword in a codebook. Note that for this reason the centers (and therefore the codewords) have exactly the same form as a regular feature. 
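A minimal sketch of this step is given below. It implements plain k-means (Lloyd's algorithm) in NumPy on synthetic 128-dimensional vectors that stand in for SIFT descriptors. The vocabulary used in this lab was trained beforehand with the lab tools, so the function `kmeans`, its parameters and the data here are only illustrative assumptions.

```python
import numpy as np

def kmeans(data, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct data points.
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(np.float64)
    for _ in range(iterations):
        # Assignment step: index of the nearest center for every point.
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:       # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers, labels

# Synthetic stand-in for SIFT descriptors: two well-separated blobs in 128-D.
rng = np.random.default_rng(1)
descriptors = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 128)),
    rng.normal(20.0, 1.0, size=(50, 128)),
])
codebook, labels = kmeans(descriptors, k=2)
print(codebook.shape)
```

Each row of `codebook` is one codeword. It has 128 dimensions, the same form as the descriptors it was computed from.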

When a new query image is presented to the CBIR system, we need to create a representation made of visual words, in order to compare that image to images in the background collection. We extract SIFT descriptors from that image. For each descriptor, we check the distance to every codeword in the codebook and return the codeword closest to it (and the distance). We count how often each codeword comes out as the closest match to a feature and use these counts to build our BOVW vector.
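The counting procedure described above can be sketched in a few lines of NumPy. The function name `bovw_histogram` and the tiny 2-D codebook are assumptions made for readability. Real codewords and descriptors would have 128 dimensions, and the lab tools may implement this step differently.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Count how often each codeword is the nearest neighbour of a descriptor."""
    dist = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)      # index of the closest codeword per descriptor
    return np.bincount(nearest, minlength=len(codebook))

# Toy codebook with three 2-D codewords.
codebook = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descriptors = np.array([
    [1.0, 1.0], [9.0, 1.0], [0.5, 0.0], [1.0, 8.0], [11.0, -1.0],
])
print(bovw_histogram(descriptors, codebook))
```

Two descriptors are closest to the first codeword, two to the second and one to the third, so the BOVW vector is `[2 2 1]`. The length of the vector equals the size of the codebook, no matter how many descriptors the image has.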


### 2.2 Visual word vocabulary and vector representations of the image set
We train a visual word vocabulary using a set of images, and then use that vocabulary to create a vector representation of each of those images. 

Creating vector representations of multimedia content such as images is sometimes called indexing, and the result is an index, which usually implies a special data structure. However, in this lab, we refer to the set of vectors as the database (because we are not using a special data structure.) The set of images will be referred to as the collection or the background collection, but also the database images (because these are the images that are represented by the vectors in the database).

We will perform CBIR by matching the vector representation of a query image to the vector representations of each of the images in the collection, and returning the top-N closest images (i.e., the most similar images).
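The retrieval step itself amounts to a nearest-neighbour search. The sketch below ranks a small made-up database by Euclidean distance to a query vector. The file names, the vectors and the function name `top_n` are hypothetical, and the query tool used in this lab may rely on a different distance measure.

```python
import numpy as np

def top_n(query_vec, database, names, n=3):
    """Rank database rows by Euclidean distance to the query; closest first."""
    dist = np.linalg.norm(database - query_vec, axis=1)
    order = np.argsort(dist)[:n]
    return [(names[i], float(dist[i])) for i in order]

# Hypothetical database: one BOVW-style vector per image.
names = ["d_N.jpg", "d_S.jpg", "ewi_a.jpg", "b_a.jpg"]
database = np.array([
    [5.0, 1.0, 0.0],
    [4.0, 2.0, 0.0],
    [0.0, 0.0, 6.0],
    [1.0, 5.0, 1.0],
])
query = np.array([5.0, 1.0, 1.0])
print(top_n(query, database, names, n=2))
```

Here the two direction images come out on top because their vectors are closest to the query vector.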

The first step is already done. In other words, we have already trained the SIFT vocabulary and created the image index, which allows you to retrieve images from the image collection (referred to here as the “database”). 

We are going to use **two different versions of the database. Please pay attention and switch the versions carefully** so that you are using the right version for the right section below.


### 2.3 Information about the databases and images

Please check that there is a folder `/Images/joint` with images and that the databases are in the `/Databases` folder.

Take a look at the images to understand what the databases contain. The direction database contains only the images indicated by a d and the scale database contains only the images indicated by an s. The joint database contains all of the images. The suffixes used are the same as those in the given image collection for this lab (see the Prerequisites).

If the images and databases are there, you can continue with the next section to process the images you were given at the start of the lab.


### 2.4 Image pre-processing
1. Now, you should create a new folder in the `/studentimages_resized` directory.
2. Then, resize those images with the resizing tool here below.

The tool is a simple function that only resizes the images and keeps their metadata. The function has the following inputs:
- path_to_images -> the absolute path to the folder containing the images that need to be resized.
- withd -> the new size of the image in pixels.
- path_to_output -> the folder where the new images will be saved (default = path_to_images).

Have a look at the Python file in this folder and resize all images to 500 pixels.

In [5]:
from resize import resize
# Start answer here
    
# End answer here



### 2.5 Local vs. Global Features for CBIR
First, we will be using the database from the Databases/direction folder, which was created using the direction images. This collection contains images taken by other students in the past years of Multimedia Analysis standing in front of EWI facing different directions.

While in the MMA/Code folder, investigate the “help” of ./query.py. Make sure you understand the arguments of this tool. This tool needs some extra packages. Go to the Anaconda terminal and run the following pip installs:
- pip install progressbar
- pip install progressbar2
- pip install exifread
- pip install geopy

Choose one of the direction photos (e.g., d_W.jpg) from the given image collection (see the Prerequisites) as a query, and answer the following questions.

 1. What do you expect to be the result when you query the direction database using this query? Why? (Remember that we are using visual word representations of both the query and all of the images in the database, which are local features.)

<font color='red'>
    ANSWER
</font>

Run a query passing the necessary arguments and your image.

 2. Were your expectations upheld? Are you seeing evidence that visual content-based matching is able to capture similarity between different photos of the same scene? You can save the matplotlib images and add your best results to the report. Describe this evidence. 

<font color='red'>
    ANSWER
</font>

Now you will try the same query using a database containing color histogram representations of the complete image collection (/joint). 

To experiment with color histograms, run:

    %run Code/dbt.py -d colorHistDB colorhist ../Images/joint/

This code will create a database named colorHistDB.db in the /MMA/Code/db folder. 

 3. What do you expect to be the result when you query the database using this query? (Remember that we are now using color histogram features, which are global features).

Run the query with the colorhist option.

 4. Are you seeing evidence that visual content-based matching behaves differently when the images are represented with color histogram features? Again, you can save the matplotlib images and add your best results to the report. Describe what you have found.

Now download the image son_ewi_4ever.jpg:
https://drive.google.com/open?id=0BzUoY1B9h0PMQmdDaVVmS21qdVk
Use this image as a query against the joint collection. 

 5. Do you see evidence of SIFT visual words being invariant to scale and rotation? You can save the matplotlib images and add your evidence to the report. Describe this evidence (or lack thereof) briefly.


In [2]:
%run Code/query.py "C:/Users/ardyz/Documents/TU Delft projects/mma-lab/Labs/Lab2_ImageQueries/Databases/direction/MMA" "C:/Users/ardyz/Documents/TU Delft projects/mma-lab/Labs/Lab2_ImageQueries/studentimages/d_W.jpg" "sift"
# %run Code/dbt.py "sift" "C:/Users/ardyz/Documents/TU Delft projects/mma-lab/Labs/Lab2_ImageQueries/studentimages/" -p "C:/Users/ardyz/Documents/TU Delft projects/mma-lab/Labs/Lab2_ImageQueries/Databases/direction/"


Multi Media Analysis Query Tool

Query the database with [ C:/Users/ardyz/Documents/TU Delft projects/mma-lab/Labs/Lab2_ImageQueries/studentimages/d_W.jpg ] for [ sift ] features...
Loading SIFT vocabulary ...


[                                                                        ]   0%

Generating SIFT features for [ 1 ] images ...




Query database with a SIFT histogram...





OperationalError: no such table: sift_imwords
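An error like the one above indicates that the file that was opened does not contain the expected table. Assuming the databases are SQLite files, which the `OperationalError` and the `.db` extension suggest, the sketch below lists the tables that a database actually contains. The helper `list_tables` and the table `colorhists` are made up for this example. An in-memory database is used so that the sketch runs on its own.

```python
import sqlite3

def list_tables(connection):
    """Return the names of all tables in an SQLite database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# In-memory database as a stand-in; for a real check, pass the path of the
# .db file to sqlite3.connect instead of ":memory:".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE colorhists (imid INTEGER, hist BLOB)")  # hypothetical table
print(list_tables(conn))
conn.close()
```

Keep in mind that `sqlite3.connect` silently creates a new, empty database when the given path does not exist. A wrong database path or prefix can therefore lead to a missing-table error instead of a file-not-found error, so it is worth double-checking the path passed to the query tool.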