# Assignment 1
## Building a simple image search algorithm

For this assignment, you'll be using ```OpenCV``` to design a simple image search algorithm.

The dataset is a collection of over 1000 images of flowers, sampled from 17 different species. The dataset comes from the Visual Geometry Group at the University of Oxford, and full details of the data can be found [here](https://www.robots.ox.ac.uk/~vgg/data/flowers/17/).

For this exercise, you should write some code which does the following:

- Define a particular image that you want to work with
- For that image
  - Extract the colour histogram using ```OpenCV```
- Extract colour histograms for all of the **other** images in the data
- Compare the histogram of our chosen image to all of the other histograms 
  - For this, use the ```cv2.compareHist()``` function with the ```cv2.HISTCMP_CHISQR``` metric
- Find the five images which are most similar to the target image
  - Save a CSV file to the folder called ```out```, showing the five most similar images and the distance metric:

|Filename|Distance|
|---|---|
|target|0.0|
|filename1|---|
|filename2|---|
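
To make the distance column concrete, here is a minimal pure-Python sketch of the chi-square formula that OpenCV documents for ```cv2.HISTCMP_CHISQR```: the sum over bins of `(H1 - H2)^2 / H1`, skipping bins where `H1` is zero. The function name `chi_square_distance` and the toy histograms are illustrative assumptions, not part of the assignment code.

```python
def chi_square_distance(h1, h2, eps=1e-12):
    """Sum of (a - b)^2 / a over all bins where a is non-zero."""
    total = 0.0
    for a, b in zip(h1, h2):
        if abs(a) > eps:  # bins that are empty in the first histogram are skipped
            total += (a - b) ** 2 / a
    return total


# Toy three-bin histograms (made-up values, for illustration only)
ref = [0.5, 0.5, 0.0]
other = [0.25, 0.25, 0.5]

d_self = chi_square_distance(ref, ref)       # identical histograms
d_forward = chi_square_distance(ref, other)  # ref as first argument
d_reverse = chi_square_distance(other, ref)  # arguments swapped
```

A histogram compared with itself has distance 0, which is why the target row shows 0.0. The metric is not symmetric, so the order of the two arguments matters: here `d_forward` and `d_reverse` differ.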

### Import `modules`

In [6]:
# load packages 
import os

#python path 
import sys 
sys.path.append("..")

# image processing
import cv2
import numpy as np
import pandas as pd



### Function for generating histograms

In [9]:
def gen_hists(image):
    '''
    This function generates normalized image histograms using the cv2 module.
    '''
    # load element
    ref_img = cv2.imread(image)
    # generate histogram 
    hist = cv2.calcHist([ref_img], [0, 1, 2], None, [255, 255, 255], [0,256, 0,256, 0,256])
    # normalize
    norm_hist = cv2.normalize(hist, hist, 0, 1.0, cv2.NORM_MINMAX)
    return norm_hist
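
The normalization step in `gen_hists` uses ```cv2.NORM_MINMAX``` with the bounds 0 and 1.0, which rescales the histogram so that its smallest bin maps to 0 and its largest to 1. The following is a small pure-Python sketch of that rescaling on a flat list of counts. The helper `min_max_normalize` is a hypothetical name, and its handling of a constant input is an assumption of this sketch rather than a statement about OpenCV.

```python
def min_max_normalize(values, lo=0.0, hi=1.0):
    """Linearly rescale values so that min(values) -> lo and max(values) -> hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # Assumption of this sketch: a constant input maps to the lower bound
        return [lo for _ in values]
    return [lo + (v - vmin) / (vmax - vmin) * (hi - lo) for v in values]


# Hypothetical bin counts
counts = [0, 5, 10, 20]
normalized = min_max_normalize(counts)
```

After this kind of rescaling the bins no longer sum to 1, so the result is a rescaled histogram rather than a probability distribution.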


### ```Filepaths```

In [21]:
os.getcwd() # get the current working directory

'/work/CDS-visual/CDS-visual/Assignments/Assignment1/src'

In [26]:
# path for the flowers directory
filepath = os.path.join("..",   
                        "..",
                        "..",
                        "..",
                        "..",
                        "cds-vis-data",
                        "flowers"
)

# filepath for the reference image (very non-creative - image number 1)
filepath_ref = os.path.join(filepath, "image_0001.jpg")

# output path to store the final csv
output_path = os.path.join("..",
                            "out"
)



### Generate ```similarity csv```

In [13]:
# get reference hist 
hist_ref = gen_hists(filepath_ref)

# get image names for all images in the folder 
image_names = sorted(os.listdir(filepath)) # avoid naming this "all", which would shadow the Python built-in

# initialize empty list 
results = []

# get all histograms and compare them with image number 1
for img in image_names:
    input_path = os.path.join(filepath, img) # combine the filepath with the image name
    hist = gen_hists(input_path) # generate histogram
    distance = round(cv2.compareHist(hist_ref, hist, cv2.HISTCMP_CHISQR), 2) # calculate distance metric
    results.append((img, distance)) # append to list

# convert to pandas df once the loop has finished, sorted by distance
df = pd.DataFrame(results, 
                columns = ["Filename", "Distance"]).sort_values(by=["Distance"])
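
One caveat about the loop above: ```os.listdir()``` returns every entry in the folder, and ```cv2.imread()``` returns `None` rather than raising an error when a file cannot be read as an image. A possible safeguard, sketched here under the assumption that the images carry a `.jpg`, `.jpeg` or `.png` suffix, is to filter the listing first. The helper `list_image_files` and the demo filenames are hypothetical.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed set of image suffixes


def list_image_files(folder):
    """Return the sorted image filenames in `folder`, ignoring other entries."""
    return sorted(
        name
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
        and os.path.isfile(os.path.join(folder, name))
    )


# Demonstration on a throwaway folder with made-up contents
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("image_0002.jpg", "image_0001.jpg", "files.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    image_names_demo = list_image_files(demo_dir)
```

In the notebook, the same helper could be called on `filepath` in place of the plain directory listing.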
    
    


In [14]:
# show only the first 5 rows (the target itself is included, with distance 0.0)
df_5 = df.head(5)
df_5

| |Filename|Distance|
|---|---|---|
|0|image_0001.jpg|0.0|
|927|image_0928.jpg|178.12|
|875|image_0876.jpg|188.55|
|772|image_0773.jpg|190.08|
|141|image_0142.jpg|190.21|
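
The output above counts the target image as one of the five rows. If the target should instead be left out, so that five *other* images are returned, the ranking could be sketched as follows with the standard library's `heapq.nsmallest`. The function `most_similar` and the demo distances are hypothetical and not taken from the flower data.

```python
import heapq


def most_similar(results, target, k=5):
    """Return the k (filename, distance) pairs with the smallest distance, skipping the target."""
    others = [(name, dist) for name, dist in results if name != target]
    return heapq.nsmallest(k, others, key=lambda pair: pair[1])


# Made-up (filename, distance) pairs, for illustration only
demo_results = [
    ("a.jpg", 0.0),  # the target itself
    ("b.jpg", 3.2),
    ("c.jpg", 1.5),
    ("d.jpg", 9.9),
    ("e.jpg", 0.7),
]
top_two = most_similar(demo_results, target="a.jpg", k=2)
```

The returned list is ordered from smallest to largest distance, matching the sorted DataFrame.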


In [27]:
# save to out 

df_5.to_csv(os.path.join(output_path, "distance.csv"), index = False)
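
The save step above assumes that the ```out``` folder already exists. A minimal sketch of a more defensive version, using only the standard library, creates the folder first with `os.makedirs(..., exist_ok=True)`. The helper `save_results` and the demo rows are hypothetical. The sketch writes with the `csv` module instead of pandas only to stay self-contained.

```python
import csv
import os
import tempfile


def save_results(rows, out_dir, filename="distance.csv"):
    """Write (filename, distance) rows to out_dir/filename, creating out_dir if needed."""
    os.makedirs(out_dir, exist_ok=True)  # no error if the folder is already there
    out_file = os.path.join(out_dir, filename)
    with open(out_file, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Filename", "Distance"])
        writer.writerows(rows)
    return out_file


# Demonstration in a temporary location with made-up rows
with tempfile.TemporaryDirectory() as base_dir:
    demo_out = os.path.join(base_dir, "out")  # does not exist yet
    saved_path = save_results([("target.jpg", 0.0), ("other.jpg", 1.5)], demo_out)
    with open(saved_path, newline="") as handle:
        saved_rows = list(csv.reader(handle))
```

In the notebook itself, calling `os.makedirs(output_path, exist_ok=True)` before `df_5.to_csv(...)` would have the same effect.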