## Assignment 2 - Simple image search

### By Jakub Raszka and Peter Thramkrongart

#### Description

Creating a simple image search script

Download the Oxford-17 flowers image data set, available at this link:


https://www.robots.ox.ac.uk/~vgg/data/flowers/17/


Choose one image in your data that you want to be the 'target image'. Write a Python script or Notebook which does the following:

    Use the cv2.compareHist() function to compare the 3D color histogram for your target image to each of the other images in the corpus one-by-one.
    In particular, use the chi-square distance method, as we did in class. Round each distance to 2 decimal places.
    Save the results from this comparison as a single .csv file, showing the distance between your target image and each of the other images. The .csv file should show the filename for every image in your data except the target and the distance metric between that image and your target. Call your columns: filename, distance.
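
For reference, the chi-square distance that `cv2.compareHist()` returns with `cv2.HISTCMP_CHISQR` is documented by OpenCV as the sum over all bins of `(H1 - H2)^2 / H1`. The sketch below reimplements that formula in NumPy for illustration only. The function name `chi_square_distance` is hypothetical, the histogram values are made up, and skipping bins where the first histogram is zero is an assumption about how empty bins are treated.

```python
import numpy as np

def chi_square_distance(hist_a, hist_b):
    """Sum of (a - b)^2 / a over all bins where a is non-zero (assumed handling of empty bins)."""
    hist_a = np.asarray(hist_a, dtype=np.float64).ravel()
    hist_b = np.asarray(hist_b, dtype=np.float64).ravel()
    nonzero = hist_a > 0  # avoid dividing by zero in empty bins
    diff = hist_a[nonzero] - hist_b[nonzero]
    return float(np.sum(diff ** 2 / hist_a[nonzero]))

# Made-up three-bin histograms
target = [4.0, 2.0, 0.0]
other = [2.0, 2.0, 5.0]
print(round(chi_square_distance(target, other), 2))  # 1.0
print(round(chi_square_distance(other, target), 2))  # 7.0
```

The two calls give different results because the formula divides by the first histogram only, so it matters which image is passed as the target.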


#### General instructions

    For this exercise, you can upload either a standalone script OR a Jupyter Notebook
    Save your script as image_search.py OR image_search.ipynb
    If you have external dependencies, you must include a requirements.txt
    You can either upload the script here or push to GitHub and include a link - or both!
    Your code should be clearly documented in a way that allows others to easily follow along
    Similarly, remember to use descriptive variable names! A name like hist is more readable than h.
    The filenames of the saved images should clearly relate to the original image


#### Purpose

This assignment is designed to test that you have an understanding of:

    how to extract features from images based on colour space;
    how to compare images for similarity based on their colour histogram;
    how to combine these skills to create an image 'search engine'
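
To make the feature-extraction step concrete, here is a minimal sketch of what a 3D colour histogram with 8 bins per channel contains. It uses `np.histogramdd` rather than `cv2.calcHist`, so it is an illustration of the idea, not the call used in this notebook. The function name `color_histogram_3d` and the tiny synthetic image are assumptions made for the example.

```python
import numpy as np

def color_histogram_3d(image, bins_per_channel=8):
    """Count pixels in a bins x bins x bins grid over the three colour channels."""
    # One row per pixel, one column per channel
    pixels = np.asarray(image, dtype=np.float64).reshape(-1, 3)
    hist, _ = np.histogramdd(
        pixels,
        bins=(bins_per_channel,) * 3,
        range=((0, 256),) * 3,
    )
    return hist

# Synthetic 2x2 image (values are made up): three dark pixels and one bright one
image = np.array([[[0, 0, 0], [10, 20, 30]],
                  [[31, 31, 31], [255, 255, 255]]], dtype=np.uint8)
hist = color_histogram_3d(image)
print(hist.shape)     # (8, 8, 8)
print(hist[0, 0, 0])  # 3.0
print(hist[7, 7, 7])  # 1.0
```

With 8 bins per channel each bin covers 32 intensity values, so the three dark pixels share the first bin and the white pixel falls in the last one.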

In [3]:
import cv2 #For image processing
import numpy as np #For arrays and csv exporting
from pathlib import Path #for accessing all files in the directory

In [25]:
"""
This function compares a key image to all other images in the file directory.
It outputs a csv-file of the filenames and their distances in color distribution to the key image, measured with the chi-square method.

Inputs:
path =  the directory path of all images, ending with a slash (e.g. "../data/jpg/")

key_image_name = the filename of the key image. It should be located in the same directory as the other images.
"""

def compare_colors(path,key_image_name):
    filename = [] #The list of image file names
    distance = [] #The list of chi-squared distances to the key image
    image1 = cv2.imread(f"{path}{key_image_name}")  #read key image
    hist1 = cv2.calcHist([image1], [0,1,2],None,[8,8,8],[0,256,0,256,0,256]) #calculate color distributions of key image
    hist1 = cv2.normalize(hist1,hist1,0,255,cv2.NORM_MINMAX) #normalize values for optimal comparison
    
    for file in sorted(Path(path).glob("*.jpg")): #for each image in the sorted directory:
        filename.append(file.name) #append the filename without the path (works for any directory depth)
        file = str(file)  #convert the path to a string for cv2.imread
        image2 = cv2.imread(file) #read the file
        hist2 = cv2.calcHist([image2], [0,1,2],None,[8,8,8],[0,256,0,256,0,256]) #calculate color distribution
        hist2 = cv2.normalize(hist2,hist2,0,255,cv2.NORM_MINMAX) #Normalize
        calculated_distance = cv2.compareHist(hist1,hist2,cv2.HISTCMP_CHISQR)  #Calculate distance
        calculated_distance = round(calculated_distance,2)  #Round to two decimals
        distance.append(calculated_distance)  #append distance to list


    metadata = np.array((filename,distance)) #create array; NumPy stores the mixed names and numbers as strings
    metadata = np.column_stack(metadata) #flip array to long format
    metadata=metadata[np.all(metadata != key_image_name, axis=1)] #remove the entry of the key image

    split_key_name = key_image_name.split(".")[0] #Get the image name without the file format

    np.savetxt(#save csv
        f"{path}{split_key_name}_distance_data.csv",#filename
        metadata, #array
        delimiter=',',#comma separated
        header="filename,distance", #column names
        fmt='%s', #the array holds strings, so the default numeric format ('%.18e') raises an error
        comments = "")#this is to remove the hashtag from the header



In [26]:
compare_colors("../data/jpg/","image_0001.jpg") #Test with the first image in the directory
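
Once a results file with the columns `filename` and `distance` exists, the 'search engine' part amounts to sorting it by distance. The sketch below shows one way this could look. The function name `closest_matches` is hypothetical, and the distances in `sample_csv` are made up so the example runs without the data set.

```python
import csv
import io

def closest_matches(csv_file, n=3):
    """Return the n (filename, distance) rows with the smallest distance."""
    rows = [(row["filename"], float(row["distance"])) for row in csv.DictReader(csv_file)]
    return sorted(rows, key=lambda row: row[1])[:n]

# Made-up distances standing in for a real results file
sample_csv = io.StringIO(
    "filename,distance\n"
    "image_0002.jpg,1520.33\n"
    "image_0003.jpg,310.5\n"
    "image_0004.jpg,987.12\n"
)
print(closest_matches(sample_csv, n=2))
# [('image_0003.jpg', 310.5), ('image_0004.jpg', 987.12)]
```

With a real results file, an open file handle (for example `open(csv_path, newline="")`) can be passed in place of the `io.StringIO` object.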