# Hobbits and Histograms Tutorial

## A How-To Guide to Building Your First Image Search Engine in Python

This tutorial was published on PyImageSearch and is available at: https://www.pyimagesearch.com/2014/01/27/hobbits-and-histograms-a-how-to-guide-to-building-your-first-image-search-engine-in-python/

## Overview:
Build an image search engine<br>
Learn the 4 steps that are required

## Goal:

We have 25 images in our dataset, categorized into five different locations from The Lord of the Rings. We will build an image search engine over this data. Our goal: given a query (input) image from one of the categories, return all five images from that category in the top 10 search results.

## 4 Steps to Building an Image Search Engine

(1) Define your descriptor<br>
(2) Index your dataset<br>
(3) Define your similarity metric <br>
(4) Searching: apply the descriptor to your query image, sort the results by similarity, and examine them
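
The four steps can be sketched end to end on toy data. This is only a minimal illustration, not the tutorial's implementation: the mean-color descriptor, the Euclidean distance, the synthetic solid-color "images", and the names `describe`, `distance`, and `search` are all stand-ins chosen for brevity. The tutorial itself uses a 3D color histogram and the chi-squared distance.

```python
import numpy as np

# Step 1: define a descriptor (assumption: mean color per channel, for brevity)
def describe(image):
    return image.reshape(-1, 3).mean(axis=0)

# Step 2: index the dataset (hypothetical synthetic images of a single solid color)
dataset = {
    "dark.png": np.full((4, 4, 3), 10, dtype=np.uint8),
    "mid.png": np.full((4, 4, 3), 120, dtype=np.uint8),
    "bright.png": np.full((4, 4, 3), 240, dtype=np.uint8),
}
index = {name: describe(img) for name, img in dataset.items()}

# Step 3: define a similarity metric (assumption: Euclidean distance)
def distance(a, b):
    return float(np.linalg.norm(a - b))

# Step 4: describe the query, compare it to every indexed entry, sort by distance
def search(query_image):
    q = describe(query_image)
    return sorted((distance(feat, q), name) for name, feat in index.items())

query = np.full((4, 4, 3), 230, dtype=np.uint8)
results = search(query)
print([name for _, name in results])
```

The closest match comes first because a smaller distance means a more similar image.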

## Step 1: The Descriptor - A 3D RGB Color Histogram

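We compute a 3D histogram with 8 bins per channel, which gives 8 × 8 × 8 = 512 bins in total. The histogram is then flattened into a one-dimensional NumPy array so that it can be used as a feature vector.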
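
The shape change caused by flattening can be checked on a random image. The sketch below uses `np.histogramdd` as a NumPy-only stand-in for `cv2.calcHist` (an assumption made so the example runs without OpenCV). The random image and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical 32x32 color image with random pixel values in [0, 255]
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# One row per pixel, one column per color channel
pixels = image.reshape(-1, 3).astype(np.float64)

# 3D histogram with 8 bins per channel over the value range [0, 256)
hist, _ = np.histogramdd(pixels, bins=(8, 8, 8), range=[(0, 256)] * 3)

# Flatten the 8x8x8 cube into a single 512-element feature vector
flat = hist.flatten()
print(hist.shape, flat.shape)
```

Every pixel falls into exactly one bin, so the bin counts sum to the number of pixels (here 32 × 32 = 1024) before any normalization.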

In [18]:
import imutils
import cv2
import os
import pickle
import numpy as np

In [4]:
# Create class for RGB Histogram
class RGBHistogram:
    def __init__(self, bins):
        # num bins in histogram
        self.bins = bins
    
    def describe(self, image):
        # compute a normalized 3D color histogram; note that cv2.imread
        # returns channels in BGR order, so the axes are B, G, R
        hist = cv2.calcHist([image], [0, 1, 2], None, self.bins, 
                          [0, 256, 0, 256, 0, 256])
        
        if imutils.is_cv2():
            hist = cv2.normalize(hist)
        else:
            hist = cv2.normalize(hist, hist)
        
        # return histogram as flattened array
        return hist.flatten()          
            

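It is good practice to define image descriptors as classes rather than functions, because you rarely extract features from a single image alone. A class lets you set parameters such as the number of bins once and reuse them for every image in the dataset.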

## Step 2: Indexing our Dataset

Apply our image descriptor to each image in the dataset and store the resulting feature vectors in an index.

In [6]:
# The index dictionary will keep the value of the descriptors for each file
index = {}

# Initialize descriptor object
desc = RGBHistogram([8, 8, 8])

# Loop over every file in the images directory
for _, _, files in os.walk(os.getcwd() + '/images'):
    for file in files:
        # Get image path
        path = os.getcwd() + '/images/' + file

        # load the image, describe it, and store the features in the index
        image = cv2.imread(path)
        features = desc.describe(image)
        index[file] = features        
        
# Save index to pickle file
with open(os.getcwd() + '/index.pkl', 'wb') as f:
    pickle.dump(index, f)
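
A quick way to confirm that an index survives being written to disk is to load it back and compare. The sketch below is a minimal round trip under stated assumptions: the two-entry index, its file names, and the temporary directory are hypothetical and stand in for the real `index.pkl`.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical miniature index: filename -> flattened feature vector
index = {
    "a.png": np.array([0.1, 0.9]),
    "b.png": np.array([0.5, 0.5]),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.pkl")

    # Write the index to disk
    with open(path, "wb") as f:
        pickle.dump(index, f)

    # Read it back
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# The loaded index should have the same keys and the same feature vectors
same_keys = sorted(loaded) == sorted(index)
same_values = all(np.array_equal(loaded[k], index[k]) for k in index)
print(same_keys, same_values)
```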

## Step 3: The Search

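Our index is now ready to be searched. The searcher compares the query's feature vector against every feature vector in the index using the chi-squared distance; smaller distances mean more similar images.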
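
The chi-squared distance between two histograms a and b is 0.5 * sum((a_i - b_i)^2 / (a_i + b_i + eps)), where eps is a small constant that avoids division by zero for empty bins. The sketch below is a vectorized NumPy variant of the same formula, shown on two tiny hand-made histograms; the function and variable names are illustrative only.

```python
import numpy as np

def chi2_distance(hist_a, hist_b, eps=1e-10):
    # Convert to float arrays so the arithmetic is element-wise
    a = np.asarray(hist_a, dtype=np.float64)
    b = np.asarray(hist_b, dtype=np.float64)
    # eps keeps the denominator non-zero when both bins are empty
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

# Identical histograms: every term is zero
same = chi2_distance([0.5, 0.5, 0.0], [0.5, 0.5, 0.0])

# Two bins differ by 0.5 each: 0.5 * (0.25 / 0.5 + 0.25 / 0.5), about 0.5
different = chi2_distance([0.5, 0.5, 0.0], [0.5, 0.0, 0.5])
print(same, different)
```

Identical histograms give a distance of zero, and the distance grows as mass moves between bins, which is why the results are sorted in ascending order.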

In [15]:
class Searcher:
    def __init__(self, index):
        self.index = index
        
    def search(self, queryFeatures):
        # initialize dictionary of results
        results = {}
        
        # loop over the index
        for (k, features) in self.index.items():
            
            # Compute chi-squared distance between features
            d = self.chi2_distance(features, queryFeatures)
            
            # save the result
            results[k] = d
        
        # sort results so that smallest distances are at the front of the list
        results = sorted([(v, k) for (k, v) in results.items()])
        
        return results
    
    def chi2_distance(self, histA, histB, eps=1e-10):
        # compute and return chi-squared distance
        return 0.5 * np.sum([((a - b) ** 2) / (a + b + eps)
                          for (a,b) in zip(histA, histB)])
        

## Step 4: Performing a Search

Run the search with query images taken from inside and outside the dataset.

In [26]:
# create a search function for a given query image
def search_query(query_path):
    ''' Print the five best-matching images from our dataset
    for the query image at query_path. '''
    
    # Load and show query image
    query = cv2.imread(query_path)
    cv2.imshow("Query", query)
    
    # Describe the input query image
    desc = RGBHistogram([8, 8, 8])
    queryFeatures = desc.describe(query)

    # load the index and initialize an object of the Searcher class
    with open(os.getcwd() + '/index.pkl', "rb") as f:
        index = pickle.load(f)
    searcher = Searcher(index)
    
    # Get results
    results = searcher.search(queryFeatures)
    
    # Print the top 5 results
    for i in range(5):
        print(results[i][1])

In [20]:
# Define file paths
querypaths = {}
querypaths['mordor'] = os.getcwd() + '/images/Mordor-002.png'
querypaths['rivendell'] = os.getcwd() + '/queries/rivendell-query.png'
querypaths['shire'] = os.getcwd() + '/queries/shire-query.png'

In [27]:
# Perform searches on Mordor in dataset
search_query(querypaths['mordor'])

Mordor-002.png
Mordor-004.png
Mordor-001.png
Mordor-003.png
Mordor-005.png


In [28]:
# Perform search on Rivendell outside of dataset
search_query(querypaths['rivendell'])

Rivendell-002.png
Rivendell-004.png
Rivendell-001.png
Rivendell-005.png
Rivendell-003.png


In [30]:
# Perform search on Shire outside of dataset
search_query(querypaths['shire'])

Shire-004.png
Shire-003.png
Shire-001.png
Shire-002.png
Shire-005.png
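
The stated goal (all five images of the query's category within the top 10 results) can be checked programmatically. The helper below is a hypothetical sketch: it assumes results are `(distance, filename)` tuples sorted by distance and that file names follow the `Category-00N.png` pattern seen in the outputs above. The distances in the example list are placeholders, not measured values.

```python
def all_in_top_k(results, category, k=10, per_category=5):
    """Return True if the top k results contain every image of `category`."""
    top = [name for _, name in results[:k]]
    prefix = category.lower() + "-"
    hits = [name for name in top if name.lower().startswith(prefix)]
    return len(hits) >= per_category

# Illustrative ranked list only: the distances are made-up placeholders
example_results = [(0.1 * i, f"Shire-00{i}.png") for i in range(1, 6)]
example_results += [(1.0 + 0.1 * i, f"Mordor-00{i}.png") for i in range(1, 6)]

print(all_in_top_k(example_results, "shire"))
print(all_in_top_k(example_results, "rivendell"))
```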

