# **Building an Image Search Engine - Image Query**


## Overview


This guided project builds a prototype image search engine that supports querying by image. It has two parts: the first builds the image encoding system, and the second (this notebook) builds the image query system.

This application has many business use cases such as:

1.  Letting customers search for similar apparel, furniture, auto parts, and so on.
2.  Helping image applications eliminate near-duplicate images.
3.  Using image embeddings as features for downstream modeling tasks.


## Objectives

After completing this notebook you will be able to:

*   Import the embeddings dataset from the previous notebook
*   Generate embeddings for a query image
*   Search the embeddings dataset for the closest matches to a given query image


## Setup Runtime

*   We recommend using Anaconda to manage your runtime.
*   Install the dependencies into your runtime.


System requirements:

1.  Stable internet access
2.  TensorFlow 2.x
3.  Jupyter notebook
4.  2 GB of storage if you use the local file system.


In [None]:
!pip install -U tensorflow

In [None]:
import tensorflow as tf
import csv
import random
import numpy as np
import pandas as pd
from random import shuffle
import zipfile

import PIL
import PIL.Image as Image

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import regularizers
from keras.preprocessing.image import ImageDataGenerator

from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import Callback

from keras.applications.inception_v3 import InceptionV3
from keras.applications.vgg16 import VGG16
from keras.applications.vgg19 import VGG19
from keras.applications.xception import Xception

import matplotlib.pyplot as plt

import skillsnetwork

##  Download Image Dataset

*Note*: If you downloaded the dataset as part of the previous notebook, skip this step.

For this prototype we will use a clothing dataset of T-shirts and other apparel created by [Alexey Grigorev](https://github.com/alexeygrigorev). A fork of the dataset can be found [here](https://github.com/CODAIT/clothing-dataset) on IBM CODAIT's GitHub.

*   Click [the link](https://github.com/CODAIT/clothing-dataset) to download the data manually.
*   Save the downloaded dataset to your local file system.

Alternatively, you can use the `git clone` command below to download the dataset from within the notebook kernel.


In [None]:
# Download the dataset
!git clone https://github.com/CODAIT/clothing-dataset.git

##  Load the Image Encodings and Image Dictionary

Now we load the image encodings (embeddings) generated in the previous notebook into memory. If you have not generated the embeddings in the previous notebook, download the file with the optional cell below or clone the GitHub repo for this guided project to find it.


In [None]:
# OPTIONAL: Download the embeddings if the previous notebook wasn't executed fully
# Uncomment the line below to run the download
# await skillsnetwork.prepare("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-GPXX0W3UEN/cloth-vgg16-500dim-encodings.npy.zip")

In [None]:
# load saved encoding

image_encodings = np.load('cloth-vgg16-500dim-encodings.npy')

Next, we load the image dictionary, which maps each image name to its underlying file path on disk for retrieval.


In [None]:
# OPTIONAL: Download the mapping file if the previous notebook wasn't executed fully
# !wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-GPXX0W3UEN/image_dictionary.csv

In [None]:
df = pd.read_csv('image_dictionary.csv')
df.head()
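
The query code later in this notebook looks up row `i` in both `image_encodings` and `df`, so the two must line up row by row. The following is a minimal sketch of a sanity check for that, run here on toy stand-ins rather than the real files. The helper name `check_alignment` is hypothetical; the column name `full_path_file_name` is the one used by the query cell.

```python
import numpy as np
import pandas as pd


def check_alignment(encodings, dictionary, path_column="full_path_file_name"):
    """Raise ValueError if encodings and dictionary cannot be matched row by row.

    Hypothetical helper: it assumes row i of `encodings` belongs to row i
    of `dictionary`, which is how the query cell uses them.
    """
    if path_column not in dictionary.columns:
        raise ValueError(f"missing column: {path_column}")
    if encodings.shape[0] != len(dictionary):
        raise ValueError(
            f"{encodings.shape[0]} encodings but {len(dictionary)} dictionary rows"
        )
    return encodings.shape[0]


# Toy stand-ins for `image_encodings` and `df`.
toy_encodings = np.zeros((3, 2, 4))
toy_dictionary = pd.DataFrame({"full_path_file_name": ["a.jpg", "b.jpg", "c.jpg"]})
n_images = check_alignment(toy_encodings, toy_dictionary)
print(n_images)
```

With the real data, the call would be `check_alignment(image_encodings, df)`.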

##  Query Similar Apparel

We are finally ready to query the dataset given a sample image.

Step 1: First we pick an image from the dataset by its index `i` and obtain the embedding for that image. Since the image comes from the dataset, we can simply look up its embedding instead of generating it again.

Step 2: Use a similarity metric to compare the query embedding against every embedding in the dataset and keep the closest ones. In our example we use cosine similarity, computed as a dot product. The dot product equals cosine similarity only when the stored embeddings are L2-normalized, which we assume was done in the previous notebook.

Step 3: Retrieve the underlying images for the closest matching embeddings and visualize them to show as output.
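
To make Step 2 concrete, here is a minimal sketch of cosine-similarity ranking on a toy set of 2-D embeddings. The function name `cosine_top_k` and the toy array are illustrative assumptions, not part of the project code. Unlike the query cell, this sketch normalizes the vectors explicitly, so it does not rely on the stored embeddings already being unit length.

```python
import numpy as np


def cosine_top_k(embeddings, query_index, k=3):
    """Return indices and scores of the k rows most similar to the query row."""
    # L2-normalize each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    scores = unit @ unit[query_index]
    # Exclude the query itself, which always has similarity 1.
    scores[query_index] = -np.inf
    top = np.argsort(scores)[::-1][:k]
    return top, scores[top]


# Toy embeddings: row 1 points almost the same way as row 0,
# row 2 is orthogonal to it, and row 3 points the opposite way.
toy = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
indices, scores = cosine_top_k(toy, query_index=0, k=2)
print(indices, scores)
```

Masking the query by index, rather than skipping the top-ranked result, keeps the output correct even if another image ties with the query.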


In [None]:
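# Other indices to try: 1, 2, 9, 100, 114, 1200, 1500, 5000

# Step 1: choose the query image by its row index in the image dictionary.
# Change `i` to query a different image.
i = 455

print('> Query Image:')
display(PIL.Image.open(df.iloc[i]['full_path_file_name']))

# One accumulated similarity score per image in the dataset.
scores = np.zeros(image_encodings.shape[0])

# Step 2: image_encodings is indexed as (image, vector, dimension). For each
# vector position j, add the dot product between every image's j-th vector
# and the query image's j-th vector.
for j in range(image_encodings.shape[1]):
    encodings = image_encodings[:, j, :]
    scores += np.dot(encodings, encodings[i].reshape(-1, 1)).ravel()
result = np.argsort(scores)  # ascending: the best matches are at the end

# Step 3: show the four best matches. Rank -1 is skipped because the query
# image normally scores highest against itself.
print('> Top 4 Similar Images:')
for rank in [-2, -3, -4, -5]:
    print(scores[result[rank]])
    display(PIL.Image.open(df.iloc[result[rank]]['full_path_file_name']))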
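
The query cell above can also be wrapped in a reusable function. The following is a sketch under the assumption that `image_encodings` has shape `(n_images, n_vectors, dim)`, as the indexing in the cell implies. The name `rank_similar` is hypothetical. It replaces the Python loop with a single `np.einsum` call that computes the same summed dot products, and it removes the query by index instead of assuming it is ranked first.

```python
import numpy as np


def rank_similar(image_encodings, query_index, k=4):
    """Rank images by summed dot-product similarity to the query image.

    image_encodings: array of shape (n_images, n_vectors, dim).
    Returns (indices, scores) of the k best matches, best first.
    """
    query = image_encodings[query_index]  # shape (n_vectors, dim)
    # For every image n, sum enc[n, j, d] * query[j, d] over j and d.
    scores = np.einsum("njd,jd->n", image_encodings, query)
    order = np.argsort(scores)[::-1]          # descending similarity
    order = order[order != query_index][:k]   # drop the query itself
    return order, scores[order]


# Toy data: 4 images, 2 vectors per image, 2 dimensions per vector.
toy_encodings = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.8, 0.6], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
])
top_indices, top_scores = rank_similar(toy_encodings, query_index=0, k=2)
print(top_indices, top_scores)
```

With the real data, `rank_similar(image_encodings, i)` would return row indices that can be passed to `df.iloc[...]['full_path_file_name']` to open the matching images.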