# Improving Content Based Image Retrieval through Image Segmentation

This demo takes a user-supplied image, segments it, and compares the results of a Content Based Image Retrieval (CBIR) system when it is queried with the original image and with the segmented image.

## 1. Dataset

We prepare the dataset used for this research.

In [1]:
import cv2
import numpy as np
import os
import sys

root = os.path.abspath("./")

### COCO

We use the COCO validation dataset and the cocoapi to extract information from the annotations.
Download the images from the [COCO](http://cocodataset.org/#download) website and extract them to the datasets folder.
Then follow the installation instructions in the cocoapi [repo](https://github.com/philferriere/cocoapi).

In [2]:
MODEL_FOLDER = "models"
DATASET_FOLDER = "datasets"
FEATURES_FOLDER = "features"

SUBSET = 'val'
YEAR = '2017'

TEMP_DIR = "temp"

RESIZED = False

In [3]:
sys.path.append(os.path.join(root, "cocoapi/"))
from pycocotools.coco import COCO
import coco

# initialize COCO api for instance annotations
dataset = coco.CocoDataset()
coco_dat = dataset.load_coco(dataset_dir=DATASET_FOLDER, subset=SUBSET, year=YEAR, return_coco=True, auto_download=True)
dataset.prepare() 

Using TensorFlow backend.


Will use images in datasets/val2017
Will use annotations in datasets/annotations/instances_val2017.json
loading annotations into memory...
Done (t=0.42s)
creating index...
index created!


### Feature Database
The features are stored in an HDF5 database, which allows for easy search and retrieval.

In [4]:
from utils import Database

RESIZE = 299
FEATURES_DB = '{0}/{1}{2}.hdf5'.format(FEATURES_FOLDER, SUBSET, YEAR)
print('Saving features to {}'.format(FEATURES_DB))

database = Database(FEATURES_DB)

Saving features to features/val2017.hdf5


Create the database

In [5]:
import glob

image_dir = "{0}/{1}{2}".format(DATASET_FOLDER, SUBSET, YEAR)
print("Retrieving images from {}".format(image_dir))

IMAGES = []
for image_format in ["jpg"]:
    IMAGES += glob.glob("{}/*.{}".format(image_dir, image_format))
print("Creating dataset from {} images".format(len(IMAGES)))
IMAGES.sort()

database.create(FEATURES_DB, IMAGES)

Retrieving images from datasets/val2017
Creating dataset from 5000 images
Database exists at features/val2017.hdf5


## 2. Feature Extraction
Extract the image features from the dataset. The cells below compute three hand-crafted descriptors: Local Binary Patterns, color moments and Coltex HSV histograms.

### Local Binary Patterns
Local Binary Patterns (LBP) encode local intensity changes in an image, which makes them useful for describing its texture. Here we use the extended form of LBP, in which the value of a pixel is computed from P sampling points placed on a circle of radius R around it.

In [6]:
from utils import LocalBinaryPatterns

lbp = LocalBinaryPatterns(num_processes=8, temp_dir=TEMP_DIR, hdf5_path=FEATURES_DB, num_points=24, radius=3, eps=1e-7)
lbp.dump(image_paths=IMAGES)

Dataset lbp already exists in features/val2017.hdf5
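
The internals of `utils.LocalBinaryPatterns` are not shown in this notebook, so the following is only a minimal NumPy sketch of the idea described above. The function name `lbp_histogram` is hypothetical. It samples P neighbours on a circle of radius R using the nearest pixel (no interpolation), thresholds them against the centre pixel and builds a normalized histogram of the resulting codes. Library implementations such as scikit-image's `local_binary_pattern` interpolate the samples and can use "uniform" patterns to keep the histogram small, which matters for settings like P=24.

```python
import numpy as np

def lbp_histogram(gray, num_points=8, radius=1, eps=1e-7):
    """Circular LBP with nearest-pixel sampling, returned as a normalized histogram."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    r = int(radius)
    # Only pixels that have a full neighbourhood are described.
    center = gray[r:h - r, r:w - r]
    codes = np.zeros(center.shape, dtype=np.int64)
    for p in range(num_points):
        angle = 2.0 * np.pi * p / num_points
        dy = int(round(-radius * np.sin(angle)))
        dx = int(round(radius * np.cos(angle)))
        neighbour = gray[r + dy:h - r + dy, r + dx:w - r + dx]
        # Set bit p when the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int64) << p
    hist = np.bincount(codes.ravel(), minlength=2 ** num_points).astype(np.float64)
    # eps avoids division by zero, mirroring the eps argument used above.
    return hist / (hist.sum() + eps)

flat_hist = lbp_histogram(np.full((5, 5), 7))
```

On a perfectly flat image every neighbour equals the centre, so all pixels receive the code with every bit set and the histogram has a single non-zero bin.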


### Color Moments

In [7]:
from utils import Color
color = Color(num_processes=8, temp_dir=TEMP_DIR, hdf5_path=FEATURES_DB)
color.dump(image_paths=IMAGES)


Dataset color already exists in features/val2017.hdf5
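
The `Color` class is not shown here, so as a rough illustration only: color moments are commonly defined as the mean, standard deviation and skewness of each channel. The sketch below assumes that definition and uses a hypothetical helper name, `color_moments`. It is not necessarily what `utils.Color` computes.

```python
import numpy as np

def color_moments(image):
    """Return [means, standard deviations, skewness] per channel as one vector."""
    image = np.asarray(image, dtype=np.float64)
    pixels = image.reshape(-1, image.shape[-1])
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    std = np.sqrt((centered ** 2).mean(axis=0))
    # Cube root of the third central moment keeps the sign of the skew.
    skew = np.cbrt((centered ** 3).mean(axis=0))
    return np.concatenate([mean, std, skew])

moments = color_moments(np.full((4, 4, 3), 10))
```

For a three-channel image this yields a 9-dimensional descriptor, which is small enough to compare against every image in the database.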


### Coltex HSV Histograms

In [8]:
from utils import Coltex

coltex = Coltex(num_processes=8, temp_dir=TEMP_DIR, hdf5_path=FEATURES_DB, quantization_hue=4, quantization_int=4)
coltex.dump(image_paths=IMAGES)

Cleaned temp files in /home/asch/Documents/Bachelor_Project/temp
[INFO] launching pool using 8 processes...
[INFO] starting process 2
[INFO] starting process 1
[INFO] starting process 0
[INFO] starting process 3
[INFO] starting process 5
[INFO] starting process 6
[INFO] starting process 7
[INFO] starting process 4

Process ForkPoolWorker-1:
Process ForkPoolWorker-2:
Process ForkPoolWorker-3:
Process ForkPoolWorker-4:
Process ForkPoolWorker-5:
Process ForkPoolWorker-8:
Process ForkPoolWorker-7:
Process ForkPoolWorker-6:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  ...
  File "/home/asch/Documents/Bachelor_Project/utils/coltex.py", line 20, in calculate_weight_hue
    return saturation**(0.1 * ((255 / intensity) ** 0.85) )
KeyboardInterrupt

## 3. Image Segmentation

This program uses Matterport's implementation of [Mask-RCNN](https://github.com/matterport/Mask_RCNN), a convolutional neural network originally developed at Facebook AI Research, to segment a given image and return the bounding boxes of those segments.

In [None]:
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize


COCO_MODEL_PATH = '{}/{}'.format(MODEL_FOLDER, 'mask_rcnn_coco.h5')

if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

In [None]:
class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
config.display()

In [None]:
# Create model object in inference mode.
print('Creating MaskRCNN object')
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_FOLDER, config=config)

# Load weights trained on MS-COCO
print('Loading weights from {}'.format(COCO_MODEL_PATH))
model.load_weights(COCO_MODEL_PATH, by_name=True)

### Retrieve a random image
We select a random image from the dataset to use as the query, match it against the other images, and look up its classes in the COCO annotations.

In [None]:
def get_class(image_path, coco_dat):
    """Retrieves the list of classes for a given image from the COCO annotations.

        Args:
            image_path: The relative path to the image, or its integer COCO id.
            coco_dat: The pycocotools COCO object holding the annotations.

        Returns:
            list: The class name (string) of every annotated object in the image.
    """
    image_id = get_image_id(image_path)
    ann_ids = coco_dat.getAnnIds(imgIds=image_id)
    anns = coco_dat.loadAnns(ann_ids)
    return [coco_dat.loadCats(ann['category_id'])[0]['name'] for ann in anns]

def get_image_id(image_path):
    """Retrieves the id for a given image name.

        Args:
            image_path: The relative path to the image, or its integer COCO id.

        Returns:
            int: Returns the integer id of the given image.
    """
    # Integer ids are passed through unchanged, so callers may use either form.
    if isinstance(image_path, (int, np.integer)):
        return int(image_path)
    # COCO file names are the zero-padded image id, e.g. 000000000139.jpg.
    base_id = os.path.splitext(os.path.basename(image_path))[0]
    return int(base_id)

def score(query_id, retrieved_ids, coco_dat):
    """Scores a retrieval by the class overlap between the query and each retrieved image."""
    query_class = get_class(query_id, coco_dat)
    retrieved_class = [get_class(i, coco_dat) for i in retrieved_ids]
    intersects = [len(np.intersect1d(query_class, c)) for c in retrieved_class]
    # The outer max(1, ...) avoids a division by zero when the query has no annotations.
    scores = [n / max(1, min(len(query_class), max(1, len(c)))) for n, c in zip(intersects, retrieved_class)]
    final_score = np.sum(scores) / len(retrieved_ids)
    return final_score, scores

    
def calc_precision_recall(query_id, retrieved_ids, coco_dat):
    """Precision and recall, counting an image as relevant if it shares a class with the query."""
    query_class = set(get_class(query_id, coco_dat))
    check = np.array([len(query_class & set(get_class(i, coco_dat))) > 0 for i in retrieved_ids])
    # Relevant images: every image containing at least one query class, excluding the query itself.
    relevant = set()
    for cat_id in coco_dat.getCatIds(catNms=list(query_class)):
        relevant.update(coco_dat.getImgIds(catIds=[cat_id]))
    relevant.discard(get_image_id(query_id))
    tp = np.sum(check)
    precision = tp / max(1, len(retrieved_ids))
    recall = tp / max(1, len(relevant))
    return precision, recall
    


NUM_RETRIEVE = 100

In [None]:
import random

file_names = next(os.walk(DATASET_FOLDER + '/' + SUBSET + YEAR))[2]
selected_image = random.choice(file_names)
print(selected_image)
print(get_class(selected_image,coco_dat))

### Running Mask-RCNN on the given image
We read the randomly selected image and run Mask-RCNN on it to obtain the segments, masks and bounding boxes.

In [None]:
rand_image_name = '{0}/{1}{2}/{3}'.format(DATASET_FOLDER, SUBSET, YEAR, selected_image)
IMAGE = cv2.imread(rand_image_name)
if RESIZED:
    IMAGE = cv2.resize(IMAGE, (RESIZE, RESIZE))

# Run detection. cv2.imread returns BGR, while Mask-RCNN expects RGB input.
print('Running inference on {}'.format(rand_image_name))
results = model.detect([cv2.cvtColor(IMAGE, cv2.COLOR_BGR2RGB)])

# Visualize results
RESULT = results[0]
visualize.display_instances(IMAGE[...,::-1], RESULT['rois'], RESULT['masks'], RESULT['class_ids'], 
                            dataset.class_names, RESULT['scores'], figsize=(8,8))

CLASSES = [dataset.class_names[x] for x in RESULT['class_ids']]
CLASSES.append("no class")
t = lbp.describe(IMAGE)
print((t))

## 4. Run Tests

In [None]:
RETRIEVE_NUM = 200
SHOW_NUM = 20

IDS = database.read('id')
base_image_index = np.where(IDS == rand_image_name)[0][0]

# Remove the base image from all datasets
IDS = np.delete(IDS, base_image_index, axis=0)

LBP = database.read('lbp')
LBP = np.delete(LBP, base_image_index, axis=0)

COL = database.read('color_moments')
COL = np.delete(COL, base_image_index, axis=0)

### 4.1 Feature Extraction on query image

#### Local Binary Patterns

We extract the Local Binary Pattern from the arbitrarily chosen image.

In [None]:
from imutils import build_montages
from sklearn.metrics import pairwise_distances
from utils import visualize_histogram

# Convert the BGR image (as loaded by OpenCV) into a grayscale image
GRAY = cv2.cvtColor(IMAGE, cv2.COLOR_BGR2GRAY)

original_lbp = lbp.describe(GRAY)
visualize_histogram(original_lbp, CLASSES)

We compute the pairwise Euclidean distance between the LBP of the chosen image and the LBPs of all images in the database.

In [None]:
reshaped_lbp = original_lbp[0].reshape(1, -1)
distances_lbp = pairwise_distances(reshaped_lbp, LBP).ravel()

distances_sorted_indices = distances_lbp.argsort()
sorted_image_names = IDS[distances_sorted_indices]
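
The ranking step above can be sketched without scikit-learn. This is an illustrative NumPy version with a hypothetical helper name, `rank_by_euclidean`. It computes the Euclidean distance from one query vector to every row of a feature matrix and sorts the rows from nearest to farthest.

```python
import numpy as np

def rank_by_euclidean(query, database):
    """Return (indices sorted by distance, distances) for one query vector."""
    query = np.asarray(query, dtype=np.float64).reshape(1, -1)
    database = np.asarray(database, dtype=np.float64)
    # Broadcasting subtracts the query from every database row.
    distances = np.sqrt(((database - query) ** 2).sum(axis=1))
    return np.argsort(distances), distances

order, dists = rank_by_euclidean([0, 0], [[3, 4], [0, 1], [1, 1]])
```

Here the second row is the closest match and the first row the farthest, so `order` lists the rows in that sequence.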

Here we show the first N retrieved images.

In [None]:
from utils import visualize_images

N = 5

classes = [get_class(i, coco_dat) for i in sorted_image_names[:N]]
sorted_images = [cv2.imread(i) for i in sorted_image_names[:N]]

visualize_images(sorted_images, classes)

In [None]:
HSV = cv2.cvtColor(IMAGE, cv2.COLOR_BGR2HSV)
hsv = color.process_hsv(HSV)
original_color = color.calc_moment(hsv)

distances_color = pairwise_distances(original_color.reshape(1,-1), COL)

import matplotlib.pyplot as plt

fig = plt.figure()
plt.imshow(HSV)
fig.show()
# indices_color = np.argsort(distances_color)[0][:RETRIEVE_NUM]

# image_names_color = [IDS[index] for index in indices_color[:SHOW_NUM]]
# images_color = [cv2.imread(IDS[index]) for index in indices_color[:SHOW_NUM]]
# images_color = [cv2.resize(image, (200,200)) for image in images_color]

# result = build_montages(images_color, (200, 200), (5,3))[0]
# cv2.imshow("Result", result)
# cv2.waitKey(0)
# cv2.destroyAllWindows()

[HSV histograms](https://www.researchgate.net/publication/4145416_Color-texture_feature_extraction_using_soft_decision_from_the_HSV_color_space)

In [None]:
from utils import Coltex

coltex = Coltex(num_processes=8, temp_dir=TEMP_DIR, hdf5_path=FEATURES_DB, quantization_hue=4, quantization_int=4)

        

In [None]:
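def combine_distances(distances, weights):
    """Weighted sum of distance arrays; distances has shape (num_features, 1, num_images)."""
    combined = np.zeros(distances.shape[-2:])
    for i in range(len(distances)):
        combined += distances[i] * weights[i]
    return combined

# distances_lbp was flattened with ravel(), so bring both arrays to shape (1, num_images) before stacking.
stacked_distances = np.array([distances_lbp.reshape(1, -1), distances_color.reshape(1, -1)])
distance_combined = combine_distances(stacked_distances, [0.5, 0.5])
indices_combined = np.argsort(distance_combined)[0][:RETRIEVE_NUM]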

image_names_combined = [IDS[index] for index in indices_combined[:SHOW_NUM]]
images_combined = [cv2.imread(IDS[index]) for index in indices_combined[:SHOW_NUM]]
images_combined = [cv2.resize(image, (200,200)) for image in images_combined]

# result = build_montages(images_combined, (200, 200), (5,3))[0]
# cv2.imshow("Result", result)
# cv2.waitKey(0)
# cv2.destroyAllWindows()
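
As a small worked example of the weighted combination, the sketch below fuses two made-up distance arrays with equal weights and ranks the result. The helper name `fuse_distances` and all the numbers are illustrative assumptions. It flattens each input first, so arrays of shape `(N,)` and `(1, N)` can be mixed.

```python
import numpy as np

def fuse_distances(distance_lists, weights):
    """Weighted sum of several per-image distance arrays of equal length."""
    stacked = np.stack([np.ravel(d).astype(np.float64) for d in distance_lists])
    weights = np.asarray(weights, dtype=np.float64).reshape(-1, 1)
    return (stacked * weights).sum(axis=0)

lbp_d = np.array([0.2, 0.9, 0.4])      # made-up texture distances
col_d = np.array([[0.8, 0.3, 0.2]])    # made-up color distances, shape (1, N)
fused = fuse_distances([lbp_d, col_d], [0.5, 0.5])
ranking = np.argsort(fused)
```

Because the two descriptors can live on different scales, equal weights do not necessarily mean equal influence. Rescaling each distance array before fusing is one possible refinement.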

### 4.2 Feature Extraction on segmented query image

In [None]:
def extract_region(image, roi):
    return image[roi[0]:roi[2], roi[1]:roi[3]]

def join_features(features, weights):
    combined = np.zeros(features.shape[1])
    for i in range(features.shape[0]):
        combined += np.multiply(features[i], weights[i])
    return combined
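
To make the two helpers above concrete, here is a toy example on small arrays. It assumes, as `extract_region` does, that a region of interest is ordered `(y1, x1, y2, x2)`. The names `crop_roi` and `weighted_feature_sum` are hypothetical stand-ins used only for this illustration.

```python
import numpy as np

def crop_roi(image, roi):
    """Crop a (y1, x1, y2, x2) bounding box out of an image."""
    y1, x1, y2, x2 = roi
    return image[y1:y2, x1:x2]

def weighted_feature_sum(features, weights):
    """Combine one feature vector per segment into a single vector."""
    features = np.asarray(features, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    # (num_segments,) @ (num_segments, dim) -> (dim,)
    return weights @ features

image = np.arange(20).reshape(4, 5)
patch = crop_roi(image, (1, 2, 3, 4))    # rows 1-2, columns 2-3
joined = weighted_feature_sum([[1.0, 0.0], [0.0, 1.0]], [0.25, 0.75])
```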

Here we take the result of the inference, which provides the masks of all the segments, and cut the segments out of the base image. The last segmented image is the background: the inverse of the combination of all masks.

In [None]:
ROIS = np.array(RESULT['rois'])
ROIS = np.vstack([ROIS, [0, 0, IMAGE.shape[0], IMAGE.shape[1]]])
NUM_SEGMENTS = ROIS.shape[0]
print('Extracted {} segments'.format(NUM_SEGMENTS))

# Retrieve the masks for all the segments from the MRCNN result.
MASKS = np.array(RESULT['masks'])

# Combine all masks to create a mask for the image without segments.
COMBINED_MASK = np.zeros((MASKS.shape[:2]), dtype=bool)
for i in range(MASKS.shape[2]):
    COMBINED_MASK = np.logical_or(COMBINED_MASK, MASKS[:,:,i])
COMBINED_MASK = ~COMBINED_MASK

MASKS = np.dstack([MASKS, COMBINED_MASK])

# Create segmented images from the masks.
MASKED_IMAGES = []

for k in range(NUM_SEGMENTS):
    # Zero every pixel that lies outside segment k (vectorised, no per-pixel loop).
    mask = IMAGE.copy()
    mask[MASKS[:, :, k] == 0] = 0
    MASKED_IMAGES.append(mask)





# Weight each segment by the fraction of the image its mask covers (an assumed weighting scheme),
# normalised so that the weights sum to 1. REGION_WEIGHTS is used by the cells below.
REGION_WEIGHTS = [np.count_nonzero(MASKS[:, :, i]) / (IMAGE.shape[0] * IMAGE.shape[1]) for i in range(NUM_SEGMENTS)]
REGION_WEIGHTS = np.divide(REGION_WEIGHTS, np.sum(REGION_WEIGHTS))
print(REGION_WEIGHTS)
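
The mask handling in this cell can be illustrated on a toy image. The sketch below assumes boolean instance masks of shape `(height, width, num_instances)`, which is the layout the cell above indexes. The function name `split_into_segments` is hypothetical. It appends a background mask (every pixel not covered by any instance) and blacks out the pixels outside each mask.

```python
import numpy as np

def split_into_segments(image, masks):
    """Return one masked copy of image per instance, plus one for the background."""
    masks = np.asarray(masks, dtype=bool)
    # Background = pixels that belong to no instance mask.
    background = ~masks.any(axis=2)
    all_masks = np.dstack([masks, background])
    segments = [np.where(all_masks[:, :, k, None], image, 0)
                for k in range(all_masks.shape[2])]
    return segments, all_masks

toy_image = np.full((2, 2, 3), 9, dtype=np.uint8)
toy_masks = np.zeros((2, 2, 1), dtype=bool)
toy_masks[0, 0, 0] = True
toy_segments, toy_all_masks = split_into_segments(toy_image, toy_masks)
```

With a single instance covering only the top-left pixel, the first segment keeps that pixel and the background segment keeps the other three.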

Here we visualize the segmented image with the corresponding class of each segment.

In [None]:
from utils import visualize_images

visualize_images(MASKED_IMAGES, CLASSES, fontsize=20)

For the Local Binary Patterns we run the same algorithm as before to produce the LBP histogram of each segment.

In [None]:
from utils import visualize_histograms

# For each of the images in the segmented image list, convert the image into grayscale
gray_masked_segments = [cv2.cvtColor(MASKED_IMAGES[i], cv2.COLOR_BGR2GRAY) for i in range(NUM_SEGMENTS)]
# Then extract the region using the bounding box of the segment.
gray_masked_segments = [extract_region(gray_masked_segments[i], ROIS[i]) for i in range(NUM_SEGMENTS)]

# Run the Local Binary Patterns algorithm on the grayscaled regions.
lbp_segments = [lbp.describe(gray_masked_segments[i]) for i in range(NUM_SEGMENTS)]


visualize_images(gray_masked_segments, CLASSES, fontsize=20, gray=True)
visualize_histograms(lbp_segments, CLASSES, fontsize=20)


In [None]:
# Use the histogram of each segment (assumed to be the first element returned by lbp.describe, as for original_lbp above).
ex_lbp = [lbp_segments[i][0] for i in range(NUM_SEGMENTS)]
result_ex_lbp = join_features(np.array(ex_lbp), REGION_WEIGHTS)

dists_ex_lbp = [pairwise_distances(ex_lbp[i].reshape(1,-1), LBP) for i in range(NUM_SEGMENTS)]

dists_ex_lbp_com = combine_distances(np.array(dists_ex_lbp), REGION_WEIGHTS)
dist_ex_lbp = pairwise_distances(result_ex_lbp.reshape(1,-1), LBP)

indices_ex_lbp = np.argsort(dist_ex_lbp)[0][:RETRIEVE_NUM]

images_ex_lbp = [cv2.imread(IDS[index]) for index in indices_ex_lbp]
images_ex_lbp = [cv2.resize(image, (200,200)) for image in images_ex_lbp]
image_names_ex_lbp = [IDS[index] for index in indices_ex_lbp]

In [None]:
# print(MASKS[0].shape)
ex_proc_hsv = [color.process_hsv(HSV, MASKS[:, :, i]) for i in range(NUM_SEGMENTS)]
ex_color = [color.calc_moment(ex_proc_hsv[i]) for i in range(NUM_SEGMENTS)]
result_ex_color = join_features(np.array(ex_color), REGION_WEIGHTS)

dists_ex_color = [pairwise_distances(ex_color[i].reshape(1,-1), COL) for i in range(NUM_SEGMENTS)]

dists_ex_color_com = combine_distances(np.array(dists_ex_color), REGION_WEIGHTS)
dist_ex_color = pairwise_distances(result_ex_color.reshape(1,-1), COL)

indices_ex_color = np.argsort(dists_ex_color_com)[0][:RETRIEVE_NUM]

images_ex_color = [cv2.imread(IDS[index]) for index in indices_ex_color]
images_ex_color = [cv2.resize(image, (200,200)) for image in images_ex_color]
image_names_ex = [IDS[index] for index in indices_ex_color]

# result = build_montages(images_ex_color, (200, 200), (5,3))[0]
# cv2.imshow("Result", result)
# cv2.waitKey(0)
# cv2.destroyAllWindows()

In [None]:
dist_ex_combined = combine_distances(np.array([dist_ex_lbp, dist_ex_color]), [0.5, 0.5])
indices_ex_combined = np.argsort(dist_ex_combined)[0][:RETRIEVE_NUM]

image_names_ex_combined = [IDS[index] for index in indices_ex_combined[:SHOW_NUM]]
images_ex_combined = [cv2.imread(IDS[index]) for index in indices_ex_combined[:SHOW_NUM]]
images_ex_combined = [cv2.resize(image, (200,200)) for image in images_ex_combined]

print(image_names_ex_combined[0])

# result = build_montages(images_ex_combined, (200, 200), (5,3))[0]
# cv2.imshow("Result", result)
# cv2.waitKey(0)
# cv2.destroyAllWindows()

### Returned Images
Here are the first N returned results for the unsegmented and the segmented image, respectively.

### Results

We calculate the recall and precision for both the unsegmented and the segmented retrievals.

For this experiment we assume that an image is correctly retrieved when it shares at least one class with the query image.
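
Under that relevance rule, precision and recall can be sketched with plain class lists. The function name `precision_recall` and the example labels are illustrative assumptions. `num_relevant` stands for the number of database images that share at least one class with the query, which has to be counted from the annotations.

```python
def precision_recall(query_classes, retrieved_classes, num_relevant):
    """An image counts as a hit when it shares at least one class with the query."""
    query = set(query_classes)
    hits = sum(1 for classes in retrieved_classes if query & set(classes))
    precision = hits / len(retrieved_classes) if retrieved_classes else 0.0
    recall = hits / num_relevant if num_relevant else 0.0
    return precision, recall

p, r = precision_recall(
    ['dog', 'person'],
    [['dog'], ['cat'], ['person', 'car'], []],
    num_relevant=4,
)
```

Two of the four retrieved images share a class with the query, so both precision and recall come out at one half in this example.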

In [None]:
# The query image was already removed from IDS, so it cannot appear among the results.
# get_class and score accept image paths, so the retrieved paths are passed directly.
retrieved_un_paths = [IDS[i] for i in indices_combined[:NUM_RETRIEVE]]

# precision_un, recall_un = calc_precision_recall(rand_image_name, retrieved_un_paths, coco_dat)
un_score, un_scores = score(rand_image_name, retrieved_un_paths, coco_dat)

un_classes = [get_class(i, coco_dat) for i in retrieved_un_paths]

# retrieved_un = [cv2.imread(IDS[index]) for index in retrieved_un_ids[:SHOW_NUM]]
# retrieved_un = [cv2.resize(image, (200,200)) for image in retrieved_un]

In [None]:
# As above, the query image is not in IDS, so the retrieved paths can be scored directly.
retrieved_seg_paths = [IDS[i] for i in indices_ex_combined[:NUM_RETRIEVE]]
seg_score, seg_scores = score(rand_image_name, retrieved_seg_paths, coco_dat)
un_ex_classes = [get_class(i, coco_dat) for i in retrieved_seg_paths]

# retrieved_seg = [cv2.imread(IDS[index]) for index in retrieved_seg_ids[:SHOW_NUM]]
# retrieved_seg = [cv2.resize(image, (200,200)) for image in retrieved_seg]

In [None]:
print('Score Unsegmented image: {}'.format(un_score))
print('Score Segmented image: {}'.format(seg_score))

In [None]:
plt.imshow(IMAGE[...,::-1])
original_classes = get_class(rand_image_name, coco_dat)
plt.title(", ".join(original_classes))
plt.show()

In [None]:

vis_cbir_results(IMAGE, images_combined, un_classes)

In [None]:
vis_cbir_results(IMAGE, images_ex_combined, un_ex_classes)