# Content-Based Image Retrieval (CBIR)
## Authors: Joaquín Zepeda V. / Benjamín Irarrázabal T.

This notebook develops and implements a CBIR algorithm using the INRIA Holidays and GPR1200 datasets.

It works with two feature extractors: a classic handcrafted one (introduced as HOG, although the extractor implemented below computes a pixel-intensity histogram) and a pre-trained convolutional network (VGG16). The results are then obtained using one similarity measure (Euclidean distance) and two ranking schemes.
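
As a reference for the similarity measure named above, the Euclidean distance between a query feature vector and a database feature vector can be written as follows. The symbols $q$, $x$ and $n$ are notation assumed here, not taken from the notebook.

$$
d(q, x) = \sqrt{\sum_{k=1}^{n} (q_k - x_k)^2}
$$

A smaller $d(q, x)$ means the two images are more similar, so a ranking sorts the database images by increasing distance to the query.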

In [None]:
# Import the required libraries
import cv2
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import glob

from skimage.transform import resize
from skimage import exposure
import pickle

## Handcrafted methods

In [None]:
def classic_extractor_histogram(img):
    """
    Extract a feature vector from an image with a handcrafted method.

    :param numpy.ndarray img: image

    :return: the feature vector
    """
    # Pixel-intensity histogram descriptor (all channels pooled together)

    # Resize so every image contributes the same number of pixels;
    # skimage's resize returns floats in the range [0, 1]
    resized_img = resize(img, (128*4, 64*4))

    # Build a 256-bin histogram of the resized image
    histogram, bin_edges = np.histogram(resized_img, bins=256, range=(0.0, 1.0))

    return histogram

## CNN method

In [None]:
from keras.applications.vgg16 import VGG16
from keras.models import Model

model = VGG16(include_top=False, input_shape=(224, 224, 3))
# include_top=False already drops the classifier, so the last layer is the final pooling layer
model = Model(inputs=model.inputs, outputs=model.layers[-1].output)

def cnn_extractor_VGG16(img, model=model):
    """
    Extract a feature vector from an image with a pre-trained VGG16 network.

    :param numpy.ndarray img: image
    :param model: Keras model used to compute the features

    :return: the feature vector
    """
    # preprocess img
    img = cv2.resize(img, (224, 224))
    img = np.reshape(img, [1, 224, 224, 3])  # add the batch dimension expected by TensorFlow

    # get extracted features
    features = model.predict(img)
    print('.', end='')  # progress marker: one dot per processed image

    # 1x7x7x512 = 25088
    return features.flatten()
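
The `1x7x7x512 = 25088` figure in the comment above follows from VGG16's architecture: with `include_top=False` the network ends after five 2x2 max-pooling stages, each halving the spatial size, and the last convolutional block has 512 channels. The sketch below only reproduces that arithmetic; the helper name `vgg16_feature_length` is hypothetical and nothing here calls Keras.

```python
def vgg16_feature_length(input_side=224, num_pools=5, channels=512):
    """Length of the flattened VGG16 feature map for a square input."""
    side = input_side
    for _ in range(num_pools):
        side //= 2  # each 2x2 max-pool with stride 2 halves the spatial size
    return side * side * channels


feature_length = vgg16_feature_length()
print(feature_length)  # 7 * 7 * 512 = 25088
```

The feature length grows with the square of the input side, which is one reason to resize every image to the same 224x224 input.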



## **1) Computing the feature vector**
This function computes the feature vector of an image. The vectors are then stored in a DataFrame, which is later saved with pickle.





In [None]:
def extract_features(img, tipo_extractor):
    """
    Extract the feature vector of an image with the selected extractor.

    :param numpy.ndarray img: image
    :param str tipo_extractor: extractor type, either 'classic' or 'CNN'

    :return: the feature vector
    :raises ValueError: if tipo_extractor is not a known extractor type
    """
    if tipo_extractor == 'classic':
        features = classic_extractor_histogram(img)
    elif tipo_extractor == 'CNN':
        features = cnn_extractor_VGG16(img)
    else:
        raise ValueError("tipo_extractor must be 'classic' or 'CNN'")
    return features

# INRIA Holidays
First, the wget command downloads the jpg1 and jpg2 archives of the dataset, which are then extracted with the tar command.

In [None]:
!mkdir jpg1
%cd /content/jpg1
!wget ftp://ftp.inrialpes.fr/pub/lear/douze/data/jpg1.tar.gz
!tar -xf jpg1.tar.gz

In [None]:
%cd ..
!mkdir jpg2
%cd /content/jpg2
!wget ftp://ftp.inrialpes.fr/pub/lear/douze/data/jpg2.tar.gz
!tar -xf jpg2.tar.gz
%cd ..
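
For reference, the extraction step done with `tar -xf` above can also be performed from Python with the standard-library `tarfile` module. This is a minimal sketch, not part of the original notebook: it builds a tiny stand-in archive in a temporary directory (the file names are hypothetical) so that it runs without downloading the dataset.

```python
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Build a small stand-in archive (hypothetical content, not the real dataset)
    src = tmp / "jpg"
    src.mkdir()
    (src / "100000.jpg").write_bytes(b"placeholder")
    archive = tmp / "sample.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="jpg")

    # Equivalent of `tar -xf sample.tar.gz` run inside the target directory
    target = tmp / "extracted"
    target.mkdir()
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)

    extracted = sorted(p.name for p in (target / "jpg").iterdir())
    print(extracted)
```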

In [None]:
dataFrame = pd.DataFrame(columns=['Name', 'Feature vector'])

## Handcrafted method

In [None]:
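import os

# List the images extracted from the first archive (jpg1); glob does not return them sorted
path = glob.glob("jpg1/jpg/*.jpg")
rows = []
for file in path:
    img = cv2.imread(file)
    feature = extract_features(img, tipo_extractor='classic')
    name = os.path.splitext(os.path.basename(file))[0]  # file name without directory or extension
    rows.append({'Name': name, 'Feature vector': feature})
# DataFrame.append was removed in pandas 2.0, so the new rows are concatenated instead
dataFrame = pd.concat([dataFrame, pd.DataFrame(rows)], ignore_index=True)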

In [None]:

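import os

# List the images extracted from the second archive (jpg2); glob does not return them sorted
path = glob.glob("jpg2/jpg/*.jpg")
rows = []
for file in path:
    img = cv2.imread(file)
    feature = extract_features(img, tipo_extractor='classic')
    name = os.path.splitext(os.path.basename(file))[0]  # file name without directory or extension
    rows.append({'Name': name, 'Feature vector': feature})
# DataFrame.append was removed in pandas 2.0, so the new rows are concatenated instead
dataFrame = pd.concat([dataFrame, pd.DataFrame(rows)], ignore_index=True)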

In [None]:
import pickle
with open('FeaturesHC_pkl', 'wb') as file:
    pickle.dump(dataFrame, file)

## CNN method

In [None]:
dataFrame = pd.DataFrame(columns=['Name', 'Feature vector'])

In [None]:
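import os

# List the images extracted from the first archive (jpg1); glob does not return them sorted
path = glob.glob("jpg1/jpg/*.jpg")
rows = []
for file in path:
    img = cv2.imread(file)
    feature = extract_features(img, tipo_extractor='CNN')
    name = os.path.splitext(os.path.basename(file))[0]  # file name without directory or extension
    rows.append({'Name': name, 'Feature vector': feature})
# DataFrame.append was removed in pandas 2.0, so the new rows are concatenated instead
dataFrame = pd.concat([dataFrame, pd.DataFrame(rows)], ignore_index=True)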

In [None]:

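import os

# List the images extracted from the second archive (jpg2); glob does not return them sorted
path = glob.glob("jpg2/jpg/*.jpg")
rows = []
for file in path:
    img = cv2.imread(file)
    feature = extract_features(img, tipo_extractor='CNN')
    name = os.path.splitext(os.path.basename(file))[0]  # file name without directory or extension
    rows.append({'Name': name, 'Feature vector': feature})
# DataFrame.append was removed in pandas 2.0, so the new rows are concatenated instead
dataFrame = pd.concat([dataFrame, pd.DataFrame(rows)], ignore_index=True)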

In [None]:
import pickle
with open('FeaturesCNN_pkl', 'wb') as file:
    pickle.dump(dataFrame, file)

# GPR1200

In [None]:
!pip install wget


In [None]:
import wget
url = 'https://visual-computing.com/files/GPR1200/GPR1200.zip'
filename = wget.download(url)
!unzip GPR1200.zip
print('GPR1200 dataset loaded')

In [None]:
# be careful with the glob patterns: .jpg, .JPEG and .JPG files are matched as different extensions
path = glob.glob("images/*.jpg")
path2 = glob.glob("images/*.JPEG")
path3 = glob.glob("images/*.JPG")
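
Because glob patterns are case-sensitive on Linux (as on Colab), an alternative to one pattern per spelling is to list the directory once and compare lower-cased suffixes. The sketch below is only an illustration under that assumption: the helper `list_images` and the sample file names are hypothetical, and it runs on a temporary directory instead of the real `images/` folder.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg"}  # compared in lower case


def list_images(folder):
    """Return the image files in `folder`, whatever the case of their extension."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files covering the three spellings handled above
    for name in ("a.jpg", "b.JPEG", "c.JPG", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"")
    found = [p.name for p in list_images(tmp)]
    stems = [Path(n).stem for n in found]

print(found)
print(stems)
```

Taking `Path(...).stem` also yields the image name without having to count characters for each extension length.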

## Handcrafted method

In [None]:
import os

rows = []
# One pass over the three extension lists; splitext handles .jpg, .JPEG and .JPG alike
for file in path + path2 + path3:
    img = cv2.imread(file)
    feature = extract_features(img, 'classic')
    name = os.path.splitext(os.path.basename(file))[0]  # file name without directory or extension
    rows.append({'Name': name, 'Features': feature})
# Build the DataFrame once (DataFrame.append was removed in pandas 2.0)
GPR_Features = pd.DataFrame(rows, columns=['Name', 'Features'])

In [None]:
import pickle
with open('GPR_FeaturesHC_pkl', 'wb') as file:
    pickle.dump(GPR_Features, file)

## CNN method

In [None]:
import os

rows = []
# One pass over the three extension lists; splitext handles .jpg, .JPEG and .JPG alike
for file in path + path2 + path3:
    img = cv2.imread(file)
    feature = extract_features(img, 'CNN')
    name = os.path.splitext(os.path.basename(file))[0]  # file name without directory or extension
    rows.append({'Name': name, 'Features': feature})
# Build the DataFrame once (DataFrame.append was removed in pandas 2.0)
GPR_Features_cnn = pd.DataFrame(rows, columns=['Name', 'Features'])

The features are saved as a binary file using pickle. Serializing them this way loses no information, and the object is stored as a DataFrame.

In [None]:
import pickle
with open('GPR_FeaturesCNN_pkl', 'wb') as file:
    pickle.dump(GPR_Features_cnn, file)
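
To illustrate why pickle keeps the stored object intact, the sketch below writes a small table of features to disk and reads it back. It is a minimal example under stated assumptions: plain Python lists stand in for the DataFrame, and the names, values and file name `features_demo_pkl` are made up for the demonstration.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the features table (hypothetical names and values)
features = [
    {"Name": "img_a", "Features": [0.0, 1.5, 2.0]},
    {"Name": "img_b", "Features": [3.0, 0.5, 1.0]},
]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "features_demo_pkl"

    # Serialize the object to a binary file
    with open(target, "wb") as f:
        pickle.dump(features, f)

    # Read it back; the result is an equal but independent object
    with open(target, "rb") as f:
        restored = pickle.load(f)

print(restored == features)
```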

# Loading the features

The features have already been extracted; they can be loaded by cloning the GitHub repository.

In [None]:
!git clone https://github.com/BenjaminIrarrazabal/Laboratorios_Inteligencia.git
!ls
%cd Laboratorios_Inteligencia/Proyecto\ final/features

In [None]:
import pickle
# load the saved features
with open('img_query_pkl' , 'rb') as f:
    img_query = pickle.load(f)
with open('img_database_pkl' , 'rb') as f:
    img_database = pickle.load(f)
with open('img_query_cnn_pkl' , 'rb') as f:
    img_query_cnn = pickle.load(f)
with open('img_database_cnn_pkl' , 'rb') as f:
    img_database_cnn = pickle.load(f)
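
Once the query and database features are loaded, retrieval amounts to sorting the database by distance to a query vector. The following is a minimal sketch of that step using the Euclidean distance named in the introduction; it is not the authors' ranking code. The function `rank_by_distance`, the toy vectors, and the dict layout are assumptions made for illustration, and plain Python lists are used so the example runs without the pickled files.

```python
import math


def rank_by_distance(query, database):
    """Return (name, distance) pairs sorted from most to least similar.

    `database` maps an image name to its feature vector.
    """
    scored = [(name, math.dist(query, vector)) for name, vector in database.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy 3-dimensional features (hypothetical values)
database = {
    "img_a": [0.0, 0.0, 0.0],
    "img_b": [1.0, 2.0, 2.0],
    "img_c": [0.0, 3.0, 4.0],
}
query = [0.0, 0.0, 1.0]

ranking = rank_by_distance(query, database)
for name, distance in ranking:
    print(f"{name}: {distance:.3f}")
```

With NumPy feature vectors, the same distance could be computed with `np.linalg.norm(query - vector)`.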