<a href="https://colab.research.google.com/github/cagBRT/computer-vision/blob/master/CV4_Advanced_ContentBasedImageRetrieval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Clone the entire repo.
!git clone -l -s https://github.com/cagBRT/computer-vision.git cloned-repo
%cd cloned-repo
!ls

# **Content Based Image Retrieval (CBIR)** 
CBIR uses a query image to search for similar images.

This notebook uses the MNIST training dataset to train the autoencoder. <br>
It then uses the MNIST test dataset to make predictions.

**The CBIR Steps**:<br>
>Phase #1: Train the autoencoder<br>
Phase #2: Extract features from all images in our dataset by computing their latent-space representations using the autoencoder<br>
Phase #3: Compare latent-space vectors to find all relevant images in the dataset<br>

https://www.pyimagesearch.com/2020/03/30/autoencoders-for-content-based-image-retrieval-with-keras-and-tensorflow/


In [None]:
# import the necessary packages
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Conv2DTranspose
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Reshape
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
import numpy as np

In [None]:
# import the necessary packages
#from pyimagesearch.convautoencoder import ConvAutoencoder
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt
import numpy as np
import argparse
import cv2
from google.colab.patches import cv2_imshow

In [None]:
# import the necessary packages
from tensorflow.keras.models import Model
from tensorflow.keras.models import load_model
from tensorflow.keras.datasets import mnist
import pickle


In [None]:
# import the necessary packages
from imutils import build_montages

# **Autoencoders**
An autoencoder is an unsupervised artificial neural network that learns to compress and encode data efficiently, then learns to reconstruct the data from the reduced encoded representation so that the output is as close to the original input as possible.<br>
<br>
Autoencoders, by design, reduce data dimensions by learning how to ignore the noise in the data.

In [None]:
image = cv2.imread("images/autoencoder.jpeg")
cv2_imshow(image)

# **Autoencoders consist of four main parts:**
1- **Encoder**: the model learns to reduce the input dimensions and compress the input data into an encoded representation.<br>
2- **Bottleneck**: the layer that contains the compressed representation of the input data. This is the lowest-dimensional representation of the input in the network.<br>
3- **Decoder**: the model learns how to reconstruct the data from the encoded representation, which should be as close to the original input as possible.<br>
4- **Reconstruction Loss**: This is the method that measures how well the decoder is performing and how close the output is to the original input.
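
The four parts listed above can be illustrated with a very small linear autoencoder written in plain NumPy. This is only a sketch under stated assumptions: the toy data, the layer sizes, the learning rate and the names `W_enc`, `W_dec`, `forward` and `mse` are all hypothetical and are not part of this notebook's Keras model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples with 8 features that really lie on a
# 2-dimensional subspace, so a 2-unit bottleneck can represent them.
basis = rng.normal(size=(2, 8))
X = rng.normal(size=(200, 2)) @ basis

# Encoder and decoder are single linear layers (no bias, no activation).
W_enc = rng.normal(scale=0.1, size=(8, 2))  # input -> bottleneck
W_dec = rng.normal(scale=0.1, size=(2, 8))  # bottleneck -> reconstruction

def forward(X):
    latent = X @ W_enc      # bottleneck: the compressed representation
    recon = latent @ W_dec  # decoder output: the reconstruction
    return latent, recon

def mse(a, b):
    # reconstruction loss: mean squared error over all elements
    return float(np.mean((a - b) ** 2))

_, recon = forward(X)
initial_loss = mse(X, recon)

lr = 0.01
for _ in range(500):
    latent, recon = forward(X)
    # gradient of the squared error (up to a constant factor)
    err = (recon - X) / X.shape[0]
    grad_dec = latent.T @ err
    grad_enc = X.T @ (err @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

latent, recon = forward(X)
final_loss = mse(X, recon)
print("latent shape:", latent.shape)
print("loss before:", initial_loss, "loss after:", final_loss)
```

The convolutional model built later in this notebook follows the same pattern, with convolutional layers in place of the two matrices and a 16-dimensional bottleneck.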

# **Deep learning-based CBIR and image retrieval**

We do not use class labels when training the autoencoder, so the training can be considered a form of unsupervised learning.<br>
The autoencoder computes the latent-space vector representation for each image in our dataset (i.e., our “feature vector” for a given image).<br>
Then, at search time, we compute the distance between the latent-space vectors: the smaller the distance, the more relevant/visually similar the two images are.

In [None]:
image = cv2.imread("images/keras_autoencoder_steps.png")
cv2_imshow(image)

**Latent Space Projector**<br>
http://projector.tensorflow.org/

Go to this link and search for a word. <br>
You will see all the words close to your word. 


In [None]:
image = cv2.imread("images/latentSpace.png")
cv2_imshow(image)

# **Build the Autoencoder**

In [None]:
class ConvAutoencoder:
	# use this class to create an autoencoder of any size and shape
	@staticmethod
	def build(width, height, depth, filters=(32, 64), latentDim=16):
		# initialize the input shape to be "channels last" along with
		# the channels dimension itself
		inputShape = (height, width, depth)
		chanDim = -1

		#1. define the input to the encoder
		inputs = Input(shape=inputShape)
		x = inputs

		#2. loop over the number of filters
		for f in filters:
			# apply a CONV => RELU => BN operation
			x = Conv2D(f, (3, 3), strides=2, padding="same")(x)
			x = LeakyReLU(alpha=0.2)(x)
			x = BatchNormalization(axis=chanDim)(x)
	 
		#3. flatten the network and then construct our latent vector
		volumeSize = K.int_shape(x)
		x = Flatten()(x)
		latent = Dense(latentDim, name="encoded")(x)
		#this layer is named because we will access this layer later
	
		#===Building the decoder===#
		#4. start building the decoder model which will accept the
		# output of the encoder as its inputs
		x = Dense(np.prod(volumeSize[1:]))(latent)
		x = Reshape((volumeSize[1], volumeSize[2], volumeSize[3]))(x)
	
		#5. loop over our number of filters again, but this time in
		# reverse order
		for f in filters[::-1]:
			# apply a CONV_TRANSPOSE => RELU => BN operation
			x = Conv2DTranspose(f, (3, 3), strides=2,
				padding="same")(x)
			x = LeakyReLU(alpha=0.2)(x)
			x = BatchNormalization(axis=chanDim)(x)
	 
		#6. apply a single CONV_TRANSPOSE layer used to recover the
		# original depth of the image
		x = Conv2DTranspose(depth, (3, 3), padding="same")(x)
		outputs = Activation("sigmoid", name="decoded")(x)
	
		#7. construct our autoencoder model
		autoencoder = Model(inputs, outputs, name="autoencoder")
	
		# return the autoencoder model
		return autoencoder
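
To see why the decoder can recover the original 28x28 image, it helps to trace the spatial sizes through the network. The sketch below does that arithmetic in plain Python, assuming the default arguments of `build` and the usual behaviour of `padding="same"` with `strides=2` (output size is the input size divided by the stride, rounded up, for `Conv2D`, and multiplied by the stride for `Conv2DTranspose`). The helper name `same_conv_output` is hypothetical.

```python
import math

def same_conv_output(size, stride):
    # with padding="same", a strided convolution outputs ceil(size / stride)
    return math.ceil(size / stride)

filters = (32, 64)
height, width = 28, 28

# encoder: each Conv2D with strides=2 halves the spatial size
for f in filters:
    height = same_conv_output(height, 2)
    width = same_conv_output(width, 2)
    print("after Conv2D with", f, "filters:", (height, width, f))

# number of values that Flatten() hands to the "encoded" Dense layer
flat_size = height * width * filters[-1]
print("flattened size:", flat_size)

# decoder: each Conv2DTranspose with strides=2 doubles the spatial size
dec_height, dec_width = height, width
for f in filters[::-1]:
    dec_height *= 2
    dec_width *= 2
    print("after Conv2DTranspose with", f, "filters:", (dec_height, dec_width, f))
```

With the defaults, the encoder goes 28 → 14 → 7, so `volumeSize` is `(None, 7, 7, 64)` and the first decoder `Dense` layer has 7 * 7 * 64 = 3136 units.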

# **Visualize the predictions**

In [None]:
def visualize_predictions(decoded, gt, samples=30):
	# initialize our list of output images
	outputs = None
	# loop over our number of output samples
	for i in range(0, samples):
		# grab the original image and reconstructed image
		original = (gt[i] * 255).astype("uint8")
		recon = (decoded[i] * 255).astype("uint8")
		# stack the original and reconstructed image side-by-side
		output = np.hstack([original, recon])
		# if the outputs array is empty, initialize it as the current
		# side-by-side image display
		if outputs is None:
			outputs = output
		# otherwise, vertically stack the outputs
		else:
			outputs = np.vstack([outputs, output])
	# return the output images
	return outputs
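
The loop above can also be written without explicit stacking in a loop. The following is a sketch of an equivalent vectorized version, checked here only on random stand-in arrays; the name `visualize_predictions_vectorized` and the demo arrays are hypothetical.

```python
import numpy as np

def visualize_predictions_vectorized(decoded, gt, samples=30):
    # place each original next to its reconstruction along the width axis:
    # (samples, H, W, C) and (samples, H, W, C) -> (samples, H, 2*W, C)
    pairs = np.concatenate([gt[:samples], decoded[:samples]], axis=2)
    # merge the sample axis into the height axis to stack the pairs vertically
    strip = pairs.reshape(-1, pairs.shape[2], pairs.shape[3])
    # convert from [0, 1] floats back to 8-bit pixel values
    return (strip * 255).astype("uint8")

# random stand-ins for the MNIST test images and their reconstructions
rng = np.random.default_rng(0)
gt_demo = rng.random((30, 28, 28, 1)).astype("float32")
decoded_demo = rng.random((30, 28, 28, 1)).astype("float32")

vis_demo = visualize_predictions_vectorized(decoded_demo, gt_demo)
print(vis_demo.shape)
```

For 30 samples of 28x28x1 images, the result is a single tall image of 840 rows and 56 columns, with originals in the left half and reconstructions in the right half.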

# **Train the autoencoder**

In [None]:
# initialize the number of epochs to train for, initial learning rate,
# and batch size
EPOCHS = 20
INIT_LR = 1e-3
BS = 32
# load the MNIST dataset
print("[INFO] loading MNIST dataset...")
((trainX, _), (testX, _)) = mnist.load_data()
# add a channel dimension to every image in the dataset, then scale
# the pixel intensities to the range [0, 1]
trainX = np.expand_dims(trainX, axis=-1)
testX = np.expand_dims(testX, axis=-1)
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0
# construct our convolutional autoencoder
print("[INFO] building autoencoder...")
autoencoder = ConvAutoencoder.build(28, 28, 1)
opt = Adam(learning_rate=INIT_LR, decay=INIT_LR / EPOCHS)
autoencoder.compile(loss="mse", optimizer=opt)
# train the convolutional autoencoder
H = autoencoder.fit(
	trainX, trainX,
	validation_data=(testX, testX),
	epochs=EPOCHS,
	batch_size=BS)
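
The `decay` argument passed to `Adam` lowers the learning rate as training progresses. The sketch below works through the numbers, assuming the time-based rule used by older (legacy) Keras optimizers, `lr = initial_lr / (1 + decay * iterations)`, where one iteration is one batch update. Newer Keras releases may not accept `decay` at all, so treat this as an illustration of the schedule rather than a statement about any particular version. The helper `decayed_lr` is hypothetical.

```python
import math

# values taken from the training cell
EPOCHS = 20
INIT_LR = 1e-3
BS = 32
NUM_TRAIN = 60000  # size of the MNIST training split

decay = INIT_LR / EPOCHS
steps_per_epoch = math.ceil(NUM_TRAIN / BS)

def decayed_lr(iteration):
    # assumed legacy Keras time-based decay rule
    return INIT_LR / (1.0 + decay * iteration)

# learning rate at the start and after selected epochs
for epoch in (0, 1, 10, 20):
    print("epoch", epoch, "lr", decayed_lr(epoch * steps_per_epoch))

final_lr = decayed_lr(EPOCHS * steps_per_epoch)
```

Under that assumption there are 1875 updates per epoch, and after 20 epochs the learning rate has dropped from 1e-3 to roughly 3.5e-4.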

# **Make predictions**

In [None]:
# use the convolutional autoencoder to make predictions on the
# testing images, then construct and display the visualization
print(" making predictions...")
print("original on the left, reconstructed on the right")
decoded = autoencoder.predict(testX)
vis = visualize_predictions(decoded, testX)
#cv2.imwrite('imageTest.jpg', vis)
cv2_imshow(vis)

# serialize the autoencoder model to disk as "modelOut" (HDF5 format);
# later cells reload it with load_model("modelOut")
autoencoder.save("modelOut", save_format="h5")

# **Plot the training and validation loss**

In [None]:
# construct a plot of the training history
N = np.arange(0, EPOCHS)
#plt.style.use("ggplot")
fig = plt.figure()

plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.title("Training and Validation Loss")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
plt.legend(loc="lower left")
#plt.savefig("plot.jpg")
plt.show()

The autoencoder is now trained. <br>
Now we do the feature extraction and indexing stage of the image retrieval pipeline.


In [None]:
# load the MNIST dataset
print("loading MNIST training split...")
((trainX, _), (testX, _)) = mnist.load_data()

# add a channel dimension to every image in the training split, then
# normalize the data of each pixel
trainX = np.expand_dims(trainX, axis=-1)
trainX = trainX.astype("float32") / 255.0

In [None]:
# load the autoencoder we created and called modelOut
print("loading the autoencoder model...")
autoencoder = load_model("modelOut")
# create the encoder model which consists of *just* the encoder
# portion of the autoencoder
encoder = Model(inputs=autoencoder.input,
	outputs=autoencoder.get_layer("encoded").output)
# quantify the contents of our input images using the encoder
print("encoding images...")
features = encoder.predict(trainX)


In [None]:
# construct a dictionary that maps the index of the MNIST training
# image to its corresponding latent-space representation
indexes = list(range(0, trainX.shape[0]))
data = {"indexes": indexes, "features": features}
# write the data dictionary to disk
print("saving index...")
with open("index.pickle", "wb") as f:
	pickle.dump(data, f)

**Dictionaries**<br>
Dictionaries are Python's implementation of a data structure that is more generally known as an associative array. A dictionary consists of a collection of key-value pairs. Each key-value pair maps the key to its associated value.<br>
Our dictionary has: <br>
>indexes: Integer indices of each MNIST digit image in the dataset<br>
features: The corresponding feature vector for each image in the dataset<br>

The dictionary is saved in the file called index.pickle.
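
A quick way to confirm that the index survives the trip to disk is to write a small dictionary of the same shape and read it back. This is a sketch on made-up data: the five random 16-dimensional vectors, the temporary file path and the `_demo` names are hypothetical stand-ins for the real index.

```python
import os
import pickle
import tempfile

import numpy as np

# stand-in for the real index: 5 images, each with a 16-dimensional feature vector
features_demo = np.random.default_rng(0).normal(size=(5, 16)).astype("float32")
data_demo = {"indexes": list(range(5)), "features": features_demo}

# write the dictionary to a temporary file, then load it again
path = os.path.join(tempfile.mkdtemp(), "index_demo.pickle")
with open(path, "wb") as f:
    pickle.dump(data_demo, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(sorted(loaded.keys()))
print(loaded["features"].shape)
```

The `with` blocks close the file even if an error occurs during reading or writing.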

In [None]:
#print(data)

Define a function called euclidean.<br>
This function measures how dissimilar two feature vectors are.<br>
It uses the Euclidean distance (not the cosine distance): the smaller the value, the more similar the vectors.

In [None]:
def euclidean(a, b):
	# compute and return the euclidean distance between two vectors
	return np.linalg.norm(a - b)
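
Euclidean distance and cosine distance are different measures, and the difference matters when vectors point in the same direction but have different lengths. The sketch below compares the two on a pair of hand-picked vectors; `cosine_distance` is a hypothetical helper added only for the comparison and is not used elsewhere in this notebook.

```python
import numpy as np

def euclidean(a, b):
    # straight-line distance between the two vectors
    return np.linalg.norm(a - b)

def cosine_distance(a, b):
    # 1 minus the cosine of the angle between the two vectors
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = 2.0 * a  # same direction as a, twice the length

d_euclid = euclidean(a, b)
d_cosine = cosine_distance(a, b)
print("euclidean:", d_euclid)
print("cosine:", d_cosine)
```

Here the Euclidean distance equals the length of `a` (the square root of 14), while the cosine distance is zero up to rounding, because the angle between the vectors is zero.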

# **Define the search function**
This function compares the query's feature vector with every feature vector in the index and returns the closest matches.

In [None]:
def perform_search(queryFeatures, index, maxResults=64):
	# initialize our list of results
	results = []

	# loop over our index
	for i in range(0, len(index["features"])):
		# compute the euclidean distance between our query features
		# and the features for the current image in our index, then
		# update our results list with a 2-tuple consisting of the
		# computed distance and the index of the image
		d = euclidean(queryFeatures, index["features"][i])
		results.append((d, i))
  
	# sort the results and grab the top ones
	results = sorted(results)[:maxResults]
	# return the list of results
	return results
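
The Python loop above computes one distance per call. For a large index, the same ranking can be computed with a single NumPy operation. The following is a sketch of such a variant, assuming `index["features"]` is (or can be converted to) a 2-D array with one row per image; the name `perform_search_vectorized` and the tiny demo index are hypothetical.

```python
import numpy as np

def perform_search_vectorized(queryFeatures, index, maxResults=64):
    feats = np.asarray(index["features"])
    # Euclidean distance from the query to every row of the index at once
    dists = np.linalg.norm(feats - queryFeatures, axis=1)
    # a stable sort keeps the lower image index first when distances tie
    order = np.argsort(dists, kind="stable")[:maxResults]
    # same (distance, image index) 2-tuples as the loop-based version
    return [(float(dists[i]), int(i)) for i in order]

# tiny demo index with three 2-dimensional feature vectors
index_demo = {
    "indexes": [0, 1, 2],
    "features": np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]),
}
results_demo = perform_search_vectorized(np.array([0.0, 0.0]), index_demo, maxResults=2)
print(results_demo)
```

For the demo query at the origin, the distances are 0, 5 and 1, so the two closest entries are image 0 followed by image 2.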

# **Load and preprocess the data**

In [None]:
# load the MNIST dataset
print("loading MNIST dataset...")
((trainX, _), (testX, _)) = mnist.load_data()
# add a channel dimension to every image in the dataset, then scale
# the pixel intensities to the range [0, 1]
trainX = np.expand_dims(trainX, axis=-1)
testX = np.expand_dims(testX, axis=-1)
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0

**Load the autoencoder and index**

In [None]:
# load the autoencoder model and index from disk
print("loading autoencoder and index...")
autoencoder = load_model("modelOut")
with open("index.pickle", "rb") as f:
	index = pickle.load(f)
# create the encoder model which consists of *just* the encoder
# portion of the autoencoder
encoder = Model(inputs=autoencoder.input,
	outputs=autoencoder.get_layer("encoded").output)
# quantify the contents of our input testing images using the encoder
print("[INFO] encoding testing images...")
features = encoder.predict(testX)

**Use a random sample of the set as queries**<br>
Ten random test digits are selected and used as queries against the indexed training set.<br>
The 225 closest results are returned for each query.<br>
Check the results - which searches were the most/least successful?

In [None]:
# randomly sample a set of testing query image indexes
queryIdxs = list(range(0, testX.shape[0]))
queryIdxs = np.random.choice(queryIdxs, size=10,
	replace=False)
# loop over the testing indexes
for i in queryIdxs:
	# take the features for the current image, find all similar
	# images in our dataset, and then initialize our list of result
	# images
	queryFeatures = features[i]
	results = perform_search(queryFeatures, index, maxResults=225)
	images = []
	# loop over the results
	for (d, j) in results:
		# grab the result image, convert it back to the range
		# [0, 255], and then update the images list
		image = (trainX[j] * 255).astype("uint8")
		image = np.dstack([image] * 3)
		images.append(image)
	# display the query image
	query = (testX[i] * 255).astype("uint8")
	cv2_imshow(query)
	# build a montage from the results and display it
	montage = build_montages(images, (28, 28), (15, 15))[0]
	cv2_imshow(montage)
	cv2.waitKey(0)
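
Judging the montages by eye can be complemented with a number. One option is precision: the fraction of returned images that show the same digit as the query. This notebook discards the MNIST labels, so the sketch below assumes they have been kept (for example by unpacking `mnist.load_data()` into label arrays as well). The function `precision_at_k` and the demo arrays are hypothetical.

```python
import numpy as np

def precision_at_k(query_label, result_indices, train_labels, k=None):
    # keep only the first k results (all of them when k is None)
    result_indices = np.asarray(result_indices, dtype=int)[:k]
    if result_indices.size == 0:
        return 0.0
    # a hit is a retrieved image whose label equals the query's label
    hits = np.asarray(train_labels)[result_indices] == query_label
    return float(np.mean(hits))

# made-up labels for five indexed images and a made-up search result
train_labels_demo = np.array([3, 3, 7, 3, 1])
results_demo = [(0.1, 0), (0.2, 2), (0.3, 1), (0.9, 4)]
result_indices = [j for (_, j) in results_demo]

p_all = precision_at_k(3, result_indices, train_labels_demo)
p_top1 = precision_at_k(3, result_indices, train_labels_demo, k=1)
print("precision over all results:", p_all)
print("precision at 1:", p_top1)
```

In the demo, two of the four returned images are labelled 3, so the precision over all results is 0.5, while the single closest result is a correct match.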

