# Quiz 6 - Object Recognition: BoF vs ConvNets
#### Edson Roteia Araujo Junior and João Pedro Moreira Ferreira

### Instructions
The goal of this quiz is to implement two object recognition approaches:

1.     A classifier based on bag of features:

    - Use SVM or Random Forest as the classifier.

    - Try different dictionary sizes.

2.     A classifier using a ConvNet implemented in Keras:
 
     - Use an architecture inspired by LeNet-5.

Your code must be implemented in a Python notebook, and you must use the CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/cifar.html) for training and testing.

The notebook must present a confusion matrix and the average accuracy for each approach. You must also report both training and test accuracies.



#### To enable the SIFT module of the OpenCV library, <span style="color:red">we install opencv-contrib-python at version 3.4.2.16.</span> You should install it to make this notebook work. You can install it with any Python package manager, since the package is available on PyPI. More information [here](https://pypi.org/project/opencv-contrib-python/3.4.2.16/) ####

#### You may also need to install the <span style="color:red">scikit-image</span> module. Again, it can be installed with your Python package manager. ####

In [1]:
!pip install opencv-contrib-python==3.4.2.16



### Load required Libraries

In [2]:
import numpy as np
import cv2 
from sklearn.cluster import KMeans
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import confusion_matrix
from keras.datasets import cifar10
from keras.utils import np_utils
from skimage import img_as_ubyte
import warnings
warnings.filterwarnings('ignore')

Using TensorFlow backend.


### Load Dataset

In [4]:
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train = X_train / 255.0
X_test = X_test / 255.0

y_train_not_categorical = y_train.copy()
y_train = np_utils.to_categorical(y_train)
y_test_not_categorical = y_test.copy()
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

print("Train Data Size: %d" % len(X_train))
print("Test Data Size: %d" % len(X_test))
print("Number of Classes: %d" % num_classes)

Train Data Size: 50000
Test Data Size: 10000
Number of Classes: 10
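The cell above keeps two copies of the labels: the one-hot version for Keras and the integer version (`*_not_categorical`) for scikit-learn. As a minimal sketch of what the one-hot encoding does and how to undo it, the helper `to_one_hot` below is a hypothetical NumPy stand-in for `np_utils.to_categorical`, not the Keras implementation.

```python
import numpy as np

def to_one_hot(labels, num_classes):
    """Hypothetical stand-in for np_utils.to_categorical on integer labels."""
    labels = np.asarray(labels).ravel()              # (n, 1) -> (n,)
    one_hot = np.zeros((labels.size, num_classes), dtype="float32")
    one_hot[np.arange(labels.size), labels] = 1.0    # one 1.0 per row
    return one_hot

labels = np.array([[3], [0], [9]])                   # same (n, 1) layout as the CIFAR-10 labels
encoded = to_one_hot(labels, num_classes=10)
decoded = encoded.argmax(axis=1)                     # back to integer class ids
print(encoded.shape, decoded)
```

Taking `argmax(axis=1)` of a one-hot matrix recovers the integer labels, which is the form the scikit-learn classifiers in this notebook work with.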


## Bag of Features


### Initializing Methods

In [5]:
detect = cv2.xfeatures2d.SIFT_create()
extract = cv2.xfeatures2d.SIFT_create()

### FLANN Initialization

In [7]:
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = {}  # empty dictionary: FLANN uses its default search parameters
flann = cv2.FlannBasedMatcher(index_params,search_params)

### BoW Initialization

In [14]:
vocab_size = 10
bow = cv2.BOWKMeansTrainer(vocab_size)

### Adding descriptors to BoW

In [16]:
for img in X_train:
    img = img_as_ubyte(img)
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    kp, des_img = extract.detectAndCompute(gray, None)
    if des_img is not None:
        bow.add(des_img)

### Creating Clusters

In [17]:
dictionary = bow.cluster()
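To make the clustering step less opaque, here is a toy k-means (Lloyd's algorithm) in NumPy that turns a set of descriptors into `k` visual words. It is only a sketch: the function name `kmeans_vocabulary`, the farthest-point initialisation, and the 2-D synthetic data are assumptions for illustration, and OpenCV's `BOWKMeansTrainer` uses its own initialisation and stopping criteria.

```python
import numpy as np

def kmeans_vocabulary(descriptors, k, n_iter=10):
    """Toy Lloyd's k-means; returns a (k, d) array of cluster centers."""
    descriptors = np.asarray(descriptors, dtype=float)
    # farthest-point initialisation: deterministic, chosen here only for simplicity
    centers = [descriptors[0]]
    for _ in range(1, k):
        d = np.min([((descriptors - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(descriptors[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # squared distance between every descriptor and every center: (n, k)
        dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:                 # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

# synthetic 2-D "descriptors" in two well-separated blobs (real SIFT descriptors are 128-D)
rng = np.random.default_rng(1)
toy = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
vocab = kmeans_vocabulary(toy, k=2)
print(vocab.shape)
```

Each row of the returned array plays the role of one dictionary entry, so the dictionary size is simply the number of clusters requested.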

### Creating Dictionary

In [18]:
bowDiction = cv2.BOWImgDescriptorExtractor(extract, flann)
bowDiction.setVocabulary(dictionary)

### Extracting bags from training images

In [73]:
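train_bags = []
for img in X_train:
    img = img_as_ubyte(img)
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    # detect keypoints on the same grayscale image that compute() describes
    bow_value = bowDiction.compute(gray, detect.detect(gray))
    # images without keypoints yield no descriptor: use an all-zero bag instead
    if bow_value is None:
        bow_value = np.zeros(vocab_size)
    train_bags.append(np.ravel(bow_value))
train_bags = np.asarray(train_bags)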

### Extracting bags from test images

In [75]:
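test_bags = []
for img in X_test:
    img = img_as_ubyte(img)
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    # detect keypoints on the same grayscale image that compute() describes
    bow_value = bowDiction.compute(gray, detect.detect(gray))
    # images without keypoints yield no descriptor: use an all-zero bag instead
    if bow_value is None:
        bow_value = np.zeros(vocab_size)
    test_bags.append(np.ravel(bow_value))
test_bags = np.asarray(test_bags)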
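Conceptually, each bag is a histogram of the visual words closest to an image's descriptors. The sketch below illustrates that idea with a brute-force nearest-neighbour search in NumPy; the helper name `bag_of_features` and the toy vocabulary are assumptions, and the notebook itself relies on FLANN matching inside `BOWImgDescriptorExtractor` rather than on this code.

```python
import numpy as np

def bag_of_features(descriptors, vocabulary):
    """Normalised histogram of nearest visual words (illustrative helper)."""
    k = len(vocabulary)
    if descriptors is None or len(descriptors) == 0:
        return np.zeros(k)                       # image without keypoints
    # squared Euclidean distance from each descriptor to each word: (n, k)
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)                 # index of the closest word
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()                     # entries sum to 1

vocabulary = np.array([[0.0, 0.0], [10.0, 10.0]])                 # toy 2-word vocabulary
descriptors = np.array([[0.1, 0.2], [9.0, 9.5], [0.3, 0.1], [0.0, 0.4]])
bag = bag_of_features(descriptors, vocabulary)
empty_bag = bag_of_features(None, vocabulary)
print(bag, empty_bag)
```

Normalising by the number of descriptors keeps bags comparable between images with many and few keypoints, which matters on 32x32 CIFAR-10 images where SIFT often finds only a handful.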

### Random Forest Classifier

In [None]:
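# candidate values for the randomized search (each entry must be a list or a distribution);
# the original values 50, 15 and 0.2 are kept, the other candidates are illustrative choices
params_RF = {"n_estimators": [25, 50, 100], "max_depth": [5, 15, None], "min_samples_leaf": [1, 0.01, 0.2]}
rfc = RandomForestClassifier()
# tune the hyperparameters via a randomized search
grid = RandomizedSearchCV(rfc, params_RF)
# use the integer class labels (flattened to 1-D) rather than the one-hot vectors
grid.fit(train_bags, y_train_not_categorical.ravel())
acc = grid.score(test_bags, y_test_not_categorical.ravel())
print("[INFO] randomized search accuracy: {:.2f}%".format(acc * 100))
print("[INFO] randomized search best parameters: {}".format(grid.best_params_))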

### SVM Classifier

In [None]:
import time

params_SVM = {"kernel": ["linear", "poly", "rbf", "sigmoid"], "decision_function_shape": ["ovo", "ovr"]}
clf = SVC()
# tune the hyperparameters via a randomized search;
# n_iter stays below the 8 combinations available in params_SVM
grid = RandomizedSearchCV(clf, params_SVM, n_iter=5)
start = time.time()
# ravel() turns the (n, 1) label arrays into the 1-D vectors scikit-learn expects
grid.fit(train_bags, y_train_not_categorical.ravel())
print("[INFO] randomized search took {:.2f} seconds".format(time.time() - start))
acc = grid.score(test_bags, y_test_not_categorical.ravel())
print("[INFO] randomized search accuracy: {:.2f}%".format(acc * 100))
print("[INFO] randomized search best parameters: {}".format(grid.best_params_))
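A randomized search over lists of candidate values amounts to drawing a few distinct combinations from the full grid. The sketch below illustrates that idea with the standard library only; `sample_param_combinations` is a hypothetical helper, not scikit-learn's sampler, and the way scikit-learn handles an `n_iter` larger than the grid may differ between versions.

```python
import itertools
import random

def sample_param_combinations(param_lists, n_iter, seed=0):
    """Draw n_iter distinct settings from a grid given as {name: [values]}."""
    names = sorted(param_lists)
    grid = [dict(zip(names, values))
            for values in itertools.product(*(param_lists[n] for n in names))]
    if n_iter > len(grid):
        raise ValueError("n_iter=%d exceeds the %d available combinations"
                         % (n_iter, len(grid)))
    # sampling without replacement, so no setting is evaluated twice
    return random.Random(seed).sample(grid, n_iter)

toy_grid = {"kernel": ["linear", "poly", "rbf", "sigmoid"],
            "decision_function_shape": ["ovo", "ovr"]}
candidates = sample_param_combinations(toy_grid, n_iter=5)
for params in candidates:
    print(params)
```

With only eight combinations in this toy grid, an exhaustive grid search would be nearly as cheap; random sampling pays off mainly when the grid is much larger than the evaluation budget.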

In [None]:
def featureExtraction(sift,image):
    image = img_as_ubyte(image)
    gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return keypoints, descriptors
sift = cv2.xfeatures2d.SIFT_create()

#### Training Features

In [None]:
train_descriptors_list = []
for image in X_train:
    keypoints, descriptors = featureExtraction(sift,image)
    train_descriptors_list.append(descriptors)
    
print("Number of training images with descriptors: %d" % sum(x is not None for x in train_descriptors_list))

#### Test Features

In [None]:
test_descriptors_list = []
for image in X_test:
    keypoints, descriptors = featureExtraction(sift,image)
    test_descriptors_list.append(descriptors)
    
print("Number of test images with descriptors: %d" % sum(x is not None for x in test_descriptors_list))

### Clustering

In [None]:
def build_histogram(descriptor_list, cluster_alg):
    histogram = np.zeros(len(cluster_alg.cluster_centers_))
    cluster_result =  cluster_alg.predict(descriptor_list)
    for i in cluster_result:
        histogram[i] += 1.0
    return histogram
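The counting loop in `build_histogram` can also be written with `np.bincount`. The following is a small sketch of that alternative; `build_histogram_bincount` is a hypothetical name and takes the already-predicted cluster ids instead of a fitted clustering object.

```python
import numpy as np

def build_histogram_bincount(cluster_ids, n_clusters):
    """Count how many descriptors fell into each cluster (hypothetical helper)."""
    # minlength keeps the histogram at n_clusters entries even if some clusters are unused
    return np.bincount(cluster_ids, minlength=n_clusters).astype(float)

cluster_ids = np.array([2, 0, 2, 2, 1])       # e.g. the output of cluster_alg.predict(...)
histogram = build_histogram_bincount(cluster_ids, n_clusters=4)
print(histogram)
```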

In [None]:
flatten_descriptors_list = []
for sublist in train_descriptors_list:
    if sublist is not None:
        for item in sublist:
            flatten_descriptors_list.append(item)

print("Number of descriptors in total: %d" % len(flatten_descriptors_list))

In [None]:
def get_bovws(k):
  kmeans = MiniBatchKMeans(n_clusters = k, verbose = False, max_iter = 5000, batch_size = 500)
  kmeans.fit(flatten_descriptors_list)
  
  train_image_bags = []
  for image_descriptors in train_descriptors_list:
    if(image_descriptors is not None):
      histogram = build_histogram(image_descriptors, kmeans)
      train_image_bags.append(histogram)
    else:
      train_image_bags.append(np.zeros((k,)))
  
  test_image_bags = []
  for image_descriptors in test_descriptors_list:
    if(image_descriptors is not None):
      histogram = build_histogram(image_descriptors, kmeans)
      test_image_bags.append(histogram)
    else:
      test_image_bags.append(np.zeros((k,)))
  
  return train_image_bags, test_image_bags


In [None]:
k = 3
train_image_bags, test_image_bags = get_bovws(k)
print(train_image_bags[0:5])
print(test_image_bags[0:5])

In [None]:
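bow = cv2.BOWKMeansTrainer(100)
# add each image's (n_keypoints, 128) descriptor matrix; images without keypoints are skipped
for desc in train_descriptors_list:
    if desc is not None:
        bow.add(desc)
print('Added all to BoW')
dictionary = bow.cluster()
print('Created clusters')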

# FLANN parameters
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = {}  # empty dictionary: FLANN uses its default search parameters
flann = cv2.FlannBasedMatcher(index_params,search_params)

bowDiction = cv2.BOWImgDescriptorExtractor(cv2.xfeatures2d.SIFT_create(), flann)
bowDiction.setVocabulary(dictionary)


### Random Forest Classifier

In [None]:
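# use the integer class labels (flattened to 1-D) rather than the one-hot vectors
y_train_labels = y_train_not_categorical.ravel()
y_test_labels = y_test_not_categorical.ravel()
rfc = RandomForestClassifier()
rfc.fit(train_image_bags, y_train_labels)
train_acc = rfc.score(train_image_bags, y_train_labels)
test_acc = rfc.score(test_image_bags, y_test_labels)
print("Random Forest Classifier training accuracy: {:.2f}%".format(train_acc * 100))
print("Random Forest Classifier test accuracy: {:.2f}%".format(test_acc * 100))
predictions = rfc.predict(test_image_bags)
conf_matrix = confusion_matrix(y_test_labels, predictions)
print(conf_matrix)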
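The assignment asks for a confusion matrix and an average accuracy per approach. As a minimal NumPy sketch of how the two relate, the hypothetical helper below builds the matrix with rows as true classes and columns as predicted classes (the convention `sklearn.metrics.confusion_matrix` also uses), then derives the overall accuracy and the mean per-class accuracy from it. The labels are made-up toy values.

```python
import numpy as np

def confusion_and_accuracy(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    overall = np.trace(matrix) / matrix.sum()
    # per-class accuracy (recall); classes absent from y_true count as 0
    per_class = np.diag(matrix) / np.maximum(matrix.sum(axis=1), 1)
    return matrix, overall, per_class.mean()

y_true = [0, 0, 1, 1, 2, 2]     # toy labels for three classes
y_pred = [0, 1, 1, 1, 2, 0]
matrix, overall, mean_per_class = confusion_and_accuracy(y_true, y_pred, 3)
print(matrix)
print(overall, mean_per_class)
```

On a balanced test set such as CIFAR-10 (1000 images per class) the overall accuracy and the mean per-class accuracy coincide; they diverge only when classes have different sizes.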

### SVM Classifier

In [None]:
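clf = SVC()
clf.fit(train_image_bags, y_train_not_categorical.ravel())
train_acc = clf.score(train_image_bags, y_train_not_categorical.ravel())
test_acc = clf.score(test_image_bags, y_test_not_categorical.ravel())
print("[INFO] SVM training accuracy: {:.2f}%".format(train_acc * 100))
print("[INFO] SVM test accuracy: {:.2f}%".format(test_acc * 100))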

## ConvNet

In [None]:
# Plot ad hoc CIFAR10 instances
from keras.datasets import cifar10
from matplotlib import pyplot
from PIL import Image
# from scipy.misc import toimage -> DEPRECATED
from keras.layers import *

# load data
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

In [None]:
# create a grid of 3x3 images
for i in range(0, 9):
    pyplot.subplot(330 + 1 + i)
    pyplot.imshow(Image.fromarray(X_train[i]))
# show the plot
pyplot.show()

#### Simple Convolutional Neural Network for CIFAR-10 ####

In [None]:
import numpy
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.constraints import maxnorm
from keras.optimizers import SGD
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
from keras.layers import *
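# channels-first layout (3, 32, 32); replaces the deprecated set_image_dim_ordering('th')
K.set_image_data_format('channels_first')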

seed = 7
numpy.random.seed(seed)

(X_train, y_train), (X_test, y_test) = cifar10.load_data()

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train = X_train / 255.0
X_test = X_test / 255.0

y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(3, 32, 32), padding='same', activation='relu', kernel_constraint=maxnorm(3)))
model.add(Dropout(0.2))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_constraint=maxnorm(3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu', kernel_constraint=maxnorm(3)))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
# Compile model
epochs = 25
lrate = 0.01
decay = lrate/epochs
sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
print(model.summary())

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, batch_size=32)
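# Final evaluation of the model on both splits
train_scores = model.evaluate(X_train, y_train, verbose=0)
scores = model.evaluate(X_test, y_test, verbose=0)
print("Training accuracy: %.2f%%" % (train_scores[1]*100))
print("Test accuracy: %.2f%%" % (scores[1]*100))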
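The `decay` argument passed to `SGD` shrinks the learning rate as training proceeds. The sketch below works through the arithmetic, assuming the time-based schedule `lr / (1 + decay * iterations)` that the Keras 2.x `SGD` optimizer applies once per batch update; `decayed_learning_rate` is a hypothetical helper written for this illustration.

```python
import math

def decayed_learning_rate(initial_lr, decay, iterations):
    """Time-based decay: lr / (1 + decay * iterations)."""
    return initial_lr / (1.0 + decay * iterations)

epochs = 25
lrate = 0.01
decay = lrate / epochs                      # 0.0004, as in the cell above
steps_per_epoch = math.ceil(50000 / 32)     # batches per epoch with batch_size=32

for epoch in (1, 5, 25):
    iterations = epoch * steps_per_epoch    # batch updates completed so far
    print(epoch, decayed_learning_rate(lrate, decay, iterations))
```

Because the schedule counts batch updates rather than epochs, the same `decay` value lowers the learning rate more slowly per epoch when a larger `batch_size` is used.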


#### Larger Convolutional Neural Network for CIFAR-10 ####

In [None]:
# Create the model
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(3, 32, 32), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dropout(0.2))
model.add(Dense(1024, activation='relu', kernel_constraint=maxnorm(3)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu', kernel_constraint=maxnorm(3)))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))
# Compile model
epochs = 25
lrate = 0.01
decay = lrate/epochs
sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
print(model.summary())

numpy.random.seed(seed)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, batch_size=64)
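# Final evaluation of the model on both splits
train_scores = model.evaluate(X_train, y_train, verbose=0)
scores = model.evaluate(X_test, y_test, verbose=0)
print("Training accuracy: %.2f%%" % (train_scores[1]*100))
print("Test accuracy: %.2f%%" % (scores[1]*100))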

#### Architecture Inspired by LeNet-5 ####

![lenet5](imgs/lenet5.jpg)

In [None]:
model = Sequential()

# model.add(Conv2D(32, (3, 3), input_shape=(3, 32, 32), activation='relu', padding='same'))
model.add(Conv2D(filters=6, kernel_size=(3, 3), activation='relu', input_shape=(3,32,32)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(filters=16, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(units=120, activation='relu'))
model.add(Dense(units=84, activation='relu'))
model.add(Dense(units=10, activation = 'softmax'))

epochs = 25
lrate = 0.01
decay = lrate/epochs
sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
print(model.summary())

numpy.random.seed(seed)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, batch_size=64)
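# Final evaluation of the model on both splits
train_scores = model.evaluate(X_train, y_train, verbose=0)
scores = model.evaluate(X_test, y_test, verbose=0)
print("Training accuracy: %.2f%%" % (train_scores[1]*100))
print("Test accuracy: %.2f%%" % (scores[1]*100))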
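To check what `model.summary()` should report for the LeNet-5-inspired network, the layer shapes and parameter counts can be worked out by hand. The sketch below does that arithmetic in plain Python, assuming the default `'valid'` padding and stride used in the cell above; the helper names are hypothetical.

```python
def conv_output(size, kernel, stride=1):
    """Spatial size after a 'valid' convolution or pooling window."""
    return (size - kernel) // stride + 1

def conv_params(in_channels, filters, kernel):
    return kernel * kernel * in_channels * filters + filters   # weights + biases

def dense_params(inputs, units):
    return inputs * units + units                               # weights + biases

size = 32
size = conv_output(size, 3)             # Conv2D(6, 3x3)   -> 30
size = conv_output(size, 2, stride=2)   # MaxPooling2D 2x2 -> 15
size = conv_output(size, 3)             # Conv2D(16, 3x3)  -> 13
size = conv_output(size, 2, stride=2)   # MaxPooling2D 2x2 -> 6
flat = size * size * 16                 # Flatten          -> 576

total = (conv_params(3, 6, 3) + conv_params(6, 16, 3)
         + dense_params(flat, 120) + dense_params(120, 84) + dense_params(84, 10))
print(flat, total)
```

Most of the parameters sit in the first dense layer (576 x 120 weights), which is typical for LeNet-style networks where the convolutional part is small.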