<a href="https://colab.research.google.com/github/ChiefGokhlayeh/MV/blob/main/Assignment3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Machine Vision - Assignment 3: Object classification with HoG/SVM and Multilayer Perceptrons using public datasets

---

Prof. Dr. Markus Enzweiler, Esslingen University of Applied Sciences

markus.enzweiler@hs-esslingen.de

---

This is the third assignment for the "Machine Vision" lecture. 
It covers:
* training a pedestrian classifier using HoG features and linear SVM classifiers 
* training a neural network (MLP) to categorize an image into 1 out of 10 categories
* working with public benchmark datasets ([Daimler Pedestrian Detection Benchmark](https://markus-enzweiler.de/datasets/) and [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html))

**Make sure that "GPU" is selected in Runtime -> Change runtime type**

To successfully complete this assignment, it is assumed that you already have some experience in Python and numpy. You can either use [Google Colab](https://colab.research.google.com/) for free with a private (dedicated) Google account (recommended) or a local Jupyter installation.

---


## Preparations


### Import important libraries (you should probably start with these lines all the time ...)

In [None]:
# OpenCV
import cv2   

# NumPy                    
import numpy as np   

# glob (file path pattern matching)
import glob

# Matplotlib    
import matplotlib.pyplot as plt
import matplotlib.patches as patches
# make sure we show all plots directly below each cell
%matplotlib inline 

# Some Colab specific packages
if 'google.colab' in str(get_ipython()):
  # image display
  from google.colab.patches import cv2_imshow 


# scikit learn for SVM (support vector machines)
from sklearn import svm


### Some helper functions that we will need

In [None]:
def my_imshow(image, windowTitle="Image"):
  '''
  Displays an image and differentiates between Google Colab and a local Python installation. 

  Args: 
    image: The image to be displayed
    windowTitle: Title of the display window (only used in a local Python installation)

  Returns:
    - 
  '''

  if 'google.colab' in str(get_ipython()):
    cv2_imshow(image)
  else:
    cv2.imshow(windowTitle, image)

## Exercise 1 - Pedestrian Classification using HoG / linear SVM (10 points) 

In this exercise you will be developing a pedestrian classifier based on the famous ["Histograms of Oriented Gradients (HoG) for Human Detection" approach](https://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf). The dataset you will be using is the public ["Daimler Pedestrian Detection dataset"](http://www.dariu.net/pami09.pdf) which is available in *pedestrianData.zip*.  

For HoG features, you will be using OpenCV, i.e. the *HOGDescriptor* class in *cv2*. Hint: Use ```help(cv2.HOGDescriptor)``` to view the documentation. Linear Support Vector Machines for training in the HoG feature space are available via [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC). 

### Unzip the dataset. It provides 9800 labeled images for training and 9800 labeled images for testing. (**PROVIDED**)

In [None]:
################ Unzip the dataset in the Colab runtime #################
import io
import requests
import zipfile

url = 'https://raw.githubusercontent.com/ChiefGokhlayeh/MV/main/data/pedestrian_detection/pedestrian_data.zip'

response = requests.get(url, allow_redirects = True)
stream = io.BytesIO(response.content)

print("unzipping {}".format(url))

with zipfile.ZipFile(stream, 'r') as zip_ref:
    zip_ref.extractall("pedestrian")

# training images
trainPed    = glob.glob('pedestrian/data/train/ped_examples/' + "*.pgm")
trainNonPed = glob.glob('pedestrian/data/train/non-ped_examples/' + "*.pgm")

# test images
testPed     = glob.glob('pedestrian/data/test/ped_examples/' + "*.pgm")
testNonPed  = glob.glob('pedestrian/data/test/non-ped_examples/' + "*.pgm")

print("trainPed    : {} image paths".format(len(trainPed)))
print("trainNonPed : {} image paths".format(len(trainNonPed)))
print("testPed     : {} image paths".format(len(testPed)))
print("testNonPed  : {} image paths".format(len(testNonPed)))

### Visualize a few images to get a feeling for what the data looks like (**PROVIDED**)

In [None]:
def displayRandomImages(dataSet):
  indices = np.arange(len(dataSet))
  np.random.shuffle(indices)
  count=0
  for i in indices[0:50]:
      plt.subplot(5,10,count+1)
      plt.xticks([])
      plt.yticks([])
      plt.grid(False)
      sampleImage = cv2.imread(dataSet[i])   
      plt.imshow(sampleImage)
      count = count+1
  plt.show()


# display some pedestrians
print("Random pedestrians")
displayRandomImages(trainPed)

# display some non-pedestrians
print("Random non-pedestrians")
displayRandomImages(trainNonPed)

### HoG feature transform (**add your code here**)

Compute HoG features using OpenCV's ```cv2.HOGDescriptor.compute()``` function for all 4 image sets (trainPed, trainNonPed, testPed, testNonPed). Use the following parameters for HoG:

* ```winSize = (18,36)```
* ```blockSize = (6,6)```
* ```blockStride = (3,3)```
* ```cellSize = (3,3)```
* ```nbins = 12```

The result should be a 2640-dimensional HoG feature vector for every 18x36 pixel input image. Store the HoG feature representation in 4 matrices (one per image set) with the shape: numImages (rows) x featureDimension (cols). Depending on your OpenCV version, ```cv2.HOGDescriptor.compute()``` may return the features as a column vector, so you will need to transpose or flatten the output to make it fit into one row of the matrix. 

Your result should be four matrices of size 4800 x 2640 or 5000 x 2640, depending on the number of images in the set.  

Hint: The [C++ documentation of HOGDescriptor()](https://docs.opencv.org/master/d5/d33/structcv_1_1HOGDescriptor.html#a5c8e8ce0578512fe80493ed3ed88ca83) might be helpful ...
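
The expected descriptor length of 2640 can be checked by hand: the window is covered by overlapping blocks, each block holds several cells, and each cell contributes one histogram with `nbins` bins. The following is a minimal sketch of that arithmetic in plain Python. The helper name `hog_descriptor_length` is made up for illustration and is not part of OpenCV.

```python
def hog_descriptor_length(win_size, block_size, block_stride, cell_size, nbins):
    """Count the HoG feature values for one window (all sizes are (width, height))."""
    # number of block positions along x and y
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    # number of cells inside one block
    cells_per_block = (block_size[0] // cell_size[0]) * (block_size[1] // cell_size[1])
    # one histogram with nbins entries per cell
    return blocks_x * blocks_y * cells_per_block * nbins

# parameters from this exercise: 5 * 11 blocks, 4 cells per block, 12 bins
print(hog_descriptor_length((18, 36), (6, 6), (3, 3), (3, 3), 12))  # 2640
```

For these parameters, ```hog_descriptor.getDescriptorSize()``` should report the same number.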

In [None]:
# apply HoG feature transform to the 4 image sets

# --------------------------------------------------#
def applyHogTransform(image_set, hog_descriptor):

  # feature dimension
  hog_dim = hog_descriptor.getDescriptorSize()

  # num_images x feature_dimension
  hog_feature_set = np.zeros((len(image_set), hog_dim))

  # loop through the image_set and compute HoG features for each image
  for i, image_path in enumerate(image_set):
    # read image
    image = cv2.imread(image_path)

    # compute HoG features
    hog_feature_set[i, :] = hog_descriptor.compute(image).flatten()
 
  print("Shape of feature matrix: {}".format(hog_feature_set.shape))
  return hog_feature_set
# --------------------------------------------------#

# create an instance of cv2.HOGDescriptor with the given parameters
win_size = (18, 36)
block_size = (6, 6)
block_stride = (3, 3)
cell_size = (3, 3)
nbins = 12
hog_descriptor = cv2.HOGDescriptor(win_size, block_size, block_stride, cell_size, nbins)

################################

# apply transform to all four image sets
trainPedHogFeatures    = applyHogTransform(trainPed,    hog_descriptor)
trainNonPedHogFeatures = applyHogTransform(trainNonPed, hog_descriptor)
testPedHogFeatures     = applyHogTransform(testPed,     hog_descriptor)
testNonPedHogFeatures  = applyHogTransform(testNonPed,  hog_descriptor)

### Linear SVM Training (**add your code here**)

Train a linear support vector machine on the HoG representation of trainPed and trainNonPed using [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC). Set the "C" parameter to ```C=0.1```. 

Training a linear SVM (see [svm.LinearSVC.fit()](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC.fit)) requires the **whole** training set (trainPedHogFeatures and trainNonPedHogFeatures) as a single matrix. Browse through [svm.LinearSVC.fit()](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC.fit) to find out which shape the data must have, then concatenate the two feature matrices accordingly. 

Additionally, a vector of target class labels is required. Use the label ```1``` for pedestrian samples and ```-1``` for non-pedestrian samples. Hint: ```np.concatenate()``` and ```np.full()``` might come in handy ...
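
As a rough illustration of the hint, the sketch below stacks two tiny stand-in matrices and builds the matching label vector. The arrays `ped_features` and `non_ped_features` are toy data invented for this example, not the real HoG features.

```python
import numpy as np

# toy stand-ins for the HoG matrices: 3 "pedestrians", 2 "non-pedestrians", 4 features each
ped_features = np.ones((3, 4))
non_ped_features = np.zeros((2, 4))

# stack both sets row-wise into one training matrix
features = np.concatenate((ped_features, non_ped_features), axis=0)

# one label per row, in the same order as the rows above
labels = np.concatenate((np.full(len(ped_features), 1),
                         np.full(len(non_ped_features), -1)))

print(features.shape)  # (5, 4)
print(labels)          # [ 1  1  1 -1 -1]
```

The important point is that row `i` of the feature matrix and entry `i` of the label vector must describe the same sample.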



In [None]:
# Linear SVM Training
from sklearn import svm

# create instance of LinearSVC
svc = svm.LinearSVC(C = 0.1)

# training data
features = np.vstack((trainPedHogFeatures, trainNonPedHogFeatures))

# labels for the training data 
target = np.concatenate((np.full(len(trainPed), 1), np.full(len(trainNonPed), -1)))
                                               
# train SVM
trained = svc.fit(features, target)

###############################

### Evaluate Your HoG / SVM Classifier (**add your code here**)

Use [svm.LinearSVC.score()](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC.score) to compute the mean accuracy of your classifier on the training set and on the test set, both in the HoG feature space. 

Your classifier should reach approximately 99.4% accuracy on the training set and 75.4% accuracy on the test set.
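
Mean accuracy is the fraction of samples whose predicted label matches the true label. For a linear SVM with the labels ```1``` and ```-1```, the predicted label follows the sign of the decision score. The sketch below computes this by hand with NumPy; the scores are made-up numbers, not results of the real classifier.

```python
import numpy as np

# made-up decision scores and true labels, purely for illustration
scores = np.array([1.7, 0.2, -0.4, -2.1, 0.9])
true_labels = np.array([1, 1, 1, -1, -1])

# positive score -> pedestrian (1), otherwise non-pedestrian (-1)
predicted_labels = np.where(scores > 0, 1, -1)

# fraction of correct predictions
mean_accuracy = np.mean(predicted_labels == true_labels)
print("{:.1%}".format(mean_accuracy))  # 60.0%
```

Note that ```score()``` likewise returns a fraction between 0 and 1, not a percentage.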


In [None]:
# get score on training set
features = np.vstack((trainPedHogFeatures, trainNonPedHogFeatures))
target = np.concatenate((np.full(len(trainPed), 1), np.full(len(trainNonPed), -1)))

# score() returns a fraction in [0, 1]; the format spec converts it to percent
mean_accuracy = svc.score(features, target)
print("Mean accuracy (training set) = {:.1%}".format(mean_accuracy))


# get score on test set
features = np.vstack((testPedHogFeatures, testNonPedHogFeatures))
target = np.concatenate((np.full(len(testPed), 1), np.full(len(testNonPed), -1)))

mean_accuracy = svc.score(features, target)
print("Mean accuracy (test set) = {:.1%}".format(mean_accuracy))

###############################

### Find the best and worst predictions of your HoG / SVM classifier (**PROVIDED**)



In [None]:
# find the pedestrian test sample with the highest SVM prediction score
predictions = svc.decision_function(testPedHogFeatures)
best_index = np.argmax(predictions)
sample_image = cv2.imread(testPed[best_index])  
plt.figure(figsize=(10,5))
plt.imshow(sample_image, cmap='gray') 
plt.title("PED: highest score at index {}: {}".format(best_index, predictions[best_index]));

# find the pedestrian test sample with the lowest SVM prediction score
worst_index = np.argmin(predictions)
sample_image = cv2.imread(testPed[worst_index])  
plt.figure(figsize=(10,5))
plt.imshow(sample_image, cmap='gray') 
plt.title("PED: lowest score at index {}: {}".format(worst_index, predictions[worst_index]));

# find the non-pedestrian test sample with the lowest SVM prediction score
# (the most confidently rejected non-pedestrian)
predictions = svc.decision_function(testNonPedHogFeatures)
best_index = np.argmin(predictions)
sample_image = cv2.imread(testNonPed[best_index])  
plt.figure(figsize=(10,5))
plt.imshow(sample_image, cmap='gray') 
plt.title("NON-PED: lowest score at index {}: {}".format(best_index, predictions[best_index]));

# find the non-pedestrian test sample with the highest SVM prediction score
# (the most pedestrian-like non-pedestrian)
worst_index = np.argmax(predictions)
sample_image = cv2.imread(testNonPed[worst_index])  
plt.figure(figsize=(10,5))
plt.imshow(sample_image, cmap='gray') 
plt.title("NON-PED: highest score at index {}: {}".format(worst_index, predictions[worst_index]));

## Exercise 2 - CIFAR-10 Classification using Multilayer Perceptrons in TensorFlow / Keras (10 points)

In this exercise you will train a multilayer perceptron neural network using TensorFlow and Keras on the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset. There will be no preceding feature transform, i.e. the raw pixel values are the input to the neural network. Adam will be used as the optimizer. 

[Keras](https://keras.io/) is a high-level API built on top of TensorFlow that makes training neural networks easier than with plain TensorFlow.  

### Some more imports


In [None]:
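# TensorFlow and Keras (import everything from tensorflow.keras to avoid mixing two Keras packages)
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.optimizers import Adam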

### Getting familiar with the CIFAR-10 dataset (**PROVIDED**)

In [None]:
# CIFAR-10 is available as standard dataset in Keras. Nice :)
from keras.datasets import cifar10
from keras.utils import to_categorical

# load the data
(trainSamples, _trainLabels), (testSamples, _testLabels) = cifar10.load_data()

# scale the image data to float 0-1 (always recommended with neural networks)
trainSamples = trainSamples.astype('float32') / 255.0
testSamples  = testSamples.astype('float32') / 255.0

# convert a class vector (integers) to binary class matrix.
trainLabels  = to_categorical(_trainLabels)
testLabels   = to_categorical(_testLabels)

# text representation of class labels
classNames = ['airplane', 'automobile', 'bird', \
               'cat', 'deer', 'dog', \
               'frog', 'horse', 'ship', 'truck']

# Visualize 25 random images
plt.figure(figsize=(10,10))
indices = np.arange(len(trainSamples))
np.random.shuffle(indices)
count=0
for i in indices[0:25]:
    plt.subplot(5,5,count+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(trainSamples[i], cmap=plt.cm.binary)
    plt.xlabel("label: {}".format(classNames[np.argmax(trainLabels[i])]))
    count = count+1
plt.show()

### Neural Network Model Definition (**add your code here**)

We want to design a standard "feed-forward" multilayer perceptron. In Keras-terms, this is referred to as a [sequential model](https://www.tensorflow.org/guide/keras/sequential_model). 

We will need the following layers (input to output):
* 1 [Flatten](https://keras.io/api/layers/reshaping_layers/flatten/) layer that transforms our 32x32x3 pixel input to a 3072-dimensional vector
* 4 [Dense](https://keras.io/api/layers/core_layers/dense/) hidden layers with 2048, 1024, 512, 64 neurons and ```relu``` activation functions
* 1 [Dense](https://keras.io/api/layers/core_layers/dense/) output layer with 10 neurons (1 per class) and ```softmax``` activation. 

[Adam](https://keras.io/api/optimizers/adam/) will be used as optimizer, see ```model.compile()```. As loss function we will use cross-entropy. 
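
To make the roles of ```softmax``` and cross-entropy more concrete, here is a minimal NumPy sketch for a single sample. The function names and the three made-up output values are assumptions for illustration; Keras' own implementation handles batches and numerical edge cases that this sketch ignores.

```python
import numpy as np

def softmax(logits):
    """Turn raw output values into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the maximum for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log-probability that the network assigns to the true class."""
    return -np.sum(one_hot_target * np.log(probabilities))

logits = np.array([2.0, 1.0, 0.1])  # made-up raw outputs for 3 classes
target = np.array([1.0, 0.0, 0.0])  # one-hot label: the true class is class 0

probabilities = softmax(logits)
loss = categorical_crossentropy(target, probabilities)
print(probabilities, loss)
```

The loss approaches 0 as the probability of the true class approaches 1, and grows as that probability shrinks.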

Your ```model.summary()``` should look as follows (layer indices might differ):

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_1 (Flatten)          (None, 3072)              0         
_________________________________________________________________
dense_5 (Dense)              (None, 2048)              6293504   
_________________________________________________________________
dense_6 (Dense)              (None, 1024)              2098176   
_________________________________________________________________
dense_7 (Dense)              (None, 512)               524800    
_________________________________________________________________
dense_8 (Dense)              (None, 64)                32832     
_________________________________________________________________
dense_9 (Dense)              (None, 10)                650       
=================================================================
Total params: 8,949,962
Trainable params: 8,949,962
Non-trainable params: 0

```
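
The parameter counts in the summary can be reproduced by hand: a Dense layer has one weight per input/output pair plus one bias per output neuron. A short sketch of that calculation in plain Python (the variable names are chosen for this example):

```python
# layer widths from input to output: flattened image, four hidden layers, output layer
layer_sizes = [32 * 32 * 3, 2048, 1024, 512, 64, 10]

# weights (n_in * n_out) plus biases (n_out) for every pair of consecutive layers
param_counts = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

print(param_counts)       # [6293504, 2098176, 524800, 32832, 650]
print(sum(param_counts))  # 8949962
```

The Flatten layer only reshapes its input and therefore has no parameters.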

In [None]:
##### YOUR CODE GOES HERE ######
# Define the layers of the MLP network model
model = Sequential(
    [
     # input layer
     layers.Flatten(input_shape=(32, 32, 3)),
     # hidden layers
     layers.Dense(2048, activation="relu", name="dense_5"),
     layers.Dense(1024, activation="relu", name="dense_6"),
     layers.Dense(512, activation="relu", name="dense_7"),
     layers.Dense(64, activation="relu", name="dense_8"),
     # output layer
     layers.Dense(10, activation="softmax", name="dense_9")
    ]
)

################################

# compile the model including optimizer and loss
model.compile(optimizer=Adam(learning_rate=3e-4, beta_1=0.9, beta_2=0.999),
             loss='categorical_crossentropy',
             metrics=['accuracy'])

# summary() prints the table itself and returns None, so no print() is needed
model.summary()

### Neural Network Training (**add your code here**)

Train your multilayer perceptron network using [model.fit()](https://keras.io/api/models/model_training_apis/). Pass ```trainSamples``` and ```trainLabels``` as training set and ```testSamples``` and ```testLabels``` as ```validation_data```. 

Use the following hyper-parameters:
* ```batch_size = 50```
* ```epochs = 25```
* ```verbose = 1```

[model.fit()](https://keras.io/api/models/model_training_apis/) returns a history object, which we will use later to visualize the behavior of the training and validation loss over time (epochs). 

The overall training should take about 5 seconds per epoch (**on a GPU**). Reported accuracies on the training (validation) data should be approx. 85% (53%) after 25 training epochs.   
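
To get a feeling for what these hyper-parameters mean, the sketch below counts the weight updates performed during training. It assumes the standard CIFAR-10 training set of 50000 images; the variable names are chosen for this example.

```python
import math

num_train_samples = 50000  # size of the CIFAR-10 training set
batch_size = 50
epochs = 25

# one weight update per batch; the last batch may be smaller, hence ceil
steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 1000 25000
```

With ```verbose = 1```, the progress bar of each epoch counts up to this number of steps.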


In [None]:
history = model.fit(trainSamples,
          trainLabels,
          validation_data = (testSamples, testLabels),
          batch_size = 50,
          epochs = 25,
          verbose = 1)

### Visualize the behavior of the loss (**add your code here**)

Using the ```history``` object returned by ```model.fit()```, plot the training loss and validation loss as a function of epochs.  

In [None]:
plt.figure(figsize = (12, 5))
plt.xlabel("epoch")
plt.ylabel("loss")
plt.title("Training & Validation Loss")
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.legend(("Training Loss", "Validation Loss"))
plt.grid(True)

### Run your network on some images to get predictions (**PROVIDED**)

In [None]:
# select 50 images randomly from the test set and run them through the MLP

plt.figure(figsize=(20,10))

# 50 random images
indices = np.arange(len(testSamples))
np.random.shuffle(indices)
count=0
for i in indices[0:50]:
    plt.subplot(5,10,count+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(testSamples[i], cmap=plt.cm.binary)

    # predict MLP (add a batch dimension: 32 x 32 x 3 pixels -> 1 x 32 x 32 x 3;
    # the Flatten layer of the model takes care of the 3072-dimensional vector)
    prediction = model.predict(testSamples[i].reshape(1, 32, 32, 3))
   
    # visualize true and predicted labels
    groundTruthLabel = classNames[np.argmax(testLabels[i])]
    predictedLabel   = classNames[np.argmax(prediction)]
    plt.xlabel("T: {} / P: {}".format(groundTruthLabel, predictedLabel))
    count = count+1
plt.show()