# People detection and counting with Computer Vision

#### Why count people?

Counting people may sound easy at first, but it has often proven to be a delicate and controversial task. Take Donald Trump's inauguration in January 2017, for example. President Trump and his administration were not shy to claim that "This was the largest audience to ever witness an inauguration". However, given the image below, would you agree?

![Obama_vs_Trump_Inauguration_Reuteurs_and_Pool_Camera](Obama_vs_Trump.jpg)

##### A view of the National Mall during Barack Obama’s first inauguration in 2009 and for Donald Trump’s inauguration in 2017
Photos by Reuters and Pool Camera

### Objectives
For this project, I will attempt to identify and count the people in a given photograph using traditional computer vision techniques.

As this is an introductory project to Computer Vision, to avoid excessive complexity, I will focus my work on photographs including only a few people and not large crowds. I will work with photographs taken from angles where the people's silhouettes are fairly clear.

### An Object Detection problem

The task of interest in this project is the detection of people in an image. Once we can do this, counting them is trivial. The detection of people is a well studied problem in the field of Image Recognition and Object Detection.

An Image Recognition and Object Detection problem is generally solved by training a binary image classifier to identify whether a patch of an image represents the considered object or not. The methodology to build such a classifier goes as follows.

Given a dataset of labelled images:
* The images are preprocessed. 
* Significant features are extracted from these preprocessed images. 
* The extracted feature data is split into two subsets, a training set and a test set.
* The training data is used to train a classifier model.
* The model is evaluated and optimised using the test data.

The key step in this process is the feature extraction. To train a machine learning model effectively, the input data should describe the features that are actually significant for the considered problem. Irrelevant information, or "noise", should be minimised. This is especially true in image recognition, where only a small part of the information carried by an image is actually useful for detecting an object.

For Image Recognition and Object Detection, Haar-like features, Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF) are extensively used.

Once a suitable feature extraction process is defined, the next issue to consider is the machine learning model. Many binary classifier models exist; it is all about finding the one that will perform best with our data.

As the options for feature extraction and classification are vast, I looked into the academic literature of the field and found three solutions for this people detection problem:

* __The Viola-Jones object detection framework__

Objects are detected here using Haar features of images and cascade classifiers that are a variant of the AdaBoost learning algorithm.
This framework was first proposed in a paper by Paul Viola and Michael Jones and proved to be efficient in detecting human faces.

_P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001, 2001_




* __Histogram of Oriented Gradients (HOG) and Support Vector Machines (SVM)__

The significant features of an image are extracted and vectorised using a HOG descriptor. SVMs are then trained and used to detect objects within an image. This framework was proposed by Dalal and Triggs, who published a paper demonstrating its efficiency for human detection, which fits this project's needs very well.

_Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05, 2005_




* __Deep Learning__

Nowadays, deep learning algorithms are by far the best performing solutions to image recognition and object detection. Human detection using a deep learning framework should be no exception. 


As I want to explore traditional computer vision techniques in this project, I will not go with Deep Learning right away. Although the Viola-Jones framework is a reference in the field of image recognition and performs well with human faces, it is not so efficient when it comes to detecting entire people. On the other hand, using HOG and SVMs, as detailed in Dalal and Triggs's paper, has a proven efficiency for human detection. I will thus go forward implementing a __HOG + SVM__ people detector in this project.

## Building a HOG + SVM people detector

For this experimental part of the project, I will extensively use the [OpenCV](http://opencv.org) library.

### Collecting a dataset of images

Several relevant datasets are available online:
    
* [The INRIA Person Dataset](http://pascal.inrialpes.fr/data/human/) 
* [The Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/)
* [The Caltech Pedestrian Detection Benchmark](http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/)

I collected images from these sources and stored them in a Samples folder at the root directory of my project. These images are organised in two sub-folders, one for positive samples (images with people) and the other for negative samples (images with no people). 

#### Importing the dependency modules

In [None]:
import os
import glob
import re

import numpy as np

from sklearn.model_selection import train_test_split
from sklearn import svm as sklearn_svm

import cv2

import xml.etree.ElementTree as ET

import imutils
from imutils.object_detection import non_max_suppression
from imutils import paths

#### Setting the path to the image files

In [None]:
positive_image_paths = glob.glob(os.getcwd() + '/Samples/Single_Pedestrian/*')
negative_image_paths = glob.glob(os.getcwd() + '/Samples/No_Pedestrian/*')

## Extracting the significant features of the images

The fundamental idea behind the HOG descriptor is that the appearance and local form of an object in an image can be described by the distribution ("Histogram...") of the direction of its edges ("...of Oriented Gradients").

The input image is divided into cells of a given size. Each cell is filtered using \[-1,0,1\] and \[-1,0,1\]T kernels to obtain the magnitude and orientation of the gradients. Sobel and Gaussian filters have also been tested but have not produced better results. A histogram of the orientation of the gradients is then built for each cell. The histograms are then normalised over larger, overlapping blocks that group adjacent cells. This improves the robustness to lighting changes.

As a result of this process, we get a single vector describing features that are actually significant in describing the object, which is what we need to effectively train our classifier.
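To make the per-cell step more concrete, here is a minimal NumPy sketch of how one cell's histogram could be computed. It is a simplification and not OpenCV's implementation: the function name `cell_histogram` is my own, border pixels are given a zero gradient, and each pixel votes for a single bin, whereas a full HOG implementation interpolates votes between neighbouring bins and normalises over blocks.

```python
import numpy as np

def cell_histogram(cell, nbins=9):
    """Unsigned-orientation gradient histogram of one grayscale cell (2-D array)."""
    cell = cell.astype(np.float64)
    gx = np.zeros_like(cell)
    gy = np.zeros_like(cell)
    # Centred differences, i.e. filtering with [-1, 0, 1] and its transpose
    # (border pixels keep a zero gradient in this simplified sketch)
    gx[:, 1:-1] = cell[:, 2:] - cell[:, :-2]
    gy[1:-1, :] = cell[2:, :] - cell[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation folded into [0, 180) degrees
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_width = 180.0 / nbins
    # Clamp guards against a rounding edge case landing exactly on 180
    bins = np.minimum((orientation // bin_width).astype(int), nbins - 1)
    # Each pixel votes for its bin with a weight equal to its gradient magnitude
    return np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=nbins)

# A vertical edge: dark left half, bright right half
cell = np.zeros((8, 8))
cell[:, 4:] = 255
hist = cell_histogram(cell)
print(hist)
```

For this vertical edge, all the gradient energy falls into the first bin (horizontal gradient direction), which is the kind of compact edge description the classifier is trained on.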

<img src="DalalTriggsDiagram.png" alt="Dalal_Triggs_Diagram" style="width: 300px;"/>

_Figure downloaded from Berkin Bilgic's 
[paper](https://goo.gl/v934WD), Fast Human Detection with Cascaded Ensembles, 2010_

#### Create and configure the HOG descriptor

The values chosen to configure my HOG descriptor are those recommended in Dalal and Triggs's paper on human detection using HOG.

In [None]:
#The HOG will be computed on the entire image
winSize = (64,128)
#Cell and block sizes are those used in Dalal and Triggs's paper
#A block groups 4 adjacent cells
blockSize = (16,16)
blockStride = (8,8)
cellSize = (8,8)
#The number of bins is also chosen based on Dalal and Triggs's paper
nbins = 9
#The following are by default values based on Dalal and Triggs's work
derivAperture = 1
winSigma = -1.
histogramNormType = 0
L2HysThreshold = 0.2
gammaCorrection = 1
nlevels = 64
#Gradients are unsigned (orientations folded into the 0-180 degree range)
useSignedGradients = False

#Initialise the HOG descriptor
hog = cv2.HOGDescriptor(winSize,blockSize,blockStride,cellSize,nbins,derivAperture,winSigma,histogramNormType,L2HysThreshold,gammaCorrection,nlevels, useSignedGradients)
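As a sanity check on this configuration, the length of the resulting feature vector can be worked out by hand. The helper below is a small sketch of that arithmetic (the function name `hog_descriptor_length` is my own, not part of OpenCV); its result should agree with what `hog.getDescriptorSize()` reports for the descriptor above.

```python
def hog_descriptor_length(win_size, block_size, block_stride, cell_size, nbins):
    """Number of values in the HOG vector of one detection window."""
    # Number of block positions along each axis of the window
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    # Each block groups several cells, and each cell contributes nbins values
    cells_per_block = (block_size[0] // cell_size[0]) * (block_size[1] // cell_size[1])
    return blocks_x * blocks_y * cells_per_block * nbins

length = hog_descriptor_length((64, 128), (16, 16), (8, 8), (8, 8), 9)
print(length)  # 7 * 15 blocks * 4 cells * 9 bins = 3780
```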

#### Pre-process the images, compute their HOG vectors and label them

In order to compute a HOG feature vector, the input image should have 1:2 aspect ratio and be resized to 64x128. This is done in the pre-processing step.

In [None]:
hog_data = []
labels = []

def compute_hog(image_path):
    """Load an image in grayscale, resize it to 64x128 and return its HOG vector."""
    #pre-processing
    img = cv2.imread(image_path, 0)
    if img is None:
        raise IOError("Could not read image: " + image_path)
    img = cv2.resize(img, (64, 128))
    #computing the HOG feature vector as a flat 1-D array
    return hog.compute(img).flatten()

#labelling: positive samples are labelled 1, negative samples 0
for image_paths, label in ((positive_image_paths, 1), (negative_image_paths, 0)):
    for image_path in image_paths:
        hog_data.append(compute_hog(image_path))
        labels.append(label)

#OpenCV's SVM expects float32 samples and int32 labels
hog_data = np.array(hog_data, dtype=np.float32)
labels = np.array(labels, dtype=np.int32)

## Training a Support Vector Machine classifier

Almost all machine learning libraries include SVM modules. I will first train an SVM classifier from my favourite machine learning library, scikit-learn. I will then train OpenCV's built-in SVM classifier, which relies on the LIBSVM library and with which one can build an actual HOG detector to be applied to test images.

#### Splitting the data into training and test sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(hog_data, labels, test_size=0.2, random_state=42)

#### Training scikit-learn's SVM classifier

In [None]:
# Set up the SVM as SVC type with an RBF kernel
clf = sklearn_svm.SVC(kernel='rbf')
# Train the classifier with the training set
clf.fit(X_train, y_train)

#### Evaluating its performance

In [None]:
clf.score(X_test, y_test)
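`clf.score` reports accuracy only. If the positive and negative folders differ a lot in size, precision and recall on the "person" class can be a useful complement. Below is a minimal pure-Python sketch; the helper name `precision_recall` and the example labels are made up for illustration and are not results from this project.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall of the positive class from two label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up labels for illustration only (1 = person, 0 = no person)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

In this notebook, it could be called as `precision_recall(y_test, clf.predict(X_test))`.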

#### Or using OpenCV's built in SVM classifier to build a HOG detector

In [None]:
# Set up SVM for OpenCV 3
svm = cv2.ml.SVM_create()
# Set SVM type
svm.setType(cv2.ml.SVM_C_SVC)
# Use a linear kernel: hog.setSVMDetector expects a single weight vector,
# which an RBF kernel cannot provide
svm.setKernel(cv2.ml.SVM_LINEAR)

# Train SVM on training data (OpenCV expects float32 samples and int32 labels)
svm.trainAuto(X_train.astype(np.float32), cv2.ml.ROW_SAMPLE, y_train.astype(np.int32))

svm.save("svm.xml")

# Build the detector vector: the linear SVM's weights followed by -rho
support_vectors = svm.getSupportVectors()
rho, _, _ = svm.getDecisionFunction(0)
svmvec = np.append(support_vectors[0], -rho).astype(np.float32)
hog.setSVMDetector(svmvec)

#### Testing the HOG detector on an image

In [None]:
#select image path
test_image_path = glob.glob(os.getcwd() + "/Samples/test_image.bmp")[0]
img = cv2.imread(test_image_path,0)
boxes, weights = hog.detectMultiScale(img)

# Display the image
cv2.imshow("Detection result", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

#### Find the best fitting bounding boxes with non-maxima suppression

In [None]:
# overlapThresh is fairly high, allowing enough overlap to identify closely positioned people
boxes = np.array([[x, y, x + w, y + h] for (x, y, w, h) in boxes])
best_fitting_boxes = non_max_suppression(boxes, probs=None, overlapThresh=0.65)
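To illustrate the idea behind non-maxima suppression, here is a simplified greedy variant in pure Python. It is a sketch only and not the `imutils` implementation, which differs in how it orders the boxes and measures overlap; the names `iou` and `greedy_nms`, the boxes and the scores are all made up for this example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, overlap_thresh=0.65):
    """Keep the best-scoring box, drop boxes overlapping a kept box too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= overlap_thresh for k in kept):
            kept.append(i)
    return [boxes[i] for i in kept]

# Two near-duplicate detections of one person, plus a separate detection
boxes = [(10, 10, 74, 138), (12, 12, 76, 140), (200, 20, 264, 148)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)  # the near-duplicate second box is suppressed
```

A lower threshold merges more aggressively, which risks fusing two people standing close together into a single box.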

#### Display the results

In [None]:
# Draw the best bounding boxes on the image
for (xA, yA, xB, yB) in best_fitting_boxes:
    cv2.rectangle(img, (xA, yA), (xB, yB), (0, 255, 0), 2)

cv2.imshow("result", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

#### Count the people detected by counting the bounding boxes

In [None]:
People_Count = len(best_fitting_boxes)
People_Count

I have thus built a functional HOG + SVM people detector that meets this project's objectives. This detector can identify the people appearing in a given photograph and allows me to count them.

The implementation I have presented here is fairly simple and its performance is very limited. OpenCV actually includes a far better trained SVM model for human detection based on Dalal and Triggs's work.

### OpenCV's default person detector

#### Create a new HOG descriptor and configure the default SVM people detector

In [None]:
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

#### Pre-process an image

In [None]:
#select image path
imagePath = glob.glob(os.getcwd() + '/test_image_2.bmp')[0]

image = cv2.imread(imagePath,0)
image = imutils.resize(image, width=min(400, image.shape[1]))

# Display the image
cv2.imshow("Detection result", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

#### Use the HOG detector on the image

In [None]:
(boxes, weights) = hog.detectMultiScale(image, winStride=(4, 4), padding=(8, 8), scale=1.05)
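The `winStride` and `scale` arguments control how densely the image is searched: a 64x128 window slides over the image in steps of `winStride`, and the search is repeated on progressively smaller versions of the image, each shrunk by a factor of `scale`. The sketch below only counts the windows visited at each level to show how these two values affect the workload. It ignores padding and is my own approximation of the search pattern, not OpenCV's code; `count_windows` is a hypothetical helper.

```python
def count_windows(image_size, win_size=(64, 128), win_stride=(4, 4),
                  scale=1.05, max_levels=64):
    """Count the sliding windows visited at each pyramid level (padding ignored)."""
    width, height = image_size
    counts = []
    factor = 1.0
    for _ in range(max_levels):
        w, h = int(width / factor), int(height / factor)
        # Stop once the shrunken image is smaller than the detection window
        if w < win_size[0] or h < win_size[1]:
            break
        nx = (w - win_size[0]) // win_stride[0] + 1
        ny = (h - win_size[1]) // win_stride[1] + 1
        counts.append(nx * ny)
        factor *= scale
    return counts

# Hypothetical 400x300 image, searched with two different strides
fine = count_windows((400, 300), win_stride=(4, 4))
coarse = count_windows((400, 300), win_stride=(8, 8))
print(fine[0], coarse[0])  # 3740 946
```

A smaller stride or a scale closer to 1 visits more windows, which tends to find more people at the cost of speed.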

#### Find the best fitting bounding boxes with non-maxima suppression

In [None]:
# overlapThresh is fairly high, allowing enough overlap to identify closely positioned people
boxes = np.array([[x, y, x + w, y + h] for (x, y, w, h) in boxes])
best_fitting_boxes = non_max_suppression(boxes, probs=None, overlapThresh=0.65)

#### Display the results

In [None]:
# Draw the best bounding boxes on the image
for (xA, yA, xB, yB) in best_fitting_boxes:
    cv2.rectangle(image, (xA, yA), (xB, yB), (0, 255, 0), 2)
    
# Display resulting image
cv2.imshow("Detection result", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

#### Count the people detected by counting the bounding boxes

In [None]:
People_Count = len(best_fitting_boxes)
People_Count

## Conclusion

Throughout this project I have explored traditional computer vision techniques to build a functional person detector implementing a HOG + SVM classifier. This implementation enables me to effectively detect and count people in a given image, fulfilling the objectives I had set for this project.






##### References

* _P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001, 2001_

* _Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05, 2005_

* _B. Bilgic, Fast Human Detection with Cascaded Ensembles, 2010_

##### Resources

* _The [Learn OpenCV](https://www.learnopencv.com) website by Satya Mallick_
* _Machine Intelligence's [object detector guide](http://www.hackevolve.com/create-your-own-object-detector/)_
* _The [HOG person detector tutorial](http://mccormickml.com/2013/05/09/hog-person-detector-tutorial/) by Chris McCormick_