<a href="https://colab.research.google.com/github/cagBRT/computer-vision/blob/master/pedestrians.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!git clone -l -s https://github.com/opencv/opencv.git 
!pwd

In [None]:
!git clone -l -s https://github.com/cagBRT/computer-vision.git cloned-repo
%cd cloned-repo
!ls

In [None]:
!pip install --upgrade imutils

In [None]:
# import the necessary packages
from __future__ import print_function
import numpy as np
import cv2
from google.colab.patches import cv2_imshow
from imutils import paths
from imutils.object_detection import non_max_suppression

Initialize the pedestrian detector. First, we call hog = cv2.HOGDescriptor(), which initializes the Histogram of Oriented Gradients descriptor. Then we call setSVMDetector to set the Support Vector Machine to the pre-trained pedestrian detector, loaded via the cv2.HOGDescriptor_getDefaultPeopleDetector() function.
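
To give a sense of what the HOG descriptor computes for each window, the sketch below works out the feature-vector length from the window geometry. The helper name hog_descriptor_length is hypothetical, and the parameter values are assumed to match the defaults of cv2.HOGDescriptor() (64x128 window, 16x16 blocks, 8x8 block stride, 8x8 cells, 9 orientation bins); check them against your OpenCV version.

```python
def hog_descriptor_length(win=(64, 128), block=(16, 16),
                          block_stride=(8, 8), cell=(8, 8), nbins=9):
    """Length of the HOG feature vector for one detection window.

    All default values are assumptions taken to mirror cv2.HOGDescriptor().
    """
    # number of block positions along x and y inside the window
    blocks_x = (win[0] - block[0]) // block_stride[0] + 1
    blocks_y = (win[1] - block[1]) // block_stride[1] + 1
    # each block holds several cells, each cell contributes one histogram
    cells_per_block = (block[0] // cell[0]) * (block[1] // cell[1])
    return blocks_x * blocks_y * cells_per_block * nbins


print(hog_descriptor_length())
```

If the assumed defaults hold, hog.getDescriptorSize() on the detector created in this notebook should report the same number.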

In [None]:
# initialize the HOG descriptor/person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

In [None]:
image = cv2.imread("images/family.jpg")
cv2_imshow(image)

In [None]:
imagePath = "images/family.jpg"
image = cv2.imread(imagePath)

**Load the image from disk and resize it to a maximum width of 400 pixels.** We reduce the image dimensions for two reasons:

1. **Reducing image size means fewer sliding windows in the image pyramid need to be evaluated** (i.e., have HOG features extracted and then passed on to the Linear SVM), which reduces detection time (and increases overall detection throughput).<br>
2. **Resizing the image also improves the overall accuracy of pedestrian detection** (i.e., fewer false positives).
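
As a minimal sketch of the "maximum width of 400 pixels" rule, the helper below computes the target dimensions while preserving the aspect ratio. The function name dims_for_max_width is hypothetical and not part of OpenCV or imutils; it only does the arithmetic, and the result would then be passed to cv2.resize as the (width, height) tuple.

```python
def dims_for_max_width(width, height, max_width=400):
    """Return (new_width, new_height) so that new_width <= max_width.

    Images that are already narrow enough are left unchanged.
    """
    if width <= max_width:
        return width, height
    # scale both dimensions by the same ratio to keep the aspect ratio
    ratio = max_width / float(width)
    return max_width, int(round(height * ratio))


print(dims_for_max_width(800, 600))
print(dims_for_max_width(300, 200))
```

With an image loaded by cv2.imread, the inputs would be image.shape[1] (width) and image.shape[0] (height).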

In [None]:
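# Resize the image to a percentage of its original size
scale_percent = 80 # percent of original size
width = int(image.shape[1] * scale_percent / 100)
height = int(image.shape[0] * scale_percent / 100)
dim = (width, height)
# resize image (cv2.resize expects the size as (width, height))
image = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)
# keep a copy of the resized image for drawing the unsuppressed boxes
orig = image.copy()
cv2_imshow(image)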

The image should be no more than 400 pixels wide.
Note the problems that an image this large causes.

The winStride parameter is a 2-tuple that dictates the “step size” in both the x and y direction of the sliding window.

Both **winStride and scale are extremely important parameters that need to be set properly**. These parameters have tremendous implications not only for the accuracy of your detector, but also for the speed at which it runs.

The smaller winStride is, the more windows need to be evaluated.

In the context of object detection, a sliding window is a rectangular region of fixed width and height that “slides” across an image.

At each stop of the sliding window (and for each level of the image pyramid, discussed with the scale parameter below), we (1) extract HOG features and (2) pass these features on to our Linear SVM for classification. Feature extraction and classification are expensive, so we would prefer to evaluate as few windows as possible if we intend to run our Python script in near real-time.
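
To make the cost of a small winStride concrete, here is a rough sketch that counts the window positions in a single pyramid level. It assumes a 64x128 detection window and ignores padding, so the numbers are only an approximation of what detectMultiScale actually evaluates; count_windows is a hypothetical helper, not an OpenCV function.

```python
def count_windows(image_size, win_stride, window=(64, 128)):
    """Count sliding-window positions in one pyramid level (no padding)."""
    img_w, img_h = image_size
    win_w, win_h = window
    # the window does not fit at all
    if img_w < win_w or img_h < win_h:
        return 0
    steps_x = (img_w - win_w) // win_stride[0] + 1
    steps_y = (img_h - win_h) // win_stride[1] + 1
    return steps_x * steps_y


# compare a few stride settings on a 400x300 image
for stride in [(4, 4), (8, 8), (16, 16)]:
    print(stride, count_windows((400, 300), stride))
```

Doubling the stride in both directions cuts the number of windows by roughly a factor of four, which is why winStride has such a strong effect on runtime.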

The padding parameter is a tuple that indicates the number of pixels in both the x and y direction by which the sliding window ROI is “padded” prior to HOG feature extraction.

As suggested by Dalal and Triggs in their 2005 CVPR paper, Histogram of Oriented Gradients for Human Detection, adding a bit of padding surrounding the image ROI prior to HOG feature extraction and classification can actually increase the accuracy of your detector.

Typical values for padding include (8, 8), (16, 16), (24, 24), and (32, 32).

The scale parameter controls the factor by which the image is resized at each layer of the image pyramid, which in turn determines the number of levels in the pyramid.

A smaller scale increases the number of layers in the image pyramid and increases the time it takes to process the image.

A larger scale decreases the number of layers in the pyramid and decreases the time it takes to detect objects in the image.
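
The relationship between scale and the number of pyramid layers can be sketched as follows. This is an approximation under stated assumptions: the pyramid is taken to stop once the downscaled image is smaller than a 64x128 window, with an assumed cap of 64 levels. pyramid_levels is a hypothetical helper and may not match OpenCV's internal logic exactly.

```python
def pyramid_levels(width, height, scale, window=(64, 128), max_levels=64):
    """Count pyramid layers in which the detection window still fits."""
    levels = 0
    factor = 1.0
    while levels < max_levels:
        # image size at this layer
        w = int(round(width / factor))
        h = int(round(height / factor))
        if w < window[0] or h < window[1]:
            break
        levels += 1
        # each layer shrinks the image by another factor of `scale`
        factor *= scale
    return levels


for s in (1.01, 1.05, 1.5):
    print(s, pyramid_levels(400, 300, s))
```

A value close to 1, such as 1.01, produces many more layers than 1.05 or 1.5, and every extra layer adds a full sliding-window pass.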

The useMeanShiftGrouping parameter is a boolean indicating whether mean-shift grouping should be performed to handle potentially overlapping bounding boxes. It defaults to False and, in my opinion, should never be set to True. Use non-maxima suppression instead; you’ll get much better results.

To suppress these multiple bounding boxes, Dalal suggested using mean shift (Slide 18). However, in my experience mean shift performs sub-optimally and should not be used as a method of bounding box suppression.

Instead, use non-maxima suppression (NMS). Not only is NMS faster, it also produces much more accurate final detections.
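
For intuition about what NMS does, here is a simplified greedy sketch in plain Python. It is not the imutils implementation: it uses intersection-over-union as the overlap measure and visits boxes from largest to smallest area, both of which are assumptions made for illustration. Boxes are (x1, y1, x2, y2) tuples.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, overlap_thresh=0.65):
    """Keep a box only if it does not overlap an already kept box too much."""
    # visit larger boxes first (an assumed ordering; no scores are used)
    ordered = sorted(boxes,
                     key=lambda b: (b[2] - b[0]) * (b[3] - b[1]),
                     reverse=True)
    kept = []
    for box in ordered:
        if all(iou(box, k) <= overlap_thresh for k in kept):
            kept.append(box)
    return kept


# two near-duplicate detections and one separate detection
boxes = [(10, 10, 74, 138), (12, 12, 76, 140), (200, 20, 264, 148)]
print(greedy_nms(boxes))
```

A fairly high threshold such as 0.65 removes near-duplicates while keeping boxes for people who genuinely stand close together.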

In [None]:
# detect people in the image
(rects, weights) = hog.detectMultiScale(image,
                                        winStride=(4, 4),
                                        padding=(8, 8),
                                        scale=1.01)

# draw the original bounding boxes
for (x, y, w, h) in rects:
    cv2.rectangle(orig, (x, y), (x + w, y + h), (0, 0, 255), 2)

# apply non-maxima suppression to the bounding boxes using a
# fairly large overlap threshold to try to maintain overlapping
# boxes that are still people
rects = np.array([[x, y, x + w, y + h] for (x, y, w, h) in rects])
pick = non_max_suppression(rects, probs=None, overlapThresh=0.65)

# draw the final bounding boxes
for (xA, yA, xB, yB) in pick:
    cv2.rectangle(image, (xA, yA), (xB, yB), (0, 255, 0), 2)

# show some information on the number of bounding boxes
filename = imagePath[imagePath.rfind("/") + 1:]
print("[INFO] {}: {} original boxes, {} after suppression".format(
    filename, len(rects), len(pick)))

# show the output images
cv2_imshow(orig)
cv2_imshow(image)

**Tips on speeding up the object detection process**

Whether you’re batch processing a dataset of images or looking to get your HOG detector to run in real-time (or as close to real-time as feasible), these three tips should help you get as much performance out of your detector as possible:

1. **Resize your image or frame to be as small as possible without sacrificing detection accuracy.** Prior to calling the detectMultiScale function, reduce the width and height of your image. The smaller your image is, the less data there is to process, and thus the faster the detector will run.<br>
2. **Tune your scale and winStride parameters.** These two arguments have a tremendous impact on your object detector speed. Both scale and winStride should be as large as possible, again, without sacrificing detector accuracy.<br>
3. **If your detector is still not fast enough, consider re-implementing your program in C/C++.** Python is great and you can do a lot with it, but sometimes you need the compiled binary speed of C or C++. This is especially true for resource-constrained environments.