# EA979 Final Project

## Pedestrian Recognition: A HOG algorithm improved by frame dropping algorithm

### Authors:
André Barros de Medeiros

Luca

Francesco

Inspired by Andrew Rosebrock @ https://www.pyimagesearch.com/2015/11/09/pedestrian-detection-opencv/

  OpenCV ships with a pre-trained HOG + Linear SVM model based, on Dalal and Triggs method to automatically detect pedestrians in images, that can be used to perform pedestrian detection in both images and video streams. Below we give an outline of the two:

## Dependencies: 
    - OpenCV
    - Numpy
    - argparse
    - imutils (for imutils in Anaconda: conda install -c conda-forge imutils)
    - scipy.stats
    - time
  
## Histogram of Oriented Gradients (HOG)

While OpenCV's Cascade Classifiers are fast, they leave much to desire. That's where HOG comes in. But if you want to check out cascade classifiers, a java application for face recognition can be found [here](https://github.com/andre91998/JavaBasics/tree/master/FaceDetection).

### HOG algorithm:

<ul><li><strong>Step 1:</strong> Sample P positive samples from your training data of the object(s) you want to detect and extract HOG descriptors from these samples. </ul></li>

<ul><li><strong>Step 2:</strong> Sample N negative samples from a negative training set that does not contain any of the objects you want to detect and extract HOG descriptors from these samples as well. In practice N >> P.</ul></li>

<ul><li><strong>Step 3:</strong> Train a <strong>Linear Support Vector Machine</strong> on your positive and negative samples.</ul></li>

<ul><li><strong>Step 4:</strong> Apply <em>hard-negative mining</em>. For each image and each possible scale of each image in your negative training set, apply the sliding window technique and slide your window across the image. At each window compute your HOG descriptors and apply your classifier. If your classifier (incorrectly) classifies a given window as an object (and it will, there will absolutely be false-positives), record the feature vector associated with the false-positive patch along with the probability of the classification. </ul></li>

<ul><li><strong>Step 5:</strong> Take the false-positive samples found during the hard-negative mining stage, sort them by their confidence (i.e. probability) and re-train your classifier using these hard-negative samples. (Note: You can iteratively apply steps 4-5, but in practice one stage of hard-negative mining usually [not not always] tends to be enough. The gains in accuracy on subsequent runs of hard-negative mining tend to be minimal.)</ul></li>

<ul><li><strong>Step 6: </strong>Your classifier is now trained and can be applied to your test dataset.</ul></li>

## Linear Support Vector Machines (SVM)

SVMs are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. More formally, a support-vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks like outliers detection.

In the **linear** case (ours), the goal is to find the *"maximum-margin hyperplane"* that divides the data points into their correct classes, which means we want the distance between the closest data point of each class to be maximized.

## Our Idea:

In this project, we use Pearson's Correlation Coeficient to evaluate how similar each frame is to the one before. This allowed us to establish criteria for dropping frames similiar to the previous, drastically lessening the amount of processing power needed (which in turn increased the speed of the algorithm).

With PCC, each frame was attributed a value (coeficient). To decide wether or not to discard each frame, we compared the calculated value with a threshold value. If the coeficient was less than the threshold, than we re-applied the recognition algorithm, if not, then we just repeated the previously calculated identification rectangles (drawn around the identified people) on the new frame.

On top of that, instead of setting a fixed threshold for every image, we decided to implement a threshold that would change through time. By establishing this dynamic threshold (through the updated mean of the PC coeficients), we further reduced the amount of frames to be processed, by dropping them when they are similar enough to the one before, maintaining a non-deterministic approach (and increasing accuracy and speed).

In [3]:
# -*- coding: utf-8 -*-
"""
@author: Andre Barros de Medeiros
@Date:29/30/2020
@Copyright: Free to use, copy and modify
"""

# import the necessary packages
from __future__ import print_function
from collections import deque
from imutils.object_detection import non_max_suppression
from imutils.video import VideoStream
from imutils import paths
from scipy.stats import pearsonr
import numpy as np
import argparse
import imutils
import cv2
import time


#initialize frame counter
counter = 0
pts = deque(maxlen=32)

coef = (0,0)

vs = cv2.VideoCapture("/Users/Andre/Documents/GitHub/pedestrian_recognition/method4/pedestrian.mp4")

# allow the camera or video file to warm up
time.sleep(2.0)

# initialize the HOG descriptor/person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

old_frame = None
# keep looping
while True:

    # grab the current frame
    frame = vs.read()
    
    # handle the frame from VideoCapture or VideoStream
    frame = frame[1]
    
    # if we are viewing a video and we did not grab a frame,
    # then we have reached the end of the video
    if frame is None:
        break
    
    if old_frame is None:
        
        # resize image it to (1) reduce detection time and (2) improve detection accuracy
        frame = imutils.resize(frame, width=min(400, frame.shape[1]))
        orig = frame.copy()
        old_frame = orig.flatten() #update old frame

        # detect people in the image
        (rects, weights) = hog.detectMultiScale(frame, winStride=(7,7), 
                                                padding=(4,4), scale=1.3)
        
        # draw the original bounding boxes
        for (x, y, w, h) in rects: 
            cv2.rectangle(orig, (x, y), (x + w, y + h), (0, 0, 255), 2)
        
        # apply non-maxima suppression to the bounding boxes using a
        # fairly large overlap threshold to try to maintain overlapping
        # boxes that are still people
        rects = np.array([[x, y, x + w, y + h] for (x, y, w, h) in rects])
        pick = non_max_suppression(rects, probs=None, overlapThresh=0.65)
        
        # draw the final bounding boxes
        for (xA, yA, xB, yB) in pick:
        		cv2.rectangle(frame, (xA, yA), (xB, yB), (0, 255, 0), 2)
        
        # show the frame
        #cv2.imshow("Before NMS", orig)
        cv2.imshow("After NMS", frame)
        
        # increment counter
        counter += 1
        
        # if the 'q' key is pressed, stop the loop
        key = cv2.waitKey(1) & 0xFF 
        # (& 0xFF) keeps last 8 bits of  waitKey output
        if key == ord("q"): break

    if old_frame is not None:
        
        # resize image it to (1) reduce detection time and (2) improve detection accuracy
        frame = imutils.resize(frame, width=min(400, frame.shape[1]))
        
        #flatten current frame for running the Pearson's Correlation 
        flat_frame = frame.flatten()
        #calculate the pearson's correlation coeficient
        coef = pearsonr(flat_frame, old_frame)
        
        #if on second frame, create threshold_arr for holding the PCCs
        if (counter == 1): 
            threshold_arr = np.array(coef[0])
            
        #if on any other frame, append to the array
        else:
            threshold_arr = np.append(threshold_arr, coef[0])
            
        #dynamical threshold calcuation: mean of all previous PCCs    
        threshold = np.mean(threshold_arr)
        print(coef,threshold)
        
        #if PCC below the threshold, re-classify
        if (((coef[0] < threshold)and(coef[0]>0))or((coef[0]>-1*threshold)and(coef[0]<0))):
            orig = frame.copy()
            old_frame = orig.flatten() #update old_frame
            
            # detect people in the image
            (rects, weights) = hog.detectMultiScale(frame, winStride=(4,4), 
                                                    padding=(8,8), scale=1.05)
            
            # draw the original bounding boxes
            for (x, y, w, h) in rects: 
                cv2.rectangle(orig, (x, y), (x + w, y + h), (0, 0, 255), 2)
            
            # apply non-maxima suppression to the bounding boxes using a
            # fairly large overlap threshold to try to maintain overlapping
            # boxes that are still people
            rects = np.array([[x, y, x + w, y + h] for (x, y, w, h) in rects])
            pick = non_max_suppression(rects, probs=None, overlapThresh=0.65)
            
            # draw the final bounding boxes
            for (xA, yA, xB, yB) in pick:
            		cv2.rectangle(frame, (xA, yA), (xB, yB), (0, 255, 0), 2)
            
            # show the frame
            #cv2.imshow("Before NMS", orig)
            cv2.imshow("After NMS", frame)
            
            # increment counter and update last frame
            counter += 1
            
            # if the 'q' key is pressed, stop the loop
            key = cv2.waitKey(1) & 0xFF 
            # (& 0xFF) keeps last 8 bits of  waitKey output
            if key == ord("q"): break
        
        # if PCC is above threshold, update frame, but keep same rectangles
        else:
            # draw the same rectangle as before on the current frame (which wasn't proccessed by HoG)
            for (xA, yA, xB, yB) in pick:
                cv2.rectangle(frame, (xA, yA), (xB, yB), (0, 255, 0), 2)
            
            cv2.imshow("After NMS", frame)
            counter += 1
            
            # if the 'q' key is pressed, stop the loop
            key = cv2.waitKey(1) & 0xFF 
            # (& 0xFF) keeps last 8 bits of  waitKey output
            if key == ord("q"): break
            
        
# if we are not using a video file, stop the camera video stream
if not args.get("video", False): vs.stop()

# otherwise, release the camera
else: vs.release()

# close all windows
cv2.destroyAllWindows()

(0.9999999999999963, 0.0) 0.9999999999999963
(0.9999908519433329, 0.0) 0.9999954259716646
(0.9999649767325043, 0.0) 0.9999852762252779
(0.999946229554237, 0.0) 0.9999755145575177
(0.9999389095460005, 0.0) 0.9999681935552143
(0.9999472285694058, 0.0) 0.9999646993909129
(0.9999077154721001, 0.0) 0.9999565588310825
(0.9999043878076271, 0.0) 0.9999500374531505
(0.9998605896414907, 0.0) 0.9999400988074105
(0.999928229275057, 0.0) 0.9999389118541752
(0.9999404916163674, 0.0) 0.9999390554689199
(0.9998962346052922, 0.0) 0.9999354870636177
(0.9999429134036737, 0.0) 0.999936058320545
(0.9998884001267396, 0.0) 0.9999326541638446
(0.9999547526641501, 0.0) 0.9999341273971983
(0.9996967049875547, 0.0) 0.9999192884965957
(0.9998815057422822, 0.0) 0.9999170659816362
(0.9999326856335535, 0.0) 0.999917933740076
(0.9998498229584998, 0.0) 0.9999143489620983
(0.9999472314471822, 0.0) 0.9999159930863526
(0.9998942298662151, 0.0) 0.9999149567425365
(0.9999532857647208, 0.0) 0.9999166989708176
(0.99991000713

(0.9957863376831183, 0.0) 0.9934808684612506
(0.9891472749265632, 0.0) 0.9934576941642738
(0.9958886742753053, 0.0) 0.9934706249095452
(0.9850172650416638, 0.0) 0.9934258981377574
(0.99559774392978, 0.0) 0.993437328905084
(0.9894295108375869, 0.0) 0.9934163455644164
(0.9961750293817874, 0.0) 0.9934307137092985
(0.9850948485550705, 0.0) 0.9933875226981367
(0.9961637484554497, 0.0) 0.9934018331401847
(0.9894580851430389, 0.0) 0.9933816087914814
(0.995543648472234, 0.0) 0.9933926396061792
(0.9831848751177972, 0.0) 0.9933408235427863
(0.9960559400277709, 0.0) 0.9933545362523064
(0.989622391764895, 0.0) 0.9933357817573948
(0.9954434660869065, 0.0) 0.9933463201790425
(0.9830189799606693, 0.0) 0.9932949403769611
(0.9952156205605871, 0.0) 0.9933044486947017
(0.987766877882392, 0.0) 0.9932771700207494
(0.995033330892765, 0.0) 0.993285778652475
(0.980582759766387, 0.0) 0.9932238127066891
(0.9959876651326011, 0.0) 0.9932372294660383
(0.9887395104058667, 0.0) 0.9932155013546364
(0.996092397219657,

(0.9685453819400074, 0.0) 0.9815398733190382
(0.9689734895128119, 0.0) 0.9815060016645739
(0.9704988786439896, 0.0) 0.9814764126241959
(0.9385084195007343, 0.0) 0.981361216932176
(0.9684478792061159, 0.0) 0.9813266892911973
(0.9669434405016822, 0.0) 0.9812883339610919
(0.9668671174998142, 0.0) 0.9812499796619928
(0.9338585774980053, 0.0) 0.9811242730249531
(0.9712877521038775, 0.0) 0.981098250482834
(0.9705201719676296, 0.0) 0.9810703399854321
(0.968998085791609, 0.0) 0.9810385708954485
(0.9375504335366486, 0.0) 0.9809244288026433
(0.9651128365852971, 0.0) 0.98088303719998
(0.9643429617496049, 0.0) 0.9808398516243916
(0.9652457418516757, 0.0) 0.9807992419635251
(0.9284381047303006, 0.0) 0.9806632390096727
(0.9607070122048964, 0.0) 0.9806115389402301
(0.9596429208290953, 0.0) 0.9805573564644907
(0.958497079619915, 0.0) 0.9805005000808708
(0.9221583413757488, 0.0) 0.980350520238441
(0.9623230789003909, 0.0) 0.9803042960298821
(0.9652750290355755, 0.0) 0.9802658580068787
(0.96564455006241

(0.9887741193161518, 0.0) 0.979681114815398
(0.9774154149098108, 0.0) 0.9796770324732258
(0.9875582417959594, 0.0) 0.9796912073101374
(0.9656303421838681, 0.0) 0.97966596338711
(0.9870179314976323, 0.0) 0.9796791389572004
(0.9704229760710694, 0.0) 0.9796625805262771
(0.9677162928013862, 0.0) 0.9796412478696255
(0.8872665852719135, 0.0) 0.9794765871519825
(0.9482453001450561, 0.0) 0.9794210154669167
(0.9484138268650141, 0.0) 0.9793659405315672
(0.9546761161312596, 0.0) 0.9793221642471693


NameError: name 'args' is not defined