# Object Detection  

This example program develops a HOG-based object detector for things like faces, pedestrians, and any other semi-rigid object. In particular, we go through the steps to train the kind of sliding window object detector first published by Dalal and Triggs in 2005 in the paper Histograms of Oriented Gradients for Human Detection.

It is similar to the method implemented in dlib, which is more optimized. However, this technique allows more control over the parameters.



## Create the XML files
DLIB requires images and bounding boxes around the labelled objects. It has its own structure for the XML files:

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
    <name>dataset containing bounding box labels on images</name>
    <comment>created by BBTag</comment>
    <tags>
        <tag name="RunBib" color="#032585"/>
    </tags>
    <images>
        <image file="B:/DataSets/2016_USATF_Sprint_TrainingDataset/_hsp3997.jpg">
            <box top="945" left="887" width="85" height="53">
                <label>RunBib</label>
            </box>
            <box top="971" left="43" width="103" height="56">
                <label>RunBib</label>
            </box>
            <box top="919" left="533" width="100" height="56">
                <label>RunBib</label>
            </box>
        </image>
        <image file="B:/DataSets/2016_USATF_Sprint_TrainingDataset/_hsp3989.jpg">
            <box top="878" left="513" width="111" height="62">
                <label>my_label</label>
            </box>
        </image>     
   </images>
</dataset>

* top: y value of the top-left corner
* height: height of the box (positive down)
* left: x value of the top-left corner
* width: width of the box (positive to the right)
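
As a rough illustration of how these attributes map to pixel coordinates, the sketch below parses a small XML string in the format above and converts every box to `(y1, y2, x1, x2)` corner form. It uses Python's standard `xml.etree.ElementTree` rather than the `lxml` package used later in this notebook; the function name `read_boxes` and the file name `example.jpg` are made up for the example.

```python
import xml.etree.ElementTree as ET

# small sample in the dlib/imglab layout (file name is a placeholder)
SAMPLE_XML = """<dataset>
    <images>
        <image file="example.jpg">
            <box top="945" left="887" width="85" height="53">
                <label>RunBib</label>
            </box>
            <box top="971" left="43" width="103" height="56">
                <label>RunBib</label>
            </box>
        </image>
    </images>
</dataset>"""

def read_boxes(xml_text):
    """Return {image file: [(y1, y2, x1, x2, label), ...]} for every box."""
    root = ET.fromstring(xml_text)
    boxes = {}
    for image in root.find("images"):
        entries = []
        for box in image.findall("box"):
            top = int(box.get("top"))
            left = int(box.get("left"))
            width = int(box.get("width"))
            height = int(box.get("height"))
            label = box.findtext("label")
            # top/left is the top-left corner; width grows right, height grows down
            entries.append((top, top + height, left, left + width, label))
        boxes[image.get("file")] = entries
    return boxes

print(read_boxes(SAMPLE_XML))
```

Unlike the import code further down, which only reads the first box of each image, this sketch keeps every box.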

To create your own XML files you can use the imglab tool which can be found in the tools/imglab folder.  It is a simple graphical tool for labeling objects in images with boxes.  To see how to use it read the tools/imglab/README.txt file.  But for this example, we just use the training.xml file included with dlib.

It's a two-part process to load the tagger.
1.) Create the XML file that lists the images by typing the following command:
#####    b:\HoodMachineLearning\dlib\tools\build\Release\imglab.exe -c mydataset.xml B:\HoodMachineLearning\datasets\MyImage
2.) Open that XML file in imglab to draw and label the boxes:
#####    b:\HoodMachineLearning\dlib\tools\build\Release\imglab.exe mydataset.xml

## Image pyramids and sliding windows

The technique uses image pyramids and sliding windows to minimize the effect of object location and object size. The pyramid is a set of subsampled copies of the image; the sliding window stays the same size and moves from left to right and top to bottom across each scale of the image.

### Image Pyramids
<img src="ImagePyramid.jpg">

Note: Dalal and Triggs showed that performance is reduced if you apply Gaussian smoothing at each layer, so skip this step.
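
To get a feel for how many layers a pyramid produces, the sketch below computes only the layer sizes, with no image or smoothing involved. It assumes each layer's width is divided by `scale` and the height shrinks in proportion (an aspect-preserving resize); `pyramid_shapes` is a name invented for this example.

```python
def pyramid_shapes(width, height, scale=1.5, min_size=(30, 30)):
    """Yield the (width, height) of each pyramid layer, largest first."""
    # the first layer is the original image
    yield (width, height)
    while True:
        new_width = int(width / scale)
        # keep the aspect ratio when shrinking (assumed resize behaviour)
        height = int(height * (new_width / float(width)))
        width = new_width
        # stop once a layer is smaller than the minimum (width, height)
        if height < min_size[1] or width < min_size[0]:
            break
        yield (width, height)

for layer in pyramid_shapes(300, 200):
    print(layer)
```

A smaller `scale` gives more layers and finer coverage of object sizes, at the cost of more windows to classify.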

### Sliding Window

<img src="sliding_window_example.gif" loop=3>

* It is common to use a stepSize of 4 to 8 pixels
* windowSize is the size of the kernel. An object detector will work best if the aspect ratio of the kernel is close to that of the desired object. Note: The sliding window size is also important for the HOG filter. For the HOG filter two parameters are important: <b>pixels_per_cell</b> and <b>cells_per_block</b>
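
The effect of stepSize and windowSize on the amount of work can be sketched without any image data. The helper below, `window_positions` (a hypothetical name), lists the top-left corners of every window that fits completely inside an image of a given size. This is a simplification: a generator that slices the image directly will also produce clipped windows at the right and bottom edges, which usually have to be skipped before computing HOG features.

```python
def window_positions(image_size, step_size, window_size):
    """Yield (x, y) top-left corners of windows that fit fully inside the image."""
    image_w, image_h = image_size
    window_w, window_h = window_size
    # move top to bottom, and left to right within each row
    for y in range(0, image_h - window_h + 1, step_size):
        for x in range(0, image_w - window_w + 1, step_size):
            yield (x, y)

positions = list(window_positions((100, 60), step_size=8, window_size=(32, 16)))
print(len(positions), positions[0], positions[-1])
```

Halving the step size roughly quadruples the number of windows, which is why values of 4 to 8 pixels are a common compromise.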

In order to avoid having to 'guess' at the best window size that will satisfy both the object detector requirements and the HOG requirements, an "explore_dims.py" script is used.

1.) Meet object detection requirements: load all the images, compute the average width and average height, and compute the aspect ratio from those values.
2.) Meet HOG requirements: the Pyimage rule of thumb is to divide the above values by two (ie, 1/4th the average area)
    * This reduces the size of the HOG feature vector
    * By dividing by two, a nice balance is struck between HOG feature vector size and reasonable window size.
    * Note: Our sliding_window dimension needs to be divisible by pixels_per_cell and cells_per_block so that the HOG descriptor will 'fit' into the window size
    * It's common for 'pixels_per_cell' to be a multiple of 4 and cells_per_block in the set (1,2,3)
    * Start with pixels_per_cell=(4,4) and cells_per_block=(2,2)
    * For example, in the Pyimage example, average W: 184 and average H: 62. Divide by 2 ==> 92,31
    * Find values close to 92,31 that are divisible by 4 (and 2): 96,32  (Easy)
    * OBSERVATION:  When defining the bounding boxes, it is best if all are around the same size. This can be difficult.
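
The rule of thumb above can be written as a small helper. This is a sketch under one assumption of mine: since 92 is already divisible by 4, arriving at 96 suggests rounding up to a multiple of pixels_per_cell × cells_per_block (8 with the suggested starting values), so that is what the code does. The function name `suggest_window_size` is hypothetical.

```python
import math

def suggest_window_size(avg_w, avg_h, pixels_per_cell=4, cells_per_block=2):
    """Halve the average box size, then round up to a whole number of HOG blocks."""
    # assumption: one block spans pixels_per_cell * cells_per_block pixels per side
    block = pixels_per_cell * cells_per_block

    def round_up(value):
        return int(math.ceil((value / 2.0) / block) * block)

    return (round_up(avg_w), round_up(avg_h))

# averages from the Pyimage example quoted above
print(suggest_window_size(184, 62))
```

Rounding up rather than down keeps the window at least as large as half the average object, at the cost of a slightly longer feature vector.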

### The 6 Step Framework
1. Sample P positive samples from your training data of the objects you want to detect. Extract HOG features from these objects.
    * If given a general image containing the object, bounding boxes will also need to be given that indicate the location of the object
2. Sample N negative samples that do not contain the object and extract HOG features. In general N>>P. (I'd suggest images similar in size and aspect ratio to the P samples. I'd also avoid the bounding boxes and make the entire image the negative image. Pyimagesearch recommends using the 13 Natural Scene Categories dataset from the vision.stanford.edu/resources_links.html page.)
3. Train a Linear Support Vector Machine (SVM) on the negative images (class 0) and positive images (class 1)
4. Hard Negative Mining - for the N negative images, apply the sliding window and test the classifier. Ideally, every window should return 0. If a window returns 1, indicating an incorrect classification, add it to the training set (for the next round of re-training)
5. Re-train the classifier with the added images from Hard Negative Mining (usually once is enough)
6. Apply the classifier to the test dataset and define a box around each region of high probability. When finished with the image, use "non-maximum suppression" to remove redundant and overlapping bounding boxes, keeping the box with the highest probability in each overlapping group as the final box.
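
Step 6 is the least obvious of the six, so here is a minimal pure-Python sketch of greedy non-maximum suppression. It is one common formulation (keep the highest-scoring box, drop anything that overlaps it too much, repeat), not necessarily the exact variant used by Pyimagesearch or dlib. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the 0.3 overlap threshold is an arbitrary illustrative value.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / float(union) if union > 0 else 0.0

def non_max_suppression(boxes, scores, overlap_thresh=0.3):
    """Keep the best-scoring box of every group of heavily overlapping boxes."""
    # visit boxes from highest to lowest score
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep a box only if it does not overlap an already kept box too much
        if all(iou(boxes[i], boxes[k]) <= overlap_thresh for k in keep):
            keep.append(i)
    return [boxes[i] for i in keep]

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))
```

The first two boxes cover nearly the same region, so only the higher-scoring one survives; the third box is far away and is kept.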

#### Note on DLIB library
* Similar to the 6 step framework but uses the entire training image to get the P's (indicated by bounding boxes) and the N's (regions not containing bounding boxes).  Note: It looks like it is important that all of the objects are identified in the image. For example, when labelling running bibs, I may ignore some bibs for some reason (too small, partially blocked, too many). My guess is that these images should simply be avoided. This technique eliminates steps 2, 4, and 5.
* Non-maximum suppression is applied during the training phase, helping to reduce false positives
* dlib uses a highly accurate SVM engine to find the hyperplane separating the TWO classes.


#### Use a JSON file to hold the hyper-parameters
{
"faces_folder": "B:\\DataSets\\2016_USATF_Sprint_TrainingDataset",
"myTrainingFilename": "trainingset_small.xml",
"myTestingFilename": "trainingset_small.xml",
"myDetector": "detector.svm"
}
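
A minimal sketch of reading such a file with Python's standard `json` module is shown below (the notebook's own `Conf` class uses `simplejson`, which has a compatible interface). The keys are the ones listed above, written as valid JSON with commas between entries; the helper name `load_config` and the temporary file are only there to make the example self-contained.

```python
import json
import os
import tempfile

# raw string so the JSON text keeps its doubled backslashes
CONFIG_TEXT = r"""{
    "faces_folder": "B:\\DataSets\\2016_USATF_Sprint_TrainingDataset",
    "myTrainingFilename": "trainingset_small.xml",
    "myTestingFilename": "trainingset_small.xml",
    "myDetector": "detector.svm"
}"""

def load_config(path):
    """Read a JSON file and return its contents as a dict."""
    with open(path) as handle:
        return json.load(handle)

# write the sample to a temporary file so the example runs anywhere
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write(CONFIG_TEXT)
config = load_config(tmp.name)
os.remove(tmp.name)

print(config["faces_folder"])
# .get() returns None for a missing key instead of raising KeyError
print(config.get("missing_key"))
```

A missing comma or quote makes the parser raise an error at load time, so it is worth loading the file once before starting a long training run.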

#### Load and Dump hdf5 file
* hdf5 provides efficient data storage


In [245]:
from skimage import feature

class HOG:
	def __init__(self, orientations=12, pixelsPerCell=(4, 4), cellsPerBlock=(2, 2), normalize=True):
		# store the number of orientations, pixels per cell, cells per block, and
		# whether normalization should be applied to the image
		self.orientations = orientations
		self.pixelsPerCell = pixelsPerCell
		self.cellsPerBlock = cellsPerBlock
		self.normalize = normalize

	def describe(self, image):
		# compute Histogram of Oriented Gradients features
		hist = feature.hog(image, orientations=self.orientations, pixels_per_cell=self.pixelsPerCell,cells_per_block=self.cellsPerBlock, transform_sqrt=self.normalize)
		hist[hist < 0] = 0

		# return the histogram
		return hist

	def describe_and_return_HOGImage(self, image):
		# compute Histogram of Oriented Gradients features along with the HOG visualization image
		# (newer scikit-image releases spell the keyword "visualize"; "visualise" was removed)
		(hist,hogImage) = feature.hog(image, orientations=self.orientations, pixels_per_cell=self.pixelsPerCell,cells_per_block=self.cellsPerBlock, transform_sqrt=self.normalize, visualize=True)
		hist[hist < 0] = 0

		# return the histogram and the HOG image
		return hist,hogImage


In [268]:
# import the necessary packages
import numpy as np
import h5py

def dump_dataset(data, labels, path, datasetName, writeMethod="w"):
    # open the database, create the dataset, write the data and labels to dataset,
    # and then close the database
    db = h5py.File(path, writeMethod)
    dataset = db.create_dataset(datasetName, (len(data), len(data[0]) + 1), dtype="float")
    dataset[0:len(data)] = np.c_[labels, data]
    db.close()
    print("Finished Dumping Data")

def load_dataset(path, datasetName):
    # open the database, grab the labels and data, then close the dataset
    db = h5py.File(path, "r")
    (labels, data) = (db[datasetName][:, 0], db[datasetName][:, 1:])
    db.close()

    # return a tuple of the data and labels
    return (data, labels)

In [247]:
# import the necessary packages
# conda install -c anaconda simplejson
#import commentjson as json
import simplejson as json
class Conf:
	def __init__(self, confPath):
		# load and store the configuration and update the object's dictionary;
		# the with-block closes the file once it has been read
		with open(confPath) as f:
			conf = json.loads(f.read())
		self.__dict__.update(conf)

	def __getitem__(self, k):
		# return the value associated with the supplied key
		return self.__dict__.get(k, None)

In [248]:
# imutils is needed by pyramid() and cv2 by crop_ct101_bb()
import imutils
import cv2

def crop_ct101_bb(image, bb, padding=10, dstSize=(32, 32)):
	# unpack the bounding box, extract the ROI from the image, while taking into account
	# the supplied offset
	(y, h, x, w) = bb # Looks like this is y1,y2,x1,x2
	#print("y,h,x,w ={} {} {} {}".format(y,h,x,w))
	(x, y) = (max(x - padding, 0), max(y - padding, 0))
	roi = image[y:h + padding, x:w + padding]
	#print("ROI: {}".format(roi))
	# resize the ROI to the desired destination size
	# It is important to resize the roi in order to keep the final feature vector the same size	
	roi = cv2.resize(roi, dstSize, interpolation=cv2.INTER_AREA)

	# return the ROI
	return roi

def pyramid(image, scale=1.5, minSize=(30, 30)):
	# yield the original image
	yield image

	# keep looping over the pyramid
	while True:
		# compute the new dimensions of the image and resize it
		w = int(image.shape[1] / scale)
		image = imutils.resize(image, width=w)

		# if the resized image does not meet the supplied minimum
		# size, then stop constructing the pyramid
		if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
			break

		# yield the next image in the pyramid
		yield image

def sliding_window(image, stepSize, windowSize):
	# slide a window across the image
	for y in range(0, image.shape[0], stepSize):
		for x in range(0, image.shape[1], stepSize):
			# yield the current window
			#print("X: {}".format(x))
			#print("Y: {}".format(y))
			#print("Window Shape Check: {}".format(image.shape[:2]))
			yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

In [249]:
#conda install -c anaconda progressbar
#from dlib import progressbar

In [250]:
from __future__ import print_function
from sklearn.feature_extraction.image import extract_patches_2d
#from pyimagesearch.object_detection import helpers
#from pyimagesearch.descriptors import HOG
#from pyimagesearch.utils import dataset
#from pyimagesearch.utils import Conf
##from imutils import paths
##from imutils import resize
#from scipy import io
import numpy as np
##import progressbar
import argparse
import random
import cv2
import os
#import import_training_images_function2 as imp # Used to either import via Matlab file (CalTech) or XML (Scikit-learn)
from skimage import exposure

In [251]:
from scipy import io
from lxml import etree
#from imutils import paths
##import progressbar
#import cv2
#from pyimagesearch.object_detection import helpers

def import_with_Matlab(conf,hog,SW):
    data = []   
    labels = []
    # grab the set of ground-truth images and select a percentage of them for training
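    #trnPaths = list(paths.list_images(conf["image_dataset"]))
    # os.listdir returns bare file names, so join them with the folder to get readable paths
    trnPaths = [os.path.join(conf["image_dataset"], f) for f in os.listdir(conf["image_dataset"])]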
    trnPaths = random.sample(trnPaths, int(len(trnPaths) * conf["percent_gt_images"]))
    print("[INFO] describing training ROIs...")
    # setup the progress bar
    #widgets = ["Extracting: ", progressbar.Percentage(), " ", progressbar.Bar(), " ", progressbar.ETA()]
    #pbar = progressbar.ProgressBar(maxval=len(trnPaths), widgets=widgets).start()
    # loop over the training paths
    for (i, trnPath) in enumerate(trnPaths):
        # load the image, convert it to grayscale, and extract the image ID from the path
        image = cv2.imread(trnPath)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        imageID = trnPath[trnPath.rfind("_") + 1:].replace(".jpg", "")
        
        # load the annotation file associated with the image and extract the bounding box
        p = "{}/annotation_{}.mat".format(conf["image_annotations"], imageID)
        bb = io.loadmat(p)["box_coord"][0] #(y,h,x,w)
        # The next line crops the image to only the object. Because of this, no scanning is required
        # and the image size can simply be set to the scanning size (plus offset) so that only one scan is needed
        #roi = crop_ct101_bb(image, bb, padding=conf["offset_padding"], dstSize=tuple(conf["window_dim"]))
        roi = crop_ct101_bb(image, bb, padding=conf["offset_padding"], dstSize=SW)
        # define the list of ROIs that will be described, based on whether or not the
        # horizontal flip of the image should be used
        rois = (roi, cv2.flip(roi, 1)) if conf["use_flip"] else (roi,)
        
        # loop over the ROIs
        for roi in rois:
        	# extract features from the ROI and update the list of features and labels
        	features = hog.describe(roi)
        	data.append(features)
        	labels.append(1)
    
        # update the progress bar
        #	pbar.update(i)
    return  data,labels



def import_from_XML(conf,hog,SW):
    data = []   
    labels = []
    ##print("Importing: {}".format(conf["image_dataset_XML"])) 
    doc = etree.parse(conf["image_dataset_XML"])
    MyXML=doc.find('images')
    ## widgets = ["Extracting: ", progressbar.Percentage(), " ", progressbar.Bar(), " ", progressbar.ETA()]
    ## pbar = progressbar.ProgressBar(maxval=len(doc.xpath(".//*")), widgets=widgets).start()
    # loop over the training paths
    #for (i, info) in enumerate(MyXML):
    i = 0
    for info in MyXML:
        # load the image, convert it to grayscale, and extract the image ID from the path
        imagename=conf["image_dataset"] + "\\" + info.get('file')
        #print("Working on file: {}".format(imagename))
        image = cv2.imread(imagename)
        #cv2.imshow("My Image",image)
        #cv2.waitKey(0)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # note: only the first <box> of each <image> is used
        y=int(info[0].get('top'))
        x=int(info[0].get('left'))
        w=int(info[0].get('width'))
        h=int(info[0].get('height'))
        bb = [y, y + h, x, x + w]  # (y1, y2, x1, x2), the order crop_ct101_bb unpacks
        #print("bb: {}".format(bb))
        #newimage=image[y:y+h,x:x+w]
        #roi = crop_ct101_bb(image, bb, padding=conf["offset"], dstSize=tuple(conf["window_dim"]))
        # The next line crops the image to only the object. Because of this, no scanning is required
        # and the image size can simply be set to the scanning size (plus offset) so that only one scan is needed
        #roi = crop_ct101_bb(image, bb, padding=conf["offset_padding"], dstSize=tuple(conf["image_resized"]))
        roi = crop_ct101_bb(image, bb, padding=conf["offset_padding"], dstSize=SW)
        ##print("The image size is {}.".format(roi.shape))
        ##cv2.imshow(imagename,roi)
        ##cv2.waitKey(0)         
        # define the list of ROIs that will be described, based on whether or not the
        # horizontal flip of the image should be used
        if conf["use_flip"]:
            rois = (roi, cv2.flip(roi, 1))
        else:
            rois = (roi,)
                        
        # loop over the ROIs
        for roi in rois:
        	# extract features from the ROI and update the list of features and labels
        	features = hog.describe(roi)
        	data.append(features)
        	labels.append(1)
        
        	# update the progress bar
        	#pbar.update(i)
        i=i+1
    return  data,labels

In [252]:
import glob
widths = []
heights = []
print(conf["image_annotations"])
annotationPaths = glob.glob(conf["image_annotations"] + "/*.mat")
print(annotationPaths)
# loop over all annotation paths
for p in annotationPaths:
    # load the bounding box associated with the path and update the width and height
    # lists (box_coord is stored as y1, y2, x1, x2)
    (y, h, x, w) = io.loadmat(p)["box_coord"][0]
    widths.append(w - x)
    heights.append(h - y)
    print(y)

# compute the average of both the width and height lists
(avgWidth, avgHeight) = (np.mean(widths), np.mean(heights))
print("[INFO] avg. width: {:.2f}".format(avgWidth))
print("[INFO] avg. height: {:.2f}".format(avgHeight))
print("[INFO] aspect ratio: {:.2f}".format(avgWidth / avgHeight))

M:\DataSets\SprintPhotos_Small
[]
[INFO] avg. width: nan
[INFO] avg. height: nan
[INFO] aspect ratio: nan




In [253]:
import math
def GetAvgDimensions(conf):
    doc = etree.parse(conf["image_dataset_XML"])
    MyXML=doc.find('images')
    widths = []
    heights = []
    for info in MyXML:
        # only the first <box> of each <image> contributes to the averages
        widths.append(int(info[0].get('width')))
        heights.append(int(info[0].get('height')))

    (avgWidth, avgHeight) = (np.mean(widths), np.mean(heights))
    (stdW,stdH)=(np.std(widths),np.std(heights))
    #print("The length of widths is {}".format(len(widths)))
    # halve the average dimensions, then round up to the next multiple of 4
    # (the suggested pixels_per_cell) so the HOG cells fit the window
    newW=math.ceil(int(avgWidth/2)/4)*4
    newH=math.ceil(int(avgHeight/2)/4)*4
    print("[INFO] avg. width: {:.2f} +/- {:.2f}".format(avgWidth,stdW))
    print("[INFO] avg. height: {:.2f} +/- {:.2f}".format(avgHeight,stdH))
    print("[INFO] aspect ratio: {:.2f}".format(avgWidth / avgHeight))
    print("[INFO] The recommended Sliding Window Size is W:{}  H:{}".format(newW,newH))
    print("[INFO] Sliding Window Aspect Ratio {:.2f}".format(newW / newH))
    return tuple([newW,newH])

In [254]:
myJSONFile = os.getcwd() + "\\conf\\TrackBibs.json"
conf = Conf(myJSONFile)

In [255]:
SW=GetAvgDimensions(conf)

[INFO] avg. width: 304.14 +/- 106.22
[INFO] avg. height: 247.05 +/- 91.45
[INFO] aspect ratio: 1.23
[INFO] The recommended Sliding Window Size is W:152  H:124
[INFO] Sliding Window Aspect Ratio 1.23


In [256]:
# initialize the HOG descriptor along with the list of data and labels
hog = HOG(orientations=conf["orientations"], pixelsPerCell=tuple(conf["pixels_per_cell"]), 
          cellsPerBlock=tuple(conf["cells_per_block"]), normalize=conf["normalize"])

## Begin Training

In [264]:
print("[INFO] Begin Training...")
print("Image XML File:")
print(conf["image_dataset_XML"])
print("Image files are located in:")
print(conf["image_dataset"])
# Open dataset images and extract features for different scales. The extension will
# determine how the bounding box information is presented (.XML (as used by DLIB), or .Mat (as used by CalTech))
tmp=os.path.splitext(conf["image_dataset_XML"]) #TODO  Need to come back to
if tmp[1]==".xml":
    print("Using XML Format")
    # Run ExtractImageInfoFromXML_Hood.py
    # Note: The sliding window size is calculated from the images. The value in the json file is bypassed
    data,labels=import_from_XML(conf,hog,SW)
else:
    # Run ExtractImageInfoFromMatlab_Hood.py
    print("Using Matlab Format")
    data,labels=import_with_Matlab(conf,hog,SW)
lenPositiveFeatures=len(data)
print("Finished")    
print("There are {} feature vectors and each vector contains {} elements for a total of {} elements.".format(len(data),len(data[0]),len(data) * len(data[0])))

[INFO] Begin Training...
Image XML File:
M:\DataSets\SprintPhotos_Small\sprints.xml
Image files are located in:
M:\DataSets\SprintPhotos_Small
Using XML Format
Finished
There are 21 feature vectors and each vector contains 39960 elements for a total of 839160 elements.


## Begin Training Images that Do Not Contain the Object (Negative Images)



In [269]:
import glob
print("[INFO] describing distraction ROIs...")
#dstPaths = list(os.listdir(conf["image_distractions"]))
dstPaths=glob.glob(conf["image_distractions"] + "\\*.jpg")
#print(files)

#widgets = ["Extracting: ", progressbar.Percentage(), " ", progressbar.Bar(), " ", progressbar.ETA()]
#pbar = progressbar.ProgressBar(maxval=conf["num_distraction_images"], widgets=widgets).start()
# loop over the desired number of distraction images
patches=[]
for i in np.arange(0, conf["num_distraction_images"]):
    # randomly select a distraction image, load it, resize the whole image to the
    # sliding window size, and convert it to grayscale
    image = cv2.imread(random.choice(dstPaths))
    ##image = resize(image,width=int(conf["max_image_width"]))
    image = cv2.resize(image,SW)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    patches.append(image)
    # alternative: extract_patches_2d is a convenient ROI sampling implementation in scikit-learn
    ##patches = extract_patches_2d(image, tuple(conf["window_dim"]),max_patches=conf["num_distractions_per_image"])
    ##patches = extract_patches_2d(image, tuple(SW),max_patches=conf["num_distractions_per_image"])
# loop over the patches
for patch in patches:
    # extract features from the patch, then update the data and label list
    features = hog.describe(patch)
    data.append(features)
    labels.append(-1)

    # update the progress bar
    #pbar.update(i)
print("Finished")    
print("There are now {} feature vectors and each vector contains {} elements for a total of {} elements.".format(len(data),len(data[0]),len(data) * len(data[0])))
print("{} Positive features and {} Negative features".format(lenPositiveFeatures, len(patches)))

[INFO] describing distraction ROIs...
Finished
There are now 121 feature vectors and each vector contains 39960 elements for a total of 4835160 elements.


In [270]:
# dump the dataset to file
#pbar.finish()
print("[INFO] dumping features and labels to file...")
dump_dataset(data, labels, conf["features_path"], "features")

[INFO] dumping features and labels to file...
Finished Dumping Data


* faces_folder: Path to the main photos folder
* myTrainingFilename: XML file describing the training set of photos
* myTestingFilename: XML file describing the testing set (not used for training, just for quantifying performance)
* myDetector: Filename of the trained detector (model) that is saved by training and then used for detection

In [9]:
import os
import dlib

faces_folder ="B:\\DataSets\\2016_USATF_Sprint_TrainingDataset"
myTrainingFilename="trainingset_small.xml"
myTestingFilename="trainingset_small.xml"
myDetector="detector.svm"

### Set up the dlib simple object detector training options
The train_simple_object_detector() function has a bunch of options, all of which come with reasonable default values.  The next few lines go over some of these options.
### Select the C Value
    # The trainer is a kind of support vector machine and therefore has the usual
    # SVM C parameter.  In general, a bigger C encourages it to fit the training
    # data better but might lead to overfitting.  You must find the best C value
    # empirically by checking how well the trained detector works on a test set of
    # images you haven't trained on.  Don't just leave the value set at 5.  Try a
    # few different C values and see what works best for your data.
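
One way to organise that search is sketched below. The helper `pick_best_c` and the `evaluate` callback are hypothetical names: in practice `evaluate(c)` would set `options.C = c`, call `dlib.train_simple_object_detector(...)`, and score the result with `dlib.test_simple_object_detector(...)` on an XML file that was not used for training. The scores in `fake_evaluate` are placeholders so the example runs without dlib or images; they are not measurements.

```python
def pick_best_c(candidates, evaluate):
    """Return (best_c, best_score, all_scores) for the highest-scoring C."""
    scores = {}
    for c in candidates:
        # evaluate(c) is expected to train with this C and return a test-set score
        scores[c] = evaluate(c)
    best_c = max(scores, key=scores.get)
    return best_c, scores[best_c], scores

def fake_evaluate(c):
    # placeholder scores only, standing in for "train, then test on held-out images"
    placeholder_scores = {1: 0.71, 5: 0.80, 10: 0.84, 50: 0.78}
    return placeholder_scores[c]

best_c, best_score, all_scores = pick_best_c([1, 5, 10, 50], fake_evaluate)
print(best_c, best_score)
```

Scoring on the training XML instead of a separate test XML would favour the largest C almost every time, which is exactly the overfitting the comment above warns about.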

In [10]:
#def train_object_detector(faces_folder,myTraining,myTesting,myDetector,myDetector2):
options = dlib.simple_object_detector_training_options()
options.add_left_right_image_flips = True
options.C = 5
# Tell the code how many CPU cores your computer has for the fastest training.
# Note: DLIB does not use the GPU
options.num_threads = 4 
options.be_verbose = True

In [11]:
training_xml_path = os.path.join(faces_folder, myTrainingFilename)
print(training_xml_path)
testing_xml_path = os.path.join(faces_folder, myTestingFilename)
print(testing_xml_path)

B:\DataSets\2016_USATF_Sprint_TrainingDataset\trainingset_small.xml
B:\DataSets\2016_USATF_Sprint_TrainingDataset\trainingset_small.xml


### Begin Training
This function does the actual training.  It will save the final detector to the file specified by myDetector.  The input is an XML file that lists the images in the training dataset and also contains the positions of the object boxes.


In [12]:
dlib.train_simple_object_detector(training_xml_path, myDetector, options)

RuntimeError: 
Error! An impossible set of object boxes was given for training. All the boxes 
need to have a similar aspect ratio and also not be smaller than about 400 
pixels in area. The following images contain invalid boxes: 
  B:/DataSets/2016_USATF_Sprint_TrainingDataset/dsc_4062.jpg
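
Before re-running the trainer it can help to screen the labelled boxes for the two problems the error names. The sketch below is a rough pre-check, not dlib's actual validation logic: the 400-pixel area limit comes from the message above, while the "more than a factor of two away from the median aspect ratio" rule and the name `flag_boxes` are assumptions made for illustration. The first four sizes are taken from the XML sample earlier in this notebook; the last two are invented to trigger each rule.

```python
def flag_boxes(boxes, min_area=400, max_ratio_factor=2.0):
    """boxes: list of (width, height). Return indices of suspicious boxes."""
    ratios = sorted(w / float(h) for (w, h) in boxes)
    median_ratio = ratios[len(ratios) // 2]
    flagged = []
    for index, (w, h) in enumerate(boxes):
        ratio = w / float(h)
        too_small = w * h < min_area
        # hypothetical rule: aspect ratio far from the median of all boxes
        odd_shape = (ratio > median_ratio * max_ratio_factor
                     or ratio < median_ratio / max_ratio_factor)
        if too_small or odd_shape:
            flagged.append(index)
    return flagged

boxes = [(85, 53), (103, 56), (100, 56), (111, 62), (15, 20), (40, 200)]
print(flag_boxes(boxes))
```

Boxes flagged this way can be re-drawn in imglab or the image can be left out of the training XML.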


In [None]:
print("")  # Print blank line to create gap from previous output

In [None]:
print("Training accuracy: {}".format(dlib.test_simple_object_detector(training_xml_path, myDetector)))

In [None]:
print("Testing accuracy: {}".format(dlib.test_simple_object_detector(testing_xml_path, myDetector)))