
Vehicle Detection

The objective of this project is to train a classifier to detect cars and to streamline it to work on a video input. The following sections of this write-up cover the code provided in the Jupyter notebook, which implements the pipeline for detecting vehicles in images.

The helper functions below were reused from the course material:

color_hist(img, nbins=nbins, bins_range=bins_range)

bin_spatial(img, size=spatial_size)

extract_features(imgs, color_space=color_space, spatial_size=spatial_size,
                     hist_bins=hist_bins, orient=orient,
                     pix_per_cell=pix_per_cell, cell_per_block=cell_per_block,
                     hog_channel=hog_channel, spatial_feat=spatial_feat, 
                     hist_feat=hist_feat, 
                     hog_feat=hog_feat)

slide_window(img, x_start_stop=[None, None], y_start_stop=[None, None],
                 xy_window=(64, 64), xy_overlap=(0.5, 0.5), window_list=None)

search_windows(img, windows, clf, scaler, color_space=color_space,
                   spatial_size=spatial_size, hist_bins=hist_bins,
                   hist_range=hist_range, orient=orient,
                   pix_per_cell=pix_per_cell, cell_per_block=cell_per_block,
                   hog_channel=hog_channel, spatial_feat=spatial_feat,
                   hist_feat=hist_feat, hog_feat=hog_feat)

single_img_features(img, color_space=color_space, spatial_size=spatial_size,
                     hist_bins=hist_bins, orient=orient,
                     pix_per_cell=pix_per_cell, cell_per_block=cell_per_block,
                     hog_channel=hog_channel, spatial_feat=spatial_feat, 
                     hist_feat=hist_feat, 
                     hog_feat=hog_feat)

add_heat(heatmap, bbox_list)

apply_threshold(heatmap, threshold)

draw_labeled_bboxes(img, labels)

First, we pre-process the images (if needed) and extract spatial and HOG features from a labeled training set of car and non-car images. Since the training images are in PNG format and we read them with the matplotlib.image imread function, which scales PNG pixel intensities to the [0, 1] range, there is no need to normalize this set of images, unlike the test set and the video stream.
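As a minimal sketch of this distinction (the PNG path is a hypothetical example; the JPEG test image is used later in the pipeline):

import matplotlib.image as mpimg
import numpy as np

# PNG training image: imread already returns floats scaled to [0, 1]
train_img = mpimg.imread('vehicles/GTI_Far/image0000.png')  # hypothetical example path

# JPEG test image / video frame: imread returns uint8 in [0, 255],
# so we scale it to match the training data
test_img = mpimg.imread('test_images/test1.jpg').astype(np.float32) / 255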

After that, we normalize the features and fit a scaler to the data before training a linear SVM model on the normalized features. This normalization ensures that all inputs to the SVM model share the same range of values.

Then we describe the pipeline for detecting vehicles in a test set: we are provided with a group of test images from a camera mounted on the front of a car. We discuss how the methods above are used to design a sliding-window search that looks for cars in a region of interest (ROI) in the image, and show the results as a heat map.

Finally, we apply the pipeline to a video stream and go over extensions that stabilize the bounding boxes around detected vehicles across consecutive frames.

HOG and Spatial Features


There are multiple features that can be used, such as SIFT, SURF and HOG features, in addition to color-dependent/intensity-dependent features. Another option is to rely on a deep learning model that automatically comes up with its set of features.

In this work, I use HOG features to train the classifier. HOG features produce a distinctive signature of gradient orientations for an object, and the performance of the classifier relies heavily on how distinctive these features are. See the paper for more details about how it works. Using this feature requires experimenting with the parameters: the number of bins for the orientation histogram, pixels per cell, etc. These parameters need to be chosen in a way that produces good features.

from skimage.feature import hog
hog(img, orientations=orient,
    pixels_per_cell=(pix_per_cell, pix_per_cell),
    cells_per_block=(cell_per_block, cell_per_block),
    transform_sqrt=True,
    visualise=vis, feature_vector=feature_vec)
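For example, with the parameters used below, the HOG features and a visualization image for one channel can be obtained as follows (a sketch; with vis=True the function returns the visualization alongside the feature vector):

# HOG features and visualization for a single channel of a 64x64 training image
features, hog_image = hog(img[:, :, 0], orientations=18,
                          pixels_per_cell=(8, 8),
                          cells_per_block=(2, 2),
                          transform_sqrt=True,
                          visualise=True, feature_vector=True)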

The figure below shows 3 random car images and their corresponding HOG features for each channel in the YCrCb color space:

{cars}

{cars_hog}

And here are the HOG features for 3 random non-car images:

{noncars}

{noncars_hog}

In addition to HOG, we use spatial features, which are the binned color features:

def bin_spatial(img, size=spatial_size):
    # Resize the image and flatten it into a 1-D color feature vector
    features = cv2.resize(img, size).ravel()
    return features
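With spatial_size = (64, 64), each 3-channel training image therefore yields 64 * 64 * 3 = 12288 spatial features:

spatial_features = bin_spatial(img, size=(64, 64))
# spatial_features.shape == (12288,) for a 3-channel image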

The parameters used are as follows:

color_space = 'YCrCb'  # Can be RGB, HSV, LUV, HLS, YUV, YCrCb
orient = 18  # HOG orientations
pix_per_cell = 8  # HOG pixels per cell
cell_per_block = 2  # HOG cells per block
hog_channel = "ALL"  # Can be 0, 1, 2, or "ALL"
spatial_size = (64, 64)  # Spatial binning dimensions
nbins = 32
hist_bins = 64  # Number of histogram bins
hist_range = bins_range = (0,256)
spatial_feat = True  # Spatial features on or off
hist_feat = False  # Histogram features on or off
hog_feat = True  # HOG features on or off

First, all the car/non-car images are read:

import glob

car = glob.glob('vehicles/*/*.png')
noncar = glob.glob('non-vehicles/*/*.png')
total = car + noncar
cars = []
notcars = []
for image in car:
    cars.append(image)
for image in noncar:
    notcars.append(image)

Training images are of size 64x64x3. After that, we extract the features for each image using the helper method:

def extract_features(imgs, color_space=color_space, spatial_size=spatial_size,
                     hist_bins=hist_bins, orient=orient,
                     pix_per_cell=pix_per_cell, cell_per_block=cell_per_block,
                     hog_channel=hog_channel, spatial_feat=spatial_feat, 
                     hist_feat=hist_feat, 
                     hog_feat=hog_feat):

The extracted features are pickled in feat.pkl for easier retrieval on subsequent runs.
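A minimal sketch of this caching step, assuming car_features and notcar_features are the lists returned by extract_features:

import pickle

# Save the extracted features so later runs can load them instead of re-extracting
with open('feat.pkl', 'wb') as f:
    pickle.dump({'car_features': car_features,
                 'notcar_features': notcar_features}, f)

# On a later run:
with open('feat.pkl', 'rb') as f:
    feat = pickle.load(f)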

Model Training


Before training the model, we standardize the features by removing the mean and scaling them to unit variance. For that we use:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Create the training data
X = np.vstack((car_features, notcar_features)).astype(np.float64)
# Label vector: 1 for cars, 0 for non-cars
y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))

X_scaler = StandardScaler().fit(X)
scaled_X = X_scaler.transform(X)

After that we split the data into 85% training and 15% validation:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(scaled_X, y,
                                                     test_size=0.15,
                                                     random_state=20)

Now we create a model object, train it and see the validation score:

from sklearn.svm import LinearSVC
import time

# Create a Linear SVC object
svc = LinearSVC()

# Fit the model to the training data
t = time.time()
svc.fit(X_train, y_train)
t2 = time.time()
print(round(t2 - t, 2), 'Seconds to train SVC...')

# Check the score of the SVC on the holdout set
print('Test Accuracy of SVC = ', round(svc.score(X_test, y_test), 4))

# Check the prediction time for a few samples
t = time.time()
svc.predict(X_test[:10])
t2 = time.time()
print(round(t2 - t, 5), 'Seconds to predict 10 samples')

The model has about 99% accuracy on the validation set.

Sliding-Window Search Pipeline

The field of view captured by the camera mounted on the front of the car spans the street, the trees on the side, and the sky. To reduce the time spent searching for cars, we limit the region of interest to the bottom half of the image.

Also, cars appear at different scales. The sliding-window method creates a set of windows with a specified overlap and size (the code below uses 50% overlap in both dimensions with 80x80 and 128x128 windows). Each window is normalized before HOG features are extracted; these features are then standardized and fed to the trained model to make a prediction. The search function returns a list of bounding boxes for which the model predicted a vehicle. The figure below shows how the slide_window method produces a set of boxes, at different scales, to be searched for cars:

{boxes}

And here is the corresponding code:

img = mpimg.imread("test_images/test1.jpg")
img_norm = img.astype(np.float32)/255
windows_80 = helpers.slide_window(img_norm, x_start_stop=[None, None], y_start_stop=[400, 600],
                                  xy_window=(80, 80), xy_overlap=(0.50, 0.50))
windows_128 = helpers.slide_window(img_norm, x_start_stop=[None, None], y_start_stop=[500, 720],
                                   xy_window=(128, 128), xy_overlap=(0.50, 0.50))
windows = windows_80 + windows_128
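The candidate windows are then run through the classifier. A sketch of this step, using the search_windows helper listed above together with the trained svc and fitted X_scaler:

# Classify each candidate window and keep the ones predicted as "car"
hot_windows = helpers.search_windows(img_norm, windows, svc, X_scaler,
                                     color_space=color_space,
                                     spatial_size=spatial_size, hist_bins=hist_bins,
                                     hist_range=hist_range, orient=orient,
                                     pix_per_cell=pix_per_cell, cell_per_block=cell_per_block,
                                     hog_channel=hog_channel, spatial_feat=spatial_feat,
                                     hist_feat=hist_feat, hog_feat=hog_feat)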

This workflow results in three main issues:

  • False positives: windows where the model detected a car while none existed.
  • Multiple overlapping detections of the same car, caused by the window overlap.
  • Instability of the bounding boxes between consecutive frames.

In order to reject false positives, we build a heat map: every pixel that falls within a positive detection window (a window where the model predicted a vehicle) is incremented. We then apply a threshold, assigning zero to all pixels except those that have been classified as part of a vehicle more than x times, where x is the threshold.

def add_heat(heatmap, bbox_list):
    # Add 1 to every pixel inside each detection box
    for box in bbox_list:
        heatmap[box[0][1]:box[1][1], box[0][0]:box[1][0]] += 1
    return heatmap


def apply_threshold(heatmap, threshold):
    # Zero out pixels detected `threshold` times or fewer
    heatmap[heatmap <= threshold] = 0
    return heatmap
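A sketch of how these two functions are applied to the positive detections (hot_windows) from the search step, using the threshold of 2 mentioned in the discussion below:

# Build the heat map from the positive detections and threshold it
heat = np.zeros_like(img[:, :, 0]).astype(np.float64)
heat = add_heat(heat, hot_windows)
heat = apply_threshold(heat, 2)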

For the overlapping detections of the same car, such as in the image below, we would like to combine all the detections into one bounding box. This is achieved by having the bounding box surround the heat area associated with that car, since all of these overlapping windows receive the same label. This is implemented in the method

def get_hot_windows(image, previous=None, count=0)

{overlap}

When the pipeline is applied to the set of test images, the following bounding boxes are produced. The original image with the bounding box - after resolving the overlapping-window issue - is on the left. In the middle, the heat for each pixel is shown, based on the number of positive windows the pixel has been part of. On the right, we apply the scipy function

scipy.ndimage.measurements.label(heat)

to the heat map, which assigns a label to each detection, i.e., each connected group of non-zero pixels. Each label is drawn at a distinct gray level.
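A sketch of the labeling and drawing step, using the draw_labeled_bboxes helper listed earlier:

from scipy.ndimage.measurements import label

# Label the connected regions of the thresholded heat map and draw one box per region
labels = label(heat)          # labels[1] is the number of detected regions
draw_img = draw_labeled_bboxes(np.copy(img), labels)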

{heat}

Application on Video Stream


Here is a link to the video output.

The video results from applying the whole pipeline as described above, including the implementation that combines the overlapping windows into one box. In addition, a Vehicle object is created that stores previous predictions and allows the bounding box in the current frame to be averaged with the previous predictions.

This helps stabilize the predictions from one frame to the next, instead of the bounding box constantly jumping due to significant differences in location between frames.

The following methods, together with the Vehicle() class, implement this logic:

def average_bboxes(image,detected=None)
def get_hot_windows(image, previous=None, count=0)

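The Vehicle class itself is not shown in this excerpt; a minimal sketch consistent with the attributes used in the snippet below (previousHeat, averageCount, iteration), with an illustrative default for averageCount:

class Vehicle:
    def __init__(self, averageCount=5):    # 5 frames is an illustrative default
        self.averageCount = averageCount   # number of frames to average over
        self.previousHeat = []             # buffer of recent heat maps
        self.iteration = 0                 # frame counter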

In the average_bboxes method, the logic:

    if detected is None:
        detected = Vehicle()

    # On the first frames, seed the buffer with copies of the current heat map
    if len(detected.previousHeat) < detected.averageCount:
        for i in range(detected.averageCount):
            detected.previousHeat.append(np.copy(heatmap).astype(np.float))

    # Overwrite the oldest entry with the current frame's heat map (ring buffer)
    detected.previousHeat[detected.iteration % detected.averageCount] = heatmap

    # Average the stored heat maps and threshold the result
    total = np.zeros(np.array(detected.previousHeat[0]).shape)
    for value in detected.previousHeat:
        total += np.array(value)

    averageHeatMap = total / detected.averageCount
    averageHeatMap = apply_threshold(averageHeatMap, 2)

checks for previous-frame detections on the Vehicle object (which is created on the first frame) and averages the heat map detected in the current frame with the previous ones, producing a more stable detection.

Discussion


The model had many false detections before a threshold of 2 was applied. I believe this can be partially attributed to using spatial features, which bin the raw intensity values. This feature can be misleading because intensity values are not very distinctive; that is, a non-car window can have the same distribution of intensity values as a car.

HOG is more distinctive, since the gradient takes high values on edges, emphasizing the shape of the car. Using it by itself might produce better results. One problem HOG can face, however, is cars in shadowed areas, or bright cars in bright areas, since the edges become harder to detect and the gradients take smaller values. This was seen in some images in the training set, like the figure below - a dark car in a dark area, where the gradients do not emphasize the shape of a car:

{dark_car} {hog_dark_car}
