The Project
---

The goals / steps of this project are the following:

* Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier
* Optionally, you can also apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector. 
* Note: for those first two steps don't forget to normalize your features and randomize a selection for training and testing.
* Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
* Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
* Estimate a bounding box for vehicles detected.

Here are links to the labeled data for [vehicle](https://s3.amazonaws.com/udacity-sdc/Vehicle_Tracking/vehicles.zip) and [non-vehicle](https://s3.amazonaws.com/udacity-sdc/Vehicle_Tracking/non-vehicles.zip) examples to train your classifier.  These example images come from a combination of the [GTI vehicle image database](http://www.gti.ssr.upm.es/data/Vehicle_database.html), the [KITTI vision benchmark suite](http://www.cvlibs.net/datasets/kitti/), and examples extracted from the project video itself.   You are welcome and encouraged to take advantage of the recently released [Udacity labeled dataset](https://github.com/udacity/self-driving-car/tree/master/annotations) to augment your training data.  

Some example images for testing your pipeline on single frames are located in the `test_images` folder.  To help the reviewer examine your work, please save examples of the output from each stage of your pipeline in the folder called `ouput_images`, and include them in your writeup for the project by describing what each image shows.    The video called `project_video.mp4` is the video your pipeline should work well on.  

**As an optional challenge**, once you have a working pipeline for vehicle detection, add in your lane-finding algorithm from the last project to do simultaneous lane-finding and vehicle detection!

**If you're feeling ambitious** (also totally optional though), don't stop there!  We encourage you to go out and take video of your own, and show us how you would implement this project on a new video!

In [1]:
import numpy as np
import cv2
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

from moviepy.editor import VideoFileClip
# scipy.ndimage.measurements is a deprecated namespace; label lives in scipy.ndimage
from scipy.ndimage import label

from skimage.feature import hog
import glob
import time

import keras

%matplotlib inline

Using TensorFlow backend.


In [2]:
# Define a function that takes an image,
# start and stop positions in both x and y, 
# window size (x and y dimensions),  
# and overlap fraction (for both x and y)
def slide_window(img, x_start_stop=(None, None), y_start_stop=(None, None), 
                    xy_window=(64, 64), xy_overlap=(0.5, 0.5)):
    # Copy the ranges so neither the defaults nor the caller's lists are mutated
    x_start_stop = list(x_start_stop)
    y_start_stop = list(y_start_stop)
    # If x and/or y start/stop positions not defined, set to image size
    if x_start_stop[0] is None:
        x_start_stop[0] = 0
    if x_start_stop[1] is None:
        x_start_stop[1] = img.shape[1]
    if y_start_stop[0] is None:
        y_start_stop[0] = 0
    if y_start_stop[1] is None:
        y_start_stop[1] = img.shape[0]
    # Compute the span of the region to be searched    
    xspan = x_start_stop[1] - x_start_stop[0]
    yspan = y_start_stop[1] - y_start_stop[0]
    # Compute the number of pixels per step in x/y
    # (np.int was removed in NumPy 1.24; the built-in int truncates the same way)
    nx_pix_per_step = int(xy_window[0]*(1 - xy_overlap[0]))
    ny_pix_per_step = int(xy_window[1]*(1 - xy_overlap[1]))
    # Compute the number of windows in x/y
    nx_buffer = int(xy_window[0]*(xy_overlap[0]))
    ny_buffer = int(xy_window[1]*(xy_overlap[1]))
    nx_windows = int((xspan-nx_buffer)/nx_pix_per_step) 
    ny_windows = int((yspan-ny_buffer)/ny_pix_per_step) 
    # Initialize a list to append window positions to
    window_list = []
    # Loop through finding x and y window positions
    # Note: you could vectorize this step, but in practice
    # you'll be considering windows one by one with your
    # classifier, so looping makes sense
    for ys in range(ny_windows):
        for xs in range(nx_windows):
            # Calculate window position
            startx = xs*nx_pix_per_step + x_start_stop[0]
            endx = startx + xy_window[0]
            starty = ys*ny_pix_per_step + y_start_stop[0]
            endy = starty + xy_window[1]
            
            # Append window position to list
            window_list.append(((startx, starty), (endx, endy)))
    # Return the list of windows
    return window_list

# Define a function to draw bounding boxes
def draw_boxes(img, bboxes, color=(0, 0, 255), thick=6):
    # Make a copy of the image
    imcopy = np.copy(img)
    # Iterate through the bounding boxes
    for bbox in bboxes:
        # Draw a rectangle given bbox coordinates
        cv2.rectangle(imcopy, bbox[0], bbox[1], color, thick)
    # Return the image copy with boxes drawn
    return imcopy
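
To make the window arithmetic in `slide_window` concrete, here is a minimal sketch that reproduces the per-axis step and count calculation on its own. The helper name `window_grid` is hypothetical and is not used elsewhere in this notebook.

```python
def window_grid(span, window, overlap):
    """Return (step, count) for one axis, mirroring the arithmetic in slide_window."""
    step = int(window * (1 - overlap))      # pixels between window starts
    buffer = int(window * overlap)          # part of the span the last window overlaps
    count = int((span - buffer) / step)     # number of windows that fit
    return step, count

# Full-width 1280-pixel frame, 64-pixel windows, 50% overlap
step, count = window_grid(1280, 64, 0.5)
print(step, count)  # 32 39

# The last window starts at (count - 1) * step and ends one window length later
last_end = (count - 1) * step + 64
print(last_end)  # 1280
```

Note that an overlap of 1.0 would give a step of zero and a division error, so the overlap has to stay below 1.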


In [3]:
# Define a function you will pass an image 
# and the list of windows to be searched (output of slide_windows())
def search_windows(img, windows, clf, scaler=None, color_space='RGB', 
                    spatial_size=(32, 32), hist_bins=32, 
                    hist_range=(0, 256), orient=9, 
                    pix_per_cell=8, cell_per_block=2, 
                    hog_channel=0, spatial_feat=True, 
                    hist_feat=True, hog_feat=True):

    #1) Create an empty list to receive positive detection windows
    on_windows = []
    #2) Iterate over all windows in the list
    for window in windows:
        #3) Extract the test window from original image
        test_img = cv2.resize(img[window[0][1]:window[1][1], window[0][0]:window[1][0]], (64, 64))      
        #4) Extract features for that window using single_img_features()
        # NOTE: this call expects the lesson's keyword-based single_img_features(),
        # not the single-argument MobileNet version defined later in this notebook
        features = single_img_features(test_img, color_space=color_space, 
                            spatial_size=spatial_size, hist_bins=hist_bins, 
                            orient=orient, pix_per_cell=pix_per_cell, 
                            cell_per_block=cell_per_block, 
                            hog_channel=hog_channel, spatial_feat=spatial_feat, 
                            hist_feat=hist_feat, hog_feat=hog_feat)
        #5) Scale extracted features to be fed to classifier (skipped if no scaler given)
        test_features = np.array(features).reshape(1, -1)
        if scaler is not None:
            test_features = scaler.transform(test_features)
        #6) Predict using your classifier
        prediction = clf.predict(test_features)
        #7) If positive (prediction == 1) then save the window
        if prediction == 1:
            on_windows.append(window)
    #8) Return windows for positive detections
    return on_windows
    



In [4]:
# also from lesson materials
def add_heat(heatmap, bbox_list):
    # Iterate through list of bboxes
    for box in bbox_list:
        # Add += 1 for all pixels inside each bbox
        # Assuming each "box" takes the form ((x1, y1), (x2, y2))
        heatmap[box[0][1]:box[1][1], box[0][0]:box[1][0]] += 1

    # Return updated heatmap
    return heatmap
    
def apply_threshold(heatmap, threshold):
    # Zero out pixels at or below the threshold
    heatmap[heatmap <= threshold] = 0
    # Return thresholded map
    return heatmap

def draw_labeled_bboxes(img, labels):
    # Iterate through all detected cars
    for car_number in range(1, labels[1]+1):
        # Find pixels with each car_number label value
        nonzero = (labels[0] == car_number).nonzero()
        # Identify x and y values of those pixels
        nonzeroy = np.array(nonzero[0])
        nonzerox = np.array(nonzero[1])
        # Define a bounding box based on min/max x and y
        bbox = ((np.min(nonzerox), np.min(nonzeroy)), (np.max(nonzerox), np.max(nonzeroy)))
        # Draw the box on the image
        cv2.rectangle(img, bbox[0], bbox[1], (0,0,255), 6)
    # Return the image
    return img
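
As a small worked example of the heat-map idea, the sketch below applies the same accumulate-then-threshold logic as `add_heat` and `apply_threshold` to a tiny grid. The box coordinates are made up purely for illustration.

```python
import numpy as np

# Tiny 6-row by 8-column "image"; boxes use the ((x1, y1), (x2, y2)) convention
heat = np.zeros((6, 8), dtype=np.int32)
boxes = [((0, 0), (4, 4)),   # detection A
         ((2, 2), (6, 6)),   # detection B, overlapping A
         ((6, 0), (8, 2))]   # isolated detection, likely a false positive

# Same accumulation as add_heat: +1 for every pixel inside each box
for (x1, y1), (x2, y2) in boxes:
    heat[y1:y2, x1:x2] += 1

# Same rule as apply_threshold with threshold=1: keep only pixels hit more than once
heat[heat <= 1] = 0

print(heat)
print(np.count_nonzero(heat))  # 4
```

Only the 2x2 region where the first two boxes overlap survives. The isolated box is removed entirely, which is how the threshold rejects single-window false positives.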

Time to write my own code!

This part is divided into:

1.  Writing our own classifier
2.  Finding boxes

In [5]:
# Read in cars and notcars
"""
images = glob.glob('*.jpeg')
cars = []
notcars = []
for image in images:
    if 'image' in image or 'extra' in image:
        notcars.append(image)
    else:
        cars.append(image)
"""

#cars = glob.glob('smallset/vehicles_smallset/*.jpeg')
#notcars = glob.glob('./smallset/non-vehicles_smallset/*.jpeg')

cars = glob.glob('largeset/vehicles/*/*.png')+glob.glob('smallset/vehicles_smallset/*.jpeg')
notcars = glob.glob('largeset/non-vehicles/*/*.png')+glob.glob('./smallset/non-vehicles_smallset/*.jpeg')

In [6]:
car_imgs = [mpimg.imread(x) for x in cars]
notcar_imgs = [mpimg.imread(x) for x in notcars]

In [7]:
X = np.vstack((car_imgs, notcar_imgs)).astype(np.float64)
y = np.zeros(len(car_imgs)+len(notcar_imgs))
y[:len(car_imgs)] = 1
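
One thing worth checking when PNG and JPEG files are mixed, as they are here: `mpimg.imread` returns floats in the range 0 to 1 for PNG files and 8-bit integers in the range 0 to 255 for JPEG files, so the stacked array can end up holding two different pixel scales. The sketch below shows one possible way to bring both onto a 0 to 255 scale. The helper `to_255_scale` is hypothetical, and its test for "already in 0 to 1" is a heuristic assumption rather than something the notebook does.

```python
import numpy as np

def to_255_scale(img):
    """Return img as float64 on a 0-255 scale (hypothetical helper).

    Assumes a floating-point image whose maximum is <= 1.0 is on a 0-1 scale.
    """
    arr = np.asarray(img)
    if np.issubdtype(arr.dtype, np.floating) and arr.max() <= 1.0:
        arr = arr * 255.0
    return arr.astype(np.float64)

# Stand-ins for what imread returns for each file type
png_like = np.full((2, 2, 3), 0.5, dtype=np.float32)   # PNG: float32 in [0, 1]
jpg_like = np.full((2, 2, 3), 200, dtype=np.uint8)     # JPEG: uint8 in [0, 255]

print(to_255_scale(png_like).max())  # 127.5
print(to_255_scale(jpg_like).max())  # 200.0
```

The same consideration applies at prediction time: the test images and video frames are JPEG-style 0 to 255 data, so the training images should be on the scale the classifier will see later.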

In [8]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.05, random_state=42)

In [9]:
X_train.shape

(19076, 64, 64, 3)

In [11]:
from keras.applications.mobilenet import *
from keras.applications.imagenet_utils import preprocess_input
from keras.models import Model

In [13]:
base_model = MobileNet(weights='imagenet')

In [15]:
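# Keras 2 uses the plural keyword names inputs/outputs.
# The numeric suffix in the pooling layer name can change with how many models have been
# built in the session; check base_model.summary() if this lookup fails.
mobilenet_model = Model(inputs=base_model.input, outputs=base_model.get_layer('global_average_pooling2d_2').output)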

In [33]:
def single_img_features(single_image):
    """
    Convert to image features using mobilenet
    """
    single_image = cv2.resize(single_image, (224, 224)) 
    x = keras.preprocessing.image.img_to_array(single_image)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    
    return x

In [34]:
image = mpimg.imread("test_images/test6.jpg")

In [37]:
# change the extract_features to a scikit learn compatible pipeline
class FeatureCreator(BaseEstimator, TransformerMixin):
    def fit(self, x, y=None):
        return self
    
    def transform(self, X):
        all_images = np.vstack([single_img_features(single_x)
                for single_x in X])
        return mobilenet_model.predict(all_images)
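
`transform` stacks every resized image into one array before calling `predict`. For the roughly 19,000 training images at 224x224x3 in 32-bit floats, that is on the order of 11 GB. The sketch below shows one possible alternative that processes the images in chunks. The function `featurize_in_chunks` and its parameters are hypothetical, and the two lambdas are stand-ins for `single_img_features` and `mobilenet_model.predict` so the example runs without Keras.

```python
import numpy as np

def featurize_in_chunks(images, prepare_fn, predict_fn, chunk_size=256):
    """Apply prepare_fn per image and predict_fn per chunk, then stack the results."""
    outputs = []
    for start in range(0, len(images), chunk_size):
        chunk = images[start:start + chunk_size]
        # Each prepared image has a leading batch axis, so vstack builds the batch
        batch = np.vstack([prepare_fn(img) for img in chunk])
        outputs.append(predict_fn(batch))
    return np.vstack(outputs)

# Stand-ins: add a batch axis, then reduce each image to its per-channel mean
prepare = lambda img: np.expand_dims(img, axis=0)
predict = lambda batch: batch.mean(axis=(1, 2))

images = np.random.rand(10, 64, 64, 3)
features = featurize_in_chunks(images, prepare, predict, chunk_size=4)
print(features.shape)  # (10, 3)
```

Only one chunk of prepared images is held in memory at a time. The output has one row per input image, the same as a single large `predict` call would give.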

In [None]:
# Use a linear SVC 
model = Pipeline([
    ('feats', FeatureUnion([('create_feats', FeatureCreator())])), 
    ('scaler', StandardScaler()),
    ('linear svm', LinearSVC())])
model.fit(X_train, y_train)

In [None]:
yhat_train = model.predict(X_train)
metric = accuracy_score(y_train, yhat_train)
print("Accuracy Train Rate: {}".format(metric))

In [None]:
yhat_test = model.predict(X_test)
metric = accuracy_score(y_test, yhat_test)
print("Accuracy Test Rate: {}".format(metric))

Find cars in a box...

Goal: make a bunch of rectangles and return all rectangles which look like a car. 

In [None]:
def search_windows_pipeline(img, windows, model):
    #1) Create an empty list to receive positive detection windows
    on_windows = []
    #2) Iterate over all windows in the list
    for window in windows:
        #3) Extract the test window from original image
        test_img = cv2.resize(img[window[0][1]:window[1][1], window[0][0]:window[1][0]], (64, 64))      
        # feed into pipeline
        prediction = model.predict([test_img])
        #7) If positive (prediction == 1) then save the window
        if prediction == 1:
            on_windows.append(window)
    #8) Return windows for positive detections
    return on_windows

In [None]:
image = mpimg.imread("test_images/test6.jpg")
windows = slide_window(image, x_start_stop=[None, None], y_start_stop=[350, 500], 
                    xy_window=(64, 64), xy_overlap=(0.0, 0.0))

hot_windows = search_windows_pipeline(image, windows, model)  
window_img = draw_boxes(image, hot_windows, color=(0, 0, 255), thick=6)                    
plt.imshow(window_img)

In [None]:
windows = slide_window(image, x_start_stop=[None, None], y_start_stop=[350, 500], 
                    xy_window=(96, 96), xy_overlap=(0.5, 0.5))

hot_windows = search_windows_pipeline(image, windows, model)  
window_img = draw_boxes(image, hot_windows, color=(0, 0, 255), thick=6)                    
plt.imshow(window_img)

In [None]:
windows = slide_window(image, x_start_stop=[None, None], y_start_stop=[400, 550], 
                    xy_window=(128, 128), xy_overlap=(0.6, 0.6))

hot_windows = search_windows_pipeline(image, windows, model)  
window_img = draw_boxes(image, hot_windows, color=(0, 0, 255), thick=6)                    
plt.imshow(window_img)

In [None]:
windows = slide_window(image, x_start_stop=[None, None], y_start_stop=[400, 720], 
                    xy_window=(192, 192), xy_overlap=(0.7, 0.7))

hot_windows = search_windows_pipeline(image, windows, model)  
window_img = draw_boxes(image, hot_windows, color=(0, 0, 255), thick=6)                    
plt.imshow(window_img)

In [None]:
# this one suggests we need to have some limits on x start stop!
windows = slide_window(image, x_start_stop=[None, None], y_start_stop=[400, 720], 
                    xy_window=(256, 256), xy_overlap=(0.8, 0.8))

hot_windows = search_windows_pipeline(image, windows, model)  
window_img = draw_boxes(image, hot_windows, color=(0, 0, 255), thick=6)                    
plt.imshow(window_img)

In [None]:
# Adding limits here appears to solve it - the idea is that a car that is close to you shouldn't be in the same "lane" 
# otherwise it might mean you're about to crash! - also it is clear that it is 
# picking up the very bottom of the black car - which might trigger a higher incidence
# of false positives
windows = (slide_window(image, x_start_stop=[1000, None], y_start_stop=[400, 720], 
                        xy_window=(256, 256), xy_overlap=(0.8, 0.8)) +
              slide_window(image, x_start_stop=[0, 280], y_start_stop=[400, 720], 
                        xy_window=(256, 256), xy_overlap=(0.8, 0.8)))

hot_windows = search_windows_pipeline(image, windows, model)  
window_img = draw_boxes(image, hot_windows, color=(0, 0, 255), thick=6)                    
plt.imshow(window_img)

In [None]:
windows = slide_window(image, x_start_stop=[None, None], y_start_stop=[400, 550], 
                    xy_window=(96, 96), xy_overlap=(0.5, 0.5))

hot_windows = search_windows_pipeline(image, windows, model)  
window_img = draw_boxes(image, hot_windows, color=(0, 0, 255), thick=6)                    

# let's draw the heat map...
heat = np.zeros_like(image[:,:,0])
heat = add_heat(heat, hot_windows)    
# Apply threshold to help remove false positives
heat = apply_threshold(heat,1)
# Visualize the heatmap when displaying    
heatmap = np.clip(heat, 0, 255)

_, ((ax1, ax2)) = plt.subplots(1, 2, figsize=(15,7))
ax1.imshow(window_img)
ax2.imshow(heatmap, cmap='hot')

In [None]:
# combine all windows and draw the heat map
windows = (
#slide_window(image, x_start_stop=[None, None], y_start_stop=[400, 550], 
#    xy_window=(64, 64), xy_overlap=(0.0, 0.0)) +
#slide_window(image, x_start_stop=[None, None], y_start_stop=[400, 550], 
#    xy_window=(96, 96), xy_overlap=(0.5, 0.5)) +
slide_window(image, x_start_stop=[None, None], y_start_stop=[400, 550], 
    xy_window=(128, 128), xy_overlap=(0.3, 0.3)) +
slide_window(image, x_start_stop=[None, None], y_start_stop=[400, 720], 
    xy_window=(192, 192), xy_overlap=(0.5, 0.5)) +
slide_window(image, x_start_stop=[1000, None], y_start_stop=[400, 720], 
    xy_window=(256, 256), xy_overlap=(0.5, 0.5)) +
slide_window(image, x_start_stop=[0, 280], y_start_stop=[400, 720], 
    xy_window=(256, 256), xy_overlap=(0.5, 0.5))
)
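# NOTE: this assignment overrides the multi-scale list built above,
# so only this single-scale search is actually used in this cell
windows = (slide_window(image, x_start_stop=[730, 1280], y_start_stop=[380, 550], 
                       xy_window=(96, 96), xy_overlap=(0.75, 0.75)))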

hot_windows = search_windows_pipeline(image, windows, model)  
window_img = draw_boxes(image, hot_windows, color=(0, 0, 255), thick=6)                    

# let's draw the heat map...
heat = np.zeros_like(image[:,:,0])
heat = add_heat(heat, hot_windows)    
# Apply threshold to help remove false positives
heat = apply_threshold(heat,1)
# Visualize the heatmap when displaying    
heatmap = np.clip(heat, 0, 255)

_, ((ax1, ax2)) = plt.subplots(1, 2, figsize=(15,7))
ax1.imshow(window_img)
ax2.imshow(heatmap, cmap='hot')

In [None]:
# apply the label as suggested...
labels = label(heatmap)
draw_img = draw_labeled_bboxes(np.copy(image), labels)
# label() returns (labelled_array, num_features); the car count is the second item
print("{} cars found".format(labels[1]))
plt.imshow(draw_img)
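
`scipy.ndimage.label` returns a tuple: an integer array in which each connected blob carries its own label value, and the number of blobs found. The sketch below builds such a tuple by hand (an assumption made so the example needs only NumPy) and extracts one bounding box per label, the same way `draw_labeled_bboxes` does.

```python
import numpy as np

# Hand-built stand-in for the output of label(): (labelled array, number of features)
labelled = np.array([[0, 1, 1, 0],
                     [0, 1, 1, 0],
                     [0, 0, 0, 2]])
labels = (labelled, 2)

bboxes = []
for car_number in range(1, labels[1] + 1):
    # Row (y) and column (x) indices of the pixels carrying this label
    ys, xs = (labels[0] == car_number).nonzero()
    bboxes.append(((int(xs.min()), int(ys.min())), (int(xs.max()), int(ys.max()))))

print(bboxes)  # [((1, 0), (2, 1)), ((3, 2), (3, 2))]
```

Because `labels` is always a two-item tuple, the number of detected cars is `labels[1]` rather than `len(labels)`.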

Run a full pipeline!

In [None]:
def process_image(image, model=model):
    # process everything based on the model pipeline used above
    
    xss = [730, 1280]
    yss = [380, 550]
    windows = (
    slide_window(image, x_start_stop=xss, y_start_stop=yss, 
        xy_window=(64, 64), xy_overlap=(0.6, 0.6)) +
    slide_window(image, x_start_stop=xss, y_start_stop=yss, 
        xy_window=(96, 96), xy_overlap=(0.75, 0.75)) 
    #slide_window(image, x_start_stop=xss, y_start_stop=yss, 
    #    xy_window=(128, 128), xy_overlap=(0.6, 0.6)) +
    #slide_window(image, x_start_stop=xss, y_start_stop=yss, 
    #    xy_window=(192, 192), xy_overlap=(0.7, 0.7)) +
    #slide_window(image, x_start_stop=[1000, None], y_start_stop=yss, 
    #    xy_window=(256, 256), xy_overlap=(0.0, 0.0)) +
    #slide_window(image, x_start_stop=[0, 280], y_start_stop=yss, 
    #    xy_window=(256, 256), xy_overlap=(0.0, 0.0))
    )


    hot_windows = search_windows_pipeline(image, windows, model)  
    window_img = draw_boxes(image, hot_windows, color=(0, 0, 255), thick=6)                    

    # let's draw the heat map...
    heat = np.zeros_like(image[:,:,0])
    heat = add_heat(heat, hot_windows)    
    # Apply threshold to help remove false positives
    heat = apply_threshold(heat,1)
    # Visualize the heatmap when displaying    
    heatmap = np.clip(heat, 0, 255)

    labels = label(heatmap)
    draw_img = draw_labeled_bboxes(np.copy(image), labels)    
    return draw_img
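
The project goals mention a heat map of recurring detections from frame to frame, whereas `process_image` thresholds each frame on its own. A minimal sketch of one way to accumulate heat over the last few frames is shown below. The class `HeatHistory` and the choice of three frames are hypothetical and are not part of the pipeline above.

```python
from collections import deque
import numpy as np

class HeatHistory:
    """Keep the last n per-frame heat maps and return their sum (hypothetical helper)."""

    def __init__(self, n_frames=5):
        # deque with maxlen drops the oldest frame automatically
        self.frames = deque(maxlen=n_frames)

    def update(self, heat):
        # Widen the dtype so summing several uint8 maps cannot overflow
        self.frames.append(heat.astype(np.int32))
        return np.stack(list(self.frames)).sum(axis=0)

history = HeatHistory(n_frames=3)
for i in range(4):
    heat = np.zeros((2, 2), dtype=np.uint8)
    heat[1, 1] = 1          # detection present in every frame
    if i == 0:
        heat[0, 0] = 1      # detection that appears in one frame only
    total = history.update(heat)

print(total)
# [[0 0]
#  [0 3]]
```

A threshold applied to the summed map then keeps detections that persist across frames and drops ones that appear only briefly. If this were wired into a video pipeline, the threshold value would need to scale with the number of frames kept.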

In [None]:
plt.imshow(process_image(image, model))

In [None]:
# run for all images in test
test_images = glob.glob('./test_images/test*.jpg')

fig, axs = plt.subplots(3, 2, figsize=(16,14))
axs = axs.ravel()

for i, im in enumerate(test_images):
    axs[i].imshow(process_image(mpimg.imread(im)))

In [36]:
#test_out_file = 'test_video_20170813.mp4'
#clip_test = VideoFileClip('test_video.mp4')
#clip_test_out = clip_test.fl_image(process_image)
#%time clip_test_out.write_videofile(test_out_file, audio=False)

[MoviePy] >>>> Building video test_video_20170813.mp4
[MoviePy] Writing video test_video_20170813.mp4


 97%|███████████████████████████████████████████████████████████████████████████████▉  | 38/39 [00:18<00:00,  2.11it/s]


[MoviePy] Done.
[MoviePy] >>>> Video ready: test_video_20170813.mp4 

Wall time: 20.6 s


In [37]:
#test_out_file = 'project_video_20170813.mp4'
#clip_test = VideoFileClip('project_video.mp4')
#clip_test_out = clip_test.fl_image(process_image)
#%time clip_test_out.write_videofile(test_out_file, audio=False)

[MoviePy] >>>> Building video project_video_20170813.mp4
[MoviePy] Writing video project_video_20170813.mp4


100%|█████████████████████████████████████████████████████████████████████████████▉| 1260/1261 [11:59<00:00,  1.71it/s]


[MoviePy] Done.
[MoviePy] >>>> Video ready: project_video_20170813.mp4 

Wall time: 12min 2s
