# Self-Driving Car Engineer Nanodegree


## Project: **Vehicle Detection** 
***

In this project, a classic object detection framework is implemented and used for vehicle detection: a sliding-window, image-pyramid region proposer combined with a HOG-feature, linear-SVM detector.
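
As a rough illustration of the region proposer, the sketch below enumerates pyramid scales and sliding-window boxes for a frame. The function names, the 1.5 scale factor, and the 16-pixel step are assumptions made for this example only; they are not the settings used by `SlidingWindowPyramidDetector`.

```python
def pyramid_scales(image_shape, window=(64, 64), scale=1.5):
    """Yield downscaling factors while the scaled image still fits one window."""
    height, width = image_shape[:2]
    factor = 1.0
    while height / factor >= window[0] and width / factor >= window[1]:
        yield factor
        factor *= scale

def sliding_windows(image_shape, window=(64, 64), step=16):
    """Yield (top, bottom, left, right) boxes that lie fully inside the image."""
    height, width = image_shape[:2]
    for top in range(0, height - window[0] + 1, step):
        for left in range(0, width - window[1] + 1, step):
            yield (top, top + window[0], left, left + window[1])

# Hypothetical 128 x 192 frame:
frame_shape = (128, 192)
scales = list(pyramid_scales(frame_shape))
boxes = list(sliding_windows(frame_shape))
print(len(scales), len(boxes))
```

Each box would be cropped at every pyramid level and passed to the classifier; boxes found on a downscaled level are mapped back to the original frame by multiplying their coordinates by that level's factor.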

---

## Set Up Session

In [1]:
# Configuration file:
from vehicle_detection.utils.conf import Conf
# IO utilities:
import random
import glob
import matplotlib.image as mpimg
from vehicle_detection.utils.dataset import to_hdf5, read_hdf5
import pickle
# Image processing:
import numpy as np
import cv2
from vehicle_detection.extractors import ReshapeTransformer, ColorHistogramTransformer, HOGTransformer, TemplateTransformer
# Visualization:
import matplotlib.pyplot as plt
%matplotlib inline

## Load Configuration

In [2]:
conf = Conf("conf/vehicles.json")

## Explore Dataset
***

First, let's explore the dataset used to build the vehicle classifier.

Viewing samples from the dataset shows that **all the cars in the images have been clearly segmented**. Thus **HOG features can be extracted directly from each input image**.

We still need to **set the window size for the HOG extractor**. The dataset composition should also be evaluated (e.g., whether the dataset is imbalanced) in order to select a suitable algorithm for building the classifier.

Both statistics can be obtained with the following code:

In [None]:
# Vehicle images:
vehicle_filenames = glob.glob(conf.vehicle_dataset)
print(
    "[  Vehicle Images  ]: Num--{}, Dimensions--{}".format(
        len(vehicle_filenames),
        np.array(
            [mpimg.imread(vehicle_filename).shape for vehicle_filename in vehicle_filenames]
        ).mean(axis = 0)
    )
)
# Non-vehicle images:
non_vehicle_filenames = glob.glob(conf.non_vehicle_dataset)
print(
    "[Non-Vehicle Images]: Num--{}, Dimensions--{}".format(
        len(non_vehicle_filenames),
        np.array(
            [mpimg.imread(non_vehicle_filename).shape for non_vehicle_filename in non_vehicle_filenames]
        ).mean(axis = 0)
    )
)

From the above output we know that:

**1. The window size for the HOG extractor should be set to 64-by-64;**

**2. There are 8792 positive images and 8968 negative images in the training dataset, so the dataset is approximately balanced.**
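
A quick check of the class balance, using only the two counts quoted above (the variable names are mine):

```python
# Image counts reported by the exploration cell above.
n_vehicle, n_non_vehicle = 8792, 8968

total = n_vehicle + n_non_vehicle
positive_fraction = n_vehicle / total

print("total: {}, positive fraction: {:.3f}".format(total, positive_fraction))
```

A positive fraction this close to 0.5 is why plain accuracy is a reasonable scoring metric here, with no need for class re-weighting.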

---

Next, let's try to identify the best color space for vehicle and non-vehicle color feature extraction.

In [None]:
# Set up session:
from vehicle_detection.detectors.image_processing import resize
from vehicle_detection.utils.visualization import plot_3d

In [None]:
# Utilities for color space exploration:
def parse_conversion(color_space):
    """ Map a color space name to its OpenCV conversion code and channel labels
    """
    if color_space == "HSV":
        return (cv2.COLOR_BGR2HSV, ("H", "S", "V"))
    elif color_space == "Lab":
        return (cv2.COLOR_BGR2Lab, ("L*", "a*", "b*"))
    else:
        return (cv2.COLOR_BGR2RGB, ("R", "G", "B"))

def plot_pixel_distribution(image_filename, color_space):
    # Read:
    image_BGR = cv2.imread(image_filename)    
    
    # Parse conversion:
    (conversion, channels) = parse_conversion(color_space)

    # Convert image to desired color space(s):
    img_RGB = cv2.cvtColor(image_BGR, cv2.COLOR_BGR2RGB)  # OpenCV uses BGR, matplotlib likes RGB
    img_color_space = cv2.cvtColor(image_BGR, conversion)
    colors = img_RGB / 255.  # scaled to [0, 1], only for plotting

    # Plot and show:
    plot_3d(img_color_space, colors, axis_labels=channels)
    plt.show()

def explore_pixel_distribution(vehicle_filenames, non_vehicle_filenames, color_space):
    """ Plot pixel distributions of one random vehicle and one random non-vehicle image
    """
    # Vehicles:
    plot_pixel_distribution(random.choice(vehicle_filenames), color_space)
    # Non-vehicles:
    plot_pixel_distribution(random.choice(non_vehicle_filenames), color_space)

In [None]:
explore_pixel_distribution(vehicle_filenames, non_vehicle_filenames, "HSV")

## Build Training Dataset

***

Now let's build the dataset for training the vehicle classifier.

I have wrapped skimage's `hog` descriptor in a class, `HOGTransformer`, that complies with the sklearn Pipeline interface.

Based on previous experience, I chose the following parameters for the HOG descriptor:

    1. orientations: 9
    2. pixels_per_cell: (4, 4)
    3. cells_per_block: (2, 2)
    4. transform_sqrt: True, use sqrt normalization
    5. block_norm: L1
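
To see what these parameters imply for the feature dimension, the sketch below counts HOG features for one 64-by-64 window. It assumes a single-channel image and a block stride of one cell, which is how skimage's `hog` lays out its blocks; `hog_feature_length` is a helper written for this example, not part of the project code.

```python
def hog_feature_length(window, pixels_per_cell, cells_per_block, orientations):
    """Number of HOG features for one single-channel window (block stride = 1 cell)."""
    cells_y = window[0] // pixels_per_cell[0]
    cells_x = window[1] // pixels_per_cell[1]
    # Blocks slide one cell at a time, so there are (cells - block + 1) positions per axis.
    blocks_y = cells_y - cells_per_block[0] + 1
    blocks_x = cells_x - cells_per_block[1] + 1
    return blocks_y * blocks_x * cells_per_block[0] * cells_per_block[1] * orientations

length = hog_feature_length((64, 64), (4, 4), (2, 2), 9)
print(length)  # 15 * 15 * 2 * 2 * 9 = 8100
```

This agrees with the 8100 columns of the training matrix reported in the classifier section.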

The extracted dataset will be saved to the local file system as an HDF5 file for easy later access.

Below are the helper functions for sampling and loading images. Simple augmentation through horizontal flipping is implemented to generate more training data.

In [None]:
# Utilities:
def downsample(
    image_filenames, 
    sampling_percentange
):
    """ Randomly sample a fraction of the image files, without replacement
    """
    # Down-sample (replace=False so that no file is drawn twice):
    image_filenames = np.random.choice(
        image_filenames, 
        int(sampling_percentange * len(image_filenames)),
        replace=False
    )
    
    return image_filenames

def load_images(
    image_filenames,
    image_size,
    augmentation=True
):
    """ Load and resize images, optionally adding horizontally flipped copies
    """
    features = []
    
    # Load images:
    for image_filename in image_filenames:
        # Load (BGR) and resize to the HOG window size:
        object_image = cv2.resize(
            cv2.imread(image_filename),
            image_size,
            interpolation = cv2.INTER_AREA
        )
        # Prepare ROIs (original plus horizontal mirror when augmenting):
        ROIs = (object_image, cv2.flip(object_image, 1)) if augmentation else (object_image,)
        # Collect ROIs:
        for ROI in ROIs:
            features.append(ROI)
    
    return features
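
The flipping augmentation used in `load_images` mirrors each image about its vertical axis (`cv2.flip` with flip code 1). A NumPy-only sketch of the same operation on a tiny made-up array:

```python
import numpy as np

# Tiny hypothetical H x W x C "image" with distinct values in every pixel.
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# Reversing the column axis mirrors the image left to right.
flipped = image[:, ::-1]

print(image[0, 0], flipped[0, 2])
```

With augmentation enabled, every source image contributes two samples, which is why the training set has twice as many rows as there are image files.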

In [None]:
# Should dataset be extracted:
if conf.generate_dataset:
    # Load images:
    vehicle_images = load_images(
        downsample(vehicle_filenames, sampling_percentange=conf.sampling_percentange),
        tuple(conf.hog_window_size),
        conf.augmentation
    )
    non_vehicle_images = load_images(
        downsample(non_vehicle_filenames, sampling_percentange=conf.sampling_percentange),
        tuple(conf.hog_window_size),
        conf.augmentation
    )
    # Training set:
    X_train = np.array(vehicle_images + non_vehicle_images)
    y_train = np.array([1] * len(vehicle_images) + [-1] * len(non_vehicle_images))
    indices = np.arange(len(X_train))
    np.random.shuffle(indices)
    X_train, y_train = X_train[indices], y_train[indices]
    # Shape:
    X_train = X_train.reshape(tuple(conf.shape_serialized))
    # Dataset info:
    print(X_train.shape)
    print(y_train.shape)

## Build Classifier

***

Here I choose linear models, i.e., logistic regression and linear SVM (for example via `LinearSVC` or `SGDClassifier`), because the training dataset is large: its shape is (35520, 8100). Using a kernel `SVC` would make training very slow.

In [None]:
# Cross validation:
from sklearn.model_selection import StratifiedShuffleSplit
# Classifier:
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import SGDClassifier
from sklearn.calibration import CalibratedClassifierCV
# Evaluation metric:
from sklearn.metrics import accuracy_score
from sklearn.metrics import make_scorer
# Hyperparameter tuning:
from sklearn.model_selection import GridSearchCV

In [None]:
# Model 1--Linear SVC:
def get_linear_svc():
    # Model:
    model = Pipeline(
        [
            # Deserializer:
            ('des', ReshapeTransformer(conf.shape_deserialized)),
            # Feature extractor:
            ('vec', FeatureUnion(
                [
                    ("hog", HOGTransformer(
                        color_space = conf.hog_color_space,
                        shape_only = conf.hog_shape_only,
                        orientations = conf.hog_orientations,
                        pixels_per_cell = tuple(conf.hog_pixels_per_cell),
                        cells_per_block = tuple(conf.hog_cells_per_block),
                        transform_sqrt = conf.hog_normalize,
                        block_norm = str(conf.hog_block_norm)
                    )),
                ]
            )),
            # Preprocessor:
            ('scl', StandardScaler()),
            # Classifier:
            ('clf', LinearSVC(
                penalty='l2', 
                loss=conf.classifier_loss,
                C=conf.classifier_C,
                max_iter=2000
            ))
        ]
    )

    # Hyperparameters:
    params = {
        # VEC--hog:
        #"vec__hog__pixels_per_cell": ((8,8), (16, 16)),
        # CLF--learning rate:
        #"clf__loss": ("hinge", "squared_hinge"),
        # CLF--regularization:
        #"clf__penalty": ("l1", "l2")
        "clf__C": (5e-4, 1e-3)
    }
    
    return (model, params)

In [None]:
# Model 2--XGBoost:
def get_xgboost():
    # Model:
    model = Pipeline(
        [
            # Deserializer:
            ('des', ReshapeTransformer(conf.shape_deserialized)),
            # Feature extractor:
            ('vec', FeatureUnion(
                [
                    # 2. Shape--HOG:
                    ("hog", HOGTransformer(
                        color_space = conf.hog_color_space,
                        shape_only = conf.hog_shape_only,
                        orientations = conf.hog_orientations,
                        pixels_per_cell = tuple(conf.hog_pixels_per_cell),
                        cells_per_block = tuple(conf.hog_cells_per_block),
                        transform_sqrt = conf.hog_normalize,
                        block_norm = str(conf.hog_block_norm)
                    )),
                ]
            )),
            # Preprocessor:
            ('scl', StandardScaler()),
            # Classifier:
            ('clf', XGBClassifier(
                max_depth=8, 
                learning_rate=0.1, 
                n_estimators=1024,
                nthread=4
            ))
        ]
    )

    # Hyperparameters:
    params = {
        # VEC--hog:
        #"vec__hog__pixels_per_cell": ((8,8), (16, 16)),
        # CLF--learning rate:
        #"clf__learning_rate": (0.1, 0.3),
    }
    
    return (model, params)

In [None]:
# Model 3--Logistic regression:
def get_logistic():
    # Model:
    model = Pipeline(
        [
            # Deserializer:
            ('des', ReshapeTransformer(conf.shape_deserialized)),
            # Feature extractor:
            ('vec', FeatureUnion(
                [
                    # 2. Shape--HOG:
                    ("hog", HOGTransformer(
                        color_space = conf.hog_color_space,
                        shape_only = conf.hog_shape_only,
                        orientations = conf.hog_orientations,
                        # Optimal--(8, 8):
                        pixels_per_cell = tuple(conf.hog_pixels_per_cell),
                        cells_per_block = tuple(conf.hog_cells_per_block),
                        # Optimal--True:
                        transform_sqrt = conf.hog_normalize,
                        block_norm = str(conf.hog_block_norm)
                    )),
                ]
            )),
            # Preprocessor:
            ('scl', StandardScaler()),
            # Classifier:
            ('clf', LogisticRegression(
                penalty='l2', 
                C=1.0,
                n_jobs=4 
            ))
        ]
    )

    # Hyperparameters:
    params = {
        # VEC--hog:
        #"vec__hog__pixels_per_cell": ((8,8), (16, 16)),
        # CLF--learning rate:
        #"clf__loss": ("hinge", "squared_hinge"),
        # CLF--regularization:
        #"clf__penalty": ("l1", "l2")
        "clf__C": (1e-3, 1e-1)
    }
    
    return (model, params)

In [None]:
# Create cross-validation sets from the training data
cv_sets_training = StratifiedShuffleSplit(
    n_splits = 3, 
    test_size = 0.20, 
    random_state = 42
).split(X_train, y_train)

# Model 1: Linear SVC
(model, params) = get_linear_svc()
# Model 2: XGBoost:
#(model, params) = get_xgboost()
# Model 3: Logistic
#(model, params) = get_logistic()

# Make a scorer object
scorer = make_scorer(accuracy_score)

# Perform grid search on the classifier using 'scorer' as the scoring method
grid_searcher = GridSearchCV(
    estimator = model,
    param_grid = params,
    scoring = scorer,
    cv = cv_sets_training,
    n_jobs = 2,
    verbose = 10
)

# Fit the grid search object to the training data and find the optimal parameters
grid_fitted = grid_searcher.fit(X_train, y_train)

# Get parameters & scores (grid_scores_ was removed from scikit-learn; use best_params_ / best_score_):
best_parameters, score = grid_fitted.best_params_, grid_fitted.best_score_

# Display result:
print(
    "[Best Parameters]: {}\n[Best Score]: {}".format(
        best_parameters, score
    )
)

In [None]:
print("[Train & Calibrate Best Model]: ...")
# Get the best model
best_model = grid_fitted.best_estimator_
best_model.set_params(**best_parameters)

# Train on whole dataset with best parameters and probability calibration:
best_model_calibrated = CalibratedClassifierCV(best_model, cv=3)
best_model_calibrated.fit(X_train, y_train)
print("[Train & Calibrate Best Model]: Done.")

# Save model:
with open(conf.classifier_path, 'wb') as model_pkl:
    pickle.dump(best_model_calibrated, model_pkl)

## Vehicle Detection

In [3]:
# Set up session:
from vehicle_detection.detectors import SlidingWindowPyramidDetector
from vehicle_detection.detectors import non_maxima_suppression
from vehicle_detection.detectors import heatmap_filtering

### Create Detector:

In [4]:
# Initialize detector:
detector = SlidingWindowPyramidDetector(
    conf
)

### Test on Static Images

In [5]:
# Utilities:
def detect_vehicle(image, detector, heat_thresh=None):    
    # Detect:
    bounding_boxes = detector.detect(
        image
    )
    
    # Heatmap filtering:
    if heat_thresh is not None:
        bounding_boxes = heatmap_filtering(image, bounding_boxes, heat_thresh)
        
    # Draw:
    canvas = image.copy()
    for bounding_box in bounding_boxes:
        (top, bottom, left, right) = bounding_box
        cv2.rectangle(
            canvas,
            (left, top), (right, bottom),
            (0, 255, 0),
            6
        )
        
    return canvas
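
The source of `heatmap_filtering` is not shown in this notebook, so the following is only a sketch of the general idea, using the same `(top, bottom, left, right)` box convention and the same `<=` threshold rule as the video frame processor further down. The helper names and box coordinates are made up for illustration.

```python
import numpy as np

def build_heatmap(frame_shape, bounding_boxes):
    """Add one unit of heat to every pixel covered by each box."""
    heatmap = np.zeros(frame_shape[:2], dtype=int)
    for (top, bottom, left, right) in bounding_boxes:
        heatmap[top:bottom, left:right] += 1
    return heatmap

def threshold_heatmap(heatmap, heat_thresh):
    """Zero out pixels whose heat does not exceed the threshold."""
    filtered = heatmap.copy()
    filtered[filtered <= heat_thresh] = 0
    return filtered

# Three overlapping detections plus one isolated (likely false positive) box:
boxes = [(10, 40, 10, 40), (20, 50, 20, 50), (25, 55, 15, 45), (70, 90, 70, 90)]
heatmap = build_heatmap((100, 100), boxes)
hot = threshold_heatmap(heatmap, heat_thresh=2)

rows, cols = hot.nonzero()
print(rows.min(), rows.max(), cols.min(), cols.max())
```

Only the region covered by all three overlapping boxes survives, and the isolated box is discarded. A connected-component labelling step would then turn each surviving region into one output box.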

In [15]:
# Set up session:
from os.path import join, basename, splitext

for image_filename in glob.glob(conf.test_dataset)[-1:]:
    # Load:
    image = cv2.imread(image_filename)
    
    # Detect:
    image_raw = detect_vehicle(image, detector, None)
    image_filtered = detect_vehicle(image, detector, 2)  # alternatively: conf.heat_thresh
    
    # Save:
    name, ext = splitext(basename(image_filename))
    for process_type, image_processed in zip(("raw", "filtered"), (image_raw, image_filtered)):
        cv2.imwrite(
            join(
                conf.output_path, 
                "{}-{}{}".format(
                    name,
                    process_type,
                    ext
                )
            ),
            image_processed
        )
    
    print("[{}]: Done".format(name))

## Test on Videos

In [11]:
# Import everything needed to edit/save/watch video clips
from moviepy.editor import VideoFileClip
from scipy.ndimage import label
from collections import deque
from multiprocessing import Pool
from moviepy.editor import concatenate_videoclips
from IPython.display import HTML

In [7]:
# Static variable decorator:
def static_vars(**kwargs):
    def decorate(func):
        for k in kwargs:
            setattr(func, k, kwargs[k])
        return func
    return decorate

# Frame processor:
@static_vars(
    TEMPORAL_FILTER_LEN=conf.spatial_filtering_filter_len,
    bounding_boxes_queue=deque(), 
    heatmap_accumulator = np.zeros(
        tuple(conf.spatial_filtering_frame_size), 
        dtype=int
    )
)
def process_frame(frame):
    """ Detect vehicles in given frame
    """
    # Format:
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    
    # Detect:
    bounding_boxes_current = detector.detect(frame)
    
    # Spatial filtering:
    bounding_boxes_current = heatmap_filtering(
        frame, 
        bounding_boxes_current, 
        conf.heat_thresh
    )

    # Temporal filtering:
    if len(process_frame.bounding_boxes_queue) == process_frame.TEMPORAL_FILTER_LEN:
        # Remove left one:
        for bounding_box in process_frame.bounding_boxes_queue.popleft():
            (top, bottom, left, right) = bounding_box
            process_frame.heatmap_accumulator[top:bottom, left:right] -= 1
    
    # Append:
    process_frame.bounding_boxes_queue.append(bounding_boxes_current)
        
    # Aggregate heat:
    for bounding_box in bounding_boxes_current:
        (top, bottom, left, right) = bounding_box
        process_frame.heatmap_accumulator[top:bottom, left:right] += 1
    
    # Filter:
    heatmap = process_frame.heatmap_accumulator.copy()
    heat_thresh = int(0.8 * len(process_frame.bounding_boxes_queue))
    heatmap[heatmap <= heat_thresh] = 0

    # Label it:
    labelled, num_components = label(heatmap)

    # Identify external bounding boxes:
    bounding_boxes_filtered = []
    for component_id in range(1, num_components + 1):
        # Find pixels belonging to the current component
        nonzero = (labelled == component_id).nonzero()
        # Identify x and y values of those pixels
        nonzero_y, nonzero_x = np.array(nonzero[0]), np.array(nonzero[1])
        # Define a bounding box based on min/max x and y
        bounding_boxes_filtered.append(
            (
                np.min(nonzero_y),
                np.max(nonzero_y),
                np.min(nonzero_x),
                np.max(nonzero_x)
            )
        )
    
    # Draw:
    for bounding_box in bounding_boxes_filtered:
        (top, bottom, left, right) = bounding_box
        cv2.rectangle(
            frame,
            (left, top), (right, bottom),
            (0, 255, 0),
            6
        )
        
    return cv2.resize(
        cv2.cvtColor(frame, cv2.COLOR_BGR2RGB),
        (960, 540)
    )    
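
The temporal filter in `process_frame` keeps its state in function attributes. As an alternative sketch of the same bookkeeping, the hypothetical `RollingHeatmap` class below holds the queue and the accumulator in an object, which would make it easy to start from a clean state for each new video. It is an illustration, not the code used to produce the results in this notebook.

```python
from collections import deque
import numpy as np

class RollingHeatmap:
    """Accumulate box heat over the most recent max_len frames."""

    def __init__(self, frame_shape, max_len):
        self.queue = deque()
        self.max_len = max_len
        self.heat = np.zeros(frame_shape, dtype=int)

    def update(self, bounding_boxes):
        # Once the queue is full, subtract the heat of the oldest frame.
        if len(self.queue) == self.max_len:
            for (top, bottom, left, right) in self.queue.popleft():
                self.heat[top:bottom, left:right] -= 1
        # Add the heat of the current frame.
        self.queue.append(bounding_boxes)
        for (top, bottom, left, right) in bounding_boxes:
            self.heat[top:bottom, left:right] += 1
        return self.heat

rolling = RollingHeatmap((4, 4), max_len=2)
rolling.update([(0, 2, 0, 2)])
rolling.update([(0, 2, 0, 2)])
heat = rolling.update([(2, 4, 2, 4)])  # the first frame drops out of the queue here
print(heat)
```

Because old frames are subtracted as new ones arrive, the accumulator always equals the sum over the frames currently in the queue, without recomputing that sum for every frame.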

In [8]:
def video_process_worker(worker_id):
    # Specify input & output:
    input_filename = video_project_input
    output_filename = video_project_output.format(worker_id + 1)
    
    # Get workload:
    start, end = 10*worker_id, 10*(worker_id + 1)
    
    # Process:
    clip_project = VideoFileClip(input_filename).subclip(start, end)
    clip_project_detected = clip_project.fl_image(process_frame)
    clip_project_detected.write_videofile(output_filename, audio=False)
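
The worker above hard-codes 10-second segments. A small sketch of how the segment boundaries could be derived from the clip duration instead; `split_workload` and the 50-second duration are assumptions for this example.

```python
def split_workload(duration, num_workers):
    """Split [0, duration] into equal (start, end) segments, one per worker."""
    segment = duration / num_workers
    return [(i * segment, (i + 1) * segment) for i in range(num_workers)]

# Hypothetical 50-second clip processed by 5 workers:
segments = split_workload(50, 5)
print(segments)
```

One caveat with this kind of split: each worker process would hold its own copy of the temporal-filter state in `process_frame`, so the filter effectively restarts at every segment boundary.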

### Test Video, Shorter One

In [None]:
# IO config:
video_test_input = "test_video.mp4"
video_test_output = "output_videos/test_video_detected.mp4"

In [None]:
### Process:
clip_test = VideoFileClip(video_test_input)
clip_test_detected = clip_test.fl_image(process_frame)
%time clip_test_detected.write_videofile(video_test_output, audio=False)

In [None]:
# Display:
HTML(
    """
    <video width="960" height="540" controls>
      <source src="{0}">
    </video>
    """.format(video_test_output)
)

### Project Video, Longer One

In [9]:
# IO config:
video_project_input = "project_video.mp4"
video_project_output = "output_videos/project_video_detected_{}.mp4"

In [10]:
# Process--parallel:
pool = Pool(5)
pool.map(video_process_worker, range(5))

[MoviePy] >>>> Building video output_videos/project_video_detected_3.mp4
[MoviePy] Writing video output_videos/project_video_detected_3.mp4


  0%|          | 0/251 [00:00<?, ?it/s]

[MoviePy] >>>> Building video output_videos/project_video_detected_4.mp4
[MoviePy] Writing video output_videos/project_video_detected_4.mp4


  0%|          | 0/251 [00:00<?, ?it/s]

[MoviePy] >>>> Building video output_videos/project_video_detected_2.mp4
[MoviePy] Writing video output_videos/project_video_detected_2.mp4


  0%|          | 0/251 [00:00<?, ?it/s]

[MoviePy] >>>> Building video output_videos/project_video_detected_1.mp4
[MoviePy] Writing video output_videos/project_video_detected_1.mp4


  0%|          | 0/251 [00:00<?, ?it/s]

[MoviePy] >>>> Building video output_videos/project_video_detected_5.mp4
[MoviePy] Writing video output_videos/project_video_detected_5.mp4


100%|█████████▉| 250/251 [3:11:04<00:47, 47.26s/it]  


[MoviePy] Done.
[MoviePy] >>>> Video ready: output_videos/project_video_detected_1.mp4 



100%|█████████▉| 250/251 [3:11:56<00:45, 45.84s/it]


[MoviePy] Done.
[MoviePy] >>>> Video ready: output_videos/project_video_detected_3.mp4 



100%|█████████▉| 250/251 [3:12:09<00:45, 45.12s/it]


[MoviePy] Done.
[MoviePy] >>>> Video ready: output_videos/project_video_detected_5.mp4 



100%|█████████▉| 250/251 [3:12:54<00:42, 42.49s/it]


[MoviePy] Done.
[MoviePy] >>>> Video ready: output_videos/project_video_detected_2.mp4 



100%|█████████▉| 250/251 [3:13:05<00:41, 41.96s/it]


[MoviePy] Done.
[MoviePy] >>>> Video ready: output_videos/project_video_detected_4.mp4 



[None, None, None, None, None]

In [13]:
# Merge all clips:
clips = [VideoFileClip(video_project_output.format(i + 1)) for i in range(5)]
concat_clip = concatenate_videoclips(clips, method="chain")
%time concat_clip.write_videofile(video_project_output.format(0), audio=False)

[MoviePy] >>>> Building video output_videos/project_video_detected_0.mp4
[MoviePy] Writing video output_videos/project_video_detected_0.mp4


100%|█████████▉| 1250/1251 [00:12<00:00, 97.79it/s]


[MoviePy] Done.
[MoviePy] >>>> Video ready: output_videos/project_video_detected_0.mp4 

CPU times: user 1.11 s, sys: 760 ms, total: 1.87 s
Wall time: 13.2 s


In [None]:
# Display:
HTML(
    """
    <video width="960" height="540" controls>
      <source src="{0}">
    </video>
    """.format(video_project_output.format(0))
)
