# Self-Driving Car Engineer Nanodegree

## Vehicle Detection and Tracking

### Introduction
The goal of this project is to detect and mark vehicles in a video. Following the project outline, I implemented a pipeline with the following steps:
1. Extract features from an image using gradient, color and spatial information.
2. Optimize feature extraction using different parameters
3. Train and analyze various classifiers
4. Optimize the chosen classifier
5. Implement a sliding window search
6.  Implement SOMETHING ELSE

## Data handling
As in the previous projects, I implemented a data-handling class to access the training data for the classifier, load the project video and store the resulting video. Since processing the data takes some time, I store the processed data in a separate file. This allows fast training of various classifiers later.
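
The caching idea can be sketched independently of the class below. This is a minimal illustration, not the project code: the helper `load_or_compute`, the stand-in `expensive_feature_extraction` and the temporary file location are hypothetical names chosen for this example.

```python
import os
import pickle
import tempfile


def load_or_compute(cache_path, compute):
    """Return cached (features, labels) if the file exists, else compute and store them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            cached = pickle.load(handle)
        return cached["features"], cached["labels"]

    features, labels = compute()
    with open(cache_path, "wb") as handle:
        pickle.dump({"features": features, "labels": labels}, handle)
    return features, labels


def expensive_feature_extraction():
    # Stand-in for the slow feature computation; the values are made up.
    return [[0.1, 0.2], [0.3, 0.4]], [1, 0]


cache_file = os.path.join(tempfile.mkdtemp(), "ProcessedData.p")
first = load_or_compute(cache_file, expensive_feature_extraction)   # computes and writes the cache
second = load_or_compute(cache_file, expensive_feature_extraction)  # reads the cache file
```

The second call skips the computation entirely, which is what makes repeated classifier experiments cheap.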

In [1]:
'''
Created on Nov 4, 2017

@author: andreas
'''

import glob
from moviepy.editor import VideoFileClip
import pickle

class DataHandling:
    def __init__(self):
        self.carDataPathes = "D:/Andreas/Programming/Python/UdacitySelfDrivingCar/Term1Projects/Project5/Data/vehicles/*"
        self.nonCarDataPathes = "D:/Andreas/Programming/Python/UdacitySelfDrivingCar/Term1Projects/Project5/Data/non-vehicles/*"
        self.projectVideoPath = "D:/Andreas/Programming/Python/UdacitySelfDrivingCar/Term1Projects/Project5/CarND-Vehicle-Detection/project_video.mp4"
        self.processedDataPath = "D:/Andreas/Programming/Python/UdacitySelfDrivingCar/Term1Projects/Project5/Data/ProcessedData.p"
        self.features = 'features'
        self.labels = 'labels'
        self.totalDataCount = None
        
        
    def GetFiles(self, path):
        # Collect all PNG files from every sub-directory matching the path pattern.
        files = []
        for directory in glob.glob(path):
            pattern = (directory + "/*.png").replace("\\", "/")
            files += glob.glob(pattern)
        return files
    
    def GetCarData(self):
        carData = self.GetFiles(self.carDataPathes)
        #print ("Number of vehicle images: ",len(carData))
        return carData
    
    def GetNonCarData(self):
        nonCarData = self.GetFiles(self.nonCarDataPathes)
        #print ("Number of non-vehicle images: ", len(nonCarData))
        return nonCarData

    def LoadProjectVideo(self):
        return VideoFileClip(self.projectVideoPath)


    def SavePreProcessedData(self, features, labels):
        dataPickle = {self.features: features, self.labels: labels}
        # Use a context manager so the file handle is closed after writing.
        with open(self.processedDataPath, "wb") as dataFile:
            pickle.dump(dataPickle, dataFile)

    def LoadPreProcessedData(self):
        with open(self.processedDataPath, "rb") as dataFile:
            dataPickle = pickle.load(dataFile)
        features = dataPickle[self.features]
        labels = dataPickle[self.labels]
        return features, labels

## Feature Processing 
For the feature extraction I tweaked the parameters in the code samples provided by Udacity in the project lesson. In a more involved study I looped over lists of parameters. The feature processing class computes
- spatial features
- color histogram features
- the histogram of oriented gradients (HOG), using one or all color channels

For the parameters of each of these computations I introduced a named tuple and defined default values.
In addition, the class allows choosing a color space (HSV, LUV, YUV, HLS, RGB, or the BGR default as loaded by OpenCV). The exploration of the parameters is done in section "Parameters for feature processing".
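
To get a feeling for how the parameters drive the size of the feature vector, the length of each part can be computed by hand. The sketch below is only an illustration: it assumes 64x64 training images with three channels (the image size is not stated in this write-up) and a HOG layout in which blocks slide one cell at a time, as in skimage. The function names are made up for this example.

```python
def hog_length(image_size, pixels_per_cell, cells_per_block, orientations, channels):
    """Length of the flattened HOG vector when blocks slide one cell at a time."""
    cells = image_size // pixels_per_cell
    blocks = cells - cells_per_block + 1
    return channels * blocks * blocks * cells_per_block * cells_per_block * orientations


def feature_length(image_size=64, channels=3, orientations=18, pixels_per_cell=16,
                   cells_per_block=2, bin_count=64, spatial_size=8):
    """Return the lengths (hog, color, spatial, total) for one image."""
    hog = hog_length(image_size, pixels_per_cell, cells_per_block, orientations, channels)
    color = channels * bin_count                        # one histogram per channel
    spatial = spatial_size * spatial_size * channels    # resized image, flattened
    return hog, color, spatial, hog + color + spatial


# Defaults mirror DefaultHogParameters, DefaultColorParameters and DefaultSpatialParameters.
print(feature_length())
```

Under these assumptions the default parameters give 1944 HOG values, 192 color histogram values and 192 spatial values, so 2328 features per image. The same function can be called with other settings to see how quickly the vector grows.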
 

In [2]:
'''
Created on Nov 5, 2017

@author: andreas
'''

import cv2
import numpy as np
from skimage.feature import hog
from collections import namedtuple
from sklearn.preprocessing import StandardScaler

HogParameters = namedtuple("HogParameters", "isOn orientationsCount pixelsPerCell cellsPerBlock visualize channel")
ColorParameters = namedtuple("ColorParameters", "isOn binCount")
SpatialParameters = namedtuple("SpatialParameters", "isOn spatialSize")

DefaultHogParameters = HogParameters(isOn = True, orientationsCount = 18, pixelsPerCell = (16,16), cellsPerBlock = (2,2), visualize = False, channel = "ALL")
DefaultColorParameters = ColorParameters(isOn = True, binCount = 64)
DefaultSpatialParameters = SpatialParameters(isOn =True, spatialSize = (8,8))


class FeatureProcessing:
    def __init__(self, hogParameters, colorParameters, spatialParameters, colorSpace = "HSV"):
        self.hogParameters = hogParameters
        self.colorParameters = colorParameters
        self.spatialParameters = spatialParameters 
        self.colorSpace = colorSpace
        self.data = DataHandling()
        self.normalizedFeatures= None
        self.labels = None
        self.PrintParameters()
    
    def PrintParameters(self):
        print("Colorspace is ", self.colorSpace)
        for name, value in self.hogParameters._asdict().items():
            print("Hog parameter '"+name+"' is: ", value)
        for name, value in self.colorParameters._asdict().items():
            print("Color parameter '"+name+"' is: ", value)
        for name, value in self.spatialParameters._asdict().items():
            print("Spatial parameter '"+name+"' is: ", value)

    
    def ComputeSpatialFeatures(self, featureImage):
        features = cv2.resize(featureImage, self.spatialParameters.spatialSize).ravel()
        return features
    
    def ComputeColorHistogramFeatures(self, featureImage):
        binsRange = (0,256)
        binCount = self.colorParameters.binCount

        channel0 = np.histogram(featureImage[:,:,0], bins=binCount, range = binsRange) 
        channel1 = np.histogram(featureImage[:,:,1], bins=binCount, range = binsRange)
        channel2 = np.histogram(featureImage[:,:,2], bins=binCount, range = binsRange)
        
        features = np.concatenate((channel0[0], channel1[0], channel2[0]))
        return features
    
    
    def GetHogFeatures(self,featureImage):
        pixelsPerCell = self.hogParameters.pixelsPerCell
        cellsPerBlock = self.hogParameters.cellsPerBlock
        orientationsCount = self.hogParameters.orientationsCount
        visualize = self.hogParameters.visualize
        featureVector = True
        if(visualize):
            features, hog_image = hog(featureImage, orientations=orientationsCount, pixels_per_cell=pixelsPerCell, cells_per_block=cellsPerBlock, transform_sqrt=True, visualise=visualize, feature_vector=featureVector)
        else:
            features = hog(featureImage, orientations=orientationsCount, pixels_per_cell=pixelsPerCell, cells_per_block=cellsPerBlock, transform_sqrt=True, visualise=visualize, feature_vector=featureVector)
            
        return features

    
    def ComputeGradientFeatures(self, featureImage):
        channel = self.hogParameters.channel
        if(channel == "ALL"):
            # Compute HOG per channel and flatten everything into one vector.
            hogFeatures = []
            for channelIndex in range(featureImage.shape[2]):
                hogFeatures.append(self.GetHogFeatures(featureImage[:,:,channelIndex]))
            hogFeatures = np.ravel(hogFeatures)
        else:
            hogFeatures = self.GetHogFeatures(featureImage[:,:,channel])

        return hogFeatures
         
    
    def SetColorSpace(self,image, colorSpace):
        featureImage = image
        if(colorSpace == "HSV"):
            featureImage = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
        elif(colorSpace == "LUV"):
            featureImage = cv2.cvtColor(image, cv2.COLOR_BGR2LUV)
        elif(colorSpace == "YUV"):
            featureImage = cv2.cvtColor(image, cv2.COLOR_BGR2YUV)
        elif(colorSpace == "RGB"):
            featureImage = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        elif(colorSpace == "HLS"):
            featureImage = cv2.cvtColor(image, cv2.COLOR_BGR2HLS)
        
        return featureImage
    
    def ComputeFeatures(self, imagesPathes, show=False):
        features = []
        
        for imagePath in imagesPathes:
            image = cv2.imread(imagePath)
            if(show):
                print(image.shape)
                cv2.imshow('title',image)
                cv2.waitKey(0)
            colorImage = self.SetColorSpace(image, self.colorSpace)
            
            spatialFeatures = self.ComputeSpatialFeatures(colorImage)
            colorFeatures = self.ComputeColorHistogramFeatures(colorImage)
            gradientFeatures = self.ComputeGradientFeatures(colorImage)
            
            features.append(np.concatenate((gradientFeatures, colorFeatures, spatialFeatures)))
        return features
            
    def NormalizeFeatures(self, features):
        scaler = StandardScaler().fit(features)
        return scaler.transform(features)
    
    def ComputeAllFeaturesAndLabels(self, storeData = False):
        carData = self.data.GetCarData()
        nonCarData = self.data.GetNonCarData()
        
        
        carFeatures = self.ComputeFeatures(carData)
        nonCarFeatures = self.ComputeFeatures(nonCarData)
        
        allFeatures = np.vstack((carFeatures, nonCarFeatures)).astype(np.float64)
        self.normalizedFeatures = self.NormalizeFeatures(allFeatures)
        self.labels = np.hstack((np.ones(len(carFeatures)), np.zeros(len(nonCarFeatures))))
        
        assert(len(self.labels) == len(self.normalizedFeatures))
        if(storeData):
            self.data.SavePreProcessedData(self.normalizedFeatures, self.labels)

        return self.normalizedFeatures, self.labels


  

## Classification
The classification class defined below implements the following features:
- Setting a classifier with default parameter values out of the list: LinearSVC, SVC (RBF kernel), SVC (poly kernel), SVC (sigmoid kernel), RandomForests, DecisionTrees, AdaBoost, NaiveBayes and NearestNeighbor
- Splitting the processed vehicle and non-vehicle data into a training and a test set
- Searching for suitable parameters for classifiers using GridSearchCV and RandomizedSearchCV
- Training the classifier
- Running the classifier on the test data and calculating the measures accuracy, precision, recall and F1


In [14]:
'''
Created on Nov 5, 2017

@author: andreas
'''

import scipy.stats
from scipy.stats import randint as sp_randint
from sklearn.svm import LinearSVC, SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils import shuffle
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV


from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
import time
import itertools

class Classifier:
    
    def __init__(self, classifierType = 'LinearSVC'):
        self.classifierType = classifierType
        self.classifier = None
        self.trainingFeatures = None
        self.testFeatures = None
        self.trainingLabels = None
        self.testLabels = None
        self.data = DataHandling()
        self.SetClassifierType()

    def SetRandomCVClassifier(self):
        if(self.classifierType == "LinearSVC"):
            self.classifier = LinearSVC()
            parameterDistribution = {'C': scipy.stats.expon(scale=100) }
        elif(self.classifierType == "RandomForest"):
            self.classifier = RandomForestClassifier()
            parameterDistribution = {"n_estimators": sp_randint(10,100), "min_samples_split": sp_randint(2, 100),
              "min_samples_leaf": sp_randint(2, 10),
              "criterion": ["gini", "entropy"]}
        else:
            # Only these two classifiers have a search space defined.
            raise ValueError("No parameter distribution defined for " + self.classifierType)
        
        self.classifier = RandomizedSearchCV(self.classifier, param_distributions=parameterDistribution, n_iter=100)

    def SetClassifierType(self):
        if(self.classifierType == "LinearSVC"):
            self.classifier = LinearSVC()
        elif(self.classifierType == "SVC RBF"):
            self.classifier = SVC(kernel="rbf", C=1000, gamma=10)
        elif(self.classifierType == "SVC Poly"):
            self.classifier = SVC(kernel="poly", gamma=0.1)
        elif(self.classifierType == "SVC Sig"):
            self.classifier = SVC(kernel="sigmoid", gamma=0.1)
        elif(self.classifierType == "RandomForest"):
            self.classifier = RandomForestClassifier(min_samples_split=10, n_estimators = 60)
        elif(self.classifierType == "DecisionTree"):
            self.classifier = DecisionTreeClassifier(min_samples_split=10)
        elif(self.classifierType == "NearestNeighbor"):
            self.classifier = KNeighborsClassifier()
        elif(self.classifierType == "NaiveBayes"):
            self.classifier = GaussianNB()
        elif(self.classifierType == "AdaBoost"):
            self.classifier = AdaBoostClassifier(learning_rate=0.01)
        
    
        
            
    def SetTrainingAndTestData(self, hogParameters, colorParameters, spatialParameters, colorSpace, reprocess=False, storeData =True):
        if(reprocess):
            featureProcessing = FeatureProcessing(hogParameters, colorParameters, spatialParameters, colorSpace = colorSpace)
            features, labels = featureProcessing.ComputeAllFeaturesAndLabels(storeData)
            random = 0
        else:
            random = np.random.randint(0,100)
            features, labels = self.data.LoadPreProcessedData()
            features, labels = shuffle(features, labels, random_state= random)

        self.trainingFeatures, self.testFeatures, self.trainingLabels, self.testLabels = train_test_split(features, labels, test_size=0.2, random_state=1675637+random)

        
    
    def TrainClassifier(self):
        start = time.time()
        self.classifier.fit(self.trainingFeatures, self.trainingLabels)
        end = time.time()
        print ('Time to train classifier ' + self.classifierType + " [s]: ", round(end-start,2))
    
    def TestClassifier(self, details =True):
        #someLabels = self.classifier.decision_function(self.testFeatures)
        predictedLabels = self.classifier.predict(self.testFeatures)
        predictedLabelsTraining = self.classifier.predict(self.trainingFeatures)
        print('Training accuracy score of classifier in % = ', round(accuracy_score(self.trainingLabels, predictedLabelsTraining), 8)*100)
        print('Test accuracy score of classifier in % = ', round(accuracy_score(self.testLabels, predictedLabels), 8)*100)
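        if(details ==True):
            print('Precision, recall, F-Score () = ', precision_recall_fscore_support(self.testLabels, predictedLabels))
            # best_params_ only exists when the classifier is wrapped in a parameter search.
            if(hasattr(self.classifier, "best_params_")):
                print("Best parameters: ", self.classifier.best_params_)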


## Parameters for feature processing
To find the best feature processing parameters I scanned systematically over a wide range of parameters using predefined lists. The method below uses subsets of these lists, since the full scans took several days of run time.
To evaluate the feature processing variants I used the accuracy score of the LinearSVC classifier.
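
The scan is a Cartesian product of all parameter lists, so the number of training runs grows multiplicatively with every list entry. As a rough sketch, the subset lists used in the method below can be combined in one call to `itertools.product` to count the runs; the variable names here are chosen for this example only.

```python
import itertools

# Subset lists as used in ExploreFeatureProcessing.
color_spaces = ["HSV", "LUV"]
orientation_counts = [9, 72]
pixels_per_cell = [16]
cells_per_block = [4]
channels = ["ALL", 0]
bin_counts = [32, 128]
spatial_sizes = [16]

# Every element of the product is one complete parameter set, i.e. one training run.
grid = list(itertools.product(color_spaces, orientation_counts, pixels_per_cell,
                              cells_per_block, channels, bin_counts, spatial_sizes))
print(len(grid))
print(grid[0])
```

Even these reduced lists already produce 16 runs, each of which recomputes all features and retrains the classifier. Adding one more value to a list multiplies the run count accordingly.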

In [4]:
def ExploreFeatureProcessing():
    MyColorSpaceList = ["HSV", "LUV"]
    OrientationCountList = [9, 72]        
    PixelsPerCellList = [16]
    CellsPerBlockList = [4]
    ChannelList = ['ALL', 0]
    
    HogParametersList = [OrientationCountList, PixelsPerCellList, CellsPerBlockList, ChannelList]
    HogParametersTupleList =  list(itertools.product(*HogParametersList))
    
    
    BinCountList = [32, 128] 
    SpatialSizeList = [16]
            
    MyColorSpace = None
    for color in MyColorSpaceList:
        MyColorSpace = color
        for hogParameters in HogParametersTupleList:
            orientation = hogParameters[0] 
            pixelsPerCellTuple = (hogParameters[1],hogParameters[1])
            cellsPerBlockTuple = (hogParameters[2],hogParameters[2])
            channelChoosen = hogParameters[3]
            for binCountElement in BinCountList:
                for spatialSizeElement in SpatialSizeList:
                    spatialSizeTuple = (spatialSizeElement, spatialSizeElement)
                    MyHogParameters = HogParameters(isOn = True, orientationsCount = orientation, pixelsPerCell = pixelsPerCellTuple, cellsPerBlock = cellsPerBlockTuple, visualize = False, channel = channelChoosen)
                    MyColorParameters = ColorParameters(isOn = True, binCount = binCountElement)
                    MySpatialParameters = SpatialParameters(isOn =True, spatialSize = spatialSizeTuple)
    
                    print("------------------------")
                    classifier = Classifier()
                    classifier.SetTrainingAndTestData(MyHogParameters, MyColorParameters, MySpatialParameters, MyColorSpace, True, False)
                    classifier.TrainClassifier()
                    classifier.TestClassifier(False)      
                    print("------------------------")  
ExploreFeatureProcessing()

------------------------
Colorspace is  HSV
Hog parameter 'isOn' is:  True
Hog parameter 'orientationsCount' is:  9
Hog parameter 'pixelsPerCell' is:  (16, 16)
Hog parameter 'cellsPerBlock' is:  (4, 4)
Hog parameter 'visualize' is:  False
Hog parameter 'channel' is:  ALL
Color parameter 'isOn' is:  True
Color parameter 'binCount' is:  32
Spatial parameter 'isOn' is:  True
Spatial parameter 'spatialSize' is:  (16, 16)
Time to train classifier LinearSVC [s]:  25.46
Training accuracy score of classifier in % =  100.0
Test accuracy score of classifier in % =  99.01464
------------------------
------------------------
Colorspace is  HSV
Hog parameter 'isOn' is:  True
Hog parameter 'orientationsCount' is:  9
Hog parameter 'pixelsPerCell' is:  (16, 16)
Hog parameter 'cellsPerBlock' is:  (4, 4)
Hog parameter 'visualize' is:  False
Hog parameter 'channel' is:  ALL
Color parameter 'isOn' is:  True
Color parameter 'binCount' is:  128
Spatial parameter 'isOn' is:  True
Spatial parameter 'spatialSi

From the parameter scans I chose my setting to be:
- Color space = LUV
- HOG parameters: orientationsCount = 72, pixelsPerCell = (16,16), cellsPerBlock = (4,4), channel = 'ALL'
- Color parameters: binCount = 128
- Spatial parameters: spatialSize = (8,8)

## Choice of classifier
In the next step I used the fixed parameter set from above and applied the following list of classifiers:
LinearSVC, SVC (RBF kernel), SVC (poly kernel), SVC (sigmoid kernel), RandomForests, DecisionTrees, AdaBoost, NaiveBayes and NearestNeighbor. For each classifier I experimented with different parameters. Below I run each classifier once using the processed data set.


In [4]:
def ExploreClassifiers():
    myColorSpace = "LUV"
    myHogParameters = HogParameters(isOn = True, orientationsCount = 72, pixelsPerCell = (16,16), cellsPerBlock = (4,4), visualize = False, channel = 'ALL')
    myColorParameters = ColorParameters(isOn = True, binCount = 128)
    mySpatialParameters = SpatialParameters(isOn =True, spatialSize = (8,8))
    classifierList = ["LinearSVC",  "RandomForest",  "DecisionTree", "NearestNeighbor", "NaiveBayes", "AdaBoost", "SVC RBF", "SVC Poly", "SVC Sig"]
    for classifier in classifierList:
        print("------------------------")
        classifier = Classifier(classifier)
        classifier.SetTrainingAndTestData(myHogParameters, myColorParameters, mySpatialParameters, myColorSpace)
        classifier.TrainClassifier()
        classifier.TestClassifier(False)      
        print("------------------------")  
        
ExploreClassifiers()

------------------------
Time to train classifier LinearSVC [s]:  40.24
Training accuracy score of classifier in % =  100.0
Test accuracy score of classifier in % =  99.634009
------------------------
------------------------
Time to train classifier RandomForest [s]:  29.58
Training accuracy score of classifier in % =  100.0
Test accuracy score of classifier in % =  99.127252
------------------------
------------------------
Time to train classifier DecisionTree [s]:  63.38
Training accuracy score of classifier in % =  99.75366
Test accuracy score of classifier in % =  97.297297
------------------------
------------------------
Time to train classifier NearestNeighbor [s]:  4.2
Training accuracy score of classifier in % =  99.373592
Test accuracy score of classifier in % =  98.986486
------------------------
------------------------
Time to train classifier NaiveBayes [s]:  1.06
Training accuracy score of classifier in % =  93.334741
Test accuracy score of classifier in % =  92.989865

The above results show that all classifiers work on the training data, with the smallest accuracy around 85%. The test data predictions show that many classifiers yield very good results, with the best results provided by "LinearSVC" and "RandomForest". In the above sample I apparently forgot to provide the degree of the polynomial when using "SVC Poly" and got results identical to those of "LinearSVC". Due to the long runtime of "SVC Poly" I did not rerun polynomial kernels here. From the above results I chose "LinearSVC" and "RandomForest" for optimization.

## Classifier optimization
For the optimization I used the 'RandomizedSearchCV' provided by sklearn. In addition I looked into other performance measures, like precision, recall and the F1 score. For "LinearSVC" I only varied the C parameter, whose value controls how soft (low C value) or hard (high C value) the classification margin is. For the "RandomForest" classifier I found by experimenting a dependency on the number of trees ("n_estimators"), the minimum number of samples required at a leaf node ("min_samples_leaf"), the minimum number of samples required to split an internal node ("min_samples_split") and the information gain criterion ("criterion"). 'RandomizedSearchCV' runs with 100 iterations.
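
To make the additional measures concrete, the sketch below computes them from confusion-matrix counts. The counts are not part of the printed output. They are inferred here from the precision, recall and support values of the first RandomForest run printed further down in this section, so they should be read as an assumption. The function name is made up for this example.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy plus precision, recall and F1 for the positive (vehicle) class."""
    precision = tp / (tp + fp)            # share of predicted vehicles that are vehicles
    recall = tp / (tp + fn)               # share of real vehicles that were found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return accuracy, precision, recall, f1


# Inferred counts (assumption): 1760 vehicle samples with 1744 found,
# 1792 non-vehicle samples with 1790 correctly rejected.
accuracy, precision, recall, f1 = binary_metrics(tp=1744, fp=2, fn=16, tn=1790)
print(round(accuracy, 6), round(precision, 6), round(recall, 6), round(f1, 6))
```

With these counts the vehicle class is missed more often (16 false negatives) than it is falsely reported (2 false positives), a difference that the accuracy score alone does not reveal.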

In [16]:
def OptimizeClassifiers():
    myColorSpace = "LUV"
    myHogParameters = HogParameters(isOn = True, orientationsCount = 72, pixelsPerCell = (16,16), cellsPerBlock = (4,4), visualize = False, channel = 'ALL')
    myColorParameters = ColorParameters(isOn = True, binCount = 128)
    mySpatialParameters = SpatialParameters(isOn =True, spatialSize = (8,8))
    classifierList = ["LinearSVC", "RandomForest"]
    for classifier in classifierList:
        print("------------------------")
        classifier = Classifier(classifier)
        classifier.SetRandomCVClassifier()
        classifier.SetTrainingAndTestData(myHogParameters, myColorParameters, mySpatialParameters, myColorSpace, False)
        classifier.TrainClassifier()
        classifier.TestClassifier(True)      
        print("------------------------")  
OptimizeClassifiers()
OptimizeClassifiers()

------------------------
Time to train classifier LinearSVC [s]:  411.69
Training accuracy score of classifier in % =  100.0
Test accuracy score of classifier in % =  99.493243
Precision, recall, F-Score () =  (array([ 0.99507119,  0.99478563]), array([ 0.99507119,  0.99478563]), array([ 0.99507119,  0.99478563]), array([1826, 1726], dtype=int64))
Best parameters:  {'C': 3.6832461817201274}
------------------------
------------------------
Time to train classifier RandomForest [s]:  3754.64
Training accuracy score of classifier in % =  99.985923
Test accuracy score of classifier in % =  99.493243
Precision, recall, F-Score () =  (array([ 0.99114064,  0.99885452]), array([ 0.99888393,  0.99090909]), array([ 0.99499722,  0.99486594]), array([1792, 1760], dtype=int64))
Best parameters:  {'criterion': 'entropy', 'n_estimators': 91, 'min_samples_split': 13, 'min_samples_leaf': 4}
------------------------
------------------------
Time to train classifier LinearSVC [s]:  432.23
Training accur

The "OptimizeClassifiers" method is called twice above, and both runs show that the optimization yields performance similar to that of the original parameters. In my opinion this observation may have two different reasons:
1. The performance of the classifier depends only weakly on the classifier parameters for the problem at hand.
2. 100 iterations of the randomized scan over the defined parameter space are not enough to find the best parameter combination, in particular for the "RandomForest" classifier.
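
The second reason can be made plausible with a rough count of the "RandomForest" search space. The sketch below assumes the ranges defined in "SetRandomCVClassifier" and relies on `scipy.stats.randint(low, high)` sampling the integers from `low` to `high - 1`. The variable names are chosen for this example.

```python
# Discrete search space of the RandomForest scan; range(low, high) mirrors sp_randint(low, high).
search_space = {
    "n_estimators": range(10, 100),
    "min_samples_split": range(2, 100),
    "min_samples_leaf": range(2, 10),
    "criterion": ["gini", "entropy"],
}

# Number of distinct parameter combinations.
combinations = 1
for values in search_space.values():
    combinations *= len(values)

iterations = 100
coverage = iterations / combinations   # upper bound, since random draws may repeat
print(combinations)
print(f"{coverage:.4%}")
```

The space holds 141,120 combinations, so 100 random draws visit at most about 0.07% of it.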

As a consequence of these observations I chose the LinearSVC with default settings for the classification of video images.

## Summary and outlook
The project was very interesting. In particular, experimenting with different computer vision techniques was very educational. My result for the project video is very good in my opinion, and the result for the challenge video is good too but could be improved. I also tried the hard challenge and found the name appropriate. I would be very interested in how to handle the changing light conditions. On YouTube I looked up some results, but so far I have not found a satisfying performance for the hard challenge.

### Changes after review
Through the changes inspired by the review, the performance on the project video has improved, but the challenge video still needs some additional adaptations.