In this exercise,first I'll use the functions I defined in previous exercises, namely, bin_spatial(), color_hist(), and extract_features() then read in my car and non-car images and extract the color features for each. All that remains is to define a labels vector, shuffle and split the data into training and testing sets, scale the feature vectors to zero mean and unit variance, and finally, define a classifier and train it!

My labels vector **y** in this case will just be a binary vector indicating whether each feature vector in our dataset corresponds to a car or non-car (1's for cars, 0's for non-cars). Given lists of car and non-car features (the output of extract_features()) I can define a labels vector like this:

~~~python
import numpy as np
# Define a labels vector based on features lists
y = np.hstack((np.ones(len(car_features)), 
              np.zeros(len(notcar_features))))
~~~

Next, we'll stack my feature vectors like before:

~~~python
# Create an array stack of feature vectors
X = np.vstack((car_features, notcar_features)).astype(np.float64)
~~~

And now I am ready to shuffle and split the data into training and testing sets. To do this I'll use the Scikit-Learn train_test_split() function, but it's worth noting that recently, this function moved from the sklearn.cross_validation package (in sklearn version <=0.17) to the sklearn.model_selection package (in sklearn version >=0.18).

~~~python
from sklearn.cross_validation import train_test_split
~~~

train_test_split() performs both the shuffle and split of the data and I'll call it like this (here choosing to initialize the shuffle with a different random state each time):

~~~python
# Split up data into randomized training and test sets
rand_state = np.random.randint(0, 100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=rand_state)
~~~

Now that I have split into training and test sets, I can scale my features. It's important to do the scaling after splitting the data, otherwise I are allowing the scaler to peer into your test data!

~~~python

from sklearn.preprocessing import StandardScaler
# Fit a per-column scaler only on the training data
X_scaler = StandardScaler().fit(X_train)
# Apply the scaler to both X_train and X_test
scaled_X_train = X_scaler.transform(X_train)
scaled_X_test = X_scaler.transform(X_test)
~~~

**Warning: when dealing with image data that was extracted from video, you may be dealing with sequences of images where your target object (vehicles in this case) appear almost identical in a whole series of images. In such a case, even a randomized train-test split will be subject to overfitting because images in the training set may be nearly identical to images in the test set. For the subset of images used in the next several quizzes, this is not a problem, but to optimize your classifier for the project, you may need to worry about time-series of images!**

Now, I am ready to define and train a classifier! Here I will try a Linear Support Vector Machine. To define and train my classifier it takes just a few lines of code

~~~python

from sklearn.svm import LinearSVC
# Use a linear SVC (support vector classifier)
svc = LinearSVC()
# Train the SVC
svc.fit(scaled_X_train, y_train)
~~~

Then I can check the accuracy of your classifier on the test dataset like this: 

~~~python
print('Test Accuracy of SVC = ', svc.score(scaled_X_test, y_test))
~~~

Or I can make predictions on a subset of the test data and compare directly with ground truth:

~~~python
print('My SVC predicts: ', svc.predict(scaled_X_test[0:10].reshape(1, -1)))
print('For labels: ', y_test[0:10])
~~~

In the exercise below to see how the classifier accuracy and training time vary with the feature vector input.

In [4]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import cv2
import glob
import time
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.cross_validation import train_test_split

In [6]:
def bin_spatial(img,size=(32,32)):
    features=cv2.resize(img,size).ravel()
    return features

In [132]:
def color_hist(img,nbins=32,bins_range=(0,256)):
    
    channel1_hist=np.histogram(img[:,:,0],bins=nbins,range=bins_range)
    channel2_hist=np.histogram(img[:,:,1],bins=nbins,range=bins_range)
    channel3_hist=np.histogram(img[:,:,2],bins=nbins,range=bins_range)
    
    hist_features=np.concatenate((channel1_hist[0],channel2_hist[0],channel3_hist[0]))
    return hist_features


In [133]:
def get_dataset(CarDirectory,NonCarDirectory):
    CarImages=[]
    NonCarImages=[]
    CarImages=glob.glob(CarDirectory,recursive=True)
    NonCarImages=glob.glob(NonCarDirectory,recursive=True)
    
    data_dict={}
    
    data_dict['CarImages']=CarImages
    data_dict['NonCarImages']=NonCarImages
    
    # Define a key in data_dict "n_cars" and store the number of car images
    data_dict["n_cars"] = len(CarImages)
    # Define a key "n_notcars" and store the number of notcar images
    data_dict["n_notcars"] = len(NonCarImages)
    # Read in a test image, either car or notcar
    example_img = mpimg.imread(CarImages[0])
    # Define a key "image_shape" and store the test image shape 3-tuple
    data_dict["image_shape"] = example_img.shape
    # Define a key "data_type" and store the data type of the test image.
    data_dict["data_type"] = example_img.dtype
    return data_dict

In [134]:
def extract_features(imgs, cspace='RGB', spatial_size=(32, 32),
                        hist_bins=32, hist_range=(0, 256)):
    # Create a list to append feature vectors to
    features = []
    # Iterate through the list of images
    for file in imgs:
        # Read in each one by one
        image = mpimg.imread(file)
        # apply color conversion if other than 'RGB'
        if cspace != 'RGB':
            if cspace == 'HSV':
                feature_image = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
            elif cspace == 'LUV':
                feature_image = cv2.cvtColor(image, cv2.COLOR_RGB2LUV)
            elif cspace == 'HLS':
                feature_image = cv2.cvtColor(image, cv2.COLOR_RGB2HLS)
            elif cspace == 'YUV':
                feature_image = cv2.cvtColor(image, cv2.COLOR_RGB2YUV)
        else: feature_image = np.copy(image)      
        # Apply bin_spatial() to get spatial color features
        spatial_features = bin_spatial(feature_image, size=spatial_size)
        # Apply color_hist() also with a color space option now
        hist_features = color_hist(feature_image, nbins=hist_bins, bins_range=hist_range)
        # Append the new feature vector to the features list
        features.append(np.concatenate((spatial_features, hist_features)))
    # Return list of feature vectors
    return features

In [135]:
CarDirectory='../dataset/vehicles_smallset/*/*.jpeg'
NonCarDirectory='../dataset/non-vehicles_smallset/*/*.jpeg'

In [136]:
data_dict=get_dataset(CarDirectory,NonCarDirectory)

In [157]:
print('Your function returned a count of', 
      data_dict["n_cars"], ' cars and', 
      data_dict["n_notcars"], ' non-cars')

print('of size: ',data_dict["image_shape"], ' and data type:', 
      data_dict["data_type"])

Your function returned a count of 1196  cars and 1125  non-cars
of size:  (64, 64, 3)  and data type: uint8


In [158]:
car_features = extract_features(data_dict['CarImages'], cspace='RGB',
                                spatial_size=(32, 32),hist_bins=32, hist_range=(0, 256))

In [159]:
notcar_features = extract_features(data_dict['NonCarImages'], cspace='RGB',
                                   spatial_size=(32, 32),hist_bins=32, hist_range=(0, 256))

In [160]:
print(len(notcar_features),notcar_features[0].shape)
print(len(car_features),car_features[0].shape)

1125 (3168,)
1196 (3168,)


In [161]:
X = np.vstack((car_features, notcar_features)).astype(np.float64)                        
print(X.shape)

(2321, 3168)


In [162]:
# Create an array stack of feature vectors
y=np.hstack((np.ones(len(car_features)),np.zeros(len(notcar_features))))
print(y.shape)

(2321,)


In [163]:
# Split up data into randomized training and test sets
rand_sate=np.random.randint(0,100)
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=rand_sate)


In [164]:
print(X_train.shape,y_train.shape)

(1856, 3168) (1856,)


In [165]:
print(X_test.shape,y_test.shape)

(465, 3168) (465,)


In [166]:
# Fit a per-column scaler only on the training data
X_scaler=StandardScaler().fit(X_train)
# Apply the scaler to X_train and X_test

X_train=X_scaler.transform(X_train)
X_test=X_scaler.transform(X_test)

In [167]:
# Use a linear SVC 
svc = LinearSVC()
# Check the training time for the SVC
t=time.time()
svc.fit(X_train,y_train)
print(round(time.time()-t,2),'Second to train SVC ...')

2.38 Second to train SVC ...


In [168]:
# Check the score of the SVC
print('Test the accuracy of the SVC= ', round(svc.score(X_test,y_test)))

Test the accuracy of the SVC=  1.0


In [169]:
# Check the prediction time for a single sample
t=time.time()
n_predicts=10
print('my SVC predicts: ',svc.predict(X_test[0:n_predicts]))
print('for these ', n_predicts, 'labels: ',y_test[0:n_predicts])
print(round(time.time()-t,2),'Second to train SVC ...')

my SVC predicts:  [0. 0. 1. 0. 1. 1. 0. 1. 1. 0.]
for these  10 labels:  [1. 0. 1. 0. 1. 1. 0. 1. 1. 0.]
0.0 Second to train SVC ...
