# In the exercise below, I first tune the classifier trained on HOG features.

### Cross-validation with GridSearchCV

GridSearchCV uses k-fold cross-validation to determine the best-performing parameter set. With 3 folds (the default in scikit-learn versions before 0.22; since 0.22 the default is 5), GridSearchCV divides the training set into three roughly equal partitions. It trains on two partitions and validates on the third. It then chooses a different partition for validation and trains on the other two. Finally, it validates on the last remaining partition and trains on the other two, so every partition is used for validation exactly once.

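By default, GridSearchCV scores each fold with the estimator's own `score` method, which for a classifier such as `SVC` is accuracy, and averages the scores across the folds. So for every possible parameter combination, GridSearchCV calculates a mean accuracy. It then chooses the combination with the highest mean accuracy and, because `refit=True` by default, retrains that combination on the entire training set.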
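To make the fold-by-fold procedure concrete, here is a minimal sketch of what GridSearchCV does for a single parameter combination. It assumes the iris dataset and an RBF `SVC` with `C=10` purely for illustration, and uses `StratifiedKFold`, which is the splitter scikit-learn picks for classifiers when `cv` is an integer. The last lines check the hand-rolled average against `cross_val_score`.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Illustration data: the iris dataset (an assumption, not the HOG features)
iris = datasets.load_iris()
X, y = iris.data, iris.target

# 3 stratified folds without shuffling, matching what cv=3 gives a classifier
skf = StratifiedKFold(n_splits=3)

fold_scores = []
for train_idx, val_idx in skf.split(X, y):
    model = svm.SVC(kernel='rbf', C=10)            # one fixed parameter combination
    model.fit(X[train_idx], y[train_idx])           # train on two folds
    fold_scores.append(model.score(X[val_idx], y[val_idx]))  # accuracy on the held-out fold

mean_score = np.mean(fold_scores)

# The same number computed by scikit-learn's own helper
reference = cross_val_score(svm.SVC(kernel='rbf', C=10), X, y, cv=3).mean()

print('Per-fold accuracy:', fold_scores)
print('Mean CV accuracy:', round(mean_score, 4))
```

GridSearchCV repeats this loop once per parameter combination and keeps the combination with the highest `mean_score`.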


### Scikit-learn Cross Validation Example

Here's an example, adapted from the sklearn [documentation](http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html), of implementing GridSearchCV:

~~~python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svr = svm.SVC()
clf = GridSearchCV(svr, parameters)
clf.fit(iris.data, iris.target)
~~~
Let's break this down line by line.

    parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}

A dictionary mapping each parameter to the values it may take. In this case, the search varies the kernel (possible choices are 'linear' and 'rbf') and C (possible choices are 1 and 10).

Then a 'grid' of all combinations of values for (kernel, C) is automatically generated:

    ('rbf', 1) 	('rbf', 10)
    ('linear', 1) 	('linear', 10)

Each is used to train an SVM, and the performance is then assessed using cross-validation.
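The grid expansion can be reproduced directly with scikit-learn's `ParameterGrid`, which is the helper GridSearchCV uses to enumerate candidates. This short sketch simply lists the four combinations generated from the same `parameters` dictionary.

```python
from sklearn.model_selection import ParameterGrid

parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}

# Expand the dictionary into every (kernel, C) combination
grid = list(ParameterGrid(parameters))
for combo in grid:
    print(combo)
```

Each printed dictionary corresponds to one SVM that GridSearchCV will train and cross-validate.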

    svr = svm.SVC()
    
This looks kind of like creating a classifier, but note that `clf` isn't made until the next line; this is just saying what kind of algorithm to use. Another way to think about this is that the "classifier" isn't just the algorithm in this case, it's the algorithm plus parameter values. Note that there's no monkeying around with the kernel or C here; all that is handled in the next line.

    clf = GridSearchCV(svr, parameters)
    
This is where the first bit of magic happens: the classifier is being created. I pass the algorithm (svr) and the dictionary of parameters to try (parameters), and GridSearchCV expands them into a grid of parameter combinations to try.

    clf.fit(iris.data, iris.target)
    
And the second bit of magic. The fit function now tries every parameter combination with cross-validation and then, because `refit=True` by default, retrains the best combination on the full data, so `clf` becomes a fitted classifier tuned to the optimal parameter combination. I can now access the chosen parameter values via

    clf.best_params_
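Besides `best_params_`, a fitted GridSearchCV also exposes `best_score_` (the mean cross-validated score of the winning combination) and `cv_results_` (per-combination results). The sketch below reruns the iris example with `cv=3` set explicitly, an assumption made so the fold count matches the text above regardless of scikit-learn version, and prints the mean and standard deviation of accuracy for every combination.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}

# cv=3 is set explicitly here; newer scikit-learn defaults to 5 folds
clf = GridSearchCV(svm.SVC(), parameters, cv=3)
clf.fit(iris.data, iris.target)

print('Best parameters:', clf.best_params_)
print('Best mean CV accuracy:', round(clf.best_score_, 4))

# cv_results_ holds one entry per parameter combination
for params, mean, std in zip(clf.cv_results_['params'],
                             clf.cv_results_['mean_test_score'],
                             clf.cv_results_['std_test_score']):
    print(params, 'mean=%.4f std=%.4f' % (mean, std))
```

Looking at the standard deviations helps judge whether the "best" combination is clearly better or only marginally ahead of the others.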


In [51]:
~~~python
import glob
import time

import cv2
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
from skimage.feature import hog
from sklearn import svm
# train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed in 0.20)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
~~~

In [32]:
~~~python
def get_hog_features(img, orient, pix_per_cell, cell_per_block, vis=False, feature_vec=True):
    """Return HOG features for a single-channel image.

    If vis is True, hog() returns a (features, hog_image) tuple instead.
    """
    # skimage renamed `visualise` to `visualize`; newer releases only accept the latter
    return hog(img, orientations=orient,
               pixels_per_cell=(pix_per_cell, pix_per_cell),
               cells_per_block=(cell_per_block, cell_per_block),
               block_norm="L2-Hys",
               transform_sqrt=True,
               visualize=vis,
               feature_vector=feature_vec)
~~~

In [33]:
~~~python
def extract_features(imgs, cspace='RGB', orient=9, pix_per_cell=8, cell_per_block=2, hog_channel=0):
    """Read each image file, convert it to `cspace`, and return a list of HOG feature vectors."""
    # OpenCV conversion code for each supported color space
    conversions = {'HSV': cv2.COLOR_RGB2HSV, 'LUV': cv2.COLOR_RGB2LUV,
                   'HLS': cv2.COLOR_RGB2HLS, 'YUV': cv2.COLOR_RGB2YUV,
                   'YCrCb': cv2.COLOR_RGB2YCrCb}
    features = []
    for file in imgs:
        image = mpimg.imread(file)
        if cspace == 'RGB':
            feature_image = np.copy(image)
        elif cspace in conversions:
            feature_image = cv2.cvtColor(image, conversions[cspace])
        else:
            # Fail loudly instead of leaving feature_image undefined
            raise ValueError('Unsupported color space: %s' % cspace)

        if hog_channel == "ALL":
            # Concatenate the HOG vectors of every channel
            hog_features = []
            for channel in range(feature_image.shape[2]):
                hog_features.append(get_hog_features(feature_image[:, :, channel],
                                                     orient, pix_per_cell, cell_per_block,
                                                     vis=False, feature_vec=True))
            hog_features = np.ravel(hog_features)
        else:
            hog_features = get_hog_features(feature_image[:, :, hog_channel],
                                            orient, pix_per_cell, cell_per_block,
                                            vis=False, feature_vec=True)
        features.append(hog_features)

    return features
~~~

In [34]:
~~~python
def get_dataset(CarDirectory, NonCarDirectory):
    CarImages = glob.glob(CarDirectory, recursive=True)
    NonCarImages = glob.glob(NonCarDirectory, recursive=True)

    data_dict = {}
    data_dict['CarImages'] = CarImages
    data_dict['NonCarImages'] = NonCarImages

    # Define a key in data_dict "n_cars" and store the number of car images
    data_dict["n_cars"] = len(CarImages)
    # Define a key "n_notcars" and store the number of notcar images
    data_dict["n_notcars"] = len(NonCarImages)
    # Read in a test image (the first car image)
    example_img = mpimg.imread(CarImages[0])
    # Define a key "image_shape" and store the test image shape 3-tuple
    data_dict["image_shape"] = example_img.shape
    # Define a key "data_type" and store the data type of the test image
    data_dict["data_type"] = example_img.dtype
    return data_dict
~~~

In [35]:
~~~python
CarDirectory = '../dataset/vehicles_smallset/*/*.jpeg'
NonCarDirectory = '../dataset/non-vehicles_smallset/*/*.jpeg'
data_dict = get_dataset(CarDirectory, NonCarDirectory)
~~~

In [36]:
~~~python
print('Your function returned a count of',
      data_dict["n_cars"], ' cars and',
      data_dict["n_notcars"], ' non-cars')

print('of size: ', data_dict["image_shape"], ' and data type:',
      data_dict["data_type"])
~~~

~~~
Your function returned a count of 1196  cars and 1125  non-cars
of size:  (64, 64, 3)  and data type: uint8
~~~


In [37]:
~~~python
colorspace = 'HLS' # Can be RGB, HSV, LUV, HLS, YUV, YCrCb
orient = 9
pix_per_cell = 8
cell_per_block = 2
hog_channel = "ALL" # Can be 0, 1, 2, or "ALL"
~~~

In [38]:
~~~python
t = time.time()
car_features = extract_features(data_dict['CarImages'], cspace=colorspace,
                                orient=orient, pix_per_cell=pix_per_cell,
                                cell_per_block=cell_per_block,
                                hog_channel=hog_channel)
print(round(time.time() - t, 2), 'Seconds to extract HOG features...')
~~~

~~~
8.38 Seconds to extract HOG features...
~~~


In [39]:
~~~python
t = time.time()
Noncar_features = extract_features(data_dict['NonCarImages'], cspace=colorspace,
                                   orient=orient, pix_per_cell=pix_per_cell,
                                   cell_per_block=cell_per_block,
                                   hog_channel=hog_channel)
print(round(time.time() - t, 2), 'Seconds to extract HOG features...')
~~~

~~~
7.94 Seconds to extract HOG features...
~~~


In [40]:
~~~python
# Create an array stack of feature vectors
X = np.vstack((car_features, Noncar_features)).astype(np.float64)
~~~

In [41]:
~~~python
# Define the labels vector: 1 for cars, 0 for non-cars, in the same row order as X
y = np.hstack((np.ones(len(car_features)), np.zeros(len(Noncar_features))))
~~~
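The two cells above rely on car rows being stacked first and non-car rows second, so the labels must follow the same order. This toy sketch uses hypothetical stand-in feature vectors (three "car" vectors of ones and two "non-car" vectors of zeros) to show that row `i` of `X` lines up with `y[i]`.

```python
import numpy as np

# Hypothetical stand-ins: 3 "car" and 2 "non-car" feature vectors of length 4
car_features = [np.full(4, 1.0) for _ in range(3)]
Noncar_features = [np.full(4, 0.0) for _ in range(2)]

X = np.vstack((car_features, Noncar_features)).astype(np.float64)
y = np.hstack((np.ones(len(car_features)), np.zeros(len(Noncar_features))))

# Rows of X and entries of y share the same order: cars first, then non-cars
print(X.shape, y.shape)
print(y)
```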

In [42]:
~~~python
print(X.shape)
print(y.shape)
~~~

~~~
(2321, 5292)
(2321,)
~~~
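The 5292 columns follow directly from the HOG settings chosen earlier. With 8×8-pixel cells on a 64×64 image there are 8 cells per side; 2×2-cell blocks sliding one cell at a time give 8 − 2 + 1 = 7 block positions per side; each block contributes 2 × 2 × 9 values. The helper below is a hypothetical illustration of that arithmetic (it assumes square images and uses floor division for cells, as skimage does).

```python
def hog_feature_length(img_size, orient, pix_per_cell, cell_per_block, n_channels):
    """Length of a flattened HOG vector for a square image (illustrative helper)."""
    cells_per_side = img_size // pix_per_cell                # 64 // 8 = 8 cells
    blocks_per_side = cells_per_side - cell_per_block + 1    # 8 - 2 + 1 = 7 block positions
    per_channel = blocks_per_side ** 2 * cell_per_block ** 2 * orient  # 49 * 4 * 9
    return per_channel * n_channels

print(hog_feature_length(64, orient=9, pix_per_cell=8, cell_per_block=2, n_channels=1))
print(hog_feature_length(64, orient=9, pix_per_cell=8, cell_per_block=2, n_channels=3))
```

With `hog_channel = "ALL"` the three per-channel vectors of 1764 values are concatenated, which gives the 5292 columns of `X`.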


In [43]:
~~~python
rand_state = np.random.randint(0, 100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=rand_state)
~~~
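Because `rand_state` is drawn at random, each run produces a different split, so the test accuracy reported later can shift slightly between runs. A possible alternative, sketched here on hypothetical toy data, is to fix `random_state` and pass `stratify=y` so both splits keep the same car/non-car ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the HOG feature matrix and labels
X = np.arange(40, dtype=np.float64).reshape(20, 2)
y = np.hstack((np.ones(10), np.zeros(10)))

# Fixed seed: repeatable split; stratify=y: same class ratio in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(len(y_train), len(y_test))
print(y_train.mean(), y_test.mean())
```

Here the car/non-car data is already close to balanced (1196 vs. 1125), so stratification mainly guards against unlucky splits rather than fixing a severe imbalance.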


In [44]:
~~~python
# Fit the scaler on the training set only, then apply it to both sets
X_scaler = StandardScaler().fit(X_train)
X_train = X_scaler.transform(X_train)
X_test = X_scaler.transform(X_test)
~~~

# GridSearchCV

In [57]:
~~~python
t = time.time()
svc = svm.SVC()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
# cv=3 pins the fold count; newer scikit-learn versions default to 5 folds
clf = GridSearchCV(svc, parameters, cv=3)
clf.fit(X_train, y_train)
print(round(time.time() - t, 2), 'Seconds to find SVC parameters...')
clf.best_params_
~~~

~~~
75.11 Seconds to find SVC parameters...

{'C': 10, 'kernel': 'rbf'}
~~~
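In the cells above, `X_scaler` is fit on all of `X_train` before GridSearchCV splits it, so each validation fold has already influenced the scaling statistics. The effect is probably small for standardization, but one way to avoid it is to put the scaler inside a `Pipeline` so every fold fits its own `StandardScaler`. This is a sketch on synthetic data from `make_classification` (an assumption standing in for the unscaled HOG features); note the `svc__` prefixes that route parameters to the pipeline step named `svc`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the unscaled HOG training features
X_train, y_train = make_classification(n_samples=200, n_features=20, random_state=0)

# The scaler is refit on each fold's training portion only
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
parameters = {'svc__kernel': ('linear', 'rbf'), 'svc__C': [1, 10]}

clf = GridSearchCV(pipe, parameters, cv=3)
clf.fit(X_train, y_train)
print(clf.best_params_)
```

After fitting, `clf.predict` applies the scaler and the SVC together, so raw (unscaled) features can be passed in directly.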

In [58]:
~~~python
t = time.time()
svc = svm.SVC(C=10.0, kernel='rbf')
svc.fit(X_train, y_train)
print(round(time.time() - t, 2), 'Seconds to train SVC...')
~~~

~~~
4.07 Seconds to train SVC...
~~~
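Retraining by hand works, but with the default `refit=True` GridSearchCV has already fit the winning model on the full training set and stores it as `best_estimator_`. This sketch, again on synthetic `make_classification` data as an assumption, checks that the stored model and a freshly trained `SVC` with the same parameters make identical predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the scaled training features
X_train, y_train = make_classification(n_samples=200, n_features=20, random_state=0)

clf = GridSearchCV(SVC(), {'kernel': ('linear', 'rbf'), 'C': [1, 10]}, cv=3)
clf.fit(X_train, y_train)

# Already refit on all of X_train with best_params_
best_svc = clf.best_estimator_

# Manually retrained model with the same parameters
manual_svc = SVC(**clf.best_params_).fit(X_train, y_train)

same = np.array_equal(best_svc.predict(X_train), manual_svc.predict(X_train))
print('Identical predictions:', same)
```

Using `best_estimator_` saves one training run, although the separate fit in the notebook also makes the chosen parameters explicit.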


In [59]:
~~~python
print('Test Accuracy of SVC', round(svc.score(X_test, y_test), 5))
~~~

~~~
Test Accuracy of SVC 0.97849
~~~
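Accuracy alone does not say whether the remaining errors are cars missed or non-cars flagged as cars. A confusion matrix separates the two. The labels below are hypothetical placeholders; in the notebook they would be `y_test` and `svc.predict(X_test)`.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labels (1 = car, 0 = non-car)
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)  # rows: true class (0, 1); columns: predicted class (0, 1)
print(classification_report(y_true, y_pred, target_names=['non-car', 'car']))
```

Here the off-diagonal entries show one non-car predicted as a car (a false positive) and one car predicted as a non-car (a false negative).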


In [60]:
~~~python
# Check the prediction time for n_predict samples
t = time.time()
n_predict = 10
print('My SVC predicts:', svc.predict(X_test[0:n_predict]))
print('For those', n_predict, 'Labels:', y_test[0:n_predict])
print(round(time.time() - t, 5), 'Seconds to predict with SVC...')
~~~

~~~
My SVC predicts: [0. 1. 0. 0. 0. 1. 0. 1. 0. 1.]
For those 10 Labels: [0. 1. 0. 0. 0. 1. 0. 1. 0. 1.]
0.03124 Seconds to predict with SVC...
~~~
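The timing above covers a single call and also includes the two `print` statements, so it is a rough upper bound. A steadier estimate times only `predict`, repeats it, and divides by the number of samples. This sketch uses a synthetic dataset and an RBF `SVC` with `C=10` as stand-ins; `n_repeats` is an arbitrary choice.

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the scaled HOG features
X, y = make_classification(n_samples=300, n_features=50, random_state=0)
svc = SVC(C=10.0, kernel='rbf').fit(X[:250], y[:250])
X_test = X[250:]

n_predict = 10
n_repeats = 100

# Time only the predict call, repeated to smooth out timer noise
start = time.perf_counter()
for _ in range(n_repeats):
    predictions = svc.predict(X_test[:n_predict])
elapsed = time.perf_counter() - start

per_sample = elapsed / (n_repeats * n_predict)
print('%.6f seconds per sample' % per_sample)
```

`time.perf_counter` is used instead of `time.time` because it is intended for measuring short intervals.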
