# Traffic sign classification

## 1. Prepare the extracted features
### In the last session you were introduced to many kinds of features, all of which can be used for classification.
The raw pixels themselves can also be regarded as a kind of feature.
A tutorial on classification using raw pixels can be found [here](./GuideLine_For_ML_IJCAI_color.ipynb).

### Get the features
Here we use a pre-extracted feature, the Canny feature.
Note that you can replace the Canny feature with any other feature you like :)

The features are already stored in "CannyFeature.npy", and the corresponding class labels in "label.txt".
The next step is to load the feature data and the labels.

In [18]:
# Load some packages 
import numpy as np
# Load features
CannyFeature = np.load("./CannyFeature.npy")
# Print the shape of the feature
print ("The shape of the pre-extracted feature is:", CannyFeature.shape)
# Load labels (np.int was removed in NumPy 1.24; the built-in int is equivalent)
Label = np.loadtxt("./label.txt", dtype=int)
print ("The label is:\n", Label)

The shape of the pre-extracted feature is: (359, 10000)
The label is:
 [ 11  16  30  18  16  35  42  34 101 108 108 108 108 108 108 108  16  16
  68  68  51  51 105  41  51  16  16  16  16 105 105  41 103  51  68  68
  43  16  16  83  83 118 118 118 101 101  34  34  51  51 104  51  41  41
  41 104  16  16  69  16  69 118  69  69  68  68  87  87  68  87  68  87
  68  87  69  16  16  69  16  16  69  16  41  69  41  41  69  41  41 101
 101 101 101 101  51  51  51  51  51 118 118 118 118 118  47  34  44  40
 102 107  41 101  71  51  47  47  47  47  47  47  47  71  51  43  47  50
 104  41  50  50  41  41  51  50  51 104  51  40  50  41  51  50  51  42
 108  11  11  16  35  41  71  51  71  71  68  35  35 103 118 118 103  51
  51  34  42  41  51  51 116  41  51  35  16  40  40 112  16  35  16 102
  41 101  42  40  34 116  34  66  54  40  18 116  51 105  16  41 104  43
  15  11  16 105 118  15  51  35  64  35  35  29  15  42  47  41 118 112
  35  35 118 118  22  22  40 115 118 112  51  51 102 

Now we can use these features and labels to train a classifier!
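
Before training, it can help to check how many samples each class has, since classes with only one or two samples are hard to learn and to evaluate. The following is a minimal sketch on a small hypothetical label array (`labels` is a stand-in and not part of the original notebook); in the notebook you would pass `Label` instead.

```python
import numpy as np

# Hypothetical stand-in for the real labels; in the notebook, use the
# `Label` array loaded above instead.
labels = np.array([16, 16, 51, 51, 51, 41, 118, 11])

# Count how many samples each class has
classes, counts = np.unique(labels, return_counts=True)
for cls, n in zip(classes, counts):
    print(f"class {cls}: {n} sample(s)")

# Classes that appear only once cannot be present in both the
# training set and the testing set
rare = classes[counts < 2]
print("Classes with a single sample:", rare)
```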

### Building Classifier
Here are the steps for the task:
+ Standardize our data
+ Split the data into training set and testing set
+ Use the training set to train a model
+ Evaluate the model

First we preprocess our data by standardizing it, so that each feature has zero mean and unit variance:

In [19]:
# Load the StandardScaler package
from sklearn.preprocessing import StandardScaler
# Standardize the data
DataScaler = StandardScaler().fit(CannyFeature)
X_scaled = DataScaler.transform(CannyFeature)
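
To make the standardization step concrete, here is a minimal sketch of what `StandardScaler` computes, written out with NumPy on a hypothetical toy matrix (`X` below is made up for illustration). It assumes the default settings, under which each column is centered by its mean and divided by its population standard deviation, and constant columns are left with a scale of 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 4 samples, 3 features (the last column is constant)
X = np.array([[0.0, 1.0, 5.0],
              [1.0, 3.0, 5.0],
              [0.0, 5.0, 5.0],
              [1.0, 7.0, 5.0]])

# Per-feature mean and population standard deviation (ddof=0)
mean = X.mean(axis=0)
std = X.std(axis=0)

# Constant columns have zero standard deviation; use 1.0 there
# to avoid dividing by zero
std_safe = np.where(std == 0, 1.0, std)
X_manual = (X - mean) / std_safe

# Compare against scikit-learn's result
X_sklearn = StandardScaler().fit_transform(X)
print("Manual and StandardScaler results agree:",
      np.allclose(X_manual, X_sklearn))
```

Edge-based features such as Canny maps may contain many constant columns (pixels that are never an edge), which is why the zero-variance case is handled explicitly here.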

Then we move on to the second step and split the data:

In [20]:
# Load the train_test_split package
from sklearn.model_selection import train_test_split
# Split the data; 33% of it is held out as the testing set
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, Label, test_size=0.33, random_state=42)
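
The notebook fits the scaler on all samples before splitting, which lets statistics of the testing set influence the preprocessing. A common alternative, sketched here on hypothetical random data with the same number of samples (359), is to split first and fit the scaler on the training portion only. The names `X`, `y`, `X_tr`, `X_te` and `scaler` are illustrative and not taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data with the same number of samples as the notebook
rng = np.random.default_rng(0)
X = rng.random((359, 20))
y = rng.integers(0, 5, size=359)

# Split first, on the unscaled data
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.33, random_state=42)
print("Training set:", X_tr.shape, "Testing set:", X_te.shape)

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler().fit(X_tr)
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)
```

With 359 samples and `test_size=0.33`, the split leaves 240 samples for training and 119 for testing.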

Now we can train a model, for example a Random Forest:

In [21]:
# Load the Random Forest model
from sklearn.ensemble import RandomForestClassifier as rf
rfclf = rf(n_estimators=500, max_features=20, random_state=42)

rfclf.fit(X_train, y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features=20, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=None,
            oob_score=False, random_state=42, verbose=0, warm_start=False)
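
The model above is fitted but not yet evaluated. As a rough sketch of two quick checks, the example below trains a forest on hypothetical synthetic data (generated with `make_classification`, not the traffic sign features) and reports both the out-of-bag accuracy, which is available when `oob_score=True`, and the accuracy on a held-out testing set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for the real features and labels
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           n_classes=3, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.33, random_state=42)

# oob_score=True scores each sample using only the trees
# that did not see it during bootstrapping
forest = RandomForestClassifier(n_estimators=100, max_features=20,
                                oob_score=True, random_state=42)
forest.fit(X_tr, y_tr)

print("Out-of-bag accuracy:", forest.oob_score_)
print("Test accuracy:", forest.score(X_te, y_te))
```

The out-of-bag estimate needs no extra data, which can be convenient when the dataset is as small as the one used here.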

We can also use the skills learned from the first workshop to search for better parameters of the Random Forest.

For example, the grid search method with cross validation:

In [22]:
# Load the GridSearchCV model for optimization
from sklearn.model_selection import GridSearchCV
# Instantiate the random forest model
rfclf = rf(random_state=42)
# Set the parameters of the random forest for searching
rfparams = {'n_estimators': [10,20,30,50,100,200,500,1000],
            'max_features': [10, 20, 50, 100]}

# Instantiate a grid search with cross validation model to optimize the random forest model with the parameters
clf = GridSearchCV(rfclf, rfparams, n_jobs=-1, cv=5, verbose=1)
# Use the training set to fit the model
clf.fit(X_train, y_train)

Fitting 5 folds for each of 32 candidates, totalling 160 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    4.6s
[Parallel(n_jobs=-1)]: Done 160 out of 160 | elapsed:   25.2s finished


GridSearchCV(cv=5, error_score='raise-deprecating',
       estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None,
            oob_score=False, random_state=42, verbose=0, warm_start=False),
       fit_params=None, iid='warn', n_jobs=-1,
       param_grid={'n_estimators': [10, 20, 30, 50, 100, 200, 500, 1000], 'max_features': [10, 20, 50, 100]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=1)
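
The "32 candidates, totalling 160 fits" message in the log follows directly from the parameter grid. A small sketch using scikit-learn's `ParameterGrid` reproduces the arithmetic; the variable names `grid`, `n_candidates` and `n_folds` are illustrative.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the search above
rfparams = {'n_estimators': [10, 20, 30, 50, 100, 200, 500, 1000],
            'max_features': [10, 20, 50, 100]}

# Every combination of the listed values is one candidate
grid = ParameterGrid(rfparams)
n_candidates = len(grid)   # 8 values x 4 values
n_folds = 5                # cv=5 in the search above

print(n_candidates, "candidates x", n_folds, "folds =",
      n_candidates * n_folds, "fits")
```

Because `refit=True`, the search additionally retrains the best candidate once on the whole training set after these fits.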

We can use the best estimator found by the search to predict the signs:

In [23]:
# Show the best results
print ("The best model params is", clf.best_params_)
print ("The score of the best model is", clf.best_score_)

# Get the best estimator
bestclf = clf.best_estimator_
# Use the best estimator to predict
y_pred = bestclf.predict(X_test)
print ("The predicted label is", y_pred)
# Calculate the accuracy 
from sklearn.metrics import accuracy_score
print ("The accuracy for this model is", accuracy_score(y_test, y_pred))

The best model params is {'max_features': 10, 'n_estimators': 500}
The score of the best model is 0.26666666666666666
The predicted label is [118 118 118  41  16  51  51  16  51  51  51  51 118  51  51  51  16  51
  51  51  51  51  51  51  51  51  51  51  16  51  41  51  51  16  51  51
  51  51  51  51  51  51  51  51  51  51  51  51  51  51 118  51  16 118
  51  51 118  51 118  51  41  51  51  51  68  51 118  51  51  51  51  51
  51  51  51  16 118  41  16  51  51  51  51  41  51  51  51  16  16  51
  51  51  41  51  16  51 118  51  51 118  16  51  51  51  51  51  51 104
  51  51  51  51  16  51  16  51 118  51  51]
The accuracy for this model is 0.18487394957983194
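
The predictions above are dominated by class 51, so it is worth comparing the accuracy with a trivial baseline that always predicts the most frequent training class. The sketch below shows one way to do this with `DummyClassifier` on hypothetical toy labels (the `*_demo` arrays are made up for illustration); in the notebook you would use `X_train`, `y_train`, `X_test` and `y_test` instead.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical toy labels standing in for y_train and y_test
y_train_demo = np.array([51, 51, 51, 16, 16, 41, 118])
y_test_demo = np.array([51, 16, 51, 41, 51])

# The most_frequent strategy ignores the features, so placeholders are enough
X_train_demo = np.zeros((len(y_train_demo), 1))
X_test_demo = np.zeros((len(y_test_demo), 1))

# Always predict the most frequent class seen during training
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train_demo, y_train_demo)
baseline_pred = baseline.predict(X_test_demo)
baseline_acc = accuracy_score(y_test_demo, baseline_pred)

print("Baseline prediction:", baseline_pred)
print("Baseline accuracy:", baseline_acc)
```

If the trained model does not clearly beat this baseline, the features or the model are probably not capturing much class information.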


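### Conclusion
In this part we learned how to use machine learning tools to classify traffic signs. With the Canny feature and a Random Forest, the best model reached a cross-validation score of about 0.27 and a test accuracy of about 0.18, so there is considerable room for improvement. In the next session you will be introduced to state-of-the-art techniques for the classification problem.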