# **Traffic Sign Recognition** 

## P2 Writeup

---

### **Build a Traffic Sign Recognition Project**

The goals / steps of this project are the following:
* Load the data set (see below for links to the project data set)
* Explore, summarize and visualize the data set
* Design, train and test a model architecture
* Use the model to make predictions on new images
* Analyze the softmax probabilities of the new images
* Summarize the results with a written report


[//]: # (Image References)

 <img src="./examples/visualization.jpg" /><p style="text-align: center;"> Visualization </p>
 <img src="./examples/grayscale.jpg" /><p style="text-align: center;"> Grayscaling </p> 
 <img src="./examples/random_noise.jpg" /><p style="text-align: center;"> Random Noise </p> 
 <img src="./examples/placeholder.png" /><p style="text-align: center;"> Traffic Sign 1 </p> 
 
 

## Rubric Points

---
### Files included
- Jupyter notebook with the code implementation, plus a PDF version
- Image visualizations and the datasets for training, validation, and testing
- Checkpoints of the saved sessions with the trained weights and biases
- New images for extra testing, etc.
- Accuracy ranging from the high 80s to the low 90s (%)

Code and files are uploaded to my GitHub [CarND Project 2 page](https://github.com/chriskcheung/carnd_p2_traffic_sign_classifier/upload).



### Data Set Summary & Exploration

#### 1. Dataset summary

The datasets for training, validation, and testing were provided. I used the NumPy library to calculate the following summary statistics of the traffic signs dataset:

    Number of training examples = 34799
    Number of validation examples = 12630
    Number of testing examples = 12630
    Image data shape = [32, 32, 3]
    Number of classes = 43
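The statistics above take only a few NumPy calls to compute. A minimal sketch, assuming the data has already been loaded into arrays; `summarize` and the array names are hypothetical, and the stand-in arrays below contain random pixels that only mimic the per-image shape of the real data:

```python
import numpy as np


def summarize(X_train, y_train, X_valid, X_test):
    """Return the basic summary statistics for a set of image arrays."""
    return {
        "n_train": len(X_train),
        "n_valid": len(X_valid),
        "n_test": len(X_test),
        "image_shape": list(X_train.shape[1:]),  # shape of one image, e.g. [32, 32, 3]
        "n_classes": len(np.unique(y_train)),    # number of distinct labels
    }


# Small stand-in arrays (random uint8 pixels) with the same per-image shape
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(430, 32, 32, 3)).astype(np.uint8)
y_train = np.repeat(np.arange(43), 10)  # 43 classes, 10 samples each
X_valid = X_train[:50]
X_test = X_train[:60]

print(summarize(X_train, y_train, X_valid, X_test))
```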

#### 2. Visualization of the dataset

Using a random number function, I printed a few randomly selected images from the dataset as an exploratory visualization, to make sure the data looked correct.

 <img src="./writeup images/class10.png" /><p style="text-align: center;"> 10 </p>
 <img src="./writeup images/class10gray.png" /><p style="text-align: center;"> 10 in Grayscale </p>



### Design and Test a Model Architecture

#### 1. Preprocessing data

At first, I followed the advice to normalize the data with (x-128)/128, but it did not seem to work well with this dataset: the results were poor and the logits were far from matching any class. So I decided to use x/256 to normalize the training data to between 0 and 1. This worked well, and the results started showing at least some matching images, as explained in a later section. I could have improved the preprocessing further by converting the data to grayscale, squeezing the 3 color channels into 1, but since the results seemed acceptable at this stage, I did not do so.
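The two normalization options can be compared directly. A small NumPy sketch on three 8-bit pixel values; `normalize_centered` and `normalize_unit` are hypothetical helper names. It also shows one possible pitfall, offered as an assumption rather than a diagnosis of what happened here: on a uint8 array, `x - 128` wraps around modulo 256 instead of going negative unless the array is cast to float first.

```python
import numpy as np


def normalize_centered(x):
    """(x - 128) / 128 on floats: maps 0..255 to [-1, 1)."""
    return (x.astype(np.float32) - 128.0) / 128.0


def normalize_unit(x):
    """x / 256: maps 0..255 to [0, 1)."""
    return x.astype(np.float32) / 256.0


pixels = np.array([0, 128, 255], dtype=np.uint8)
centered = normalize_centered(pixels)  # values: -1.0, 0.0, 0.9921875
unit = normalize_unit(pixels)          # values: 0.0, 0.5, 0.99609375

# Pitfall (illustrative): without the float cast, 0 - 128 wraps to 128 in uint8,
# so the "centered" result for a black pixel becomes 1.0 instead of -1.0.
wrapped = (pixels - 128) / 128

print(centered, unit, wrapped)
```

Casting to float before any subtraction keeps both schemes well defined; the choice between a [-1, 1) and a [0, 1) range is then a separate question.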

#### 2. Model architecture

Following the advice in the project instructions, I started by reusing the CNN model from the lab and enhanced it for my final model. The layers of my training model are listed below:

| Layer                    | Description                                    |
|:------------------------:|:----------------------------------------------:|
| Input                    | 32x32x3 RGB image                              |
| Convolution 5x5          | 1x1 stride, valid padding, outputs 28x28x6     |
| Add bias                 | bias1 size of 6                                |
| RELU                     |                                                |
| Dropout                  | 30% drop rate (keep_prob = 0.7)                |
| Max pooling              | 2x2 stride, outputs 14x14x6                    |
| Convolution 5x5          | 1x1 stride, valid padding, outputs 10x10x16    |
| Add bias                 | bias2 size of 16                               |
| RELU                     |                                                |
| Dropout                  | 30% drop rate (keep_prob = 0.7)                |
| Max pooling              | 2x2 stride, outputs 5x5x16                     |
| Flatten                  | from 5x5x16, outputs 400                       |
| Fully connected          | flat x weight3 + bias3, outputs 120            |
| RELU                     |                                                |
| Dropout                  | 30% drop rate (keep_prob = 0.7)                |
| Fully connected          | flat x weight4 + bias4, outputs 84             |
| RELU                     |                                                |
| Dropout                  | 30% drop rate (keep_prob = 0.7)                |
| Fully connected (logits) | flat x weight5 + bias5, outputs 43             |
| Softmax cross entropy    | loss over the 43-class probability distribution |
| Reduce mean              | average loss over the batch                    |
 
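The layer sizes follow from the standard formula for a "valid" (unpadded) convolution or pooling window: out = floor((in - kernel) / stride) + 1. For a 32x32 input, a 5x5 convolution gives 28x28, and 2x2 pooling halves that to 14x14. A small sketch that traces the sizes, assuming 5x5 convolutions with stride 1 and a 2x2 pooling window with stride 2; the helper names are hypothetical:

```python
def valid_out(size, kernel, stride):
    """Spatial output size of a 'valid' (unpadded) conv or pooling window."""
    return (size - kernel) // stride + 1


def lenet_shapes(input_size=32, conv1_depth=6, conv2_depth=16):
    """Trace sizes through conv -> pool -> conv -> pool -> flatten."""
    s = valid_out(input_size, 5, 1)  # conv1: 5x5, stride 1
    c1 = (s, s, conv1_depth)
    s = valid_out(s, 2, 2)           # max pool: 2x2 window, stride 2
    p1 = (s, s, conv1_depth)
    s = valid_out(s, 5, 1)           # conv2: 5x5, stride 1
    c2 = (s, s, conv2_depth)
    s = valid_out(s, 2, 2)           # max pool: 2x2 window, stride 2
    p2 = (s, s, conv2_depth)
    flat = s * s * conv2_depth       # length of the flattened vector fed to fc1
    return c1, p1, c2, p2, flat


print(lenet_shapes())                                 # depths 6 and 16
print(lenet_shapes(conv1_depth=64, conv2_depth=128))  # deeper variant
```

With the deeper depths used later (64 and 128), the same arithmetic gives 5x5x128 = 3200, which matches the flatten size in the later model settings.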

#### 3. Training Model

To train the model, I started with the following hyperparameters:

    epochs = 10
    batch_size = 256
    learning rate = 0.1
    conv1 output size = 6
    conv2 output size = 16
    flatten output size = 400
    fullcon1 output size = 120
    fullcon2 output size = 84
    logits output size = 43
    training accuracy = mid 50% 

Without any changes to the LeNet model from the lab, the accuracy rose very slowly from about 0.21, increasing by less than 10% per epoch, and seemed to saturate around 0.5 without climbing further.



#### 4. Model architecture approach taken

At first, I took a trial-and-error approach, plugging in settings from high to low and observing the results before fine-tuning the model. First, I adjusted both learning_rate and batch_size, increasing and decreasing each. Increasing them did not improve the accuracy, but it increased the training time because larger batches were fed to the model. Decreasing them started to show improvement, so I tuned both down to let the model learn more slowly with smaller batches, and the accuracy started to climb. With only 10 epochs, the accuracy hovered around the mid 80s; I needed to increase the epochs from 10 to 20 to train the model enough to reach the high 80s to 90%.

    My model settings at this point were:
    epochs = 20
    batch_size = 64
    learning rate = 0.001
    conv1 output size = 6
    conv2 output size = 16
    flatten output size = 400
    fullcon1 output size = 120
    fullcon2 output size = 84
    logits output size = 43
    training accuracy = 87-91%
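Batch size is easier to reason about in terms of weight updates: with 34,799 training images and batch_size = 64, one epoch is ceil(34799 / 64) = 544 updates, and 20 epochs are 10,880. A minimal NumPy sketch of a shuffled mini-batch loop; `iterate_minibatches` is a hypothetical helper, the data is a zero-filled stand-in, and the actual optimizer step is omitted:

```python
import math

import numpy as np


def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data once (one epoch)."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]


n_train, batch_size, epochs = 34799, 64, 20
updates_per_epoch = math.ceil(n_train / batch_size)
print("updates per epoch:", updates_per_epoch)     # 544
print("total updates:", epochs * updates_per_epoch)  # 10880

# Small stand-in data to show the loop structure
rng = np.random.default_rng(42)
X = np.zeros((130, 32, 32, 1), dtype=np.float32)
y = np.arange(130) % 43
batch_sizes = []
for epoch in range(2):
    for X_batch, y_batch in iterate_minibatches(X, y, batch_size, rng):
        # a real training step would run the optimizer on (X_batch, y_batch) here
        batch_sizes.append(len(X_batch))
print(batch_sizes)  # [64, 64, 2, 64, 64, 2]
```

Smaller batches mean more (noisier) updates per epoch, which is one common explanation for why they can help a small model climb faster.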

As I started feeding new images to the model, I noticed that some classes were more likely to be classified correctly than others, which led me to question the dataset provided for training. I ran numpy.unique on the labels in y_train. It turned out that some classes had several times more samples than others:

    import numpy as np

    # Count how many training samples each class label has
    index, counts = np.unique(y_train, return_counts=True)
    print(index, "\n", counts)

    # Output:
    # [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
    #  25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42]
    #  [ 180 1980 2010 1260 1770 1650  360 1290 1260 1320 1800 1170 1890 1920  690
    #   540  360  990 1080  180  300  270  330  450  240 1350  540  210  480  240
    #   390  690  210  599  360 1080  330  180 1860  270  300  210  210]
  
This means the model was not getting balanced training across classes and might assign an image to the wrong class. As a result, I went back to the model and added a dropout layer after each convolutional layer and each hidden fully connected layer.
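The class counts printed above can be summarized numerically. A short sketch that uses those exact counts, laid out in the same rows as the output:

```python
import numpy as np

# Per-class training sample counts, copied from the np.unique output above
counts = np.array([
    180, 1980, 2010, 1260, 1770, 1650, 360, 1290, 1260, 1320, 1800, 1170, 1890, 1920, 690,
    540, 360, 990, 1080, 180, 300, 270, 330, 450, 240, 1350, 540, 210, 480, 240,
    390, 690, 210, 599, 360, 1080, 330, 180, 1860, 270, 300, 210, 210,
])

print("classes:", len(counts))                            # 43
print("total samples:", counts.sum())                     # 34799
print("largest class:", counts.argmax(), counts.max())    # 2 2010
print("smallest count:", counts.min())                    # 180
print("max/min ratio: %.1f" % (counts.max() / counts.min()))  # 11.2
```

The total matches the 34,799 training examples in the summary, and the largest class has roughly 11 times as many samples as the smallest ones.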

I started with 5% dropout (keep_prob = 0.95), but the improvement was not significant. Then I swept keep_prob from 0.95 down to 0.5 and observed the results. Accuracy peaked around a 30% dropout rate (keep_prob = 0.7) and declined beyond that, so I settled on keep_prob = 0.7.

    My model settings at this round were:
    epochs = 20
    batch_size = 64
    learning rate = 0.001
    keep_prob = 0.7
    conv1 output size = 6
    conv2 output size = 16
    flatten output size = 400
    fullcon1 output size = 120
    fullcon2 output size = 84
    logits output size = 43
    training accuracy = high 80% to low 90%
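With keep_prob = 0.7, each activation is zeroed with probability 0.3 during training. A minimal NumPy sketch of *inverted* dropout, the formulation in which kept units are scaled by 1/keep_prob so nothing needs rescaling at evaluation time (TensorFlow's `tf.nn.dropout` also scales kept units this way). This is an illustration, not the notebook's code; `dropout` is a hypothetical helper:

```python
import numpy as np


def dropout(x, keep_prob, rng, training=True):
    """Inverted dropout: keep each unit with probability keep_prob, scale kept units by 1/keep_prob."""
    if not training or keep_prob >= 1.0:
        return x  # at evaluation time every unit is kept, unscaled
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)


rng = np.random.default_rng(0)
activations = np.ones((1000, 120))  # e.g. a batch through the 120-unit fully connected layer
out = dropout(activations, keep_prob=0.7, rng=rng)

print("fraction kept: %.3f" % (out > 0).mean())  # close to 0.7
print("mean activation: %.3f" % out.mean())      # close to 1.0, the input mean
```

The 1/keep_prob scaling keeps the expected activation unchanged, which is why the same weights work at test time with dropout switched off (keep_prob = 1.0).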

Another improvement was to increase the depth of the convolutional and fully connected layers. As mentioned above, I had started with small depths: 6, then 16, 120, and 84, with 43 at the output. One thing to watch out for: as I increased the depth of each layer, the training time increased sharply, but accuracy improved by a few percentage points to the low 90s within 10 epochs. This offset the longer training time, since fewer epochs were needed to reach 90% accuracy.

    My model settings at this round were:
    epochs = 2
    batch_size = 64
    learning rate = 0.1
    conv1 output size = 64
    conv2 output size = 128
    flatten output size = 3200
    fullcon1 output size = 256
    fullcon2 output size = 84
    logits output size = 43
    training accuracy = 91.2%
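One way to see why training slowed as the layers got deeper is to count trainable parameters. A sketch that counts weights and biases for the baseline depths (6/16/120/84) and the deeper ones (64/128/256/84), assuming a 3-channel RGB input (before grayscaling was added), 5x5 kernels, a 5x5 spatial size at the flatten step, and 43 outputs. Parameter count is only a rough proxy for training time; `count_params` is a hypothetical helper:

```python
def count_params(in_ch, conv1, conv2, fc1, fc2, n_classes=43, k=5, flat_hw=5):
    """Count weights + biases per layer for the LeNet-style stack described above."""
    layers = {
        "conv1": k * k * in_ch * conv1 + conv1,
        "conv2": k * k * conv1 * conv2 + conv2,
        "fc1": flat_hw * flat_hw * conv2 * fc1 + fc1,
        "fc2": fc1 * fc2 + fc2,
        "logits": fc2 * n_classes + n_classes,
    }
    return layers, sum(layers.values())


base_layers, base_total = count_params(3, 6, 16, 120, 84)
deep_layers, deep_total = count_params(3, 64, 128, 256, 84)
print("baseline:", base_total)  # 64811
print("deeper:  ", deep_total)  # 1054491
print("ratio: %.1f" % (deep_total / base_total))
print(deep_layers)
```

Under these assumptions the deeper network has about 16 times as many parameters, most of them in fc1 (3200 x 256 weights); per-image compute also grows, since conv2 now produces 128 feature maps instead of 16.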

91.2% accuracy was still not enough to meet the project requirement. I went back to the preprocessing step and followed the advice in the project instructions to add one more step on top of normalization: I converted the training dataset to grayscale with a function called rgb2gray(), which takes the dot product of the 3-channel color array with the weight vector [[0.299], [0.587], [0.114]].

    import numpy as np

    def rgb2gray(x):
        return np.dot(x, [[0.299], [0.587], [0.114]])  # (N, 32, 32, 3) -> (N, 32, 32, 1)

This dot product changes the training dataset's shape from 34799x32x32x3 to 34799x32x32x1. After updating the rest of the model to accept the modified shape, training worked much better and reached 94.2% accuracy.
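Both preprocessing steps can be combined into one function. A minimal sketch of a grayscale-then-normalize pipeline on a random stand-in batch (not real data), assuming the x/256 scaling from the preprocessing section and the standard ITU-R BT.601 luma weights 0.299, 0.587, 0.114, which sum to 1 so a white pixel stays at 255 before scaling. `preprocess` is a hypothetical helper name:

```python
import numpy as np


def preprocess(x):
    """RGB uint8 batch (N, 32, 32, 3) -> normalized grayscale float32 batch (N, 32, 32, 1)."""
    weights = np.array([[0.299], [0.587], [0.114]], dtype=np.float32)  # shape (3, 1)
    gray = np.dot(x.astype(np.float32), weights)  # weighted sum over the channel axis
    return gray / 256.0                           # scale 0..255 to [0, 1)


rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(8, 32, 32, 3)).astype(np.uint8)
out = preprocess(batch)
print(batch.shape, "->", out.shape)  # (8, 32, 32, 3) -> (8, 32, 32, 1)
print(out.dtype, out.min(), out.max())
```

The same function would have to be applied to the validation, test, and new images so that the model always sees single-channel inputs on the same scale.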

    My final model settings are:
    epochs = 2
    batch_size = 64
    learning rate = 0.1
    conv1 output size = 64
    conv2 output size = 128
    flatten output size = 3200
    fullcon1 output size = 256
    fullcon2 output size = 84
    logits output size = 43
    training accuracy = 94.2%

Having reached 94.2% accuracy, I stopped tuning and moved on to the rest of the project.



### Test Model on New Images

I downloaded additional German traffic signs from the [GTSRB dataset page](http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset).

Instead of 5, I downloaded about 100 images and picked 20 of them to test my model. Here are the German traffic signs I used:

 <img src="./writeup images/new images.png" />

The first 11 images include low-light conditions and share the same outer shape, color, and common lettering, differing by only one digit, which might make them difficult to tell apart. With the early version of the model, before grayscaling was added to training, only 2 to 3 of the 20 images were classified correctly, an accuracy as low as 10%. Grayscaling both the training dataset and these 20 test images helped bring out the sign even in low light, which gave the model an advantage in matching the image.

#### Model predictions vs results

Here are the results of the prediction with the final version of my model that had grayscaling support:

    prediction   :  [1  2  5  8  16  13  19  12  7  4  10  14  17  15  18  0  9  11  6  3]
    class image  :  [1, 2, 5, 8, 16, 13, 19, 12, 7, 4, 10, 14, 17, 15, 18, 0, 9, 11, 6, 3]

| Class ID | Image                                  | Prediction                             |
|:--------:|:--------------------------------------:|:--------------------------------------:|
| 1        | 30 km/h                                | 30 km/h                                |
| 2        | 50 km/h                                | 50 km/h                                |
| 5        | 80 km/h                                | 80 km/h                                |
| 8        | 120 km/h                               | 120 km/h                               |
| 16       | Vehicles over 3.5 tons prohibited      | Vehicles over 3.5 tons prohibited      |
| 13       | Yield                                  | Yield                                  |
| 19       | Dangerous curve on the left            | Dangerous curve on the left            |
| 12       | Priority road                          | Priority road                          |
| 7        | 100 km/h                               | 100 km/h                               |
| 4        | 70 km/h                                | 70 km/h                                |
| 10       | No passing for vehicles over 3.5 tons  | No passing for vehicles over 3.5 tons  |
| 14       | Stop sign                              | Stop sign                              |
| 17       | No entry                               | No entry                               |
| 15       | No vehicle                             | No vehicle                             |
| 18       | General caution                        | General caution                        |
| 0        | 20 km/h                                | 20 km/h                                |
| 9        | No passing                             | No passing                             |
| 11       | Right-of-way at next intersection      | Right-of-way at next intersection      |
| 6        | End of 80 km/h                         | End of 80 km/h                         |
| 3        | 60 km/h                                | 60 km/h                                |

The model correctly classified all 20 traffic signs, an accuracy of 100% on this set.
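The 100% figure is the fraction of matching entries between the two arrays printed above. A short NumPy check using those exact arrays:

```python
import numpy as np

# Arrays copied from the prediction output above
prediction = np.array([1, 2, 5, 8, 16, 13, 19, 12, 7, 4, 10, 14, 17, 15, 18, 0, 9, 11, 6, 3])
labels = np.array([1, 2, 5, 8, 16, 13, 19, 12, 7, 4, 10, 14, 17, 15, 18, 0, 9, 11, 6, 3])

correct = (prediction == labels).sum()
accuracy = np.mean(prediction == labels)
print("correct: %d / %d" % (correct, len(labels)))  # correct: 20 / 20
print("accuracy: %.0f%%" % (100 * accuracy))        # accuracy: 100%
```

With only 20 images, each misclassification would move this number by 5 percentage points, so it is a coarse check rather than a replacement for the test-set accuracy.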


#### Top 5 softmax probabilities for five of the new images

Below are the top 5 softmax probabilities for five of the new images, along with the sign type for each probability.
The code for making predictions with my final model is in the 11th cell of the IPython notebook.
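A minimal NumPy sketch of how top-5 softmax probabilities can be extracted from a vector of logits. TensorFlow provides `tf.nn.softmax` and `tf.nn.top_k` for the same purpose; the logits below are made-up values for illustration, not outputs of this model, and `softmax` and `top_k` here are hypothetical NumPy helpers. Softmax outputs over 43 classes are non-negative and sum to 1, so the top-1 probability is always at least 1/43.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def top_k(probs, k=5):
    """Return (values, indices) of the k largest probabilities, largest first."""
    idx = np.argsort(probs)[::-1][:k]
    return probs[idx], idx


# Made-up logits for one image over 43 classes (illustrative only)
rng = np.random.default_rng(0)
logits = rng.normal(size=43)
logits[5] += 10.0  # make class 5 clearly the strongest

probs = softmax(logits)
values, classes = top_k(probs, k=5)
print("sum of probabilities: %.6f" % probs.sum())  # 1.000000
print("top-5 classes:", classes)
print("top-5 probabilities:", np.round(values, 4))
```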

For the first image, the model is relatively sure that this is an 80 km/h speed limit sign (probability of 2.476e-12 across 43 classes). The top five softmax probabilities were:

| Probability         	|     Prediction	        					| 
|:---------------------:|:---------------------------------------------:| 
| 2.47635181e-12    	| 80 km/h   									| 
| 1.64791090e-14    	| 60 km/h 										|
| 4.87679537e-16		| 50 km/h										|
| 4.27841437e-17		| No passing for 3.5 tons vehicle				|
| 1.39092786e-17	    | 120 km/h      								|


For the second image, the model is relatively sure that this is a Vehicles over 3.5 tons prohibited sign (probability of 6.156e-09 across 43 classes). The top five softmax probabilities were:

| Probability         	|     Prediction	        					| 
|:---------------------:|:---------------------------------------------:| 
| 6.15605078e-09		| Vehicles over 3.5 tons prohibited   			|
| 6.51597300e-16    	| 80 km/h 										|
| 3.04514538e-16		| 50 km/h										|
| 1.26701716e-16		| Roundabout mandatory							|
| 8.67156994e-17		| 60 km/h      									|


For the third image, the model is relatively sure that this is a Yield sign (probability of 6.18758433e-10 across 43 classes). The top five softmax probabilities were:

| Probability         	|     Prediction	        					| 
|:---------------------:|:---------------------------------------------:| 
| 6.18758433e-10		| Yield   										| 
| 5.17210328e-17		| No vehicle									|
| 5.10849475e-17    	| Ahead only									|
| 2.80259909e-17		| Priority road									|
| 1.62817186e-17 	    | Keep left      								|


For the fourth image, the model is relatively sure that this is a No vehicle sign (probability of 1 across 43 classes). The top five softmax probabilities were:

| Probability         	|     Prediction	        					| 
|:---------------------:|:---------------------------------------------:| 
| 1						| No vehicle					 				|
| 5.54273905e-10 	    | 50 km/h       								|
| 3.91470253e-15		| No passing									|
| 1.56490999e-15 	    | Keep right      								|
| 6.50165867e-16		| Priority road									|


For the fifth image, the model is relatively sure that this is a 30 km/h speed limit sign (probability of 4.53066438e-14 across 43 classes). The top five softmax probabilities were:

| Probability         	|     Prediction	        					| 
|:---------------------:|:---------------------------------------------:| 
| 4.53066438e-14		| 30 km/h										|
| 3.39941612e-16 	    | 50 km/h      									|
| 1.51703826e-16		| 20 km/h										|
| 3.69844595e-17 	    | Keep right      								|
| 2.00030834e-17		| 70 km/h										|

          
### (Optional) Visualizing the Neural Network (See Step 4 of the IPython notebook for more details)
#### 1. Discuss the visual output of your trained network's feature maps. What characteristics did the neural network use to make classifications?