# **Traffic Sign Recognition** 

## Writeup


---

**Build a Traffic Sign Recognition Project**

The goals / steps of this project are the following:
* Load the data set (see below for links to the project data set)
* Explore, summarize and visualize the data set
* Design, train and test a model architecture
* Use the model to make predictions on new images
* Analyze the softmax probabilities of the new images
* Summarize the results with a written report


[//]: # (Image References)


[d_exploration]: ./report-images/dataset_exploration.png "Dataset Exploration"
[d_grey]: ./report-images/dataset_greyscale.png "Dataset Greyscale"
[d_traslation]: ./report-images/dataset_traslation.png "Dataset Translation"
[d_rotation]: ./report-images/dataset_rotation.png "Dataset Rotation"
[d_blur]: ./report-images/dataset_blurred.png "Dataset Blur"
[d_equalization]: ./report-images/dataset_equalization.png "Dataset Equalization"
[d_distribution]: ./report-images/distribution_of_classes.png "Distribution of Classes"
[predictions]: ./report-images/predictions.jpg "Downloaded signs predictions"
[featuremap]: ./report-images/feature_map.jpg "Feature maps"

[sign1]: ./extra-signs/1.jpg "Speed limit 30km/h"
[sign2]: ./extra-signs/9.jpg "No passing"
[sign3]: ./extra-signs/12a.jpg "Priority road"
[sign4]: ./extra-signs/14.jpg "Stop"
[sign5]: ./extra-signs/23a.jpg "Slippery road"
[sign6]: ./extra-signs/28a.jpg "Children crossing"
[sign7]: ./extra-signs/4.jpg "Speed limit 70km/h"


## Rubric Points
### Here I will consider the [rubric points](https://review.udacity.com/#!/rubrics/481/view) individually and describe how I addressed each point in my implementation.  

---
### Writeup / README

You're reading it! Here is a link to my [project code](https://github.com/MC-8/SDC-P2-traffic-signs/blob/master/Traffic_Sign_Classifier.ipynb).

### Data Set Summary & Exploration

I used the pandas library to calculate summary statistics of the traffic
signs data set:

* The size of the training set is 34799
* The size of the validation set is 4410
* The size of the test set is 12630
* The shape of a traffic sign image is (32, 32, 3)
* The number of unique classes/labels in the data set is 43

Here is an exploratory visualization of the data set. The following image displays a sample of the images in the dataset. Despite having the same resolution, the pictures vary in quality. Some of them are easy (for a human) to identify, while others require a bit of squinting and imagination due to lighting conditions, blur, etc.
![alt text][d_exploration]

The second image shows a normalized distribution of the classes in the three datasets.
![alt text][d_distribution]

The training, validation and test datasets have very similar class distributions, which is a positive characteristic because training, validation and testing are then done under similar conditions. However, the low representation of some classes may lead to less precise results for those classes. Data augmentation is employed in this report to increase the number of images in the training set, which should increase the accuracy of the neural network.
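
A normalized class distribution like the one plotted above can be computed in a few lines. This is a minimal NumPy sketch, not the notebook's code: the function name `class_distribution` and the toy label array are hypothetical, and the real label arrays would come from the loaded data set.

```python
import numpy as np

def class_distribution(labels, n_classes=43):
    """Return the fraction of samples that belong to each class."""
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()

# Hypothetical toy labels standing in for the real training labels.
y_toy = np.array([0, 1, 1, 2, 2, 2])
dist = class_distribution(y_toy, n_classes=3)
print(dist)
```

Applying the same function to the training, validation and test labels gives three vectors that can be plotted side by side to compare the splits.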


### Design and Test a Model Architecture

When using images to train a neural network, there are at least a couple of considerations to make.
First, it is desirable to have good-quality images. Since this cannot be guaranteed in the real world, it is more useful to have a network that can recognize traffic signs even when conditions are poor. For example, lighting conditions, blur caused by speed or fog, and different viewing angles all have a significant impact on how well a sign can be recognized.
Second, to facilitate the learning process, it may be helpful to reduce the complexity of the images so that the network can discern important features and not be "distracted" by unnecessary ones. For example, even though the color of a traffic sign is distinctive of its type, it is entirely possible to classify a sign correctly when the image is in greyscale.

In practice, converting the images to greyscale achieved slightly better accuracy.
Here is an example of what the dataset looks like after applying a greyscale transformation.

![alt text][d_grey]
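
A greyscale conversion of this kind can be sketched as a weighted sum of the color channels. The sketch below assumes the common ITU-R BT.601 luma weights; the notebook may use a different conversion (for example a library call), and `to_greyscale` is a hypothetical name.

```python
import numpy as np

def to_greyscale(images):
    """Convert a batch of RGB images (N, H, W, 3) to greyscale (N, H, W, 1)."""
    # Assumed weights: ITU-R BT.601 luma coefficients for R, G, B.
    weights = np.array([0.299, 0.587, 0.114])
    grey = images.astype(np.float32) @ weights
    # Keep a trailing channel axis so the result still looks like an image batch.
    return grey[..., np.newaxis]

# Random stand-in for a batch of 32x32 RGB traffic sign images.
batch = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
grey_batch = to_greyscale(batch)
print(grey_batch.shape)
```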

Because some classes have few samples in the training set, I decided to augment the dataset with copies of randomly chosen training images, each modified by an image processing algorithm. An image with a digital filter applied is effectively a new image, and it can improve the network's accuracy because it adds another "real-world" condition for that traffic sign.

The image processing techniques that were considered (in addition to greyscale) include:
* Blurring
* Rotation
* Translation

Here are examples of the dataset after blur, rotation and translation are applied to random images.

__Dataset sample with blurred images__
![alt text][d_blur]
A Gaussian filter with a random size between 1 and 5 is applied to random images, and the results are appended to the dataset.
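
The blur step can be illustrated with a separable Gaussian filter written in plain NumPy. This is a sketch under assumptions, not the notebook's implementation: it works on a single 2-D greyscale image, restricts the kernel size to the odd values 1, 3 and 5, and uses a fixed `sigma` of 1.0. The names `gaussian_kernel` and `gaussian_blur` are hypothetical.

```python
import numpy as np

def gaussian_kernel(size, sigma=1.0):
    """1-D Gaussian kernel of odd length `size`, normalized to sum to 1."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, size, sigma=1.0):
    """Blur a 2-D greyscale image with a separable Gaussian filter."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    # Replicate border pixels so the output keeps the input shape.
    padded = np.pad(image.astype(np.float32), pad, mode="edge")
    # A Gaussian is separable: filter along rows first, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

rng = np.random.default_rng(0)
image = rng.random((32, 32)).astype(np.float32)  # stand-in for a greyscale sign
size = int(rng.choice([1, 3, 5]))                # assumed: odd sizes only
blurred = gaussian_blur(image, size)
print(size, blurred.shape)
```

A kernel size of 1 leaves the image unchanged, so in this sketch some of the "blurred" copies are exact duplicates of their originals.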

__Dataset sample with rotated images__
![alt text][d_rotation]
A random rotation between -15 and 15 degrees is applied to random images, and the result appended to the dataset.

__Dataset sample with translated images__
![alt text][d_traslation]
A random translation of up to 8 pixels in any direction is applied to random images, and the results are appended to the dataset.
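
The translation step, including appending the new images to the dataset, can be sketched as follows. This is an illustration only: the helper names `translate` and `augment_with_translations` are hypothetical, and filling the uncovered border with zeros is an assumption (the notebook may fill it differently).

```python
import numpy as np

def translate(image, dx, dy):
    """Shift an (H, W) or (H, W, C) image by (dx, dy) pixels; gaps are zero-filled."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    # Source and destination windows that overlap after the shift.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out

def augment_with_translations(images, labels, n_new, max_shift=8, seed=0):
    """Append `n_new` (> 0) randomly translated copies of random images."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(images), size=n_new)
    shifts = rng.integers(-max_shift, max_shift + 1, size=(n_new, 2))
    new_images = np.stack(
        [translate(images[i], int(dx), int(dy)) for i, (dx, dy) in zip(idx, shifts)]
    )
    # The translated copies keep the labels of the images they came from.
    return np.concatenate([images, new_images]), np.concatenate([labels, labels[idx]])

# Random stand-ins for greyscale training images and their labels.
X = np.random.randint(0, 256, size=(10, 32, 32, 1), dtype=np.uint8)
y = np.arange(10)
X_aug, y_aug = augment_with_translations(X, y, n_new=5)
print(X_aug.shape, y_aug.shape)
```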

Many other image processing techniques, such as warping and image flipping, could also be used. Since I obtained good results with the few techniques described above, I decided not to implement more.

Additionally, I've experimented with histogram equalization, which is a technique that makes lighting and colors of the images more even across the dataset.
An example of an equalized dataset looks like this:

![alt text][d_equalization]

This was to test my assumption that an evenly lit dataset reduces the number of features the network has to detect (a dark sign then looks the same as a bright sign). In practice, although there was a slight improvement, I did not find a significant gain over training with the greyscale dataset.

The last step before feeding images to the network is to normalize the image data. Each pixel in each channel normally has an integer value in [0, 255]. To keep the learning algorithm numerically well behaved, it is good practice to normalize the data using a floating point representation that maps the integer range [0, 255] to the floating point range [-1, 1].
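
One way to write this mapping is shown below. It is a minimal sketch: `normalize` is a hypothetical name, and the constants 127.5 are an assumption chosen so that 0 maps exactly to -1 and 255 maps exactly to 1.

```python
import numpy as np

def normalize(images):
    """Map uint8 pixel values in [0, 255] to float32 values in [-1, 1]."""
    return (images.astype(np.float32) - 127.5) / 127.5

pixels = np.array([0, 128, 255], dtype=np.uint8)
normalized = normalize(pixels)
print(normalized)
```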


#### Network architecture

My final model consisted of the following layers:

| Layer         		|     Description	        					| 
|:---------------------:|:---------------------------------------------:| 
| Input         		| 32x32x3 RGB image   							| 
| Convolution 3x3     	| 1x1 stride, same padding, outputs 32x32x8 	|
| RELU					|												|
| Convolution 3x3     	| 1x1 stride, same padding, outputs 32x32x32 	|
| RELU					|												|
| Convolution 3x3     	| 1x1 stride, same padding, outputs 32x32x64 	|
| RELU					|												|
| Max pooling	      	| 2x2 stride,  outputs 16x16x64 				|
| Fully connected		| outputs 256        							|
| Dropout				| Keep probability = 90%						|
| Fully connected		| outputs 128        							|
| Fully connected		| outputs 64        							|
| Classifier			| 43 classes        							|
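
The output shapes in the table can be checked, and the number of trainable parameters estimated, with a short calculation. The sketch below assumes that every convolutional and fully connected layer has a bias term and that the pooled feature map is flattened before the first fully connected layer; neither detail is stated in the table, and the helper names are hypothetical.

```python
def conv_same(shape, kernel, out_channels):
    """Output shape and parameter count of a stride-1, same-padded convolution."""
    h, w, c = shape
    params = kernel * kernel * c * out_channels + out_channels  # weights + biases
    return (h, w, out_channels), params

def max_pool(shape, stride):
    """Output shape of a max pooling layer (no trainable parameters)."""
    h, w, c = shape
    return (h // stride, w // stride, c)

def dense(n_in, n_out):
    """Output size and parameter count of a fully connected layer."""
    return n_out, n_in * n_out + n_out  # weights + biases

shape, total = (32, 32, 3), 0
for out_channels in (8, 32, 64):
    shape, params = conv_same(shape, 3, out_channels)
    total += params
shape = max_pool(shape, 2)

units = shape[0] * shape[1] * shape[2]  # flatten before the dense layers
for n_out in (256, 128, 64, 43):
    units, params = dense(units, n_out)
    total += params

print(shape, total)
```

Under these assumptions the model has about 4.26 million parameters, and almost all of them sit in the first fully connected layer (16x16x64 inputs to 256 outputs).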
 
#### Model training

To train the model, I shuffled the image/label pairs and trained for 50 epochs with a batch size of 100.
I used the Adam optimizer with a learning rate of 0.001, which is a typical choice. Adam is more complex than classic gradient descent, and it is well known in the literature for producing good results in a shorter time than many other optimizers. I experimented with different learning rates and batch sizes, and this combination of hyperparameters achieved very good results.
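
The shuffling and batching described here can be sketched as a small generator. This is an illustration rather than the notebook's training loop: `minibatches` is a hypothetical helper, the toy arrays stand in for the real images and labels, and the optimizer step itself is omitted.

```python
import numpy as np

def minibatches(X, y, batch_size=100, seed=None):
    """Yield shuffled (images, labels) batches that cover the data set once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # the same permutation keeps pairs aligned
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

# Toy data: 250 "images" whose single value equals their label.
X_toy = np.arange(250).reshape(250, 1)
y_toy = np.arange(250)

batches = list(minibatches(X_toy, y_toy, batch_size=100, seed=0))
print([len(xb) for xb, _ in batches])
```

In a full training run the generator would be called once per epoch (50 times here), so that each epoch sees the data in a different order.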

#### Approach used to achieve the target validation accuracy

My final model results were:
* validation set accuracy of __98.2%__
* test set accuracy of __95.3%__

This is well above the assignment target of 93%. While it would not be acceptable for real-world applications, it is a satisfactory result at this stage, while I am still learning several neural network concepts.

I followed an iterative approach.
The LeNet network was chosen as a starting point because it already accepts 32x32 images and achieves a decent accuracy (88%) on traffic signs, even though it was not designed for them. LeNet works well with digits, but traffic signs have many more features than handwritten digits, so I decided to adapt this architecture rather than build a network from scratch.
I tried several approaches:
* Increasing the number/size of convolutional layers
* Increasing the number/size of fully connected layers
* Adding drop-out layers

Convolutional layers are well suited to images because nearby pixels are spatially correlated: traffic signs contain many straight lines and curves, and adjacent pixels are highly correlated.
Drop-out layers make the network more robust and reduce overfitting.
A validation accuracy of 98% is a fairly good result for this exercise, though definitely not for real-world applications. From there, it is challenging to improve the accuracy simply by adding or removing layers. In addition, data augmentation turned out to be pivotal for raising the accuracy above 93%, so improving the quality, quantity and variety of the training set should boost the accuracy further.
The test accuracy of 95.3% suggests that the model is not overfitting badly, or at least that it is robust enough to handle new data.
This is explored further by testing the model on new images, as described in the next section.
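
The effect of a drop-out layer with a keep probability of 90% can be illustrated with the common "inverted dropout" formulation. This is a NumPy sketch of the general technique, not the layer used in the notebook, and `dropout` is a hypothetical name.

```python
import numpy as np

def dropout(activations, keep_prob=0.9, training=True, rng=None):
    """Inverted dropout: zero random units and rescale the survivors."""
    if not training:
        # At evaluation time the layer passes activations through unchanged.
        return activations
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation the same.
    return activations * mask / keep_prob

acts = np.ones((1000, 100))
dropped = dropout(acts, keep_prob=0.9, rng=np.random.default_rng(0))
print(dropped.mean())
```

Because the surviving activations are scaled by 1/0.9, the mean activation stays close to its original value, and no rescaling is needed when drop-out is switched off for validation and testing.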
 

### Test a Model on New Images

Here are seven traffic signs that I found on the web:

![alt text][sign1] ![alt text][sign2] ![alt text][sign3] 
![alt text][sign4] ![alt text][sign5] ![alt text][sign6] ![alt text][sign7]

The second image might be difficult to classify because there are two traffic signs in the same image. All the images were resized to 32x32 pixels and saved in jpg format. They were then fed to the network, and the accuracy was calculated against labels that I assigned manually.

Here are the results of the prediction. The second image was labelled as "No passing" because that is the sign closer to the observer:

| Image                 | Prediction            |
|:---------------------:|:---------------------:|
| Speed limit 30km/h    | Speed limit 30km/h    |
| No passing            | No passing            |
| Priority road         | Priority road         |
| Stop                  | Stop                  |
| Slippery road         | Slippery road         |
| Children crossing     | Children crossing     |
| Speed limit 70km/h    | Speed limit 70km/h    |


The model correctly classified 7 of the 7 traffic signs, which gives an accuracy of 100%. This is the ideal outcome, but running the validation on this data set multiple times produced results oscillating between 5/7 and 7/7 images classified correctly. Some of the signs are not always classified correctly: the 30km/h speed limit is sometimes classified as a 70km/h speed limit, and the slippery road sign is sometimes classified as bicycle crossing.

The code for making predictions on my final model is located at the end of Step 3 in the Python notebook.
A visualization of the softmax probabilities is in the following image:
![alt text][predictions]

For all images except the first one, the network is quite sure of its prediction (probability above 99.999%).
For the 30km/h limit, as discussed earlier, the network assigns a significant probability (about 15%) to the 70km/h class. This may cause the autonomous car to get quite a speeding ticket!
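
The softmax probabilities and the most likely classes for each image can be computed from the network's raw outputs (logits) as sketched below. This is a NumPy illustration with made-up logits; the notebook most likely relies on its deep learning framework for this step, and `softmax` and `top_k` are hypothetical helper names.

```python
import numpy as np

def softmax(logits):
    """Convert logits to probabilities along the last axis."""
    # Subtracting the row maximum avoids overflow in exp without changing the result.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def top_k(probs, k=5):
    """Return the indices and probabilities of the k most likely classes."""
    idx = np.argsort(probs, axis=-1)[..., ::-1][..., :k]
    return idx, np.take_along_axis(probs, idx, axis=-1)

# Made-up logits for one image and three classes.
logits = np.array([[1.0, 3.0, 2.0]])
probs = softmax(logits)
classes, values = top_k(probs, k=2)
print(classes, values)
```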

### Visualizing the Neural Network
The feature maps of the three convolutional layers are shown in the following picture. From top to bottom, the feature maps of the first to the third convolutional layer are shown.

![alt text][featuremap]

Already in the first layer, the distinctive features of the traffic sign are picked up, such as the round shape, the inner and outer edges, and the digits 3 and 0. The second layer adds a bit more detail, while the last layer looks like a sparse set of random pixels.
Normally I would have expected the high-level features to appear in later layers, but perhaps traffic signs are simple enough for the network to pick up all the relevant features within the first two layers. Even so, removing the last layer resulted in lower accuracy, so there may be patterns in the 30km/h traffic sign that we, mere humans, cannot comprehend.


