# Project - Traffic sign classifier
Documentation and implementation by Kevin Hubert.

## Intro
This project is my implementation of the traffic-sign-classifier project during the 
Nanodegree program "Self-Driving car engineer" at Udacity.com.

The project consists of the following parts to fulfill all requirements:
* Load/Visualize provided data set
* Generate additional data for types with least data
* Visualize the distribution after data-generation
* Normalize the given and generated data
* Design and train a CNN-model architecture
* Evaluate the trained model with validation-/test-data
* Make predictions with the trained model for additional images (new images)
* Analyze the top-n predicted probabilities of the additional images
* Create a writeup and visualize results


## Load/Visualize provided data set
A basic dataset of 43 different types of traffic sign images was provided. All of these images have a size of 32x32 pixels in the RGB color space. A basic overview of some images:
![Examples of the traffic sign dataset](./figures/traffic_sign_dataset_example.png)

As you can see in the images above, the lighting conditions and the overall quality of the images vary considerably.
After visualizing the distribution of the traffic sign images, I could see that the number of images differs strongly between the types:
![Traffic sign types by type (histogram) without generated data](./figures/histogram_without_generated_data.png)

Especially the types 1, 18, 26, 28 and 38 have comparably few images (~200).
On the other hand, other types such as 2, 3, 11, 12 and 37 have a large number of images (>1800).
This can lead to problems during training and evaluation, because the network may be more likely to classify images as those types that have a relatively large number of examples.

The reason is that the types with many examples are learned in more diverse representations.
The types with relatively few examples, on the other hand, are recognized too rarely, because the network learns only few representations beyond the handful of known example images.
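
The imbalance described above can be quantified directly from the label array. The following is a minimal sketch, not the code used in this project: the function names and the toy labels are assumptions for illustration, and the real dataset would be passed in with `num_classes=43`.

```python
import numpy as np


def class_distribution(labels, num_classes=43):
    """Return the number of examples per class ID."""
    return np.bincount(labels, minlength=num_classes)


def underrepresented_classes(labels, threshold, num_classes=43):
    """Return the class IDs that have fewer than `threshold` examples."""
    counts = class_distribution(labels, num_classes)
    return np.flatnonzero(counts < threshold)


# Toy labels instead of the real dataset: class 0 is rare, class 2 is frequent.
toy_labels = np.array([0, 1, 1, 1, 2, 2, 2, 2, 2])
print(class_distribution(toy_labels, num_classes=3))
print(underrepresented_classes(toy_labels, threshold=3, num_classes=3))
```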

## Generate additional data
As described in the previous chapter, it is necessary to add data for those types that have a low number of examples. One option would be to collect more real data, which takes a lot of time. The other option is to use the existing images to generate more examples. I've decided to generate more data based on the given images to be able to complete the project in time.

Therefore I've written a function that takes an image and rotates/zooms it as specified by its parameters.
Because rotating an image creates black pixels at the corners (where no source pixels are available), I crop x pixels at the edges of the image and resize it back to the required size of 32x32 pixels afterwards.
A visualized example can be seen here:
![Visualization of new generated traffic sign images based on existing one](./figures/variation_generation_visualized.png)
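
The rotate/zoom/crop idea can be sketched as follows. The original function is not shown in this writeup, so this is only an illustration under assumptions: the name `rotate_zoom_crop` and its parameters are hypothetical, and it uses plain NumPy with nearest-neighbor sampling so that it runs without OpenCV.

```python
import numpy as np


def rotate_zoom_crop(image, angle_deg, zoom=1.0, crop_px=2):
    """Rotate and zoom around the image center, crop the border, resize back."""
    h, w = image.shape[:2]
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0

    # Inverse mapping: for every output pixel, look up the source pixel.
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = (ys - cy) / zoom, (xs - cx) / zoom
    src_x = np.rint(cos_t * dx + sin_t * dy + cx).astype(int)
    src_y = np.rint(-sin_t * dx + cos_t * dy + cy).astype(int)

    # Output pixels without a valid source pixel stay black.
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    rotated = np.zeros_like(image)
    rotated[inside] = image[src_y[inside], src_x[inside]]

    # Crop crop_px pixels on every edge, then scale back to the original size.
    cropped = rotated[crop_px:h - crop_px, crop_px:w - crop_px]
    rows = np.arange(h) * cropped.shape[0] // h
    cols = np.arange(w) * cropped.shape[1] // w
    return cropped[rows][:, cols]


image = np.random.default_rng(0).integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
variant = rotate_zoom_crop(image, angle_deg=15.0, zoom=1.1, crop_px=3)
print(variant.shape)
```

How many pixels have to be cropped to remove all black corners depends on the rotation angle and the zoom factor, so `crop_px` has to be chosen to match them.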

After finishing the function that creates variants of an image, I've written a function that finds the type with the lowest number of images and iterates through all given images of that type to create additional images with the function described above.
In the end the list of newly generated images is appended to the dataset. At the moment I do this 10 times. This could be repeated more often, but 10 times already led to good results in my case. Moreover, I recommend not repeating this too often, because my function also uses the generated data from previous iterations, which just leads to more and more rotated/zoomed copies of the same images.
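
The loop described above could look roughly like this. It is a sketch under assumptions, not the project's actual code: `balance_smallest_classes` and the `augment` callback are hypothetical names, and it assumes that every class has at least one image.

```python
import numpy as np


def balance_smallest_classes(images, labels, augment, rounds=10):
    """Repeatedly generate one variant per image of the currently smallest class."""
    for _ in range(rounds):
        counts = np.bincount(labels)
        smallest = int(np.argmin(counts))
        # This also picks up images generated in earlier rounds.
        originals = images[labels == smallest]
        generated = np.stack([augment(img) for img in originals])
        new_labels = np.full(len(generated), smallest, dtype=labels.dtype)
        images = np.concatenate([images, generated])
        labels = np.concatenate([labels, new_labels])
    return images, labels


# Toy data: 3 classes with 20, 6 and 4 images; a horizontal flip stands in
# for the real rotate/zoom function.
toy_images = np.zeros((30, 32, 32, 3), dtype=np.uint8)
toy_labels = np.array([0] * 20 + [1] * 6 + [2] * 4)
new_images, new_labels = balance_smallest_classes(
    toy_images, toy_labels, augment=lambda img: img[:, ::-1], rounds=2
)
print(np.bincount(new_labels))
```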

With this functionality I've increased the size of the dataset from 35339 to 41189 images.
This means I've added 5850 generated images (~16.5%).

The histogram with generated data:
![Traffic sign types by type (histogram) with generated data](./figures/histogram_with_generated_data.png)


## Image preparation/normalization

In my first attempt I just converted the images from RGB to grayscale, because this reduces the number of required input neurons and thus the time needed to train the network. Moreover, my research showed that traffic signs are more likely to be identified by their shapes than by their colors. Last but not least, due to the different lighting conditions, the color saturation varies a lot for the same type of traffic sign.

After converting all images to grayscale I've decided to normalize the value range from 0 to 255 (grayscale) to roughly -1.0 to +1.0 (inputs of a neural network are commonly scaled to a small range around zero) using the following formula:
`x = (x - 128) / 128`
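
As a minimal sketch, the formula above can be applied to a whole NumPy array at once. The function name `normalize_simple` is an assumption for illustration.

```python
import numpy as np


def normalize_simple(gray_image):
    """Map grayscale values from 0..255 to roughly -1.0..+1.0."""
    # Cast first, otherwise the subtraction would wrap around for uint8 input.
    return (gray_image.astype(np.float32) - 128.0) / 128.0


samples = np.array([255, 192, 128, 64, 0], dtype=np.uint8)
print(normalize_simple(samples))
```

Note that 255 maps to 0.9921875 rather than exactly +1.0, because the divisor is 128 and not 127.5.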

Here are some examples of the values before and after normalization:

|Before|After      |
|:----:|:---------:|
|255   |0.9921875  |
|192   |0.5        |
|128   |0          |
|64    |-0.5       |
|0     |-1         |

In the next step I've used histogram equalization to increase the contrast of the image by spreading out the most frequent intensity values. A quite good description of histogram equalization can be found here: https://towardsdatascience.com/histogram-equalization-5d1013626e64
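
To illustrate what histogram equalization does, here is a small NumPy-only sketch for 8-bit grayscale images. It is not the code used in this project (a library routine such as `cv2.equalizeHist` would be the usual choice); the function name is an assumption.

```python
import numpy as np


def equalize_histogram(gray_image):
    """Spread the intensity values of a uint8 grayscale image over 0..255."""
    hist = np.bincount(gray_image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value of the darkest occurring pixel
    total = gray_image.size
    if total == cdf_min:               # constant image, nothing to spread
        return gray_image.copy()
    # Lookup table that maps every old intensity to its equalized intensity.
    scaled = (cdf - cdf_min) / (total - cdf_min) * 255.0
    lut = np.clip(np.round(scaled), 0, 255).astype(np.uint8)
    return lut[gray_image]


# Low-contrast toy image that only uses the intensities 100..131.
low_contrast = np.arange(100, 132, dtype=np.uint8).repeat(32).reshape(32, 32)
equalized = equalize_histogram(low_contrast)
print(low_contrast.min(), low_contrast.max(), "->", equalized.min(), equalized.max())
```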

Here are some examples using the normalization described above:
![Traffic sign images normalized simple](./figures/simple_image_normalization.png)


During further research on image preparation and normalization to improve the accuracy of my CNN, I discovered a Stack Overflow post about the so-called mean subtraction and division by the standard deviation.
In the first step, the mean pixel value of the grayscale image is subtracted from each pixel. In the next step, the result is divided by the standard deviation of the mean-subtracted pixels of the image.


In code it looks like this:

```python
import cv2
import numpy as np

def normalization_image_improved(rgb_image):
    # Convert to grayscale, then shift to zero mean and scale to unit standard deviation
    grayscaled = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2GRAY)
    mean_subtracted = grayscaled - np.mean(grayscaled)
    return (mean_subtracted / np.std(mean_subtracted)).reshape(32, 32, 1)
```

Stack Overflow thread: https://stackoverflow.com/questions/45301648/normalize-the-validation-set-for-a-neural-network-in-keras

Using this normalization improved the accuracy of my network by >5%.

The result of the normalization function is shown below:
> Keep in mind that mean subtraction and division by the standard deviation do not change how the image looks. They only rescale the pixel values (grayscale 0-255), so the images look the same when visualized using pyplot.

![Traffic sign images normalized with mean subtraction and standard deviation division](./figures/advanced_image_normalization.png)



## CNN Model architecture

I've used the following hyper-parameters during my model-training:
> EPOCHS = 25
>
> BATCH_SIZE = 64
>
> LEARN_RATE = 0.0006

By using a higher learning rate I was able to reduce the number of epochs, but then did not reach an accuracy of 99% or more on the training data.

To train the model, which is done by adjusting the weights/biases, I've chosen to minimize the cross-entropy. Fortunately, the cross-entropy calculation (based on the difference between the predicted and the correct result) is already part of the TensorFlow library. As an optimizer I've used the AdamOptimizer, which is an extension of the well-known stochastic gradient descent and works especially well for deep neural networks.
More about the AdamOptimizer can be found here: https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
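
To make the two building blocks more concrete, here is a NumPy sketch of a softmax cross-entropy loss and of a single Adam update step. It only illustrates the math and is not the TensorFlow code used for training; the function names are assumptions, and `beta1`, `beta2` and `eps` are the default values from the Adam paper, which may differ from what was used here.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def adam_step(weight, grad, m, v, step, lr=0.0006,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new weight and the updated moments."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of the squared gradient
    m_hat = m / (1 - beta1 ** step)             # bias correction for early steps
    v_hat = v / (1 - beta2 ** step)
    return weight - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


# An untrained network that outputs equal logits for all 43 classes
# has a loss of ln(43).
loss = softmax_cross_entropy(np.zeros((4, 43)), np.array([0, 1, 2, 42]))
new_weight, m, v = adam_step(weight=1.0, grad=0.5, m=0.0, v=0.0, step=1)
print(loss, new_weight)
```

In the very first step the bias correction makes the update size almost exactly equal to the learning rate, independent of the gradient's magnitude.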

Using the hyperparameters shown above my model reaches the following accuracies (+/- 0.02):
> Finished EPOCH 25/25 ...
>
> Validation Accuracy = 0.970
>
> Training Accuracy   = 0.996

As you can see there is a gap of 2.6 percentage points (99.6 - 97.0). This could possibly be reduced by increasing the dropout of the CNN.
On the other hand this may also reduce the training accuracy.

My model consisted of the following layers:

| Step | Layer                 | Description                                               |
|:----:|:---------------------:|:---------------------------------------------------------:|
| 1    | Input                 | 32x32x1 grayscale/normalized image                        |
| 2    | Convolution 5x5       | 1x1 stride, valid padding, outputs 28x28x8                |
| 3    | RELU (activation)     |                                                           |
| 4    | Max pooling           | 2x2 stride, outputs 14x14x8                               |
| 5    | Convolution 5x5       | 1x1 stride, valid padding, outputs 10x10x24               |
| 6    | RELU (activation)     |                                                           |
| 7    | Max pooling           | 2x2 stride, outputs 5x5x24                                |
| 8    | Convolution 1x1       | 1x1 stride, valid padding, outputs 5x5x16. Reduces depth  |
| 9    | RELU (activation)     |                                                           |
| 10   | Flatten               | 5x5x16 is flattened to 400                                |
| 11   | Dropout               | Only for training (0.6 keep probability)                  |
| 12   | Fully connected layer | In 400, out 160                                           |
| 13   | RELU (activation)     |                                                           |
| 14   | Dropout               | Only for training (0.6 keep probability)                  |
| 15   | Fully connected layer | In 160, out 129                                           |
| 16   | RELU (activation)     |                                                           |
| 17   | Fully connected layer | In 129, out 43                                            |
| 18   | Softmax               | Softmax                                                   |

In addition to the table, I've created a visualization with the online platform Draw.io.
Please keep in mind that the diagram contains neither the activation functions nor the dropouts described in the table above.
![CNN layers visualized](./figures/CNN_Architecture.png)
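
The output sizes in the layer table can be checked with a few lines of arithmetic. For valid padding the output side length is `(input - kernel) // stride + 1`. The sketch below assumes a 2x2 pooling window (the table only states the 2x2 stride); the function names are hypothetical.

```python
def conv_output(size, kernel, stride=1):
    """Output side length of a convolution or pooling layer with valid padding."""
    return (size - kernel) // stride + 1


def trace_shapes(input_side=32):
    """Follow the spatial size through the convolutional part of the network."""
    shapes = {}
    side = conv_output(input_side, kernel=5)
    shapes["conv1"] = (side, side, 8)
    side = conv_output(side, kernel=2, stride=2)   # assumption: 2x2 pooling window
    shapes["pool1"] = (side, side, 8)
    side = conv_output(side, kernel=5)
    shapes["conv2"] = (side, side, 24)
    side = conv_output(side, kernel=2, stride=2)   # assumption: 2x2 pooling window
    shapes["pool2"] = (side, side, 24)
    side = conv_output(side, kernel=1)             # 1x1 convolution keeps the size
    shapes["conv3"] = (side, side, 16)
    shapes["flatten"] = side * side * 16
    return shapes


for name, shape in trace_shapes().items():
    print(name, shape)
```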

After each epoch I've evaluated the CNN on the training and the validation data. Comparing the accuracy values makes it easier to see how well the model learns and whether the curves stay close to each other or show a big gap (which could be a sign of overfitting). While training my network I watched these values and added the 2 dropouts you can see in the model architecture above to avoid overfitting.
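
The per-epoch comparison boils down to two small calculations, sketched here in NumPy. The function names are assumptions and the logits are toy values, not outputs of the trained network.

```python
import numpy as np


def accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    return float(np.mean(np.argmax(logits, axis=1) == labels))


def generalization_gap(train_accuracy, validation_accuracy):
    """A large positive gap can be a sign of overfitting."""
    return train_accuracy - validation_accuracy


toy_logits = np.array([[2.0, 1.0, 0.0],
                       [0.0, 3.0, 1.0],
                       [1.0, 0.0, 5.0],
                       [4.0, 0.0, 1.0]])
toy_labels = np.array([0, 1, 2, 1])
print(accuracy(toy_logits, toy_labels))
print(generalization_gap(0.996, 0.970))
```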

The accuracy curves for both the training and validation data:
![Accuracy for training and validation data](./figures/learning_rate.png)

My final accuracy on the test dataset was:
test_accuracy >94.9%

## Additional traffic sign images
In the next step I've added 6 additional images of traffic signs from the internet.
The image sources are the following:
* Speed limit (30km/h) - https://media0.faz.net/ppmedia/aktuell/3322679104/1.6434912/mmobject-still_full/verkehrsberuhigt-tagsueber.jpg
* Slippery road - https://c8.alamy.com/comp/CF48A4/a-rural-slippery-road-sign-with-a-snow-covered-road-in-the-background-CF48A4.jpg
* Go straight or right - https://www.rhinocarhire.com/CorporateSite/media/Drive-Smart/Road-Signs/Mandatory-Signs/Germany-Mandatory-Sign-Driving-straight-ahead-or-turning-right-mandatory.png
* Road work - https://media.istockphoto.com/photos/german-road-sign-for-construction-works-picture-id532189779?k=6&m=532189779&s=612x612&w=0&h=iWNSAFHYi1CNFtDkLpgWEDWWK06viBf9gTEl5yWB_bo=
* Keep right - https://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Keep_right_Portugal_20100107.jpg/170px-Keep_right_Portugal_20100107.jpg
* Stop - http://www.ilankelman.org/stopsigns/germany.jpg

In my case the file name of each image represents its type. Of course this would not work when adding multiple images of the same type, but this was not required in this case.
The added images can be seen here:
![Additional traffic signs](./figures/additional_traffic_signs.png)
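
The file-name convention mentioned above could be implemented as sketched below. The exact naming scheme is not shown in this writeup, so the paths and the optional `_suffix` part are assumptions; the suffix is one possible way to allow several images of the same type.

```python
from pathlib import Path


def label_from_filename(path):
    """Read the class ID from names like '14.jpg' or '14_2.jpg' (hypothetical scheme)."""
    stem = Path(path).stem
    return int(stem.split("_")[0])


print(label_from_filename("additional_images/14.jpg"))
print(label_from_filename("additional_images/14_2.jpg"))
```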

After importing the images I've normalized them the same way as the images for the CNN trained above.
After the normalization:
![Additional traffic signs normalized](./figures/addition_traffic_signs_normalized.png)

The accuracy for these images was 1.0, which means 100% were correctly identified.

In the last step I've processed the images using the tensorflow.nn.top_k function, which allows me to visualize the top_n predicted outputs of the CNN for each image.

This is especially interesting, because that way we can see which images were predicted with nearly 100% probability and for which images the network was not absolutely sure. Moreover, if the network misclassifies an image, it shows which types were predicted instead and with which probability.

The predictions for my chosen images are shown below:
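
What a top-k lookup returns can be illustrated with a NumPy sketch: for every image, the k highest probabilities and the matching class indices. This is not the TensorFlow call itself; the function names and the toy logits are assumptions.

```python
import numpy as np


def softmax(logits):
    """Turn raw network outputs into probabilities that sum to 1 per row."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def top_k(probabilities, k=5):
    """Return the k largest probabilities per row and their class indices."""
    indices = np.argsort(-probabilities, axis=1)[:, :k]
    values = np.take_along_axis(probabilities, indices, axis=1)
    return values, indices


# Toy logits for 2 images and 4 classes.
toy_probabilities = softmax(np.array([[0.0, 1.0, 5.0, 2.0],
                                      [3.0, 0.0, 1.0, 2.0]]))
values, indices = top_k(toy_probabilities, k=2)
print(indices)
print(values)
```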

![Top 5 prediction for additional traffic signs](./figures/top_n_prediction_for_additional_trafficsigns_barchart.png)


## Possible improvements

1.) The images I've found on the internet for the "Additional traffic sign images" chapter were certainly quite simple to identify, because they were captured from the front and have good lighting conditions. It could be interesting to take my own traffic sign pictures and let the network predict them. This could show whether the CNN also works well when it is dark, raining etc.

2.) Additional training data may improve the CNN training even more. As you can see in the accuracies shown above, the following values were reached:

> Training: ~99.5%
>
> Validation: ~97.0%
>
> Test: ~94.9%

I think these values can be increased further when the training dataset contains more images, especially for those types where I've generated additional data. The generated data surely helps to improve the network, but more images taken under different lighting conditions may increase the diversity even more.

3.) Higher resolution. All provided images have a resolution of 32x32 pixels, and my network only works with 32x32 images. I can imagine that a higher resolution may lead to better results. On the other hand, a higher resolution would dramatically increase the required training time, and predictions would take longer as well.

4.) Increase the dropout to reduce the gap between the training and validation accuracy, as described before.

5.) Combine multiple CNNs. It might be clever to combine multiple networks to reduce wrong predictions. An example could be to create one network just for type groups (e.g. speed limit signs) and then use the already created network for the detailed prediction. Such a combination could reveal discrepancies, e.g. when the first network says "Stop" and the detailed network says "Speed limit 130".
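
The consistency check between a group network and the detailed network could be sketched like this. Nothing of this exists in the project yet: the function name, the group names and the mapping are purely hypothetical.

```python
# Hypothetical mapping from detailed sign types to coarse type groups.
TYPE_TO_GROUP = {
    "Speed limit (30km/h)": "speed limit",
    "Speed limit (50km/h)": "speed limit",
    "Stop": "stop",
    "Keep right": "mandatory",
}


def predictions_agree(group_prediction, type_prediction, type_to_group=TYPE_TO_GROUP):
    """Return True if the detailed prediction belongs to the predicted group."""
    return type_to_group.get(type_prediction) == group_prediction


print(predictions_agree("speed limit", "Speed limit (30km/h)"))
print(predictions_agree("stop", "Speed limit (50km/h)"))
```

A disagreement does not tell which of the two networks is wrong, but it could be used to flag a prediction as uncertain.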
