# **Traffic Sign Recognition** 

## Writeup

---

The goals / steps of this project are the following:
* Load the data set (see below for links to the project data set)
* Explore, summarize and visualize the data set
* Design, train and test a model architecture
* Use the model to make predictions on new images
* Analyze the softmax probabilities of the new images
* Summarize the results with a written report


[//]: # (Image References)



## Rubric Points
### Here I will consider the [rubric points](https://review.udacity.com/#!/rubrics/481/view) individually and describe how I addressed each point in my implementation.  

---
### Writeup / README

#### 1. Provide a Writeup / README that includes all the rubric points and how you addressed each one. You can submit your writeup as markdown or pdf. You can use this template as a guide for writing the report. The submission includes the project code.

You're reading it! Here is a link to my [project code](https://github.com/GiorGio82/P3-Traffic-Sign-Classifier/blob/master/Traffic_Sign_Classifier.ipynb).

 <a name="15sings"> <img src="./writeup_images/Some_example_sings_3x5.png" alt="drawing" style="width:500px;"/> </a>

### Data Set Summary & Exploration

#### 1. Provide a basic summary of the data set. In the code, the analysis should be done using python, numpy and/or pandas methods rather than hardcoding results manually.

I used the pandas library to calculate summary statistics of the traffic
signs data set:

* The training set contains 34799 RGB images
* The validation set contains 4410 images
* The test set contains 12630 images
* Each traffic sign image is square, 32x32 pixels, with 3 channels (color images)
* The data set contains 43 unique classes/labels
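
The numbers above can be derived from the loaded arrays rather than typed in by hand. The following is a minimal sketch using numpy; the function name `summarize` and the argument names are illustrative and not taken from the notebook.

```python
import numpy as np


def summarize(X_train, y_train, X_valid, X_test):
    """Return basic summary statistics for the three image arrays.

    Each X array is assumed to have shape (n_images, height, width, channels)
    and y_train is assumed to hold one integer class id per training image.
    """
    return {
        "n_train": len(X_train),
        "n_valid": len(X_valid),
        "n_test": len(X_test),
        "image_shape": X_train.shape[1:],
        "n_classes": int(np.unique(y_train).size),
    }


# Tiny synthetic example (not the real data set)
stats = summarize(
    np.zeros((10, 32, 32, 3), dtype=np.uint8),
    np.arange(10) % 4,
    np.zeros((3, 32, 32, 3), dtype=np.uint8),
    np.zeros((5, 32, 32, 3), dtype=np.uint8),
)
print(stats)
```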

#### 2. Include an exploratory visualization of the dataset.

A simple exploratory visualization of the data set uses horizontal bar charts. Each plot shows the distribution of classes, sorted by class size. The training set is heavily unbalanced: the smallest classes contain only 180 images, while the largest class contains 2010 images. The validation and test sets show a similar distribution and the same imbalance.
<a name="train_orig_distrib"> <img src="./writeup_images/Training_data_distrib.png" alt="drawing" style="width:800px;"/> </a>

| Validation dataset distrib. | Test dataset distrib.       |
|:---------------------------:|:---------------------------:|
|<a name="valid_distrib"> <img src="./writeup_images/Validation_data_distrib.png" alt="drawing"/> </a>| <a name="test_distrib"> <img src="./writeup_images/Test_data_distrib.png" alt="drawing"/> </a>|
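
The sorted class counts behind these bar charts can be computed in a few lines. This is a sketch only, assuming the labels are integer class ids; `class_distribution` and `imbalance_ratio` are hypothetical helper names. With the counts quoted above (2010 and 180 images), the ratio between the largest and the smallest class is roughly 11.

```python
import numpy as np


def class_distribution(labels, n_classes):
    """Return (class_id, count) pairs sorted from smallest to largest class."""
    counts = np.bincount(labels, minlength=n_classes)
    order = np.argsort(counts, kind="stable")
    return [(int(c), int(counts[c])) for c in order]


def imbalance_ratio(labels, n_classes):
    """Ratio between the largest and the smallest class size.

    Assumes every class occurs at least once, otherwise the ratio is undefined.
    """
    counts = np.bincount(labels, minlength=n_classes)
    if counts.min() == 0:
        raise ValueError("at least one class has no samples")
    return counts.max() / counts.min()


# Synthetic labels: class 0 has 5 samples, class 1 has 2, class 2 has 8
labels = np.array([0] * 5 + [1] * 2 + [2] * 8)
print(class_distribution(labels, 3))
print(imbalance_ratio(labels, 3))
```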


### Design and Test a Model Architecture

#### 1. Describe how you preprocessed the image data. What techniques were chosen and why did you choose these techniques? Consider including images showing the output of each preprocessing technique. Pre-processing refers to techniques such as converting to grayscale, normalization, etc. (OPTIONAL: As described in the "Stand Out Suggestions" part of the rubric, if you generated additional data for training, describe why you decided to generate additional data, how you generated the data, and provide example images of the additional data. Then describe the characteristics of the augmented training set like number of images in the set, number of images for each class, etc.)

As a first step, I read the original article by [LeCun](http://yann.lecun.com/exdb/publis/pdf/sermanet-ijcnn-11.pdf), because converting traffic signs to grayscale sounded counterintuitive to me. The authors share my initial doubts but show empirically that grayscale images boost the model performance. To convert the datasets to grayscale I used the OpenCV function `cv.cvtColor(image, cv.COLOR_BGR2GRAY)`, which was already used in previous projects. An example of the transformation from RGB to grayscale is shown in [Fig.N](#preprocessing_one_image).
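
One detail worth checking is the channel order. `cv.COLOR_BGR2GRAY` treats channel 0 as blue, so if the pickled images are stored in RGB order (as the data set summary suggests), the red and blue weights are swapped and `cv.COLOR_RGB2GRAY` would match the data. The sketch below illustrates the effect with plain numpy and the standard luma weights (0.299, 0.587, 0.114). It is not the OpenCV implementation, which uses fixed-point arithmetic and may differ by one gray level, and `to_gray` is a hypothetical helper.

```python
import numpy as np

# Standard luma weights for the red, green and blue channels
RGB_WEIGHTS = np.array([0.299, 0.587, 0.114])


def to_gray(image, channel_order="RGB"):
    """Convert an (H, W, 3) uint8 image to an (H, W) uint8 grayscale image.

    channel_order states how the converter interprets the last axis.
    """
    if channel_order == "RGB":
        weights = RGB_WEIGHTS
    elif channel_order == "BGR":
        weights = RGB_WEIGHTS[::-1]
    else:
        raise ValueError("channel_order must be 'RGB' or 'BGR'")
    gray = image.astype(np.float64) @ weights
    return np.round(gray).astype(np.uint8)


# A single pure-red pixel stored in RGB order
red = np.array([[[255, 0, 0]]], dtype=np.uint8)
print(to_gray(red, "RGB"))  # red weighted with 0.299
print(to_gray(red, "BGR"))  # red misread as blue, weighted with 0.114
```

For signs whose meaning depends on red or blue areas, the two conversions give visibly different gray levels, although the model can still learn from either as long as training and inference use the same conversion.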

As learned in the classroom, normalising the pixels and centering them around zero are essential pre-processing steps that help the model learn. Normalisation ensures that all features of the dataset have a similar range, while centering helps the optimiser. I tested different approaches for normalisation and centering and obtained the best results with a 'global normalisation': I first calculated the mean and standard deviation of the entire dataset, then subtracted the mean from each pixel and divided the result by the standard deviation. Centering and normalisation are done by the function `normalize_input`.
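
A minimal sketch of this global normalisation is shown below. The names `fit_global_stats` and `normalize_global` are hypothetical and this is not the notebook's `normalize_input`. It also assumes that the statistics are computed once on the training set and reused for the validation and test sets, which is the usual practice but is not stated above.

```python
import numpy as np


def fit_global_stats(images):
    """Return the mean and standard deviation over all pixels of all images."""
    mean = float(images.mean(dtype=np.float64))
    std = float(images.std(dtype=np.float64))
    return mean, std


def normalize_global(images, mean, std):
    """Center the pixels around zero and scale them to unit variance."""
    return ((images.astype(np.float64) - mean) / std).astype(np.float32)


# Synthetic grayscale images (not the real data set)
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(100, 32, 32, 1), dtype=np.uint8)
mean, std = fit_global_stats(train)
train_norm = normalize_global(train, mean, std)
print(train_norm.mean(), train_norm.std())
```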

The next step is applying histogram equalization to all images to improve their contrast and visibility. To do so I used the OpenCV class [CLAHE](https://docs.opencv.org/master/d5/daf/tutorial_py_histogram_equalization.html). [Figure XX](#clahe_effect) shows a particularly dark color image next to its grayscale version and the same image after applying the histogram equalization filter. The function that converts an image dataset to grayscale and applies histogram equalization is `applyEqualizationAndGray`.

<a name="clahe_effect"> <img src="./writeup_images/histogram_equalization_image_12358.png" alt="drawing" style="width:400px;"/> </a>
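
To illustrate why equalization helps on dark images, here is a sketch of plain global histogram equalization in numpy. Note that this is a simplified stand-in and not CLAHE: CLAHE applies the same idea per image tile and limits the contrast amplification, which this sketch does not do. The function name `equalize_histogram` is illustrative.

```python
import numpy as np


def equalize_histogram(gray):
    """Global histogram equalization of a 2-D uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # cumulative count at the darkest used level
    total = gray.size
    if total == cdf_min:
        return gray.copy()  # constant image, nothing to stretch
    # Map each gray level so that the cumulative distribution becomes linear
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


# A dark synthetic image that only uses gray levels 0..49
dark = (np.arange(32 * 32) % 50).astype(np.uint8).reshape(32, 32)
equalized = equalize_histogram(dark)
print(dark.max(), equalized.max())
```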

I decided to generate additional data because the validation and testing accuracy obtained with the original training set were below the target threshold of 93%, and the results showed overfitting (training accuracy higher than validation and testing accuracy). [Overfitting can be countered in different ways](https://towardsdatascience.com/deep-learning-3-more-on-cnns-handling-overfitting-2bd5d99abe5d), such as data augmentation and dropout. My first thought was to augment the training data to balance the dataset so that all classes have equal size (equal to the largest class of the original set). Once the dataset is balanced, the same filters can be used to augment the entire dataset by a certain percentage. To do so I explored different filters and preprocessing functions from the [imgaug](https://github.com/aleju/imgaug) and OpenCV libraries. I also made sure that the new images added to the dataset were not duplicates of existing images.

To find out which preprocessing filters to use, I tested different filter combinations from the many available, generated new datasets of different sizes and ran the model again. The function `augment` takes 5 inputs: (1) a table that indicates which classes need to be balanced (depending on their original size), (2) the dataset to augment, (3) the corresponding label array, (4) the sequence of filters to apply and (5) the augmentation factor in percent, where 0 means 'only balance the classes, no additional augmentation' and 100 means 'balance all classes and augment by 100%, that is, double the size of each class'. For example, with an augmentation factor of 100, a class with 180 images (the smallest class) is first balanced to the size of the largest class (2010 images) and then doubled, so that the class contains 4020 unique images. With an augmentation factor of 100, all classes have 4020 images, as shown in the image below:

<a name="train_balanced_distrib"> <img src="./writeup_images/Training_balanced_aug_factor100_data_distrib.png" alt="drawing" style="width:800px;"/> <figcaption> Fig.XX - Class distribution of the training set after balancing and augmenting with an augmentation factor of 100.</figcaption></a>
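
The arithmetic behind the augmentation factor can be written down as a small helper. This is a sketch of the bookkeeping only, not the `augment` function itself; `images_to_generate` and the dictionary layout are assumptions made for illustration.

```python
def images_to_generate(class_counts, augment_factor):
    """Number of new images needed per class.

    class_counts maps a class id to its original size and augment_factor is
    given in percent: 0 only balances the classes, 100 balances and doubles.
    """
    largest = max(class_counts.values())
    target = largest * (100 + augment_factor) // 100
    return {label: target - count for label, count in class_counts.items()}


# Smallest and largest class of the original training set
counts = {"smallest": 180, "largest": 2010}
print(images_to_generate(counts, 0))
print(images_to_generate(counts, 100))
```

With a factor of 100, the smallest class needs 3840 new images and the largest class 2010, so both end up at 4020.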


The `augment` function randomly selects a simple combination of one or two filters and applies it to each image. It then checks whether the processed image already exists in the dataset; if it does, the image is discarded and a new one is generated. I tried several filter combinations and observed that filters that altered the image "too much" resulted in poor learning performance. For example, filters such as MedianBlur, GaussianBlur, AverageBlur or AdditiveGaussianNoise altered the images so much that the model performed very poorly. I also decided not to flip the images. The final set of filters is shown in the code snippet below; it is very simple but results in good performance (more than 96% accuracy on the testing dataset):

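```python
import imgaug.augmenters as iaa


def sometimes(aug):
    # Assumption: this helper is not shown in the writeup. The imgaug
    # examples wrap an augmenter so that it is applied with probability 0.5.
    return iaa.Sometimes(0.5, aug)


# apply one or two of the following augmenters to each image
iaa.SomeOf((1, 2),
    [
        # crop or pad images by -10% to 10% of their height/width
        sometimes(iaa.CropAndPad(
            percent=(-0.1, 0.1),
            pad_cval=(0, 255)
        )),
        sometimes(iaa.Affine(
            # scale between 80% and 120% of the image (per axis)
            scale={"x": (0.8, 1.2), "y": (0.8, 1.2)},
            # translate by -20 to +20 percent (per axis)
            translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)},
            # rotate by -45 to +45 degrees
            rotate=(-45, 45),
            # shear by -20 to +20 degrees
            shear=(-20, 20),
        )),
    ],
    random_order=True
)
```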
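
The duplicate check described above can be sketched as follows. This is not the notebook's `augment` function: `generate_unique` is a hypothetical helper, and `random_shift` is a toy transform that stands in for the imgaug filters so that the example only needs numpy.

```python
import numpy as np


def generate_unique(images, n_new, transform, rng, max_tries=100):
    """Generate n_new transformed images that are not already in the dataset.

    A candidate that matches an existing or previously generated image
    byte for byte is discarded and a new candidate is drawn.
    """
    seen = {img.tobytes() for img in images}
    new_images = []
    while len(new_images) < n_new:
        for _ in range(max_tries):
            source = images[rng.integers(len(images))]
            candidate = transform(source, rng)
            key = candidate.tobytes()
            if key not in seen:
                seen.add(key)
                new_images.append(candidate)
                break
        else:
            raise RuntimeError("could not generate a new unique image")
    return np.stack(new_images)


def random_shift(image, rng):
    """Toy transform: shift the image by up to 3 pixels along each axis."""
    dy, dx = rng.integers(-3, 4, size=2)
    return np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))


rng = np.random.default_rng(0)
originals = rng.integers(0, 256, size=(5, 32, 32, 1), dtype=np.uint8)
augmented = generate_unique(originals, 10, random_shift, rng)
print(augmented.shape)
```

Hashing the raw bytes keeps the check cheap; comparing every candidate against every stored image with `np.array_equal` would give the same result but scales poorly with the dataset size.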

The [figure](#preprocessing_one_image) shows one picture from the training set next to its grayscale version. The remaining 7 images are copies of the grayscale image, each with one specific filter applied. The top row shows the unprocessed image, the same image in grayscale and the image scaled down by 50%. In the middle row, the first image from the left is translated by 20% (on both axes), the second is rotated by 45 degrees and the third is sheared by 20 degrees. In the bottom row, the first image is cropped and padded, the second shows the CLAHE histogram equalization and the last is centered and normalised (to plot the image, the distribution is shifted from [-1,1] to [0,1] with a mean of 0.5).

<a name="preprocessing_one_image">  <img src="./writeup_images/Preprocessing_img_553.png" alt="drawing" style="width:600px;"/> <figcaption> Fig.N - Pre-processing techniques used to augment the training dataset.</figcaption> </a>


The difference between the original data set and the augmented data set is the following ... 


#### 2. Describe what your final model architecture looks like including model type, layers, layer sizes, connectivity, etc.) Consider including a diagram and/or table describing the final model.

My final model consisted of the following layers:

| Layer         		|     Description	        					| 
|:---------------------:|:---------------------------------------------:| 
| Input         		| 32x32x3 RGB image   							| 
| Convolution 3x3     	| 1x1 stride, same padding, outputs 32x32x64 	|
| RELU					|												|
| Max pooling	      	| 2x2 stride,  outputs 16x16x64 				|
| Convolution 3x3	    | etc.      									|
| Fully connected		| etc.        									|
| Softmax				| etc.        									|
 


#### 3. Describe how you trained your model. The discussion can include the type of optimizer, the batch size, number of epochs and any hyperparameters such as learning rate.

To train the model, I used an ....

#### 4. Describe the approach taken for finding a solution and getting the validation set accuracy to be at least 0.93. Include in the discussion the results on the training, validation and test sets and where in the code these were calculated. Your approach may have been an iterative process, in which case, outline the steps you took to get to the final solution and why you chose those steps. Perhaps your solution involved an already well known implementation or architecture. In this case, discuss why you think the architecture is suitable for the current problem.

My final model results were:
* training set accuracy of ?
* validation set accuracy of ? 
* test set accuracy of ?

If an iterative approach was chosen:
* What was the first architecture that was tried and why was it chosen?
* What were some problems with the initial architecture?
* How was the architecture adjusted and why was it adjusted? Typical adjustments could include choosing a different model architecture, adding or taking away layers (pooling, dropout, convolution, etc), using an activation function or changing the activation function. One common justification for adjusting an architecture would be due to overfitting or underfitting. A high accuracy on the training set but low accuracy on the validation set indicates over fitting; a low accuracy on both sets indicates under fitting.
* Which parameters were tuned? How were they adjusted and why?
* What are some of the important design choices and why were they chosen? For example, why might a convolution layer work well with this problem? How might a dropout layer help with creating a successful model?

If a well known architecture was chosen:
* What architecture was chosen?
* Why did you believe it would be relevant to the traffic sign application?
* How does the final model's accuracy on the training, validation and test set provide evidence that the model is working well?
 

### Test a Model on New Images

#### 1. Choose five German traffic signs found on the web and provide them in the report. For each image, discuss what quality or qualities might be difficult to classify.

Here are five German traffic signs that I found on the web:

![alt text][image4] ![alt text][image5] ![alt text][image6] 
![alt text][image7] ![alt text][image8]

The first image might be difficult to classify because ...

#### 2. Discuss the model's predictions on these new traffic signs and compare the results to predicting on the test set. At a minimum, discuss what the predictions were, the accuracy on these new predictions, and compare the accuracy to the accuracy on the test set (OPTIONAL: Discuss the results in more detail as described in the "Stand Out Suggestions" part of the rubric).

Here are the results of the prediction:

| Image			        |     Prediction	        					| 
|:---------------------:|:---------------------------------------------:| 
| Stop Sign      		| Stop sign   									| 
| U-turn     			| U-turn 										|
| Yield					| Yield											|
| 100 km/h	      		| Bumpy Road					 				|
| Slippery Road			| Slippery Road      							|


The model was able to correctly guess 4 of the 5 traffic signs, which gives an accuracy of 80%. This compares favorably to the accuracy on the test set of ...
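
The 80% figure is the fraction of matching rows in the table above. A minimal sketch, with the labels typed in by hand from that table (the list names are illustrative):

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


true_signs = ["Stop sign", "U-turn", "Yield", "100 km/h", "Slippery Road"]
predicted_signs = ["Stop sign", "U-turn", "Yield", "Bumpy Road", "Slippery Road"]
print(accuracy(true_signs, predicted_signs))
```

With only five images, a single wrong prediction changes the accuracy by 20 percentage points, so this number is a much coarser estimate than the test set accuracy.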

#### 3. Describe how certain the model is when predicting on each of the five new images by looking at the softmax probabilities for each prediction. Provide the top 5 softmax probabilities for each image along with the sign type of each probability. (OPTIONAL: as described in the "Stand Out Suggestions" part of the rubric, visualizations can also be provided such as bar charts)

The code for making predictions on my final model is located in the 11th cell of the Ipython notebook.

For the first image, the model is relatively sure that this is a stop sign (probability of 0.6), and the image does contain a stop sign. The top five softmax probabilities were:

| Probability         	|     Prediction	        					| 
|:---------------------:|:---------------------------------------------:| 
| .60         			| Stop sign   									| 
| .20     				| U-turn 										|
| .05					| Yield											|
| .04	      			| Bumpy Road					 				|
| .01				    | Slippery Road      							|
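
A table like the one above can be produced from the model output with a few lines of numpy. The sketch below assumes the output for one image is a 1-D vector (logits or probabilities); `softmax` and `top_k` are illustrative helpers, not functions from the notebook.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def top_k(probabilities, k=5):
    """Return the k most likely (class_id, probability) pairs of one image."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(int(i), float(probabilities[i])) for i in order]


# Made-up logits for a 4-class problem
probs = softmax(np.array([2.0, 1.0, 0.1, 3.0]))
for class_id, p in top_k(probs, k=2):
    print(class_id, round(p, 3))
```

If the last layer of the model already applies a softmax activation, `top_k` can be applied directly to the output of `predict`.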


For the second image ... 

### (Optional) Visualizing the Neural Network (See Step 4 of the Ipython notebook for more details)
#### 1. Discuss the visual output of your trained network's feature maps. What characteristics did the neural network use to make classifications?




---
# Readme
---

## Project: Build a Traffic Sign Recognition Program
[![Udacity - Self-Driving Car NanoDegree](https://s3.amazonaws.com/udacity-sdc/github/shield-carnd.svg)](http://www.udacity.com/drive)

Overview
---
In this project, you will use what you've learned about deep neural networks and convolutional neural networks to classify traffic signs. You will train and validate a model so it can classify traffic sign images using the [German Traffic Sign Dataset](http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset). After the model is trained, you will then try out your model on images of German traffic signs that you find on the web.

We have included an Ipython notebook that contains further instructions 
and starter code. Be sure to download the [Ipython notebook](https://github.com/udacity/CarND-Traffic-Sign-Classifier-Project/blob/master/Traffic_Sign_Classifier.ipynb). 

We also want you to create a detailed writeup of the project. Check out the [writeup template](https://github.com/udacity/CarND-Traffic-Sign-Classifier-Project/blob/master/writeup_template.md) for this project and use it as a starting point for creating your own writeup. The writeup can be either a markdown file or a pdf document.

To meet specifications, the project will require submitting three files: 
* the Ipython notebook with the code
* the code exported as an html file
* a writeup report either as a markdown or pdf file 

Creating a Great Writeup
---
A great writeup should include the [rubric points](https://review.udacity.com/#!/rubrics/481/view) as well as your description of how you addressed each point.  You should include a detailed description of the code used in each step (with line-number references and code snippets where necessary), and links to other supporting documents or external references.  You should include images in your writeup to demonstrate how your code works with examples.  

All that said, please be concise!  We're not looking for you to write a book here, just a brief description of how you passed each rubric point, and references to the relevant code :). 

You're not required to use markdown for your writeup.  If you use another method please just submit a pdf of your writeup.

The Project
---
The goals / steps of this project are the following:
* Load the data set
* Explore, summarize and visualize the data set
* Design, train and test a model architecture
* Use the model to make predictions on new images
* Analyze the softmax probabilities of the new images
* Summarize the results with a written report

### Dependencies
This lab requires:

* [CarND Term1 Starter Kit](https://github.com/udacity/CarND-Term1-Starter-Kit)

The lab environment can be created with CarND Term1 Starter Kit. Click [here](https://github.com/udacity/CarND-Term1-Starter-Kit/blob/master/README.md) for the details.

### Dataset and Repository

1. Download the data set. The classroom has a link to the data set in the "Project Instructions" content. This is a pickled dataset in which we've already resized the images to 32x32. It contains a training, validation and test set.
2. Clone the project, which contains the Ipython notebook and the writeup template.
```sh
git clone https://github.com/udacity/CarND-Traffic-Sign-Classifier-Project
cd CarND-Traffic-Sign-Classifier-Project
jupyter notebook Traffic_Sign_Classifier.ipynb
```

### Requirements for Submission
Follow the instructions in the `Traffic_Sign_Classifier.ipynb` notebook and write the project report using the writeup template as a guide, `writeup_template.md`. Submit the project code and writeup document.

## How to write a README
A well written README file can enhance your project and portfolio.  Develop your abilities to create professional README files by completing [this free course](https://www.udacity.com/course/writing-readmes--ud777).



# Interesting resources


---
http://datahacker.rs/lenet-5-implementation-tensorflow-2-0/
https://www.kaggle.com/jwjohnson314/a-starter-lenet5-dropout-data-augmentation

# training loss and validation loss
https://stackoverflow.com/questions/48226086/training-loss-and-validation-loss-in-deep-learning

# overfitting and underfitting
https://programming-review.com/machine-learning/overfitting

# dropout
https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/

- Generally, use a small dropout value of 20%-50% of neurons with 20% providing a good starting point. A probability too low has minimal effect and a value too high results in under-learning by the network.

- Use a larger network. You are likely to get better performance when dropout is used on a larger network, giving the model more of an opportunity to learn independent representations.

- Use dropout on incoming (visible) as well as hidden units. Application of dropout at each layer of the network has shown good results.

- Use a large learning rate with decay and a large momentum. Increase your learning rate by a factor of 10 to 100 and use a high momentum value of 0.9 or 0.99.

- Constrain the size of network weights. A large learning rate can result in very large network weights. Imposing a constraint on the size of network weights such as max-norm regularization with a size of 4 or 5 has been shown to improve results.
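
To make the meaning of the dropout rate in the notes above concrete, here is a numpy sketch of inverted dropout, the variant in which the kept units are scaled by 1 / (1 - rate) during training so that nothing needs to change at inference time. The function `dropout` is an illustration, not the Keras implementation.

```python
import numpy as np


def dropout(x, rate, rng, training=True):
    """Zero a fraction `rate` of the units and rescale the remaining ones."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
activations = np.ones((1000, 100))
dropped = dropout(activations, rate=0.2, rng=rng)
print((dropped == 0).mean(), dropped.mean())
```

With a rate of 0.2, roughly 20% of the activations are set to zero and the rest are scaled to 1.25, so the expected value of each unit stays the same.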

https://keras.io/api/layers/regularization_layers/dropout/ 
"rate: Float between 0 and 1. Fraction of the input units to drop."

# dropout on convolutional layers
https://towardsdatascience.com/dropout-on-convolutional-layers-is-weird-5c6ab14f19b2

# other students

https://github.com/prateeksawhney97/Traffic-Sign-Classifier-Project-P3/blob/master/Traffic-Sign-Classifier-Writeup.pdf

# augmentations
#image_flipr = np.fliplr(image)
sources 

https://datascience.stackexchange.com/questions/28426/train-accuracy-vs-test-accuracy-vs-confusion-matrix

https://towardsdatascience.com/data-preprocessing-and-network-building-in-cnn-15624ef3a28b

https://github.com/aleju/imgaug

# overfitting

https://towardsdatascience.com/deep-learning-3-more-on-cnns-handling-overfitting-2bd5d99abe5d


https://github.com/navoshta/traffic-signs


# tensorflow 2
http://datahacker.rs/lenet-5-implementation-tensorflow-2-0/

https://www.tensorflow.org/guide/keras/train_and_evaluate

https://www.machinecurve.com/index.php/2020/02/21/how-to-predict-new-samples-with-your-keras-model/

# tensorboard 
https://www.tensorflow.org/tensorboard/get_started

# dropout
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout

# processing my own pictures
https://machinelearningmastery.com/how-to-manually-scale-image-pixel-data-for-deep-learning/
https://machinelearningmastery.com/how-to-save-a-numpy-array-to-file-for-machine-learning/

```
Model: "c132_c264_c3128_k3_p40.3_p50.2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
layer1 (Conv2D)              (None, 30, 30, 32)        320       
_________________________________________________________________
maxPool1 (MaxPooling2D)      (None, 15, 15, 32)        0         
_________________________________________________________________
layer2 (Conv2D)              (None, 13, 13, 64)        18496     
_________________________________________________________________
maxPool2 (MaxPooling2D)      (None, 7, 7, 64)          0         
_________________________________________________________________
layer3 (Conv2D)              (None, 5, 5, 128)         73856     
_________________________________________________________________
maxPool3 (MaxPooling2D)      (None, 3, 3, 128)         0         
_________________________________________________________________
flatten (Flatten)            (None, 

Epoch 1/2000
Epoch 2/2000
Epoch 3/2000
Epoch 4/2000
Epoch 5/2000
Epoch 6/2000
Epoch 7/2000
Epoch 8/2000
Epoch 9/2000
Epoch 10/2000
Epoch 11/2000
Epoch 12/2000
Epoch 13/2000
Epoch 14/2000
Epoch 15/2000
Epoch 16/2000
Epoch 17/2000
Epoch 18/2000
INFO:tensorflow:Assets written to: ./models/c132_c264_c3128_k3_p40.3_p50.2-20210503-210510/assets
test loss, test acc: [0.15234412252902985, 0.9614410400390625]
Own test loss, Own test acc: [0.22780448198318481, 0.8846153616905212]
```
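
The output shapes and parameter counts in the model summary above can be reproduced by hand. The sketch below is a consistency check under assumptions inferred from the numbers, not read from the notebook: a 32x32x1 grayscale input (320 parameters in the first layer imply one input channel), 3x3 convolutions with 'valid' padding and stride 1, and pooling with stride 2 and 'same' padding (13 becomes 7, not 6). All function names are illustrative.

```python
import math


def conv_valid(size, kernel, stride=1):
    """Output width/height of a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1


def pool_same(size, stride=2):
    """Output width/height of a pooling layer with 'same' padding."""
    return math.ceil(size / stride)


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def trace(input_size=32, in_channels=1, filters=(32, 64, 128), kernel=3):
    """Return (layer, size, channels, params) for each conv/pool pair."""
    rows = []
    size, channels = input_size, in_channels
    for n_filters in filters:
        params = conv_params(kernel, channels, n_filters)
        size = conv_valid(size, kernel)
        rows.append(("conv", size, n_filters, params))
        size = pool_same(size)
        rows.append(("pool", size, n_filters, 0))
        channels = n_filters
    return rows


for row in trace():
    print(row)
```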


```python
import matplotlib.pyplot as plt
import numpy as np


def wrong_prediction(a, b):
    """Return True when the two class ids differ."""
    return a != b


def pred_details(predictions, features, labels):
    """Collect the prediction details of every sample in a list of dicts.

    `predictions` holds the softmax output per sample and `labels` the
    one-hot encoded true classes. `signnames` (class id -> sign name) is
    defined earlier in the notebook.
    """
    results = []
    for i in range(len(predictions)):
        predicted = np.argmax(predictions[i])
        expected = np.argmax(labels[i])
        results.append({
            'name': signnames[predicted],
            'number': predicted,
            'certainty': np.max(predictions[i]),
            'image': features[i].squeeze(),
            'correct_class': signnames[expected],
            'predictions': predictions[i],
            'wrong_classification': wrong_prediction(expected, predicted),
        })
    return results


def plot_test_images(results, only_wrong_preds=False, interactive=False):
    """Print and plot each prediction, optionally only the wrong ones."""
    plt.ion()  # turn on interactive mode
    for pred in results:
        if only_wrong_preds and not pred['wrong_classification']:
            continue
        print("Predicted class:", pred['name'])
        if only_wrong_preds:
            print("Instead of: ", pred['correct_class'])
        print("Certainty:", pred['certainty'])
        if not only_wrong_preds:
            print(pred['predictions'])
        print("Correct prediction:", not pred['wrong_classification'])
        plt.figure(figsize=(3, 3))
        plt.imshow(pred['image'], cmap='gray')
        plt.show()
        if interactive:
            _ = input("Press [enter] to continue.")
        print("-" * 86 + "\n")
```

```python
# Generate predictions (probabilities -- the output of the last layer)
# on new data using `predict`
print("Generate predictions for all samples")
predictions = loaded_model.predict(X_own_test, verbose=1)
print("predictions shape:", predictions.shape)
details = pred_details(predictions, X_own_test, y_own_test)

plot_test_images(details, only_wrong_preds=True, interactive=False)
```

```python
print(y_own_test[0])
```

```python
from tensorflow.keras.models import save_model

# Save the model
filepath = './saved_model'
save_model(model, filepath)
```