# **Build a Traffic Sign Recognition Project**

The goals / steps of this project are the following:

• Load the data set (see below for links to the project data set)
• Explore, summarize and visualize the data set
• Design, train and test a model architecture
• Use the model to make predictions on new images
• Analyze the softmax probabilities of the new images
• Summarize the results with a written report
• Rubric Points

### README 
This is my project README. It gives an overview of the project and details each step taken to answer the questions asked. The project code can be found at the link below:


### Data Set Summary & Exploration

The data set provided for the project is already divided into three sets: training, validation, and test. Before building a model and making predictions, I looked at the data more closely. Measuring the length of each set gives the number of samples it contains. The sizes are listed below:

Number of training examples = 34799
Number of testing examples = 12630
Number of validation examples = 4410
Image data shape = (32, 32, 3)
Number of classes = 43

### Include an exploratory visualization of the dataset.

To further understand the data, I visualized a few of the images and explored the distribution of the classes in each set. The histograms make it clear that the classes are not equally represented: some classes have many more occurrences than others. The distribution is, however, more or less the same across the training, validation, and test sets.
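The comparison of class distributions can be sketched as below. This is a minimal illustration, not the project's code: the label lists are hypothetical toy values standing in for the real label arrays, and `class_frequencies` is a helper name introduced here.

```python
from collections import Counter


def class_frequencies(labels):
    """Return {class_id: fraction of samples} for one data split."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in sorted(counts.items())}


# Hypothetical toy labels (NOT the real data), three classes only.
y_train_toy = [0, 1, 1, 2, 2, 2, 2, 1]
y_valid_toy = [1, 2, 2, 0]

train_freq = class_frequencies(y_train_toy)
valid_freq = class_frequencies(y_valid_toy)

# Print the relative frequency of each class side by side.
for cls in train_freq:
    print(f"class {cls}: train {train_freq[cls]:.2f}, "
          f"valid {valid_freq.get(cls, 0.0):.2f}")
```

Comparing fractions rather than raw counts makes splits of different sizes directly comparable, which is what the histograms show visually.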

### Design and Test a Model Architecture

I used the training set to train the model, measured its performance on the validation set, and tweaked the model until the desired accuracy was achieved.

The first thing I did was normalize the data. In the previous lessons we learned that the data can be normalized to have roughly zero mean and a reasonable variance with x=(x-128)/128, but in this case that did not give the best validation accuracy. I changed the normalization to x=x/255, where 255 is the maximum value a pixel can have, and skipped zero-centering entirely. I believe the lower performance with zero mean is due to the use of ReLUs as the activation function: shifting the mean to zero makes some pixel values negative, and after the weights and biases are applied, the pre-activation values may still be negative. The activation function then turns all negative values into zero, so those nodes are not activated. As a result, some nodes do not contribute to the network and the performance stays low.
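A small sketch of the two normalization schemes and of the ReLU effect described above. It uses plain Python lists to stay dependency-free, and the single weight `w` and bias `b` are hypothetical values chosen only to illustrate the hypothesis; this does not prove that it explains the accuracy difference.

```python
def normalize_centered(pixels):
    """Zero-centered scaling: maps 0..255 to roughly -1..1."""
    return [(p - 128) / 128 for p in pixels]


def normalize_unit(pixels):
    """Unit scaling: maps 0..255 to 0..1."""
    return [p / 255 for p in pixels]


def relu(values):
    """ReLU activation: negative values become zero."""
    return [max(0.0, v) for v in values]


pixels = [0, 64, 128, 255]
centered = normalize_centered(pixels)  # contains negative values
unit = normalize_unit(pixels)          # all values in [0, 1]

# Hypothetical single neuron with a positive weight and a small bias.
w, b = 0.5, 0.1
act_centered = relu([w * x + b for x in centered])
act_unit = relu([w * x + b for x in unit])

print("centered inputs:", centered)
print("active after ReLU (centered):", sum(a > 0 for a in act_centered))
print("active after ReLU (unit):", sum(a > 0 for a in act_unit))
```

With these illustrative values, the dark pixels produce negative pre-activations under the centered scheme and are zeroed by ReLU, while every input stays active under x/255. With a negative weight the situation would reverse, so this is only one possible scenario.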


#### The approach to finding a solution

I followed the same methodology used in class, with 2 convolution layers and 3 fully connected layers. I used pooling after each convolution layer to reduce the size of the feature maps and, with it, the complexity of working with a lot of data. Initially, I used the same setup as in class, which performed poorly (about 80% on the validation set). The biggest improvement came when I started normalizing the input data as described above, which brought the performance to about 92%.

Later, I started changing the size and the number of the filters in the convolution layers. This is the only difference from what we used in class: I used larger filters with a higher depth in the convolution layers, which resulted in more nodes in the first fully connected layer. Larger filters mean the model tries to learn from a larger area of the picture and can discover relationships between more pixels, but this comes at the expense of a larger and more complex model. I also used more filters to capture more features, especially in the first layers. This improved the performance on the validation set to more than 94%.

##### Why this approach?
I believe this is a good approach for this problem because it is another image recognition and classification problem. Similar to the MNIST data set, this data consists of relatively simple images made up of curves and lines. A regular NN cannot easily capture features like lines and curves, but a CNN looks at the relationships between neighboring pixels. That helps in distinguishing between different signs while still finding the similarities between them (for example, many of the images have a circle around the sign, but the enclosed information is different).

We could expand this approach and add more convolutional layers. I did not try it, but considering the 94% accuracy on the validation set, I worry that adding more layers (while keeping the size of the existing ones unchanged) could result in overfitting.

My final model consisted of the following layers:

| Layer | Description |Output Size|
| -------- | ---------- | -----|
|first|convolution|26x26x8|
||pooling|13x13x8|
|second|convolution|8x8x22|
||pooling|4x4x22|
||flatten|352|
|third|fully connected|120|
|fourth|fully connected|84|
|fifth|fully connected|43|
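The output sizes in the table can be checked with a little arithmetic. The sketch below assumes 'valid' padding, stride 1 for the convolutions, and 2x2 pooling with stride 2. The filter sizes (7x7 and 6x6) are not stated in this README; they are inferred here as the values that reproduce the table under those assumptions.

```python
def conv_output_size(input_size, filter_size, stride=1):
    """Output width/height of a convolution with 'valid' padding."""
    return (input_size - filter_size) // stride + 1


def pool_output_size(input_size, pool_size=2, stride=2):
    """Output width/height of a pooling layer without padding."""
    return (input_size - pool_size) // stride + 1


size = 32                                          # input images are 32x32x3
conv1 = conv_output_size(size, filter_size=7)      # assumed 7x7 filters
pool1 = pool_output_size(conv1)
conv2 = conv_output_size(pool1, filter_size=6)     # assumed 6x6 filters
pool2 = pool_output_size(conv2)
flattened = pool2 * pool2 * 22                     # 22 feature maps in layer 2

print(conv1, pool1, conv2, pool2, flattened)
```

The flattened size is the number of inputs to the first fully connected layer, which is why larger or deeper convolution filters change the size of that layer.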

Lastly, softmax was applied to get probabilities of belonging to each class.
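As a reminder of what softmax does, here is a minimal, dependency-free sketch. The logits are hypothetical values for three classes, not outputs of the trained model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing
    # and does not change the result.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [2.0, 1.0, 0.1]   # hypothetical scores for three classes
probs = softmax(example_logits)
print([round(p, 3) for p in probs])
```

The class with the largest logit always gets the largest probability, so softmax changes the scale of the scores but not the predicted class.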


To train the model, I used the Adam optimizer. Weights and biases were randomly initialized before the first epoch. After the forward pass through the layers and softmax, the loss (the collective error of the model) was calculated. The optimizer then used the derivative of the loss with respect to each weight and bias to determine how much each one contributed to the error and how it should be modified. The weights and biases were updated, and the process was repeated until a good validation accuracy was obtained.
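To make the update step concrete, the sketch below applies the Adam update rule to a single scalar weight on a toy quadratic loss. It is an illustration only: the project presumably relied on a framework's built-in optimizer, and the function name, learning rate, and step count here are assumptions chosen for the example.

```python
import math


def adam_minimize(grad_fn, w, steps=500, lr=0.05,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function with Adam, given its gradient."""
    m, v = 0.0, 0.0                           # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad_fn(w)                        # derivative of loss w.r.t. w
        m = beta1 * m + (1 - beta1) * g       # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g   # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Toy loss (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
w_final = adam_minimize(lambda w: 2 * (w - 3.0), w=0.0)
print(round(w_final, 3))
```

In a real network the same update is applied to every weight and bias, with the gradients obtained by backpropagation.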


My final model results were:

training set accuracy of 99.9%

validation set accuracy of 94%

test set accuracy of 93.6%

web test accuracy of 80%

### Test a Model on New Images

I randomly chose five pictures from the German traffic sign data set and ran the trained model on them. The prediction was correct for 4 of the 5 images. The results are summarized below.

#### New images found in provided test data set
Here are the results of the prediction:


|Image|Prediction|
|----|----|
|No vehicles|No vehicles|
|Speed limit (50km/h)|Speed limit (50km/h)|
|Speed limit (80km/h)|Speed limit (120km/h)|
|Speed limit (50km/h)|Speed limit (50km/h)|
|Priority road|Priority road|

The model classified 4 of these 5 images correctly (80%), while the accuracy on the full test set is about 93%. Some images include two signs, and those are hard to predict. There are also a lot of similarities between images; for example, many of them have a red circle around the sign, which makes many pixels look similar. This is especially pronounced when the image quality is low or the sign occupies a small portion of the image; I think the latter is part of the reason for the wrong prediction in the images sampled above from the test set.
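The gap between 80% and 93% is also partly a sample-size effect: with only five images, a single error moves the accuracy by 20 points. As a rough check, assuming each image is classified independently with the test set accuracy of 93.6%, the chance of at least one miss among five images can be computed as follows (the function name is introduced here for illustration).

```python
def prob_at_least_one_error(accuracy, n_images):
    """Chance of one or more misclassifications among n independent images."""
    return 1.0 - accuracy ** n_images


p_miss = prob_at_least_one_error(0.936, 5)
print(f"{p_miss:.2f}")  # about 0.28
```

Under that independence assumption, seeing one wrong prediction in a sample of five is not unusual for a model with this test accuracy.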

#### New images found from the web
reference: https://en.wikipedia.org/wiki/Road_signs_in_Germany

On the images downloaded from the web, the only wrong prediction comes from a sign that the model has not seen in the training set:

|Image|Prediction|
|----|----|
|Traffic signals|Traffic signals|
|Keep right|Keep right|
|Priority road|Priority road|
|Length limit (10 meters)|Go straight or left|
|No entry|No entry|

For the sample set from the test data set, the model is certain about the first two and the last two predictions, since their softmax probabilities are 100%. On the remaining image, the one whose prediction is wrong, the model is only 53% certain of the class it chose, and the right answer is not among the top 5 probabilities.
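Checking whether the right answer is among the top 5 probabilities can be sketched as below. The probability vector is hypothetical (seven classes instead of 43, with made-up values apart from the 53% top score), and `top_k` is a helper name introduced here.

```python
def top_k(probabilities, k=5):
    """Return the k (class_id, probability) pairs with the highest probability."""
    ranked = sorted(enumerate(probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical softmax output for one image (NOT the model's real output).
probs_example = [0.02, 0.53, 0.05, 0.25, 0.01, 0.10, 0.04]
true_class = 0   # hypothetical correct label

top5 = top_k(probs_example, k=5)
top5_classes = [cls for cls, _ in top5]

print("top-5 classes:", top5_classes)
print("correct class in top 5:", true_class in top5_classes)
```

A top probability near 50% together with a correct class outside the top 5 indicates that the model is both uncertain and wrong, which is different from a confident mistake.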

#### Difficulties of model on the test set
The pictures I found on the web are relatively clean and of better quality than my training set, so they should not be very hard for the model to classify. One challenge I can see is the black background in these pictures, while the training images mostly have real street backgrounds. Also, some of the pictures (1 of the 5 samples) show a sign that the model has not seen before and that is very different from the 43 classes. Of course, the model failed to classify it correctly.

Also, if the number of occurrences of a sign is low in the training set, the model has fewer chances to learn it and is therefore more likely to predict it wrong.