# Traffic sign classifier

## Machine Learning Project: Build a Traffic Sign Classifier

---
**Disclaimer**

This project is a study and reworking of [this notebook](https://github.com/lfiaschi/udacity-traffic-sign-classifier).

In [12]:
# import cv2 # resize the images
import numpy as np
import pandas as pd
import os # to work with directories
from random import shuffle # shuffle data

DATASET_DIR = '/datasets/GTSRB/Images/'
ANNOTATION_FILE = './signnames.csv'

# IMAGE_SIZE = 50
# LR = 1e-3
# MODEL_NAME = 'trafficsigns--{}--{}.model'.format(LR, "2conv-basic")

Load the CSV file with the annotations.

In [13]:
signnames = pd.read_csv(ANNOTATION_FILE)
signnames.set_index('ClassId', inplace=True)
print(signnames[:5])

def get_name_from_label(label):
    """Return the textual name corresponding to a numeric class id.

    This function looks up the numeric class id in the ``signnames``
    table and returns the textual name of the class.

    :param label: the numeric class id
    :type label: int
    :returns: the textual name of the class
    :rtype: str

    :Example:

    >>> get_name_from_label(0)
    'Speed limit (20km/h)'
    """
    return signnames.loc[label].SignName

                     SignName
ClassId                      
0        Speed limit (20km/h)
1        Speed limit (30km/h)
2        Speed limit (50km/h)
3        Speed limit (60km/h)
4        Speed limit (70km/h)


The images are divided into folders based on their category. The *load_dataset* function creates a list of all the image paths, each labeled with the name of its folder.

In [14]:
def load_dataset(path):
    """Load a dataset of image paths from a directory tree.

    This function looks for files in the subfolders of the given path and
    labels each one with the name of the folder where it is stored.

    :param path: the directory whose subfolders (one per class) hold the images
    :returns: a shuffled list of [image path, numeric label] pairs
    """
    dataset = []
    for subdir, dirs, files in os.walk(path): # every folder under the dataset root
        for img in files: # one file at a time (no extension check is made)
            label = os.path.basename(subdir) # the label is the name of the folder
            imgPath = os.path.join(subdir, img) # full path of the image
            #img = cv2.resize(cv2.imread(imgPath, cv2.IMREAD_GRAYSCALE), (IMAGE_SIZE, IMAGE_SIZE))

            label = int(label) # drop the leading zeros of the folder name
            #training_data.append([np.array(img), np.array(label)])
            dataset.append([imgPath, label])
    shuffle(dataset)
    # np.save('dataset.npy', dataset)
    return dataset

In [15]:
dataset = load_dataset(DATASET_DIR)
print("dataset cardinality : {}".format(len(dataset)))

# print the first 5 elements
for data in dataset[0:5]:
    print(data[0], "\t", data[1], "\t", get_name_from_label(data[1]))

dataset cardinality : 39252
/datasets/GTSRB/Images/00002/00068_00013.ppm 	 2 	 Speed limit (50km/h)
/datasets/GTSRB/Images/00007/00036_00029.ppm 	 7 	 Speed limit (100km/h)
/datasets/GTSRB/Images/00040/00001_00001.ppm 	 40 	 Roundabout mandatory
/datasets/GTSRB/Images/00038/00061_00014.ppm 	 38 	 Keep right
/datasets/GTSRB/Images/00038/00020_00018.ppm 	 38 	 Keep right
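
Note that the cardinality printed above counts every file that `os.walk` finds, whatever its extension. If the class folders also hold non-image files (GTSRB downloads commonly ship one annotation CSV per class folder, which is an assumption about this local copy), those files are counted and labeled too. A minimal sketch of a loader that keeps only `.ppm` files follows; `load_image_paths` and its `extensions` parameter are hypothetical names, not part of the original notebook.

```python
import os
from random import shuffle


def load_image_paths(path, extensions=(".ppm",)):
    """Hypothetical variant of load_dataset that skips non-image files."""
    dataset = []
    for subdir, _dirs, files in os.walk(path):
        for name in files:
            # keep only files whose extension marks them as images
            if not name.lower().endswith(extensions):
                continue
            label = int(os.path.basename(subdir))  # '00002' -> 2
            dataset.append([os.path.join(subdir, name), label])
    shuffle(dataset)
    return dataset
```

Calling `load_image_paths(DATASET_DIR)` in place of `load_dataset(DATASET_DIR)` would then return only the entries that point at images.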


TODO: try to obtain X_train, y_train, X_valid, y_valid to align with the original project.
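
One possible way to approach this, sketched under assumptions: the list of `[path, label]` pairs is already shuffled, and a fixed fraction of it is held out for validation. The helper name `split_pairs` and the 20% default are hypothetical. The `X_*` lists here contain file paths rather than pixel arrays, so the images would still have to be loaded separately.

```python
def split_pairs(pairs, valid_fraction=0.2):
    """Split [path, label] pairs into train/validation lists (hypothetical helper)."""
    n_valid = int(len(pairs) * valid_fraction)
    valid, train = pairs[:n_valid], pairs[n_valid:]
    X_train = [path for path, _ in train]
    y_train = [label for _, label in train]
    X_valid = [path for path, _ in valid]
    y_valid = [label for _, label in valid]
    return X_train, y_train, X_valid, y_valid


# toy data standing in for the output of load_dataset
pairs = [["img_{}.ppm".format(i), i % 3] for i in range(10)]
X_train, y_train, X_valid, y_valid = split_pairs(pairs)
print(len(X_train), len(X_valid))
```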

## Select data to create the training set and the test set

In [16]:
training_set = dataset[:-5000] # take all the images except the last 5000
test_set = dataset[-5000:] # take the last 5000 images
print("training set cardinality : {}".format(len(training_set)))
print("testset cardinality : {}".format(len(test_set)))

training set cardinality : 34252
testset cardinality : 5000
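
Because the dataset is shuffled without a seed, the images that end up in each split change from run to run. A minimal sketch of a reproducible alternative follows, assuming the input order is itself deterministic (for example, sorted before shuffling). The names `shuffled_copy` and `holdout_split` are hypothetical.

```python
import random


def shuffled_copy(items, seed=0):
    """Return a shuffled copy of items; the same seed gives the same order."""
    rng = random.Random(seed)  # private generator, global random state untouched
    items = list(items)
    rng.shuffle(items)
    return items


def holdout_split(items, n_test):
    """Keep the last n_test items as the test set, the rest as the training set."""
    cut = len(items) - n_test
    return items[:cut], items[cut:]
```

For instance, `holdout_split(shuffled_copy(sorted(dataset), seed=42), 5000)` would yield the same two sets on every run, as long as the files on disk do not change.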
