# Traffic Sign Image Classification using CNNs 

# Introduction
The problem we are gonna tackle is [The German Traffic Sign Recognition Benchmark(GTSRB)](http://benchmark.ini.rub.de/?section=gtsrb&subsection=news). The problem is to to recognize the traffic sign from the images. Solving this problem is essential for self-driving cars to operate on roads.

The dataset features 43 different signs under various sizes, lighting conditions, occlusions and is very similar to real-life data. Training set includes about 39000 images while test set has around 12000 images. Images are not guaranteed to be of fixed dimensions and the sign is not necessarily centered in each image. Each image contains about 10% border around the actual traffic sign.

Our approach to solving the problem will of course be very successful convolutional neural networks (CNNs). CNNs are multi-layered feed-forward neural networks that are able to learn task-specific invariant features in a hierarchical manner.

## A Brief Summary of the process

Pre-Process Data — Although deep neural networks don’t require data pre-processing and CNNs learn these techniques themselves during model training, pre-processing can really accelerate the training process. We will be using a variety of pre-processing such as Mean, Centering, Normalizing, Histogram Equalizing Data.
    
Augment the dataset as per requirement(optional) — Deep neural networks simply perform better when there is more data. We will be using a variety of techniques such as rotation, scaling, translation, projective transformation, sobel edge , gaussian noise to increase the robustness and variety in our dataset.
    
Build a model — As mentioned before we will be using a convolutional neural network. The depth of convolutions, the number of layers, the type of activation functions will be chosen based on the problem at hand.
    
Search for best hyper-parameters — They are the knobs you get to turn such as learning rate, number of iterations, batch size , number of epochs, dropout rate , regularization factor etc. Choosing a subset of the entire problem and building a random search over a range of hyperparameters for finding the best possible combination is a good way of proceeding.

Train the model using training dataset. Use the training loss as the optimization parameter.



# Imports

In [1]:
import numpy as np
from skimage import io, color, exposure, transform
from sklearn.cross_validation import train_test_split
import os
import glob
import h5py

from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential, model_from_json
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.pooling import MaxPooling2D

from keras.optimizers import SGD
from keras.utils import np_utils
from keras.callbacks import LearningRateScheduler, ModelCheckpoint
from keras import backend as K
K.set_image_data_format('channels_first')

from matplotlib import pyplot as plt
%matplotlib inline

NUM_CLASSES = 43
IMG_SIZE = 48

Using Theano backend.
Using gpu device 0: GeForce GTX TITAN X (CNMeM is disabled, cuDNN 4007)


# Preprocessing
Images vary a lot in illumination. They also vary in size. So, let’s write a function to do histogram equalization in HSV color space and resize the images to a standard size: 

In [2]:
def preprocess_img(img):
    # Histogram normalization in y
    hsv = color.rgb2hsv(img)
    hsv[:,:,2] = exposure.equalize_hist(hsv[:,:,2])
    img = color.hsv2rgb(hsv)

    # central scrop
    min_side = min(img.shape[:-1])
    centre = img.shape[0]//2, img.shape[1]//2
    img = img[centre[0]-min_side//2:centre[0]+min_side//2,
              centre[1]-min_side//2:centre[1]+min_side//2,
              :]

    # rescale to standard size
    img = transform.resize(img, (IMG_SIZE, IMG_SIZE))

    # roll color axis to axis 0
    img = np.rollaxis(img,-1)

    return img


def get_class(img_path):
    return int(img_path.split('/')[-2])

All the training images are store into numpy arrays. We’ll also get labels of images from paths. We’ll convert targets to one-hot form as is required by keras:

In [3]:
try:
    with  h5py.File('X.h5') as hf: 
        X, Y = hf['imgs'][:], hf['labels'][:]
    print("Loaded images from X.h5")
    
except (IOError,OSError, KeyError):  
    print("Error in reading X.h5. Processing all images...")
    root_dir = 'GTSRB/Final_Training/Images/'
    imgs = []
    labels = []

    all_img_paths = glob.glob(os.path.join(root_dir, '*/*.ppm'))
    np.random.shuffle(all_img_paths)
    for img_path in all_img_paths:
        try:
            img = preprocess_img(io.imread(img_path))
            label = get_class(img_path)
            imgs.append(img)
            labels.append(label)

            if len(imgs)%1000 == 0: print("Processed {}/{}".format(len(imgs), len(all_img_paths)))
        except (IOError, OSError):
            print('missed', img_path)
            pass

    X = np.array(imgs, dtype='float32')
    Y = np.eye(NUM_CLASSES, dtype='uint8')[labels]

    with h5py.File('X.h5','w') as hf:
        hf.create_dataset('imgs', data=X)
        hf.create_dataset('labels', data=Y)

Loaded images from X.h5


# Model
1. We’ll use feed forward network with 6 convolutional layers followed by a fully connected hidden layer.
2. We’ll also use dropout layers in between. Dropout regularizes the networks, i.e. it prevents the network from overfitting.
3. All our layers have relu activations except the output layer.A rectified linear unit(relu) has output 0 if the input is less than 0, and raw output otherwise. That is, if the input is greater than 0, the output is equal to the input. 
4. Output layer uses softmax activation as it has to output the probability for each of the classes.A softmax activation function is used on the output layer to turn the outputs into probability-like values and allow one class of the 43 to be selected as the model’s output prediction. 


In [5]:
def cnn_model():
    model = Sequential()

    model.add(Conv2D(32, (3, 3), padding='same',
                     input_shape=(3, IMG_SIZE, IMG_SIZE),
                     activation='relu'))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))

    model.add(Conv2D(64, (3, 3), padding='same',
                     activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))

    model.add(Conv2D(128, (3, 3), padding='same',
                     activation='relu'))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))

    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(NUM_CLASSES, activation='softmax'))
    return model

model = cnn_model()
# let's train the model using SGD + momentum (how original).
lr = 0.01
sgd = SGD(lr=lr, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
          optimizer=sgd,
          metrics=['accuracy'])


def lr_schedule(epoch):
    return lr*(0.1**int(epoch/10))

# Model Configuration

1. loss : Loss function we want to optimize. We cannot use error percentage as it is not continuous and thus non differentiable. We therefore use a proxy for it - categorical_crossentropy. 
2. optimizer : We use standard [stochastic gradient](http://cs231n.github.io/neural-networks-3/#sgd) descent with [Nesterov momentum](http://cs231n.github.io/neural-networks-3/#sgd).
3. metric : Since we are dealing with a classification problem, our metric is accuracy.


In [6]:
batch_size = 32
nb_epoch = 30

model.fit(X, Y,
          batch_size=batch_size,
          epochs=nb_epoch,
          validation_split=0.2,
          shuffle=True,
          callbacks=[LearningRateScheduler(lr_schedule),
                    ModelCheckpoint('model.h5',save_best_only=True)]
            )

Train on 31367 samples, validate on 7842 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x7feaf3055898>

# Training
During training, the model will iterate over batches of training set, each of size *batch_size*. For each batch, gradients will be computed and updates will be made to the weights of the network automatically. One iteration over all the training set is referred to as an epoch. Training is usually run until the loss converges to a constant.

We will add a couple of features to our training:

1. Learning rate scheduler : Decaying learning rate over the epochs usually helps model learn better
2. Model checkpoint : We will save the model with best validation accuracy. This is useful because our network might start overfitting after a certain number of epochs, but we want the best model.


# Evaluate
Load test data and evaluate the model on it.

In [7]:
import pandas as pd
test = pd.read_csv('GT-final_test.csv',sep=';')

X_test = []
y_test = []
i = 0
for file_name, class_id  in zip(list(test['Filename']), list(test['ClassId'])):
    img_path = os.path.join('GTSRB/Final_Test/Images/',file_name)
    X_test.append(preprocess_img(io.imread(img_path)))
    y_test.append(class_id)
    
X_test = np.array(X_test)
y_test = np.array(y_test)

In [8]:
y_pred = model.predict_classes(X_test)
acc = np.sum(y_pred==y_test)/np.size(y_pred)
print("Test accuracy = {}".format(acc))

Test accuracy = 0.9792557403008709


# Data Augmentation
If we can generate new images for training from the existing images, that will be a great way to increase the size of the dataset. This can be done by slightly

1. Translating of image
2. Rotating of image
3. Shearing the image
4. Zooming in/out of the image


In [9]:
from sklearn.cross_validation import train_test_split

X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, random_state=42)

datagen = ImageDataGenerator(featurewise_center=False, 
                            featurewise_std_normalization=False, 
                            width_shift_range=0.1,
                            height_shift_range=0.1,
                            zoom_range=0.2,
                            shear_range=0.1,
                            rotation_range=10.,)

datagen.fit(X_train)

In [10]:
# Reinstallise models 

model = cnn_model()
# let's train the model using SGD + momentum (how original).
lr = 0.01
sgd = SGD(lr=lr, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
          optimizer=sgd,
          metrics=['accuracy'])


def lr_schedule(epoch):
    return lr*(0.1**int(epoch/10))

In [11]:
nb_epoch = 30
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                            steps_per_epoch=X_train.shape[0],
                            epochs=nb_epoch,
                            validation_data=(X_val, Y_val),
                            callbacks=[LearningRateScheduler(lr_schedule),
                                       ModelCheckpoint('model.h5',save_best_only=True)]
                           )

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x7fea770a6630>

In [12]:
y_pred = model.predict_classes(X_test)
acc = np.sum(y_pred==y_test)/np.size(y_pred)
print("Test accuracy = {}".format(acc))

Test accuracy = 0.9828978622327791


In [13]:
model.summary()

____________________________________________________________________________________________________
Layer (type)                       Output Shape        Param #     Connected to                     
convolution2d_7 (Convolution2D)    (None, 32, 48, 48)  896         convolution2d_input_2[0][0]      
____________________________________________________________________________________________________
convolution2d_8 (Convolution2D)    (None, 32, 46, 46)  9248        convolution2d_7[0][0]            
____________________________________________________________________________________________________
maxpooling2d_4 (MaxPooling2D)      (None, 32, 23, 23)  0           convolution2d_8[0][0]            
____________________________________________________________________________________________________
dropout_5 (Dropout)                (None, 32, 23, 23)  0           maxpooling2d_4[0][0]             
___________________________________________________________________________________________

In [14]:
model.count_params()

1358155