*Author: Suhan Shetty, suhan.n.shetty@gmail.com*

*Project Files: [Google Drive](https://drive.google.com/open?id=13_9sMv5LcyGX5oo29x41sUwFhXkNbgWV)*

*Note: If you are using Google Colaboratory, you can use the 'model_for_google_colab.ipynb' file included at the link above. That notebook is customized to work in Google Colaboratory, but it contains no explanatory text. For the details, follow this notebook.*

# End-to-end Learning for Self-driving Cars


## Introduction

In this tutorial, I will walk you through the process of designing an end-to-end learning model for a self-driving car using the tools that [Udacity](https://in.udacity.com/course/self-driving-car-engineer-nanodegree--nd013) has made available to the public. I have customized some of those files and shared everything you need in this [Google Drive folder](https://drive.google.com/open?id=13_9sMv5LcyGX5oo29x41sUwFhXkNbgWV). You can use them to follow along.

You can go through the following resources to understand the concept of end-to-end learning: 
- [Siraj's Youtube Tutorial](https://www.youtube.com/watch?v=EaY5QiZwSP4)  
- [NVIDIA's Research Paper](https://arxiv.org/pdf/1604.07316.pdf)
- [Udacity Behavioral Cloning Project](https://github.com/udacity/CarND-Behavioral-Cloning-P3)

### Overview of the tutorial:
- Project Environment Setup

- Data collection and Preprocessing 

- Data Augmentation

- Design CNN to Clone Car's Behavior

- Testing the Performance of the Learning Model

    
   
## Project Environment Setup

We will use a [Unity](https://unity3d.com/)-based simulator provided by Udacity to collect data. The [video](https://www.youtube.com/watch?v=EaY5QiZwSP4) mentioned above explains the data collection procedure. I used the Anaconda distribution on Mac OS X to carry out the project, so some of the details provided here apply only if you are using Anaconda.

First of all, download the file stored [here](link). The downloaded folder will be your project directory. The folder is named *e2e_project*, and I will assume it is the current working directory on your machine if you follow along.

**A brief description of the important files in the above-mentioned folder:**
- **drive.py** : Script used to execute the simulator in autonomous mode (provided by Udacity)
- **model.py** : Script used for training the CNN model
- **model.h5** : Saved Keras model of the trained CNN
- **data.py** : Contains the definitions of the preprocessing operations applied to the images obtained from the simulator
- **data.zip** : The dataset used to train the CNN model
- **CarND-Term1-Starter-Kit-master** : Files required to setup the necessary python environment 



**Follow the procedure below to set up the project environment using the files provided in *e2e_project*:**
 1. Install [Unity](https://unity3d.com/)

 
 2. Create the Python environment *carnd-term1*.
 
    You can create this environment using the *environment.yml* file in *e2e_project/CarND-Term1-Starter-Kit/*. 
    You can refer to [this](https://github.com/udacity/CarND-Term1-Starter-Kit) page for details.
    
       
 3. Install and open the simulator. 
    The simulator is included in 'e2e_project/simulator/'.  
    
 
 4. If you train the CNN in the *carnd-term1* environment, this step is not necessary. Otherwise, the Keras and TensorFlow versions in *carnd-term1* must match the versions in the Python environment that was used to train the CNN. I trained the CNN using [Google Colaboratory](https://colab.research.google.com/), which used Keras 2.1.6 and TensorFlow 1.9.0-rc1.
 On Mac OS X, I did the following to update Keras and TensorFlow in the *carnd-term1* environment:
 
 Open the terminal and execute the following:
 ```sh
        source activate carnd-term1
        pip install tensorflow==1.9.0-rc1
        pip install keras==2.1.6
```
   
        
  
 
  

 
 
## Data Collection and Preprocessing

When you open the simulator, you are given two options: Training Mode and Autonomous Mode. For training mode, or just to play with the tool, open the simulator and click on 'training'. Training mode is used to collect the training dataset. You drive the car with the left and right arrow keys to control the steering angle and with the up and down arrow keys to control the speed. You collect data by recording the ride; press the 'R' key to start or stop recording. For each frame recorded, the simulator captures and saves three images taken from three cameras (left, center, and right) mounted on the car. At the end of the recording, a 'driving_log.csv' file is created that stores, for every recorded frame, information about the images together with the steering angle, brake, and speed of the car. We use this dataset to train a CNN that models the behavior of the car and predicts the steering angle from an image taken from the car.
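
To make the log format concrete, here is a minimal sketch that parses a few log rows with the standard library only. The column names `center`, `left`, `right`, and `steering` are the ones the training code in this tutorial reads; the `throttle`, `brake`, and `speed` column names, the image paths, and all numeric values are invented for illustration and may differ from what your simulator writes.

```python
import csv
import io

# Hypothetical sample of a driving log; paths and values are invented.
SAMPLE_LOG = """center,left,right,steering,throttle,brake,speed
IMG/center_001.jpg, IMG/left_001.jpg, IMG/right_001.jpg,0.0,0.5,0.0,12.3
IMG/center_002.jpg, IMG/left_002.jpg, IMG/right_002.jpg,-0.15,0.4,0.0,11.8
"""

def read_log(text):
    """Return one dict per frame with stripped image paths and a float steering angle."""
    frames = []
    for row in csv.DictReader(io.StringIO(text)):
        frames.append({
            "center": row["center"].strip(),
            "left": row["left"].strip(),
            "right": row["right"].strip(),
            "steering": float(row["steering"]),
        })
    return frames

frames = read_log(SAMPLE_LOG)
for frame in frames:
    print(frame["center"], frame["steering"])
```

The paths are stripped because log entries can carry leading spaces, which is also why the generator later in this tutorial calls `.strip()` on each path.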

Autonomous mode is used to test the trained model. Once the CNN is trained, we can test its performance in autonomous mode, where the simulator is driven by *drive.py*. At every frame, this script receives the car's camera image from the simulator and sends back the steering angle predicted by the trained CNN model. I have modified the *drive.py* script originally provided by Udacity for this tutorial; the modified version is included in the *e2e_project* directory.

[image1]: ./tutorial_files/simulator.png "Simulator"
![Simulator][image1]

Using the simulator, I collected data from about 30 minutes of driving. Here is a snapshot of the distribution of the data that I collected:

[image2]: ./tutorial_files/original_data.png "Raw data"
![image2]

You can observe that the data is heavily biased towards a zero steering angle, so we have to preprocess the data to remove this bias.

I scrapped (discarded) some of the data, and the preprocessed dataset looks as follows:

[image3]: ./tutorial_files/balanced_data.png "Preprocessed data"
![image3]

As you can see, the processed data has a better distribution than the raw dataset. However, the distribution that I obtained is not the best possible. We can get a better distribution by spending a bit more time on collecting and preprocessing the data.
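
The exact procedure used to balance the dataset is not spelled out here. One common approach, shown below purely as an assumption, is to keep only a fraction of the samples whose steering angle is close to zero. The function name `drop_near_zero`, the threshold, and the keep probability are all hypothetical choices, not the values used for the dataset in this tutorial.

```python
import random

def drop_near_zero(angles, threshold=0.02, keep_prob=0.1, seed=0):
    """Return indices of samples to keep.

    Samples with |angle| <= threshold are kept with probability keep_prob;
    all other samples are always kept.
    """
    rng = random.Random(seed)
    kept = []
    for index, angle in enumerate(angles):
        if abs(angle) > threshold or rng.random() < keep_prob:
            kept.append(index)
    return kept

# Toy data: 90 straight-driving frames and 10 turning frames.
angles = [0.0] * 90 + [0.3, -0.3, 0.2, -0.2, 0.1, -0.1, 0.25, -0.25, 0.15, -0.15]
kept = drop_near_zero(angles)
turning_kept = sum(1 for i in kept if abs(angles[i]) > 0.02)
print(len(angles), "->", len(kept), "samples;", turning_kept, "turning samples kept")
```

The returned indices could then be used to select rows of the driving log before training.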

You can either collect your own dataset to follow along with this article or use the preprocessed dataset included in the *e2e_project* directory.


In [None]:
# Set up the project and data paths

import os
import sys
local_project_path = '' # Current folder - e2e_project
local_data_path = os.path.join(local_project_path, 'data') 
sys.path.append(local_project_path) 

In [None]:
# Data Visualization

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


src_data_path = os.path.abspath(local_data_path)
df = pd.read_csv(os.path.join(src_data_path,'driving_log.csv'))
print("Number of data points: ", len(df))

# The left/right (sign) balance is not a concern here, since the random flips in data augmentation balance the data across sign
plt.hist(df.steering, bins = 30)
plt.xlabel('steering')
plt.show()

# Note: The preprocessing was not done very carefully. More careful preprocessing would result in a better dataset

In [4]:
# Install and import the packages required for training

import numpy as np
import pandas as pd
import skimage
import skimage.transform as sktransform
from sklearn import model_selection
import random
import matplotlib.image as mpimg
import os
import shutil
import sys



# https://keras.io/
!pip install -q keras
from keras.callbacks import Callback
import keras
from keras.models import Model, Sequential
from keras.layers import Dense, Dropout, Flatten, Input, AveragePooling2D, merge, Activation
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization
from keras.layers import Concatenate
from keras.optimizers import Adam
from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import LearningRateScheduler
from keras.models import load_model
import math

# Prevent TensorFlow from allocating all the available GPU memory up front
import tensorflow as tf
from keras import backend as k

# Don't pre-allocate memory; allocate as-needed
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Create a session with the above options specified.
k.tensorflow_backend.set_session(tf.Session(config=config))


print("Keras Version: ",keras.__version__)
print("Tensorflow Version: ",tf.__version__)

Using TensorFlow backend.


Keras Version:  2.1.6
Tensorflow Version:  1.9.0-rc1


## Data Augmentation

We will first define the data augmentation techniques, and then we will train the CNN. The data augmentation follows [this](https://navoshta.com/end-to-end-deep-learning/) blog; refer to it for more information on the techniques used here.
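
Two of the augmentation steps in the code that follows change the steering label as well as the image: picking a side camera adds a fixed correction, and a horizontal flip negates the angle. The sketch below isolates just that label arithmetic, using the same ±0.25 correction as the generator. The helper name `augmented_angle` is hypothetical and does not appear in the project files.

```python
# Steering correction per camera, matching the values used by the generator.
CAMERA_CORRECTION = {"left": 0.25, "center": 0.0, "right": -0.25}

def augmented_angle(angle, camera="center", flipped=False):
    """Steering label after choosing a camera and optionally mirroring the image."""
    corrected = angle + CAMERA_CORRECTION[camera]
    return -corrected if flipped else corrected

print(augmented_angle(0.1, "left"))
print(augmented_angle(0.1, "right", flipped=True))
```

The intuition is that an image from the left camera looks as if the car has drifted left, so the label is nudged towards steering right, and mirroring an image swaps left and right turns.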


In [0]:
# Data Augmentation

# Cameras we will use
cameras = ['left', 'center', 'right']
cameras_steering_correction = [.25, 0., -.25]

def preprocess(image, top_offset=.375, bottom_offset=.125):
    """
    Applies preprocessing pipeline to an image: crops `top_offset` and `bottom_offset`
    portions of image, resizes to 32x128 px and scales pixel values to [0, 1].
    """
    top = int(top_offset * image.shape[0])
    bottom = int(bottom_offset * image.shape[0])
    image = sktransform.resize(image[top:-bottom, :], (32, 128, 3))
    return image

def generate_samples(data,root_path, bs=128,augment=True):
    """
    Keras generator yielding batches of training/validation data.
    Applies data augmentation pipeline if `augment` is True.
    """
    
    #import pdb
    #pdb.set_trace()
    while True:
        # Generate random batch of indices
        indices = np.random.permutation(len(data))
        batch_size = bs #128

        for batch in range(0, len(indices), batch_size):
            batch_indices = indices[batch:(batch + batch_size)]
            # Output arrays
            x = np.empty([0, 32, 128, 3], dtype=np.float32)
            y = np.empty([0], dtype=np.float32)
            # Read in and preprocess a batch of images

            for i in batch_indices:
                # Randomly select camera
                camera = np.random.randint(len(cameras)) if augment else 1
                # Read frame image and work out steering angle
                image = mpimg.imread(os.path.join(root_path, data[cameras[camera]].values[i].strip()))
                image.setflags(write=1)
                #print('Flags: ',image.flags)
                angle = data.steering.values[i] + cameras_steering_correction[camera]
                if augment:
                    # Add random shadow as a vertical slice of image
                    h, w = image.shape[0], image.shape[1]
                    [x1, x2] = np.random.choice(w, 2, replace=False)
                    k = h / (x2 - x1)
                    b = - k * x1
                    #print('Flags: ',image.flags)
                    # Use a separate loop variable so the sample index `i` is not shadowed
                    for row in range(h):
                        c = int((row - b) / k)
                        image[row, :c, :] = (image[row, :c, :] * .5).astype(np.int32)
                        
                # Randomly shift up and down while preprocessing
                v_delta = .05 if augment else 0
                image = preprocess(
                    image,
                    top_offset=random.uniform(.375 - v_delta, .375 + v_delta),
                    bottom_offset=random.uniform(.125 - v_delta, .125 + v_delta)
                )
                # Append to batch
                x = np.append(x, [image], axis=0)
                y = np.append(y, [angle])
                
            # Randomly flip half of images in the batch.
            
            flip_indices = random.sample(range(x.shape[0]), int(x.shape[0] / 2))
            x[flip_indices] = x[flip_indices, :, ::-1, :]
            y[flip_indices] = -y[flip_indices]
            
            yield (x, y)
            

 

In [7]:
# Split Training and Validation data

if __name__ == '__main__':
    # Read the data
    df = pd.read_csv(os.path.join(local_data_path, 'driving_log.csv'))
    # Split data into training and validation sets
    df_train, df_valid = model_selection.train_test_split(df, test_size=.1)

print("train: ",df_train.shape[0])
print("validation: ",df_valid.shape[0])

train:  6219
validation:  691


## CNN Design and Training

**CNN architectural considerations:**

- I have used [DenseNet](https://arxiv.org/pdf/1608.06993.pdf) as the building block of the CNN.

- Two DenseNet blocks are used. The GPU used in Google Colaboratory had a limit of at most 128 stacked feature maps (the depth of a layer). I addressed this constraint by limiting the growth rate and the number of layers in each DenseNet block.


- The layers are designed so that the receptive field is large enough in the later layers that the last few layers have seen the entire image.


- **1xC-Cx1** convolution is used instead of a direct CxC convolution. This speeds up training without much loss in accuracy.


- **Dilated convolution** is used. In autonomous driving, the features of interest are mostly bounded regions, which are better captured at a global level. Thus, we need to grow the receptive field fast enough that the later layers have a better global picture. I have used a dilation rate of 2 in each convolution.
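
To put rough numbers on the last two points, the sketch below counts the weights of a single CxC convolution versus a 1xC convolution followed by a Cx1 convolution, and computes how many pixels a dilated kernel spans along one axis. It assumes bias-free convolutions and that the first of the two factorized layers already reduces the channel count to the growth rate, as in the dense block code that follows. The function names are made up for illustration.

```python
def full_conv_params(c_in, c_out, kernel=3):
    """Weights of one kernel x kernel convolution without bias."""
    return kernel * kernel * c_in * c_out

def factorized_conv_params(c_in, c_out, kernel=3):
    """Weights of a 1 x kernel convolution followed by a kernel x 1 convolution."""
    return kernel * c_in * c_out + kernel * c_out * c_out

def dilated_extent(kernel=3, dilation=2):
    """Number of input pixels spanned along one axis by a dilated kernel."""
    return dilation * (kernel - 1) + 1

# 98 input channels and a growth rate of 14: the last layer of the first dense block.
c_in, growth = 98, 14
print(full_conv_params(c_in, growth))        # 12348
print(factorized_conv_params(c_in, growth))  # 4704
print(dilated_extent())                      # 5
```

For this layer the factorized form needs well under half the weights of a full 3x3 convolution, and a dilation rate of 2 lets a 3-tap kernel cover 5 pixels at no extra parameter cost.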



In [9]:
#  Building blocks for Densenet: https://arxiv.org/pdf/1608.06993.pdf

def add_denseblock(input, growth_rate = 12, numLayers=12, dropout_rate = 0.2):
    temp = input
    for _ in range(numLayers):
        BatchNorm = BatchNormalization()(temp)
        relu = Activation('relu')(BatchNorm)
        # 1xC convolution followed by a Cx1 convolution, both with dilation
        conv1xC = Conv2D(growth_rate,(1,3), use_bias=False ,padding='same',dilation_rate=2)(relu)
        conv1xCx1 = Conv2D(growth_rate,(3,1), use_bias=False ,padding='same',dilation_rate=2)(conv1xC)
        concat = Concatenate(axis=-1)([temp,conv1xCx1])
        temp = concat
        
    return temp
  
def add_transition(input, growth = 16, dropout_rate = 0.2):
    BatchNorm = BatchNormalization()(input)
    relu = Activation('relu')(BatchNorm)
    # 1x1 convolution
    Conv2D_BottleNeck = Conv2D(growth, (1,1), use_bias=False ,padding='same',dilation_rate=2)(relu)
    if dropout_rate>0:
        Conv2D_BottleNeck = Dropout(dropout_rate)(Conv2D_BottleNeck)
    avg = AveragePooling2D(pool_size=(2,2))(Conv2D_BottleNeck)
    
    return avg


def output_layer(input):
    BatchNorm = BatchNormalization()(input)
    relu = Activation('relu')(BatchNorm)
    AvgPooling = AveragePooling2D(pool_size=(2,2))(relu)
    flat = Flatten()(AvgPooling)

    output = Dense(1,activation='tanh')(flat)
    
    return output



print("Network building blocks are ready")

Network building blocks are ready


In [0]:
# Keeps track of model weights by saving them at the end of each epoch.

class WeightsLogger(Callback):
    def __init__(self, root_path):
        super(WeightsLogger, self).__init__()
        self.weights_root_path = os.path.join(root_path, 'weights/')
        shutil.rmtree(self.weights_root_path, ignore_errors=True)
        os.makedirs(self.weights_root_path, exist_ok=True)

    def on_epoch_end(self, epoch, logs={}):
        self.model.save_weights(os.path.join(self.weights_root_path, 'model_epoch_{}.h5'.format(epoch + 1)))

In [14]:
# Define the hyperparameters: growth rate and number of layers in each DenseNet block

# For densenet-block-1
growth_rate_net1 = 14
numLayers_net1 = 7

# For densenet-block-2
growth_rate_net2 = 10
numLayers_net2 = 11

# No dropout
dropout_rate = 0.0

input_shape=(32, 128, 3)

input = Input(shape=input_shape)
# First Layer
First_Conv2D = Conv2D(growth_rate_net1, (3,3), use_bias=False ,padding='same',dilation_rate=2)(input)

# First densenet block
First_Block = add_denseblock(First_Conv2D, growth_rate_net1, numLayers_net1, dropout_rate)
First_Transition = add_transition(First_Block, growth_rate_net1, dropout_rate)

# Second densenet block
Second_Block = add_denseblock(First_Transition, growth_rate_net2, numLayers_net2, dropout_rate) 
Second_Transition = add_transition(Second_Block, growth_rate_net2, dropout_rate)

# Output Layer
output = output_layer(Second_Transition)

model = Model(inputs=[input], outputs=[output])

model.summary()


__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_3 (InputLayer)            (None, 32, 128, 3)   0                                            
__________________________________________________________________________________________________
conv2d_114 (Conv2D)             (None, 32, 128, 14)  378         input_3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_63 (BatchNo (None, 32, 128, 14)  56          conv2d_114[0][0]                 
__________________________________________________________________________________________________
activation_63 (Activation)      (None, 32, 128, 14)  0           batch_normalization_63[0][0]     
__________________________________________________________________________________________________
conv2d_115

In [15]:
# Determine the loss function, optimizer, compile and start training

# Learning rate schedule: divide the learning rate by 10 every 25 epochs. The initial learning rate is 0.005
def step_decay(epoch):
    initial_lrate = 0.005
    drop = 0.1
    epochs_drop = 25.0
    lrate = initial_lrate * math.pow(drop,  
    math.floor((1+epoch)/epochs_drop))
    return lrate

# Callback for learning rate scheduling
lrate = LearningRateScheduler(step_decay)  

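# Choose the optimizer. SGD works equally well in this case.
# Note: the LearningRateScheduler callback sets the learning rate at the start of
# every epoch, so the lr passed to Adam below is overridden by step_decay.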
adm = keras.optimizers.Adam(lr = 0.0001,beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)

# Compile the model. Loss function: mean squared error
model.compile(loss='mean_squared_error',optimizer=adm)
print("Model Compilation Successful")

# Training 

epoch = 50
batch_size = 128

history = model.fit_generator(
    generate_samples(df_train,local_data_path, batch_size),
    # Round up so the step counts are integers and the last partial batch is included
    steps_per_epoch=math.ceil(df_train.shape[0]/batch_size),
    epochs=epoch,
    callbacks=[lrate,WeightsLogger(root_path=local_project_path)],
    validation_data=generate_samples(df_valid, local_data_path, augment=False),
    validation_steps = math.ceil(df_valid.shape[0]/batch_size),
    initial_epoch=0, verbose=1)

# Save the model. 
model.save('model.h5')

# Optionally, save the model architecture in .json form
with open('model.json', 'w') as file:
    file.write(model.to_json())
    
#------------------------------------------------------------------------------------------------------------------#
#-------------------------------------------END OF CODE------------------------------------------------------------#
#------------------------------------------------------------------------------------------------------------------#

Model Compilation Successful
Epoch 1/50


  warn("The default mode, 'constant', will be changed to 'reflect' in "


Epoch 2/50
...
Epoch 50/50


### Training results:

With the above choice of CNN architecture and hyperparameters, the loss was about 0.05 at the end of the 50th epoch. The training summary above hints that the loss would drop further with longer training; for this tutorial, the loss that we have achieved is good enough. The output range of the CNN is (-1, 1), which corresponds to a steering angle range of (-25, 25) degrees. A mean squared error of 0.05 therefore corresponds to a root-mean-square error of about 0.22 in normalized units, or roughly 5.6 degrees in the predicted steering angle.
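
The conversion from the loss value to an error in degrees can be checked with a few lines. This is a minimal sketch assuming the linear mapping from (-1, 1) to (-25, 25) degrees stated above; the function name `rmse_degrees` is hypothetical.

```python
import math

MAX_STEERING_DEG = 25.0  # the output range (-1, 1) maps to (-25, 25) degrees

def rmse_degrees(mse, max_angle_deg=MAX_STEERING_DEG):
    """Convert a mean squared error in normalized units to an RMS error in degrees."""
    return math.sqrt(mse) * max_angle_deg

print(round(rmse_degrees(0.05), 2))  # 5.59
```

Note that this is a root-mean-square figure, which weights large errors more heavily than a plain average of absolute errors would.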

## Performance Testing

Now that we have a trained CNN model that predicts the steering angle of the car, we can run the simulator in autonomous mode. To do this, we will use 'drive.py' and the saved CNN model 'model.h5'. Both of these files are in 'e2e_project/'.

Follow the steps below to run the simulator in autonomous mode:
- Open the simulator and click on autonomous mode. Choose track-1.
- Make sure 'drive.py' and 'model.h5' are in the local_project_path ('../e2e_project').
- Execute drive.py.

A video of the autonomous drive:

<a href="http://www.youtube.com/watch?feature=player_embedded&v=cMEUmcrLJdw" target="_blank"><img src="http://img.youtube.com/vi/cMEUmcrLJdw/0.jpg" alt="Video of the autonomous drive" width="540" height="180" border="10" /></a>


*drive.py* has a parameter that you can tune: *MAX_SPEED*. You can change it to suit your model. I am running the simulator on a CPU, so *MAX_SPEED=8* worked well for me. The right value depends on how fast *drive.py* can interact with the simulator. If you have a GPU on your PC, *drive.py* can compute the steering angle prediction from the CNN model quickly and update the control signals accordingly, so tune *MAX_SPEED* to your PC. Note that the complexity of the CNN plays a role here. A very complex model with a lot of parameters makes each prediction slow, so the car cannot be driven at a higher speed. This model uses only about 50k parameters, thanks to the 1xC-Cx1 convolutions, the dilation rate, and the absence of fully-connected layers.
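
How *drive.py* turns *MAX_SPEED* into a throttle command is not shown in this notebook. As a purely illustrative assumption, a simple rule could ease off the throttle as the car approaches the speed limit and as the steering angle grows. The function `throttle_command` and its formula are hypothetical and are not taken from *drive.py*.

```python
MAX_SPEED = 8.0  # the value that worked on the author's CPU

def throttle_command(speed, steering, max_speed=MAX_SPEED):
    """Hypothetical throttle rule: slow down near the limit and in sharp turns."""
    throttle = 1.0 - (speed / max_speed) ** 2 - steering ** 2
    # Clamp to the [0, 1] range so the car never receives a negative throttle.
    return max(0.0, min(1.0, throttle))

print(throttle_command(speed=0.0, steering=0.0))  # 1.0
print(throttle_command(speed=8.0, steering=0.0))  # 0.0
print(throttle_command(speed=4.0, steering=0.5))  # 0.5
```

With a rule of this kind, lowering `max_speed` gives a slow prediction loop more time per metre travelled, which is the trade-off described above.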



### Result Analysis 


- With the dataset in hand and the above CNN model, the car was able to complete track-1 autonomously.


- The CNN model has only about 50k parameters. It is very efficient, thanks to the dilated 1xC-Cx1 convolutions and the absence of fully-connected layers.


- The CNN model does not use any fully-connected layers (dense layers before the output). I observed that adding fully-connected layers after the convolution layers did not help in reducing the loss.


- The 1xC-Cx1 dilated convolution helped reduce both the prediction time and the training time. I am not aware of others using this type of convolution in end-to-end learning models for cars, although it seems like a natural tool.
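
The parameter count can be estimated by hand from the hyperparameters used in training. The sketch below assumes bias-free convolutions, four parameters per channel for each batch-normalization layer (which is how Keras reports them in the model summary), and the layer layout of the building blocks defined earlier. The helper names are hypothetical.

```python
def dense_block_params(c_in, growth, num_layers):
    """Parameters of a dense block; returns (params, output_channels)."""
    total = 0
    channels = c_in
    for _ in range(num_layers):
        total += 4 * channels            # batch normalization
        total += 3 * channels * growth   # 1x3 convolution
        total += 3 * growth * growth     # 3x1 convolution
        channels += growth               # concatenation with the layer input
    return total, channels

def transition_params(c_in, c_out):
    """Batch normalization plus a 1x1 convolution."""
    return 4 * c_in + c_in * c_out

total = 3 * 3 * 3 * 14                       # first 3x3 convolution on the RGB input
block1, ch1 = dense_block_params(14, 14, 7)
total += block1 + transition_params(ch1, 14)
block2, ch2 = dense_block_params(14, 10, 11)
total += block2 + transition_params(ch2, 10)
# Output layer: batch norm, then a 4 x 16 x 10 feature map flattened into Dense(1).
total += 4 * 10 + (4 * 16 * 10 + 1)
print(ch1, ch2, total)  # 112 124 54195
```

Under these assumptions the total comes to about 54k parameters, in line with the "about 50k" figure, and the widest layer has 124 channels, which stays under the 128-channel limit mentioned in the design considerations.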



#### Some Observations:


- Adding fully-connected layers did not help in reducing the cost.

- Adding more DenseNet blocks reduced the cost slightly. However, the project was limited to two DenseNet blocks.

- Comparison of 1xC-Cx1 with CxC: no significant influence on the cost, and 1xC-Cx1 requires fewer parameters.

- A learning rate schedule in the range 0.005 to 0.00005 seemed to work best.
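
To see how the schedule covers that range over 50 epochs, the sketch below re-implements the `step_decay` rule from the training code as a standalone function (the constants are exposed as keyword arguments here, which is a small change made for illustration) and prints the rate at the epochs where it changes.

```python
import math

def step_decay(epoch, initial_lrate=0.005, drop=0.1, epochs_drop=25.0):
    """Same rule as the training code: multiply the rate by `drop` every `epochs_drop` epochs."""
    return initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))

# Keras passes a zero-based epoch index to the schedule.
for epoch in (0, 23, 24, 48, 49):
    print(epoch, step_decay(epoch))
```

Because of the `1 + epoch` term, the first drop happens at zero-based epoch 24, and the lowest rate of about 0.00005 is reached only in the final epoch of a 50-epoch run.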


#### Shortcomings and Possible Causes:


- The maximum speed achieved on track-1 was 9 mph. This can be improved by a clever parameterization of the throttle in terms of speed. Running the simulator on a device with a GPU will also help.


- The data collected was poor. Much of my driving in training mode was erroneous, and it affected the model performance to some extent.


- Although the model performed well on track-1, it did not generalise well, and the car could not drive on track-2 (unseen data). This may be due to the poor quality of the data that I collected. Ideally, we need to collect data for different scenarios. The dataset must include a lot of recovery driving: placing the car near the edge of the road and recovering from there to the centre of the road. This requires some patience and effort during data collection. Also, I did not spend enough time on preprocessing the data. A better strategy for scrapping data will significantly improve the performance of the model. Data collection, data preprocessing, and data augmentation play the most important role in deciding the model performance. The steering angle correction and the image resizing are some of the key hyperparameters in data augmentation.


 

#### What more can be done?
- There is a lot of scope for improvement in data collection, data preprocessing, and data augmentation.

- Predict steering, speed, and brake together. It would be interesting to come up with a model that predicts speed and brake along with the steering angle. The data is already available from the simulator, so it is feasible.
