My project includes the following files:
.
├── assets
├── dataHandler.py
├── drive.py
├── model.h5
├── model.py
├── README.md
├── train.py
└── video.py
- `model.py` contains the Nvidia End to End learning model implemented in Keras.
- `drive.py` drives the car in autonomous mode.
- `model.h5` contains a trained convolutional neural network.
- `dataHandler.py` contains generators that load and augment the training and validation data.
- `train.py` contains the training script, which uses `model.py` and `dataHandler.py` to train the model.
To run the model, start the simulator in Autonomous mode and, in a separate terminal, execute the following command:

    python drive.py model.h5 imgs/
The following is the model architecture summary:
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
lambda_1 (Lambda) (None, 160, 320, 3) 0 lambda_input_1[0][0]
____________________________________________________________________________________________________
cropping2d_1 (Cropping2D) (None, 70, 320, 3) 0 lambda_1[0][0]
____________________________________________________________________________________________________
convolution2d_1 (Convolution2D) (None, 33, 158, 24) 1824 cropping2d_1[0][0]
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D) (None, 15, 77, 36) 21636 convolution2d_1[0][0]
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D) (None, 6, 37, 48) 43248 convolution2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D) (None, 4, 35, 64) 27712 convolution2d_3[0][0]
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D) (None, 2, 33, 64) 36928 convolution2d_4[0][0]
____________________________________________________________________________________________________
flatten_1 (Flatten) (None, 4224) 0 convolution2d_5[0][0]
____________________________________________________________________________________________________
dense_1 (Dense) (None, 1164) 4917900 flatten_1[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout) (None, 1164) 0 dense_1[0][0]
____________________________________________________________________________________________________
activation_1 (Activation) (None, 1164) 0 dropout_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 100) 116500 activation_1[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout) (None, 100) 0 dense_2[0][0]
____________________________________________________________________________________________________
activation_2 (Activation) (None, 100) 0 dropout_2[0][0]
____________________________________________________________________________________________________
dense_3 (Dense) (None, 50) 5050 activation_2[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout) (None, 50) 0 dense_3[0][0]
____________________________________________________________________________________________________
activation_3 (Activation) (None, 50) 0 dropout_3[0][0]
____________________________________________________________________________________________________
dense_4 (Dense) (None, 10) 510 activation_3[0][0]
____________________________________________________________________________________________________
activation_4 (Activation) (None, 10) 0 dense_4[0][0]
____________________________________________________________________________________________________
dense_5 (Dense) (None, 1) 11 activation_4[0][0]
====================================================================================================
Total params: 5,171,319
Trainable params: 5,171,319
Non-trainable params: 0
____________________________________________________________________________________________________
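The Nvidia End to End learning model in `model.py` is defined in Keras and is given by the following function:

    # Imports assumed for Keras 1.x (the Convolution2D/subsample API used below)
    from keras.models import Sequential
    from keras.layers import Lambda, Cropping2D, Convolution2D, Flatten, Dense, Dropout, Activation
    from keras.optimizers import Adam

    def get_model(learning_rate=1e-4):
        model = Sequential()
        # Normalize pixel values from [0, 255] to [-1, 1]
        model.add(Lambda(lambda x: x / 127.5 - 1.0, input_shape=(160, 320, 3)))
        # Crop 65 rows from the top and 25 rows from the bottom of the image
        model.add(Cropping2D(cropping=((65, 25), (0, 0))))
        # Five convolution layers: three 5x5 with stride 2, then two 3x3 with stride 1
        model.add(Convolution2D(24, 5, 5, subsample=(2, 2), activation='relu'))
        model.add(Convolution2D(36, 5, 5, subsample=(2, 2), activation='relu'))
        model.add(Convolution2D(48, 5, 5, subsample=(2, 2), activation='relu'))
        model.add(Convolution2D(64, 3, 3, subsample=(1, 1), activation='relu'))
        model.add(Convolution2D(64, 3, 3, subsample=(1, 1), activation='relu'))
        model.add(Flatten())
        # Fully connected layers, with light dropout on the first three
        model.add(Dense(1164))
        model.add(Dropout(0.1))
        model.add(Activation('relu'))
        model.add(Dense(100))
        model.add(Dropout(0.1))
        model.add(Activation('relu'))
        model.add(Dense(50))
        model.add(Dropout(0.1))
        model.add(Activation('relu'))
        model.add(Dense(10))
        model.add(Activation('relu'))
        # Single output: the predicted steering angle
        model.add(Dense(1))
        model.summary()
        model.compile(loss='mse', optimizer=Adam(learning_rate))
        return model
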
As the summary shows, the model has 5,171,319 parameters, all of which are trainable, with 5 convolutional layers and 5 fully connected layers. The image is cropped inside the model itself using the Keras `Cropping2D` layer.
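
The parameter total can be cross-checked by hand. The following is a minimal sketch, not part of the project, that recomputes the layer output sizes and parameter counts from the layer definitions, assuming unpadded ("valid") convolutions, which is what the output shapes in the summary imply. The helper names `conv_out` and `count_params` are made up for this illustration.

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


def count_params():
    """Return (flattened_size, total_params) for the network described above."""
    # Input is 160x320x3; Cropping2D removes 65 rows on top and 25 at the bottom.
    height, width, channels = 160 - 65 - 25, 320, 3
    total = 0
    # (filters, kernel size, stride) for the five convolution layers.
    conv_layers = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]
    for filters, kernel, stride in conv_layers:
        # Each filter has kernel*kernel*channels weights plus one bias.
        total += (kernel * kernel * channels + 1) * filters
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        channels = filters
    flattened = height * width * channels
    units = flattened
    # Fully connected layers: (inputs + 1 bias) * outputs.
    for outputs in [1164, 100, 50, 10, 1]:
        total += (units + 1) * outputs
        units = outputs
    return flattened, total


print(count_params())  # (4224, 5171319)
```

Both numbers agree with the `flatten_1` output size and the `Total params` line in the summary.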
The Udacity self-driving-car-sim was used in Training mode to capture images while driving. There are 3 cameras, Center, Left and Right, positioned accordingly on the car. The steering angle is recorded at the same time, synchronized with the images, and saved in `driving_log.csv`.
Here is the tree of the dataset directory. `IMG/` contains all the images listed in `driving_log.csv`.
.
├── driving_log.csv
└── IMG
Here is an example row from `driving_log.csv`:

Center Image | Left Image | Right Image | steering | throttle | brake | speed |
---|---|---|---|---|---|---|
/IMG/center_some_timestamp.jpg | /IMG/left_some_timestamp.jpg | /IMG/right_some_timestamp.jpg | 0 | 0 | 0 | 4.301273E-06 |
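
As an illustration of how such a row can be read, here is a minimal sketch using Python's standard `csv` module. It assumes the file is comma-separated with no header row and that the columns follow the order of the table above; the `COLUMNS` names and the `parse_log` helper are hypothetical and not taken from `dataHandler.py`.

```python
import csv
import io

# Column order taken from the table above; the names are illustrative.
COLUMNS = ["center", "left", "right", "steering", "throttle", "brake", "speed"]


def parse_log(text):
    """Parse driving-log text into a list of dicts with numeric telemetry."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        row = dict(zip(COLUMNS, (field.strip() for field in fields)))
        # The last four columns are numbers; the first three are image paths.
        for key in COLUMNS[3:]:
            row[key] = float(row[key])
        rows.append(row)
    return rows


sample = (
    "/IMG/center_some_timestamp.jpg, /IMG/left_some_timestamp.jpg, "
    "/IMG/right_some_timestamp.jpg, 0, 0, 0, 4.301273E-06\n"
)
rows = parse_log(sample)
print(rows[0]["center"], rows[0]["steering"], rows[0]["speed"])
```

For a log on disk, the same loop would work with `csv.reader(open(path))` in place of the in-memory string.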
Example frames: Left Image, Center Image, Right Image.
The number of rows in `driving_log.csv` is the number of samples collected by driving in the simulator. The car was driven for a few laps while keeping it centered on the track, and for a few more in the opposite direction to make the model less biased toward one side, in this case the left. To add more challenging examples, some samples were collected by driving aggressively and by driving on the other track, which has sharp turns. The dataset was then divided into Training and Testing sets by splitting it 80% and 20% respectively. The actual numbers of samples in the training and testing sets are given below:
('train_samples', 32553)
('test_samples', 8139)
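
The two counts are consistent with an 80/20 split of 40,692 samples in which the test size is rounded up. The project's own splitting code is not shown here, so the following is only a sketch of one way to produce such a split with the standard library; `split_samples` and its `seed` parameter are hypothetical.

```python
import math
import random


def split_samples(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and split them into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Round the test size up, so 20% of 40,692 becomes 8,139.
    n_test = math.ceil(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in sample IDs; a real run would pass the rows of driving_log.csv.
train, test = split_samples(range(32553 + 8139))
print(len(train), len(test))  # 32553 8139
```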
`dataHandler.py` uses `driving_log.csv` to load the corresponding images. For each sample, 3 images are loaded: left, right and center. The simulator in autonomous mode only uses the center camera, so a steering correction of 0.245 is applied to the angles of the left and right images to compensate. These images are also flipped, with their angles negated, to augment the data. Thus, for every sample in `driving_log.csv`, we obtain 5 data samples, each containing an image and its corresponding angle.
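
    import cv2
    import numpy as np

    STEERING_CONST = 0.245  # steering correction for the side cameras

    def flip_image(image, angle):
        # Mirror the image horizontally and negate the steering angle
        return np.fliplr(image), -1 * angle

    def augment_image(sample, X_data, Y_data):
        # sample holds the center, left and right image paths, then the steering angle
        center_img = cv2.imread(sample[0])
        left_img = cv2.imread(sample[1])
        right_img = cv2.imread(sample[2])
        steering = float(sample[3])
        # Center image keeps the recorded angle
        X_data.append(center_img)
        Y_data.append([steering])
        # Side images get the correction added (left) or subtracted (right)
        X_data.append(left_img)
        Y_data.append([steering + STEERING_CONST])
        X_data.append(right_img)
        Y_data.append([steering - STEERING_CONST])
        # Flipped copies of the side images, with negated angles
        flip_left, left_angle = flip_image(left_img, steering + STEERING_CONST)
        X_data.append(flip_left)
        Y_data.append([left_angle])
        flip_right, right_angle = flip_image(right_img, steering - STEERING_CONST)
        X_data.append(flip_right)
        Y_data.append([right_angle])
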
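
To make the label arithmetic concrete, the sketch below computes only the five steering angles produced for one log entry, without loading any images. It mirrors the order used in `augment_image`; the helper name `augmented_angles` is made up for this example.

```python
STEERING_CONST = 0.245  # steering correction quoted in the text


def augmented_angles(steering, correction=STEERING_CONST):
    """Return the labels for: center, left, right, flipped left, flipped right."""
    left = steering + correction
    right = steering - correction
    # Flipping an image horizontally negates its steering angle.
    return [steering, left, right, -left, -right]


angles = augmented_angles(0.1)
print(angles)
```

For a recorded angle of 0.1, this gives approximately 0.1, 0.345, -0.145, -0.345 and 0.145.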
To reduce memory use while loading the dataset for training, generators are used.
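
The generators in `dataHandler.py` are not reproduced here, so the following is only a sketch of the general pattern: loop forever, reshuffle on every pass, and load one batch at a time. `batch_generator`, `load_sample` and `fake_loader` are hypothetical names. The batches are plain lists to keep the example dependency-free; a Keras pipeline would presumably convert them to NumPy arrays before yielding.

```python
import random


def batch_generator(samples, load_sample, batch_size=32, seed=None):
    """Yield (images, angles) batches forever, loading images only when needed."""
    rng = random.Random(seed)
    samples = list(samples)
    while True:  # Keras-style generators never terminate
        rng.shuffle(samples)  # new order on every pass over the data
        for start in range(0, len(samples), batch_size):
            images, angles = [], []
            for sample in samples[start:start + batch_size]:
                image, angle = load_sample(sample)
                images.append(image)
                angles.append(angle)
            yield images, angles


def fake_loader(sample):
    """Stand-in for reading an image from disk; returns (image, angle)."""
    path, steering = sample
    return "pixels of " + path, float(steering)


log = [("a.jpg", "0.1"), ("b.jpg", "-0.2"), ("c.jpg", "0.0")]
gen = batch_generator(log, fake_loader, batch_size=2, seed=1)
first_images, first_angles = next(gen)
second_images, second_angles = next(gen)
```

Only `batch_size` images are held in memory at a time, which is the point of using a generator instead of loading the full dataset up front.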
Training the model is handled by `train.py`. The training and testing datasets were shuffled thoroughly, and the testing dataset was used to validate the model during training.
SAMPLES_PER_EPOCH = 30000
EPOCHS = 8
VALIDATION_SAMPLES = 6400
LEARNING_RATE = 1e-4
The parameters were obtained by trial and error. It is generally observed that 1e-4 is an effective learning rate for the Adam optimizer. It was also observed that after 8 epochs the validation loss increased, which was an indication of overfitting. The mean squared error training loss obtained was approximately 0.036 and the validation loss was 0.032. The model was then saved to `model.h5` to be used for inference.
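
The choice of 8 epochs came from watching the validation loss. As a small illustration of that reasoning, the sketch below picks the epoch with the lowest validation loss from a list of per-epoch values. The function `best_epoch` is hypothetical, and the loss values are made up so that the minimum falls at epoch 8; they are not the project's recorded history.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


# Made-up validation losses for illustration only, not the project's results.
history = [0.060, 0.048, 0.041, 0.037, 0.035, 0.034, 0.033, 0.032, 0.034, 0.036]
print(best_epoch(history))  # 8
```

Training past the epoch returned here would correspond to the rising validation loss described above.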
Plot of the training history: training loss against number of epochs.
The model was tested by running the simulator in autonomous mode and using `model.h5` for inference. Initially, the model was trained on just 2 laps of data for 2 epochs to check that it was working correctly. Later, more data was collected by driving aggressively and in the opposite direction. This helped the model generalize and just manage some sharp turns. About 2 laps of data were then collected from the other track, which has very sharp turns. This made the model better in terms of taking smooth turns and avoiding crashes. The following command was executed to drive the car autonomously in the simulator and record the images in `imgs/`:

    python drive.py model.h5 imgs/

To generate a dash cam video, the following command was executed:

    python video.py imgs/