# BEHAVIORAL CLONING PROJECT

The aim of this project is to design a system that can drive a car autonomously in a simulated environment.

Our process is split into the following steps:
1. **Data Collecting**
    * The sample data is used to train the model. It is worth mentioning that my previous code and strategy for collecting data from the simulator also work here.
    
2. **Data Cleaning and Tidying**
    * We check the distribution of the data and remove outliers in this step.
    
3. **Data Exploration and Augmentation**
    * One observation from the last step is that the data is unbalanced, which would skew the results, so data augmentation is necessary. In this model, we use the images from the center, left, and right cameras. Random flip, shift, brightness, and shadow techniques are used for augmentation.
         
4. **Data Preprocessing**
    * Since the NVIDIA model is used as the base of our method, some preprocessing steps, such as resizing and converting the image from RGB to YUV, must be done beforehand. We also crop the image so that it contains only the information needed to predict a steering angle.

5. **Modeling and Deep Learning**
    * A lambda layer is used to normalize the preprocessed data. The data is then fed into the NVIDIA model, followed by a dropout layer (dropout rate 0.5) and a flatten layer. As a last step, four fully connected layers (->100->50->10->1) produce the final result.
    * One interesting observation: if the sample data is not properly augmented, the above model yields both a high training loss and a high validation loss.

6. **Training and Saving**
    * We use Adam as the optimizer in this step. It is worth mentioning that the default learning rate of 0.001 is still too large to improve the validation loss.
    * Mean squared error (MSE) is used as the loss.
    * Thanks to my reviewer, the ModelCheckpoint callback from Keras is used this time to save the best model.
    * EPOCHS is set to 10. I tried higher numbers and found that 10 is enough. Although the best validation loss is reached at epoch 7, the model from epoch 3 is already good enough to survive on the test track.
    * The training and validation losses for each epoch are plotted.
    * PS. The data tidying step here is not sufficient and leaves room for improvement. If I remove this step, the resulting model works even better.
    
    


Links: 
* Simulators: [macOS](https://d17h27t6h515a5.cloudfront.net/topher/2016/November/5831f290_simulator-macos/simulator-macos.zip), [Windows 64-bit](https://d17h27t6h515a5.cloudfront.net/topher/2016/November/5831f3a4_simulator-windows-64/simulator-windows-64.zip), [Linux](https://d17h27t6h515a5.cloudfront.net/topher/2016/November/5831f0f7_simulator-linux/simulator-linux.zip)

In [None]:
import pandas as pd
import numpy as np
import cv2
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
%matplotlib inline

import os
from sklearn.model_selection import train_test_split

## DATA COLLECTING

In [None]:
data_dir = './data'
driving_log = pd.read_csv(os.path.join(data_dir, 'driving_log.csv'))

X = driving_log[['center', 'left', 'right']].values
y = driving_log['steering'].values

## DATA VISUALIZATION

In [None]:
driving_log.head()

### data distribution

In [None]:
plt.figure(figsize=(12,3))
plt.subplot(1,2,1)
plt.hist(y, bins=30)
plt.title("Data distribution")
plt.subplot(1,2,2)
plt.hist(y, bins=30)
plt.title("Truncated data distribution")
plt.ylim(0,40)


I did not dig deeper into the data to show the outliers in the ranges [-1, -0.4) and (0.4, 1], because their proportion becomes low after data augmentation. If the model is trained properly, their effect can hopefully be ignored.

## TIDYING DATA

### remove the outliers in the range (0.5, 1] and [-1, -0.5)

In [None]:
check_set = []
for i in range(len(y)):
    if abs(y[i]) > 0.5:
        check_set.append([i, y[i]])

print("In total, there are {} samples whose steering angles are in the range [-1, -0.5) or (0.5, 1].".format(len(check_set)))
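
The loop above can also be written as a vectorized selection. The following is a minimal sketch, assuming `y` is a one-dimensional NumPy array of steering angles; the sample values and the names `threshold` and `outlier_idx` are made up for illustration.

```python
import numpy as np

# Hypothetical steering angles; the real values come from driving_log.csv.
y = np.array([0.0, 0.62, -0.1, -0.75, 0.5, 0.05])
threshold = 0.5

# Indexes whose absolute steering angle exceeds the threshold.
outlier_idx = np.flatnonzero(np.abs(y) > threshold)

# The same [index, angle] pairs that the explicit loop collects.
check_set = [[int(i), float(y[i])] for i in outlier_idx]
print(check_set)
```

Note that a value of exactly 0.5 is not selected, which matches the strict comparison in the loop.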



In [None]:
plt.figure(figsize=(12,24))
for i in range(len(check_set)):
    plt.subplot(11,4,i+1)
    row = i % 4
    image = mpimg.imread(os.path.join(data_dir, X[check_set[i][0]][0].strip()))
    plt.imshow(image)
    if row == 0:
        plt.title("row {}: {}".format(i//4 + 1, check_set[i][1]))
    else:
        plt.title(check_set[i][1])


It is not hard to see that all images in row 5 and the first image of row 6 are outliers, so we remove them. The following formula gives their indexes in `check_set`.

(Row i, Col j) corresponds to the index (i - 1) * 4 + (j - 1).
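
As a quick check of the formula, the sketch below converts the grid positions named above into flat indexes. The helper name `grid_index` is hypothetical and only serves this illustration.

```python
def grid_index(row, col, n_cols=4):
    """Convert a 1-based (row, col) grid position to a 0-based flat index."""
    return (row - 1) * n_cols + (col - 1)

# All four images in row 5, plus the first image in row 6.
positions = [(5, 1), (5, 2), (5, 3), (5, 4), (6, 1)]
indexes = [grid_index(r, c) for r, c in positions]
print(indexes)  # [16, 17, 18, 19, 20]
```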

In [None]:
remove_pre = [16, 17, 18, 19, 20]
remove_indexes = []
for i in range(len(remove_pre)):
    remove_indexes.append(check_set[remove_pre[i]][0])
print("The indexes of images that we are going to remove are: {}".format(remove_indexes))

# Remove outliers
X = np.delete(X, remove_indexes, 0)
y = np.delete(y, remove_indexes, 0)

In [None]:
stable_indexes = []
for i in range(len(y)):
    if abs(y[i]) < 0.05:
        stable_indexes.append(i)
print("There are {} samples in total whose steering angles are located in (-0.05, 0.05).".format(len(stable_indexes)))

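# Sample without replacement so that exactly two thirds of the near-zero samples are removed
remove_indexes = np.random.choice(stable_indexes, len(stable_indexes)*2//3, replace=False)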
#print(remove_indexes[0:10], len(remove_indexes))
X = np.delete(X, remove_indexes, 0)
y = np.delete(y, remove_indexes, 0)
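
One detail to keep in mind when drawing indexes to delete: `np.random.choice` samples with replacement unless `replace=False` is passed, so the drawn indexes may contain duplicates and fewer distinct samples are removed than requested. Below is a small sketch of the difference, using a hypothetical pool of 3000 indexes and NumPy's `default_rng` for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)
pool = np.arange(3000)          # stand-in for stable_indexes
k = len(pool) * 2 // 3          # number of samples we intend to drop

with_repl = rng.choice(pool, k)                    # default: replace=True
without_repl = rng.choice(pool, k, replace=False)  # every index is distinct

# Number of distinct indexes actually drawn in each case.
print(len(np.unique(with_repl)), len(np.unique(without_repl)))
```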

In [None]:
print(len(y))
plt.figure(figsize=(12,3))
plt.subplot(1,2,1)
plt.hist(y, bins=30)
plt.title("Data distribution")
plt.subplot(1,2,2)
plt.hist(y, bins=30)
plt.title("Truncated data distribution")
plt.ylim(0,40)

## DATA AUGMENTATION 
### use all images obtained from center, left and right cameras.

In [None]:
def choose_image(data_dir, center, left, right, steering_angle):
    trigger = np.random.choice(3)
    if trigger == 0:
        return mpimg.imread(os.path.join(data_dir, left.strip())), steering_angle + 0.2
    elif trigger == 1:
        return mpimg.imread(os.path.join(data_dir, right.strip())), steering_angle - 0.2
    return mpimg.imread(os.path.join(data_dir, center.strip())), steering_angle

### random flip

In [None]:
def random_flip(image, steering_angle):
    if np.random.choice(2):
        image = cv2.flip(image, 1)
        steering_angle = -steering_angle
    return image, steering_angle
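
Mirroring the image swaps left and right, so a left turn becomes a right turn of the same magnitude, which is why the sign of the steering angle is negated. Here is a minimal NumPy-only sketch of the same idea, where reversing the column axis stands in for `cv2.flip(image, 1)` and the tiny array is a made-up image.

```python
import numpy as np

# A made-up 2x3 single-channel "image" and a hypothetical steering angle.
image = np.array([[1, 2, 3],
                  [4, 5, 6]])
steering_angle = 0.25

flipped = image[:, ::-1]          # reverse the column (horizontal) axis
flipped_angle = -steering_angle   # mirror the steering direction

print(flipped)
print(flipped_angle)
```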

### random shift

In [None]:
def random_translate(image, steering_angle, range_x, range_y):
    """
    Randomly shift the image vertically and horizontally
    """
    trans_x = range_x * (np.random.rand() - 0.5)
    trans_y = range_y * (np.random.rand() - 0.5)
    steering_angle += trans_x * 0.002
    trans_m = np.float32([[1,0,trans_x], [0,1,trans_y]])
    height, width = image.shape[:2]
    image = cv2.warpAffine(image, trans_m, (width, height))
    return image, steering_angle
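
With a factor of 0.002 per pixel, the steering correction introduced by the shift stays small. As a worked example, assuming `range_x = 100` (the default that `augument` passes in), the horizontal shift lies in [-50, 50) pixels, so the correction is bounded as computed below.

```python
range_x = 100            # default horizontal range passed by augument()
per_pixel = 0.002        # steering correction per pixel of horizontal shift

# np.random.rand() lies in [0, 1), so trans_x lies in [-range_x/2, range_x/2).
max_shift = range_x * 0.5
max_correction = max_shift * per_pixel

print(max_shift, max_correction)
```

The correction therefore never exceeds about 0.1 in magnitude, which is half of the 0.2 offset applied to the side cameras.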

### random shadow

In [None]:
def random_shadow(image):
    """
    Generates and adds random shadow
    """
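    # (x1, y1) and (x2, y2) form a line from the top edge to the bottom edge
    # xm, ym give the column and row coordinates of every pixel
    # Use the actual image size so the mask always matches the input image
    height, width = image.shape[:2]
    x1, y1 = width * np.random.rand(), 0
    x2, y2 = width * np.random.rand(), height
    ym, xm = np.mgrid[0:height, 0:width]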
    
    mask = np.zeros_like(image[:, :, 1])
    mask[(ym - y1) * (x2 - x1) - (y2 - y1) * (xm - x1) > 0] = 1

    # choose which side should have shadow and how much to darken it
    cond = mask == np.random.randint(2)
    s_ratio = np.random.uniform(low=0.2, high=0.5)

    # adjust Lightness (channel 1) in HLS (Hue, Lightness, Saturation)
    hls = cv2.cvtColor(image, cv2.COLOR_RGB2HLS)
    hls[:, :, 1][cond] = hls[:, :, 1][cond] * s_ratio
    return cv2.cvtColor(hls, cv2.COLOR_HLS2RGB)
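
The mask is a half-plane test: the sign of the expression tells on which side of the line a pixel lies. Below is a small sketch on a made-up 4x6 grid with a fixed vertical line at x = 3 (in the function, both x positions are random). In this sketch `ym` holds row coordinates and `xm` holds column coordinates.

```python
import numpy as np

height, width = 4, 6
x1, y1 = 3, 0            # line start on the top edge
x2, y2 = 3, height       # line end on the bottom edge

# np.mgrid returns row coordinates first, then column coordinates.
ym, xm = np.mgrid[0:height, 0:width]

# Positive on one side of the line, non-positive on the other.
side = (ym - y1) * (x2 - x1) - (y2 - y1) * (xm - x1) > 0
mask = side.astype(np.uint8)
print(mask)
```

For this line, the columns to the left of x = 3 are marked with 1 and the remaining columns with 0.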

### random brightness

In [None]:
def random_brightness(image):
    image_hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
    ratio = 1.0 + 0.4 * (np.random.rand() - 0.5)
    # Clip to [0, 255] so bright pixels saturate instead of overflowing uint8
    image_hsv[:,:,2] = np.clip(image_hsv[:,:,2] * ratio, 0, 255)
    return cv2.cvtColor(image_hsv, cv2.COLOR_HSV2RGB)
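
Because the V channel is stored as 8-bit unsigned integers, scaling by a ratio above 1 can push bright pixels past 255. Here is a minimal sketch of a saturating scale, assuming a `uint8` channel; the sample values are made up, and the ratio 1.2 is the upper end of the range used above.

```python
import numpy as np

v = np.array([10, 128, 250], dtype=np.uint8)   # made-up V-channel values
ratio = 1.2                                     # upper end of the ratio range

# Scale in floating point, clip to the valid range, then cast back to uint8.
scaled = np.clip(v.astype(np.float32) * ratio, 0, 255).astype(np.uint8)
print(scaled)
```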

### data augmentation

In [None]:
def augument(data_dir, center, left, right, steering_angle, range_x=100, range_y=10):
    """
    Generate an augmented image and adjust the steering angle.
    """
    image, steering_angle = choose_image(data_dir, center, left, right, steering_angle)
    image, steering_angle = random_flip(image, steering_angle)
    image, steering_angle = random_translate(image, steering_angle, range_x, range_y)
    image = random_shadow(image)
    image = random_brightness(image)
    return image, steering_angle

## DATA PREPROCESSING

In [None]:
def crop(image):
    """
    Remove the irrelevant content from the image
    """
    return image[60:140,:,:]

def resize(image):
    """
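    Resize the image to fit the input shape of the NVIDIA model
    """
    # interpolation must be passed by keyword; the third positional argument of cv2.resize is dst
    return cv2.resize(image, (IMAGE_WIDTH, IMAGE_HEIGHT), interpolation=cv2.INTER_AREA)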

def rgb2yuv(image):
    """
    Convert the image from RGB to YUV, the format used by the NVIDIA model
    """
    return cv2.cvtColor(image, cv2.COLOR_RGB2YUV)

def preprocess(image):
    image = crop(image)
    image = resize(image)
    image = rgb2yuv(image)
    return image
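
To see what the crop does to the image dimensions, here is a small sketch on a blank frame. The 160x320 frame size is an assumption about the simulator output, not something stated above.

```python
import numpy as np

# Assumed simulator frame size: 160 rows x 320 columns, 3 color channels.
frame = np.zeros((160, 320, 3), dtype=np.uint8)

# Same slice as crop(): keep rows 60..139, dropping the top 60 and bottom 20 rows.
cropped = frame[60:140, :, :]
print(cropped.shape)  # (80, 320, 3)
```

Under that assumption, `resize` then maps the 80x320 crop to `IMAGE_HEIGHT` x `IMAGE_WIDTH` before the color conversion.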



## CREATE DATA PARTITION