# <p style="text-align: center;"> Self Driving Car in OpenAI Gym using Imitation Learning and Reinforcement Learning</p>
![title](https://miro.medium.com/max/1575/1*IQfXahuDuh0pgVE5fMpiFQ.gif )

# <p style="text-align: center;"> 1.0 Abstract </p> <a id='Introduction'></a>

Self-driving cars are one of the most active areas of research and business for the tech giants. What seemed like science fiction a few years ago now looks like something that will soon be part of everyday life. We say "soon" because, although companies such as Tesla, Nissan, and Cadillac ship driver-assistance software, these systems still require a human to watch the road and take control when needed. Even so, it is fascinating to see how far the technology has come and how fast it is advancing, to the point that we can now build our own pipeline for autonomous driving with basic deep learning and neural networks.

Our idea to build our very own self-driving car emerged from here. To understand the basics of the process, we did this project in two parts:

- Self Driving Car using Supervised Learning
- Self Driving Car using Reinforcement Learning

**PS: To make the structure easier to follow, the project is split into separate notebooks, and each notebook contains the complete code and documentation for its part.**

### PS2: We wrote a research paper while developing the code for this project; you can read it by clicking the [link](Research_Paper.pdf)

### Folder Structure

**INFO7390_SelfDrivingCar**
- README.md
- Research Paper
- INFO7390_FinalProject.ipynb
- main_videos
- images_main_notebook
- autonomous.yml
- self-driving.yml
- requirements_SL.txt
- requirements_RL.txt
> Umbrella_Academy_INFO7390_Project
- INFO7390_Notebooks
    - sdc_gym (gym environment)
    - modules (py files)
        - SL_model.py
        - SL_data.py
        - RL_dqn.py
        - RL_car_dqn.py
        - RL_exp_replay.py
        - RL_processingimage.py
    - Basics_of_Convolutional_Neural_Network.ipynb
    - Self Driving Car using Supervised Learning
    - Basics_of_Deep_Q_Learning
    - Self Driving Car using Reinforcement Learning 
    - Supervised_IL_train_images
    - Supervised_IL_test_images
    - Supervised_IL_models
    - IL_Videos
    - Images (All images used in project)

### 1. Basics of CNN :- 

The main agenda of this notebook is as follows:
> - To understand the convolution operation
> - To understand the pooling operation
> - To remember the vocabulary used in convolutional neural networks (padding, stride, filter, etc.)
> - To build a convolutional neural network for multi-class image classification
> - To cover the basics of imitation learning

This notebook covers the basics of the convolution operation and of convolutional networks in general. It was an integral part of our project and can serve as a guide for any beginner trying to understand CNNs.

### 2.  Self Driving Car using Supervised Learning :- 
In this notebook, we apply a supervised learning algorithm (a convolutional neural network) to control the direction of a car in a 2D simulation. The notebook covers the following:

> - How a convolutional network works
> - How to create the dataset and use it to train our network
> - How to use gym to apply the output of our neural network in order to control the simulation

The general idea is that of a supervised classifier. We train a convolutional neural network to classify images from the game according to three labels: left, right, and straight ahead. We then convert these labels into commands for the simulator, which executes them.
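As a rough illustration of that last step, the sketch below maps the three labels onto the `[steering, gas, brake]` action format that CarRacing expects. The label strings, the function name `label_to_action`, and the default gas value are assumptions made for this example and are not taken from the project's modules.

```python
# Hypothetical mapping from classifier labels to CarRacing actions.
# CarRacing takes [steering, gas, brake], with steering in [-1, 1]
# and gas / brake in [0, 1]. The steering magnitudes are illustrative.
STEERING = {"left": -1.0, "straight": 0.0, "right": 1.0}


def label_to_action(label, gas=0.1, brake=0.0):
    """Turn a predicted label into a [steering, gas, brake] command.

    gas and brake are supplied by the caller, since the classifier
    only decides the direction.
    """
    if label not in STEERING:
        raise ValueError(f"unknown label: {label!r}")
    return [STEERING[label], gas, brake]
```

The resulting list can be passed to `env.step(...)`; full-lock steering on every frame is a simplification, and smaller magnitudes may drive more smoothly.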


### 3. Basics of Deep Q-Learning:- 
The main agenda of this notebook is as follows:

> - Q-Learning
> - Why ‘Deep’ Q-Learning?
> - Introduction to Deep Q-Learning
> - Challenges of Deep Reinforcement Learning as compared to Deep Learning
> - Experience Replay
> - Target Network

This notebook covers the basics of deep Q-learning. It was an integral part of our project and can serve as a guide for any beginner trying to understand Q-learning.
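To make two of these ideas concrete, here is a minimal sketch of an experience replay buffer and of the Q-learning target computed from a target network's outputs. It is a simplified illustration in plain Python, not the code in `RL_exp_replay.py` or `RL_dqn.py`; the names `ReplayBuffer` and `td_target` and the default `gamma` are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between
        # consecutive frames of the same episode.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def td_target(reward, next_q_values, done, gamma=0.99):
    """Q-learning target: r + gamma * max_a' Q_target(s', a').

    next_q_values holds the target network's Q-values for the next state.
    At a terminal state there is no future reward, so the target is r.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

During training, the network's prediction for the action actually taken is regressed towards `td_target(...)` for each sampled transition.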


### 4.  Self Driving Car using Reinforcement Learning :-

In this notebook, an agent is trained with a deep reinforcement learning algorithm to drive efficiently around a track in a Python-based car racing environment. The notebook covers the following:

> - Development of a deep Q-learning algorithm, which is then used to train an autonomous driver agent.
> - Testing and comparison of different deep Q-learning parameters and neural network architectures in order to obtain the best average score over a period of 100 races. This score is given by the gym environment and is shown in the bottom-left corner.

According to OpenAI Gym, this environment is considered solved when the agent reaches an average score of 900 over the last 100 runs. In this project, we surpassed this goal with an average score of 905 over the last 100 runs, and therefore solved the environment.
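The solved criterion can be checked with a few lines of Python. This is a minimal sketch; the function name `is_solved` and its parameter names are illustrative, and the defaults simply restate the 900-over-100-runs rule above.

```python
from collections import deque


def is_solved(scores, window=100, threshold=900.0):
    """Return True once the mean of the last `window` scores reaches `threshold`."""
    recent = deque(scores, maxlen=window)  # keeps only the most recent scores
    if len(recent) < window:
        return False  # not enough runs yet to judge
    return sum(recent) / window >= threshold
```

Calling `is_solved(episode_scores)` after each run gives a simple stopping condition for training or evaluation.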


# <p style="text-align: center;"> Index </p>
- # 1 [Abstract](#Introduction)
- # 2 [Basics of CNN and Imitation Learning](./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/Basics_of_Convolutional_Neural_Network_&_Imitation_Learning.ipynb)
- # 3 [Self Driving Car using Supervised Learning](./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/Self_Driving_Car_Imitation_Learning.ipynb)
- # 4 [Basics of Deep Q learning](./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/Basics_of_Deep_Q_Learning.ipynb)
- # 5 [Self Driving Car using Reinforcement Learning](./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/RL_Self_Driving_Car.ipynb)
- # 6 [Conclusion](#Conclusion)

![title](images_main_notebook/Pytorch.jpeg)
![title](images_main_notebook/OpenAI_Gym.png)

#  Setting up the Environment  <a id='Environment'></a>

Before we set up our environment, we need to install a few packages that our game and neural network depend on.

### 1) Gym facility
Install OpenAI Gym on the machine

Follow the instructions at https://github.com/openai/gym#installation for a detailed guide.

**Summary of instructions:**
- Install Python 3.5+
- Clone the gym repo: `git clone https://github.com/openai/gym.git`
- `cd gym`
- Install Gym with the box2d environments: `pip install -e '.[box2d]'`

Follow these steps to play the Car Racing game:
- `cd gym/envs/box2d`
- `python car_racing.py`

### 2) Pytorch
PyTorch is the deep learning framework that we will be using. It makes building neural networks very simple.

Follow the instructions at http://pytorch.org/ for a detailed guide.

**Summary of instructions:**
- Install Python 3.5+
- It is recommended to manage PyTorch with Anaconda, so please install Anaconda first
- Install PyTorch following the instructions at https://pytorch.org/get-started/locally/
![title](images_main_notebook/Pytorch_Installation.png)

For example, this is the setup for my computer:
> pip install torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html


[Back to top](#Introduction)

## The Environment

For this tutorial, we will use the gym library developed by OpenAI. It provides environments (simple games) to develop reinforcement learning algorithms.

The environment we will be using is CarRacing-v0 ( https://gym.openai.com/envs/CarRacing-v0/ ). It is about driving a car on a circuit, the objective being to move forward while staying on the track, which contains many turns. The input to the algorithm (the state provided by the environment) is only the image displayed by the environment: we see the car, and the terrain around it.
![title](images_main_notebook/car-racing.png)

To use the environment, you need to import it like this:

>import gym

>env = gym.make('CarRacing-v0').env

You can then access several useful functions:

- **env.reset() :** Allows you to restart the environment
- **env.step(action) :** Allows you to perform the action `action`. This function returns a tuple `state`, `reward`, `done`, `info`: `state` is the state of the game after the action, `reward` is the reward obtained, `done` indicates whether the game is finished, and `info` contains debug data.
- **env.render() :** Displays the game window.
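Put together, these three functions give the usual interaction loop. The sketch below assumes the classic Gym interface described above, where `reset()` returns the state and `step()` returns four values; `run_episode` and `policy` are hypothetical names introduced for this example.

```python
def run_episode(env, policy, max_steps=1000, render=False):
    """Play one episode and return the total reward.

    env    : object with reset() -> state and
             step(action) -> (state, reward, done, info)
    policy : callable mapping a state to an action (an assumption here;
             it could wrap a trained network or a random sampler)
    """
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        if render:
            env.render()  # draw the game window
        action = policy(state)
        state, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

With Gym installed, something like `run_episode(env, lambda state: env.action_space.sample())` plays one episode with random actions. Newer Gym releases return five values from `step()` and a tuple from `reset()`, so the unpacking would need adjusting there.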

[Back to top](#Introduction)

# <p style="text-align: center;"> Conclusion</p><a id='Conclusion'></a>

### 1. Video Simulation of self driving car by supervised learning (Imitation Learning) :- 
<video controls src="main_videos/IL_Result.mp4"  width="500" height="340"/>

Our network recognizes the shapes to keep the car on the desired path. It's a sort of classifier that just indicates whether the car is in the right position, too far to the right or too far to the left. We then send this command to the simulator. All of this is done in real time.
    
> Behavioural cloning has a few disadvantages, though, and we can see them in this notebook.
- We need to accelerate and decelerate manually, and we can only accelerate up to a certain limit; beyond that, the car spins out of control and ends up on the grass.
- Since we never leave the track while recording training data, the car has no way of returning to the road once it is on the grass.
- Here we have a training set of only 3000 and a validation set of 600. We tried increasing these by an order of magnitude (30,000 and 6,000), but with the much larger dataset the number of errors made while generating it also shot up, which made it a very bad dataset for our neural net.
- Also, because we stayed well within the track, the car has no data on cases in which it leaves the track by accident.
- A possible remedy is to preprocess the data so that the dataset contains images of the car coming back onto the track, but not of it leaving.
 
### PS: To see how this works, refer to [Self Driving Car using Supervised Learning](./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/Self_Driving_Car_Imitation_Learning.ipynb)

[Back to top](#Introduction)

### 2. Video Simulation of self driving car by Reinforcement learning (Deep Q Learning) :- 
<video controls src="main_videos/RL_SelfDriving.mp4"  height="340"/>

As the video above shows, our bot is trained well and is able to drive the car by itself.

Below we see the progression of scores with each time-step and training episode.

### PS: To see how this works, refer to [RL_Self_Driving_Car](./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/RL_Self_Driving_Car.ipynb)

**The first set of graphs shows the average score across 1000 training episodes**

<table>
<tr>
<td> <img src="./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/images/Plots/1000_time_step vs score.png" alt="Drawing" style="width: 500px;"/> </td>
<td> <img src="./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/images/Plots/1000_training_Episodes vs score.png" alt="Drawing" style="width: 500px;"/> </td>
</tr>
</table>

**The next set of graphs shows the average score across 5000 training episodes**

<table>
<tr>
<td> <img src="./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/images/Plots/5000_time_step vs score.png" alt="Drawing" style="width: 500px;"/> </td>
<td> <img src="./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/images/Plots/5000_training_Episodes vs score.png" alt="Drawing" style="width: 500px;"/> </td>
</tr>
</table>

**The next set of graphs shows the average score across 7500 training episodes**

<table>
<tr>
<td> <img src="./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/images/Plots/7500_time_step vs score.png" alt="Drawing" style="width: 500px;"/> </td>
<td> <img src="./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/images/Plots/7500_training_Episodes vs score.png" alt="Drawing" style="width: 500px;"/> </td>
</tr>    
</table>

**The next set of graphs shows the average score across 10K training episodes**

<table>
<tr>
<td> <img src="./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/images/Plots/10K_time_step vs score.png" alt="Drawing" style="width: 500px;"/> </td>
<td> <img src="./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/images/Plots/10K_training_Episodes vs score.png" alt="Drawing" style="width: 500px;"/> </td>
</tr>
</table>

As the graphs show, the score increases gradually:
- In the first 1000 episodes we barely reach a score of 110
- By 5000 episodes we just touch a score of 800, averaging in the 700s
- By 7500 episodes we are averaging around 800
- By 10000 episodes the score is slightly above 800
- Had we let training run for 15000 episodes or so, we are confident we would have achieved the game-winning condition (an average score of 900 over the last 100 episodes)

Below are the histograms of test scores for 1000, 5000, 7500, and 10000 episodes.
- For the first 1000 episodes, the scores are concentrated between 0 and 100
- For 5000 episodes, the scores cluster in three areas: around 800, between 0 and 100, and between 350 and 450
- For 7500 episodes, the count of scores near 800+ rises above 1400, while the clusters between 0 and 100 and around 400 diminish
- For 10000 episodes, over 6000 of the 10000 episodes result in a score of 800+


<table>
<tr>
<td> <img src="./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/images/Plots/1000_Histogram of Test Scores.png" alt="Drawing" style="width: 500px;"/> </td>
<td> <img src="./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/images/Plots/5000_Histogram of Test Scores.png" alt="Drawing" style="width: 500px;"/> </td>
</tr>
<tr>
<td> <img src="./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/images/Plots/7500_Histogram of Test Scores.png" alt="Drawing" style="width: 500px;"/> </td>
<td> <img src="./Umbrella_Academy_INFO7390_Project/INFO7390_Notebooks/images/Plots/10K_Histogram of Test Scores.png" alt="Drawing" style="width: 500px;"/> </td>
</tr>
</table>


[Back to top](#Introduction)

# <p style="text-align: center;"> Future Work</p><a id='Future Work'></a>

## Imitation Learning

**Acceleration Control :**

The car is not fully controlled here: the network only controls the lateral acceleration (the right / left direction) of the car, not the acceleration (and therefore not the speed). The problem is that the speed of the car cannot be inferred from a single image, so the network cannot control the acceleration to maintain a suitable speed. Some possible ways to address this:

- Use the speed bar below the image (the one that we have hidden). The direction bar should stay hidden, however, because it would mislead the direction classifier.
- Give the network several successive images instead of just one, so that it can deduce the speed of the car.
- Ask the network to control only the speed, and not the acceleration. A feedback system must then be written to maintain the requested speed. This approach is not really end-to-end, but it can be simpler if we have correct external data on the current speed (one could modify the environment to provide it in addition to the state).
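For the last option, the feedback system could be as simple as a proportional controller. The sketch below is one possible shape for it, assuming the current speed is available from outside the image; `speed_controller` and the `gain` value are illustrative choices and would need tuning.

```python
def speed_controller(target_speed, current_speed, gain=0.1):
    """Proportional controller returning (gas, brake), each in [0, 1].

    A positive speed error presses the gas, a negative one the brake.
    The gain is an untuned, illustrative value.
    """
    error = target_speed - current_speed
    command = max(-1.0, min(1.0, gain * error))  # clamp to [-1, 1]
    if command >= 0:
        return command, 0.0
    return 0.0, -command
```

The network would then output only `target_speed`, and the returned `(gas, brake)` pair would fill the last two entries of the action sent to the simulator.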


**Data Increase :**

The best way to improve the performance of a classifier is to increase the amount of data. Here, however, collecting data is slow because it has to be recorded while playing the game manually.

- One way to artificially increase the amount of data is called data augmentation.
- The idea is to apply transformations to the images that either leave the labels unchanged or change them in a known way. For example, one can mirror the image about the vertical axis.
- The left / right labels are then swapped, and the amount of data is immediately doubled.
- Other possible transformations are to distort the image a little or to modify the colors slightly (the colors are fixed in this environment, so this will likely be less effective here than on real images).
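The mirroring idea can be sketched with NumPy as follows. The string labels and the helper name `augment_with_flips` are assumptions for this example; the project's own dataset code may encode labels differently.

```python
import numpy as np

# Mirroring swaps left and right; "straight" is unchanged.
FLIPPED_LABEL = {"left": "right", "right": "left", "straight": "straight"}


def augment_with_flips(images, labels):
    """Return the dataset plus a mirrored copy with left/right labels swapped.

    images : array of shape (N, H, W) or (N, H, W, C)
    labels : sequence of N strings from FLIPPED_LABEL
    """
    images = np.asarray(images)
    flipped = images[:, :, ::-1]  # reverse the width axis of every image
    out_images = np.concatenate([images, flipped], axis=0)
    out_labels = list(labels) + [FLIPPED_LABEL[label] for label in labels]
    return out_images, out_labels
```

The output contains twice as many samples as the input, with the mirrored half appended after the originals.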


## Reinforcement Learning
**Deep Q Learning :**

Deep Q-learning is a very interesting and fairly recent area of study, and new adaptations and techniques are likely to appear in the coming years. To further improve the results presented in this report, the following ideas could be explored:

- Running the code on a GPU with an increased number of epochs.
- Testing deep Q-learning variants such as Dueling Deep Q-Learning or Double Deep Q-Learning.
- Testing different optimizers, such as RMSProp.
- Testing different network architectures. Adding an extra convolutional layer or an extra dense layer might help the network reach higher scores more consistently.
- Testing different hyperparameters. A larger experience replay memory could help the model recall more experiences, and different exponential decay parameters for the learning rate could speed up training.
- Trying different weight regularization parameters.
- Exploring cloud computing, which allows models to be trained in the cloud on more powerful hardware.
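As an example of the variants mentioned above, Double Deep Q-Learning changes only how the training target is computed: the online network chooses the next action and the target network scores it. The sketch below shows that target for a single transition in plain Python; the function name and the default `gamma` are assumptions, and it is not part of the project's code.

```python
def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target for one transition.

    next_q_online : Q-values of the next state from the online network
    next_q_target : Q-values of the next state from the target network
    """
    if done:
        return reward  # no future reward after a terminal state
    # The online network selects the action ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... and the target network evaluates it, which reduces the
    # overestimation caused by taking a max over noisy estimates.
    return reward + gamma * next_q_target[best_action]
```

Standard deep Q-learning would instead use `max(next_q_target)`, so switching between the two is a small change in the training loop.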

[Back to top](#Introduction)

# <p style="text-align: center;"> Contribution</p><a id='Contribution'></a>

    
- Code by self : 65%
- Code from external Sources : 35%

# <p style="text-align: center;">  Citations</p><a id='Citations'></a>

Citations are in the individual notebooks with the different parts.

# <p style="text-align: center;">  License</p><a id='License'></a>
Copyright (c) 2020 Rushabh Nisher, Manali Sharma

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.