# Deep Reinforcement Learning Project Report (Navigation)

## Introduction

#### Project Overview
<p> 
    The aim of this project is to build and train an agent that navigates a large, square world and collects as many yellow bananas as possible in a limited time while avoiding blue bananas. The implementation is based on value-based methods. As required by the project, the trained agent achieved an average score of +13 over 100 consecutive episodes.
</p>

#### About the Environment
The environment for this project is built with [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents).
The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions. 

Four discrete actions are available, corresponding to:

- **`0`** - Move Forward.
- **`1`** - Move Backward.
- **`2`** - Turn Left.
- **`3`** - Turn Right.

A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. Thus, the goal of the agent is to collect as many yellow bananas as possible while avoiding blue bananas.

The task is episodic, and in order to solve the environment, the agent must get an average score of +13 over 100 consecutive episodes.
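
The solve criterion can be checked with a rolling window over episode scores. This is a minimal sketch. It assumes the requirement is met when the mean of the last 100 episode scores is greater than or equal to 13. `is_solved` is a hypothetical helper and is not part of the project code.

```python
from collections import deque
from statistics import mean

SOLVE_SCORE = 13.0  # target average score from the project requirement
WINDOW = 100        # number of consecutive episodes to average over


def is_solved(episode_scores, window=WINDOW, target=SOLVE_SCORE):
    """Return True once the mean of the last `window` episode scores reaches `target`."""
    if len(episode_scores) < window:
        return False  # not enough episodes yet to form a full window
    recent = list(episode_scores)[-window:]
    return mean(recent) >= target


# Keep a bounded window of the most recent scores while training.
scores_window = deque(maxlen=WINDOW)
for score in [12.0] * 50 + [14.0] * 50:
    scores_window.append(score)
print(is_solved(scores_window))  # True (the mean is exactly 13.0)
```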


## Methodology and Algorithm

### Architecture
This project uses a value-based method called Deep Q-Networks (DQN), which combines two ideas:

1. Sarsamax (Q-Learning)
2. A deep neural network that approximates the Q-table
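
For reference, the tabular Sarsamax update that DQN approximates can be sketched as below. The function name, the list-of-lists Q-table and the `alpha`/`gamma` values are illustrative assumptions. In DQN, the table lookup is replaced by the neural network's output.

```python
def sarsamax_update(Q, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """Apply one tabular Sarsamax (Q-learning) update to Q in place and return the TD error.

    Q is a list of lists: Q[state][action] -> estimated action value.
    """
    # Bootstrap from the greedy (max) action in the next state; no bootstrap at episode end.
    best_next = 0.0 if done else max(Q[next_state])
    td_target = reward + gamma * best_next
    td_error = td_target - Q[state][action]
    Q[state][action] += alpha * td_error
    return td_error


# Example: 2 states x 4 actions (matching the four discrete actions above).
Q = [[0.0] * 4 for _ in range(2)]
Q[1] = [0.0, 0.5, 1.0, 0.0]
sarsamax_update(Q, state=0, action=0, reward=1.0, next_state=1, done=False, alpha=0.5, gamma=0.9)
# Q[0][0] is now 0.5 * (1.0 + 0.9 * 1.0) = 0.95
```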

![image.png](attachment:image.png)

DeepMind stabilized DQN training with two techniques, Experience Replay and Fixed Q-Targets, published in their DQN Nature paper, [Human-level control through deep reinforcement learning](https://deepmind.com/research/publications/human-level-control-through-deep-reinforcement-learning).
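
The following PyTorch sketch shows how Fixed Q-Targets enter the loss. The TD target is computed from a separate target network with gradients disabled, and only the local network is differentiated. Several parts are assumptions:

- `dqn_loss` is a hypothetical helper.
- The `nn.Linear` layers are stand-ins for the real networks.
- Mean-squared error is common in PyTorch DQN implementations, whereas the Nature paper clips the error term.

```python
import torch
import torch.nn.functional as F


def dqn_loss(q_local, q_target, states, actions, rewards, next_states, dones, gamma):
    """DQN loss using a separate, fixed target network for the TD targets.

    states/next_states: float tensors of shape (batch, state_size)
    actions: int64 tensor of shape (batch, 1)
    rewards, dones: float tensors of shape (batch, 1); dones is 1.0 at episode end
    """
    with torch.no_grad():
        # Max predicted Q-value for the next states, taken from the *target* network.
        q_next = q_target(next_states).max(dim=1, keepdim=True)[0]
        # TD targets; no bootstrapping past a terminal state.
        targets = rewards + gamma * q_next * (1.0 - dones)
    # Q-values the *local* network assigns to the actions actually taken.
    q_expected = q_local(states).gather(1, actions)
    return F.mse_loss(q_expected, targets)


torch.manual_seed(0)
batch = 8
local = torch.nn.Linear(37, 4)   # stand-in for the local Q-network
target = torch.nn.Linear(37, 4)  # stand-in for the target Q-network
target.load_state_dict(local.state_dict())  # start both networks from the same weights

loss = dqn_loss(
    local, target,
    states=torch.randn(batch, 37),
    actions=torch.randint(0, 4, (batch, 1)),
    rewards=torch.randn(batch, 1),
    next_states=torch.randn(batch, 37),
    dones=torch.zeros(batch, 1),
    gamma=0.9965,
)
loss.backward()  # gradients flow into `local` only
```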

### Code Structure
The implementation is split into three main files for modularity and easier debugging.

*``` model.py ```*: This file implements the Q_Network class in PyTorch. The class is a fully connected (fc) deep neural network (DNN) that estimates the value of each action the agent can take in a given state. The network has:

- an input layer, whose size is set by the state_size parameter
- two (2) hidden fully connected (fc) layers with 256 and 128 nodes, respectively
- an output layer, whose size is set by the action_size parameter
- ReLU activation functions on the hidden layers

   The ```.forward()``` method defines the forward pass that maps a state to action values.
   
   **Network Architecture**
   
   ```
   Input nodes (37)
   (fc) fully connected linear layer (256 nodes, ReLU activation)
   (fc) fully connected linear layer (128 nodes, ReLU activation)
   Output nodes (4)
   ```
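
Below is a minimal PyTorch sketch of a network with this architecture. It is not the project's Q_Network source. The class name `QNetwork`, the layer attribute names `fc1` to `fc3`, and the default sizes are assumptions chosen to match the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QNetwork(nn.Module):
    """Maps a state vector to one estimated action value per action."""

    def __init__(self, state_size=37, action_size=4, fc1_units=256, fc2_units=128):
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw action values: no activation on the output layer


net = QNetwork()
q_values = net(torch.zeros(1, 37))  # a batch containing one 37-dimensional state
print(q_values.shape)  # torch.Size([1, 4])
```

Leaving the output layer linear lets the network predict negative action values, which the -1 reward for blue bananas makes possible.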
   
<br>
<br>

*``` dqn_agent.py ```*: This file contains the DQN agent and the Replay Buffer implementation. Most of the Agent class's methods are described below:

- The local and target network instances are initialized, along with the ReplayBuffer.

    - ``` .step() ``` - Stores each step taken by the agent (state, action, reward, next_state, done) in the Replay Buffer. Every 4 steps (```UPDATE_EVERY```) it triggers a learning update. That update also moves the target network's weights toward the local network's current weights (Fixed Q-Targets).

    - ```.act()``` - Returns an action for the given state using an epsilon-greedy policy.

    - ```.learn()``` - Uses a batch of experiences sampled from the Replay Buffer to update the network's parameters.

    - ```.soft_update()``` - Called during learning to update the target network's weights from the local network's weights. Each call moves the target weights a small step, controlled by ```TAU```, toward the local weights.
<br>

- ```class ReplayBuffer()``` 

    - ```.add()``` - Adds an experience step to memory.
    
    - ```.sample()``` - Randomly samples a batch of experience steps for learning.
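
A replay buffer with the two methods described above can be sketched using only the standard library. This is an illustrative version, not the project's class. It stores plain tuples instead of tensors, and the `Experience` name and `seed` argument are assumptions.

```python
import random
from collections import deque, namedtuple

# One transition as stored by the agent.
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-size buffer of transitions with uniform random sampling."""

    def __init__(self, buffer_size, batch_size, seed=0):
        self.memory = deque(maxlen=buffer_size)  # oldest experiences are dropped when full
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        """Append one experience step to memory."""
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        """Draw batch_size distinct experiences uniformly at random."""
        # Random sampling breaks the correlation between consecutive steps.
        return self.rng.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(buffer_size=int(1e5), batch_size=64)
for t in range(100):
    buffer.add(state=[0.0] * 37, action=t % 4, reward=0.0, next_state=[0.0] * 37, done=False)
batch = buffer.sample()
print(len(buffer), len(batch))  # 100 64
```

Because `sample()` draws without replacement, learning should only begin once `len(buffer)` is at least the batch size.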


<br>
<br>

*``` Navigation.ipynb ```*: This notebook contains the code for training the agent.
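
During training, actions come from the agent's epsilon-greedy ```.act()``` method, and epsilon is usually reduced from one episode to the next. The sketch below shows epsilon-greedy selection together with a per-episode decay schedule. The helper names and the `eps_start`, `eps_end` and `eps_decay` values are hypothetical, because the report does not state the exploration schedule used.

```python
import random


def epsilon_greedy(action_values, eps, rng=random):
    """Pick a random action with probability eps, otherwise the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(action_values))
    # Greedy: index of the largest estimated action value (first one on ties).
    return max(range(len(action_values)), key=lambda a: action_values[a])


def epsilon_schedule(n_episodes, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Yield one epsilon per episode, decaying multiplicatively down to eps_end."""
    eps = eps_start
    for _ in range(n_episodes):
        yield eps
        eps = max(eps_end, eps * eps_decay)


for episode, eps in enumerate(epsilon_schedule(3, eps_decay=0.5)):
    action = epsilon_greedy([0.1, 0.7, 0.2, 0.0], eps)
    print(episode, eps, action)
```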

### Hyperparameters
These are the hyperparameters used in dqn_agent.py:

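```python
BUFFER_SIZE = int(1e5)  # replay buffer capacity
BATCH_SIZE = 64         # minibatch size sampled from the buffer
GAMMA = 0.9965          # discount factor
TAU = 1e-3              # interpolation factor for soft updates of the target network
LR = 5e-4               # learning rate
UPDATE_EVERY = 4        # how often (in steps) to trigger a learning update
```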
The Adam optimizer was used with a learning rate (LR) of 5e-4.
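
The sketch below shows how the optimizer and the soft update might be wired together, using the `LR` and `TAU` values above. The `nn.Linear` stand-ins and the placeholder loss are assumptions for illustration only. `soft_update` applies θ_target ← τ·θ_local + (1 − τ)·θ_target to every parameter.

```python
import torch
import torch.nn as nn
import torch.optim as optim

LR = 5e-4   # learning rate, from the hyperparameters above
TAU = 1e-3  # soft-update interpolation factor, from the hyperparameters above

torch.manual_seed(0)
# Hypothetical stand-ins for the local and target Q-networks (37 inputs, 4 actions).
qnetwork_local = nn.Linear(37, 4)
qnetwork_target = nn.Linear(37, 4)
qnetwork_target.load_state_dict(qnetwork_local.state_dict())

# Only the local network's parameters are trained by the optimizer.
optimizer = optim.Adam(qnetwork_local.parameters(), lr=LR)


def soft_update(local_model, target_model, tau):
    """Blend each target parameter toward its local counterpart: tau*local + (1 - tau)*target."""
    with torch.no_grad():
        for target_param, local_param in zip(target_model.parameters(), local_model.parameters()):
            target_param.mul_(1.0 - tau).add_(local_param, alpha=tau)


# One gradient step on the local network, then a soft update of the target.
loss = qnetwork_local(torch.randn(64, 37)).pow(2).mean()  # placeholder loss, illustration only
optimizer.zero_grad()
loss.backward()
optimizer.step()
target_before = qnetwork_target.weight.detach().clone()  # kept only to show the update's effect
soft_update(qnetwork_local, qnetwork_target, TAU)
```

With TAU = 1e-3, each call moves the target weights only 0.1% of the way toward the local weights. This keeps the TD targets nearly fixed between learning steps.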

## Result

## Ideas on Performance Improvement