# Description of Implementation
## Learning Algorithm
* ### Algorithm - DQN Implementation
This project implements a *value-based* method called [Deep Q-Networks](https://deepmind.com/research/dqn/) (DQN).

Deep Q-Learning combines two approaches:
- A reinforcement learning method called [Q-Learning](https://en.wikipedia.org/wiki/Q-learning) (also known as SARSA-max)
- A deep neural network that learns an approximation of the Q-table (action values)
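
As a minimal sketch of the Q-learning half of this combination, the tabular update below moves one action value toward the reward plus the discounted best next value. The function name, the dictionary-based table, `alpha`, and `n_actions` are illustrative assumptions and are not taken from this project; DQN replaces the table with a neural network.

```python
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, next_state, done,
                      n_actions, alpha=0.1, gamma=0.99):
    """Apply one Q-learning (SARSA-max) update in place and return the TD error."""
    # Bootstrap from the best next action unless the episode has ended.
    best_next = 0.0 if done else max(q_table[(next_state, a)] for a in range(n_actions))
    td_target = reward + gamma * best_next
    td_error = td_target - q_table[(state, action)]
    q_table[(state, action)] += alpha * td_error
    return td_error


q_table = defaultdict(float)  # unseen (state, action) pairs default to 0.0
td_error = q_learning_update(q_table, state=0, action=1, reward=1.0,
                             next_state=2, done=False, n_actions=4)
print(q_table[(0, 1)])  # 0.1
```

The `max` over next actions is what makes this "SARSA-max": the target uses the greedy next action regardless of which action the agent actually takes next.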

In particular, this implementation includes the two major training improvements introduced by [DeepMind](https://deepmind.com) and described in their [Nature publication, "Human-level control through deep reinforcement learning" (2015)](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf):

- Experience Replay
- Fixed Q Targets

> Reinforcement learning is known to be unstable or even to diverge when a nonlinear function approximator such as a neural network is used to represent the action-value (also known as Q) function. This instability has several causes: the correlations present in the sequence of observations, the fact that small updates to Q may significantly change the policy and therefore change the data distribution, and the correlations between the action-values and the target values.
> We address these instabilities with a novel variant of Q-learning, which uses two key ideas. First, we used a biologically inspired mechanism termed experience replay that randomizes over the data, thereby removing correlations in the observation sequence and smoothing over changes in the data distribution. Second, we used an iterative update that adjusts the action-values towards target values that are only periodically updated, thereby reducing correlations with the target.
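
The experience replay idea from the quote can be sketched with the standard library alone. This is an illustrative sketch, not this project's actual buffer: the class name, the `Experience` tuple, and the `seed` argument are assumptions, while the buffer and batch sizes reuse the `BUFFER_SIZE` and `BATCH_SIZE` values listed in this document.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-size buffer that stores experiences and samples them uniformly at random."""

    def __init__(self, buffer_size, batch_size, seed=0):
        self.memory = deque(maxlen=buffer_size)  # oldest experiences are dropped first
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        # Uniform sampling without replacement breaks the temporal correlation
        # between consecutive experiences.
        return self.rng.sample(list(self.memory), k=self.batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(buffer_size=int(1e5), batch_size=64)
for t in range(200):
    buffer.add(state=t, action=t % 4, reward=0.0, next_state=t + 1, done=False)
batch = buffer.sample()  # 64 experiences drawn from different time steps
```

An agent would typically start learning only once `len(buffer)` exceeds the batch size, so that every sampled minibatch is full.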

![Deep Q Network](images/DQN.png)
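
The fixed Q-targets idea relies on a second set of weights (the target network) that changes more slowly than the network being trained. The quoted paper copies the weights periodically; the `TAU` hyperparameter listed in this document suggests a soft update instead, which the sketch below illustrates on plain Python lists. The function name and the example weights are hypothetical, and in PyTorch the same blend would be applied to each parameter tensor.

```python
def soft_update(local_params, target_params, tau):
    """Blend local weights into target weights: tau * local + (1 - tau) * target."""
    return [tau * l + (1.0 - tau) * t for l, t in zip(local_params, target_params)]


TAU = 1e-3
local_params = [1.0, -2.0, 0.5]   # hypothetical weights of the network being trained
target_params = [0.0, 0.0, 0.0]   # hypothetical weights of the target network

target_params = soft_update(local_params, target_params, TAU)
print(target_params)  # each target weight moved 0.1% of the way toward the local weight
```

With a small `tau`, the target values drift slowly, which reduces the correlation between the action values and their targets.
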
* ### Hyperparameters
    * Replay buffer size
      ```python
      BUFFER_SIZE = int(1e5)
      ```
    * Minibatch size
      ```python
      BATCH_SIZE = 64
      ```
    * Discount factor
      ```python
      GAMMA = 0.99
      ```
    * Interpolation factor for the soft update of target parameters
      ```python
      TAU = 1e-3
      ```
    * Learning rate
      ```python
      LR = 5e-4
      ```
    * How often to update the network
      ```python
      UPDATE_EVERY = 4
      ```
    * Number of nodes in the first hidden layer
      ```
      fc1_units (int)
      ```
    * Number of nodes in the second hidden layer
      ```
      fc2_units (int)
      ```
    * Start epsilon value
      ```python
      eps_start = 1.0
      ```
    * End epsilon value
      ```python
      eps_end = 0.01
      ```
    * Epsilon decay rate
      ```python
      eps_decay = 0.995
      ```
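
To show how the three epsilon values interact, here is a minimal sketch of a decay schedule. It assumes epsilon is multiplied by `eps_decay` once per episode and clipped at `eps_end`; that update rule and the function name are assumptions, since the schedule itself is not spelled out in this document.

```python
EPS_START, EPS_END, EPS_DECAY = 1.0, 0.01, 0.995


def epsilon_schedule(n_episodes, eps_start=EPS_START, eps_end=EPS_END, eps_decay=EPS_DECAY):
    """Return the epsilon used in each episode under multiplicative decay with a floor."""
    eps = eps_start
    values = []
    for _ in range(n_episodes):
        values.append(eps)
        eps = max(eps_end, eps_decay * eps)
    return values


schedule = epsilon_schedule(1000)
# Index of the first episode that runs at the floor value.
first_floor = next(i for i, e in enumerate(schedule) if e == EPS_END)
print(schedule[0], schedule[-1], first_floor)
```

Under this assumption, exploration fades slowly: epsilon only reaches its floor after a little over 900 episodes, so the agent is still exploring noticeably during the first few hundred.
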
* ### Neural Network Architecture
    The neural network takes a state as input and outputs an action value (Q-value) for each action.
    ![Neural Network Structure](images/NN_architecture.png)
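
A fully connected network with two hidden layers, matching the `fc1_units` and `fc2_units` hyperparameters, could look like the PyTorch sketch below. The class name, the state size of 37, and the hidden layer widths of 64 are illustrative assumptions; only the two hidden layers and the four outputs are suggested by this document.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per action."""

    def __init__(self, state_size, action_size, fc1_units=64, fc2_units=64):
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw Q-values, no activation on the output layer


# state_size=37 and the layer widths are assumptions made for this example.
net = QNetwork(state_size=37, action_size=4)
q_values = net(torch.zeros(1, 37))             # shape (1, 4): one Q-value per action
greedy_action = q_values.argmax(dim=1).item()  # index of the highest Q-value
```

The greedy action is the index of the largest output, which is how the network's action values are turned into an actual action.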

## Plot of Rewards

With DQN trained using the hyperparameters and neural network architecture above, the agent reaches an average reward (over 100 episodes) of at least `+13` after `511` episodes!

![Episode Solution](images/Epsode_solution.png)
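
The solve criterion (an average of at least `+13` over 100 consecutive episodes) can be checked with a rolling window. The sketch below is one way to do it; the function name is hypothetical and the scores are synthetic, not this project's results.

```python
from collections import deque


def first_solved_episode(scores, target=13.0, window=100):
    """Return the 1-based episode at which the mean of the last `window`
    scores first reaches `target`, or None if it never does."""
    recent = deque(maxlen=window)
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None


# Synthetic scores for illustration only.
scores = [0.0] * 100 + [20.0] * 100
print(first_solved_episode(scores))  # 165
```

Because the average lags behind individual episode scores, the window keeps reporting "not solved" for a while after single episodes first exceed `+13`.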

## Ideas for Future Work
* ### Algorithm Updates
    * Implement Double DQN 
    * Implement Dueling DQN
* ### Dealing with the Visual Environment
    * Instead of using banana.app, use visualbanana.app
    * Switch the state from vector observations to visual observations
      ```python
      # Replace
      state = env_info.vector_observations[0]
      # with
      state = env_info.visual_observations[0]
      ```
    * Design a suitable neural network architecture
      ```
      1. Examine the size of the state.
         It becomes (1, 84, 84, 3), a 3D state with 3 channels.
      2. Build a convolutional neural network to handle the pixel inputs.
         a. conv1 = nn.Conv3d(3, 10, (1, 3, 3), stride=(1, 3, 3))
         b. conv2 = nn.Conv3d(10, 128, (1, 3, 3), stride=(1, 3, 3))
         c. pooling = nn.MaxPool3d((2, 2, 2), stride=(2, 2, 2)), to reduce the size
         d. fc1 = nn.Linear(?, 64)
         e. fc2 = nn.Linear(64, 4)
         f. ReLU as the activation function
      3. Train the agent, watch the scores, and check how many episodes it takes to solve the environment.
      ```
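
When working out the input size of the first fully connected layer, it helps to trace how each layer changes the size of a dimension. The helper below applies the standard output-size formula for convolution and pooling layers (no dilation, floor rounding) along one dimension. The function name is hypothetical, and the final value of `?` still depends on the architecture that is eventually chosen.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output length along one dimension for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


side = conv_output_size(84, kernel=3, stride=3)     # after conv1: 28
side = conv_output_size(side, kernel=3, stride=3)   # after conv2: 9
side = conv_output_size(side, kernel=2, stride=2)   # after pooling: 4
depth = conv_output_size(1, kernel=2, stride=2)     # pooling over a depth of 1: 0
print(side, depth)  # 4 0
```

A result of zero means the layer cannot be applied along that dimension. If the depth dimension really is 1, the pooling layer would likely need a depth kernel of 1, or the input would need several stacked frames. Note also that `nn.Conv3d` expects the channel dimension before the depth, height, and width dimensions, so the `(1, 84, 84, 3)` state would have to be rearranged first.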