# Navigation Project Report 
## The Algorithm 
This project was solved with a Double Deep Q-Learning (Double DQL) agent. Standard DQL uses two neural networks (NNs) that are trained through experience replay and updated at different rates. The `local_NN` is updated more often and is used to estimate the expected Q-values, while the `target_NN` is updated less often and supplies the target Q-values. Often the `target_NN` is simply overwritten with $target_{NN} = local_{NN}$ every n steps; this implementation instead uses a `soft_update`, where every n steps $target_{NN} = \tau \cdot local_{NN} + (1-\tau) \cdot target_{NN}$, with $\tau$ a hyperparameter.
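The soft update rule can be illustrated with a minimal NumPy sketch. This is not the project's own code: the function signature, the dictionary-of-arrays representation of the weights, and the parameter names `w` and `b` are assumptions made purely for illustration.

```python
import numpy as np

def soft_update(local_params, target_params, tau):
    """Blend local weights into the target weights.

    Applies target <- tau * local + (1 - tau) * target to every parameter.
    Both arguments are hypothetical dicts mapping a name to a NumPy array.
    """
    for name, local_w in local_params.items():
        target_params[name] = tau * local_w + (1.0 - tau) * target_params[name]
    return target_params

# Toy weights: the local network is all ones, the target network all zeros.
local = {"w": np.ones((2, 2)), "b": np.ones(2)}
target = {"w": np.zeros((2, 2)), "b": np.zeros(2)}

# With tau = 0.001 the target moves only 0.1% of the way towards the local weights.
soft_update(local, target, tau=0.001)
```

Setting `tau=1.0` in this sketch reduces to the hard copy $target_{NN} = local_{NN}$, so a small $\tau$ can be read as a slowed-down version of the same update.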

The only difference between standard DQL and Double DQL is how the target values are calculated. Standard DQL takes the maximum Q-value over the `target_NN` outputs for the next state, so the same network both selects and evaluates the action. The Double DQL agent instead selects the "best" next actions using only the `local_NN`, then looks up the Q-values of those state-action pairs in the `target_NN` and uses them as the target values.
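To make the distinction concrete, here is a small NumPy sketch of both target calculations. It is an illustration under assumptions, not the project's implementation: the function names, the `dones` mask for terminal transitions, and the toy Q-value arrays (standing in for the outputs of `local_NN` and `target_NN` on the next states) are all hypothetical.

```python
import numpy as np

def dqn_targets(rewards, dones, q_next_target, gamma):
    """Standard DQL: the target network both selects and evaluates the action."""
    max_q = q_next_target.max(axis=1)
    return rewards + gamma * max_q * (1.0 - dones)

def double_dqn_targets(rewards, dones, q_next_local, q_next_target, gamma):
    """Double DQL: the local network selects, the target network evaluates."""
    best_actions = q_next_local.argmax(axis=1)
    rows = np.arange(len(best_actions))
    evaluated = q_next_target[rows, best_actions]
    return rewards + gamma * evaluated * (1.0 - dones)

# Two toy transitions with two actions each; the second one is terminal.
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
q_next_local = np.array([[0.2, 0.9], [0.5, 0.1]])   # stand-in for local_NN outputs
q_next_target = np.array([[1.0, 0.4], [0.3, 0.8]])  # stand-in for target_NN outputs

standard = dqn_targets(rewards, dones, q_next_target, gamma=0.9)
double = double_dqn_targets(rewards, dones, q_next_local, q_next_target, gamma=0.9)
```

For the first transition the standard target is 1 + 0.9 × 1.0 = 1.9, while the Double DQL target is 1 + 0.9 × 0.4 = 1.36, because the local network prefers the second action and the target network rates that action lower. This is the mechanism by which Double DQL reduces overestimated targets.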

## Results
Below is a list of all the parameters used (found through tests in the `testing.ipynb` notebook).

```python
nn_structure = [64, 64]   # NN with two 64-node fully connected layers
gamma = 0.9               # Best gamma from testing
tau = 0.001               # Best tau from testing
batch_n = 64              # Number of events to replay in each learning batch
learning_rate = 5e-4      # Default learning rate
update_every = 4          # How often to replay events
```

Initialising an agent with these parameters and training it produces the agent saved in `model_weights`. The rewards plot from this training run is shown below.

<img src="ouput_rewards.png">

The trained agent achieves a score above 12 almost every time it is run, as can be seen in the clip below, where the fully trained agent is allowed to run.

<img src="banana_gif.gif">

To see the trained agent running live, you may run the final cells in `rtain_agent.ipynb`.

## Future Work 
There is still plenty that can be done to improve the model as it stands, for example through more in-depth parameter tuning, as I mainly considered the network structure, $\gamma$ and $\tau$.

Another way to possibly improve the performance of this agent would be to include prioritised experience replay, which gives each experience in the replay buffer a weight inversely proportional to how well the agent did on that particular experience when it was last seen (typically measured by the size of its TD error, so a larger error means a larger weight). These weights are then turned into probabilities, which are used to sample the next batch. This makes it more likely that experiences the agent struggles with will come up again.
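A rough sketch of how such sampling probabilities could be computed is given below. Nothing here is part of the current agent: the function names, the use of the absolute TD error as the measure of "how well the agent did", and the `alpha` and `eps` values are assumptions chosen for illustration.

```python
import numpy as np

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-5):
    """Turn per-experience TD errors into sampling probabilities.

    eps keeps every experience sampleable; alpha controls how strongly
    large errors are favoured (alpha = 0 gives uniform sampling).
    """
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

def sample_batch(td_errors, batch_n, rng, alpha=0.6):
    """Draw buffer indices, favouring experiences with large TD errors."""
    probs = sampling_probabilities(td_errors, alpha)
    return rng.choice(len(td_errors), size=batch_n, p=probs)

# Toy buffer of four experiences; the second had the largest error last time.
td_errors = np.array([0.1, 2.0, 0.5, 0.05])
probs = sampling_probabilities(td_errors)

rng = np.random.default_rng(0)
batch = sample_batch(td_errors, batch_n=3, rng=rng)
```

In this sketch the second experience receives the highest probability. A full implementation would usually also apply importance-sampling weights in the loss to correct for the non-uniform sampling, which is omitted here.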