# Banana Agent Project
---

This notebook trains an agent with a Deep Q-Network (DQN) algorithm to solve the first project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).

### 1. Setting up the Environment
----

First, we set up the Unity banana collection environment. Make sure you are running a Python 3.6 kernel for this notebook, because the `unityagents` package requires it.


In [None]:
import sys
if not (sys.version_info[0] == 3 and sys.version_info[1] == 6):
    raise Exception("This notebook must be run with Python 3.6 to allow the unityagents package to work correctly.")

from unityagents import UnityEnvironment
from dqn_banana_agent import DQNAgent, train_agent, load_agent, test_agent, criterion_check

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

To load the environment, download the Unity environment for your operating system from one of the following links:

- Linux: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Linux.zip)
- Linux (Headless): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Linux_NoVis.zip)
- Mac OSX: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana.app.zip)
- Windows (32-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86.zip)
- Windows (64-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86_64.zip)

Once it has downloaded, change the `path` variable in the next cell to match the location of the environment.

<strong>Note: only run the following cell once. If the Unity environment has been closed or has crashed, restart the kernel.</strong>


In [None]:
path = "Banana.app"  # Mac OSX build; change this to the location of the environment you downloaded
env = UnityEnvironment(file_name=path)

### 2. Training Model
-----

In this section you can train a model from scratch with different hyperparameters and network architectures.
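As a rough illustration of what `hidden_layers` controls, the sketch below assumes that `DQNAgent` builds a fully connected Q-network (with bias terms) that maps the state vector through each hidden layer to one Q-value per action. The helpers `layer_shapes` and `parameter_count` are hypothetical and not part of `dqn_banana_agent`. The Banana environment has a 37-dimensional state and 4 discrete actions.

```python
def layer_shapes(state_size, hidden_layers, action_size):
    """Return (inputs, outputs) for each fully connected layer, from input to output."""
    sizes = [state_size] + list(hidden_layers) + [action_size]
    return list(zip(sizes[:-1], sizes[1:]))


def parameter_count(state_size, hidden_layers, action_size):
    """Count weights plus biases across all fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in layer_shapes(state_size, hidden_layers, action_size))


# Banana: 37-dimensional state, 4 actions, hidden_layers = [64, 64, 32]
print(layer_shapes(37, [64, 64, 32], 4))     # [(37, 64), (64, 64), (64, 32), (32, 4)]
print(parameter_count(37, [64, 64, 32], 4))  # 8804
```

Adding or widening hidden layers grows this count quickly, which is worth keeping in mind when comparing architectures.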


In [None]:
# Collect the brain name, initial environment info and brain from the Unity environment
brain_name = env.brain_names[0]
env_info = env.reset(train_mode=True)[brain_name]
brain = env.brains[brain_name]

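Before training, it can help to check the environment loop on its own. The sketch below plays one episode with random actions using the same `unityagents` pattern as this notebook: `env.reset(train_mode=True)[brain_name]` and `env.step(action)[brain_name]`, reading the `vector_observations`, `rewards` and `local_done` fields of the returned info. `run_random_episode` is a hypothetical helper, not part of `dqn_banana_agent`.

```python
import random


def run_random_episode(env, brain_name, action_size, seed=0):
    """Play one episode with uniformly random actions and return its total reward."""
    rng = random.Random(seed)
    env_info = env.reset(train_mode=True)[brain_name]
    state = env_info.vector_observations[0]        # observation for the single agent
    score = 0.0
    while True:
        action = rng.randrange(action_size)         # random action index
        env_info = env.step(action)[brain_name]     # advance the simulation one step
        next_state = env_info.vector_observations[0]
        reward = env_info.rewards[0]
        done = env_info.local_done[0]
        # A learning agent would store (state, action, reward, next_state, done) here
        score += reward
        state = next_state
        if done:
            return score
```

In this notebook it could be called as `run_random_episode(env, brain_name, brain.vector_action_space_size)` to get a baseline score to compare the trained agent against.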

The next cells let you define the hyperparameters and create the agent.

Note: the value in the comment next to each parameter is the one I used to solve the banana collection task.

In [None]:
# Agent hyperparameters
buffer_size=20000 #20000
seed=10 #10

state_size=env_info.vector_observations.shape[1]
action_size=brain.vector_action_space_size
hidden_layers=[64,64,32] #[64,64,32]

epsilon=1. #1.
epsilon_decay=0.99995 #0.99995
epsilon_min=0.02 #0.02

gamma=0.95 #0.95
tau=0.001 #0.001
learning_rate=0.001 #0.001
update_frequency=5 #5

double_Q=True #True

prioritised_replay_buffer=True #True
alpha=0.6 #0.6
beta=0.7 #0.7
beta_increment_size=0.00001 #0.00001
base_priority=0.1 #0.1
max_priority=1 #1

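The three epsilon parameters describe an epsilon-greedy exploration schedule. The plain-Python sketch below assumes the common scheme in which epsilon is multiplied by `epsilon_decay` after every agent step and floored at `epsilon_min`; whether `DQNAgent` decays per step or per episode is an assumption, and the helper names are hypothetical.

```python
import math
import random


def decayed_epsilon(step, epsilon=1.0, epsilon_decay=0.99995, epsilon_min=0.02):
    """Epsilon after `step` multiplicative decays, never below epsilon_min."""
    return max(epsilon_min, epsilon * epsilon_decay ** step)


def steps_to_min(epsilon=1.0, epsilon_decay=0.99995, epsilon_min=0.02):
    """Smallest number of decays after which epsilon has reached epsilon_min."""
    return math.ceil(math.log(epsilon_min / epsilon) / math.log(epsilon_decay))


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise take the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With the values above, `steps_to_min()` gives roughly 78,000 decay steps before exploration settles at 2%. Setting epsilon to 0 gives a purely greedy policy, which is what an evaluation mode would typically use.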

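`gamma` and `double_Q` enter the temporal-difference target that the Q-network is trained towards. Vanilla DQN uses `y = r + gamma * max_a Q_target(s', a)`, while Double DQN lets the local (online) network choose the next action and the target network evaluate it, `y = r + gamma * Q_target(s', argmax_a Q_local(s', a))`, which reduces the overestimation of Q-values. The sketch below computes both for a small batch held in plain lists; it illustrates the standard technique and is not taken from `dqn_banana_agent`.

```python
def td_targets(rewards, dones, next_q_target, gamma=0.95, next_q_local=None):
    """TD targets for a batch of transitions.

    rewards, dones -- one float / bool per transition
    next_q_target  -- per transition, the target network's Q-values for s'
    next_q_local   -- per transition, the local network's Q-values for s';
                      when given, Double DQN targets are returned
    """
    targets = []
    for i, (reward, done) in enumerate(zip(rewards, dones)):
        if next_q_local is None:
            # Vanilla DQN: the target network both selects and evaluates a'
            next_value = max(next_q_target[i])
        else:
            # Double DQN: the local network selects a', the target network evaluates it
            local_q = next_q_local[i]
            best_action = max(range(len(local_q)), key=local_q.__getitem__)
            next_value = next_q_target[i][best_action]
        # Terminal transitions do not bootstrap from the next state
        targets.append(reward + gamma * next_value * (0.0 if done else 1.0))
    return targets


rewards, dones = [1.0, 0.0], [False, True]
next_q_target = [[0.5, 2.0], [3.0, 1.0]]
next_q_local = [[1.0, 0.0], [0.0, 1.0]]
print(td_targets(rewards, dones, next_q_target))                            # DQN
print(td_targets(rewards, dones, next_q_target, next_q_local=next_q_local))  # Double DQN
```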

In [None]:
# Create the agent with the hyperparameters defined above.
agent = DQNAgent(
    buffer_size,
    seed,
    state_size,
    action_size,
    hidden_layers,
    epsilon,
    epsilon_decay,
    epsilon_min,
    gamma,
    tau,
    learning_rate,
    update_frequency,
    double_Q,
    prioritised_replay_buffer,
    alpha,
    beta,
    beta_increment_size,
    base_priority,
    max_priority
)

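`tau` controls the soft update that slowly moves the target network towards the local network: `theta_target <- tau * theta_local + (1 - tau) * theta_target`. The sketch below applies that rule to a flat list of parameters; a PyTorch agent (the `.pth` files suggest one) would typically apply it tensor by tensor. How often `DQNAgent` performs the update (presumably every `update_frequency` steps) is an assumption, and `soft_update` here is a hypothetical helper.

```python
def soft_update(local_params, target_params, tau=0.001):
    """Move each target parameter a fraction tau towards the matching local parameter."""
    return [tau * local + (1.0 - tau) * target
            for local, target in zip(local_params, target_params)]


# The target network starts at 0 and tracks a local network held fixed at 1.
target = [0.0]
for _ in range(693):
    target = soft_update([1.0], target, tau=0.001)
print(target)
```

Because the gap between the two networks shrinks by a factor of `(1 - tau)` per update, `tau = 0.001` needs about 693 updates to halve it, which keeps the learning targets stable.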
Finally, this cell trains the agent, either a new one created above or one loaded in section 3.

In [None]:
# Commence the training process
no_training_episodes = 100
experience_batch_size = 128
save_name = "new_dqn_model.pth"  # choose a name for the model (e.g. DQNagent.pth)
save_path = ""  # choose a path to save the best and final models; leave blank for the current directory
print_every = 25

scores = train_agent(env, agent, no_training_episodes, experience_batch_size, save_name, save_path, print_every)
print(scores)

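When `prioritised_replay_buffer` is enabled, training batches are drawn in proportion to each transition's priority rather than uniformly. The sketch below follows the proportional variant of prioritised experience replay: `P(i) = p_i^alpha / sum_k p_k^alpha`, with importance-sampling weights `w_i = (N * P(i))^(-beta)` normalised by the largest weight. Treating `base_priority` as the small constant added to each absolute TD error is an assumption about `dqn_banana_agent`, and `per_sample` is a hypothetical helper; a real buffer would use a sum tree instead of recomputing every priority.

```python
import random


def per_sample(td_errors, batch_size, alpha=0.6, beta=0.7, base_priority=0.1, rng=random):
    """Sample indices proportionally to priority and return importance-sampling weights.

    Assumes priority p_i = |td_error_i| + base_priority, so transitions with a
    zero TD error can still be replayed.
    """
    n = len(td_errors)
    scaled = [(abs(err) + base_priority) ** alpha for err in td_errors]
    total = sum(scaled)
    probs = [p / total for p in scaled]              # P(i) = p_i^alpha / sum_k p_k^alpha
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    # w_i = (N * P(i))^-beta; the least likely transition has the largest weight,
    # so dividing by it keeps every weight in (0, 1]
    max_weight = (n * min(probs)) ** -beta
    weights = [(n * probs[i]) ** -beta / max_weight for i in indices]
    return indices, weights


indices, weights = per_sample([0.0, 1.0, 5.0], batch_size=8, rng=random.Random(0))
print(indices, weights)
```

The weights scale each transition's squared TD error in the loss, correcting the bias introduced by non-uniform sampling. If beta is annealed by adding `beta_increment_size` at each sample and capped at 1 (again an assumption), the notebook's values take 30,000 increments to go from 0.7 to 1.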
### 3. Loading Saved Model
----

Use this section to reload a previously saved model.



In [None]:
agent_path = "Trained_agents/double_per_tuned.pth" # Trained_agents/double_per_tuned.pth is an agent trained to solve the environment
                                                   # using double Q-learning and prioritised experience replay (PER).
show_parameters = True
agent = load_agent(agent_path, show_parameters)

### 4. Model Testing
----

The next cell runs the agent in evaluation mode and saves the episode scores. To watch the agent's actions, set `quick_view` to `False`.

In [None]:
test_episode_number = 100
print_scores = True
quick_view = True

test_scores = test_agent(env, agent, test_episode_number, print_scores, quick_view)

### 5. Model Evaluation
----
In this section we create graphs to evaluate how the model performed during training, and check whether the environment has been solved.
The following cell converts the agent's training scores into a pandas `Series` for ease of analysis and increases the figure resolution. Uncomment `env.close()` to close the Unity environment once you have finished with it.

In [None]:
# env.close()
train_scores = pd.Series(agent.training_scores)
plt.rcParams['figure.dpi'] = 150

This cell plots the reward of each training episode together with its 50-episode moving average.

In [None]:
plt.title("Training Rewards per Episode")
plt.plot(train_scores,label = "Individual episode reward",color="midnightblue")
plt.plot(train_scores.rolling(50).mean(), label = "50 episode moving average", color="r")
plt.grid(True)
plt.xlabel("Episode Number")
plt.ylabel("Total Episode Reward")
plt.legend()
plt.show()

The final cell finds the episode on which the solving criterion (an average reward of at least 13 over 100 consecutive episodes) was met, and plots the training reward averaged over 100 consecutive episodes.

In [None]:
passed = criterion_check(agent)
plt.title("Average Score over 100 Consecutive Training Episodes")
plt.plot(train_scores.rolling(100).mean().dropna(), label="100 episode moving average", color="midnightblue")  # first full window ends at index 99
plt.hlines(13, 99, len(train_scores) - 1, label="Solved threshold (13)", linestyle="--", color="r")
plt.grid(True)
plt.xlabel("Episode Number")
plt.ylabel("Total Episode Reward")
plt.legend()
plt.show()

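The solving criterion can also be sketched in plain Python. `first_solved_episode` is a hypothetical helper; `criterion_check` may report its result differently (some implementations give the number of episodes before the 100-episode window rather than the last episode in it). The first full 100-episode average ends at episode 100, which is index 99 of the rolling mean.

```python
def first_solved_episode(scores, threshold=13.0, window=100):
    """1-based episode number at which the mean of the last `window` scores
    first reaches `threshold`; None if that never happens."""
    for end in range(window, len(scores) + 1):
        if sum(scores[end - window:end]) / window >= threshold:
            return end
    return None


scores = [0] * 100 + [13] * 100      # illustrative scores, not real training results
print(first_solved_episode(scores))  # 200
```

In this notebook the same function could be applied to `list(agent.training_scores)` to cross-check the output of `criterion_check`.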