# Navigation

---

This notebook uses a simplified version of a Unity ML-Agents environment, provided by Udacity, to train an agent to navigate (and collect bananas!) in a large, square world.


![alt-text](banana.gif)


The simulation contains a single agent. A reward of `+1` is provided for collecting a yellow banana, and a reward of `-1` is provided for collecting a blue banana. Thus, the goal of the agent is to collect as many yellow bananas as possible while avoiding blue bananas.

The state space has `37` dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. 

Given this information, the agent has to learn how to best select actions. Four discrete actions are available, corresponding to:

- `0` - walk forward 
- `1` - walk backward
- `2` - turn left
- `3` - turn right


The task is episodic. To solve the environment, the agent must achieve an average score of `+13` over 100 consecutive episodes.
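The "average over 100 consecutive episodes" criterion can be checked with a sliding window over the episode scores. The following is a minimal sketch, not the code used by the `train` module; the function name `first_solved_episode` and its parameters are hypothetical and chosen for illustration only.

```python
from collections import deque

import numpy as np


def first_solved_episode(scores, target=13.0, window=100):
    """Return the 1-based episode at which the mean of the last `window`
    scores first reaches `target`, or None if that never happens.

    Hypothetical helper for illustration; not part of the train module.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        # the criterion only applies once a full window of episodes exists
        if len(recent) == window and np.mean(recent) >= target:
            return episode
    return None
```

For example, `first_solved_episode(scores)` could be called on the list of per-episode scores returned by a training run to find the point where the threshold was first met.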



In [1]:
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
from unityagents import UnityEnvironment
from train import dqn
from agent import Agent

%matplotlib inline

## Get familiar with the Environment
The following cells introduce you to using the environment.

### 1. Start the Environment
Executing the next code cell starts the environment. Change the `file_name` parameter to match the location the Unity environment was unzipped to.

- **Mac**: `"path/to/Banana.app"`
- **Linux** (x86): `"path/to/Banana_Linux/Banana.x86"`
- **Linux** (x86_64): `"path/to/Banana_Linux/Banana.x86_64"`
- **Linux** (x86, headless): `"path/to/Banana_Linux_NoVis/Banana.x86"`
- **Linux** (x86_64, headless): `"path/to/Banana_Linux_NoVis/Banana.x86_64"`

*The cell below loads the Mac build (`Banana.app`); replace the path with the one that matches your platform.*
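If the notebook is shared across machines, the path can be chosen programmatically. The sketch below maps the platform to one of the paths listed above; the helper `default_banana_path` is hypothetical, the `path/to/...` prefixes are the same placeholders as in the list, and it assumes an x86-family CPU on Linux.

```python
import platform


def default_banana_path(headless=False, system=None, machine=None):
    """Pick one of the build paths listed above for the current platform.

    Hypothetical helper; `system` and `machine` can be passed explicitly,
    otherwise they are detected with the standard `platform` module.
    """
    system = system or platform.system()      # e.g. 'Darwin' or 'Linux'
    machine = machine or platform.machine()   # e.g. 'x86_64' or 'i686'
    if system == "Darwin":
        return "path/to/Banana.app"
    if system == "Linux":
        folder = "Banana_Linux_NoVis" if headless else "Banana_Linux"
        # assumption: any non-64-bit machine string falls back to the x86 build
        binary = "Banana.x86_64" if machine in ("x86_64", "AMD64") else "Banana.x86"
        return "path/to/{}/{}".format(folder, binary)
    raise ValueError("No build path listed for platform: {}".format(system))
```

The result could then be passed as `file_name` when constructing `UnityEnvironment`, after replacing the placeholder prefix with the real unzip location.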

In [2]:
env = UnityEnvironment(file_name="Banana.app")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


### 2. Examine the State and Action Spaces

The simulation contains a single agent that navigates a large environment. At each time step, it has four actions at its disposal:
- `0` - walk forward 
- `1` - walk backward
- `2` - turn left
- `3` - turn right

The state space has `37` dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. A reward of `+1` is provided for collecting a yellow banana, and a reward of `-1` is provided for collecting a blue banana.

Environments contain brains which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

Run the code cell below to print some information about the environment.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)

Number of agents: 1
Number of actions: 4
States look like: [1.         0.         0.         0.         0.84408134 0.
 0.         1.         0.         0.0748472  0.         1.
 0.         0.         0.25755    1.         0.         0.
 0.         0.74177343 0.         1.         0.         0.
 0.25854847 0.         0.         1.         0.         0.09355672
 0.         1.         0.         0.         0.31969345 0.
 0.        ]
States have length: 37


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

When the cell is executed, the agent selects an action uniformly at random at each time step. If you are running with graphics enabled, a window should pop up that lets you observe the agent as it moves through the environment.

After the episode ends, the achieved score is printed; for a random agent it is usually around zero.


In [4]:
env_info = env.reset(train_mode=False)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
while True:
    action = np.random.randint(action_size)        # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break
    
print("Score: {}".format(score))

Score: 0.0


## Training

In the next cell an agent is trained using a deep learning algorithm described in ...
A detailed description of the implementation is given here (report)

The following steps are executed:

1. Reset the environment for training
2. Create the agent
3. Perform training using the `dqn` algorithm from the `train` module
4. Assign the scores and running mean to the `scores` and `running_mean` variables
5. Plot the scores and running mean


In [5]:
from train import dqn
from agent import Agent

# reset env for training
env_info = env.reset(train_mode=True)[brain_name]

agent = Agent(state_size=37, action_size=4)
scores, running_mean = dqn(env=env, agent=agent, checkpoint_path='checkpoint_initial.pth')


Episode 100	Average Score: 1.03
Episode 200	Average Score: 4.77
Episode 300	Average Score: 8.02
Episode 400	Average Score: 9.51
Episode 500	Average Score: 11.05
Episode 562	Average Score: 13.02
Environment solved in 462 episodes!	Average Score: 13.02


In [None]:
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(np.arange(len(scores)), scores, label='scores')
plt.plot(np.arange(len(running_mean)), running_mean, c='r', label='average')
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.legend(loc='upper left');
plt.show()

### Tuning the Epsilon Decay
To find a good exploration schedule, the agent is trained with varying values of the epsilon decay factor (`eps_decay`), while `eps_start` and `eps_end` stay fixed. After training, the results are plotted.
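To get a feel for the decay values tried below, it helps to estimate how many episodes each one needs before epsilon reaches its floor. The sketch assumes that `dqn` multiplies epsilon by `eps_decay` once per episode and clips it at `eps_end`; this is an assumption, since the `train` module is not shown here, and `episodes_until_floor` is a hypothetical helper.

```python
import math


def episodes_until_floor(eps_start, eps_end, eps_decay):
    """Smallest n with eps_start * eps_decay**n <= eps_end.

    Assumes a multiplicative per-episode decay (not confirmed by the source).
    """
    return math.ceil(math.log(eps_end / eps_start) / math.log(eps_decay))


# the decay values used in the experiments below
for decay in (0.99, 0.95, 0.89, 0.55, 0.50):
    n = episodes_until_floor(1.0, 0.01, decay)
    print("eps_decay={:.2f}: floor reached after about {} episodes".format(decay, n))
```

Under this assumption, a decay of 0.99 keeps the agent exploring for several hundred episodes, while values around 0.5 make it almost fully greedy within the first ten.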
 

In [None]:
import os

state_size = 37
action_size = 4

def do_training(checkpoint_path, result_path, eps_params, path='results'):
    """Train a fresh agent with eps_params = (eps_start, eps_end, eps_decay) and save its scores."""
    os.makedirs(path, exist_ok=True)                   # make sure the output folder exists
    env_info = env.reset(train_mode=True)[brain_name]  # reset env for training
    agent = Agent(state_size=state_size, action_size=action_size)
    eps_start, eps_end, eps_decay = eps_params
    scores, running_mean = dqn(env=env, agent=agent,
                               checkpoint_path=os.path.join(path, checkpoint_path),
                               eps_start=eps_start, eps_end=eps_end, eps_decay=eps_decay)
    # store the per-episode scores and their running mean for later plotting
    df = pd.DataFrame({'scores': scores, 'running_mean': running_mean})
    df.to_csv(os.path.join(path, result_path), index=False)

do_training('checkpoint_1.pth', 'results_1.csv', (1., 0.01, 0.99))
do_training('checkpoint_2.pth', 'results_2.csv', (1., 0.01, 0.95))
do_training('checkpoint_3.pth', 'results_3.csv', (1., 0.01, 0.89))
do_training('checkpoint_4.pth', 'results_4.csv', (1., 0.01, 0.55))
do_training('checkpoint_5.pth', 'results_5.csv', (1., 0.01, 0.50))


In [None]:
def plot_scores(result_file, title, path='results'):
    """Plot the scores and running mean stored in path/result_file."""
    data = pd.read_csv(path+'/'+result_file)
    fig = plt.figure()
    ax = fig.add_subplot(111)
    plt.title(title)
    plt.plot(np.arange(data.shape[0]), data['scores'], label='scores')
    plt.plot(np.arange(data.shape[0]), data['running_mean'], c='r', label='average')
    plt.ylabel('Score')
    plt.xlabel('Episode #')
    plt.legend(loc='upper left')
    plt.show()

plot_scores('results_1.csv', 'decay: 0.99')
plot_scores('results_2.csv', 'decay: 0.95')
plot_scores('results_3.csv', 'decay: 0.89')
plot_scores('results_4.csv', 'decay: 0.55')
plot_scores('results_5.csv', 'decay: 0.50')

In [None]:
## Test the best-performing agent
import torch

# initialize the agent
agent = Agent(state_size=state_size, action_size=action_size, seed=0)

# load the weights from file
agent.qnetwork_local.load_state_dict(torch.load('results/checkpoint_4.pth'))

scores = []
for i_episode in range(1, 21):
    env_info = env.reset(train_mode=False)[brain_name] # reset the environment
    state = env_info.vector_observations[0]            # get the current state
    score = 0                                          # initialize the score
    while True:
        action = agent.act(state, eps=0)               # select an action
        env_info = env.step(action)[brain_name]        # send the action to the environment
        next_state = env_info.vector_observations[0]   # get the next state
        reward = env_info.rewards[0]                   # get the reward
        done = env_info.local_done[0]                  # see if episode has finished
        score += reward                                # update the score
        state = next_state                             # roll over the state to next time step
        if done:                                       # exit loop if episode finished
            scores.append(score)
            print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores)))
            break


When finished, close the environment.

In [None]:
env.close()

In [None]:
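# save the current figure; with the inline backend, run this in the same cell that draws the plot
plt.savefig('foo.png', bbox_inches='tight')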