# Navigation

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the first project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).

### 1. Start the Environment

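We begin by importing the necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).  The cell also imports Matplotlib and PyTorch, and it expects the `Agent` class to be importable from a `navigation_agent` module, so `navigation_agent.py` must be on the Python path (for example, in the same folder as the notebook).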

In [1]:
%reload_ext autoreload
%autoreload 2

from collections import deque

from unityagents import UnityEnvironment
import numpy as np
import matplotlib.pyplot as plt
import torch

from navigation_agent import Agent

%matplotlib inline

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Banana.app"`
- **Windows** (x86): `"path/to/Banana_Windows_x86/Banana.exe"`
- **Windows** (x86_64): `"path/to/Banana_Windows_x86_64/Banana.exe"`
- **Linux** (x86): `"path/to/Banana_Linux/Banana.x86"`
- **Linux** (x86_64): `"path/to/Banana_Linux/Banana.x86_64"`
- **Linux** (x86, headless): `"path/to/Banana_Linux_NoVis/Banana.x86"`
- **Linux** (x86_64, headless): `"path/to/Banana_Linux_NoVis/Banana.x86_64"`

For instance, if you are using a Mac, then you downloaded `Banana.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Banana.app")
```
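The choice among the paths listed above can also be made programmatically. The sketch below is one possible way to do it with the standard `platform` module. The helper name `banana_file_name` and its `base_dir` parameter are hypothetical (they are not part of `unityagents`), and the build folders are assumed to be unpacked under `base_dir` with the names from the list.

```python
import os
import platform


def banana_file_name(system, machine, base_dir=".", headless=False):
    """Return the Banana build path for an OS name and CPU architecture.

    `system` and `machine` are strings in the style of platform.system()
    and platform.machine(). Hypothetical helper, not part of unityagents.
    """
    is_64bit = machine in ("x86_64", "AMD64")
    if system == "Darwin":
        relative = "Banana.app"
    elif system == "Windows":
        folder = "Banana_Windows_x86_64" if is_64bit else "Banana_Windows_x86"
        relative = os.path.join(folder, "Banana.exe")
    elif system == "Linux":
        folder = "Banana_Linux_NoVis" if headless else "Banana_Linux"
        binary = "Banana.x86_64" if is_64bit else "Banana.x86"
        relative = os.path.join(folder, binary)
    else:
        raise ValueError("No Banana build listed for system {!r}".format(system))
    return os.path.join(base_dir, relative)


# Resolve the path for the machine running the notebook.
file_name = banana_file_name(platform.system(), platform.machine())
print(file_name)
```

The resulting string could then be passed as `file_name` to `UnityEnvironment`. Passing the path by hand, as in the example above, remains the simplest option when only one platform is involved.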

In [9]:
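SCORE_CHECKPOINT_THRESHOLD = 13.0  # average score over the last 100 episodes at which training stops and a checkpoint is saved
IS_AGENT_TRAINED = False           # if True, the training cell below trains the agent and plots the scores
IS_AGENT_LOADED = True             # if True, the final cell loads checkpoint.pth; otherwise it runs a random agent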

In [3]:
env = UnityEnvironment(file_name="Banana_Linux/Banana.x86_64")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_**, which are responsible for deciding the actions of their associated agents. Here we take the first available brain and set it as the default brain that we will control from Python.

In [4]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

The simulation contains a single agent that navigates a large environment.  At each time step, it has four actions at its disposal:
- `0` - walk forward 
- `1` - walk backward
- `2` - turn left
- `3` - turn right

The state space has `37` dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.  A reward of `+1` is provided for collecting a yellow banana, and a reward of `-1` is provided for collecting a blue banana.

Run the code cell below to print some information about the environment.

In [5]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)

Number of agents: 1
Number of actions: 4
States look like: [1.         0.         0.         0.         0.84408134 0.
 0.         1.         0.         0.0748472  0.         1.
 0.         0.         0.25755    1.         0.         0.
 0.         0.74177343 0.         1.         0.         0.
 0.25854847 0.         0.         1.         0.         0.09355672
 0.         1.         0.         0.         0.31969345 0.
 0.        ]
States have length: 37
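The printed sample suggests a regular structure inside the 37 values: seven groups of five numbers, each starting with four 0/1 flags, followed by two trailing values. The sketch below splits the state along those lines. This grouping is an assumption inferred from the sample above, not something documented in this notebook, and the names `N_RAYS`, `VALUES_PER_RAY`, `rays`, and `remainder` are made up for illustration. The trailing pair is presumably the velocity mentioned earlier.

```python
import numpy as np

# Sample state copied from the printed output above.
state = np.array([
    1., 0., 0., 0., 0.84408134,
    0., 0., 1., 0., 0.0748472,
    0., 1., 0., 0., 0.25755,
    1., 0., 0., 0., 0.74177343,
    0., 1., 0., 0., 0.25854847,
    0., 0., 1., 0., 0.09355672,
    0., 1., 0., 0., 0.31969345,
    0., 0.,
])

# Assumed layout (inferred from the sample, not documented here):
# 7 groups of 5 values, followed by 2 remaining values.
N_RAYS, VALUES_PER_RAY = 7, 5
rays = state[:N_RAYS * VALUES_PER_RAY].reshape(N_RAYS, VALUES_PER_RAY)
remainder = state[N_RAYS * VALUES_PER_RAY:]

flags = rays[:, :4]      # four 0/1 flags per group
fraction = rays[:, 4]    # one fractional value per group

print(rays.shape, remainder.shape)
print(flags.sum(axis=1))  # exactly one flag is set in every group of this sample
```

The agent does not need this decomposition, since the network consumes the flat vector, but it can help when inspecting individual observations.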


In [None]:
def train_agent(env, agent, n_episodes=2000, max_t=1000, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Deep Q-Learning for Navigation (baseline)
    
    Params
    ======
        env : Unity environment
        agent : RL agent instance to be trained
        n_episodes (int): maximum number of training episodes
        max_t (int): maximum number of timesteps per episode
        eps_start (float): starting value of epsilon, for epsilon-greedy action selection
        eps_end (float): minimum value of epsilon
        eps_decay (float): multiplicative factor (per episode) for decreasing epsilon

    Returns
    =======
        scores (list): score obtained in each episode
    """
    brain_name = env.brain_names[0]    # the default brain is the same for every episode
    scores = []                        # list containing scores from each episode
    scores_window = deque(maxlen=100)  # last 100 scores
    eps = eps_start                    # initialize epsilon
    for i_episode in range(1, n_episodes+1):
        env_info = env.reset(train_mode=True)[brain_name]
        state = env_info.vector_observations[0]
        
        score = 0
        for t in range(max_t):
            action = agent.act(state, eps)
            next_state, reward, done = env_step(env, action)
            
            agent.step(state, action, reward, next_state, done)
            state = next_state
            score += reward
            if done:
                break
        scores_window.append(score)       # update the rolling window of recent scores
        scores.append(score)              # save most recent score
        eps = max(eps_end, eps_decay*eps) # decrease epsilon
        # report the rolling average after every episode
        print('Episode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)))
        if np.mean(scores_window) >= SCORE_CHECKPOINT_THRESHOLD:
            print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(i_episode-100, np.mean(scores_window)))
            torch.save(agent.qnetwork_local.state_dict(), 'checkpoint.pth')
            break
    return scores


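def env_step(env, action):
    """Return next_state, reward, done from environment given action"""
    brain_name = env.brain_names[0]               # look up the default brain rather than relying on a global
    env_info = env.step(action)[brain_name]       # send the action to the environment
    next_state = env_info.vector_observations[0]  # observation of the single agent
    reward = env_info.rewards[0]
    done = env_info.local_done[0]
    return next_state, reward, done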
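To get a feel for how long exploration lasts under the default arguments of `train_agent`, the sketch below replays only the epsilon update from the training loop, `eps = max(eps_end, eps_decay*eps)`, without touching the environment. The function name `epsilon_schedule` is hypothetical and exists only for this illustration.

```python
def epsilon_schedule(n_episodes, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Return the epsilon used in each episode, mirroring the update in train_agent."""
    schedule = []
    eps = eps_start
    for _ in range(n_episodes):
        schedule.append(eps)                   # epsilon used during this episode
        eps = max(eps_end, eps_decay * eps)    # same decay rule as the training loop
    return schedule


schedule = epsilon_schedule(2000)
first_floor = schedule.index(0.01) + 1         # 1-based number of the first episode run at eps_end
print("Epsilon first equals eps_end in episode", first_floor)
```

With these defaults, epsilon reaches its floor of `0.01` in episode 920, so roughly the first half of a 2000-episode run still includes decaying exploration.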

In [12]:
# torch.save(agent.qnetwork_local.state_dict(), 'checkpoint.pth')

In [6]:
agent = Agent(state_size=state_size, action_size=action_size, seed=0)
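The implementation of `Agent` lives in `navigation_agent` and is not shown in this notebook. The docstring of `train_agent` describes `eps` as the parameter for epsilon-greedy action selection, so `agent.act(state, eps)` presumably follows the usual rule: explore with probability `eps`, otherwise take the action with the highest estimated value. The sketch below illustrates that rule on a plain array of action values. The function `epsilon_greedy` and the numbers in `q_values` are made up for illustration and are not taken from the actual agent.

```python
import random

import numpy as np


def epsilon_greedy(q_values, eps, rng=random):
    """Return a random action index with probability eps, else the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))   # explore: uniform over all actions
    return int(np.argmax(q_values))           # exploit: highest estimated value


rng = random.Random(0)
q_values = np.array([0.1, 0.7, 0.2, 0.0])     # made-up values for the 4 actions

greedy_action = epsilon_greedy(q_values, eps=0.0, rng=rng)
random_actions = [epsilon_greedy(q_values, eps=1.0, rng=rng) for _ in range(10)]
print(greedy_action, random_actions)
```

With `eps=0.0` the function always returns the greedy action, and with `eps=1.0` it always samples uniformly, which matches the two extremes of the schedule used during training.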

In [9]:
if IS_AGENT_TRAINED:
    env.reset(train_mode=True)
    scores = train_agent(env, agent)

    # plot the scores
    fig = plt.figure()
    ax = fig.add_subplot(111)
    plt.plot(np.arange(len(scores)), scores)
    plt.ylabel('Score')
    plt.xlabel('Episode #')
    plt.show()

Episode 1	Average Score: -1.00
Episode 2	Average Score: -0.50
Episode 3	Average Score: -0.33
Episode 4	Average Score: -0.25
Episode 5	Average Score: -0.60
Episode 6	Average Score: -0.50
Episode 7	Average Score: -0.43
Episode 8	Average Score: -0.38
Episode 9	Average Score: -0.33
Episode 10	Average Score: -0.30
Episode 11	Average Score: -0.09
Episode 12	Average Score: -0.08
Episode 13	Average Score: -0.15
Episode 14	Average Score: -0.21
Episode 15	Average Score: -0.20
Episode 16	Average Score: -0.19
Episode 17	Average Score: -0.18
Episode 18	Average Score: -0.11
Episode 19	Average Score: -0.16
Episode 20	Average Score: -0.15
Episode 21	Average Score: -0.10
Episode 22	Average Score: 0.05
Episode 23	Average Score: -0.04
Episode 24	Average Score: 0.00
Episode 25	Average Score: -0.04
Episode 26	Average Score: -0.08
Episode 27	Average Score: -0.15
Episode 28	Average Score: -0.07
Episode 29	Average Score: -0.14
Episode 30	Average Score: -0.10
Episode 31	Average Score: -0.13
...

Episode 259	Average Score: 3.90
Episode 260	Average Score: 3.94
Episode 261	Average Score: 3.89
Episode 262	Average Score: 3.88
Episode 263	Average Score: 3.89
Episode 264	Average Score: 3.86
Episode 265	Average Score: 3.93
Episode 266	Average Score: 3.96
Episode 267	Average Score: 4.05
Episode 268	Average Score: 4.14
Episode 269	Average Score: 4.19
Episode 270	Average Score: 4.19
Episode 271	Average Score: 4.20
Episode 272	Average Score: 4.21
Episode 273	Average Score: 4.21
Episode 274	Average Score: 4.29
Episode 275	Average Score: 4.29
Episode 276	Average Score: 4.34
Episode 277	Average Score: 4.28
Episode 278	Average Score: 4.39
Episode 279	Average Score: 4.34
Episode 280	Average Score: 4.42
Episode 281	Average Score: 4.36
Episode 282	Average Score: 4.42
Episode 283	Average Score: 4.49
Episode 284	Average Score: 4.51
Episode 285	Average Score: 4.56
Episode 286	Average Score: 4.63
Episode 287	Average Score: 4.65
Episode 288	Average Score: 4.60
Episode 289	Average Score: 4.57
...

Episode 516	Average Score: 7.81
Episode 517	Average Score: 7.90
Episode 518	Average Score: 7.90
Episode 519	Average Score: 7.93
Episode 520	Average Score: 7.99
Episode 521	Average Score: 8.06
Episode 522	Average Score: 7.95
Episode 523	Average Score: 8.04
Episode 524	Average Score: 7.99
Episode 525	Average Score: 8.04
Episode 526	Average Score: 8.02
Episode 527	Average Score: 8.09
Episode 528	Average Score: 8.09
Episode 529	Average Score: 8.16
Episode 530	Average Score: 8.16
Episode 531	Average Score: 8.08
Episode 532	Average Score: 8.07
Episode 533	Average Score: 8.00
Episode 534	Average Score: 8.00
Episode 535	Average Score: 8.01
Episode 536	Average Score: 7.99
Episode 537	Average Score: 8.06
Episode 538	Average Score: 8.05
Episode 539	Average Score: 8.05
Episode 540	Average Score: 8.11
Episode 541	Average Score: 8.08
Episode 542	Average Score: 7.98
Episode 543	Average Score: 8.01
Episode 544	Average Score: 8.00
Episode 545	Average Score: 7.99
Episode 546	Average Score: 8.04
...

Episode 773	Average Score: 9.16
Episode 774	Average Score: 9.23
Episode 775	Average Score: 9.16
Episode 776	Average Score: 9.12
Episode 777	Average Score: 9.02
Episode 778	Average Score: 8.98
Episode 779	Average Score: 8.91
Episode 780	Average Score: 8.98
Episode 781	Average Score: 9.12
Episode 782	Average Score: 9.01
Episode 783	Average Score: 9.11
Episode 784	Average Score: 9.16
Episode 785	Average Score: 9.11
Episode 786	Average Score: 9.21
Episode 787	Average Score: 9.23
Episode 788	Average Score: 9.28
Episode 789	Average Score: 9.24
Episode 790	Average Score: 9.33
Episode 791	Average Score: 9.37
Episode 792	Average Score: 9.47
Episode 793	Average Score: 9.40
Episode 794	Average Score: 9.27
Episode 795	Average Score: 9.28
Episode 796	Average Score: 9.27
Episode 797	Average Score: 9.31
Episode 798	Average Score: 9.36
Episode 799	Average Score: 9.45
Episode 800	Average Score: 9.43
Episode 801	Average Score: 9.46
Episode 802	Average Score: 9.48
Episode 803	Average Score: 9.50
...

Episode 1026	Average Score: 10.82
Episode 1027	Average Score: 10.78
Episode 1028	Average Score: 10.92
Episode 1029	Average Score: 10.88
Episode 1030	Average Score: 10.84
Episode 1031	Average Score: 10.78
Episode 1032	Average Score: 10.70
Episode 1033	Average Score: 10.59
Episode 1034	Average Score: 10.54
Episode 1035	Average Score: 10.66
Episode 1036	Average Score: 10.52
Episode 1037	Average Score: 10.59
Episode 1038	Average Score: 10.61
Episode 1039	Average Score: 10.51
Episode 1040	Average Score: 10.43
Episode 1041	Average Score: 10.59
Episode 1042	Average Score: 10.60
Episode 1043	Average Score: 10.60
Episode 1044	Average Score: 10.48
Episode 1045	Average Score: 10.52
Episode 1046	Average Score: 10.51
Episode 1047	Average Score: 10.61
Episode 1048	Average Score: 10.75
Episode 1049	Average Score: 10.81
Episode 1050	Average Score: 10.83
Episode 1051	Average Score: 10.88
Episode 1052	Average Score: 10.86
Episode 1053	Average Score: 11.02
Episode 1054	Average Score: 10.96
...

Episode 1267	Average Score: 11.65
Episode 1268	Average Score: 11.77
Episode 1269	Average Score: 11.72
Episode 1270	Average Score: 11.77
Episode 1271	Average Score: 11.82
Episode 1272	Average Score: 11.76
Episode 1273	Average Score: 11.61
Episode 1274	Average Score: 11.54
Episode 1275	Average Score: 11.63
Episode 1276	Average Score: 11.65
Episode 1277	Average Score: 11.52
Episode 1278	Average Score: 11.54
Episode 1279	Average Score: 11.46
Episode 1280	Average Score: 11.37
Episode 1281	Average Score: 11.29
Episode 1282	Average Score: 11.28
Episode 1283	Average Score: 11.18
Episode 1284	Average Score: 11.33
Episode 1285	Average Score: 11.36
Episode 1286	Average Score: 11.36
Episode 1287	Average Score: 11.48
Episode 1288	Average Score: 11.47
Episode 1289	Average Score: 11.47
Episode 1290	Average Score: 11.50
Episode 1291	Average Score: 11.50
Episode 1292	Average Score: 11.38
Episode 1293	Average Score: 11.46
Episode 1294	Average Score: 11.51
Episode 1295	Average Score: 11.59
...

Episode 1508	Average Score: 12.93
Episode 1509	Average Score: 12.96
Episode 1510	Average Score: 12.93
Episode 1511	Average Score: 13.05
Episode 1512	Average Score: 13.03
Episode 1513	Average Score: 12.94
Episode 1514	Average Score: 13.02
Episode 1515	Average Score: 13.03
Episode 1516	Average Score: 12.98
Episode 1517	Average Score: 12.89
Episode 1518	Average Score: 12.88
Episode 1519	Average Score: 12.93
Episode 1520	Average Score: 12.89
Episode 1521	Average Score: 12.79
Episode 1522	Average Score: 12.81
Episode 1523	Average Score: 12.71
Episode 1524	Average Score: 12.75
Episode 1525	Average Score: 12.86
Episode 1526	Average Score: 12.80
Episode 1527	Average Score: 12.76
Episode 1528	Average Score: 12.63
Episode 1529	Average Score: 12.63
Episode 1530	Average Score: 12.56
Episode 1531	Average Score: 12.58
Episode 1532	Average Score: 12.74
Episode 1533	Average Score: 12.76
Episode 1534	Average Score: 12.80
Episode 1535	Average Score: 12.72
Episode 1536	Average Score: 12.84
...

Episode 1749	Average Score: 14.13
Episode 1750	Average Score: 14.11
Episode 1751	Average Score: 14.10
Episode 1752	Average Score: 14.09
Episode 1753	Average Score: 14.05
Episode 1754	Average Score: 13.92
Episode 1755	Average Score: 14.01
Episode 1756	Average Score: 13.94
Episode 1757	Average Score: 13.99
Episode 1758	Average Score: 13.97
Episode 1759	Average Score: 13.90
Episode 1760	Average Score: 14.03
Episode 1761	Average Score: 14.05
Episode 1762	Average Score: 14.03
Episode 1763	Average Score: 13.99
Episode 1764	Average Score: 14.05
Episode 1765	Average Score: 14.07
Episode 1766	Average Score: 14.21
Episode 1767	Average Score: 14.20
Episode 1768	Average Score: 14.18
Episode 1769	Average Score: 14.20
Episode 1770	Average Score: 14.15
Episode 1771	Average Score: 14.16
Episode 1772	Average Score: 14.27
Episode 1773	Average Score: 14.30
Episode 1774	Average Score: 14.21
Episode 1775	Average Score: 14.16
Episode 1776	Average Score: 14.09
Episode 1777	Average Score: 14.00
...

KeyboardInterrupt: 
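The averages printed above are means over the most recent 100 episode scores (fewer during the first 100 episodes), as kept in `scores_window` inside `train_agent`. The sketch below recomputes such a rolling mean from a plain list of scores and finds the first episode at which it reaches a threshold. The helper names `rolling_mean` and `first_episode_at_or_above` are hypothetical, and the toy scores are chosen only to be consistent with the first five averages in the log.

```python
import numpy as np


def rolling_mean(scores, window=100):
    """Mean of the last `window` scores at each episode (shorter at the start)."""
    scores = np.asarray(scores, dtype=float)
    cumsum = np.concatenate(([0.0], np.cumsum(scores)))
    means = np.empty(len(scores))
    for i in range(len(scores)):
        start = max(0, i + 1 - window)        # first index inside the window
        means[i] = (cumsum[i + 1] - cumsum[start]) / (i + 1 - start)
    return means


def first_episode_at_or_above(scores, threshold, window=100):
    """Return the 1-based episode where the rolling mean first reaches threshold, or None."""
    hits = np.nonzero(rolling_mean(scores, window) >= threshold)[0]
    return int(hits[0]) + 1 if len(hits) else None


toy_scores = [-1, 0, 0, 0, -2]                # consistent with the first five averages above
print(np.round(rolling_mean(toy_scores), 2))
```

Plotting `rolling_mean(scores)` alongside the raw scores is a common way to make the trend in a noisy training curve easier to read.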

In [None]:
if IS_AGENT_LOADED:
    print("Loaded trained agent")
    agent.qnetwork_local.load_state_dict(torch.load('checkpoint.pth'))

    def select_action(state):
        return agent.act(state, 0.01)                  # near-greedy policy (eps = 0.01)
else:
    print("Random agent")

    def select_action(state):
        return np.random.randint(action_size)          # uniformly random action

env_info = env.reset(train_mode=False)[brain_name]     # reset the environment
state = env_info.vector_observations[0]                # get the current state
score = 0                                              # initialize the score
while True:
    action = select_action(state)                      # select an action
    env_info = env.step(action)[brain_name]            # send the action to the environment
    next_state = env_info.vector_observations[0]       # get the next state
    reward = env_info.rewards[0]                       # get the reward
    done = env_info.local_done[0]                      # see if episode has finished
    score += reward                                    # update the score
    state = next_state                                 # roll over the state to next time step
    if done:                                           # exit loop if episode finished
        break

print("Score: {}".format(score))

Loaded trained agent
