# Navigation

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the first project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).

### 1. Start the Environment

We begin by importing some necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

```python
from unityagents import UnityEnvironment

import numpy as np
```

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Banana.app"`
- **Windows** (x86): `"path/to/Banana_Windows_x86/Banana.exe"`
- **Windows** (x86_64): `"path/to/Banana_Windows_x86_64/Banana.exe"`
- **Linux** (x86): `"path/to/Banana_Linux/Banana.x86"`
- **Linux** (x86_64): `"path/to/Banana_Linux/Banana.x86_64"`
- **Linux** (x86, headless): `"path/to/Banana_Linux_NoVis/Banana.x86"`
- **Linux** (x86_64, headless): `"path/to/Banana_Linux_NoVis/Banana.x86_64"`

For instance, if you are using a Mac, then you downloaded `Banana.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Banana.app")
```
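If you move the notebook between machines, you can pick the path from the list above programmatically. The sketch below is a hypothetical helper (`banana_path` is not part of Unity ML-Agents). It assumes the folder layout shown in the list, and `base_dir` stands in for wherever you unpacked the download.

```python
import platform


def banana_path(system=None, machine=None, base_dir="path/to", headless=False):
    """Pick the Banana build from the list above for a given OS / CPU.

    Hypothetical helper: `system` and `machine` default to the values the
    `platform` module reports for the current machine.
    """
    system = system or platform.system()               # e.g. "Darwin", "Windows", "Linux"
    machine = (machine or platform.machine()).lower()  # e.g. "x86_64", "amd64", "i686"
    if system == "Darwin":
        return f"{base_dir}/Banana.app"
    if machine in ("x86_64", "amd64"):
        arch = "x86_64"
    elif machine in ("x86", "i386", "i686"):
        arch = "x86"
    else:
        raise ValueError(f"No Banana build listed for architecture {machine!r}")
    if system == "Windows":
        return f"{base_dir}/Banana_Windows_{arch}/Banana.exe"
    if system == "Linux":
        folder = "Banana_Linux_NoVis" if headless else "Banana_Linux"
        return f"{base_dir}/{folder}/Banana.{arch}"
    raise ValueError(f"No Banana build listed for system {system!r}")
```

For example, `banana_path("Linux", "x86_64", headless=True)` gives `"path/to/Banana_Linux_NoVis/Banana.x86_64"`, which you could pass as `file_name` to `UnityEnvironment`.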

```python
env = UnityEnvironment(file_name="Banana.app")
```

```
INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :

Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 
```


Environments contain **_brains_**, which are responsible for deciding the actions of their associated agents. Here we select the first available brain and set it as the default brain that we will control from Python.

```python
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
```


### 2. Examine the State and Action Spaces

The simulation contains a single agent that navigates a large environment.  At each time step, it has four actions at its disposal:
- `0` - walk forward 
- `1` - walk backward
- `2` - turn left
- `3` - turn right

The state space has `37` dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. A reward of `+1` is provided for collecting a yellow banana, and a reward of `-1` for collecting a blue banana.
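The action indices and reward scheme can be written down directly. This is a minimal sketch, assuming bananas are the only source of reward (the names `ACTIONS` and `episode_score` are illustrative, not part of the environment's API). An episode's score is then simply the number of yellow bananas minus the number of blue ones.

```python
# Action indices as listed above; the environment itself only receives the integers 0-3.
ACTIONS = {0: "walk forward", 1: "walk backward", 2: "turn left", 3: "turn right"}

YELLOW_REWARD = +1  # collecting a yellow banana
BLUE_REWARD = -1    # collecting a blue banana


def episode_score(n_yellow, n_blue):
    """Total undiscounted reward for an episode with the given banana counts."""
    return n_yellow * YELLOW_REWARD + n_blue * BLUE_REWARD


print(episode_score(5, 2))  # 5 yellow and 2 blue bananas -> 3
```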

Run the code cell below to print some information about the environment.

```python
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)
```

```
Number of agents: 1
Number of actions: 4
States look like: [1.         0.         0.         0.         0.84408134 0.
 0.         1.         0.         0.0748472  0.         1.
 0.         0.         0.25755    1.         0.         0.
 0.         0.74177343 0.         1.         0.         0.
 0.25854847 0.         0.         1.         0.         0.09355672
 0.         1.         0.         0.         0.31969345 0.
 0.        ]
States have length: 37
```
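The sizes printed above fix the shape of any action-value function you train: it takes a 37-dimensional state and returns one value per action, and the greedy action is the index of the largest value. As a minimal sketch (random, untrained weights; the hidden width of 64 is an arbitrary assumption, and this is not the network inside `dqn_agent`), a NumPy forward pass might look like this:

```python
import numpy as np

STATE_SIZE, ACTION_SIZE = 37, 4  # values printed above
HIDDEN = 64                      # hypothetical hidden-layer width

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(STATE_SIZE, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, ACTION_SIZE))
b2 = np.zeros(ACTION_SIZE)


def q_values(state):
    """Map a 37-dim state to one estimated action value per action."""
    hidden = np.maximum(0.0, state @ W1 + b1)  # ReLU hidden layer
    return hidden @ W2 + b2


example_state = rng.random(STATE_SIZE)  # stand-in for env_info.vector_observations[0]
q = q_values(example_state)
greedy_action = int(np.argmax(q))       # index of the highest estimated value
print(q.shape, greedy_action)
```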


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

Once this cell is executed, you will watch how the agent performs when it selects an action uniformly at random at each time step. A window should pop up that allows you to observe the agent as it moves through the environment.

Of course, as part of the project, you'll have to change the code so that the agent is able to use its experience to gradually choose better actions when interacting with the environment!

```python
env_info = env.reset(train_mode=False)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
while True:
    action = np.random.randint(action_size)        # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break

print("Score: {}".format(score))
```


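A common way to move from this purely random policy toward one that uses experience is epsilon-greedy selection: with probability `eps` pick a uniformly random action, otherwise pick the action with the highest estimated value. With `eps = 1.0` it reproduces the random loop above; with `eps = 0.0` it is fully greedy. The training code in Section 4 calls `agent.act(state, eps)`; since `dqn_agent` is not shown here, treat the function below as a sketch of what that call is assumed to do, not as its actual implementation.

```python
import numpy as np


def epsilon_greedy(q_values, eps, rng):
    """With probability eps pick a uniformly random action, else the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))              # exploit


rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, -0.2, 0.3])  # hypothetical action values for one state
print(epsilon_greedy(q, eps=0.0, rng=rng))  # 1 (always greedy when eps = 0)
```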

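When you are finished, close the environment. If you intend to train an agent in Section 4 within this same session, run the training first: the training code below reuses the same `env`.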

```python
env.close()
```


### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment! When training the agent, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```
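In the training cell below, every transition is handed to `agent.step(state, action, reward, next_state, done)`. The `dqn_agent` module is not shown in this notebook, so the following is only an assumption about what such a method typically does in DQN: append the transition to a bounded replay buffer, from which the agent later learns on uniformly sampled minibatches. `ReplayBuffer` and `Experience` are hypothetical names.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=0):
        self.memory = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw batch_size distinct transitions without replacement."""
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0, reward=0.0, next_state=t + 1, done=False)
print(len(buffer))                       # 3: only the newest transitions are kept
print([e.state for e in buffer.memory])  # [2, 3, 4]
```

Sampling at random from a buffer, rather than learning from consecutive steps, breaks the strong correlation between successive states.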

```python
import numpy as np
from dqn_agent import Agent  # local module providing the DQN agent (not shown here)


def train(agent, environment, n_episodes, max_t=1000, eps_start=1.0, eps_decay_rate=0.995, eps_end=0.01):
    """Train `agent` for `n_episodes` episodes with a decaying epsilon.

    Returns the list of per-episode scores.
    """
    brain_name = environment.brain_names[0]  # control the default brain
    eps = eps_start
    scores = []

    for i_episode in range(1, n_episodes + 1):
        print(f"Entering episode {i_episode}")
        env_info = environment.reset(train_mode=True)[brain_name]
        state = env_info.vector_observations[0]
        score = 0

        for t in range(max_t):
            action = agent.act(state, eps)                       # choose an action
            env_info = environment.step(action)[brain_name]
            next_state = env_info.vector_observations[0]
            reward = env_info.rewards[0]
            done = env_info.local_done[0]
            score += reward
            agent.step(state, action, reward, next_state, done)  # hand the transition to the agent
            state = next_state
            if done:
                break

        scores.append(score)
        eps = max(eps_end, eps * eps_decay_rate)  # decay exploration, never below eps_end
        print(f"score: {score}. Average score so far: {np.mean(scores)}")

    return scores


training_agent = Agent(state_size=state_size, action_size=action_size, seed=0)
scores = train(training_agent, env, n_episodes=100)
```


```
Entering episode 1
score: -1.0. Average score so far: -1.0
Entering episode 2
score: 0.0. Average score so far: -0.5
Entering episode 3
score: -2.0. Average score so far: -1.0
Entering episode 4
score: 0.0. Average score so far: -0.75
Entering episode 5
score: -1.0. Average score so far: -0.8
Entering episode 6
score: -1.0. Average score so far: -0.8333333333333334
Entering episode 7
score: 0.0. Average score so far: -0.7142857142857143
Entering episode 8
score: 0.0. Average score so far: -0.625
Entering episode 9
score: 1.0. Average score so far: -0.4444444444444444
Entering episode 10
score: 0.0. Average score so far: -0.4
Entering episode 11
score: -1.0. Average score so far: -0.45454545454545453
Entering episode 12
score: -1.0. Average score so far: -0.5
Entering episode 13
score: -1.0. Average score so far: -0.5384615384615384
Entering episode 14
score: -2.0. Average score so far: -0.6428571428571429
Entering episode 15
score: -2.0. Average score so far: -0.7333333333333333
Entering episode 16
score: 1.0. Average score so far: -0.625
Entering episode 17
score: 1.0. Average score so far: -0.5294117647058824
Entering episode 18
score: 1.0. Average score so far: -0.4444444444444444
Entering episode 19
score: -1.0. Average score so far: -0.47368421052631576
Entering episode 20
score: 0.0. Average score so far: -0.45
Entering episode 21
score: 1.0. Average score so far: -0.38095238095238093
Entering episode 22
score: -1.0. Average score so far: -0.4090909090909091
Entering episode 23
score: 0.0. Average score so far: -0.391304347826087
Entering episode 24
score: 1.0. Average score so far: -0.3333333333333333
Entering episode 25
score: 1.0. Average score so far: -0.28
Entering episode 26
score: 0.0. Average score so far: -0.2692307692307692
Entering episode 27
score: -1.0. Average score so far: -0.2962962962962963
Entering episode 28
score: 1.0. Average score so far: -0.25
Entering episode 29
score: 0.0. Average score so far: -0.2413793103448276
Entering episode 30
score: -3.0. Average score so far: -0.3333333333333333
Entering episode 31
score: 0.0. Average score so far: -0.3225806451612903
Entering episode 32
score: 0.0. Average score so far: -0.3125
Entering episode 33
score: 0.0. Average score so far: -0.30303030303030304
Entering episode 34
score: 0.0. Average score so far: -0.29411764705882354
Entering episode 35
score: 0.0. Average score so far: -0.2857142857142857
Entering episode 36
score: -2.0. Average score so far: -0.3333333333333333
Entering episode 37
score: 0.0. Average score so far: -0.32432432432432434
Entering episode 38
score: 1.0. Average score so far: -0.2894736842105263
Entering episode 39
score: 1.0. Average score so far: -0.2564102564102564
Entering episode 40
score: 0.0. Average score so far: -0.25
Entering episode 41
score: 0.0. Average score so far: -0.24390243902439024
Entering episode 42
score: 2.0. Average score so far: -0.19047619047619047
Entering episode 43
score: -1.0. Average score so far: -0.20930232558139536
Entering episode 44
score: 0.0. Average score so far: -0.20454545454545456
Entering episode 45
score: 1.0. Average score so far: -0.17777777777777778
Entering episode 46
score: -1.0. Average score so far: -0.1956521739130435
Entering episode 47
score: 3.0. Average score so far: -0.1276595744680851
Entering episode 48
score: 2.0. Average score so far: -0.08333333333333333
Entering episode 49
score: 0.0. Average score so far: -0.08163265306122448
Entering episode 50
score: 1.0. Average score so far: -0.06
Entering episode 51
score: 1.0. Average score so far: -0.0392156862745098
Entering episode 52
score: -1.0. Average score so far: -0.057692307692307696
Entering episode 53
score: 3.0. Average score so far: 0.0
Entering episode 54
score: 1.0. Average score so far: 0.018518518518518517
Entering episode 55
score: 2.0. Average score so far: 0.05454545454545454
Entering episode 56
score: 1.0. Average score so far: 0.07142857142857142
Entering episode 57
score: 1.0. Average score so far: 0.08771929824561403
Entering episode 58
score: 4.0. Average score so far: 0.15517241379310345
Entering episode 59
score: 0.0. Average score so far: 0.15254237288135594
Entering episode 60
score: 3.0. Average score so far: 0.2
Entering episode 61
score: 2.0. Average score so far: 0.22950819672131148
Entering episode 62
score: 2.0. Average score so far: 0.25806451612903225
Entering episode 63
score: 0.0. Average score so far: 0.25396825396825395
Entering episode 64
score: 0.0. Average score so far: 0.25
Entering episode 65
score: 2.0. Average score so far: 0.27692307692307694
Entering episode 66
score: 0.0. Average score so far: 0.2727272727272727
Entering episode 67
score: 0.0. Average score so far: 0.26865671641791045
Entering episode 68
score: -1.0. Average score so far: 0.25
Entering episode 69
score: -1.0. Average score so far: 0.2318840579710145
Entering episode 70
score: 1.0. Average score so far: 0.24285714285714285
Entering episode 71
score: 2.0. Average score so far: 0.2676056338028169
Entering episode 72
score: 1.0. Average score so far: 0.2777777777777778
Entering episode 73
score: 4.0. Average score so far: 0.3287671232876712
Entering episode 74
score: 2.0. Average score so far: 0.35135135135135137
Entering episode 75
score: 1.0. Average score so far: 0.36
Entering episode 76
score: 5.0. Average score so far: 0.42105263157894735
Entering episode 77
score: 2.0. Average score so far: 0.44155844155844154
Entering episode 78
score: 0.0. Average score so far: 0.4358974358974359
Entering episode 79
score: 1.0. Average score so far: 0.4430379746835443
Entering episode 80
score: 4.0. Average score so far: 0.4875
Entering episode 81
score: 2.0. Average score so far: 0.5061728395061729
Entering episode 82
score: 0.0. Average score so far: 0.5
Entering episode 83
score: 0.0. Average score so far: 0.4939759036144578
Entering episode 84
score: 1.0. Average score so far: 0.5
Entering episode 85
score: 3.0. Average score so far: 0.5294117647058824
Entering episode 86
score: 1.0. Average score so far: 0.5348837209302325
Entering episode 87
score: 2.0. Average score so far: 0.5517241379310345
Entering episode 88
score: 1.0. Average score so far: 0.5568181818181818
Entering episode 89
score: -1.0. Average score so far: 0.5393258426966292
Entering episode 90
score: 3.0. Average score so far: 0.5666666666666667
Entering episode 91
score: 4.0. Average score so far: 0.6043956043956044
Entering episode 92
score: 1.0. Average score so far: 0.6086956521739131
Entering episode 93
score: 0.0. Average score so far: 0.6021505376344086
Entering episode 94
score: 3.0. Average score so far: 0.6276595744680851
Entering episode 95
score: 3.0. Average score so far: 0.6526315789473685
Entering episode 96
score: 4.0. Average score so far: 0.6875
Entering episode 97
score: 0.0. Average score so far: 0.6804123711340206
Entering episode 98
score: 3.0. Average score so far: 0.7040816326530612
Entering episode 99
score: 3.0. Average score so far: 0.7272727272727273
Entering episode 100
score: 4.0. Average score so far: 0.76
```
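Two details of this log are worth checking numerically. First, `train` starts epsilon at 1.0 and multiplies it by 0.995 after every episode, so episode 100 still runs with epsilon near 0.61 and the floor of 0.01 is only reached around episode 920. Assuming `agent.act` is epsilon-greedy, the agent was still choosing random actions most of the time, which plausibly explains part of the modest scores. Second, "Average score so far" is a cumulative mean over all episodes, whereas the project usually measures progress as the mean over the last 100 consecutive episodes, with +13 commonly cited as the solve threshold (an assumption here; check the project instructions). The two measures coincide at episode 100, where both equal 0.76. The function names below are hypothetical.

```python
from collections import deque


def epsilon_schedule(n_episodes, eps_start=1.0, eps_decay_rate=0.995, eps_end=0.01):
    """Epsilon used in each episode, mirroring the update inside `train`."""
    eps, schedule = eps_start, []
    for _ in range(n_episodes):
        schedule.append(eps)
        eps = max(eps_end, eps * eps_decay_rate)
    return schedule


def first_solved_episode(scores, window=100, threshold=13.0):
    """First episode at which the mean of the last `window` scores reaches `threshold`.

    Returns None if that never happens. The +13 default is an assumed solve
    criterion, not something defined in this notebook.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= threshold:
            return episode
    return None


schedule = epsilon_schedule(1000)
print(round(schedule[99], 3))    # epsilon used in episode 100: 0.609
print(schedule.index(0.01) + 1)  # first episode run at the floor: 920
```

If you keep the list returned by `train`, `first_solved_episode(scores)` reports when (if ever) the 100-episode window first reaches the threshold.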
