# Navigation

---

In this notebook, we use the Unity ML-Agents environment for the first project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).

### 1. Start the Environment




#### 1.1 Unity ML-Agents
Unity Machine Learning Agents (ML-Agents) is an open-source Unity plugin that enables games and simulations to serve as environments for training intelligent agents. For game developers, these trained agents can be used for multiple purposes, including controlling [NPC](https://en.wikipedia.org/wiki/Non-player_character) behavior (in a variety of settings such as multi-agent and adversarial), automated testing of game builds and evaluating different game design decisions pre-release.

In this project, we will use Unity's rich environments to design, train, and evaluate our own deep reinforcement learning algorithms. You can read more about ML-Agents in the [GitHub repository](https://github.com/Unity-Technologies/ml-agents).




#### 1.2 The environment

For this project, we will train an agent to navigate (and collect bananas!) in a large, square world.
A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. Thus, the goal of your agent is to collect as many yellow bananas as possible while avoiding blue bananas.

![SegmentLocal](banana.gif "banana")

The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions. Four discrete actions are available, corresponding to:

- 0 move forward.
- 1 move backward.
- 2 turn left.
- 3 turn right.

The task is episodic, and in order to solve the environment, your agent must get an average score of +13 over 100 consecutive episodes.
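
The solve criterion can be checked with a rolling window over the episode scores. The following is a minimal sketch rather than part of the project code; the helper name `first_solved_episode` and its parameters are assumptions made for illustration.

```python
from collections import deque


def first_solved_episode(scores, target=13.0, window=100):
    """Return the 1-based episode at which the mean of the last `window`
    scores first reaches `target`, or None if that never happens.

    Hypothetical helper for illustration; not part of the project code.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        # Only a full window counts as "100 consecutive episodes".
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None


# Toy data: 100 weak episodes followed by 100 strong ones.
toy_scores = [0.0] * 100 + [14.0] * 100
solved_at = first_solved_episode(toy_scores)
print(solved_at)
```

This sketch insists on a full window before declaring success; a check that averages a partially filled window could trigger earlier.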

**Note:**
The project environment is similar to, but not identical to, the Banana Collector environment on the Unity ML-Agents GitHub page.




#### 1.3 Packages
We begin by importing some necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Banana.app"`
- **Windows** (x86): `"path/to/Banana_Windows_x86/Banana.exe"`
- **Windows** (x86_64): `"path/to/Banana_Windows_x86_64/Banana.exe"`
- **Linux** (x86): `"path/to/Banana_Linux/Banana.x86"`
- **Linux** (x86_64): `"path/to/Banana_Linux/Banana.x86_64"`
- **Linux** (x86, headless): `"path/to/Banana_Linux_NoVis/Banana.x86"`
- **Linux** (x86_64, headless): `"path/to/Banana_Linux_NoVis/Banana.x86_64"`

For instance, if you are using a Mac, then you downloaded `Banana.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```python
env = UnityEnvironment(file_name="Banana.app")
```

In [2]:
env = UnityEnvironment(file_name="Banana.app")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

The simulation contains a single agent that navigates a large environment.  At each time step, it has four actions at its disposal:
- `0` - walk forward 
- `1` - walk backward
- `2` - turn left
- `3` - turn right

The state space has `37` dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.  A reward of `+1` is provided for collecting a yellow banana, and a reward of `-1` is provided for collecting a blue banana.

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)

Number of agents: 1
Number of actions: 4
States look like: [1.         0.         0.         0.         0.84408134 0.
 0.         1.         0.         0.0748472  0.         1.
 0.         0.         0.25755    1.         0.         0.
 0.         0.74177343 0.         1.         0.         0.
 0.25854847 0.         0.         1.         0.         0.09355672
 0.         1.         0.         0.         0.31969345 0.
 0.        ]
States have length: 37


### 3. Take Random Actions in the Environment

In the next code cell, we will see how to use the Python API to control the agent and receive feedback from the environment.

Once this cell is executed, a window should pop up in which we can watch the agent move through the environment while it selects an action uniformly at random at each time step.

Of course, as part of the project, we will change the code so that the agent is able to use its experience to gradually choose better actions when interacting with the environment!

In [5]:
env_info = env.reset(train_mode=False)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
while True:
    action = np.random.randint(action_size)        # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break
    
print("Score: {}".format(score))

Score: 0.0


### 4. Train the agent

Next, we train our own agent to solve the environment!  When training the agent, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```

In [6]:
import random
import torch
import numpy as np
from collections import deque
import matplotlib.pyplot as plt
%matplotlib inline

In [7]:
# Instantiate the Environment and Agent
from agent import Agent
# `framework` selects the learning rule, 'DQN' or 'DDQN'; the two differ in how the agent's `learn` function is defined
framework = 'DQN'
# `buffer_type` selects the replay memory, 'ReplayBuffer' or 'PER_ReplayBuffer'
buffer_type = 'PER_ReplayBuffer'
#buffer_type = 'ReplayBuffer'
agent = Agent(state_size=state_size, action_size=action_size, seed=0, framework=framework, buffer_type=buffer_type)
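
`agent.py` is not shown in this notebook, so the following is only a sketch of how the two `framework` values commonly differ, namely in how the bootstrap target is computed. Standard DQN takes the maximum target-network Q-value for the next state, while Double DQN lets the local network pick the action and the target network evaluate it. The function names, the discount `gamma=0.99`, and the plain-list Q-values are assumptions for illustration, not the author's implementation.

```python
def dqn_target(reward, done, q_target_next, gamma=0.99):
    """Standard DQN: bootstrap from the target network's own maximum."""
    bootstrap = 0.0 if done else max(q_target_next)
    return reward + gamma * bootstrap


def ddqn_target(reward, done, q_local_next, q_target_next, gamma=0.99):
    """Double DQN: the local network selects the action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_local_next)), key=q_local_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for the four actions in the next state.
q_local_next = [0.2, 0.1, 0.9, 0.4]   # local network prefers action 2
q_target_next = [0.5, 1.5, 0.7, 0.3]  # target network's maximum is action 1

y_dqn = dqn_target(1.0, False, q_target_next)
y_ddqn = ddqn_target(1.0, False, q_local_next, q_target_next)
print(y_dqn, y_ddqn)
```

In this toy case the Double DQN target is smaller, because the action chosen by the local network is not the one the target network rates highest. Decoupling selection from evaluation in this way is what is meant to reduce overestimated Q-values.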


In [8]:
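# watch an untrained agent
env_info = env.reset(train_mode=True)[brain_name]  # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
for j in range(200):
    action = agent.act(state)                      # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break

print('Score: {}'.format(score))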

Score: 0


#### Train the Agent with DQN or DDQN, Using a Replay Buffer or a PER Replay Buffer
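
The implementation of `PER_ReplayBuffer` lives in the agent code and is not shown here. As a rough sketch of the idea behind prioritized experience replay, assuming the common proportional variant: a transition with priority `p_i` is sampled with probability `p_i**alpha / sum(p_k**alpha)`, and the resulting bias is corrected with importance-sampling weights `(N * P(i))**(-beta)`, normalized by their maximum. The function names and the values of `alpha` and `beta` below are assumptions.

```python
import random


def per_probabilities(priorities, alpha=0.6):
    """Sampling probabilities proportional to priority**alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probabilities, beta=0.4):
    """Importance-sampling weights, normalized so the largest is 1."""
    n = len(probabilities)
    weights = [(n * p) ** (-beta) for p in probabilities]
    max_weight = max(weights)
    return [w / max_weight for w in weights]


# Hypothetical priorities, e.g. absolute TD errors plus a small constant.
priorities = [0.1, 0.5, 2.0, 1.0]
probs = per_probabilities(priorities)
weights = importance_weights(probs)

# Draw a mini-batch of indices according to the priorities.
rng = random.Random(0)
batch_indices = rng.choices(range(len(priorities)), weights=probs, k=2)
print(probs, weights, batch_indices)
```

Transitions with large priorities are replayed more often but receive smaller weights in the loss, so rarely sampled transitions are not drowned out.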

In [9]:
def dqn_ddqn(n_episodes=3000, max_t=1000, eps_start=1.0, eps_end=0.01, eps_decay=0.99):
    """Deep Q-Learning.
    
    Params
    ======
        n_episodes (int): maximum number of training episodes
        max_t (int): maximum number of timesteps per episode
        eps_start (float): starting value of epsilon, for epsilon-greedy action selection
        eps_end (float): minimum value of epsilon
        eps_decay (float): multiplicative factor (per episode) for decreasing epsilon
    """
    scores = []                        # list containing scores from each episode
    scores_window = deque(maxlen=100)  # last 100 scores
    eps = eps_start                    # initialize epsilon
    for i_episode in range(1, n_episodes+1):
        env_info = env.reset(train_mode=True)[brain_name]  # reset the environment
        state = env_info.vector_observations[0]            # get the current state
        
        score = 0
        for t in range(max_t):
            action = agent.act(state, eps)
            
            env_info = env.step(action)[brain_name]        # send the action to the environment
            next_state = env_info.vector_observations[0]   # get the next state
            reward = env_info.rewards[0]                   # get the reward
            done = env_info.local_done[0]                  # see if episode has finished
            agent.step(state, action, reward, next_state, done)
            state = next_state
            score += reward
            if done:
                break 
        scores_window.append(score)       # save most recent score
        scores.append(score)              # save most recent score
        eps = max(eps_end, eps_decay*eps) # decrease epsilon
        print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)), end="")
        if i_episode % 100 == 0:
            print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)))
        if np.mean(scores_window)>=13.0:
            print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(i_episode-100, np.mean(scores_window)))
            torch.save(agent.qnetwork_local.state_dict(), framework + '_' + buffer_type + '_' + 'checkpoint.pth')
            # break  # left commented out, so training continues up to n_episodes after the threshold is reached
    return scores
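
The `eps` update above shrinks epsilon by a constant factor each episode until it is clamped at `eps_end`. `agent.act` is not shown in this notebook, so the sketch below assumes the usual epsilon-greedy rule (a random action with probability epsilon, the greedy action otherwise); both helper names are hypothetical.

```python
import math
import random


def epsilon_greedy(q_values, eps, rng=random):
    """Pick a random action with probability eps, else the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def episodes_until_floor(eps_start=1.0, eps_end=0.01, eps_decay=0.99):
    """Number of per-episode decays before epsilon is clamped at eps_end."""
    return math.ceil(math.log(eps_end / eps_start) / math.log(eps_decay))


floor_episode = episodes_until_floor()
greedy_action = epsilon_greedy([0.1, 0.7, 0.3, 0.2], eps=0.0)
print(floor_episode, greedy_action)
```

With the default arguments of `dqn_ddqn` (1.0, 0.01, 0.99), epsilon reaches its floor after 459 episodes, so exploration is still being annealed for most of the early training run.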

In [None]:
scores = dqn_ddqn()

Episode 100	Average Score: 0.88
Episode 200	Average Score: 4.88
Episode 300	Average Score: 8.38
Episode 400	Average Score: 12.07
Episode 487	Average Score: 13.05
Environment solved in 387 episodes!	Average Score: 13.05
Episode 488	Average Score: 13.09
Environment solved in 388 episodes!	Average Score: 13.09
Episode 489	Average Score: 13.16
Environment solved in 389 episodes!	Average Score: 13.16
Episode 490	Average Score: 13.13
Environment solved in 390 episodes!	Average Score: 13.13
Episode 491	Average Score: 13.09
Environment solved in 391 episodes!	Average Score: 13.09
Episode 492	Average Score: 13.09
Environment solved in 392 episodes!	Average Score: 13.09
Episode 493	Average Score: 13.11
Environment solved in 393 episodes!	Average Score: 13.11
Episode 494	Average Score: 13.02
Environment solved in 394 episodes!	Average Score: 13.02
Episode 495	Average Score: 13.08
Environment solved in 395 episodes!	Average Score: 13.08
Episode 496	Average Score: 13.04
Environment solved in 396 ep

Episode 589	Average Score: 13.70
Environment solved in 489 episodes!	Average Score: 13.70
Episode 590	Average Score: 13.72
Environment solved in 490 episodes!	Average Score: 13.72
Episode 591	Average Score: 13.80
Environment solved in 491 episodes!	Average Score: 13.80
Episode 592	Average Score: 13.90
Environment solved in 492 episodes!	Average Score: 13.90
Episode 593	Average Score: 13.81
Environment solved in 493 episodes!	Average Score: 13.81
Episode 594	Average Score: 13.89
Environment solved in 494 episodes!	Average Score: 13.89
Episode 595	Average Score: 13.93
Environment solved in 495 episodes!	Average Score: 13.93
Episode 596	Average Score: 13.99
Environment solved in 496 episodes!	Average Score: 13.99
Episode 597	Average Score: 13.97
Environment solved in 497 episodes!	Average Score: 13.97
Episode 598	Average Score: 13.95
Environment solved in 498 episodes!	Average Score: 13.95
Episode 599	Average Score: 13.99
Environment solved in 499 episodes!	Average Score: 13.99
Episode 60

Episode 681	Average Score: 14.22
Environment solved in 581 episodes!	Average Score: 14.22
Episode 682	Average Score: 14.11
Environment solved in 582 episodes!	Average Score: 14.11
Episode 683	Average Score: 14.09
Environment solved in 583 episodes!	Average Score: 14.09
Episode 684	Average Score: 14.06
Environment solved in 584 episodes!	Average Score: 14.06
Episode 685	Average Score: 14.22
Environment solved in 585 episodes!	Average Score: 14.22
Episode 686	Average Score: 14.27
Environment solved in 586 episodes!	Average Score: 14.27
Episode 687	Average Score: 14.26
Environment solved in 587 episodes!	Average Score: 14.26
Episode 688	Average Score: 14.29
Environment solved in 588 episodes!	Average Score: 14.29
Episode 689	Average Score: 14.26
Environment solved in 589 episodes!	Average Score: 14.26
Episode 690	Average Score: 14.20
Environment solved in 590 episodes!	Average Score: 14.20
Episode 691	Average Score: 14.16
Environment solved in 591 episodes!	Average Score: 14.16
Episode 69

Episode 773	Average Score: 14.59
Environment solved in 673 episodes!	Average Score: 14.59
Episode 774	Average Score: 14.66
Environment solved in 674 episodes!	Average Score: 14.66
Episode 775	Average Score: 14.65
Environment solved in 675 episodes!	Average Score: 14.65
Episode 776	Average Score: 14.66
Environment solved in 676 episodes!	Average Score: 14.66
Episode 777	Average Score: 14.62
Environment solved in 677 episodes!	Average Score: 14.62
Episode 778	Average Score: 14.61
Environment solved in 678 episodes!	Average Score: 14.61
Episode 779	Average Score: 14.70
Environment solved in 679 episodes!	Average Score: 14.70
Episode 780	Average Score: 14.59
Environment solved in 680 episodes!	Average Score: 14.59
Episode 781	Average Score: 14.57
Environment solved in 681 episodes!	Average Score: 14.57
Episode 782	Average Score: 14.66
Environment solved in 682 episodes!	Average Score: 14.66
Episode 783	Average Score: 14.67
Environment solved in 683 episodes!	Average Score: 14.67
Episode 78

Episode 865	Average Score: 14.57
Environment solved in 765 episodes!	Average Score: 14.57
Episode 866	Average Score: 14.50
Environment solved in 766 episodes!	Average Score: 14.50
Episode 867	Average Score: 14.53
Environment solved in 767 episodes!	Average Score: 14.53
Episode 868	Average Score: 14.52
Environment solved in 768 episodes!	Average Score: 14.52
Episode 869	Average Score: 14.50
Environment solved in 769 episodes!	Average Score: 14.50
Episode 870	Average Score: 14.39
Environment solved in 770 episodes!	Average Score: 14.39
Episode 871	Average Score: 14.44
Environment solved in 771 episodes!	Average Score: 14.44
Episode 872	Average Score: 14.41
Environment solved in 772 episodes!	Average Score: 14.41
Episode 873	Average Score: 14.44
Environment solved in 773 episodes!	Average Score: 14.44
Episode 874	Average Score: 14.39
Environment solved in 774 episodes!	Average Score: 14.39
Episode 875	Average Score: 14.40
Environment solved in 775 episodes!	Average Score: 14.40
Episode 87

Episode 957	Average Score: 14.56
Environment solved in 857 episodes!	Average Score: 14.56
Episode 958	Average Score: 14.61
Environment solved in 858 episodes!	Average Score: 14.61
Episode 959	Average Score: 14.68
Environment solved in 859 episodes!	Average Score: 14.68
Episode 960	Average Score: 14.74
Environment solved in 860 episodes!	Average Score: 14.74
Episode 961	Average Score: 14.69
Environment solved in 861 episodes!	Average Score: 14.69
Episode 962	Average Score: 14.70
Environment solved in 862 episodes!	Average Score: 14.70
Episode 963	Average Score: 14.70
Environment solved in 863 episodes!	Average Score: 14.70
Episode 964	Average Score: 14.69
Environment solved in 864 episodes!	Average Score: 14.69
Episode 965	Average Score: 14.71
Environment solved in 865 episodes!	Average Score: 14.71
Episode 966	Average Score: 14.74
Environment solved in 866 episodes!	Average Score: 14.74
Episode 967	Average Score: 14.78
Environment solved in 867 episodes!	Average Score: 14.78
Episode 96

Episode 1048	Average Score: 15.40
Environment solved in 948 episodes!	Average Score: 15.40
Episode 1049	Average Score: 15.32
Environment solved in 949 episodes!	Average Score: 15.32
Episode 1050	Average Score: 15.38
Environment solved in 950 episodes!	Average Score: 15.38
Episode 1051	Average Score: 15.39
Environment solved in 951 episodes!	Average Score: 15.39
Episode 1052	Average Score: 15.45
Environment solved in 952 episodes!	Average Score: 15.45
Episode 1053	Average Score: 15.49
Environment solved in 953 episodes!	Average Score: 15.49
Episode 1054	Average Score: 15.47
Environment solved in 954 episodes!	Average Score: 15.47
Episode 1055	Average Score: 15.38
Environment solved in 955 episodes!	Average Score: 15.38
Episode 1056	Average Score: 15.38
Environment solved in 956 episodes!	Average Score: 15.38
Episode 1057	Average Score: 15.40
Environment solved in 957 episodes!	Average Score: 15.40
Episode 1058	Average Score: 15.39
Environment solved in 958 episodes!	Average Score: 15.39

Episode 1138	Average Score: 15.50
Environment solved in 1038 episodes!	Average Score: 15.50
Episode 1139	Average Score: 15.59
Environment solved in 1039 episodes!	Average Score: 15.59
Episode 1140	Average Score: 15.57
Environment solved in 1040 episodes!	Average Score: 15.57
Episode 1141	Average Score: 15.71
Environment solved in 1041 episodes!	Average Score: 15.71
Episode 1142	Average Score: 15.68
Environment solved in 1042 episodes!	Average Score: 15.68
Episode 1143	Average Score: 15.68
Environment solved in 1043 episodes!	Average Score: 15.68
Episode 1144	Average Score: 15.70
Environment solved in 1044 episodes!	Average Score: 15.70
Episode 1145	Average Score: 15.65
Environment solved in 1045 episodes!	Average Score: 15.65
Episode 1146	Average Score: 15.62
Environment solved in 1046 episodes!	Average Score: 15.62
Episode 1147	Average Score: 15.59
Environment solved in 1047 episodes!	Average Score: 15.59
Episode 1148	Average Score: 15.55
Environment solved in 1048 episodes!	Average S

In [None]:
# plot the scores
fig = plt.figure(figsize=(12, 6))
ax = fig.add_subplot(111)
plt.plot(np.arange(len(scores[0:2000])), scores[0:2000])
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.savefig(framework + '_' + buffer_type  + '.png')
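
Raw per-episode scores are noisy, so a smoothed curve can make the trend easier to read. The following sketch computes a trailing mean over the same 100-episode window used during training; `rolling_mean` is a hypothetical helper, not part of the notebook.

```python
from collections import deque


def rolling_mean(scores, window=100):
    """Mean of the most recent `window` scores at every episode."""
    recent = deque(maxlen=window)
    running_sum = 0.0
    means = []
    for score in scores:
        if len(recent) == window:
            running_sum -= recent[0]  # value about to be evicted
        recent.append(score)
        running_sum += score
        means.append(running_sum / len(recent))
    return means


smoothed = rolling_mean([1.0, 3.0, 5.0, 7.0], window=2)
print(smoothed)  # [1.0, 2.0, 4.0, 6.0]
```

The result of `rolling_mean(scores)` could be drawn over the raw curve with a second `plt.plot` call before saving the figure.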

### 5. Watch a Smart Agent

In [None]:
# load the weights from file
agent.qnetwork_local.load_state_dict(torch.load(framework + '_' + buffer_type + '_' + 'checkpoint.pth', map_location=lambda storage, loc: storage))

env_info = env.reset(train_mode=False)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
while True:
    action = agent.act(state)                      # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state; no agent.step here, so no learning happens while watching
    if done:                                       # exit loop if episode finished
        break

print("Score: {}".format(score))