# Navigation

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the first project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).

### 1. Start the Environment

We begin by importing some necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np
from collections import deque
import matplotlib.pyplot as plt
%matplotlib inline

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Banana.app"`
- **Windows** (x86): `"path/to/Banana_Windows_x86/Banana.exe"`
- **Windows** (x86_64): `"path/to/Banana_Windows_x86_64/Banana.exe"`
- **Linux** (x86): `"path/to/Banana_Linux/Banana.x86"`
- **Linux** (x86_64): `"path/to/Banana_Linux/Banana.x86_64"`
- **Linux** (x86, headless): `"path/to/Banana_Linux_NoVis/Banana.x86"`
- **Linux** (x86_64, headless): `"path/to/Banana_Linux_NoVis/Banana.x86_64"`

For instance, if you are using a Mac, then you downloaded `Banana.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Banana.app")
```
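The platform list above can also be written as a small lookup. This is a minimal sketch, not part of the project code: `banana_env_path` is a hypothetical helper, `path/to` is the same placeholder used in the list, and treating any `platform.machine()` value that ends in `64` as a 64-bit build is an assumption.

```python
import platform


def banana_env_path(base="path/to", system=None, is_64bit=None, headless=False):
    """Hypothetical helper: build the Banana environment path for a platform.

    `system` defaults to platform.system(); `is_64bit` defaults to a guess
    based on platform.machine() (an assumption, not a guarantee).
    """
    system = system or platform.system()
    if is_64bit is None:
        is_64bit = platform.machine().endswith("64")

    if system == "Darwin":  # macOS
        return f"{base}/Banana.app"
    if system == "Windows":
        folder = "Banana_Windows_x86_64" if is_64bit else "Banana_Windows_x86"
        return f"{base}/{folder}/Banana.exe"
    if system == "Linux":
        folder = "Banana_Linux_NoVis" if headless else "Banana_Linux"
        binary = "Banana.x86_64" if is_64bit else "Banana.x86"
        return f"{base}/{folder}/{binary}"
    raise ValueError(f"No Banana build listed for platform: {system}")


print(banana_env_path(system="Linux", is_64bit=True, headless=True))
```

The result could then be passed as `file_name` to `UnityEnvironment`, provided the environment was downloaded to that location.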

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict, deque

In [3]:
from dqn_agent import Agent  # local module that defines the DQN agent

### 2. Load the Trained Model

In [4]:
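def load_trained_agent(filepath):
    """Rebuild an Agent from a saved checkpoint and restore its Q-network weights."""
    # map_location='cpu' lets a checkpoint saved on a GPU load on a CPU-only machine
    checkpoint = torch.load(filepath, map_location='cpu')

    # Recreate the agent with the same hyperparameters it was trained with
    agent = Agent(checkpoint['state_size'],
                  checkpoint['action_size'],
                  checkpoint['hidden_layers'],
                  checkpoint['p'],
                  checkpoint['seed'])

    # Copy the trained weights into the local Q-network
    agent.qnetwork_local.load_state_dict(checkpoint['state_dict'])

    return agent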
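`load_trained_agent` reads six keys from the checkpoint, so a missing key only surfaces as a `KeyError` partway through loading. The sketch below shows one way to check for them up front. `missing_checkpoint_keys` is a hypothetical helper, and the example dictionary is illustrative: the sizes 37 and 4 match the environment, while the `hidden_layers` value is made up.

```python
# Keys that load_trained_agent reads from the checkpoint dictionary
REQUIRED_KEYS = ("state_size", "action_size", "hidden_layers",
                 "p", "seed", "state_dict")


def missing_checkpoint_keys(checkpoint):
    """Hypothetical helper: list the required keys absent from a checkpoint."""
    return [key for key in REQUIRED_KEYS if key not in checkpoint]


# Illustrative, incomplete checkpoint (hidden_layers is a made-up value)
example = {"state_size": 37, "action_size": 4, "hidden_layers": [64, 64]}
print(missing_checkpoint_keys(example))  # ['p', 'seed', 'state_dict']
```

An empty list means every key the loader needs is present.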

In [5]:
!pwd

/home/chuqui/Fernando/DRL/Navigation_Project-Banannas_Collector


In [6]:
env = UnityEnvironment(file_name="./Banana_Linux/Banana.x86_64")
#env = UnityEnvironment(file_name="./Banana_Linux_NoVis/Banana.x86_64")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


In [7]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

In [9]:
# Instantiate an agent with an already trained Q-Network
agent = load_trained_agent('checkpoint.pth')

env_info = env.reset(train_mode=False)[brain_name]
score = 0
while True:
    state = env_info.vector_observations[0]   # get the current state
    action = agent.act(state)                 # select an action with the trained agent
    env_info = env.step(action)[brain_name]   # send the action to the environment
    reward = env_info.rewards[0]              # get the reward
    done = env_info.local_done[0]             # see if the episode has finished
    score += reward
    if done:
        break

print('Score: {:.2f}'.format(score))

Score: 22.00


In [14]:
def run_agent():
    """Run one episode with the trained agent and return its total score."""
    env_info = env.reset(train_mode=False)[brain_name]
    score = 0
    while True:
        state = env_info.vector_observations[0]   # get the current state
        action = agent.act(state)                 # select an action with the trained agent
        env_info = env.step(action)[brain_name]   # send the action to the environment
        reward = env_info.rewards[0]              # get the reward
        done = env_info.local_done[0]             # see if the episode has finished
        score += reward
        if done:
            return score

In [28]:
hundred_runs = [run_agent() for _ in range(100)]  # scores of 100 evaluation episodes
hundred_runs

[11.0,
 7.0,
 22.0,
 9.0,
 12.0,
 17.0,
 10.0,
 13.0,
 14.0,
 14.0,
 16.0,
 5.0,
 14.0,
 16.0,
 13.0,
 0.0,
 19.0,
 12.0,
 9.0,
 11.0,
 11.0,
 9.0,
 17.0,
 21.0,
 11.0,
 5.0,
 2.0,
 0.0,
 9.0,
 17.0,
 13.0,
 0.0,
 4.0,
 4.0,
 11.0,
 14.0,
 17.0,
 18.0,
 28.0,
 5.0,
 15.0,
 11.0,
 3.0,
 13.0,
 5.0,
 4.0,
 2.0,
 13.0,
 15.0,
 9.0,
 4.0,
 15.0,
 12.0,
 18.0,
 9.0,
 17.0,
 16.0,
 7.0,
 4.0,
 16.0,
 0.0,
 13.0,
 13.0,
 11.0,
 9.0,
 14.0,
 16.0,
 17.0,
 16.0,
 10.0,
 4.0,
 16.0,
 18.0,
 8.0,
 18.0,
 3.0,
 16.0,
 12.0,
 9.0,
 21.0,
 13.0,
 13.0,
 11.0,
 14.0,
 2.0,
 3.0,
 1.0,
 14.0,
 18.0,
 21.0,
 20.0,
 21.0,
 12.0,
 2.0,
 0.0,
 5.0,
 11.0,
 8.0,
 4.0,
 1.0]

In [30]:
np.max(hundred_runs)

28.0
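The maximum alone says little about how the agent performs on a typical episode. A minimal sketch of a fuller summary follows. `summarize_scores` is a hypothetical helper, and the default `target` of 13 is an assumption about the project's success threshold, not something stated in this notebook. It uses the standard-library `statistics` module so it runs without extra packages.

```python
from statistics import mean, pstdev


def summarize_scores(scores, target=13.0):
    """Hypothetical helper: summarize a list of per-episode scores.

    `target` is an assumed success threshold, not taken from this notebook.
    """
    scores = [float(s) for s in scores]
    return {
        "episodes": len(scores),
        "mean": mean(scores),
        "std": pstdev(scores),  # population standard deviation
        "min": min(scores),
        "max": max(scores),
        "share_at_or_above_target": sum(s >= target for s in scores) / len(scores),
    }


# The first four scores from the run above, used only as sample input
print(summarize_scores([11.0, 7.0, 22.0, 9.0]))
```

Calling `summarize_scores(hundred_runs)` would give the same summary for all 100 episodes.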

In [8]:
#env.close()
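Leaving `env.close()` commented out keeps the Unity process alive after the evaluation finishes. One option is to wrap the evaluation in `contextlib.closing`, which calls `close()` on exit even if an episode raises. The sketch below is an assumption about how this could be organized: `DummyEnv` is a stand-in that models only the `close()` method, and `evaluate` is a hypothetical helper.

```python
from contextlib import closing


class DummyEnv:
    """Stand-in for UnityEnvironment; it only models the close() method."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def evaluate(env, n_episodes, run_episode):
    """Hypothetical helper: run episodes, then close the environment."""
    with closing(env):  # calls env.close() on exit, even after an exception
        return [run_episode(env) for _ in range(n_episodes)]


env_stub = DummyEnv()
scores = evaluate(env_stub, 3, lambda e: 0.0)  # placeholder episode function
print(scores, env_stub.closed)
```

Once closed, the environment has to be created again before any further episodes can run.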