# Deep Q-Learning - Banana Navigation - Watch Agents

This notebook lets you watch an untrained agent and a trained agent navigate the environment. 
Training itself is done in `NavigationTraining.ipynb`. 
Here we only load the trained model, which is stored in `checkpoint.pth`.

## Imports and setup
For details on the imports and the environment setup, see `NavigationTraining.ipynb`.

In [12]:
from unityagents import UnityEnvironment
import numpy as np
import torch

from dqn_agent import Agent
import model as dqn_model

In [18]:
# make sure the path to the Unity environment is set correctly in the line below
env = UnityEnvironment(file_name="path/to/Banana_Windows_x86/Banana.exe")

In [14]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

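# reset the environment so that env_info is defined before it is read
env_info = env.reset(train_mode=False)[brain_name]

# number of actions and states
action_size = brain.vector_action_space_size
state = env_info.vector_observations[0]
state_size = len(state)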

## Watch an untrained agent
The agent takes random actions.

The result should look like this:

<img style="float: left;" src="img/banana_random.gif" width="400"/>

In [8]:
env_info = env.reset(train_mode=False)[brain_name] # reset the environment

state = env_info.vector_observations[0]            # get the current state

score = 0                                          # initialize the score
while True:
    action = np.random.randint(action_size)        # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break
        
print("Score: {}".format(score))

Score: 2.0


## Watch a trained agent

#### Create a new agent and pass it the pretrained model

The function `load_model_checkpoint(filepath)` reads the checkpoint from file, rebuilds the network from the stored layer sizes, loads the trained weights, and returns the model.

In [16]:
def load_model_checkpoint(filepath):
    checkpoint = torch.load(filepath)
    model = dqn_model.QNetwork(state_size=checkpoint['input_size'],
                               action_size=checkpoint['output_size'],
                               seed=0,
                               hidden_sizes=checkpoint['hidden_layers'],
                              )
    model.load_state_dict(checkpoint['state_dict'])
    
    return model

agent = Agent(state_size=state_size, action_size=action_size, seed=0)
model = load_model_checkpoint('checkpoint.pth')
agent.qnetwork_local = model
print(model)

QNetwork(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=37, out_features=512, bias=True)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (output): Linear(in_features=256, out_features=4, bias=True)
  (dropout): Dropout(p=0.05, inplace=False)
)
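
`load_model_checkpoint` relies on four keys being present in the checkpoint: `input_size`, `output_size`, `hidden_layers` and `state_dict`. How `NavigationTraining.ipynb` writes the file is not shown here, so the following is only a sketch of a dictionary with that layout. The helper name `make_checkpoint` is hypothetical, and treating `hidden_layers` as a plain list of layer widths is an assumption based on the printed network above.

```python
def make_checkpoint(state_dict, input_size, output_size, hidden_layers):
    """Bundle weights with the sizes needed to rebuild the network (hypothetical helper)."""
    return {
        'input_size': input_size,
        'output_size': output_size,
        'hidden_layers': list(hidden_layers),  # assumed: list of hidden layer widths
        'state_dict': state_dict,
    }


# Sizes taken from the printed QNetwork: 37 inputs, hidden layers 512/256/256, 4 actions.
# An empty dict stands in for model.state_dict() so the sketch runs without a model.
checkpoint = make_checkpoint({}, input_size=37, output_size=4,
                             hidden_layers=[512, 256, 256])
print(sorted(checkpoint))
```

With a real model, `torch.save(make_checkpoint(model.state_dict(), 37, 4, [512, 256, 256]), 'checkpoint.pth')` would write a file with the keys that `load_model_checkpoint` reads.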


#### Watch the trained agent by running the cell below.

The result should look like this:

<img style="float: left;" src="img/banana_trained.gif" width="400"/>

In [17]:
env_info = env.reset(train_mode=False)[brain_name] # reset the environment

state = env_info.vector_observations[0]            # get the current state

score = 0                                          # initialize the score
while True:
    action = np.int16(agent.act(state)).item()     # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break
    
print("Score: {}".format(score))

Score: 21.0
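
The two episode loops above differ only in how the action is chosen. As a possible refactoring, the shared part could live in one function that takes the action choice as a callable. This is a minimal sketch, not part of the original notebook; the name `run_episode` and its `select_action` parameter are hypothetical, and it assumes the same `env` and `brain_name` objects used above.

```python
def run_episode(env, brain_name, select_action):
    """Play one episode and return the total reward.

    select_action is any callable that maps a state to an integer action.
    """
    env_info = env.reset(train_mode=False)[brain_name]  # reset the environment
    state = env_info.vector_observations[0]             # get the current state
    score = 0                                           # initialize the score
    while True:
        action = select_action(state)                   # select an action
        env_info = env.step(action)[brain_name]         # send the action to the environment
        state = env_info.vector_observations[0]         # roll over to the next state
        score += env_info.rewards[0]                    # update the score
        if env_info.local_done[0]:                      # exit loop if episode finished
            break
    return score
```

With the objects defined in this notebook, `run_episode(env, brain_name, lambda s: np.random.randint(action_size))` would replay the random agent, and `run_episode(env, brain_name, lambda s: np.int16(agent.act(s)).item())` the trained one.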


#### When done, close the environment.

In [19]:
env.close()