## Visualising the trained agents!

Since the agents were trained on Google Cloud Platform, we used a headless build of the environment for training.
To visualise the trained Crawler agents, however, we need the normal (windowed) Unity environment.

***Refer to the README.md file to download the Crawler Unity Environment for your system.***

In [1]:
from unityagents import UnityEnvironment
import random
import datetime
import torch
import numpy as np
from collections import deque
import matplotlib.pyplot as plt
%matplotlib inline

from ddpg_agent import Agent

# imports for rendering outputs in Jupyter.
from JSAnimation.IPython_display import display_animation
from matplotlib import animation
from IPython.display import display

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

## Load unity environment

In [2]:
# visualise multi agent environment
env = UnityEnvironment(file_name='unity_envs/Crawler.app')

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: CrawlerBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 129
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 20
        Vector Action descriptions: , , , , , , , , , , , , , , , , , , , 


In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

Let's look at the environment details.

In [4]:
# reset the environment
env_info = env.reset(train_mode=False)[brain_name]

# number of agents
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space 
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])

Number of agents: 12
Size of each action: 20
There are 12 agents. Each observes a state with length: 129
The state for the first agent looks like: [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  2.25000000e+00
  1.00000000e+00  0.00000000e+00  1.78813934e-07  0.00000000e+00
  1.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  6.06093168e-01 -1.42857209e-01 -6.06078804e-01  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  1.33339906e+00 -1.42857209e-01
 -1.33341408e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
 -6.0609

## Define the agent

In [5]:
# defining the ddpg agent
agent = Agent(state_size=state_size,
              action_size=action_size,
              num_agents=num_agents, random_seed=0)

Load weights from saved checkpoint files

In [6]:
agent.actor_local.load_state_dict(torch.load('trained_models/checkpoint_actor.pth', map_location='cpu'))
agent.critic_local.load_state_dict(torch.load('trained_models/checkpoint_critic.pth', map_location='cpu'))
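
As a rough illustration of what the loading step does, the sketch below round-trips a `state_dict` through `torch.save` and `torch.load` with `map_location='cpu'`, which remaps tensors saved from a GPU onto the CPU. The `make_actor` network is a hypothetical stand-in, not the architecture defined in `ddpg_agent`, and an in-memory buffer replaces the `.pth` files.

```python
import io

import torch
import torch.nn as nn


def make_actor(state_size=129, action_size=20):
    # Hypothetical stand-in network; the real actor is defined in ddpg_agent.
    return nn.Sequential(
        nn.Linear(state_size, 64),
        nn.ReLU(),
        nn.Linear(64, action_size),
        nn.Tanh(),
    )


# "Training" side: serialise only the parameters, not the whole module.
trained = make_actor()
buffer = io.BytesIO()  # in-memory file used here instead of a .pth on disk
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# "Visualisation" side: rebuild the same architecture, then load the weights.
restored = make_actor()
state_dict = torch.load(buffer, map_location='cpu')  # force tensors onto the CPU
restored.load_state_dict(state_dict)
restored.eval()  # inference mode for layers such as dropout or batch norm

# Both networks should now produce the same output for the same input.
x = torch.zeros(1, 129)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x))
print(same)
```

`load_state_dict` only succeeds when the rebuilt network has the same layer names and shapes as the one that was saved, which is why the agent is constructed before the checkpoints are loaded.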

## Visualize

A new window should pop up on running the following cell, showing the trained agents!

In [10]:
env_info = env.reset(train_mode=False)[brain_name]
states = env_info.vector_observations
agent.reset()
score = np.zeros(num_agents)
while True:
    actions = agent.act(states, add_noise=False)       # deterministic actions, no exploration noise
    env_info = env.step(actions)[brain_name]
    next_states = env_info.vector_observations         # get next state (for each agent)
    rewards = env_info.rewards                         # get reward (for each agent)
    dones = env_info.local_done                        # see if episode finished
    states = next_states
    score += rewards
    if np.any(dones):
        break
avg_agent_score = np.mean(score)
print("Avg Episode Reward: {}".format(avg_agent_score))

Avg Episode Reward: 60.97586840948012
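
The episode loop above can be sketched without Unity. `StubEnv`, `zero_policy` and `run_episode` are hypothetical helpers invented for illustration. The rewards are random, so the printed score means nothing beyond showing how per-agent rewards accumulate and how the loop stops as soon as any agent reports done.

```python
import numpy as np


class StubEnv:
    """Hypothetical stand-in for the Unity environment: fixed-length episodes, random rewards."""

    def __init__(self, num_agents, state_size, episode_length, seed=0):
        self.num_agents = num_agents
        self.state_size = state_size
        self.episode_length = episode_length
        self.rng = np.random.default_rng(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros((self.num_agents, self.state_size))

    def step(self, actions):
        self.t += 1
        next_states = self.rng.standard_normal((self.num_agents, self.state_size))
        # Rewards and dones are plain lists here, one entry per agent.
        rewards = self.rng.uniform(0.0, 0.1, size=self.num_agents).tolist()
        dones = [self.t >= self.episode_length] * self.num_agents
        return next_states, rewards, dones


def zero_policy(states):
    # Placeholder policy: one 20-dimensional zero action per agent.
    return np.zeros((states.shape[0], 20))


def run_episode(env, policy):
    states = env.reset()
    score = np.zeros(env.num_agents)   # one running total per agent
    while True:
        actions = policy(states)
        states, rewards, dones = env.step(actions)
        score += rewards               # the list is broadcast onto the array
        if np.any(dones):              # stop when any agent finishes
            break
    return score


env = StubEnv(num_agents=12, state_size=129, episode_length=50)
scores = run_episode(env, zero_policy)
print(scores.shape, float(np.mean(scores)))
```

A single episode gives a noisy estimate, so calling `run_episode` several times and averaging the results would give a steadier number.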


Close the environment after visualisation

In [11]:
env.close()