# A2C Continuous Control

---

In this notebook, you will train and evaluate an A2C agent in the Unity ML-Agents Reacher environment for the second project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [None]:
from unityagents import UnityEnvironment
import numpy as np
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
from A2C_agent import A2CAgent
from train_a2c import A2CConfig, load_weights, train_agent, evaluate_current_weights

In [None]:
env = UnityEnvironment(file_name='../environments/Reacher_20/Reacher.x86_64')

### 2. Init Agent
Read the state and action sizes from the environment, then initialize an A2C agent.

In [None]:
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]

# number of agents
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])

# Configure A2C with the best-performing hyperparameters
a2c_config = A2CConfig()
a2c_config.learning_rate = 2e-4
a2c_config.gamma = 0.95
a2c_config.use_gae = True
a2c_config.gae_tau = 0.95

agent = A2CAgent(state_dim=state_size, action_dim=action_size, a2c_config=a2c_config, num_agents=num_agents)
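
The `gamma`, `use_gae` and `gae_tau` settings above suggest that advantages are estimated with generalized advantage estimation (GAE). How `A2CAgent` computes them is not shown in this notebook, so the following is only a minimal NumPy sketch of the standard GAE recursion. The function name `compute_gae` and the array shapes are assumptions made for illustration, not part of the project code.

```python
import numpy as np

def compute_gae(rewards, values, dones, gamma=0.95, tau=0.95):
    """Hypothetical helper: standard GAE over a rollout of T steps and N agents.

    rewards, dones: arrays of shape (T, N)
    values:         array of shape (T + 1, N); the last row is the bootstrap value
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)

    n_steps = rewards.shape[0]
    advantages = np.zeros_like(rewards)
    gae = np.zeros(rewards.shape[1:])

    # Walk the rollout backwards, accumulating discounted TD errors.
    for t in reversed(range(n_steps)):
        mask = 1.0 - dones[t]  # stop bootstrapping at episode ends
        delta = rewards[t] + gamma * values[t + 1] * mask - values[t]
        gae = delta + gamma * tau * mask * gae
        advantages[t] = gae

    # Critic targets: advantage plus the value baseline.
    returns = advantages + values[:-1]
    return advantages, returns
```

With `tau = 0` this reduces to the one-step TD error, and with `tau = 1` it becomes the discounted return minus the value baseline. Values in between trade bias against variance.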

### 3. Train your agent
The following section trains a new agent. The weights are only saved once the agent reaches an average of 30+ points. If you want to test the pre-trained weights, skip to Section 5.

In [None]:
weight_dir = '../a2c_weights/new.pth'
train_agent(env, agent, brain_name, num_agents, weight_dir=weight_dir, n_episodes=200)
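
The saving rule described in this section (weights are written only once the average score reaches 30+ points) could be tracked roughly as follows. This is a sketch, not the logic inside `train_agent`: the class name `SaveTracker`, the 100-episode window, and the "save only on improvement" behaviour are all assumptions.

```python
from collections import deque

class SaveTracker:
    """Hypothetical helper that decides when a checkpoint should be written."""

    def __init__(self, target=30.0, window=100):
        self.target = target
        self.scores = deque(maxlen=window)  # keeps only the most recent scores
        self.best_average = float('-inf')

    def update(self, episode_score):
        """Record one episode score and return True if weights should be saved."""
        self.scores.append(episode_score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough episodes yet for a full window
        average = sum(self.scores) / len(self.scores)
        if average >= self.target and average > self.best_average:
            self.best_average = average
            return True
        return False
```

A training loop would call `update` once per episode and save the weights whenever it returns `True`.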

### 4. Test your agent
Here you can test the weights you trained in Section 3. Note that the weights are only saved once the agent reaches an average of 30+ points. If you want to test the pre-trained weights instead, skip to Section 5.

In [None]:
# Load the weights saved in Section 3 (weight_dir is defined there)
load_weights(agent, weight_dir)
evaluate_current_weights(env, agent, brain_name, num_agents, n_episodes=5, train_mode=False)
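
With several agents running in parallel, a single episode score has to be aggregated across agents. How `evaluate_current_weights` does this is not shown here. A common convention, assumed in the sketch below, is to sum each agent's rewards over the episode and then average across agents. The function name `episode_score` is hypothetical.

```python
import numpy as np

def episode_score(step_rewards):
    """Hypothetical helper: aggregate per-step rewards of shape (T, N).

    Returns the per-agent episode totals and their mean across agents.
    """
    rewards = np.asarray(step_rewards, dtype=np.float64)
    per_agent = rewards.sum(axis=0)  # total reward collected by each agent
    return per_agent, float(per_agent.mean())
```

For example, `episode_score([[0.1, 0.0], [0.1, 0.1]])` sums two steps for two agents and averages the two totals.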

### 5. Test pre-trained agent
The following lines load and test a pre-trained agent.

In [None]:
a2c_config = A2CConfig()
a2c_config.learning_rate = 2e-4
a2c_config.gamma = 0.95
a2c_config.use_gae = True
a2c_config.gae_tau = 0.95

agent = A2CAgent(state_dim=state_size, action_dim=action_size, a2c_config=a2c_config, num_agents=num_agents)
load_weights(agent, '../a2c_weights/best.pth')
evaluate_current_weights(env, agent, brain_name, num_agents, n_episodes=5, train_mode=False)

In [None]:
env.close()