# Navigation in the Environment

---

### 1. Start the Environment

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from ac_continuous_control.environment import UnityEnvironmentWrapper

In [3]:
# reacher_env = UnityEnvironmentWrapper(env_binary='../bin/reacher_single/Reacher.x86_64', train_mode=True)
reacher_env = UnityEnvironmentWrapper(env_binary='../bin/reacher_multi/Reacher.x86_64', train_mode=True)

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_size -> 5.0
		goal_speed -> 1.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_**, which are responsible for deciding the actions of their associated agents. As the log above shows, this environment has a single external brain, `ReacherBrain`.

### 2. Examine the State and Action Spaces

The simulation contains multiple agents, each controlling a robotic arm. At each time step, an agent controls the joints of its arm with continuous actions. The action space has `4` dimensions per agent.

The state space has `33` dimensions per agent.

Run the code cell below to print some information about the environment.

In [4]:
starting_states = reacher_env.reset()

print(f'We have {reacher_env.num_agents} parallel worlds')

action_size = reacher_env.action_size
print('Number of actions:', action_size)

print('States look like:\n', starting_states[0])

state_size = reacher_env.state_size
print('States have length:', state_size)

We have 20 parallel worlds
Number of actions: 4
States look like:
 [  0.00000000e+00  -4.00000000e+00   0.00000000e+00   1.00000000e+00
  -0.00000000e+00  -0.00000000e+00  -4.37113883e-08   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00  -1.00000000e+01   0.00000000e+00
   1.00000000e+00  -0.00000000e+00  -0.00000000e+00  -4.37113883e-08
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   7.90150833e+00  -1.00000000e+00
   1.25147629e+00   0.00000000e+00   1.00000000e+00   0.00000000e+00
  -5.22214413e-01]
States have length: 33
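
To make the dimensions concrete, here is a minimal sketch of the array shapes involved. It does not touch the Unity environment: the arrays are placeholders, and the batch layout (one row per agent) is an assumption inferred from `starting_states[0]` printing a single 33-element vector.

```python
import numpy as np

NUM_AGENTS = 20   # "We have 20 parallel worlds"
STATE_SIZE = 33   # per-agent observation length
ACTION_SIZE = 4   # per-agent action length

# Placeholder observations; assumed layout is one row per agent.
states = np.zeros((NUM_AGENTS, STATE_SIZE), dtype=np.float32)

# One action vector per agent, every component clipped to [-1, 1].
actions = np.clip(np.random.randn(NUM_AGENTS, ACTION_SIZE), -1, 1)

print(states.shape)     # (20, 33)
print(states[0].shape)  # (33,)
print(actions.shape)    # (20, 4)
```

Under this layout, indexing with `[0]` selects the first agent's observation, which matches the printout above.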


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agents and receive feedback from the environment.

Once this cell is executed, you can watch how the agents perform when they select actions at random at each time step. Each action component is drawn from a standard normal distribution and then clipped to the range [-1, 1]. A window should pop up that lets you observe the agents as they move through the environment.

In [5]:
import numpy as np

num_agents = reacher_env.num_agents
action_size = reacher_env.action_size

states = reacher_env.reset()                                  # reset env
scores = np.zeros(num_agents)                                 # initialize the score (for each agent)
while True:
    actions = np.random.randn(num_agents, action_size)        # sample standard-normal actions (one row per agent)
    actions = np.clip(actions, -1, 1)                         # clip every component to [-1, 1]
    
    next_states, rewards, dones = reacher_env.step(actions)
    scores += rewards                               
    states = next_states
    
    if np.any(dones):
        break
    
print("Score: {}".format(scores))

Score: [ 0.          0.88999998  0.          0.          0.          0.          0.09
  0.          0.          0.          0.17        0.          0.          0.06
  0.25999999  0.67999998  0.44999999  0.          0.35999999  0.        ]
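
The loop above draws actions with `np.random.randn` and clips them to [-1, 1], which is not the same as sampling uniformly on that interval. The sketch below compares the two schemes on a large sample. It is only an illustration: the seed, the sample size, and the variable names are arbitrary choices, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
shape = (10_000, 4)              # many samples of a 4-dimensional action

# Scheme used in the loop above: standard normal, then clip to [-1, 1].
clipped_normal = np.clip(rng.standard_normal(shape), -1, 1)

# Alternative scheme: uniform on [-1, 1).
uniform = rng.uniform(-1.0, 1.0, size=shape)

# Fraction of action components that sit exactly on a bound.
at_bounds_normal = np.mean(np.abs(clipped_normal) == 1.0)
at_bounds_uniform = np.mean(np.abs(uniform) == 1.0)

print(f"clipped normal at bounds: {at_bounds_normal:.3f}")
print(f"uniform at bounds:        {at_bounds_uniform:.3f}")
```

Because about 32% of standard-normal draws fall outside [-1, 1], clipping pushes roughly that share of components to exactly -1 or 1, so the random policy applies saturated torques far more often than a uniform sampler would.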


When finished, you can close the environment.

In [6]:
reacher_env.close()
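
If the rollout loop raises an exception, the `close()` call above is never reached and the Unity process may stay open. One way to guard against that is a `try`/`finally` block. The sketch below uses `StubEnv`, a hypothetical stand-in that only mimics the `reset`/`step`/`close` calls used in this notebook; it is not the real `UnityEnvironmentWrapper`, and its five-step episode length is made up.

```python
import numpy as np


class StubEnv:
    """Hypothetical stand-in for the wrapper; returns zeros everywhere."""
    num_agents, action_size, state_size = 20, 4, 33

    def __init__(self):
        self.closed = False
        self._t = 0

    def reset(self):
        self._t = 0
        return np.zeros((self.num_agents, self.state_size))

    def step(self, actions):
        self._t += 1
        next_states = np.zeros((self.num_agents, self.state_size))
        rewards = np.zeros(self.num_agents)
        # Made-up rule: every agent's episode ends after 5 steps.
        dones = np.full(self.num_agents, self._t >= 5)
        return next_states, rewards, dones

    def close(self):
        self.closed = True


env = StubEnv()
try:
    states = env.reset()
    scores = np.zeros(env.num_agents)
    while True:
        actions = np.clip(np.random.randn(env.num_agents, env.action_size), -1, 1)
        states, rewards, dones = env.step(actions)
        scores += rewards
        if np.any(dones):
            break
finally:
    env.close()  # runs even if the loop above raises

print(env.closed)  # True
```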