# Exploring the Tennis Environment

---

### 1. Start the Environment

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from pddpg_tennis.environment import UnityEnvironmentWrapper

In [3]:
reacher_env = UnityEnvironmentWrapper(env_binary='../bin/tennis/Tennis.x86_64', train_mode=True)

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents.

### 2. Examine the State and Action Spaces

The simulation contains 2 agents that control tennis rackets. At each time step, each agent chooses 2 continuous actions: moving its racket forward/backward and up/down.

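The state space has `24` dimensions per agent: 8 observation variables stacked over 3 consecutive observations, as reported in the environment log above.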

Run the code cell below to print some information about the environment.

In [4]:
starting_states = reacher_env.reset()

print(f'We have {reacher_env.num_agents} agents')

action_size = reacher_env.action_size
print('Number of actions:', action_size)

print('States look like:\n', starting_states[0])

state_size = reacher_env.state_size
print('States have length:', state_size)

We have 2 agents
Number of actions: 2
States look like:
 [ 0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.         -7.38993645 -1.5        -0.          0.
  6.83172083  5.99607611 -0.          0.        ]
States have length: 24
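To make the 24-dimensional layout concrete, here is a minimal sketch that splits the printed state into its stacked observations. The sizes 8 and 3 come from the environment log. The oldest-first ordering is an assumption, suggested only by the fact that the first 16 entries are zero right after a reset; the names `OBS_SIZE`, `NUM_STACKED`, and `frames` are introduced here for illustration.

```python
import numpy as np

# First agent's starting state, copied from the output above.
state = np.array([
    0., 0., 0., 0., 0., 0., 0., 0.,
    0., 0., 0., 0., 0., 0., 0., 0.,
    -7.38993645, -1.5, -0., 0., 6.83172083, 5.99607611, -0., 0.,
])

OBS_SIZE = 8      # "Vector Observation space size (per agent)" in the log
NUM_STACKED = 3   # "Number of stacked Vector Observation" in the log

# Assumption: observations are concatenated oldest-first, one per row.
frames = state.reshape(NUM_STACKED, OBS_SIZE)
latest = frames[-1]   # most recent observation under that assumption

print(frames.shape)
print(latest)
```

Under this assumption, only the last row is populated after a reset because no earlier observations exist yet.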


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agents and receive feedback from the environment.

Once this cell is executed, you can watch how the agents perform when each one selects a random action at every time step. The actions are drawn from a standard normal distribution and clipped to the range [-1, 1]. A window should pop up that allows you to observe the agents as they move through the environment.

In [5]:
import numpy as np

num_agents = reacher_env.num_agents
action_size = reacher_env.action_size

for run in range(10):
    states = reacher_env.reset()                                  # reset env
    scores = np.zeros(num_agents)                                 # initialize the score (for each agent)
    while True:
        actions = np.random.randn(num_agents, action_size)        # sample standard-normal actions (for each agent)
        actions = np.clip(actions, -1, 1)                         # clip all actions to [-1, 1]

        next_states, rewards, dones = reacher_env.step(actions)   # advance the environment by one step
        scores += rewards                                         # accumulate the reward (for each agent)
        states = next_states

        if np.any(dones):
            break

    print(f"Score from run {run}: {np.max(scores)}, [raw: {scores}]")

Score from run 0: 0.09000000357627869, [raw: [ 0.    0.09]]
Score from run 1: 0.0, [raw: [-0.01  0.  ]]
Score from run 2: 0.0, [raw: [ 0.   -0.01]]
Score from run 3: 0.0, [raw: [-0.01  0.  ]]
Score from run 4: 0.0, [raw: [ 0.   -0.01]]
Score from run 5: 0.0, [raw: [ 0.   -0.01]]
Score from run 6: 0.0, [raw: [ 0.   -0.01]]
Score from run 7: 0.10000000149011612, [raw: [-0.01  0.1 ]]
Score from run 8: 0.0, [raw: [ 0.   -0.01]]
Score from run 9: 0.0, [raw: [-0.01  0.  ]]
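The reported score for each run is the larger of the two agents' accumulated rewards. The sketch below isolates that bookkeeping so it can be checked without launching Unity. The helper `episode_score` and the reward history are hypothetical and exist only for illustration.

```python
import numpy as np

def episode_score(step_rewards):
    """Sum rewards per agent over an episode; return (best, per_agent)."""
    per_agent = np.sum(np.asarray(step_rewards, dtype=float), axis=0)
    return float(np.max(per_agent)), per_agent

# Hypothetical reward history: 3 time steps, 2 agents.
rewards = [[0.0, 0.0], [0.0, 0.1], [-0.01, 0.0]]
best, per_agent = episode_score(rewards)

print(f"Score: {best}, [raw: {per_agent}]")
```

This mirrors the `np.max(scores)` call in the loop above: one agent can finish with a small negative total while the run still reports the other agent's positive score.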


When finished, you can close the environment.

In [6]:
reacher_env.close()
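If an earlier cell raises an exception, the `close()` call may never run and the environment process can be left open. One way to guard against that is `contextlib.closing`, which calls `close()` on exit even when an error occurs. The sketch below uses a hypothetical `StandInEnv` class in place of the real wrapper so it runs without the Unity binary; it assumes only that the wrapper exposes `close()`, as shown above.

```python
from contextlib import closing

class StandInEnv:
    """Hypothetical stand-in for the wrapper; it only mimics close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

env = StandInEnv()
try:
    with closing(env):
        # Simulate a failure in the middle of an episode.
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(env.closed)  # True: close() ran despite the exception
```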