# Collaboration and Competition

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the third project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

In [None]:
import os
import sys

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from unityagents import UnityEnvironment

## Create the Unity environment

The code cell above imports the necessary packages. If it returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

__Before running the code cell below__, change `ENVIRONMENT_PATH` to match the location of the Unity environment that you downloaded. The commented-out line shows the path for the Linux build.

In [None]:
ENVIRONMENT_PATH = os.path.join("..", "environments", "Tennis.app")
#ENVIRONMENT_PATH = os.path.join("..", "environments", "Tennis_Linux", "Tennis.x86_64")
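If you switch between machines, a small helper can choose the path from `sys.platform` instead of requiring you to edit the cell by hand. This is a hypothetical sketch that only knows the two builds referenced above (the macOS `Tennis.app` and the Linux `Tennis_Linux/Tennis.x86_64`). Any other platform raises an error, so you can fall back to setting `ENVIRONMENT_PATH` manually.

```python
import os
import sys


def default_environment_path(platform=sys.platform,
                             env_dir=os.path.join("..", "environments")):
    """Return the Tennis build path for a platform (hypothetical helper)."""
    if platform == "darwin":
        # macOS build, as in the active ENVIRONMENT_PATH line above
        return os.path.join(env_dir, "Tennis.app")
    if platform.startswith("linux"):
        # Linux build, as in the commented-out ENVIRONMENT_PATH line above
        return os.path.join(env_dir, "Tennis_Linux", "Tennis.x86_64")
    # Unknown platform: make the user set the path explicitly
    raise ValueError(
        f"No Tennis build path known for platform {platform!r}; "
        "set ENVIRONMENT_PATH by hand"
    )
```

With the helper defined, `ENVIRONMENT_PATH = default_environment_path()` replaces the hard-coded assignment on macOS and Linux.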

In [None]:
SEED = 0
SRC_PATH = os.path.join("..", "src")
AGENT_CHECKPOINT_DIR = os.path.join("..", "models")

In [None]:
sys.path.append(SRC_PATH)

In [None]:
from environments import UnityEnvWrapper

In [None]:
env = UnityEnvWrapper(UnityEnvironment(file_name=ENVIRONMENT_PATH))
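The `UnityEnvWrapper` class comes from the project's `src/environments` module, which is not shown in this notebook. Judging only from how it is used below (`reset` returns the states, and `step` returns `next_states, rewards, dones` as NumPy arrays), a minimal sketch of such a wrapper might look like this. The class name `SimpleUnityEnvWrapper` is hypothetical, and the sketch assumes the single-brain `BrainInfo` API of `unityagents` (`brain_names`, `vector_observations`, `rewards`, `local_done`); it is not the project's actual implementation.

```python
import numpy as np


class SimpleUnityEnvWrapper:
    """Hypothetical wrapper around a unityagents UnityEnvironment.

    Assumes env.reset(...) and env.step(...) return a dict keyed by brain
    name whose values expose vector_observations, rewards and local_done.
    """

    def __init__(self, env):
        self.env = env
        # Assumption: the Tennis build exposes a single brain for both agents
        self.brain_name = env.brain_names[0]

    def reset(self, train_mode=True):
        info = self.env.reset(train_mode=train_mode)[self.brain_name]
        # One observation row per agent
        return np.asarray(info.vector_observations)

    def step(self, actions):
        info = self.env.step(actions)[self.brain_name]
        next_states = np.asarray(info.vector_observations)
        rewards = np.asarray(info.rewards, dtype=float)  # one reward per agent
        dones = np.asarray(info.local_done, dtype=bool)  # one done flag per agent
        return next_states, rewards, dones

    def close(self):
        self.env.close()
```

Hiding the brain-name lookup inside the wrapper keeps the training loop free of `[brain_name]` indexing, which is why the cells below can call `env.reset(...)` and `env.step(...)` directly.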

### Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agents and receive feedback from the environment.

Once this cell is executed, you will watch the agents play while selecting actions at random at each time step. A window should pop up that allows you to observe the agents.

Of course, as part of the project, you'll have to change the code so that the agents are able to use their experiences to gradually choose better actions when interacting with the environment!

In [None]:
num_agents = 2
action_size = 2
for i in range(1, 6):                                      # play game for 5 episodes
    states = env.reset(train_mode=False)                   # reset the environment    
    scores = np.zeros(num_agents)                          # initialize the score (for each agent)
    while True:
        actions = np.random.randn(num_agents, action_size) # select an action (for each agent)
        actions = np.clip(actions, -1, 1)                  # all actions between -1 and 1
        next_states, rewards, dones = env.step(actions)    # send all actions to the environment
        scores += rewards                                  # update the score (for each agent)
        states = next_states                               # roll over states to next time step
        if np.any(dones):                                  # exit loop if episode finished
            break
    print('Score (max over agents) from episode {}: {}'.format(i, np.max(scores)))
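The loop above reports each episode's score as the maximum, over the two agents, of their summed rewards. During training it is useful to keep these per-episode scores together with a moving average over recent episodes; the Udacity Tennis task is usually described as solved when the 100-episode average of this max score reaches +0.5. The `ScoreTracker` class below is a hypothetical helper, not part of the project's `src` package.

```python
from collections import deque

import numpy as np


class ScoreTracker:
    """Hypothetical helper: episode score = max over agents, plus a moving average."""

    def __init__(self, window=100):
        self.episode_scores = []
        self.recent = deque(maxlen=window)  # keeps only the last `window` scores

    def add_episode(self, agent_scores):
        # agent_scores: summed (undiscounted) reward of each agent in one episode
        score = float(np.max(agent_scores))
        self.episode_scores.append(score)
        self.recent.append(score)
        return score

    @property
    def moving_average(self):
        return float(np.mean(self.recent)) if self.recent else 0.0

    def solved(self, target=0.5):
        # Only compare against the target once the window is full
        return len(self.recent) == self.recent.maxlen and self.moving_average >= target
```

In the loop above, calling `tracker.add_episode(scores)` after each episode replaces the manual `np.max(scores)`.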

In [None]:
# Inspect the shapes and types returned by the wrapper after the last step
print(actions.shape, type(actions))
print(next_states.shape, type(next_states))
print(rewards.shape, type(rewards))
print(dones.shape, type(dones))
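Rather than eyeballing the printed shapes, the same checks can be written as assertions. This is a minimal sketch that assumes, from how the wrapper is used above, that it returns NumPy arrays with one row (or entry) per agent. `check_step_output` is a hypothetical helper, and it deliberately does not hard-code the observation size.

```python
import numpy as np


def check_step_output(actions, next_states, rewards, dones, num_agents=2, action_size=2):
    """Sanity-check one step of wrapper output for a multi-agent environment."""
    assert isinstance(next_states, np.ndarray), "states should be a NumPy array"
    assert actions.shape == (num_agents, action_size), actions.shape
    # One observation row, one reward and one done flag per agent
    assert next_states.shape[0] == num_agents, next_states.shape
    assert np.shape(rewards) == (num_agents,), np.shape(rewards)
    assert np.shape(dones) == (num_agents,), np.shape(dones)
    # Actions sent to the environment should stay within [-1, 1]
    assert np.all(np.abs(actions) <= 1.0), "actions out of range"
    return True
```

After the random-action loop, `check_step_output(actions, next_states, rewards, dones, num_agents, action_size)` should pass silently. Note that `assert` statements are skipped when Python runs with `-O`.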

When finished, you can close the environment.

In [None]:
env.close()

### It's Your Turn!

Now it's your turn to train your own agents to solve the environment! When training, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
states = env.reset(train_mode=True)
```
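A training run follows the same structure as the random-action loop above, with the random actions replaced by an agent that acts and learns. The sketch below is a hypothetical skeleton, not the project's method: it assumes `env` behaves like the wrapper used above (`reset` returns states, `step` returns `next_states, rewards, dones`) and that `agent` is an object you write yourself with `act(states)`, `step(states, actions, rewards, next_states, dones)` and `save(path)` methods. The checkpoint file name `checkpoint.pth` is also an assumption.

```python
import os
from collections import deque

import numpy as np


def train(env, agent, n_episodes=1000, max_t=1000, target=0.5, window=100,
          checkpoint_dir=None):
    """Generic multi-agent training loop (sketch); returns per-episode scores."""
    episode_scores, recent = [], deque(maxlen=window)
    for episode in range(1, n_episodes + 1):
        states = env.reset(train_mode=True)          # fast, training-mode reset
        scores = np.zeros(len(states))               # one running return per agent
        for _ in range(max_t):
            actions = np.clip(agent.act(states), -1, 1)
            next_states, rewards, dones = env.step(actions)
            agent.step(states, actions, rewards, next_states, dones)  # learn from the transition
            scores += rewards
            states = next_states
            if np.any(dones):
                break
        episode_scores.append(float(np.max(scores)))  # episode score = best agent
        recent.append(episode_scores[-1])
        # Stop once the moving average over a full window reaches the target
        if len(recent) == window and np.mean(recent) >= target:
            if checkpoint_dir is not None:
                agent.save(os.path.join(checkpoint_dir, "checkpoint.pth"))
            break
    return episode_scores
```

Because the environment was closed above, re-create it with `UnityEnvWrapper(UnityEnvironment(file_name=ENVIRONMENT_PATH))` before calling, for example, `train(env, agent, checkpoint_dir=AGENT_CHECKPOINT_DIR)`.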