# Collaboration and Competition

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the third project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np
import torch
import progressbar as pb
from ddpg_agent import Agent
from collections import deque
from utils import load_trained_agent
import matplotlib.pyplot as plt
%matplotlib inline

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Tennis.app"`
- **Windows** (x86): `"path/to/Tennis_Windows_x86/Tennis.exe"`
- **Windows** (x86_64): `"path/to/Tennis_Windows_x86_64/Tennis.exe"`
- **Linux** (x86): `"path/to/Tennis_Linux/Tennis.x86"`
- **Linux** (x86_64): `"path/to/Tennis_Linux/Tennis.x86_64"`
- **Linux** (x86, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86"`
- **Linux** (x86_64, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86_64"`

For instance, if you are using a Mac, then you downloaded `Tennis.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Tennis.app")
```
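
If the notebook is shared across machines, the path could also be chosen at run time. The following is a minimal sketch, not part of the project code: `tennis_build_path` and `_TENNIS_BUILDS` are hypothetical names, the `path/to/` prefixes are the placeholders from the list above, and the 64-bit check is only a heuristic based on `platform.machine()`.

```python
import platform

# Hypothetical lookup table: (OS name, is 64-bit) -> placeholder path from the list above.
_TENNIS_BUILDS = {
    ("Darwin", True): "path/to/Tennis.app",
    ("Darwin", False): "path/to/Tennis.app",
    ("Windows", False): "path/to/Tennis_Windows_x86/Tennis.exe",
    ("Windows", True): "path/to/Tennis_Windows_x86_64/Tennis.exe",
    ("Linux", False): "path/to/Tennis_Linux/Tennis.x86",
    ("Linux", True): "path/to/Tennis_Linux/Tennis.x86_64",
}


def tennis_build_path(system=None, is_64bit=None, headless=False):
    """Return the placeholder path of the Tennis build for a platform."""
    if system is None:
        system = platform.system()
    if is_64bit is None:
        # Heuristic: 'x86_64', 'AMD64', 'arm64' and 'aarch64' all end in '64'.
        is_64bit = platform.machine().endswith("64")
    try:
        path = _TENNIS_BUILDS[(system, is_64bit)]
    except KeyError:
        raise ValueError(f"No Tennis build listed for {system!r}")
    if headless:
        # Only the Linux builds have a headless (NoVis) variant in the list.
        if system != "Linux":
            raise ValueError("Headless builds are only listed for Linux")
        path = path.replace("Tennis_Linux", "Tennis_Linux_NoVis")
    return path
```

The result would then be passed as `file_name` to `UnityEnvironment` after replacing `path/to/` with the real download location.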

In [2]:
env = UnityEnvironment(file_name="./Tennis_Linux/Tennis.x86")


INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 


In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
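
The brain summary printed above reports 8 observation values per agent stacked over 3 frames, so each agent receives 24 numbers per step. The sketch below only illustrates that shape arithmetic with placeholder arrays; it does not talk to Unity. The agent count of 2 (one per racket) and the [-1, 1] action range are assumptions that do not appear in the log.

```python
import numpy as np

obs_size = 8       # "Vector Observation space size (per agent)" in the log
num_stacked = 3    # "Number of stacked Vector Observation" in the log
action_size = 2    # "Vector Action space size (per agent)" in the log
num_agents = 2     # assumption: one agent per racket

# Length of the state vector each agent sees at every step.
state_size = obs_size * num_stacked

# Placeholder with the shape expected from env_info.vector_observations.
states = np.zeros((num_agents, state_size))

# View the same data as (agent, stacked frame, observation variable).
frames = states.reshape(num_agents, num_stacked, obs_size)

# One action row per agent; clipping to [-1, 1] is an assumed action range.
actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)

print(state_size, frames.shape, actions.shape)
```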

## Loading the models

In [4]:
ddpg = load_trained_agent('./saved_models/ddpg_model_best_weights.pth')
longer_training_time_ddpg = load_trained_agent('./saved_models/ddpg_model_first_succesful_weights.pth')

## Evaluation

The trained agents are now run in the visual environment (`train_mode=False`), one episode each.

In [5]:
def run_agent(agent):
    """Run one episode in the Tennis visual environment and return the mean score across agents."""
    env_info = env.reset(train_mode=False)[brain_name]
    scores = np.zeros(agent.n_agents)
    while True:
        states = env_info.vector_observations          # get the current state
        actions = agent.act(states)                    # compute actions for each agent  
        env_info = env.step(actions)[brain_name]       # send the action to the environment
        next_states = env_info.vector_observations      # get the next states
        rewards = env_info.rewards                     # get the reward
        dones = env_info.local_done                    # see if episode has finished
        scores += rewards
        if any(dones):
            return np.mean(scores)
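
To make the score bookkeeping in `run_agent` concrete, here is a small sketch that replays a hand-made reward sequence without Unity. The reward values are invented for illustration only. It also computes the maximum over agents, because the Udacity project description, as far as I recall, scores an episode by the maximum over the two agents, so the mean returned by `run_agent` may be lower than that benchmark metric.

```python
import numpy as np

# Invented rewards: one row per time step, one column per agent.
step_rewards = [
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, -0.01],
    [0.1, 0.1],
]

scores = np.zeros(2)
for rewards in step_rewards:
    scores += rewards            # same in-place accumulation as in run_agent

mean_score = np.mean(scores)     # the value run_agent returns
max_score = np.max(scores)       # alternative: best agent's score in the episode

print(scores, mean_score, max_score)
```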

Observe the performance of the trained DDPG agent, saved when it first reached an average score of 0.5. Training stopped as soon as the environment was solved.

In [6]:
print(f'The average score for the agents in the episode is: {run_agent(ddpg)}')

The average score for the agents in the episode is: 0.7950000120326877


The next agent was trained for longer and reached a higher average score during training (0.87).

In [8]:
print(f'The average score for the agents in the episode is: {run_agent(longer_training_time_ddpg)}')

The average score for the agents in the episode is: 1.6950000254437327
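
A single episode is a noisy estimate of an agent's quality, so the two numbers above should be compared with care. One option is to average over several episodes. The sketch below assumes a hypothetical helper `evaluate` that accepts any zero-argument function returning an episode score; a random stand-in replaces `run_agent` so the example runs without Unity.

```python
import numpy as np


def evaluate(run_episode, n_episodes=100):
    """Call run_episode() n_episodes times and summarize the returned scores."""
    scores = np.array([run_episode() for _ in range(n_episodes)], dtype=float)
    return {
        "mean": float(scores.mean()),
        "std": float(scores.std()),
        "min": float(scores.min()),
        "max": float(scores.max()),
    }


# Stand-in for run_agent: random scores, purely illustrative.
rng = np.random.default_rng(0)


def fake_episode():
    return rng.uniform(0.0, 2.0)


summary = evaluate(fake_episode, n_episodes=10)
print(summary)
```

With the environment open, the same helper could be called as `evaluate(lambda: run_agent(ddpg), n_episodes=10)`.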


When finished, you can close the environment.

In [9]:
env.close()