# Collaboration and Competition

---

This notebook works through Collaboration and Competition, the third project in the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.
The project involves training multiple agents that share one environment. The starter code is available in the [Udacity repository](https://github.com/udacity/deep-reinforcement-learning/tree/master/p3_collab-compet).

### 1. Import packages and set up notebook variables

If you have trouble importing these packages, check the README file and make sure all the necessary dependencies are installed.
**Note:** The file path passed to `UnityEnvironment` must match the location of the Unity environment that you downloaded by following the README.
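
Because a wrong path is an easy mistake to make here, it can help to check that the file exists before launching Unity. The following is a minimal sketch: `resolve_env_path` is a hypothetical helper, not part of `unityagents`, and the directory and file names in the usage comment are simply the ones used in this notebook.

```python
from pathlib import Path


def resolve_env_path(base_dir, *parts):
    """Hypothetical helper: join path parts and fail early if the file is missing."""
    candidate = Path(base_dir).joinpath(*parts)
    if not candidate.is_file():
        raise FileNotFoundError(f'Unity environment not found at: {candidate}')
    return str(candidate)


# Example usage (assumes the build was unpacked next to this notebook):
# env_path = resolve_env_path(Path.cwd(), 'Tennis_Windows_x86_64', 'Tennis.exe')
```

Failing with an explicit `FileNotFoundError` makes a path problem visible immediately, before any attempt is made to start the Unity process.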

In [1]:
from unityagents import UnityEnvironment
from agents import MADDPG
import numpy as np
import os

#----- Setup Notebook Variables -----#
path = os.path.abspath(os.getcwd())
# Build the path with os.path.join instead of hard-coded backslashes
env = UnityEnvironment(file_name=os.path.join(path, 'Tennis_Windows_x86_64', 'Tennis.exe'))
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 


### 2. Examine the environment

In this environment, two agents control rackets to bounce a ball over a net.
If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets the ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01.
Because each agent is rewarded for returning the ball and penalized for ending the rally, the goal of each agent is to keep the ball in play, which makes this a cooperative environment.
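
To make the reward scheme concrete, the sketch below accumulates per-step rewards into one return per agent and then reduces them to a single episode score. Taking the maximum over the two agents is an assumption based on the scoring convention described in the project README; it is not stated in this notebook. The reward sequence is made up for illustration.

```python
def episode_score(step_rewards):
    """Sum each agent's rewards over an episode and return (per-agent returns, max)."""
    num_agents = len(step_rewards[0])
    returns = [0.0] * num_agents
    for rewards in step_rewards:
        for agent, reward in enumerate(rewards):
            returns[agent] += reward
    # Assumption: the episode score is the larger of the two agents' returns.
    return returns, max(returns)


# Illustrative episode: agent 0 returns the ball once, then agent 1 misses it.
step_rewards = [
    [0.0, 0.0],
    [0.1, 0.0],    # agent 0 hits the ball over the net: +0.1
    [0.0, 0.0],
    [0.0, -0.01],  # agent 1 lets the ball hit the ground: -0.01
]
returns, score = episode_score(step_rewards)
print(returns, score)
```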

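Each agent receives its own observation of 8 variables corresponding to the position and velocity of the ball and racket. Three consecutive observations are stacked, so the state vector for each agent has 24 entries.
Two continuous actions are available to each agent, corresponding to movement toward (or away from) the net and jumping.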

The cell below prints additional details about the environment and then plays three episodes with random actions.

In [2]:
#-------         Environment Details        -------#
env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)
action_size = brain.vector_action_space_size
states = env_info.vector_observations
state_size = states.shape[1]
print(f'Number of agents: {num_agents}')
print(f'Size of each action per agent: {action_size}')
print(f'Size of observation space per agent: {state_size}')
print(f'Example State: {states[0]}')

#-------        Untrained Agent Example     -------#
for i in range(1, 4):                                      # play 3 episodes
    env_info = env.reset(train_mode=False)[brain_name]     # reset the environment
    states = env_info.vector_observations                  # get the current state (for each agent)
    scores = np.zeros(num_agents)                          # initialize the score (for each agent)
    while True:
        actions = np.random.randn(num_agents, action_size) # sample random actions (for each agent)
        actions = np.clip(actions, -1, 1)                  # clip all actions to [-1, 1]
        env_info = env.step(actions)[brain_name]           # send all actions to the environment
        next_states = env_info.vector_observations         # get next state (for each agent)
        rewards = env_info.rewards                         # get reward (for each agent)
        dones = env_info.local_done                        # see if episode finished
        scores += rewards                                  # update the score (for each agent)
        states = next_states                               # roll over states to next time step
        if np.any(dones):                                  # exit loop if episode finished
            break

    # Show Example of what environment returns
    if i == 1:
        print(f'Action Example: {actions}')
        print(f'Shape of states: {states.shape}, Shape of next_states: {next_states.shape}')
        print(f'Example of rewards: {rewards}')
        print(f'Example of Dones: {dones}')


Number of agents: 2
Size of each action per agent: 2
Size of observation space per agent: 24
Example State: [ 0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.         -6.65278625 -1.5
 -0.          0.          6.83172083  6.         -0.          0.        ]
Action Example: [[ 0.57516834  0.12005162]
 [-0.14044376  0.02694566]]
Shape of states: (2, 24), Shape of next_states: (2, 24)
Example of rewards: [-0.009999999776482582, 0.0]
Example of Dones: [True, True]
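
The 24 values in the example state are consistent with the brain summary printed earlier: 8 variables per observation, stacked over 3 time steps. The sketch below splits a state vector into its frames. That the frames are ordered oldest first is an assumption, suggested by the leading zeros in the example state right after a reset; `split_frames` is a hypothetical helper.

```python
OBS_SIZE = 8     # vector observation size per agent (from the brain summary)
NUM_STACKED = 3  # number of stacked vector observations


def split_frames(state, obs_size=OBS_SIZE, num_stacked=NUM_STACKED):
    """Split a flat stacked state into a list of per-time-step observations."""
    if len(state) != obs_size * num_stacked:
        raise ValueError(f'expected {obs_size * num_stacked} values, got {len(state)}')
    return [list(state[i * obs_size:(i + 1) * obs_size]) for i in range(num_stacked)]


# Example state copied from the output above.
example_state = [0.0] * 16 + [-6.65278625, -1.5, -0.0, 0.0, 6.83172083, 6.0, -0.0, 0.0]
frames = split_frames(example_state)
print(frames[-1])  # the only frame with non-zero entries in this example
```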


### 3. Train an Agent

In [3]:
env_info = env.reset(train_mode=True)[brain_name]
maddpg_agent = MADDPG(state_size, action_size, num_agents, epsilon=1.0, random_seed=88)
maddpg_agent.train(env, brain_name)


Episode: 10 	Average Score: 0.02
Episode: 20 	Average Score: 0.0
Episode: 30 	Average Score: 0.02
Episode: 40 	Average Score: 0.0
Episode: 50 	Average Score: 0.03
Episode: 60 	Average Score: 0.02
Episode: 70 	Average Score: 0.02
Episode: 80 	Average Score: 0.03
Episode: 90 	Average Score: 0.0
Episode: 100 	Average Score: 0.04
Episode: 110 	Average Score: 0.02
Episode: 120 	Average Score: 0.0
Episode: 130 	Average Score: 0.01
Episode: 140 	Average Score: 0.06
Episode: 150 	Average Score: 0.01
Episode: 160 	Average Score: 0.01
Episode: 170 	Average Score: 0.01
Episode: 180 	Average Score: 0.02
Episode: 190 	Average Score: 0.0
Episode: 200 	Average Score: 0.02
Episode: 210 	Average Score: 0.0
Episode: 220 	Average Score: 0.02
Episode: 230 	Average Score: 0.0
Episode: 240 	Average Score: 0.02
Episode: 250 	Average Score: 0.02
Episode: 260 	Average Score: 0.0
Episode: 270 	Average Score: 0.01
Episode: 280 	Average Score: 0.02
Episode: 290 	Average Score: 0.02
Episode: 300 	Average Score: 0.

([0.09000000171363354,
  0.0,
  0.10000000149011612,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.10000000149011612,
  0.0,
  0.0,
  0.09000000171363354,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.20000000298023224,
  0.0,
  0.0,
  0.0,
  0.10000000149011612,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.10000000149011612,
  0.10000000149011612,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.10000000149011612,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.10000000149011612,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.10000000149011612,
  0.0,
  0.10000000149011612,
  0.0,
  0.0,
  0.09000000171363354,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.09000000171363354,
  0.0,
  0.0,
  0.0,
  0.10000000149011612,
  0.0,
  0.10000000149011612,
  0.0,
  0.10000000149011612,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,


### 4. Show Results of Training
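
One way to summarize a training run is a moving average over the per-episode scores. The sketch below is a minimal example under two assumptions that this notebook does not confirm: that the first element returned by `maddpg_agent.train` is the list of per-episode scores (as the output above suggests), and that a 100-episode window with a target of +0.5 is the relevant criterion, as in the project README. The short score list is illustrative only and is not training data.

```python
from collections import deque


def moving_average(scores, window=100):
    """Return the mean of the last `window` scores after each episode."""
    recent = deque(maxlen=window)
    averages = []
    for score in scores:
        recent.append(score)
        averages.append(sum(recent) / len(recent))
    return averages


def first_solved_episode(scores, target=0.5, window=100):
    """Return the 1-based episode at which a full window first averages >= target, else None."""
    for episode, avg in enumerate(moving_average(scores, window), start=1):
        if episode >= window and avg >= target:
            return episode
    return None


# Illustrative values only; a small window keeps the example readable.
demo_scores = [0.0, 0.1, 0.0, 0.6, 0.7, 0.9]
print(moving_average(demo_scores, window=3))
print(first_solved_episode(demo_scores, target=0.5, window=3))
```

Both functions accept any sequence of floats, so they could be applied to the real score list without changes if the assumption about the return value holds.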

### 5. Test the Trained Agent

When finished, close the environment.

In [4]:
env.close()