# Udacity Collaboration and Competition Project Submission

* This notebook contains a solution for the third project of the Udacity Reinforcement Learning Course. 
* The notebook trains 2 agents in the Tennis environment using the [Multi-Agent Deep Deterministic Policy Gradient (MADDPG)](https://arxiv.org/pdf/1706.02275.pdf) algorithm.
* The top-level functionality is implemented in `cc.py`.
* The `src` directory contains the following files:
  * `maddpg_agent.py`: Implements the training parts of the MADDPG algorithm.
  * `ddpg_agent.py`: Implements the agent-specific parts of the MADDPG algorithm (computing actions, replay buffers, etc.).
     * This implementation is heavily based on the Udacity Reinforcement Learning Course implementation.
  * `model.py`: Implements the 4 types of neural networks used by each agent in the MADDPG algorithm.
     * This implementation is heavily based on the Udacity Reinforcement Learning Course implementation.
  * `config.py`: Contains the configuration parameters used in the MADDPG algorithm.
* The `README.md` file contains setup details
* The `Report.ipynb` file contains the description of the algorithm and other implementation details.
* The weights of the trained neural networks are in the following files:
  * First agent:    
    * `actor_local_weights_0.pt`
    * `actor_target_weights_0.pt`
    * `critic_local_weights_0.pt`
    * `critic_target_weights_0.pt`
  * Second agent:
    * `actor_local_weights_1.pt`
    * `actor_target_weights_1.pt`
    * `critic_local_weights_1.pt`
    * `critic_target_weights_1.pt`
* Training results (the per-episode maximum scores and the moving average of those maximum scores) are stored in the `training_results.pkl` pickle file
* The training results plot is saved in `training_performance.png`
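
The weight files listed above follow a regular naming pattern, which a short script can enumerate. The sketch below only builds the eight expected file names; the helper `expected_weight_files` is hypothetical and is not part of the project code.

```python
from itertools import product


def expected_weight_files(num_agents=2):
    """Return the weight file names in the order used in the list above.

    Hypothetical helper: it only mirrors the naming pattern
    <network>_<copy>_weights_<agent index>.pt seen in this repository.
    """
    names = []
    for agent_idx in range(num_agents):
        # Each agent has an actor and a critic, each with a local and a target copy.
        for network, copy in product(("actor", "critic"), ("local", "target")):
            names.append("{}_{}_weights_{}.pt".format(network, copy, agent_idx))
    return names


weight_files = expected_weight_files()
print(weight_files)
```

A list like this could be passed to `os.path.exists` to confirm that all checkpoints are present before running the trained agents.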

### 1. Setup

In [2]:
from unityagents import UnityEnvironment
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import cc  # project module with the top-level train/run functions

### 2. Load the simulation environment

In [3]:
env = UnityEnvironment(file_name="Tennis.app")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 
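
The environment reports 8 values per vector observation and 3 stacked observations, so each agent receives a state of 8 × 3 = 24 values. The following sketch splits such a state into its frames. The sample values are taken from the first-agent state printed in the training output of section 3; that the newest frame is stored last is an assumption suggested by that sample, not something the log confirms.

```python
OBS_SIZE = 8      # "Vector Observation space size (per agent)" from the log
NUM_STACKED = 3   # "Number of stacked Vector Observation" from the log

# Initial state of the first agent as printed in the training output.
state = [0.0] * 16 + [-6.65278625, -1.5, -0.0, 0.0, 6.83172083, 6.0, -0.0, 0.0]

assert len(state) == OBS_SIZE * NUM_STACKED  # 24 values in total

# Split the flat state into NUM_STACKED frames of OBS_SIZE values each.
frames = [state[i:i + OBS_SIZE] for i in range(0, len(state), OBS_SIZE)]

for idx, frame in enumerate(frames):
    print("frame {}: {}".format(idx, frame))
```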


### 3. Train the agents

In [None]:
num_episodes, moving_max_scores, scores = cc.train(env, num_episodes=2500, min_performance=1.0)
print("Done training")

Number of agents: 2
Number of actions: 2
States have length: 24
There are 2 agents. Each observes a state with length: 24
The state for the first agent looks like: [ 0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.         -6.65278625 -1.5
 -0.          0.          6.83172083  6.         -0.          0.        ]
Episode:1 Time: 0s Sim Time: 15s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:2 Time: 1s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:3 Time: 0s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:4 Time: 0s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:5 Time: 0s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:6 Time: 0s Sim Time: 15s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:7 Time: 0s Sim Time: 14s min_score:-0.01 max_score:0.0

Episode:92 Time: 10s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:93 Time: 10s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:94 Time: 10s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:95 Time: 9s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:96 Time: 10s Sim Time: 15s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:97 Time: 10s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:98 Time: 9s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:99 Time: 9s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:100 Time: 10s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:101 Time: 10s Sim Time: 15s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:102 Time: 10s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:103 Time: 10s Sim Time: 14s min_score:-0.01 ma

Episode:187 Time: 10s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:188 Time: 10s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:189 Time: 10s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:190 Time: 10s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:191 Time: 11s Sim Time: 15s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:192 Time: 10s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:193 Time: 10s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:194 Time: 22s Sim Time: 31s min_score:0.00 max_score:0.09 ma_max_score:  0.00
Episode:195 Time: 21s Sim Time: 30s min_score:0.00 max_score:0.09 ma_max_score:  0.00
Episode:196 Time: 29s Sim Time: 40s min_score:-0.01 max_score:0.10 ma_max_score:  0.00
Episode:197 Time: 10s Sim Time: 14s min_score:-0.01 max_score:0.00 ma_max_score:  0.00
Episode:198 Time: 33s Sim Time: 45s min_score

### 4. Plot the results

In [None]:
# Best score in each episode (maximum over the entries of each `scores` element)
max_scores = [np.max(m) for m in scores]
plt.ion()
fig = plt.figure()
_ = fig.add_subplot(111)
plt.plot(np.arange(len(max_scores)), max_scores)
plt.plot(np.arange(len(moving_max_scores)), moving_max_scores)
plt.ylabel("Max and Moving Average of Max Scores")
plt.xlabel("Episode #")
plt.legend(["Max Score", "Moving Average of Max Score"], loc="upper left")
# Save before plt.show() so the written file contains the figure
plt.savefig("training_performance.png")
plt.show()
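
The second curve in the plot is a moving average of the per-episode maximum score. The sketch below shows one way such a curve could be computed from raw per-agent scores. The function name `moving_average_of_max` and the default window of 100 episodes are assumptions; the actual computation inside `cc.train` is not shown in this notebook.

```python
from collections import deque


def moving_average_of_max(scores, window=100):
    """Return the running mean of the per-episode maximum score.

    `window=100` is an assumed default; the value used by `cc.train`
    is not shown in this notebook.
    """
    recent = deque(maxlen=window)  # keeps only the latest `window` values
    averages = []
    for episode_scores in scores:
        recent.append(max(episode_scores))
        averages.append(sum(recent) / len(recent))
    return averages


# Toy data: two agents, four episodes.
toy_scores = [[-0.01, 0.0], [0.0, 0.09], [-0.01, 0.10], [0.2, 0.1]]
toy_average = moving_average_of_max(toy_scores, window=2)
print(toy_average)
```

During the first `window - 1` episodes the average is taken over the episodes seen so far, so the curve starts at the first episode instead of being undefined.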

### 5. Run the trained agent

In [None]:
scores = cc.run(env, num_episodes=10)

In [None]:
print("Done simulating")
for idx, score in enumerate(scores):
    print("Episode:{}, scores:{}".format(idx, score))

### 6. Close the environment

In [None]:
env.close()