# Udacity Continuous Control Project Submission

* This notebook presents a solution to the second project of the Udacity Reinforcement Learning Course.
* The program trains an agent using the Reacher environment with 20 robots.
* The top-level functionality is implemented in `continuous_control.py`.
* The `src` directory contains the following files:
  * `ddpg_agent.py`: This file implements the Deep Deterministic Policy Gradient (DDPG) algorithm.
     * This implementation is heavily based on the Udacity Reinforcement Learning Course implementation.
  * `model.py`: This file implements the four neural networks used by the DDPG algorithm.
     * This implementation is heavily based on the Udacity Reinforcement Learning Course implementation.
* The `README.md` file contains setup details.
* The `Report.ipynb` file contains the description of the algorithm and other implementation details.
* The weights of the trained neural networks are in the following files:
  * `actor_local_weights.pt`
  * `actor_target_weights.pt`
  * `critic_local_weights.pt`
  * `critic_target_weights.pt`
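
The weight files above come in local/target pairs because DDPG keeps a slowly moving target copy of each network. The sketch below illustrates the usual soft-update rule with plain Python floats. It is not taken from `ddpg_agent.py`: the function name `soft_update`, the list-of-floats representation, and the value of `tau` are assumptions made for illustration.

```python
def soft_update(local_params, target_params, tau=1e-3):
    """Blend local parameters into the target parameters in place.

    Implements target = tau * local + (1 - tau) * target for each parameter.
    `tau` is a hypothetical interpolation factor, not a value from the project.
    """
    for i, (local, target) in enumerate(zip(local_params, target_params)):
        target_params[i] = tau * local + (1.0 - tau) * target
    return target_params


# Illustrative values only: two scalar "weights" per network.
local = [1.0, 2.0]
target = [0.0, 0.0]
soft_update(local, target, tau=0.5)
print(target)
```

A small `tau` makes the target networks trail the local ones slowly, which tends to stabilise the critic's learning targets.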

### 1. Setup

In [1]:
from unityagents import UnityEnvironment
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import continuous_control as cc

### 2. Load the simulation environment

In [2]:
env = UnityEnvironment(file_name="Reacher20.app")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_size -> 5.0
		goal_speed -> 1.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


### 3. Train the agent

In [3]:
num_episodes, avg_scores, scores = cc.train(env, num_episodes=2)
print("Done training")

Number of agents: 20
Number of actions: 4
States have length: 33
There are 20 agents. Each observes a state with length: 33
Episode: 1 Episode Duration: 174s min_score: 0.00 max_score: 1.26 average_score: 0.42
Episode: 2 Episode Duration: 182s min_score: 0.15 max_score: 2.04 average_score: 0.63
Done training
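
The log reports a per-episode average alongside the minimum and maximum agent scores. How `cc.train` computes `avg_scores` is not shown here, so the following is only a sketch of one common approach: a rolling mean over the most recent episodes, plus a check for when a target is first reached. The helper names, the 100-episode window, and the target of 30 are assumptions, not values confirmed by this notebook.

```python
from collections import deque


def rolling_average(episode_scores, window=100):
    """Return the mean of the last `window` scores after each episode."""
    recent = deque(maxlen=window)  # old scores drop out automatically
    averages = []
    for score in episode_scores:
        recent.append(score)
        averages.append(sum(recent) / len(recent))
    return averages


def first_solved_episode(averages, target=30.0, window=100):
    """Return the 1-based episode at which a full window first meets `target`."""
    for episode, avg in enumerate(averages, start=1):
        if episode >= window and avg >= target:
            return episode
    return None  # target never reached


# The two per-episode average scores from the log above.
averages = rolling_average([0.42, 0.63])
print(averages, first_solved_episode(averages))
```

With only two episodes the window is never filled, so `first_solved_episode` returns `None` under these assumed settings.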


### 4. Plot the results

In [None]:
mean_scores = [np.mean(m) for m in scores]
plt.ion()
fig = plt.figure()
_ = fig.add_subplot(111)
plt.plot(np.arange(len(mean_scores)), mean_scores)
plt.plot(np.arange(len(avg_scores)), avg_scores)
plt.ylabel('Score and Average Score')
plt.xlabel('Episode #')
plt.legend(['Score', 'Average Score'], loc='upper left')
plt.savefig('training_performance.png')
plt.show()

### 5. Run the trained Agent

In [None]:
scores = cc.run(env, num_episodes=1, actor_local_load_filename='actor_local_weights.pt')
print("Done simulating")

### 6. Close the simulation environment

In [None]:
env.close()