# Deep Deterministic Policy Gradient: DDPG
This notebook implements the DDPG algorithm to solve the Reacher environment. DDPG is described in this
[paper](https://arxiv.org/abs/1509.02971),
and the Reacher environment is described [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Examples.md).

## 1. Import all necessary packages
If you have trouble importing these packages, check the README file and make sure all the necessary dependencies are installed.

In [1]:
from unityagents import UnityEnvironment
from collections import deque
import numpy as np
import torch
import matplotlib.pyplot as plt
%matplotlib inline
from agents import DDPG

# Raw string so that no backslash is interpreted as an escape sequence
PATH = r"C:\Dev\Python\RL\Udacity_Continuous_Control"
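
Backslashes in ordinary Python string literals can be read as escape sequences, so Windows paths are easier to get right when they are built with `pathlib`. The sketch below assumes the same base directory as `PATH` and the `checkpoints`, `scores`, and `images` subfolders used later in this notebook; `artifact_path` is a hypothetical helper, not part of the project.

```python
from pathlib import PureWindowsPath

# Same base directory as PATH, written as a raw string so that
# no backslash is interpreted as an escape sequence.
BASE_DIR = PureWindowsPath(r"C:\Dev\Python\RL\Udacity_Continuous_Control")


def artifact_path(subfolder: str, filename: str) -> PureWindowsPath:
    """Join the base directory, a subfolder, and a file name (hypothetical helper)."""
    return BASE_DIR / subfolder / filename


scores_file = artifact_path("scores", "DDPG_Agent_Single.txt")
print(scores_file)
```

`PureWindowsPath` only manipulates the path text, which keeps the example independent of the operating system; in the notebook itself `pathlib.Path` would be the natural choice for paths that are actually opened.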


## 2. Set Up the Environment
Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.
**Note:** The `file_name` parameter must match the location of the Unity environment that you downloaded.

In [2]:
env = UnityEnvironment(file_name=r'C:\Dev\Python\RL\Udacity_Continuous_Control\Reacher_Windows_x86_64_single\Reacher.exe')
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
# reset the environment and get env info to setup Agent
env_info = env.reset(train_mode=True)[brain_name]
action_size = brain.vector_action_space_size
states = env_info.vector_observations
state_size = states.shape[1]


INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


## 3. Set Up and Train the Agent
This section contains a function for training the agent. If you do not want to train the agent and only wish to view the results of training and watch the agent, set
`train = False` or skip this section.

In [3]:
ddpg_agent = DDPG(state_size = state_size, action_size = action_size, random_seed = 88)
train = True

def train_agent(agent: DDPG, num_episodes=200, max_time=300, print_every=10):
    scores = []
    scores_deque = deque(maxlen=print_every)

    # Simulations for all the episodes
    for episode_num in range(1,num_episodes+1):
        # Reset everything
        env_info = env.reset(train_mode=True)[brain_name]
        state = env_info.vector_observations[0]
        score = 0
        # Run the episode
        for t in range(max_time):
            action = agent.act(state)
            env_info = env.step(action)[brain_name]
            next_state, reward, done = env_info.vector_observations[0], env_info.rewards[0], env_info.local_done[0]
            agent.step(state, action, reward, next_state, done)
            score += reward
            state = next_state
            if done:
                break
        # Episode Finished
        scores.append(score)
        scores_deque.append(score)
        if episode_num % print_every == 0:
            print(f'Episode: {episode_num} \tAverage Score: {round(np.mean(scores_deque),2)}')
            # Periodic checkpoint; {agent} puts the agent's string form in the file name
            torch.save(agent.actor_local.state_dict(), f'{PATH}\\checkpoints\\{agent}_Actor_Single.pth')
            torch.save(agent.critic_local.state_dict(), f'{PATH}\\checkpoints\\{agent}_Critic_Single.pth')

    # All episodes finished: save parameters and scores
    torch.save(agent.actor_local.state_dict(), f'{PATH}\\checkpoints\\{agent}_Actor_Single.pth')
    torch.save(agent.critic_local.state_dict(), f'{PATH}\\checkpoints\\{agent}_Critic_Single.pth')
    # Write one score per line; the with-block closes the file automatically
    with open(f'{PATH}\\scores\\{agent}_Single.txt', 'w') as f:
        f.write("\n".join(str(score) for score in scores))

    return scores


if train:
    train_agent(ddpg_agent)


Episode: 10 	Average Score: 0.1
Episode: 20 	Average Score: 0.15
Episode: 30 	Average Score: 0.17
Episode: 40 	Average Score: 0.64
Episode: 50 	Average Score: 0.26
Episode: 60 	Average Score: 0.43
Episode: 70 	Average Score: 0.41
Episode: 80 	Average Score: 0.36
Episode: 90 	Average Score: 0.62
Episode: 100 	Average Score: 0.43
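
Each printed value is the mean over the most recent `print_every` episodes, because `scores_deque` is created with `maxlen=print_every`. The sketch below reproduces that windowed average with the standard library only and reports the first episode at which it reaches a threshold. The default threshold of 30 matches the "Solved" line drawn in the results graph; the 100-episode window is an assumption, and `first_solved_episode` is a hypothetical helper.

```python
from collections import deque
from statistics import mean


def first_solved_episode(scores, threshold=30.0, window=100):
    """Return the 1-based episode at which the mean of the last `window`
    scores first reaches `threshold`, or None if it never does.

    Hypothetical helper: the window length is an assumption.
    """
    recent = deque(maxlen=window)  # keeps only the newest `window` scores
    for episode_num, score in enumerate(scores, start=1):
        recent.append(score)
        # Only judge once a full window of episodes is available
        if len(recent) == window and mean(recent) >= threshold:
            return episode_num
    return None


# Made-up scores purely for illustration, not results from this notebook
example_scores = [10.0, 20.0, 30.0, 40.0, 50.0]
print(first_solved_episode(example_scores, threshold=30.0, window=3))
```

Applied to the list returned by `train_agent`, a result of `None` would indicate that the windowed average never reached the threshold during the run.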


## 4. View the Results of Training

In [5]:
def graph_results(filename, save_graph=True):
    """ Graph results from training an Agent

    :param filename: file to get scores from
    :param save_graph: if True, also save the figure to the images folder
    """
    # Read in results from file
    with open(filename) as f:
        scores = [round(float(score),2) for score in f.read().splitlines()]

    # Graph results
    fig, ax = plt.subplots()
    ax.set(xlabel="Episode #", ylabel="Score", title="DDPG Agent Learning Curve")
    ax.grid()
    ax.plot(np.arange(len(scores)), scores, label="DDPG Agent Scores")
    ax.plot(np.arange(len(scores)), np.ones(len(scores))*30, color="black", linestyle="dashed", label="Solved")
    ax.legend()
    # Save graph results
    if save_graph:
        fig.savefig(f'{PATH}\\images\\DDPG_Agent_Single.png')
    plt.show()

graph_results(f'{PATH}\\scores\\DDPG_Agent_Single.txt')




FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Dev\\Python\\RL\\Udacity_Continuous_Control\\scores\\DDPG_Agent_Single.txt'
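
The error above shows that `graph_results` was called for a scores file that does not exist at that location. A small guard can turn this into a clearer message. The sketch below assumes the same one-score-per-line format that `train_agent` writes; `load_scores` is a hypothetical helper, and the temporary file only stands in for the real scores file.

```python
import os
import tempfile


def load_scores(filename):
    """Read one score per line; return an empty list if the file is missing.

    Hypothetical helper mirroring the parsing done in graph_results.
    """
    if not os.path.isfile(filename):
        print(f"No scores file at {filename!r}; run training first or check the file name.")
        return []
    with open(filename) as f:
        # Skip blank lines so a trailing newline does not break float()
        return [round(float(line), 2) for line in f.read().splitlines() if line.strip()]


# Demonstration with a temporary file standing in for the real scores file
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = os.path.join(tmp_dir, "demo_scores.txt")
    with open(demo_file, "w") as f:
        f.write("0.1\n0.256\n1.5")
    print(load_scores(demo_file))
    print(load_scores(os.path.join(tmp_dir, "missing.txt")))
```

If the missing file comes from a mismatch between the name written during training and the name read here, comparing the two paths directly is a reasonable first check.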

## 5. Watch the Trained Agent