# Deep Deterministic Policy Gradient: DDPG
This notebook implements the DDPG algorithm to solve the Reacher environment. You can find
an explanation of DDPG [here](https://arxiv.org/abs/1509.02971)
and an explanation of the Reacher environment [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Examples.md).
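
For orientation, one ingredient of DDPG described in the linked paper is the soft target update, where the target network weights slowly track the learned weights: θ' ← τθ + (1 − τ)θ'. The sketch below illustrates that rule on plain Python lists. The function name `soft_update` and the demo values are illustrative assumptions, not taken from this notebook's `agents` module.

```python
def soft_update(local_weights, target_weights, tau=0.001):
    """Return target weights moved a fraction tau toward the local weights."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(local_weights, target_weights)]

# Made-up weights; tau=0.5 is chosen only to make the arithmetic easy to follow.
local = [1.0, 2.0]
target = [0.0, 0.0]
target = soft_update(local, target, tau=0.5)
```

With a small `tau` the target weights change slowly, which is the point of the rule: the targets used for learning stay nearly fixed from one update to the next.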

## 1. Import all necessary packages
If you have trouble importing these packages, check the README file and make sure all the necessary dependencies are installed.

In [1]:
from unityagents import UnityEnvironment
import torch
import numpy as np
from collections import deque
import matplotlib.pyplot as plt
%matplotlib inline
from agents import *

PATH = r"C:\Dev\Python\RL\Udacity_Continuous_Control"


## 2. Set Up the Environment
Environments contain **_brains_**, which are responsible for deciding the actions of their associated agents. Here we take the first available brain and set it as the default brain we will control from Python.
**Note:** The `file_name` parameter must match the location of the Unity environment that you downloaded.

In [None]:
env = UnityEnvironment(file_name=r'C:\Dev\Python\RL\Udacity_Continuous_Control\Reacher_Windows_x86_64_single\Reacher.exe')
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
# reset the environment and get env info to setup Agent
env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)
action_size = brain.vector_action_space_size
states = env_info.vector_observations
state_size = states.shape[1]


## 3. Set Up and Train the Agent
This section defines a function for training the agent. If you do not want to train the agent and only wish to view the results of training and watch the trained agent, set
`train = False` or skip this section.

In [None]:
ddpg_agent = DDPG(num_agents = num_agents, state_size = state_size, action_size = action_size, random_seed = 88)
train = True

def train_agent(agent: Agent, num_episodes= 1000, max_time = 300, print_every = 100):
    scores = []
    scores_deque = deque(maxlen=print_every)

    # Simulations for all the episodes
    for episode_num in range(1,num_episodes+1):
        # Reset everything
        env_info = env.reset(train_mode=True)[brain_name]
        state = env_info.vector_observations[0]
        score = 0
        # Run the episode
        for t in range(max_time):
            action = agent.act(state)
            env_info = env.step(action)[brain_name]
            next_state, reward, done = env_info.vector_observations[0], env_info.rewards[0], env_info.local_done[0]
            agent.step(state, action, reward, next_state, done)
            score += reward
            state = next_state
            if done:
                break
        # Episode Finished
        scores.append(score)
        scores_deque.append(score)
        if episode_num % print_every == 0:
            print(f'\rEpisode: {episode_num} \tAverage Score: {round(np.mean(scores_deque),2)}')
            # {agent} is formatted with str(agent); this assumes the agent class returns a file-safe name
            torch.save(agent.actor_local.state_dict(), f'{PATH}\\checkpoints\\{agent}_actor.pth')
            torch.save(agent.critic_local.state_dict(), f'{PATH}\\checkpoints\\{agent}_critic.pth')

    # All episodes finished: save parameters and scores
    torch.save(agent.actor_local.state_dict(), f'{PATH}\\checkpoints\\{agent}_actor.pth')
    torch.save(agent.critic_local.state_dict(), f'{PATH}\\checkpoints\\{agent}_critic.pth')
    # Write one score per line; the context manager closes the file
    with open(f'{PATH}\\scores\\{agent}.txt', 'w') as f:
        f.write("\n".join(str(score) for score in scores))

    return scores


if train:
    train_agent(ddpg_agent)


## 4. View the Results of Training
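One way to inspect the results, sketched under the assumption that the scores were saved one per line as intended by `train_agent` above: read the file back and compute a rolling mean. The helper names `load_scores` and `rolling_mean`, the default window of 100 episodes, and the temporary demo file are illustrative choices, not part of the original notebook. The resulting list could then be passed to `plt.plot`.

```python
import os
import tempfile

def load_scores(path):
    """Read one float score per line, skipping blank lines."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]

def rolling_mean(scores, window=100):
    """Mean of the last `window` scores at each episode (fewer at the start)."""
    means = []
    total = 0.0
    for i, score in enumerate(scores):
        total += score
        if i >= window:
            total -= scores[i - window]  # drop the score that left the window
        means.append(total / min(i + 1, window))
    return means

# Demo on a temporary file with made-up scores (not real training results).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "scores.txt")
    with open(demo_path, "w") as f:
        f.write("\n".join(str(s) for s in [1.0, 2.0, 3.0, 4.0]))
    demo_scores = load_scores(demo_path)

demo_means = rolling_mean(demo_scores, window=2)
```

For the real run, `load_scores` would be pointed at the file written under the `scores` folder in `PATH`.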

## 5. Watch the Trained Agent