# Deep Deterministic Policy Gradients (DDPG)
---
In this notebook, we train a DDPG agent on [OpenAI Gym's BipedalWalker environment](https://gym.openai.com/envs/BipedalWalker-v2/). The link points to the `BipedalWalker-v2` page; the code below instantiates `BipedalWalker-v3`.

Reward is given for moving forward, totaling 300+ points by the far end of the track. If the robot falls, it receives -100. Applying motor torque costs a small number of points, so a more efficient agent earns a better score. The state consists of the hull angle, hull angular velocity, horizontal speed, vertical speed, joint positions and joint angular speeds, leg contact with the ground, and 10 lidar rangefinder measurements. There are no coordinates in the state vector.
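
The grouping of the 24 state entries can be made concrete with a small helper. This is a minimal sketch, assuming the ordering used by Gym's BipedalWalker implementation (4 hull values, then 10 joint and ground-contact values, then 10 lidar readings). The index boundaries and the name `split_state` are illustrative assumptions, not something this notebook defines.

```python
def split_state(state):
    """Split a 24-element BipedalWalker observation into named groups.

    Assumed layout (not defined in this notebook): 4 hull values,
    10 joint/contact values, 10 lidar readings.
    """
    if len(state) != 24:
        raise ValueError("expected 24 state entries, got {}".format(len(state)))
    return {
        "hull": list(state[0:4]),     # angle, angular velocity, horizontal speed, vertical speed
        "legs": list(state[4:14]),    # joint positions, joint speeds, ground-contact flags
        "lidar": list(state[14:24]),  # 10 rangefinder measurements
    }


example = split_state([0.0] * 24)
print({name: len(values) for name, values in example.items()})
# {'hull': 4, 'legs': 10, 'lidar': 10}
```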

### 1. Import the Necessary Packages

In [1]:
import gym
import random
import torch
import numpy as np
from collections import deque
from DDPG_Agent import DDPG_Agent

import matplotlib.pyplot as plt
%matplotlib inline

### 2. Instantiate the Environment and Agent

In [2]:
env = gym.make('BipedalWalker-v3')
env.seed(10)
action_size = env.action_space.shape[0]
state_size = env.observation_space.shape[0]

print('Action Size:', action_size)
print('Action High:', env.action_space.high)
print('Action Low:', env.action_space.low)
print('State Size:', state_size)
print('State High:', env.observation_space.high)
print('State Low:', env.observation_space.low)

Action Size: 4
Action High: [1. 1. 1. 1.]
Action Low: [-1. -1. -1. -1.]
State Size: 24
State High: [inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf
 inf inf inf inf inf inf]
State Low: [-inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf]
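
The bounds printed above show that each of the four action components must lie in [-1, 1]. DDPG agents usually add exploration noise to the actor's output, which can push values outside that range. Whether `DDPG_Agent.act` already clips its output is not shown in this notebook, so the following is only a sketch of the clamping step, with `clip_action` as a hypothetical helper.

```python
def clip_action(action, low=-1.0, high=1.0):
    """Clamp every action component into [low, high]."""
    return [max(low, min(high, a)) for a in action]


# Hypothetical policy output after exploration noise was added.
noisy_action = [0.4, -1.3, 1.7, 0.0]
print(clip_action(noisy_action))  # [0.4, -1.0, 1.0, 0.0]
```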




In [3]:
agent = DDPG_Agent(state_size=state_size, action_size=action_size, random_seed=10)

### 3. Train the Agent with DDPG

Run the code cell below to train the agent from scratch.  Alternatively, you can skip to the next code cell to load the pre-trained weights from file.

In [4]:
from tqdm.notebook import tqdm

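def ddpg(n_episodes=10000, max_t=700):
    """Train for n_episodes episodes of at most max_t steps each; return all episode scores."""
    scores_deque = deque(maxlen=100)  # last 100 episode scores, for the running average
    scores = []                       # every episode score, in order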
    
    for i_episode in tqdm(range(1, n_episodes+1)):
        
        state = env.reset()
        agent.reset()
        score = 0
        
        for t in range(max_t):
            
            action = agent.act(state)
            next_state, reward, done, _ = env.step(action)
            agent.step(state, action, reward, next_state, done)
            
            state = next_state
            score += reward
        
            if done:
                break
                
        scores_deque.append(score)
        scores.append(score)
        
        print('\rEpisode {}\tAverage Score: {:.2f}\tScore: {:.2f}'.format(i_episode, np.mean(scores_deque), score), end="")
        if i_episode % 100 == 0:
            torch.save(agent.actor_local.state_dict(), 'checkpoint_actor.pth')
            torch.save(agent.critic_local.state_dict(), 'checkpoint_critic.pth')
            print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_deque)))   
    return scores
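
The `Average Score` printed during training is the mean over `scores_deque`, which holds at most the last 100 episode scores. A minimal standalone sketch of that rolling average follows. `rolling_means` is a hypothetical helper, not part of the notebook or of `DDPG_Agent`.

```python
from collections import deque


def rolling_means(scores, window=100):
    """Return the mean of the last `window` scores after each episode."""
    recent = deque(maxlen=window)  # the oldest score is dropped automatically when full
    means = []
    for score in scores:
        recent.append(score)
        means.append(sum(recent) / len(recent))
    return means


print(rolling_means([-100.0, -50.0, 0.0, 50.0], window=2))
# [-100.0, -75.0, -25.0, 25.0]
```

Plotting such a series alongside the raw scores can make the training trend easier to read than the per-episode curve alone.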

In [15]:
scores = ddpg()

fig, ax = plt.subplots()
ax.plot(np.arange(1, len(scores)+1), scores)
ax.set_ylabel('Score')
ax.set_xlabel('Episode #')
plt.show()

Episode 13	Average Score: -71.60	Score: -15.855

KeyboardInterrupt: 

### 4. Watch a Smart Agent!

In the next code cell, you will load the trained weights from file to watch a smart agent!

In [None]:
agent.actor_local.load_state_dict(torch.load('checkpoint_actor.pth'))
agent.critic_local.load_state_dict(torch.load('checkpoint_critic.pth'))

state = env.reset()
agent.reset()
rewards = []  # per-step rewards collected during this episode
while True:
    action = agent.act(state)
    env.render()
    next_state, reward, done, _ = env.step(action)
    rewards.append(reward)
    state = next_state
    if done:
        break

print(np.sum(rewards))  # total return for the episode
env.close()
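
The `while True` loop above ends only when the environment reports `done`. A minimal sketch of the same rollout as a reusable function with a step cap follows. `StubEnv` and `run_episode` are hypothetical names introduced for illustration; the stub mimics the `reset`/`step` interface used in this notebook so the example runs without Gym installed.

```python
class StubEnv:
    """Hypothetical stand-in for a Gym environment (4-tuple step API)."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return [float(self.t)], 1.0, done, {}  # next_state, reward, done, info


def run_episode(env, policy, max_steps=2000):
    """Roll out one episode and return the total reward.

    Stops when the environment reports done or after max_steps steps.
    """
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        next_state, reward, done, _info = env.step(policy(state))
        total += reward
        state = next_state
        if done:
            break
    return total


# The stub pays a reward of 1.0 per step and ends after 5 steps.
print(run_episode(StubEnv(episode_length=5), policy=lambda s: [0.0] * 4))               # 5.0
print(run_episode(StubEnv(episode_length=5), policy=lambda s: [0.0] * 4, max_steps=3))  # 3.0
```

With the real environment, the equivalent call would presumably be `run_episode(env, agent.act)`, assuming `agent.act` takes a state and returns an action as it does in the cells above.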

In [16]:
!pip list

Package                            Version
---------------------------------- -----------------
-orch                              1.5.0
absl-py                            0.11.0
alabaster                          0.7.12
anaconda-client                    1.7.2
anaconda-navigator                 1.9.12
anaconda-project                   0.8.3
argh                               0.26.2
asn1crypto                         1.3.0
astroid                            2.3.3
astropy                            4.0
astunparse                         1.6.3
atomicwrites                       1.3.0
attrs                              19.3.0
autopep8                           1.4.4
Babel                              2.8.0
backcall                           0.1.0
backports.functools-lru-cache      1.6.1
backports.shutil-get-terminal-size 1.0.0
backports.tempfile                 1.0
backports.weakref                  1.0.post1
bcrypt                             3.1.7
beautifulsoup4                     4.8