stable_baselines3.ddpg

DDPG

Deep Deterministic Policy Gradient (DDPG) combines the tricks from DQN (a replay buffer and target networks) with the deterministic policy gradient, to obtain an off-policy algorithm for continuous actions.
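
The two resulting updates can be summarized in a few lines. The sketch below is an illustrative pseudo-implementation, not the library's internals: it assumes pre-built actor/critic networks, their target copies, their optimizers, and an already-sampled replay batch.

import torch as th

def ddpg_update(actor, critic, actor_target, critic_target,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    obs, action, reward, next_obs, done = batch

    # Critic: regress Q(s, a) towards a bootstrapped target computed
    # with the frozen target networks (the DQN trick).
    with th.no_grad():
        next_q = critic_target(next_obs, actor_target(next_obs))
        target_q = reward + gamma * (1.0 - done) * next_q
    critic_loss = th.nn.functional.mse_loss(critic(obs, action), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: deterministic policy gradient, i.e. ascend Q(s, mu(s)).
    actor_loss = -critic(obs, actor(obs)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Slowly track the online networks with Polyak averaging.
    for target, online in ((actor_target, actor), (critic_target, critic)):
        for tp, p in zip(target.parameters(), online.parameters()):
            tp.data.mul_(1.0 - tau).add_(tau * p.data)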

Available Policies

MlpPolicy

Notes

Note

The default policy for DDPG uses a ReLU activation, to match the original paper, whereas most other algorithms' MlpPolicy uses a tanh activation.
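
If you want the tanh activation used by the other algorithms instead, it can be overridden through policy_kwargs (a minimal sketch, assuming the default network architecture otherwise):

import torch as th
from stable_baselines3 import DDPG

# Override the default ReLU activation with tanh via policy_kwargs.
model = DDPG('MlpPolicy', 'Pendulum-v0',
             policy_kwargs=dict(activation_fn=th.nn.Tanh))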

Can I use?

  • Recurrent policies: ❌
  • Multi processing: ❌
  • Gym spaces:
Space          Action  Observation
Discrete       ❌       ✔️
Box            ✔️       ✔️
MultiDiscrete  ❌       ✔️
MultiBinary    ❌       ✔️
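
Since only Box action spaces are supported, a quick sanity check before training can save a confusing error later (a minimal sketch):

import gym

env = gym.make('Pendulum-v0')
# DDPG needs a continuous (Box) action space; observations may be any of the spaces above.
assert isinstance(env.action_space, gym.spaces.Box), "DDPG requires continuous actions"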

Example

import gym
import numpy as np

from stable_baselines3 import DDPG
from stable_baselines3.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise

env = gym.make('Pendulum-v0')

# The noise objects for DDPG
n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))
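# Alternatively, OrnsteinUhlenbeckActionNoise (imported above) gives temporally
# correlated noise, as used in the original DDPG paper, e.g.:
# action_noise = OrnsteinUhlenbeckActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))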

model = DDPG('MlpPolicy', env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=10000, log_interval=10)
model.save("ddpg_pendulum")
env = model.get_env()  # keep the (vectorized) env around for the rollout below

del model # remove to demonstrate saving and loading

model = DDPG.load("ddpg_pendulum")

obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
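
As an alternative to the endless rendering loop above, the loaded model can be scored quantitatively with the built-in evaluation helper (a short sketch; evaluate_policy averages the episodic reward):

from stable_baselines3.common.evaluation import evaluate_policy

# Average episodic return over 10 evaluation episodes.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")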

Parameters

DDPG

DDPG Policies

MlpPolicy