pendulum_ddpg

Continuous control of a pendulum with DDPG (reference: the DDPG paper).

Deep Deterministic Policy Gradient

Applying value-based reinforcement learning to continuous-action problems is difficult. In DQN, the most famous value-based RL algorithm, the agent chooses actions according to an epsilon-greedy selection strategy, which requires finding the action with the highest Q-value. If the action is continuous, choosing an action from the Q-function becomes a problem, because the maximum can no longer be found by enumerating a finite set of actions.
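As a minimal illustration of where this breaks down, the sketch below shows epsilon-greedy selection over a discrete action set. The helper name `epsilon_greedy` and the example Q-values are hypothetical and are not taken from `pendulum_ddpg.py`. The `argmax` step only works because the actions can be listed; for a continuous action such as a torque there is no finite list to scan.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index from a finite list of Q-values (hypothetical helper)."""
    if rng.random() < epsilon:
        # Explore: pick a uniformly random action index.
        return int(rng.integers(len(q_values)))
    # Exploit: enumerate every action and take the one with the highest Q-value.
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.7, 0.3])  # one Q-value per discrete action (made-up numbers)
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the call is purely greedy and returns the index of the largest Q-value.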

DDPG (Deep Deterministic Policy Gradient) is a variant of the policy gradient algorithms. It uses an actor-critic architecture to solve continuous-action problems. The output of the actor is not a probability distribution over actions but the action itself, which makes the policy deterministic.
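A deterministic actor can be sketched in a few lines. The example below uses plain NumPy rather than the Keras model in this repository, and the layer sizes, weight initialization, and names (`init_actor`, `actor`) are assumptions made for illustration. It relies on two properties of the Pendulum environment: the observation has three components and the torque is limited to [-2, 2].

```python
import numpy as np

ACTION_BOUND = 2.0  # Pendulum torque is limited to [-2, 2]
STATE_DIM = 3       # Pendulum observation: cos(theta), sin(theta), angular velocity


def init_actor(rng, hidden=16):
    """Random weights for a tiny two-layer actor (illustrative sizes only)."""
    return {
        "w1": rng.normal(scale=0.1, size=(STATE_DIM, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.1, size=(hidden, 1)),
        "b2": np.zeros(1),
    }


def actor(params, state):
    """Deterministic policy: maps a state directly to a single action."""
    h = np.maximum(0.0, state @ params["w1"] + params["b1"])  # ReLU hidden layer
    # tanh squashes to [-1, 1]; scaling keeps the torque inside the valid range.
    return ACTION_BOUND * np.tanh(h @ params["w2"] + params["b2"])


rng = np.random.default_rng(0)
params = init_actor(rng)
state = np.array([1.0, 0.0, 0.5])
action = actor(params, state)
```

Calling `actor` twice with the same state returns the same action, which is what distinguishes it from a stochastic policy that samples from a distribution.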

Most policy gradient algorithms rely on the policy gradient theorem. David Silver et al. proved that the policy gradient theorem can also be applied to a deterministic policy and called the result the deterministic policy gradient theorem.

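The policy gradient of the objective function follows from the deterministic policy gradient theorem. It is the gradient of the Q-function with respect to the selected action and, using the chain rule, multiplying it by the gradient of the actor's output with respect to the actor's parameters gives the policy gradient of the deterministic policy.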
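Written out, the chain rule described here takes the following form. The notation is an assumption borrowed from the DDPG paper rather than from this README: $\mu(s \mid \theta^{\mu})$ is the actor, $Q(s, a \mid \theta^{Q})$ is the critic, and $J$ is the objective.

$$
\nabla_{\theta^{\mu}} J \approx \mathbb{E}_{s}\left[ \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{a = \mu(s \mid \theta^{\mu})} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \right]
$$

The first factor tells how the critic's value changes with the action, and the second tells how the action changes with the actor's parameters.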

Requirements

Python 3.5
Tensorflow
Keras
numpy
scipy
matplotlib
h5py
gym

Usage

I used the Pendulum-v0 environment from OpenAI Gym to test the DDPG algorithm. It is one of the simplest continuous-action environments. You can train a DDPG agent like this:

python3 pendulum_ddpg.py
