continuous control of pendulum with ddpg ddpg paper
One can have trouble with applying value-based reinforcement learning to continuous action problem. In DQN, most famous value-based RL algorithm, agent choose action according to the epsilon-greedy action selection strategy. But if action is continuous, choosing according to Q-function becomes problem.
DDPG(Deep Deterministic Policy Gradient) is an variant of policy gradient algorithms. It uses actor-critic architecture to solve continuous problem. The output of actor is not the probability of actions but action itself which is deterministic policy.
Most Policy Gradient Algorithms uses "Policy Gradient Theorem". David Silver proved policy gradient theorem can be applied to deterministic policy and called it as deterministic policy gradient theorem.
The policy gradient of objective function by the "determinitic policy gradient theorem". It is gradient of Q-function of the selected action and using chain-rule, one can get the policy gradient of deterministic policy gradient.
- Python 3.5
- Tensorflow
- Keras
- numpy
- scipy
- matplotlib
- h5py
- gym
I used Pendulum-v0 environment of openai gym to test ddpg algorithm. It is simplist continuous action environment. You can train ddpg agent like this
python3 pendulum_ddpg.py