DeepDPG-TensorFlow

TensorFlow Implementation of Deep Deterministic Policy Gradients

Intro

Replay buffers and target networks, first introduced in the Atari-playing DQN paper, made it possible to train deep value networks on complicated environments. This works well, but DQN is limited to discrete action domains, since it relies on finding the action that maximizes the action-value function, which is intractable when the action space is continuous. To handle continuous control, largely the same group of authors proposed Deep Deterministic Policy Gradients (DDPG), a model-free, off-policy actor-critic algorithm that reuses the ingredients that made DQN work. This repository implements that algorithm in TensorFlow for continuous OpenAI Gym environments.
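As a rough sketch of the idea (the functions below are hypothetical stand-ins, not this repo's actual TensorFlow graphs): the critic is regressed toward a bootstrapped target computed with the target networks, and the actor is then updated by following the critic's gradient with respect to the action.

```python
import numpy as np

# Hypothetical stand-ins for the learned target networks; in this repo they are
# TensorFlow networks whose layer sizes come from config.py.
def target_actor(next_states):                 # mu'(s'): deterministic target policy
    return np.tanh(next_states @ np.ones((3, 1)))

def target_critic(next_states, next_actions):  # Q'(s', a'): target action-value function
    return next_states.sum(axis=1, keepdims=True) + next_actions

def ddpg_critic_targets(rewards, next_states, dones, gamma=0.99):
    """Bootstrapped targets y = r + gamma * Q'(s', mu'(s')) for non-terminal transitions."""
    next_actions = target_actor(next_states)
    next_q = target_critic(next_states, next_actions)
    return rewards + gamma * (1.0 - dones) * next_q

# Tiny batch, just to show the shapes involved.
rewards = np.zeros((4, 1))
next_states = np.random.randn(4, 3)
dones = np.zeros((4, 1))
print(ddpg_critic_targets(rewards, next_states, dones).shape)   # (4, 1)
```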

Overview

This code contains:

  1. Deep Q-Network (Critic) Training and Policy Improvement
  2. Easy Network Settings and Batch Normalization at Will
    • changing your network architecture reduces to editing a list (see the first sketch after this list)
  3. Experience Replay Memory
    • makes the algorithm off-policy (sketched below, together with the target-network update)
  4. Target Networks for Both the Action-Value and Policy Functions
    • stabilize the learning process
  5. Ornstein-Uhlenbeck Action Noise for Exploration (see the last sketch after this list)
  6. Modular Design
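A minimal sketch of the "architecture as a list" idea, with hypothetical names (the actual fields live in config.py and may be named differently): the hidden layers are described by a plain Python list, so changing the architecture is a one-line edit.

```python
import numpy as np

# Hypothetical config entry: hidden layer sizes as a plain list.
ACTOR_HIDDEN_LAYERS = [400, 300]   # edit this list to change the architecture

def build_mlp_weights(input_dim, hidden_layers, output_dim, seed=0):
    """Allocate weight matrices for a simple fully connected network."""
    rng = np.random.default_rng(seed)
    sizes = [input_dim] + hidden_layers + [output_dim]
    return [rng.normal(scale=0.1, size=(n_in, n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

weights = build_mlp_weights(input_dim=3, hidden_layers=ACTOR_HIDDEN_LAYERS, output_dim=1)
print([w.shape for w in weights])   # [(3, 400), (400, 300), (300, 1)]
```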
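The two DQN ingredients above can be sketched in a few lines (class and function names here are hypothetical, not this repo's exact API): a fixed-size replay memory sampled uniformly at random, and target parameters that slowly track the learned parameters through Polyak averaging with a small factor tau.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience replay memory with uniform sampling."""
    def __init__(self, capacity=int(1e6)):
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        batch = random.sample(self.memory, batch_size)
        return list(zip(*batch))  # columns: states, actions, rewards, next_states, dones

def soft_update(target_params, online_params, tau=1e-3):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]
```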
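Exploration noise comes from an Ornstein-Uhlenbeck process. A minimal sketch, using the theta = 0.15 and sigma = 0.2 values from the DDPG paper (this repo's settings and discretization may differ):

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Temporally correlated noise added to the actor's action for exploration."""
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.state = np.full(action_dim, mu)

    def reset(self):
        self.state = np.full_like(self.state, self.mu)

    def sample(self):
        # Euler discretization of the OU stochastic differential equation.
        dx = self.theta * (self.mu - self.state) * self.dt \
             + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.state.shape)
        self.state = self.state + dx
        return self.state

noise = OrnsteinUhlenbeckNoise(action_dim=1)
print([noise.sample() for _ in range(3)])   # correlated samples drifting back toward mu
```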

A Playground for Controlling OpenAI Gym Environments

You can play with and tune the network settings in config.py and point the agent at other continuous Gym environments.
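For instance, with the classic (pre-0.26) Gym API and a continuous-action environment such as Pendulum (the environment id and the random-action placeholder below are assumptions, not this repo's entry point):

```python
import gym

env = gym.make('Pendulum-v0')           # any continuous-action Gym environment
state = env.reset()                      # classic Gym reset/step signatures
for step in range(200):
    action = env.action_space.sample()   # replace with the trained actor's output
    state, reward, done, info = env.step(action)
    if done:
        state = env.reset()
```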

TODOS

  • extend it to MuJoCo environments
  • save and load checkpoints (network weights)
  • add proper (TensorBoard) summaries

References
