Hindsight Experience Replay

This is a PyTorch implementation of the bit-flip experiment from the Hindsight Experience Replay paper, using Double DQN with a Dueling architecture.

Introduction

Hindsight Experience Replay (HER) is a technique for dealing with sparse rewards in environments with clearly defined goal states. In these environments it is easy to tell whether the goal has been reached, but it may be hard to get there. The idea is to use an off-policy algorithm such as DQN, DDPG or NAF that relies on a replay memory, and to fill that memory with two kinds of experiences collected during training: experiences conditioned on the original goal states, and experiences conditioned on "hindsight goals" that were actually reached (e.g. the final state of an episode). Introducing these artificially reached goals means that both positive and negative feedback is provided, so learning can occur even when the original goal is rarely achieved.
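As a rough sketch of the idea (not the exact code in this repository), each episode can be stored twice in the replay memory: once conditioned on the original goal and once relabeled with the final state as a hindsight goal, using the sparse bit-flip reward of 0 on success and -1 otherwise. The `env`, `agent` and `replay_buffer` interfaces below are assumed for illustration:

```python
import numpy as np

def her_episode(env, agent, replay_buffer, goal):
    """Collect one episode and store each transition twice: once with the
    original goal and once relabeled with a hindsight goal.
    env, agent and replay_buffer are illustrative interfaces, not this repo's API."""
    trajectory = []
    state = env.reset()
    done = False
    while not done:
        action = agent.act(state, goal)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward, next_state, done))
        state = next_state

    # 1) Experiences conditioned on the original goal.
    for s, a, r, s2, d in trajectory:
        replay_buffer.add(s, a, r, s2, d, goal)

    # 2) "Hindsight" experiences: pretend the final state reached was the goal
    #    all along (the "final" strategy), and recompute the sparse reward.
    hindsight_goal = trajectory[-1][3]
    for s, a, _, s2, d in trajectory:
        r = 0.0 if np.array_equal(s2, hindsight_goal) else -1.0
        replay_buffer.add(s, a, r, s2, d, hindsight_goal)
```

Because the hindsight goal was reached by construction, at least one transition per relabeled episode carries a positive signal, which is what lets the sparse-reward problem become learnable.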

Results

Below are the success rates during training for 50 bits (there are more than 10^15 different states/goals in this case 😱). Exploration starts at 20% and is decayed linearly to 0% over the first half of the training epochs. Each epoch consists of 50 cycles; in each cycle, 16 episodes are used to fill the replay memory, followed by 40 DQN update steps. The success rate is the average success over each epoch (i.e. averaged over 50 cycles * 16 episodes). All parameters/hyperparameters except for the exploration schedule are identical to those noted in the paper; they can be found in the train() function and in the DQNAgent class.

(Plot: success rate per epoch for the 50-bit experiment)
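For reference, a minimal sketch of the training schedule described above might look like the following. The epoch count default and the `run_episode` interface are assumptions made for illustration; the repository's actual values live in the train() function and the DQNAgent class:

```python
def train(agent, env, replay_buffer, run_episode, epochs=200):
    """Rough outline of the training schedule: epochs of 50 cycles, each cycle
    collecting 16 episodes and then taking 40 DQN update steps.
    epochs=200 and run_episode are illustrative assumptions."""
    cycles_per_epoch, episodes_per_cycle, updates_per_cycle = 50, 16, 40
    for epoch in range(epochs):
        # Exploration decays linearly from 20% to 0% over the first half of training.
        epsilon = max(0.0, 0.2 * (1.0 - epoch / (epochs / 2)))
        successes = 0
        for _ in range(cycles_per_epoch):
            # Fill the replay memory with 16 episodes (original + hindsight goals) ...
            for _ in range(episodes_per_cycle):
                successes += run_episode(env, agent, replay_buffer, epsilon)
            # ... then take 40 DQN update steps on sampled minibatches.
            for _ in range(updates_per_cycle):
                agent.update(replay_buffer)
        # Success rate reported per epoch, averaged over 50 cycles * 16 episodes.
        success_rate = successes / (cycles_per_epoch * episodes_per_cycle)
        print(f"epoch {epoch:3d}  success rate {success_rate:.3f}  epsilon {epsilon:.2f}")
```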

Todo

  • Run experiment with 50 bits
  • Implement DDPG to use for continuous actions
  • Implement environment to test DDPG
