Reproduction of Kulkarni et al. (2016) in Python
agent
data/raw
envs
fig
.gitignore
LICENSE
README.md Add MDP results to README.md Jul 23, 2017
actor_critic.py Add plotting to actor critic Sep 11, 2016
run.py
search_architectures.py Add code to search architectures randomly Sep 22, 2016
test_mdp.py
test_naive_hierarchy.py Remove extraneous dependencies Aug 21, 2016
work_notes.md


h-DQN

Reproduction of "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation" by Kulkarni et al. (2016) in Python: https://arxiv.org/abs/1604.06057

Disclaimer

This is a work in progress. I haven't been able to replicate the results yet.

Also, I haven't started on Montezuma's Revenge yet. I intend to do this eventually, but I'm not sure when. Pull requests are welcome and encouraged!

Comments/criticisms/suggestions/etc welcome, as always.

Progress

MDP Environment

  • Create MDP Environment [Done]
  • Create a non-hierarchical actor-critic agent as a baseline [Done]
  • Evaluate the non-hierarchical actor-critic by plotting which states it visits [Done]
  • Create an h-DQN agent [Done]
  • Evaluate the h-DQN agent by plotting which states it visits [Done]
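
For orientation, here is a minimal sketch of the stochastic decision process described in the paper: six states in a chain, the agent starts in state 2, "left" always succeeds, "right" succeeds half of the time and otherwise moves left, and state 1 is terminal. Reaching state 1 pays 1 if state 6 was visited first and 0.01 otherwise. The names `StochasticMDP` and `count_visits` are hypothetical and are not taken from the code in `envs`; staying in place when "right" succeeds in state 6 is also an assumption of this sketch.

```python
import random
from collections import Counter


class StochasticMDP:
    """Six-state chain with states 1..6; state 1 is terminal (illustrative only)."""

    LEFT, RIGHT = 0, 1

    def __init__(self, n_states=6, start=2, p_right=0.5, seed=None):
        self.n_states = n_states
        self.start = start
        self.p_right = p_right          # probability that RIGHT succeeds
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.state = self.start
        self.visited_last = False       # has the right-most state been reached?
        return self.state

    def step(self, action):
        if action == self.RIGHT and self.rng.random() < self.p_right:
            # Assumption: a successful RIGHT in the last state stays in place.
            self.state = min(self.state + 1, self.n_states)
        else:
            # LEFT always succeeds; a failed RIGHT also moves left.
            self.state -= 1
        if self.state == self.n_states:
            self.visited_last = True
        done = self.state == 1
        reward = 0.0
        if done:
            reward = 1.0 if self.visited_last else 0.01
        return self.state, reward, done


def count_visits(env, policy, episodes):
    """Run `policy(state) -> action` and count how often each state is entered."""
    visits = Counter()
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            state, _, done = env.step(policy(state))
            visits[state] += 1
    return visits
```

Calling `count_visits(StochasticMDP(seed=0), lambda s: StochasticMDP.LEFT, 10)` runs ten one-step episodes that each end in state 1 with reward 0.01. The resulting counts are the kind of data a state-visit plot can be built from.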

Montezuma's Revenge

TODO (This might be a while. Pull requests welcome.)

Results

Stochastic MDP Environment

h-DQN

The h-DQN agent is located in ./agent/hDQN.py. Below is my replication of Figure 4 from the paper:

Figure 4
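
To make the two-level structure concrete, the sketch below shows the idea from the paper in tabular form: a meta-controller picks a goal state, a controller picks primitive actions conditioned on that goal and is trained on a binary intrinsic reward for reaching it, and the meta-controller is trained on the extrinsic reward collected while the goal was active. This is not the code in ./agent/hDQN.py. It swaps the deep Q-networks and replay memories for plain Q-tables, and the class name, method names, and default hyperparameters are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class TabularHierarchicalAgent:
    """Two-level Q-learning with Q-tables standing in for h-DQN's networks."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99,
                 epsilon=0.1, seed=None):
        self.goals = list(range(1, n_states + 1))   # every state is a candidate goal
        self.actions = list(range(n_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.q_meta = defaultdict(float)   # (state, goal) -> value
        self.q_ctrl = defaultdict(float)   # (state, goal, action) -> value

    def _eps_greedy(self, value_of, options):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(options)
        return max(options, key=value_of)

    def select_goal(self, state):
        return self._eps_greedy(lambda g: self.q_meta[(state, g)], self.goals)

    def select_action(self, state, goal):
        return self._eps_greedy(
            lambda a: self.q_ctrl[(state, goal, a)], self.actions)

    @staticmethod
    def intrinsic_reward(next_state, goal):
        # Binary intrinsic reward: 1 when the controller reaches its goal.
        return 1.0 if next_state == goal else 0.0

    def update_controller(self, state, goal, action, next_state, done):
        reward = self.intrinsic_reward(next_state, goal)
        # The controller's sub-episode ends at the goal or at env termination.
        terminal = done or next_state == goal
        best_next = 0.0 if terminal else max(
            self.q_ctrl[(next_state, goal, a)] for a in self.actions)
        key = (state, goal, action)
        self.q_ctrl[key] += self.alpha * (
            reward + self.gamma * best_next - self.q_ctrl[key])

    def update_meta(self, state, goal, extrinsic_return, next_state, done):
        # `extrinsic_return` is the environment reward summed while pursuing `goal`.
        best_next = 0.0 if done else max(
            self.q_meta[(next_state, g)] for g in self.goals)
        key = (state, goal)
        self.q_meta[key] += self.alpha * (
            extrinsic_return + self.gamma * best_next - self.q_meta[key])
```

In a training loop, the agent would call `select_goal` once, then repeat `select_action` and `update_controller` until the goal is reached or the episode ends, and finally call `update_meta` with the extrinsic reward accumulated over that stretch.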

Requirements

  • numpy
  • tensorflow
  • keras
  • h5py
  • matplotlib