Asynchronous Methods for Deep Reinforcement Learning

async_deep_reinforce

Asynchronous deep reinforcement learning + Pseudo-count based reward + On-highscore-learning

About

This code is a fork of miyosuda's code. I added many functions for my deep learning experiments. Among them, the pseudo-count based reward, based on the DeepMind paper below, and on-highscore-learning (my original idea) enable an average score of over 1500 points in Montezuma's Revenge, which is higher than the A3C result reported in that paper.

https://arxiv.org/abs/1606.01868 (Unifying Count-Based Exploration and Intrinsic Motivation, DeepMind)
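
As a rough illustration of the pseudo-count based reward, the sketch below adds an intrinsic bonus that shrinks as a state becomes familiar. This is a minimal sketch of the idea in the paper, not the code in this repository; density_model, its prob/update methods, and PSC_BETA are hypothetical placeholders (the paper uses a CTS pixel density model over downsampled frames).

import math

# Minimal sketch of the pseudo-count bonus from arXiv:1606.01868 (hypothetical names).
PSC_BETA = 0.05  # bonus scale; a tunable hyper-parameter

def psc_bonus(density_model, state):
    rho = density_model.prob(state)        # density of `state` before observing it
    density_model.update(state)            # let the model learn from `state`
    rho_prime = density_model.prob(state)  # "recoding" density after the update
    # Pseudo-count derived from the two densities (Bellemare et al., 2016).
    n_hat = rho * (1.0 - rho_prime) / max(rho_prime - rho, 1e-8)
    # Exploration bonus: large for novel states, vanishing for familiar ones.
    return PSC_BETA / math.sqrt(n_hat + 0.01)

# The bonus would simply be added to the game reward before the A3C update:
#   reward = env_reward + psc_bonus(density_model, observation)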

"on-highscore-learning" is my original idea, which learn from state-action-rewards-history when getting highscore. But in evaluation of Montezuma's Revenge, I set option to reset highscore in every episode, so learning occured in every score. (I'm changing this now. In new version, only highscore episode will be selected automatically based on history of scores)

Slide

See the following slides (in English) for an explanation of this project.

http://www.slideshare.net/ItsukaraIitsuka/drl-challenge-on-montezumas-revenge

See the following slides for an explanation in Japanese.

http://www.slideshare.net/ItsukaraIitsuka/deepmind20166-unifying-countbased-exploration-and-intrinsic-motivation-pseudocount-montezumas-revenge

Learning curve of Montezuma's Revenge

The following graph shows the average score on Montezuma's Revenge.

learning result after 39M steps

  • 0 - 30M steps: pseudo-count based reward is ON.
  • 30 - 40M steps: the above + on-highscore-learning is ON.

Best learning curve of Montezuma's Revenge

The following graph is the best learning curve of Montezuma's Revenge (2016/10/7). The best score is 2500 and the peak average score is more than 1500 points.

best learning result

Explored Rooms

  • My result: The following picture shows the rooms explored across all of my training runs.

explored rooms in my trainings

This is better than DeepMind's result (see the next picture). It was achieved only in the OpenAI Gym environment; in the ALE environment, although the average score is higher than in OpenAI Gym, fewer rooms are explored.

  • DeepMind's result: The rooms explored in the DeepMind paper (in total).

explored rooms in DeepMind's trainings

Play movie

The following is a play movie of Montezuma's Revenge after training for 50M steps. Its score is 2600.

How to prepare environment

This code needs Anaconda, TensorFlow, OpenCV 3, and the Arcade Learning Environment (ALE). You can install them with the scripts packed in gcp-install-a3c-env.tgz, found in the "gcp-install" directory. Run the following:

$ sudo apt-get install git
$ git clone https://github.com/Itsukara/async_deep_reinforce.git
$ mkdir Download
$ cp async_deep_reinforce/gcp-install/gcp-install-a3c-env.tgz Download/
$ cd Download/
$ tar zxvf gcp-install-a3c-env.tgz
$ bash -x install-Anaconda.sh
$ . ~/.bashrc
$ bash -x install-tensorflow.sh
$ bash -x install-opencv3.sh
$ bash -x install-ALE.sh
$ bash -x install-additionals.sh
$ cd ../async_deep_reinforce
$ ./run-option montezuma-c-avg-greedy-rar025

When a program requests input, just hit Enter, or type "y" or "yes" and hit Enter. For Anaconda, however, you have to type "q" when the license text and "--More--" are displayed.

I built the environment with these scripts on Ubuntu 14.04 LTS 64-bit on Google Cloud Platform, Amazon EC2, and Microsoft Azure.
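
After the install scripts finish, a quick sanity check I suggest (assuming the ALE Python bindings are installed as ale_python_interface, which is what this code expects) is to confirm that the main dependencies import cleanly:

# Quick import check for the installed dependencies (run in a Python shell).
import tensorflow as tf
import cv2
from ale_python_interface import ALEInterface

print("TensorFlow:", tf.__version__)
print("OpenCV:", cv2.__version__)
print("ALE interface loaded:", ALEInterface.__name__)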

How to train

To train,

$ ./run-option montezuma-c-max-greedy-rar025

To display the game screen as the program plays,

$ python a3c_display.py --rom=montezuma_revenge.bin --display=True

To create a play movie without displaying the game screen,

$ python a3c_display.py --rom=montezuma_revenge.bin --record-screen-dir=screen
$ ./run-avconv-all screen # you need avconv

Run options

For available run options, see options.py.

How to reproduce OpenAI Gym Result

I uploaded an evaluation result to OpenAI Gym; see the "OpenAI Gym evaluation page". I'd appreciate it if you could review my evaluation.

To reproduce the OpenAI Gym result,

$ ./run-option-gym montezuma-j-tes30-b0020-ff-fs2

Play screens are recorded in the following directories:

  • screen.new-room : screens recorded when a new room is entered
  • screen.new-record : screens recorded when a new high score is achieved

Status of code

The source code is still under development and may change frequently. Currently, I'm searching for the best parameters to speed up learning and reach a higher score. During this search I'm adding new functions that change the behavior of the program, so it may regress at times. Sorry for that in advance.

Sharing experiment result

I'd appreciate it if you could post your experiment results to the "Experiment Results" thread in Issues.

Blog

I'm writing a blog about this program. See the following (in Japanese):

http://itsukara.hateblo.jp/ (Itsukara's Blog)

How to refer

I'd appreciate it if you would cite my code in your blog or paper as follows:

https://github.com/Itsukara/async_deep_reinforce (On-Highscore-Learning code based on A3C+ and Pseudo-count developed by Itsukara)

Acknowledgements

  • @miyosuda for providing a very fast A3C program.