DORA The Explorer: Directed Outreaching Reinforcement Action-Selection

This repository contains supplementary code for the ICLR2018 paper DORA The Explorer: Directed Outreaching Reinforcement Action-Selection.

If you use any of the code related to this repository in a paper, research etc., please cite:

@inproceedings{
    fox2018dora,
    title={{DORA} The Explorer: Directed Outreaching Reinforcement Action-Selection},
    author={Lior Fox and Leshem Choshen and Yonatan Loewenstein},
    booktitle={International Conference on Learning Representations},
    year={2018},
    url={https://openreview.net/forum?id=ry1arUgCW},
}

Linear Approximation with tile-coding

Tile-coding features uses Richard Sutton implementation. The experiments in the paper were performed using a version of the MountainCar environment with episode length of 1000 steps, following the definition in Sutton and Barto book. The gym version of MountainCar has by deafult an episode length of 200 steps. To register a 1000 steps version add the following snippet to the gym/envs/__init__.py file:

register(
    id='MountainCarLong-v0',
    entry_point='gym.envs.classic_control:MountainCarEnv',
    max_episode_steps=1000,
    reward_threshold=-110.0,
)

To use the sparse reward version of the environment (reward=0 unless reaching goal) add the --binreward flag to execution. Train a DORA agent (LLL-softmax) on the sparse-reward version of mountaincar (1000 episode steps):

$ python3 linapprox/main.py -a dora -e MountainCarLong-v0 --binreward

DQN Implementation for ATARI

The implementation is based on the atari-rl package, currently under this fork. We added support for a two-streamed network for predicting Q and E values. Exploration bonus was added to the reward, and action-selection was e-greedy (more details can be found in section 4 of the paper). To run the e-values based agent, add the --e_network flag to train the two-streamed network for predicting E-values, and the --e_exploration_bonus flag to use generalized counters (E-values) as exploration bonus.

$ python3 main.py --game freeway --e_network --e_exploration_bonus

Reproducibility Challenge

The paper was part of the ICLR 2018 Reproducibility Challenge. A discussion of the replication study can be found in the OpenReviews submission forum. We forked the package of the replication study by Drew Davis, Jiaxuan Wang, and Tianyang Pan adding some details that were missing in their original implementation. Our revised version of their replication project can be found here.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
atari-rl @ 53f0d89		atari-rl @ 53f0d89
deep_exploration_with_E_network @ 2349bc9		deep_exploration_with_E_network @ 2349bc9
linapprox		linapprox
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DORA The Explorer: Directed Outreaching Reinforcement Action-Selection

Linear Approximation with tile-coding

DQN Implementation for ATARI

Reproducibility Challenge

About

Releases

Packages

Contributors 2

Languages

License

borgr/DORA

Folders and files

Latest commit

History

Repository files navigation

DORA The Explorer: Directed Outreaching Reinforcement Action-Selection

Linear Approximation with tile-coding

DQN Implementation for ATARI

Reproducibility Challenge

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages