Implementation of the paper *Deep Policies for Width-Based Planning in Pixel Domains*, published in the Proceedings of the 29th International Conference on Automated Planning and Scheduling (ICAPS 2019).
pi-IW is an on-line planning algorithm that enhances Rollout IW by incorporating an action-selection policy, resulting in an informed width-based search. It interleaves a planning step and a learning step (sketched below):
- Planning: expands a tree using Rollout IW guided by a policy estimate.
- Learning: a target policy is extracted from the planned trajectories and is used to update the policy estimate (supervised learning).
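The loop can be summarized with the following schematic sketch. It is only an illustration of the idea, not the repository's API: `plan_with_policy`, `update_policy`, `max_return`, `observation` and `tree.advance` are hypothetical names, and the target policy shown (a softmax over the returns of the root's children) is a simplified version of the one described in the paper.

```python
import numpy as np

def pi_iw_step(env, tree, plan_with_policy, update_policy, temperature=1.0):
    """One interleaved planning + learning step of pi-IW (schematic sketch).

    `plan_with_policy` stands for a Rollout IW expansion guided by the policy
    network, and `update_policy` for a supervised (cross-entropy) update of
    that network; both are hypothetical helpers, not this repository's code.
    """
    # Planning: expand the tree with Rollout IW, sampling actions from the
    # current policy estimate instead of uniformly.
    plan_with_policy(tree, env)

    # Target policy: softmax over the (discounted) returns accumulated at the
    # root's children (simplified version of the paper's target).
    returns = np.array([child.max_return for child in tree.root.children])
    target = np.exp(returns / temperature)
    target /= target.sum()

    # Learning: fit the policy network to the target at the root observation.
    update_policy(tree.root.observation, target)

    # Acting: execute an action in the real environment and keep the
    # corresponding subtree as the new root for the next step.
    action = int(np.argmax(target))
    _, reward, done, _ = env.step(action)
    tree.advance(action)
    return reward, done
```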
To illustrate how on-line planning works and the benefit of adding a guiding policy, we provide examples of planning in MDPs on a simple corridor task:
- One planning step of Rollout IW (off-line)
- On-line Rollout IW: interleaving planning steps with action executions, without any policy guidance or learning
- On-line planning and learning: pi-IW using BASIC and dynamic features
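Both the BASIC and the dynamic feature sets feed the same width-based pruning: Rollout IW only keeps expanding states that make some feature atom true for the first time. Below is a minimal sketch of an IW(1)-style novelty table to convey the idea; it is not the repository's implementation, and Rollout IW in fact uses a depth-aware refinement of this test. With BASIC features the atoms come directly from the grid observation, whereas pi-IW-dynamic extracts them from the policy network itself.

```python
class NoveltyTable:
    """Width-1 novelty table: a state counts as novel if at least one of its
    (feature_index, value) atoms has not been seen before in this search.
    Minimal sketch, not the repository's implementation."""

    def __init__(self):
        self.seen = set()

    def check_and_update(self, features):
        novel = False
        for atom in enumerate(features):  # atoms are (index, value) pairs
            if atom not in self.seen:
                self.seen.add(atom)
                novel = True
        return novel


# Example: the second state adds no new atom, so it would be pruned.
table = NoveltyTable()
print(table.check_and_update((0, 1, 0)))  # True: all atoms are new
print(table.check_and_update((0, 1, 0)))  # False: nothing new, prune
print(table.check_and_update((1, 1, 0)))  # True: feature 0 takes a new value
```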
We compare our algorithm with AlphaZero and show that pi-IW performs better in simple, sparse-reward environments. For instance, to run pi-IW with dynamic features in the two-walls environment:
python3 piIW_alphazero.py --algorithm pi-IW-dynamic --seed 1234 --env GE_MazeKeyDoor-v2
See the help (-h) section for more details.
For Atari games, use the deterministic version of the gym environments, which can be selected with the v4 variants (e.g. "Breakout-v4").
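For instance, assuming the classic gym API in use at the time of this repository (and the Atari dependencies installed):

```python
import gym

# The v4 variants disable the sticky actions of the v0 environments
# (repeat_action_probability = 0), so transitions are reproducible,
# which is what the width-based planner relies on.
env = gym.make("Breakout-v4")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```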
Important: this repository is a reimplementation of the algorithm in TensorFlow eager mode (2.0 compatible), which is clearer, more intuitive, and easier to modify and debug. The results in the paper were obtained with a previous implementation in TensorFlow graph mode (v1.4), which can be found in a separate branch. That version of the code allows the algorithm to be parallelized with one parameter server and many workers using distributed TensorFlow.
- Install the requirements (numpy, tensorflow and gym packages)
- Make sure that gridenvs is added to the Python path (PYTHONPATH).
We corrected a bug that altered the input of the neural network for Atari games. This affects the results in Table 2 of the paper.