Skip to content
QMDP-Net implementation
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.
.gitignore Edit readme Oct 11, 2017 adding files Oct 10, 2017 adding files Oct 10, 2017 adding files Oct 10, 2017 adding files Oct 10, 2017 adding files Oct 10, 2017 adding files Oct 10, 2017


Implementation of the NIPS 2017 paper:

QMDP-Net: Deep Learning for Planning under Partial Observability
Peter Karkus, David Hsu, Wee Sun Lee
National University of Singapore

The code implements the 2D grid navigation domain, and a QMDP-net with 2D state space in tensorflow.


Python 2.7
Tensorflow 1.3.0
Python packages: numpy, scipy, tables, pymdptoolbox, tensorpack

To install these packages using pip:

pip install tensorflow
pip install numpy scipy tables pymdptoolbox tensorpack

Optional: to speed up data generation download and install the latest version of pymdptoolbox

git clone pymdptoolbox
cd ./pymdptoolbox
python install

Train and evaluate a QMDP-net

The folder ./data/grid10 contains training and test data for the deterministic 10x10 grid navigation domain (10,000 environments, 5 trajectories each for training, 500 environments, 1 trajectory each for testing).

Train network using only the first 4 steps of each training trajectory:

python ./data/grid10/ --logpath ./data/grid10/output-lim4/ --lim_traj_len 4

The learned model will be saved to ./data/grid10/output-lim4/final.chk

Load the previously saved model and train further using the full trajectories:

python ./data/grid10/ --logpath ./data/grid10/output-lim100/ --loadmodel ./data/grid10/output-lim4/final.chk --lim_traj_len 100

For help on arguments execute:

python --help

Evaluate a previously trained model

A model trained by the commands above is readily available in the folder: data/grid10/trained-model. You may load and evaluate this model using the following command:

python ./data/grid10/ --loadmodel ./data/grid10/trained-model/final.chk --epochs 0

The expected output:

Evaluating 100 samples, repeating simulation 1 time(s)
Success rate: 1.000  Trajectory length: 7.3  Collision rate: 0.000
Success rate: 0.990  Trajectory length: 7.1  Collision rate: 0.000

Generate training and test data

You may generate data using the script
As an example, the command for the 18x18 deterministic grid navigation domain is:

python ./data/grid18/ 10000 500 --N 18 --train_trajs 5 --test_trajs 1

This will generate 10,000 random environments for training, 500 for testing, 5 and 1 trajectories per environment.

For the stochastic variant use:

python ./data/grid18/ 10000 500 --N 18 --train_trajs 5 --test_trajs 1 --Pmove_succ 0.8 --Pobs_succ 0.9

For help on arguments execute:

python --help
You can’t perform that action at this time.