Learning to plan using dynamic VIN

A variation of the Value Iteration Network (Tamar et al., NIPS 2016) [arxiv].

The main idea, building upon the original VIN, is to feed a generated step-wise reward map into the value-iteration loop so the network learns to plan in a dynamic scene. This work can be combined with video-prediction techniques and is still in progress; currently it is trained using the ground-truth state from the simulator.

We use A3C + curriculum learning as the RL training scheme, similar to [Wu et al., ICLR 2017]. Because of how pygame rendering works, we use multiple processes to generate experience from the simulator instead of multiple threads.

Results

(Result animations: map1, map2)

About the code

a3c.py defines the policy/value network with a shared A3C structure, embedded with a VI module, as shown below.

(VIN architecture diagram)
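For orientation, here is a minimal sketch of what a VI module of this kind does: it runs k rounds of value iteration over a step-wise reward map by convolving the stacked reward/value channels and max-pooling over actions. The tensor shapes, kernel size, iteration count, and variable names below are illustrative assumptions, not the repository's actual code.

```python
import tensorflow as tf

def vi_module(reward_map, k=20, num_actions=8):
    # reward_map: [batch, H, W, 1] step-wise reward map produced upstream
    # (shapes, kernel size and iteration count are illustrative assumptions).
    w_q = tf.get_variable('w_q', shape=[3, 3, 2, num_actions])  # conv over [reward, value] channels
    v = tf.zeros_like(reward_map)                                # initial value map V_0
    for _ in range(k):
        rv = tf.concat([reward_map, v], axis=3)                  # stack reward and value channels
        q = tf.nn.conv2d(rv, w_q, strides=[1, 1, 1, 1], padding='SAME')
        v = tf.reduce_max(q, axis=3, keep_dims=True)             # V_{t+1} = max_a Q_t(a)
    return q, v
```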

agent.py defines a single agent and its interaction with the environment during the reinforcement-learning stage, including synchronization with the global model and the training methods; a sketch of the sync step follows.
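The sync step in A3C typically copies the global network's weights into the worker's local copy before each rollout. The helper below is a hypothetical illustration of that pattern, not the repository's actual function; the scope names are assumptions.

```python
import tensorflow as tf

def make_sync_op(local_scope, global_scope):
    # Hypothetical helper: copy the global network's weights into a worker's
    # local copy, as done at the start of each A3C rollout.
    local_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=local_scope)
    global_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=global_scope)
    return tf.group(*[l.assign(g) for l, g in zip(local_vars, global_vars)])
```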

thread.py contains the high-level distributed training setup with tf.train.ClusterSpec, plus the curriculum settings; a sketch of that setup is shown below.
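As a rough sketch of a tf.train.ClusterSpec setup of this kind, each training process is launched with its own job name and task index, so experience generation runs in separate processes rather than threads. The host/port layout, worker count, and task index below are assumptions; the real values come from the repository's own scripts and constants.

```python
import tensorflow as tf

# Hypothetical host/port layout; the real addresses and worker count are
# configured by the repository's own scripts, not by this sketch.
cluster = tf.train.ClusterSpec({
    'ps':     ['localhost:2222'],
    'worker': ['localhost:2223', 'localhost:2224'],
})

# Each training process is started with its own job_name / task_index.
server = tf.train.Server(cluster, job_name='worker', task_index=0)

with tf.device(tf.train.replica_device_setter(
        worker_device='/job:worker/task:0', cluster=cluster)):
    global_step = tf.Variable(0, trainable=False, name='global_step')
    # ... build the shared A3C/VIN graph here and create a session on server.target ...
```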

constants.py defines all the hyper-parameters.
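For orientation, the kind of settings collected there might look like the following; the names and values are purely illustrative assumptions, not the repository's actual defaults.

```python
# Hypothetical names and values for illustration; see constants.py for the real ones.
LEARNING_RATE = 1e-4   # optimizer learning rate
GAMMA = 0.99           # discount factor
VI_ITERATIONS = 20     # value-iteration steps inside the VI module
NUM_WORKERS = 8        # parallel worker processes
```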

How to use

  1. Start training: bash train_scipt.sh
  2. Open tmux for monitoring: tmux a -t a3c (you can monitor each worker by switching tmux windows: ctrl + b, w)
  3. Open TensorBoard in a browser: **.**.**.**:15000
  4. Check the curriculum log: less Curriculum log
  5. Stop training: ctrl + c

Requirements

  • TensorFlow 1.1
  • pygame
  • NumPy

MISC

I wrote this code during an internship at Horizon Robotics. Many thanks to my mentor Penghong Lin, and to Lisen Mu for helpful discussions.

Useful Resources
