Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning
TensorFlow implementation of the algorithm described in ‘Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning’, using the maze environments.

    sudo apt-get install -y tmux htop cmake golang libjpeg-dev
    git clone -b master https://github.com/ManuelFritsche/vpc.git
    conda create -n curiosity python=2.7
    source activate curiosity
    pip install numpy
    pip install -r vpc/requirements.txt

    cd vpc/
    # for A3C remove --unsup, for PRED use --unsup pred, for VPC use --unsup vpc
    # for Maze A use --env-id mazeSmall-v0, for Maze B use --env-id mazeLarge-v0
    python train.py --unsup vpc --env-id mazeSmall-v0

Training progress is shown in TensorBoard at http://localhost:12345
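
The flag combinations listed in the comments above can be enumerated with a small helper. This is only a sketch for convenience: the names `METHOD_FLAGS`, `MAZE_ENV_IDS`, and `build_train_command` are hypothetical and not part of the repository. It assumes the script is run from inside `vpc/` and only uses the `--unsup` and `--env-id` options shown above.

```python
# Hypothetical helper (not part of the vpc repository): builds the
# train.py command line for each algorithm / maze combination.

# Flags per algorithm, following the usage comments in this README.
METHOD_FLAGS = {
    "a3c": [],                      # plain A3C: omit --unsup entirely
    "pred": ["--unsup", "pred"],
    "vpc": ["--unsup", "vpc"],
}

# Environment ids per maze, following the usage comments in this README.
MAZE_ENV_IDS = {
    "A": "mazeSmall-v0",
    "B": "mazeLarge-v0",
}


def build_train_command(method, maze):
    """Return the argument list for one training run."""
    flags = METHOD_FLAGS[method]
    env_id = MAZE_ENV_IDS[maze]
    return ["python", "train.py"] + flags + ["--env-id", env_id]


if __name__ == "__main__":
    # Print all six combinations so they can be copied into a shell.
    for method in ("a3c", "pred", "vpc"):
        for maze in ("A", "B"):
            print(" ".join(build_train_command(method, maze)))
```

Each printed line can be pasted into a shell. Passing the returned list to `subprocess.call` would launch the run directly, though that is untested against this repository.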
The implementation is based on the code of Curiosity-driven Exploration by Self-supervised Prediction.
Vanilla A3C code is based on the open source implementation of universe-starter-agent.
Maze implementations are based on pycolab.