Path Consistency Learning in TensorFlow

This is a TensorFlow implementation (partially using Keras) of Path Consistency Learning (PCL), as described in "Bridging the Gap Between Value and Policy Based Reinforcement Learning".
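
For orientation, the quantity that PCL drives toward zero can be sketched in a few lines. The snippet below is a minimal pure-Python illustration of the path-consistency residual from the paper, not the code in this repository: the function names `path_consistency` and `pcl_loss` are hypothetical, and it is only an assumption that the `--gamma`, `--tau` and `--d` options used further down correspond to the discount factor, the entropy-regularization weight and the sub-trajectory length.

```python
def path_consistency(values, rewards, log_probs, gamma, tau):
    """Path-consistency residual C for one sub-trajectory of length d.

    values:    [V(s_0), ..., V(s_d)]          (d + 1 state-value estimates)
    rewards:   [r_0, ..., r_{d-1}]            (d rewards)
    log_probs: [log pi(a_0|s_0), ..., log pi(a_{d-1}|s_{d-1})]

    C = -V(s_0) + gamma^d * V(s_d)
        + sum_j gamma^j * (r_j - tau * log pi(a_j|s_j))
    """
    d = len(rewards)
    if len(values) != d + 1 or len(log_probs) != d:
        raise ValueError("expected d + 1 values and d log-probabilities")
    # Discounted, entropy-regularized rewards along the path.
    discounted = sum(
        gamma ** j * (rewards[j] - tau * log_probs[j]) for j in range(d)
    )
    return -values[0] + gamma ** d * values[d] + discounted


def pcl_loss(values, rewards, log_probs, gamma, tau):
    """Squared residual, 0.5 * C^2, for a single sub-trajectory."""
    c = path_consistency(values, rewards, log_probs, gamma, tau)
    return 0.5 * c ** 2
```

In an actual training loop, `values` and `log_probs` would come from the value and policy networks, and the loss would be summed over all length-d sub-trajectories of each sampled episode before taking gradients.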

Requirement

Verified on the following environment,

python main.py --target_task CartPole-v0

To render the environment while training, add the "-v" argument:

python main.py --target_task CartPole-v0 -v

The implementation has also been verified to work on the "Copy-v0" task, although the results fall short of those reported in the paper.

python main.py --target_task Copy-v0 --tau 0.005 --gamma 0.9 --d 10 -b 400 --step_to_report 100 -r 5e-5 --start_at 2000 -c 0.5 --with_lstm
