
A3C-LSTM algorithm tested on the CartPole OpenAI Gym environment


bekerov/A3C-LSTM

 
 


Implementation of the Asynchronous Advantage Actor-Critic algorithm using Long Short-Term Memory networks (A3C-LSTM)

Modified from the work of Arthur Juliani: "Simple Reinforcement Learning with Tensorflow Part 8: Asynchronous Actor-Critic Agents (A3C)".

Original paper: "Asynchronous Methods for Deep Reinforcement Learning", Mnih et al., 2016.

Tested on CartPole
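
The README does not reproduce the model, but for orientation, the LSTM actor-critic head and the A3C loss from Mnih et al. can be sketched against the TensorFlow 1.x API that Juliani's tutorial uses. Every name, layer size, and coefficient below is an illustrative assumption, not this repository's actual code:

```python
import tensorflow as tf  # TensorFlow 1.x API

n_actions = 2  # CartPole has two discrete actions (push left / push right)
obs_dim = 4    # cart position, cart velocity, pole angle, pole angular velocity

state_in = tf.placeholder(tf.float32, [None, obs_dim])
actions = tf.placeholder(tf.int32, [None])       # actions actually taken
target_v = tf.placeholder(tf.float32, [None])    # discounted returns R
advantages = tf.placeholder(tf.float32, [None])  # A = R - V(s)

# Recurrent core: treat the whole rollout as one sequence through an LSTM
lstm = tf.nn.rnn_cell.BasicLSTMCell(64)
rnn_in = tf.expand_dims(state_in, 0)  # (batch=1, time, features)
rnn_out, _ = tf.nn.dynamic_rnn(lstm, rnn_in, dtype=tf.float32)
rnn_out = tf.reshape(rnn_out, [-1, 64])

# Actor (policy) and critic (value) heads share the LSTM features
policy = tf.layers.dense(rnn_out, n_actions, activation=tf.nn.softmax)
value = tf.reshape(tf.layers.dense(rnn_out, 1), [-1])

# Standard A3C objective (Mnih et al., 2016): value regression,
# advantage-weighted policy gradient, and an entropy bonus
responsible = tf.reduce_sum(policy * tf.one_hot(actions, n_actions), axis=1)
value_loss = 0.5 * tf.reduce_sum(tf.square(target_v - value))
policy_loss = -tf.reduce_sum(tf.log(responsible + 1e-8) * advantages)
entropy = -tf.reduce_sum(policy * tf.log(policy + 1e-8))
loss = 0.5 * value_loss + policy_loss - 0.01 * entropy
```

In A3C, each worker thread computes this loss on its own rollout and asynchronously applies the resulting gradients to a shared set of global network parameters.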

Requirements

OpenAI Gym and TensorFlow.

Usage

Training only happens on minibatches of more than 30 samples, which effectively prevents poorly performing episodes from influencing training (in CartPole, an episode's length equals its total reward, so short rollouts come from poor episodes). A reward factor is used to scale rewards, allowing effective training at faster learning rates. A sketch of this gating logic is shown below.
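
A minimal sketch of that gating-and-scaling logic, assuming hypothetical names (`MIN_BATCH`, `REWARD_FACTOR`, `train_fn`) and a made-up scale value; the repository's actual constants and structure may differ:

```python
MIN_BATCH = 30        # rollouts at or below this length are discarded
REWARD_FACTOR = 0.01  # hypothetical scale; shrinks returns so a larger
                      # learning rate stays stable

def train_on_episode(episode_buffer, train_fn):
    """Gate training on episode length.

    In CartPole an episode's length equals its total reward, so
    skipping short rollouts keeps poor episodes out of the gradient.
    """
    if len(episode_buffer) <= MIN_BATCH:
        return False  # short episode: contributes no gradients
    # Scale rewards before computing returns and advantages
    scaled = [(s, a, r * REWARD_FACTOR) for (s, a, r) in episode_buffer]
    train_fn(scaled)
    return True
```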

Models are saved every 100 episodes. A saved model can be reloaded for further training, or visualised for testing, by setting the corresponding global parameter to True.
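
As a sketch of how such save/reload flags typically interact with a TensorFlow 1.x `Saver` (the flag names, path, and checkpoint cadence below are illustrative assumptions, not necessarily this repo's):

```python
import tensorflow as tf  # TensorFlow 1.x API

load_model = False      # set True to resume training from the last checkpoint
test_model = False      # set True to visualise a saved model instead of training
model_path = './model'  # hypothetical checkpoint directory

# The graph needs at least one variable before a Saver can be built
episodes = tf.Variable(0, dtype=tf.int32, trainable=False)
saver = tf.train.Saver(max_to_keep=5)

with tf.Session() as sess:
    if load_model or test_model:
        # Restore the most recent checkpoint from model_path
        ckpt = tf.train.get_checkpoint_state(model_path)
        saver.restore(sess, ckpt.model_checkpoint_path)
    else:
        sess.run(tf.global_variables_initializer())

    # Inside the training loop, checkpoint periodically, e.g.:
    # if episode_count % 100 == 0:
    #     saver.save(sess, model_path + '/model-%d.ckpt' % episode_count)
```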

This is just example code for testing an A3C-LSTM implementation; it should not be considered the optimal way to learn this environment!
