# Soft Actor-Critic (Haarnoja et al., 2017)

In this repo we implement soft actor-critic (SAC) and test it on the continuous-action version of Lunar Lander from OpenAI Gym. The Lunar Lander problem is considered solved by algorithms that attain an average reward of 200 or higher. Our agent comfortably reaches this level and attains performance similar to the OpenAI Spinning Up SAC implementation.
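For reference, the threshold check amounts to something like the following sketch, which assumes the Gym API of that era (env.step returning four values) and substitutes a random policy for the trained agent:

```python
import gym
import numpy as np

# Assumed environment id for the continuous-action Lunar Lander in Gym at the time.
env = gym.make("LunarLanderContinuous-v2")

returns = []
for episode in range(10):
    obs = env.reset()
    done, episode_return = False, 0.0
    while not done:
        # Stand-in for the trained policy: a random action from the Box action space.
        action = env.action_space.sample()
        obs, reward, done, _ = env.step(action)
        episode_return += reward
    returns.append(episode_return)

# Lunar Lander is considered solved at an average return of 200 or more.
print("mean return over 10 episodes:", np.mean(returns))
```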

A set of TensorBoard log files and a final saved policy are included in this directory.

Our approach is object-oriented: the agent is defined in soft_actor_critic.py, with utility functions in utils.py. Use train.py to train a soft actor-critic agent and test.py to evaluate the trained agent and watch it play.
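The intended workflow is roughly as follows; note that the SoftActorCritic class name and its act/store/learn methods are illustrative placeholders, not necessarily the exact interface exposed by soft_actor_critic.py:

```python
import gym
from soft_actor_critic import SoftActorCritic  # hypothetical class name

env = gym.make("LunarLanderContinuous-v2")
agent = SoftActorCritic(
    obs_dim=env.observation_space.shape[0],
    act_dim=env.action_space.shape[0],
)

for episode in range(1000):
    obs, done = env.reset(), False
    while not done:
        action = agent.act(obs)                            # sample from the stochastic policy
        next_obs, reward, done, _ = env.step(action)
        agent.store(obs, action, reward, next_obs, done)   # add transition to the replay buffer
        agent.learn()                                      # one gradient step on critics and policy
        obs = next_obs
```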

We use hyperparameters borrowed from OpenAI's SAC implementation. In general, our approach is similar to OpenAI's in that we choose not to learn the weighting of the entropy term in the policy objective, preferring to fix it as a hyperparameter for simplicity. Furthermore, we borrow a couple of tricks to stabilise the implementation. Our implementation is, however, more widely applicable, as we separate the agent's computation from that of the environment.
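Concretely, with a fixed entropy weight the policy objective reduces to minimising the expectation of alpha * log pi(a|s) minus the critic's value of the sampled action. Below is a minimal TensorFlow 1.x sketch, with placeholder tensors standing in for the outputs of the policy and critic networks, and the clipped double-Q minimum shown as one assumed stabilisation trick:

```python
import tensorflow as tf

ALPHA = 0.2  # fixed entropy weight (treated as a hyperparameter, not learned)

# Placeholders standing in for quantities produced by the policy and critic networks.
log_pi = tf.placeholder(tf.float32, shape=[None])  # log-probability of sampled actions
q1_pi = tf.placeholder(tf.float32, shape=[None])   # first critic's estimate for those actions
q2_pi = tf.placeholder(tf.float32, shape=[None])   # second critic's estimate (clipped double-Q)

# Policy loss: encourage actions with high Q-value and high entropy,
# i.e. minimise alpha * log_pi - min(Q1, Q2).
min_q_pi = tf.minimum(q1_pi, q2_pi)
policy_loss = tf.reduce_mean(ALPHA * log_pi - min_q_pi)
```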

Our implementation is built on TensorFlow (v1.14) and NumPy (v1.16.4).

See the comments in the code for more implementation details.

Also included is a response to the original paper, which summarises and critiques soft actor-critic; it is provided in Soft Actor-Critic Response.pdf.

### Original Paper

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine, "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor," Deep Learning Symposium, NIPS 2017.
