
SoftActorCritic_ReinforcementLearning

This is my implementation of the discrete Soft Actor Critic algorithm for Reinforcement Learning. More details about the algorithm can be found here:

  • Christodoulou, Petros. "Soft actor-critic for discrete action settings." arXiv preprint arXiv:1910.07207 (2019).

This implementation works with any OpenAI Gym environment that has a discrete action space; right now it is set up to play Space Invaders.

How to use it?

The requirements for the repo are listed in env.yml (if you use conda, run: conda env create -f env.yml).

First, edit the config01.json file (or create a new one) with the hyperparameters and variables used for training; a sketch of the resulting file is shown after the parameter list below:

  • configId: string identifier of the training run

  • env_parameters:

    • screen_size: size of the video game screen,
    • frame_skip: frame skip value,
    • seed_value: random number generators' seed
  • training_parameters:

    • n_episodes: number of episodes to play during training
    • t_tot_cut: maximum number of moves to play in each episode
    • batch_size: number of previous moves to use during a training step
  • agent_parameters:

    • gamma: discount rate
    • lr_Q: learning rate of the Q-function
    • lr_pi: learning rate of the policy
    • lr_alpha: learning rate of the temperature
    • alpha: value of the temperature. Set it to "auto" if you want it to be learnt during training
    • tau: update rate of the Q target functions
    • entropy_rate: constant multiplying the target entropy when alpha="auto". The lower it is, the less random the moves will be at the end of training.
    • h_dim: number of hidden units in the Q-functions' fully connected (flat) layers
    • h_mu_dim: number of hidden units in the policy's fully connected (flat) layers
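
As a rough illustration, a config file with this structure might look like the sketch below. The keys mirror the list above, but every value is only a placeholder; the actual values in config01.json may differ.

```json
{
    "configId": "config01",
    "env_parameters": {
        "screen_size": 84,
        "frame_skip": 4,
        "seed_value": 0
    },
    "training_parameters": {
        "n_episodes": 10000,
        "t_tot_cut": 10000,
        "batch_size": 64
    },
    "agent_parameters": {
        "gamma": 0.99,
        "lr_Q": 0.0003,
        "lr_pi": 0.0003,
        "lr_alpha": 0.0003,
        "alpha": "auto",
        "tau": 0.005,
        "entropy_rate": 0.98,
        "h_dim": 512,
        "h_mu_dim": 512
    }
}
```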

Then run "python train.py" to start training. The training curve is saved to the "train_figs" folder every 100 episodes, while the model parameters are stored in "saved_models".
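
For context on how gamma, tau, alpha, lr_alpha and entropy_rate typically interact in discrete SAC (following Christodoulou, 2019), here is a minimal PyTorch-style sketch of one agent update. It is not this repo's actual code: the names (policy, q1, q2, target_entropy, ...) and the assumption that target_entropy = entropy_rate * log(n_actions) are mine.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of one discrete-SAC update step (not the repo's code).
# Assumes policy(obs) returns action probabilities and q1/q2(obs) return
# per-action Q-values; all have shape (batch, n_actions).
def sac_update(obs, actions, rewards, next_obs, dones,
               policy, q1, q2, q1_targ, q2_targ,
               log_alpha, gamma, tau, target_entropy,
               q_opt, pi_opt, alpha_opt):
    alpha = log_alpha.exp().detach()

    # Q target: soft state value of the next state, using the two target networks.
    with torch.no_grad():
        next_probs = policy(next_obs)
        next_logp = torch.log(next_probs + 1e-8)
        q_targ_min = torch.min(q1_targ(next_obs), q2_targ(next_obs))
        next_v = (next_probs * (q_targ_min - alpha * next_logp)).sum(dim=-1)
        q_target = rewards + gamma * (1.0 - dones) * next_v

    # Q losses on the actions actually taken in the replayed batch.
    q1_pred = q1(obs).gather(1, actions.unsqueeze(-1)).squeeze(-1)
    q2_pred = q2(obs).gather(1, actions.unsqueeze(-1)).squeeze(-1)
    q_loss = F.mse_loss(q1_pred, q_target) + F.mse_loss(q2_pred, q_target)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # Policy loss: in the discrete case the expectation over actions is exact.
    probs = policy(obs)
    logp = torch.log(probs + 1e-8)
    q_min = torch.min(q1(obs), q2(obs)).detach()
    pi_loss = (probs * (alpha * logp - q_min)).sum(dim=-1).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()

    # Temperature loss when alpha="auto": push the policy entropy toward
    # target_entropy (assumed here to be entropy_rate * log(n_actions)).
    entropy = -(probs * logp).sum(dim=-1).detach()
    alpha_loss = (log_alpha * (entropy - target_entropy)).mean()
    alpha_opt.zero_grad(); alpha_loss.backward(); alpha_opt.step()

    # Polyak averaging of the target Q networks with rate tau.
    for targ, src in ((q1_targ, q1), (q2_targ, q2)):
        for p_t, p in zip(targ.parameters(), src.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```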

When training is over, launch "python best_of_100_episodes.py" to get the best episode reward out of a sample of 100 episodes. The best score on the OpenAI Gym Leaderboard (https://github.com/openai/gym/wiki/Leaderboard) is 3454; this implementation should reach around 1800.
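
Such an evaluation loop is conceptually simple; the sketch below shows the idea (play 100 episodes, keep the best total reward) using the old gym reset/step API. It is not the repo's script, and the environment name and the random action are placeholders; in practice the action would come from the trained policy.

```python
import gym
import numpy as np

# Hypothetical evaluation loop: play 100 episodes and report the best total reward.
env = gym.make("SpaceInvaders-v0")  # the repo's env id and wrappers may differ
best_reward = -np.inf
for episode in range(100):
    obs, done, total = env.reset(), False, 0.0
    while not done:
        action = env.action_space.sample()  # placeholder: use the trained policy's action here
        obs, reward, done, _ = env.step(action)
        total += reward
    best_reward = max(best_reward, total)
print("Best episode reward out of 100:", best_reward)
```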

To generate a short video of a match, run "python generate_match_video.py" instead. There is an example (config01_seed_0.mp4) in the /video directory.

