Skip to content

Simple tensorflow implementation of Asynchronous Advantage Actor-Critic (A3C) for a 2-D grid environment

Notifications You must be signed in to change notification settings

akileshbadrinaaraayanan/A3C_grid_world

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Tensorflow implementation of A3C for a 2D grid environment

1) Environment description

The agent (red in color) should navigate to the purple circle (reward) by avoiding the moving obstacles (green triangles). The reward location is randomly placed in the 2D grid environment at every episode. The agent generalizes well to unseen reward locations as well as unseen grid sizes at test time.

1

2) Code organization

A3C_tensorflow directory

For train time, the below files are used:

grid_env_r1.py contains the implementation of A3C algorithm.

environment_a3c_r1.py is the code for 2D environment.

For test time, the below files are used:

grid_env_test.py - loads the saved model.

environment_a3c_load_weights.py : game logic for 2D environment.

renderenv_load_weights.py : To monitor how the agent behaves by rendering.

A3C_complex_environment directory

The environment is more complex in this case with more number of obstacles as well as obstacles that move in both the directions (left-to-right and right-to-left). For train time, the below files are used:

grid_env_r3.py contains the implementation of A3C algorithm.

environment_a3c_r3.py is the code for 2D environment.

For test time, the below files are used:

grid_env_test.py - loads the saved model.

environment_a3c_load_weights.py : game logic for 2D environment.

renderenv_load_weights.py : To monitor how the agent behaves by rendering.

In both these cases, a non-uniform reward decay is used for the convergence of RL agent.

3) How to train?

CUDA_VISIBLE_DEVICES="" python grid_env_r1.py

If you want to just test with pre-trained model (stored inside models directory). Rendering is also enabled at test time.

CUDA_VISIBLE_DEVICES="" python grid_env_test.py

4) Acknowledgement

The basic environment code is based on grid world environment here

About

Simple tensorflow implementation of Asynchronous Advantage Actor-Critic (A3C) for a 2-D grid environment

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages