CLGridWorld

Open In Colab

Configurable Curriculum Learning Domain for Reinforcement Learning Agents

Curriculum Learning

Quoted directly from [1]

"As reinforcement learning (RL) agents are challenged to learn increasingly complex tasks, some of these tasks may be in-feasible to learn directly. Various transfer learning methods and frameworks have been proposed that allow an agent to better learn a difficult target task by levering knowledge gained in one or more source tasks [Taylor and Stone, 2009;Lazaric, 2011]. Recently, these ideas have been extended to the problem of curriculum learning, where the goal is to design a curriculum consisting of a sequence of training tasks that are learned by the agent prior to learning the target task."

Grid World Domain

Quoted directly from [1]


"The world consists of a room, which can contain 4 types of objects. Keys are items the agent can pick up by moving to them and executing a pickup action. These are used to unlock locks. Each lock in a room is dependent on a set of keys. If the agent is holding the right keys, then moving to a lock and executing an unlock action opens the lock. Pits are obstacles placed throughout the domain. If the agent moves into a pit, the episode is terminated. Finally, beacons are landmarks that are placed on the corners of pits.

The goal of the learning agent is to traverse the world and unlock all the locks. At each time step, the learning agent can move in one of the four cardinal directions, execute a pickup action, or an unlock action. Moving into a wall causes no motion. Successfully picking up a key gives a reward of +500, and successfully unlocking a lock gives a reward of +1000. Falling into a pit terminates the episode with a reward of -200. All other actions receive a constant step penalty of -10."

Installation

Compatible with Python 3.6 and upwards.

git clone https://github.com/LeroyChristopherDunn/CLGridWorld.git
cd CLGridWorld
pip install -e .
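
To sanity-check the install, try importing the package (the package name clgridworld matches the import used in the snippets below):

python -c "import clgridworld"  # exits silently if the install succeeded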

Grid World Generation

The GridWorldBuilder can be used to create a variety of grid worlds. All generated grid worlds subclass the gym.Env class from OpenAI Gym and can therefore be used in a plug-and-play fashion with various RL agents developed by the community.
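
For example, a generated environment supports the standard Gym interaction loop. This is a minimal sketch, assuming the pre-0.26 Gym API where step returns (observation, reward, done, info) and action_space is populated as in standard Gym environments:

from clgridworld.grid_world_builder import GridWorldBuilder, InitialStateParams

params = InitialStateParams(shape=(10, 10), player=(1, 4), key=(7, 5), lock=(1, 1))
env = GridWorldBuilder.create(params)

observation = env.reset()
done = False
total_reward = 0
while not done:
    action = env.action_space.sample()  # any Gym-compatible agent can supply actions here
    observation, reward, done, info = env.step(action)
    total_reward += reward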

Degrees of freedom

Currently the CL grid world has the following degrees of freedom:

  • grid size
  • player start location
  • key location (optional)
  • lock location (optional)
  • pit start location (optional)
  • pit end location (optional)

The degrees of freedom marked as optional can be excluded from a generated grid world. For instance, a grid world may be created with or without a pit.

Rules

In its standard form, an episode ends when the agent collects all keys and unlocks all locks. If the lock location is not specified to the grid world generator, a grid world without a lock will be generated and the episode will end when the agent collects all keys. If the key location is not specified, a grid world without a key will be generated and the agent will begin the episode holding all keys. Either the key location, the lock location, or both must be passed to the grid world generator.

Pit start location and pit end location define the start and end points of the pit rectangle. Either both locations must be passed to the grid world generator to create a grid world with a pit, or both must be excluded to create a grid world without a pit.

Basic Usage

Grid World Generation

Below are code snippets that generate grid worlds with varying features.

Complete Spec (key, lock, and pit)


from clgridworld.grid_world_builder import GridWorldBuilder, InitialStateParams

params = InitialStateParams(shape=(10, 10), player=(1, 4), key=(7, 5), lock=(1, 1), pit_start=(4, 2),
                            pit_end=(4, 7))    
env = GridWorldBuilder.create(params)

Key Only


from clgridworld.grid_world_builder import GridWorldBuilder, InitialStateParams

params = InitialStateParams(shape=(5, 5), player=(4, 4), key=(0, 0))
env = GridWorldBuilder.create(params)

Lock and Pit


from clgridworld.grid_world_builder import GridWorldBuilder, InitialStateParams

params = InitialStateParams(shape=(7, 7), player=(6, 5), lock=(0, 1), pit_start=(3, 2), pit_end=(3, 6))
env = GridWorldBuilder.create(params)

Example Agents

See the examples directory.

  • Run examples/agents/random_agent.py to run a simple random agent.
  • Run examples/agents/q_learning_eps_greedy_agent.py to run a basic Q-learning agent with epsilon-greedy exploration (see the sketch after this list).
  • Run examples/agents/q_learning_eps_dec_agent.py to run a basic Q-learning agent with epsilon-decreasing exploration.
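
For orientation, below is a minimal sketch of a tabular Q-learning agent with epsilon-greedy exploration in the spirit of the scripts above. It is not the repository's own agent code: the to_state helper is a hypothetical way to turn the dict observation into a hashable key, and the pre-0.26 Gym step signature is assumed.

import random
from collections import defaultdict

from clgridworld.grid_world_builder import GridWorldBuilder, InitialStateParams

def to_state(obs):
    # Hypothetical helper: flatten the observation dict into a hashable key
    return tuple(sorted(obs.items()))

params = InitialStateParams(shape=(5, 5), player=(4, 4), key=(0, 0))
env = GridWorldBuilder.create(params)

NUM_ACTIONS = 6  # N, E, S, W, pick up, unlock (see Action Space below)
q = defaultdict(lambda: [0.0] * NUM_ACTIONS)
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for episode in range(500):
    state = to_state(env.reset())
    done = False
    while not done:
        if random.random() < epsilon:
            action = random.randrange(NUM_ACTIONS)  # explore
        else:
            action = max(range(NUM_ACTIONS), key=lambda a: q[state][a])  # exploit
        obs, reward, done, _ = env.step(action)
        next_state = to_state(obs)
        # Standard one-step Q-learning update
        q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
        state = next_state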

Gym Environment

Observations, State Space

Each observation is a dictionary with the keys defined below:

Key        Type              Nullable ('None')
grid_size  tuple (int, int)
player     tuple (int, int)
lock       tuple (int, int)  x
key        tuple (int, int)  x
pit_start  tuple (int, int)  x
pit_end    tuple (int, int)  x
nw_beacon  tuple (int, int)  x
ne_beacon  tuple (int, int)  x
sw_beacon  tuple (int, int)  x
se_beacon  tuple (int, int)  x
has_key    boolean (0 or 1)

For example, to retrieve the player coordinates from the observation:

player = observation["player"]
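
Keys marked nullable hold the value None when the corresponding object was not generated, so a quick check avoids surprises. A short illustrative snippet:

lock = observation["lock"]  # None when the grid world was generated without a lock
if lock is not None:
    print("lock at", lock)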

Action Space

Key  Description
0    North
1    East
2    South
3    West
4    Pick up key
5    Unlock lock
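
Actions are plain integers, so a step can be issued by index. In the sketch below the constant names are illustrative, not part of the library, and the pre-0.26 Gym step signature is assumed:

NORTH, EAST, SOUTH, WEST, PICK_UP_KEY, UNLOCK_LOCK = range(6)  # illustrative names matching the table above

observation, reward, done, info = env.step(PICK_UP_KEY)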

Todo

  • Create environment wrappers to expose alternative observations, such as those described in [1]
  • Pass additional parameters to the grid world generator to configure the reward function
  • Pass additional parameters to the grid world generator to define 'empty space' terminal states

Contributing

Please do

Running the Tests

From the project root:

python -m unittest 

Authors

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details

References

[1] Narvekar, Sanmit, Jivko Sinapov, and Peter Stone. "Autonomous Task Sequencing for Customized Curriculum Design in Reinforcement Learning." IJCAI. 2017.
