# Automated Curriculum Learning

How would you make an agent capable of solving complex hierarchical tasks?

Imagine a problem that is complex and requires a collection of skills which are extremely hard to learn in one go with sparse rewards (e.g. complex object manipulation in robotics). One might instead generate a curriculum of simpler tasks so that, overall, a student network can learn to perform the complex task efficiently. Designing this curriculum by hand is inefficient. In this project, I set out to train an automatic curriculum generator using a teacher network which keeps track of the progress of the student network and proposes new tasks as a function of how well the student is learning. I adapted a state-of-the-art distributed reinforcement learning algorithm for training the student network, while using an adversarial multi-armed bandit algorithm for the teacher network. I analysed how different metrics for quantifying student progress affect the curriculum that the teacher learns to propose, and demonstrate that this approach can accelerate learning and improve the interpretability of how the agent learns to perform complex tasks.

As a starting point, I adapted the Craft Environment from work by Andreas et al. [1], as it has a nice simple structure allowing hierarchical task design with a range of complexity, and is fast to iterate on. I have developed a fixed curriculum of simpler target sub-tasks (in the craft environment: "get wood", "get grass", "get iron", "make cloth", "get gold"), and in the future will make a teacher network that proposes tasks for the student to learn. I could also kick-start the student with demonstrations from an expert.
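To make the teacher concrete, here is a minimal sketch of an adversarial multi-armed bandit teacher in the style of Exp3, where each arm is a task and the reward is a student-progress signal scaled to [0, 1]. The class name, hyperparameters, and progress signal are illustrative assumptions, not the repository's actual implementation:

```python
import math
import random

class Exp3Teacher:
    """Sketch of an Exp3-style bandit teacher (hypothetical, for illustration).

    Each arm is a task; `update` receives a student-progress signal in
    [0, 1] for the proposed task and reweights that arm with an
    importance-weighted exponential update.
    """

    def __init__(self, num_tasks, gamma=0.2):
        self.num_tasks = num_tasks
        self.gamma = gamma  # exploration rate: mixes in uniform sampling
        self.weights = [1.0] * num_tasks

    def _probs(self):
        # Exp3 sampling distribution: exploit weights, explore uniformly.
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.num_tasks
                for w in self.weights]

    def propose_task(self):
        probs = self._probs()
        task = random.choices(range(self.num_tasks), weights=probs)[0]
        return task, probs

    def update(self, task, progress, probs):
        # Importance-weighted estimate: only the chosen arm is updated.
        estimate = progress / probs[task]
        self.weights[task] *= math.exp(self.gamma * estimate / self.num_tasks)
```

Because the update is importance-weighted, rarely proposed tasks that suddenly yield progress get a large weight boost, which is what lets the teacher track a non-stationary (learning) student.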

I have interfaced IMPALA [2], a GPU-utilising version of the A3C architecture which uses multiple distributed actors with V-Trace off-policy correction, with my Craft Environment to train on all the possible Craft tasks concurrently. This is possible by providing a hash of the task name as an instruction to the network (a similar setup to DMLab IMPALA, using an LSTM to process the instruction).
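One simple way to turn a task name into a fixed-size instruction input is to hash each word to an integer id and pad to a fixed length, which the network can then embed and feed through an LSTM. The function below is a sketch under assumed sizes (`vocab_size`, `max_len` are illustrative, not the repository's actual values):

```python
import hashlib

def task_to_ids(task_name, vocab_size=1000, max_len=5):
    """Hash each word of a task string to an id in [0, vocab_size),
    padding with 0 to a fixed length. Illustrative sketch only.
    """
    ids = [int(hashlib.md5(word.encode()).hexdigest(), 16) % vocab_size
           for word in task_name.split()]
    # Truncate or zero-pad so every task yields the same-shaped input.
    return (ids[:max_len] + [0] * max_len)[:max_len]
```

Using a cryptographic hash (rather than Python's builtin `hash`) keeps the mapping deterministic across processes, which matters when many distributed actors must agree on the instruction encoding.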

Other papers that I am inspired by in this work include [3], [4].

## Usage

Run with:

```shell
python experiment.py --num_actors=48 --batch_size=32
```

## Dependencies

## Note

I have made several modifications to the Craft Environment and IMPALA code for integration. This code should also be able to integrate with Gym environments with minor changes, which will be added soon. Currently the wrapper for the Craft Env mimics the DMLab interface.
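The shape of such a wrapper can be sketched as follows: a class exposing DMLab-style `reset`/`observations`/`step`/`is_running` methods around a Gym-like environment. All names here are illustrative assumptions, not the repository's actual wrapper:

```python
class DMLabStyleEnv:
    """Sketch of a DMLab-style interface over a Gym-like environment.

    Assumes the wrapped env exposes reset() -> obs and
    step(action) -> (obs, reward, done, info); method names follow
    the DMLab convention (reset/observations/step/is_running).
    Hypothetical, for illustration only.
    """

    def __init__(self, gym_like_env):
        self._env = gym_like_env
        self._obs = None
        self._running = False

    def reset(self):
        self._obs = self._env.reset()
        self._running = True

    def observations(self):
        # DMLab returns a dict of named observations.
        return {"RGB_INTERLEAVED": self._obs}

    def step(self, action, num_steps=1):
        # DMLab's step takes an action-repeat count and returns the
        # reward accumulated over those frames.
        total_reward = 0.0
        for _ in range(num_steps):
            self._obs, reward, done, _ = self._env.step(action)
            total_reward += reward
            if done:
                self._running = False
                break
        return total_reward

    def is_running(self):
        return self._running
```

With this shape, the same IMPALA actor loop that polls `is_running()` and calls `step()` can drive either DMLab or the wrapped Craft Env without changes.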

## Results


Visualisation of a trained agent solving multiple tasks in Craft Env.
The instruction is shown at the top, the 2D grid in the middle, and the inventory at the bottom. Each colour corresponds to a different object or workshop; the player is the red dot. For example, in the "get grass" task, when the player picks up a green square (grass), the screen flashes to indicate a positive reward.

An accompanying blog post detailing all the methods and discussing the results: rlcurriculum.github.io

## Acknowledgements

## References
