DMLab-30

DMLab-30 is a set of environments designed for DeepMind Lab. These environments enable a researcher to develop agents for a large spectrum of interesting tasks either individually or in a multi-task setting.

  1. rooms_collect_good_objects_{test,train}
  2. rooms_exploit_deferred_effects_{test,train}
  3. rooms_select_nonmatching_object
  4. rooms_watermaze
  5. rooms_keys_doors_puzzle
  6. language_select_described_object
  7. language_select_located_object
  8. language_execute_random_task
  9. language_answer_quantitative_question
  10. lasertag_one_opponent_small
  11. lasertag_three_opponents_small
  12. lasertag_one_opponent_large
  13. lasertag_three_opponents_large
  14. natlab_fixed_large_map
  15. natlab_varying_map_regrowth
  16. natlab_varying_map_randomized
  17. skymaze_irreversible_path_hard
  18. skymaze_irreversible_path_varied
  19. psychlab_arbitrary_visuomotor_mapping
  20. psychlab_continuous_recognition
  21. psychlab_sequential_comparison
  22. psychlab_visual_search
  23. explore_object_locations_small
  24. explore_object_locations_large
  25. explore_obstructed_goals_small
  26. explore_obstructed_goals_large
  27. explore_goal_locations_small
  28. explore_goal_locations_large
  29. explore_object_rewards_few
  30. explore_object_rewards_many
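
The brace notation in the list above (`{test,train}`) stands for two separate level scripts. A minimal sketch of expanding such a pattern into concrete level names follows. The helper name `expand_level_names` is hypothetical and is not part of DeepMind Lab.

```python
import re


def expand_level_names(pattern):
    """Expand every `{a,b}` group in a level-name pattern into concrete names."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        # No brace group left: the pattern is already a concrete name.
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    names = []
    for option in match.group(1).split(","):
        # Recurse so that any later brace groups are expanded as well.
        names.extend(expand_level_names(head + option.strip() + tail))
    return names


# A few entries copied from the list above.
LEVEL_PATTERNS = [
    "rooms_collect_good_objects_{test,train}",
    "rooms_exploit_deferred_effects_{test,train}",
    "rooms_watermaze",
]

for pattern in LEVEL_PATTERNS:
    print(pattern, "->", expand_level_names(pattern))
```

Each expanded name corresponds to one `.lua` level script with the same base name.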

Rooms

Collect Good Objects

The agent must learn to collect good objects and avoid bad objects in two environments. During training, only some combinations of objects and environments are shown, so the agent could infer from this correlational structure that the environment matters to the task. It does not, and relying on it is detrimental in a transfer setting. We explicitly verify this by testing transfer performance on a held-out object/environment combination. For more details, please see: Higgins, Irina et al. "DARLA: Improving Zero-Shot Transfer in Reinforcement Learning" (2017).

Test Regime: Test set consists of held-out combinations of objects/environments never seen during training.
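
The held-out test regime can be illustrated with a small sketch that splits object/environment combinations into a training set and a test set. The object-set and environment names below are placeholders, and `split_combinations` is a hypothetical helper. The actual combinations used by the levels are not listed in this README.

```python
from itertools import product


def split_combinations(object_sets, environments, held_out):
    """Split all (object set, environment) pairs into train and test pairs."""
    all_pairs = set(product(object_sets, environments))
    test_pairs = set(held_out)
    if not test_pairs <= all_pairs:
        raise ValueError("held-out pairs must be valid combinations")
    # Everything that is not held out is available during training.
    train_pairs = all_pairs - test_pairs
    return sorted(train_pairs), sorted(test_pairs)


# Placeholder names, chosen only for illustration.
train, test = split_combinations(
    object_sets=["objects_A", "objects_B"],
    environments=["env_1", "env_2"],
    held_out=[("objects_B", "env_2")],
)
print("train:", train)
print("test: ", test)
```

An agent that has learned to rely on the environment rather than on the objects would be expected to do poorly on the held-out pair.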

Observation Spec: RGBD

Level Name: rooms_collect_good_objects_{test,train}

Exploit Deferred Effects

This task requires the agent to make a conceptual leap from picking up a special object to getting access to more rewards later on, even though this is never shown in a single environment and is costly. The task is expected to be hard for model-free agents to learn, but should be simple with a model-based or predictive strategy.

Test Regime: Tested in a room configuration never seen during training, where picking up a special object suddenly becomes useful.

Observation Spec: RGBD

Level Name: rooms_exploit_deferred_effects_{test,train}

Select Non-matching Object

This task requires the agent to choose and collect an object that is different from the one it is shown. The agent is placed into a small room containing an out-of-reach object and a teleport pad. Touching the pad awards the agent 1 point and teleports it to a second room. The second room contains two objects, one of which matches the object in the previous room.

  • Collect matching object: -10 points.
  • Collect non-matching object: +10 points.

Once either object is collected the agent is returned to the first room, with the same initial object being shown.
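
The reward structure described above can be written down as a short sketch. The point values come from the description. The function name and the object names are hypothetical.

```python
PAD_REWARD = 1         # awarded for touching the teleport pad
MATCH_REWARD = -10     # collecting the object that matches the one shown
NONMATCH_REWARD = 10   # collecting the object that differs from the one shown


def trial_reward(shown_object, collected_object):
    """Total reward for one trial: touch the pad, then collect one object."""
    if collected_object == shown_object:
        choice_reward = MATCH_REWARD
    else:
        choice_reward = NONMATCH_REWARD
    return PAD_REWARD + choice_reward


print(trial_reward("hat", "hat"))   # matching choice
print(trial_reward("hat", "ball"))  # non-matching choice
```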

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: rooms_select_nonmatching_object

Watermaze

The agent must find a hidden platform which, when found, generates a reward. The platform is difficult to find the first time, but in subsequent trials the agent should try to remember where it is and go straight back to it. This task tests episodic memory and navigation ability.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: rooms_watermaze

Keys Doors Puzzle

A procedural planning puzzle. The agent must reach the goal object, located in a position that is blocked by a series of coloured doors. Single-use coloured keys can be used to open matching doors, and only one key can be held at a time. The objective is to figure out the correct sequence in which the keys must be collected and the rooms traversed. Visiting the rooms or collecting keys in the wrong order can make the goal unreachable.
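
To illustrate why ordering matters, here is a minimal planning sketch over an abstract version of the puzzle. The room-graph model, the rule that a key can only be picked up with empty hands, and the rule that an opened door stays open are all assumptions made for this example. The level script may behave differently.

```python
from collections import deque


def solve(start, goal, doors, keys):
    """Breadth-first search for a shortest action sequence that reaches `goal`.

    doors: dict mapping frozenset({room_a, room_b}) -> door colour
    keys:  dict mapping room -> colour of the single-use key lying in it
    Returns a list of actions, or None if the goal cannot be reached.
    """
    initial = (start, None, frozenset(), frozenset())
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        (room, held, opened, taken), plan = queue.popleft()
        if room == goal:
            return plan
        successors = []
        # Pick up the key in this room if hands are empty and it is still there.
        if held is None and room in keys and room not in taken:
            state = (room, keys[room], opened, taken | {room})
            successors.append((state, f"take {keys[room]} key"))
        for door, colour in doors.items():
            if room not in door:
                continue
            (other,) = door - {room}
            if door in opened:
                successors.append(((other, held, opened, taken), f"go to {other}"))
            elif held == colour:
                # Opening a door consumes the held key.
                state = (other, None, opened | {door}, taken)
                successors.append((state, f"open {colour} door to {other}"))
        for state, action in successors:
            if state not in seen:
                seen.add(state)
                queue.append((state, plan + [action]))
    return None


# Two red doors but only one red key: spending it on room B strands the agent.
doors = {
    frozenset({"S", "A"}): "red",
    frozenset({"S", "B"}): "red",
    frozenset({"A", "G"}): "blue",
}
keys = {"S": "red", "A": "blue"}
print(solve("S", "G", doors, keys))
```

In this toy layout the only successful plan uses the red key on the door to room A, where the blue key is found.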

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: rooms_keys_doors_puzzle

Language

For details on the addition of language instructions, see: Hermann, Karl Moritz, & Hill, Felix et al. "Grounded language learning in a simulated 3D world" (2017).

Select Described Object

The agent is placed into a small room containing two objects. An instruction is used to describe one of the objects. The agent must successfully follow the instruction and collect the goal object.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD and language

Level Name: language_select_described_object

Select Located Object

The agent is asked to collect a specified coloured object in a specified coloured room. Example instruction: “Pick the red object in the blue room.” There are four variants of the task, each of which has an equal chance of being selected. The variants differ in the number of rooms (between 2 and 6). Variants with more rooms have more distractors, making the task more challenging.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD and language

Level Name: language_select_located_object

Execute Random Task

The agent is given one of seven possible tasks, each with a different type of language instruction. Example instruction: “Get the red hat from the blue room.” The agent is rewarded for collecting the correct object, and penalised for collecting the wrong object. When any object is collected, the level restarts and a new task is selected.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD and language

Level Name: language_execute_random_task

Answer Quantitative Question

The agent is given a yes or no question based on object colors and counts. The agent selects a certain object to respond:

  • White sphere = yes
  • Black sphere = no

Example questions:

  • “Are all cars blue?”
  • “Is any car blue?”
  • “Is anything blue?”
  • “Are most cars blue?”
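
The example questions above can be read as quantifiers over the objects in the scene. The sketch below shows one possible interpretation. Treating "most" as a strict majority and "all" over an empty set as true are assumptions, and the scene encoding is hypothetical.

```python
def answer(question, objects):
    """Return the sphere to select: 'white sphere' (yes) or 'black sphere' (no).

    question: (quantifier, shape, colour); a shape of None means "anything".
    objects:  list of (shape, colour) pairs present in the scene.
    """
    quantifier, shape, colour = question
    relevant = [c for s, c in objects if shape is None or s == shape]
    hits = sum(1 for c in relevant if c == colour)
    if quantifier == "all":
        result = hits == len(relevant)
    elif quantifier == "any":
        result = hits > 0
    elif quantifier == "most":
        result = 2 * hits > len(relevant)  # strict majority (assumption)
    else:
        raise ValueError(f"unknown quantifier: {quantifier}")
    return "white sphere" if result else "black sphere"


scene = [("car", "blue"), ("car", "blue"), ("car", "red"), ("hat", "green")]
print(answer(("all", "car", "blue"), scene))   # "Are all cars blue?"
print(answer(("any", "car", "blue"), scene))   # "Is any car blue?"
print(answer(("any", None, "blue"), scene))    # "Is anything blue?"
print(answer(("most", "car", "blue"), scene))  # "Are most cars blue?"
```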

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD and language

Level Name: language_answer_quantitative_question

LaserTag

One Opponent Small

This task requires the agent to play laser tag in a procedurally generated map containing random gadgets and power-ups. The map is small and there is 1 opponent bot of difficulty level 4. The agent begins the episode with the default Rapid Gadget and a limit of 100 tags. The agent’s Shield begins at 125 and slowly drops to the maximum amount of 100. The gadgets, power-ups and map layout are randomised per episode, so the agent must adapt to each new environment.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: lasertag_one_opponent_small

Three Opponents Small

This task requires the agent to play laser tag in a procedurally generated map containing random gadgets and power-ups. The map is small and there are 3 opponent bots of difficulty level 4. The agent begins the episode with the default Rapid Gadget and a limit of 100 tags. The agent’s Shield begins at 125 and slowly drops to the maximum amount of 100. The gadgets, power-ups and map layout are randomised per episode, so the agent must adapt to each new environment.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: lasertag_three_opponents_small

One Opponent Large

This task requires the agent to play laser tag in a procedurally generated map containing random gadgets and power-ups. The map is large and there is 1 opponent bot of difficulty level 4. The agent begins the episode with the default Rapid Gadget and a limit of 100 tags. The agent’s Shield begins at 125 and slowly drops to the maximum amount of 100. The gadgets, power-ups and map layout are randomised per episode, so the agent must adapt to each new environment.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: lasertag_one_opponent_large

Three Opponents Large

This task requires the agent to play laser tag in a procedurally generated map containing random gadgets and power-ups. The map is large and there are 3 opponent bots of difficulty level 4. The agent begins the episode with the default Rapid Gadget and a limit of 100 tags. The agent’s Shield begins at 125 and slowly drops to the maximum amount of 100. The gadgets, power-ups and map layout are randomised per episode, so the agent must adapt to each new environment.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: lasertag_three_opponents_large
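
The four LaserTag levels differ only in map size and number of opponents. The sketch below collects the values stated in the descriptions above into one table. The field names are hypothetical and are not the setting names used by the level scripts.

```python
from itertools import product

# Values shared by all four levels, taken from the descriptions above.
COMMON = {"bot_skill": 4, "tag_limit": 100, "shield_start": 125, "shield_max": 100}
OPPONENTS = {"one_opponent": 1, "three_opponents": 3}

LASERTAG_LEVELS = {
    f"lasertag_{name}_{size}": {"opponents": count, "map_size": size, **COMMON}
    for (name, count), size in product(OPPONENTS.items(), ("small", "large"))
}

for level_name, settings in LASERTAG_LEVELS.items():
    print(level_name, settings["opponents"], settings["map_size"])
```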

NatLab

Fixed Large Map

This is a long-term memory variation of a mushroom foraging task. The agent must collect mushrooms within a naturalistic terrain environment to maximise score. The mushrooms do not regrow. The map is a fixed large-sized environment. The time of day is randomised (day, dawn, night). Each episode the spawn location is picked randomly from a set of potential spawn locations.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: natlab_fixed_large_map

Varying Map Regrowth

This is a short-term memory variation of a mushroom foraging task. The agent must collect mushrooms within a naturalistic terrain environment to maximise score. The mushrooms regrow after around one minute in the same location throughout the episode. The map is a randomised small-sized environment. The topographical variation, and number, position, orientation and sizes of shrubs, cacti and rocks are all randomised. The time of day is randomised (day, dawn, night). The spawn location is randomised for each episode.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: natlab_varying_map_regrowth

Varying Map Randomized

This is a randomised variation of a mushroom foraging task. The agent must collect mushrooms within a naturalistic terrain environment to maximise score. The mushrooms do not regrow. The map is randomly generated and of intermediate size. The topographical variation, and number, position, orientation and sizes of shrubs, cacti and rocks are all randomised. Locations of mushrooms are randomised. The time of day is randomised (day, dawn, night). The spawn location is randomised for each episode.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: natlab_varying_map_randomized

SkyMaze

Irreversible Path Hard

This task requires agents to reach a goal located at a distance from the agent’s starting position. The starting position and the goal are connected by a sequence of platforms placed at different heights. Jumping is disabled, so higher platforms are unreachable and the agent cannot backtrack to a higher platform once it has stepped down. This means that the agent must plan its route to ensure it does not become stuck and fail the task.
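
The planning problem can be sketched on an abstract platform graph in which the agent may only step to a neighbouring platform of equal or lower height. The graph, the heights and the function names below are hypothetical. The real level geometry is continuous and is not described in this README.

```python
def reachable(start, heights, neighbours):
    """Platforms reachable from `start` when the agent can never step up."""
    seen = {start}
    stack = [start]
    while stack:
        here = stack.pop()
        for there in neighbours[here]:
            if there not in seen and heights[there] <= heights[here]:
                seen.add(there)
                stack.append(there)
    return seen


def safe_moves(position, goal, heights, neighbours):
    """Neighbouring platforms the agent can step to without losing the goal."""
    return [
        p for p in neighbours[position]
        if heights[p] <= heights[position]
        and goal in reachable(p, heights, neighbours)
    ]


# Toy layout: stepping left leads down into a dead end.
heights = {"start": 3, "left": 2, "right": 2, "pit": 1, "goal": 1}
neighbours = {
    "start": ["left", "right"],
    "left": ["start", "pit"],
    "right": ["start", "goal"],
    "pit": ["left"],
    "goal": ["right"],
}
print(safe_moves("start", "goal", heights, neighbours))
```

Both neighbours of `start` are physically reachable, but only one of them keeps the goal reachable, which is why a greedy agent can become stuck.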

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: skymaze_irreversible_path_hard

Irreversible Path Varied

A variation of the Irreversible Path Hard task. This version of the task will select a map layout of random difficulty for the agent to solve. The jump action is disabled (NOOP) for this task.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: skymaze_irreversible_path_varied

PsychLab

For details, see: Leibo, Joel Z. et al. "Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents" (2018).

Arbitrary Visuomotor Mapping

In this task, the agent is shown consecutive images and must remember the association between each image and a specific movement pattern (a location to point at). The agent is rewarded if it can remember the action associated with a given object. The images are drawn from a set of ~2500, and the specific associations are randomly generated and differ in each episode.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: psychlab_arbitrary_visuomotor_mapping

Continuous Recognition

This task tests familiarity memory. Consecutive images are shown, and the agent must indicate whether or not it has seen the image before during that episode. Looking at the left square indicates no, and looking at the right square indicates yes. The images (drawn from a set of ~2500) are shown in a different random order in every episode.
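
The ideal response policy for this task can be sketched with a set that is reset at the start of every episode. The sketch assumes each image is available as a hashable identifier, which is a simplification: the agent actually receives pixels, not image IDs.

```python
def respond(image_ids):
    """Return the square to look at for each image shown during one episode."""
    seen = set()
    responses = []
    for image_id in image_ids:
        # Right means "seen before in this episode", left means "new".
        responses.append("right" if image_id in seen else "left")
        seen.add(image_id)
    return responses


print(respond([7, 3, 7, 9, 3]))
```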

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: psychlab_continuous_recognition

Sequential Comparison

Two consecutive patterns are shown to the agent. The agent must indicate whether or not the two patterns are identical. The delay time between the study pattern and the test pattern is variable.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: psychlab_sequential_comparison

Visual Search

A collection of shapes is shown to the agent. The agent must identify whether or not a specific shape is present in the collection. Each trial consists of the agent searching for a pink ‘T’ shape. Two black squares at the bottom of the screen are used for ‘yes’ and ‘no’ responses.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: psychlab_visual_search

Explore

Object Locations Small

This task requires agents to collect apples. Apples are placed in rooms within the maze. The agent must collect as many apples as possible before the episode ends to maximise its score. Upon collecting all of the apples, the level will reset, repeating until the episode ends. Apple locations, level layout and theme are randomised per episode. Agent spawn location is randomised per reset.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: explore_object_locations_small

Object Locations Large

This task is the same as Object Locations Small, but with a larger map and longer episode duration. Apple locations, level layout and theme are randomised per episode. Agent spawn location is randomised per reset.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: explore_object_locations_large

Obstructed Goals Small

This task is similar to Goal Locations Small: agents are required to find the goal as fast as possible, but now with randomly opened and closed doors. After the goal is found, the level restarts. Goal location, level layout and theme are randomised per episode. Agent spawn location is randomised per reset. Door states (open/closed) are randomly selected per reset, but a path to the goal always exists.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: explore_obstructed_goals_small

Obstructed Goals Large

This task is the same as Obstructed Goals Small, but with a larger map and longer episode duration. Goal location, level layout and theme are randomised per episode. Agent spawn location is randomised per reset. Door states (open/closed) are randomly selected per reset, but a path to the goal always exists.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: explore_obstructed_goals_large

Goal Locations Small

This task requires agents to find the goal object as fast as possible. After the goal object is found, the level restarts. Goal location, level layout and theme are randomised per episode. Agent spawn location is randomised per reset.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: explore_goal_locations_small

Goal Locations Large

This task is the same as Goal Locations Small, but with a larger map and longer episode duration. Goal location, level layout and theme are randomised per episode. Agent spawn location is randomised per reset.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: explore_goal_locations_large

Object Rewards Few

This task requires agents to collect human-recognisable objects placed around a room. Some objects are from a positive rewarding category, and some are negative. After all positive category objects are collected, the level restarts. Level theme, object categories and object reward per category are randomised per episode. Agent spawn location, object locations and number of objects per category are randomised per reset.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: explore_object_rewards_few

Object Rewards Many

This task is a more difficult variant of Object Rewards Few, with an increased number of goal objects and longer episode duration. Level theme, object categories and object reward per category are randomised per episode. Agent spawn location, object locations and number of objects per category are randomised per reset.

Test Regime: Training and testing levels drawn from the same distribution.

Observation Spec: RGBD

Level Name: explore_object_rewards_many