
BLP Minimax Regret: A ReMiDi 💉 to Irreducible Regret



This is the repository for the ICML 2024 paper Refining Minimax Regret for Unsupervised Environment Design (https://arxiv.org/abs/2402.12284). We introduce a new objective and solution concept for Unsupervised Environment Design (UED), called Bayesian Level Perfect Minimax Regret (BLP), that does not fail when there are environments with irreducible regret.

This repository contains ReMiDi, an algorithm that, at equilibrium, results in a BLP policy.

What's Included

Environments

We provide implementations of the three primary domains used in our paper.

T-Maze & Mazes

This environment can result either in a T-Maze or in standard mazes.

Blindfold

The blindfold experiment allows the adversary to zero out the agent's observation; in all other respects it is the same as the normal maze.
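To make this concrete, here is a minimal sketch of the blindfold mechanic (an illustration, not the repository's implementation): the adversary's binary choice simply replaces the observation with zeros.

import jax.numpy as jnp

def blindfold_obs(obs: jnp.ndarray, blindfolded: bool) -> jnp.ndarray:
    # When blindfolded, the agent sees an all-zero observation of the same shape.
    return jnp.where(blindfolded, jnp.zeros_like(obs), obs)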

Lever Game

In the lever game, there are 64 levers to pull, one of which is correct (reward of +1); pulling a wrong lever results in a reward of -1. The adversary can make the correct lever either known or unknown to the agent. In the latter case, the reward is multiplied by 10 (to simulate a harder problem having a higher reward).
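As an illustration, the reward structure can be sketched as follows (the function and argument names are hypothetical, not the repository's API):

def lever_reward(pulled: int, correct: int, correct_is_known: bool) -> float:
    # +1 for pulling the correct lever, -1 otherwise.
    base = 1.0 if pulled == correct else -1.0
    # If the adversary hides which lever is correct, the stakes are 10x higher.
    return base if correct_is_known else 10.0 * base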

Algorithms

PLR

The code for PLR is implemented in x_minigrid_plr.py.

ReMiDi

The code for our algorithm, ReMiDi, is implemented in x_minigrid_remidi.py.

How it works

ReMiDi maintains multiple PLR buffers. We start with the first buffer and perform standard PLR on it for a certain number of iterations. We then move on to the next buffer and again perform PLR; this time, however, any level that has perfect trajectory overlap with a level in any of the previous buffers is ignored, and the agent is updated only on the parts of the trajectories that do not overlap with the previous ones.
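Structurally, the outer loop looks roughly like the sketch below; plr_update is a hypothetical stand-in for one standard PLR iteration and is not a function from this repository.

from typing import Any, List

def plr_update(agent: Any, buffer: List, previous_buffers: List[List]) -> Any:
    # Placeholder: sample or replay levels into `buffer`, skipping any level
    # whose trajectory fully overlaps a level in `previous_buffers`, and update
    # the agent only on the non-overlapping parts of the trajectories.
    return agent

def remidi_outer_loop(agent: Any, num_buffers: int, iters_per_buffer: int) -> Any:
    previous_buffers: List[List] = []
    for _ in range(num_buffers):
        buffer: List = []  # start a fresh PLR buffer
        for _ in range(iters_per_buffer):
            agent = plr_update(agent, buffer, previous_buffers)
        previous_buffers.append(buffer)  # freeze this buffer for later overlap checks
    return agent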

We determine trajectory overlap using the parallel step environment (lib/wrappers/parallel_step.py), which lets us take the same action we take on the current level on a set of other levels, in our case all levels from the previous buffers. Since our environments are deterministic, we can compare the observations of the current level to those of each previous level to determine trajectory overlap.
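As a rough illustration of the overlap test (relying on the determinism assumption above; the names here are illustrative and not the API of lib/wrappers/parallel_step.py), one can compute the length of the shared observation prefix between the current trajectory and a replayed one:

import jax.numpy as jnp

def overlap_length(cur_obs: jnp.ndarray, prev_obs: jnp.ndarray) -> jnp.ndarray:
    # cur_obs:  (T, ...) observations from the current level.
    # prev_obs: (T, ...) observations from a previous level, replayed with the
    #           same actions via the parallel step environment.
    T = cur_obs.shape[0]
    same = jnp.all((cur_obs == prev_obs).reshape(T, -1), axis=1)  # (T,) step-wise equality
    # The shared prefix ends at the first mismatch; append False so that a
    # fully matching trajectory returns T.
    padded = jnp.concatenate([same, jnp.array([False])]).astype(jnp.int32)
    return jnp.argmin(padded)

A level is then skipped entirely when the shared prefix covers the whole trajectory, and otherwise only the steps after the prefix contribute to the agent's update.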

Reproduction

Installation

Install Jax, for instance using something like:

pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

And then the other packages:

pip install orbax-checkpoint==0.4.8 flax numpy wandb matplotlib pandas gymnax pygame distrax moviepy imageio

Finally, install JaxUED by following the instructions here.

A requirements.txt file is also provided.
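If you prefer, the pinned dependencies should also be installable directly from it:

pip install -r requirements.txt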

Running

In general, you can run

python x_minigrid_plr.py

for PLR, with arguments specified in lib/common/arguments.py.

Or, to run ReMiDi (which has four additional arguments specified in x_minigrid_remidi.py)

python x_minigrid_remidi.py

The scripts in the scripts directory have the command line flags to run each of our experiments.

Acknowledgements

  • This code uses JaxUED as a library, and also uses code from its examples.

Citation

For attribution in academic work, please cite our work as

@inproceedings{beukman2024Refining,
  title={Refining Minimax Regret for Unsupervised Environment Design},
  author={Beukman, Michael and Coward, Samuel and Matthews, Michael and Fellows, Mattie and Jiang, Minqi and Dennis, Michael and Foerster, Jakob},
  booktitle={International Conference on Machine Learning},
  year={2024},
  organization={PMLR}
}
