This repository has been archived by the owner on Aug 11, 2023. It is now read-only.

google-deepmind/disdain


DIScriminator DisAgreement INtrinsic Reward (DISDAIN), a self-contained JAX implementation

This is a simplified version of the code used in Learning more skills through optimistic exploration (ICLR 2022).

This Colab trains an agent with a tabular Q function and a tabular discriminator ensemble on a scaled-down version of the Four Rooms environment. Training parallelizes across all available devices; we recommend a GPU backend or a Colab Pro TPU backend.

The environment has 24 states. Within 8 transitions, all but one of them are reachable from the initial state in the top-left corner, so at most 23 distinguishable skills can be learned.
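
For concreteness, a minimal sketch of how a tabular discriminator ensemble for an environment of this size might be represented (the ensemble size and the use of 23 skills are illustrative assumptions, not necessarily the Colab's values):

import jax
import jax.numpy as jnp

NUM_STATES = 24    # states in the scaled-down Four Rooms grid
NUM_SKILLS = 23    # at most 23 distinguishable skills (see above); illustrative
NUM_MEMBERS = 10   # ensemble size; illustrative assumption

# A tabular discriminator is just a table of logits q_i(z | s) per ensemble member.
# Random initialisation makes the members disagree on states they have not yet been
# trained on, which is the epistemic-uncertainty signal the disagreement bonus exploits.
key = jax.random.PRNGKey(0)
ensemble_logits = jax.random.normal(key, (NUM_MEMBERS, NUM_STATES, NUM_SKILLS))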

With the default hyperparameters on a single accelerator, skill learning with DISDAIN achieves approximately 15 effective skills in 500,000 steps and 21 effective skills in 1,000,000 steps. A baseline with matched hyperparameters but the discriminator ensemble disabled attains approximately 11 effective skills over the course of training (approximately 12 if rewards are derived from an ensemble average, without the DISDAIN bonus). Each agent trains in approximately 12 minutes on the default GPU backend.
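
The DISDAIN bonus itself rewards visiting states where the ensemble members disagree about which skill is being executed: the entropy of the ensemble-averaged skill posterior minus the average entropy of the individual members. A minimal sketch of that computation, reusing the ensemble_logits table from the sketch above (names, shapes, and the epsilon constant are illustrative, not the Colab's):

def disdain_bonus(logits):
  """Ensemble-disagreement bonus for a single state.

  logits: [NUM_MEMBERS, NUM_SKILLS] per-member discriminator logits for the
  current state, e.g. ensemble_logits[:, state] from the sketch above.
  """
  probs = jax.nn.softmax(logits, axis=-1)                       # q_i(z | s)
  entropy = lambda p: -jnp.sum(p * jnp.log(p + 1e-8), axis=-1)
  # Entropy of the ensemble mean minus the mean of the member entropies:
  # zero where members agree, large on rarely-visited states where they differ.
  return entropy(probs.mean(axis=0)) - entropy(probs).mean()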

This implementation broadly matches the setting of the Four Rooms experiments from the paper, with the following differences:

  • the Four Rooms grid world has been scaled down;
  • trajectories are generated online, rather than placed in and sampled from a replay buffer;
  • the learning rate and bonus weight have been re-tuned in light of the above (a sketch of where these enter the update appears below).
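
A rough sketch of how these pieces could fit together in a single online update, combining a skill-discrimination reward with the weighted DISDAIN bonus from disdain_bonus above; the reward form, hyperparameter values, and shapes are assumptions for illustration, not the Colab's exact code:

def step_update(q_table, ensemble_logits, skill, state, action, next_state,
                bonus_weight=1.0, lr=0.1, gamma=0.99):
  """One online, skill-conditioned tabular Q-learning step (illustrative values).

  q_table: [NUM_SKILLS, NUM_STATES, num_actions] tabular Q function.
  """
  member_probs = jax.nn.softmax(ensemble_logits[:, next_state], axis=-1)
  # Skill-discrimination reward: ensemble-averaged log q(z | s') for the skill
  # currently being executed (an assumption about the exact reward form).
  skill_reward = jnp.log(member_probs.mean(axis=0)[skill] + 1e-8)
  # Total intrinsic reward adds the weighted disagreement bonus.
  reward = skill_reward + bonus_weight * disdain_bonus(ensemble_logits[:, next_state])
  # Standard tabular Q-learning update on the skill-conditioned Q table.
  target = reward + gamma * q_table[skill, next_state].max()
  td_error = target - q_table[skill, state, action]
  return q_table.at[skill, state, action].add(lr * td_error)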

Installation

Simply open the notebook in Google Colab and run the cells in order.

Citing this work

BibTeX for citing the DISDAIN paper:

@inproceedings{
  strouse2022learning,
  title={Learning more skills through optimistic exploration},
  author={DJ Strouse and Kate Baumli and David Warde-Farley and Volodymyr Mnih and Steven Stenberg Hansen},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=cU8rknuhxc}
}

License and disclaimer

Copyright 2022 DeepMind Technologies Limited

All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.

This is not an official Google product.
