In-Context Reinforcement Learning from Noise Distillation

This repo contains the official implementation of In-Context Reinforcement Learning from Noise Distillation. The experiments for each environment live in separate folders. We did this on purpose: it keeps the code readable instead of overloading it with environment-specific if-else statements.

Dependencies

The Dark environments can be installed through pip or Docker, while Watermaze can only be installed with Docker. Watermaze depends heavily on modified dm_lab code, so its dependencies are hard to manage any other way.

Dark Environments

As easy as it gets, you can install all the requirements with:

pip install -r requirements.txt

Tested on Python 3.8.

If you'd like to use Docker, build the image first:

# assuming you are in the dark_room
# or key_to_door directory
docker build -t <IMAGE_NAME> .

To run the code, use:

docker run -it \
    --gpus=all \
    --rm \
    --name <CONTAINER_NAME> \
    <IMAGE_NAME> bash

and then execute the scripts inside the container.

Watermaze

You have two options. The first is to pull the Docker image from DockerHub:

docker pull suessmann/btd_dm_lab:1.1ad

The second option is to build the image yourself:

docker build -t <IMAGE_NAME> .

To run the scripts, use the following command:

# assuming you are in the root of the repository
docker run -it \
    --workdir /workspace \
    --rm \
    --volume ./watermaze/:/workspace/ \
    --name <CONTAINER_NAME> \
    <IMAGE_NAME> bash

Running experiments

Dark Environments

To run an experiment, simply run the ad_<env_name>.py script; the data will be generated automatically.

For example, to train AD$^\epsilon$ on the Key-to-Door environment with a demonstrator limited to 50% of the optimal performance:

python ad_dark_key2door.py --config_path="configs/ad-k2d.yaml" \
       --max_perf=0.5
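To compare several demonstrator performance levels, the command above can be repeated with different `--max_perf` values. The following is a minimal sketch, not part of the repo: the sweep values in `MAX_PERFS` and the helper `build_command` are hypothetical, and it assumes `--max_perf` accepts any fraction between 0 and 1. Only the script name and the two flags come from this README.

```python
import shlex

# Hypothetical sweep values; only 0.5 appears in this README.
MAX_PERFS = [0.25, 0.5, 0.75]


def build_command(max_perf, config_path="configs/ad-k2d.yaml"):
    """Return the argv list for one training run (illustrative helper)."""
    return [
        "python",
        "ad_dark_key2door.py",
        f"--config_path={config_path}",
        f"--max_perf={max_perf}",
    ]


# One command per demonstrator performance level.
commands = [build_command(p) for p in MAX_PERFS]

if __name__ == "__main__":
    for cmd in commands:
        # Printed rather than executed; to launch a run, pass cmd to
        # subprocess.run(cmd, check=True) from the key_to_door directory.
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them before starting long training runs.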

Watermaze

Since the data for Watermaze is heavy (~500GB), we cannot provide it. However, you can generate it yourself. First, obtain the demonstrator policies by running

python ppo_watermaze.py --goal_x=<x> --goal_y=<y>

once for each goal you want. Then, generate noisy trajectories with 50% performance with

# note that eps=0.7 corresponds
# to max_perf=0.5
python generate_watermaze_traj.py --num_goals=<num_goals> --hist_len=<hist_len> --eps 0.7

and then, finally, run the training script:

python ad_watermaze.py --config_path="configs/ad-watermaze.yaml" --learning_histories_path=<path>

There you go!
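For intuition about what the `--eps` flag controls in the trajectory generation step, here is a minimal sketch of epsilon-noisy action selection. It is an illustration under stated assumptions, not the code in `generate_watermaze_traj.py`: the linear decay from 1.0 down to the final `eps`, and the helper names `noise_schedule` and `noisy_action`, are hypothetical, and the actual schedule in the repo may differ.

```python
import random


def noise_schedule(num_steps, eps_final):
    """Linearly decay epsilon from 1.0 to eps_final (assumed schedule)."""
    if num_steps == 1:
        return [eps_final]
    return [
        1.0 - (1.0 - eps_final) * t / (num_steps - 1)
        for t in range(num_steps)
    ]


def noisy_action(demo_action, num_actions, eps, rng):
    """With probability eps take a uniformly random action,
    otherwise keep the demonstrator's action."""
    if rng.random() < eps:
        return rng.randrange(num_actions)
    return demo_action


if __name__ == "__main__":
    rng = random.Random(0)
    # eps_final=0.7 mirrors the value used in the command above.
    schedule = noise_schedule(num_steps=5, eps_final=0.7)
    print([round(e, 3) for e in schedule])
    # demo_action=2 and num_actions=4 are placeholder values.
    actions = [noisy_action(2, 4, e, rng) for e in schedule]
    print(actions)
```

Under this assumption, early steps are almost fully random and later steps follow the demonstrator more often, so the resulting history looks like a policy that gradually improves up to the performance level set by the final `eps`.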

Citing

If you used this code for your research or a project, please cite us as:

@inproceedings{zisman2024emergence,
  title     = {Emergence of In-Context Reinforcement Learning from Noise Distillation},
  author    = {Zisman, Ilya and Kurenkov, Vladislav and Nikulin, Alexander and Sinii, Viacheslav and Kolesnikov, Sergey},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  year      = {2024},
}
