
Off-policy training #111

Closed
Lucaweihs opened this issue Aug 5, 2020 · 4 comments · Fixed by #144
Labels: enhancement (New feature or request)

@Lucaweihs (Collaborator)

Problem

The ADVISOR codebase has support for interleaving off-policy updates (from an arbitrary PyTorch dataset and with arbitrary losses) with on-policy updates. It would be great to have similar capabilities here. In particular, we should be able to:

  1. Define a pipeline stage so that it performs some fixed number of off-policy updates.
  2. Allow off-policy updates to be interleaved with on-policy updates (a rough sketch of what such a stage might look like follows this list).
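To make the two requirements concrete, here is a minimal sketch (in PyTorch-style Python) of what a stage definition with an off-policy component could look like. All names here (`OffPolicyComponent`, `PipelineStage` and its `offpolicy_component` field, etc.) are illustrative assumptions, not the existing API.

```python
# Hypothetical sketch only -- these classes and fields do not exist yet.
from typing import List, Optional

from torch.utils.data import Dataset


class OffPolicyComponent:
    """Pairs an arbitrary PyTorch dataset with losses used to update from it."""

    def __init__(self, dataset: Dataset, loss_names: List[str], updates: int, batch_size: int):
        self.dataset = dataset        # e.g. expert demonstrations stored on disk
        self.loss_names = loss_names  # e.g. ["offpolicy_imitation"]
        self.updates = updates        # fixed number of off-policy updates (point 1)
        self.batch_size = batch_size


class PipelineStage:
    """A stage mixes on-policy losses with an optional off-policy component (point 2)."""

    def __init__(
        self,
        loss_names: List[str],
        max_stage_steps: int,
        offpolicy_component: Optional[OffPolicyComponent] = None,
    ):
        self.loss_names = loss_names
        self.max_stage_steps = max_stage_steps
        self.offpolicy_component = offpolicy_component
```

Under this reading, a purely off-policy stage would pass `loss_names=[]` and rely only on its `offpolicy_component`, while an interleaved stage would set both.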

Solution

This requires:

  • A way to specify how this training will occur (e.g. defining off-policy losses and updating the pipeline stage to allow for these types of losses + dataset); a possible loss interface is sketched after this list.
  • Updating the runner/light_engine to implement these types of updates.
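For the first bullet, the off-policy losses presumably cannot reuse the on-policy loss signature (which consumes rollout storage). Below is a hedged sketch of a batch-based loss interface, with all names chosen for illustration rather than taken from the code.

```python
# Illustrative interface: losses receive batches sampled from a Dataset
# rather than from on-policy rollout storage. Names are hypothetical.
import abc
from typing import Dict, Tuple

import torch
import torch.nn as nn


class AbstractOffPolicyLoss(abc.ABC):
    @abc.abstractmethod
    def loss(
        self, model: nn.Module, batch: Dict[str, torch.Tensor]
    ) -> Tuple[torch.Tensor, Dict[str, float]]:
        """Return the scalar loss and values to log for one off-policy batch."""
        raise NotImplementedError


class OffPolicyImitationLoss(AbstractOffPolicyLoss):
    def loss(self, model, batch):
        # Assumes the batch provides observations and the expert's actions.
        logits = model(batch["observations"])
        ce = nn.functional.cross_entropy(logits, batch["expert_actions"])
        return ce, {"offpolicy_cross_entropy": ce.item()}
```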

Possible issues:

  • Currently we log based on the number of rollout steps. This will not be reasonable if we allow a stage to do purely off-policy updates (one possible way of tracking progress in that case is sketched below).
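One possible way around the logging concern, stated here purely as an assumption and not as what the linked PR does, is to track off-policy progress separately from rollout steps and report on whichever counter the stage actually advances:

```python
# Hypothetical bookkeeping: a purely off-policy stage advances on gradient
# updates drawn from the dataset rather than on environment rollout steps.
class StageProgress:
    def __init__(self) -> None:
        self.rollout_steps = 0      # environment steps from on-policy rollouts
        self.offpolicy_updates = 0  # gradient updates from the off-policy dataset

    def record_rollout(self, num_steps: int) -> None:
        self.rollout_steps += num_steps

    def record_offpolicy_update(self) -> None:
        self.offpolicy_updates += 1

    def logging_step(self) -> int:
        # Fall back to off-policy updates when no rollouts are being collected.
        return self.rollout_steps if self.rollout_steps > 0 else self.offpolicy_updates
```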

Dependencies

None

@Lucaweihs (Collaborator, Author)

@jordis-ai2 @klemenkotar -- it might be good to have some discussion about this issue before starting on it to see what our various needs are (e.g. does ALFRED require off-policy training in a certain way?)

@anikem anikem added this to the 0.1 milestone Aug 12, 2020
@jordis-ai2 (Collaborator) commented Aug 22, 2020

@jordis-ai2 jordis-ai2 linked a pull request Aug 22, 2020 that will close this issue
@jordis-ai2 (Collaborator)

From Luca:
To run this code you'll want to first run

python extensions/rl_babyai/babyai_scripts/download_babyai_expert_demos.py GoToLocal # Downloads the data
python extensions/rl_babyai/babyai_scripts/truncate_expert_demos.py # Makes a small version of the IL dataset you can work with locally

and then

python go_to_local.pure_offpolicy --experiment_base projects/babyai_baselines/experiments

@Lucaweihs (Collaborator, Author)

Lucaweihs commented Aug 22, 2020

Feel free to make improvements as you see them: the current implementation is far from perfect so I would not be unhappy if you changed the API. Also of note: off-policy might be a bit tricky when considering the distributed setting (as you'd presumably need some way of partitioning the off-policy dataset into different chunks to be used by the various processes). Not sure what the best solution for this is.
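For the distributed concern, one standard option (an assumption on my part, not necessarily what the linked PR does) is to shard the off-policy dataset across training processes with PyTorch's DistributedSampler, so each worker draws disjoint batches:

```python
# Sketch: partition an off-policy dataset across distributed workers using
# torch.utils.data.distributed.DistributedSampler (standard PyTorch); the
# actual mechanism in the linked PR may differ.
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.distributed import DistributedSampler


def make_offpolicy_loader(
    dataset: Dataset, world_size: int, rank: int, batch_size: int
) -> DataLoader:
    # Each of the `world_size` processes sees a disjoint shard of the expert
    # demonstrations, identified by its `rank`.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank, shuffle=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```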
