Off-policy training #111
Comments
@jordis-ai2 @klemenkotar -- it might be good to have some discussion about this issue before starting on it to see what our various needs are (e.g. does ALFRED require off-policy training in a certain way?)
I'll try to port https://github.com/allenai/hex-embodied-rl/blob/advisor_into_master/projects/babyai_baselines/experiments/go_to_local/pure_offpolicy.py before release...
From Luca:
and then
Feel free to make improvements as you see them: the current implementation is far from perfect, so I would not be unhappy if you changed the API. Also of note: off-policy training might be a bit tricky in the distributed setting, as you'd presumably need some way of partitioning the off-policy dataset into chunks, one for each of the various processes. Not sure what the best solution for this is.
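One simple way to do the partitioning mentioned above is a strided split of dataset indices by process rank. The sketch below is only an illustration: `partition_indices`, `world_size`, and `rank` are hypothetical names, not part of the existing API, and it assumes the off-policy dataset is indexable with a known length.

```python
from typing import List


def partition_indices(num_items: int, world_size: int, rank: int) -> List[int]:
    """Return the dataset indices assigned to worker `rank` (strided split).

    Hypothetical helper: worker r gets indices r, r + world_size, r + 2 * world_size, ...
    """
    if world_size <= 0:
        raise ValueError(f"world_size must be positive, got {world_size}")
    if not 0 <= rank < world_size:
        raise ValueError(f"rank must be in [0, {world_size}), got {rank}")
    return list(range(rank, num_items, world_size))


# Example: 10 off-policy samples split across 3 training processes.
shards = [partition_indices(10, 3, r) for r in range(3)]
```

With this split the shards are disjoint and differ in size by at most one sample. If every process has to take the same number of off-policy steps (for instance when gradients are synchronized), the uneven tail would need to be padded or dropped; PyTorch's `DistributedSampler` handles that case and may be a better fit than a hand-rolled helper.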
Problem
The ADVISOR codebase supports interleaving off-policy updates (from an arbitrary PyTorch dataset and with arbitrary losses) with on-policy updates. It would be great to have similar capabilities here. In particular, we should be able to:
Solution
This requires:
Possible issues:
Dependencies
None
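As a rough illustration of the interleaving described under Problem, here is a minimal sketch of a loop that alternates one on-policy update with a fixed number of off-policy updates. It is not the ADVISOR implementation: `interleave_updates` and all of its parameters are hypothetical, the update functions are plain callables standing in for real training steps, and `off_policy_batches` is assumed to be re-iterable (e.g. a list or a data loader).

```python
from typing import Any, Callable, Iterable


def interleave_updates(
    on_policy_update: Callable[[], None],
    off_policy_update: Callable[[Any], None],
    off_policy_batches: Iterable[Any],
    num_rounds: int,
    off_policy_steps_per_round: int,
) -> None:
    """Alternate one on-policy update with several off-policy updates.

    Hypothetical sketch: the off-policy batches are cycled, restarting
    from the beginning whenever the iterable is exhausted.
    """
    batches = iter(off_policy_batches)
    for _ in range(num_rounds):
        on_policy_update()
        for _ in range(off_policy_steps_per_round):
            try:
                batch = next(batches)
            except StopIteration:
                # Dataset exhausted: start a new pass over it.
                batches = iter(off_policy_batches)
                batch = next(batches)
            off_policy_update(batch)


# Example: record the order in which updates are applied.
log = []
interleave_updates(
    on_policy_update=lambda: log.append("on"),
    off_policy_update=lambda batch: log.append(f"off:{batch}"),
    off_policy_batches=["a", "b", "c"],
    num_rounds=2,
    off_policy_steps_per_round=2,
)
```

Passing the two kinds of update as callables keeps the schedule independent of the losses used, which matches the "arbitrary losses" requirement; the real training loop would presumably need more control over the schedule than a fixed ratio.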