Adgudime/async sac #39
Conversation
Just have one question; otherwise, looks good to me! Thank you @AdityaGudimella!
LGTM!
I just took a quick look, so maybe I missed important details, but my thoughts are the following:
- Is there a simpler way to have this flag available without duplicating all this code? (I think it's possible.)
- Why do you want to turn off prioritized replay in the first place?
- Why does this reduce the sensitivity to the episode horizon?
rllib/agents/sac/apex.py
Outdated
@@ -41,6 +69,243 @@
# __sphinx_doc_end__
# yapf: enable


class LocalVanillaReplayBuffer(LocalReplayBuffer):
What is the difference here with the LocalReplayBuffer, besides not including the weights and the indices in the batch?
The three changes are:
- PrioritizedReplayBuffer vs. VanillaReplayBuffer
- SampleBatch containing different elements
- Stats not containing the update_priorities time
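For context on the first of those changes, a uniform ("vanilla") replay buffer can be sketched roughly as below. This is a hypothetical simplification for illustration, not RLlib's actual LocalReplayBuffer/VanillaReplayBuffer code; the point is that uniform sampling needs no priority bookkeeping at all:

```python
import random


class VanillaReplayBufferSketch:
    """Toy uniform-sampling replay buffer (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.next_idx = 0

    def add(self, item):
        # Ring-buffer behavior: overwrite the oldest entry once full.
        if len(self.storage) < self.capacity:
            self.storage.append(item)
        else:
            self.storage[self.next_idx] = item
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size):
        # Uniform sampling: every stored transition is equally likely,
        # so there are no priorities, no importance-sampling weights,
        # and no indices to hand back for later priority updates.
        return [random.choice(self.storage) for _ in range(batch_size)]
```

Because sample() is uniform, there is nothing analogous to update_priorities() to call afterwards, which is exactly why the stats and the SampleBatch contents differ.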
rllib/agents/sac/apex.py
Outdated
"rewards": rewards,
"new_obs": obses_tp1,
"dones": dones,
# "weights": weights,
Why are we excluding the weights and the indices?
What you want is to prevent the update_priorities method from doing its job, right?
Yes. update_priorities should not do anything, but also, weights and batch_idxs are not returned by the vanilla replay buffer; they are only returned by the PrioritizedReplayBuffer.
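To illustrate why those extra fields exist only on the prioritized side, here is a toy prioritized buffer sketch (hypothetical and simplified: a linear scan over priorities rather than RLlib's segment trees). sample() must hand back importance-sampling weights plus the sampled indices so that update_priorities() can later write new TD-error-based priorities back to the right slots:

```python
import random


class TinyPrioritizedBufferSketch:
    """Toy prioritized replay buffer (illustrative only)."""

    def __init__(self):
        self.storage = []
        self.priorities = []

    def add(self, item, priority=1.0):
        self.storage.append(item)
        self.priorities.append(priority)

    def sample(self, batch_size, beta=0.4):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = random.choices(
            range(len(self.storage)), weights=probs, k=batch_size)
        n = len(self.storage)
        # Importance-sampling weights correct for the bias introduced
        # by non-uniform sampling; normalize by the max for stability.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.storage[i] for i in idxs]
        # idxs must be returned so update_priorities() knows which
        # entries to overwrite with fresh TD errors.
        return batch, weights, idxs

    def update_priorities(self, idxs, new_priorities):
        for i, p in zip(idxs, new_priorities):
            self.priorities[i] = p
```

A vanilla buffer has no priorities array, so neither the weights nor the indices exist, and update_priorities becomes a no-op by construction.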
@Edilmo First, tuning "no-done-at-end" shouldn't be the right way to go, for two major reasons. Second, back to the questions: why turn off the prioritized replay buffer, and why does this reduce the sensitivity to the episode horizon? These high-level details may not give you a full understanding if you don't have a deep background in the energy-based RL algorithm, prioritized replay buffers, etc., so I listed recommended papers to help build that understanding.
LGTM
Great finding, guys!
I think you totally misunderstood my comments. They are just about avoiding code duplication.
Why are these changes needed?
Fixes a bug in the execution plan where setting prioritized_replay to False in the config does not actually turn it off.
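The fix boils down to a branch like the sketch below: the execution plan should choose the buffer type from the flag instead of always building the prioritized one. The function name and return values here are illustrative, not RLlib's actual API; only the config key mirrors the real `prioritized_replay` setting:

```python
def select_replay_buffer_kind(config):
    """Illustrative sketch of the intended execution-plan behavior.

    Before the fix, the plan effectively ignored the flag and always
    built a prioritized buffer; with the fix, False really selects
    uniform (vanilla) replay.
    """
    if config.get("prioritized_replay", True):
        return "prioritized"
    return "vanilla"
```

With this in place, a user-facing config of `{"prioritized_replay": False}` actually yields uniform replay instead of silently keeping prioritization on.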
Related issue number
Checks
- I've run scripts/format.sh to lint the changes in this PR.