Munchausen RL #1221
base: pytorch
Conversation
Are there any experiment results?
When submitting the PR, I only tested on Pendulum as a sanity check. On Pendulum, there is no clear difference compared to SAC.
```diff
@@ -267,6 +273,11 @@ def __init__(self,
             critic_network_cls, q_network_cls)

         self._use_entropy_reward = use_entropy_reward
         self._munchausen_reward_weight = max(0, munchausen_reward_weight)
         if munchausen_reward_weight > 0:
             assert not normalize_entropy_reward, (
```
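The diff above only stores the weight; for context, the Munchausen augmentation itself adds the scaled, clipped log-policy of the taken action to the environment reward, r_m = r + weight * clip(alpha * log_pi, l_0, 0), as proposed by Vieillard et al. (2020). A minimal sketch (a hypothetical helper for illustration, not code from this PR; the names `alpha`, `clip_min` are assumptions):

```python
def munchausen_reward(reward, log_pi, alpha=1.0, weight=0.5, clip_min=-1.0):
    """Munchausen-augmented reward for a single transition.

    Args:
        reward: environment reward r for the transition.
        log_pi: log-probability of the action actually taken, log pi(a|s).
        alpha: entropy temperature scaling the log-policy term.
        weight: Munchausen reward weight (the hyper-parameter tuned in this PR).
        clip_min: lower clipping bound l_0 (log_pi can be very negative).
    Returns:
        r + weight * clip(alpha * log_pi, clip_min, 0)
    """
    # log_pi <= 0 for discrete actions, so the bonus is a (clipped) penalty.
    bonus = max(clip_min, min(0.0, alpha * log_pi))
    return reward + weight * bonus
```

With the defaults above, a transition with reward 1.0 and log_pi = -2.0 is clipped at -1.0, giving an augmented reward of 0.5; a deterministic action (log_pi = 0) leaves the reward unchanged.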
`normalize_entropy_reward` has no conflict with `munchausen_reward_weight > 0`, does it? Why assert against it?
```python
                (sum(nest.flatten(log_pi_rollout_a[0])),
                 sum(nest.flatten(log_pi_rollout_a[1]))))
        else:
            log_pi_rollout_a = sum(nest.flatten(log_pi_rollout_a))
```
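For context on the `sum(nest.flatten(...))` pattern: when the action space is a nest of independent components, each leaf holds that component's log-probability, and summing the flattened leaves yields the joint log pi(a|s). A self-contained sketch with a simplified stand-in for `nest.flatten` (real `alf.nest.flatten` also handles dicts and named tuples):

```python
def flatten(nested):
    """Simplified stand-in for alf.nest.flatten: return the leaves of a
    nested structure of lists/tuples, in left-to-right order."""
    if isinstance(nested, (list, tuple)):
        leaves = []
        for item in nested:
            leaves.extend(flatten(item))
        return leaves
    return [nested]

# Hypothetical per-component log-probs of one factored action:
# one discrete component and a two-dimensional continuous component.
log_pi_rollout_a = ((-0.7,), (-1.2, -0.3))

# Joint log-probability of independent components = sum of the leaves.
joint_log_pi = sum(flatten(log_pi_rollout_a))  # -0.7 + -1.2 + -0.3 = -2.2
```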
Why do you sum the `log_pi` values first here? (Unless each `log_alpha` corresponds to multiple `log_pi`s.)
Implementation of Munchausen RL.
Tested on Breakout. The reward weight is an important hyper-parameter to tune.
Currently, with a proper weight (e.g. 0.5) there is some initial gain, but performance eventually matches SAC.