Munchausen RL #1221
base: pytorch
Conversation
Are there any experiment results?
When submitting the PR, I only tested on Pendulum as a sanity check. On Pendulum, there is no clear difference compared to SAC.
```diff
@@ -267,6 +273,11 @@ def __init__(self,
             critic_network_cls, q_network_cls)

         self._use_entropy_reward = use_entropy_reward
         self._munchausen_reward_weight = max(0, munchausen_reward_weight)
         if munchausen_reward_weight > 0:
             assert not normalize_entropy_reward, (
```
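The diff above only stores the weight; for context, the Munchausen augmentation itself adds the scaled, clipped log-policy of the taken action to the environment reward, r_m = r + weight * clip(alpha * log_pi, l_0, 0), as proposed by Vieillard et al. (2020). A minimal sketch (a hypothetical helper for illustration, not code from this PR; the names `alpha`, `clip_min` are assumptions):

```python
def munchausen_reward(reward, log_pi, alpha=1.0, weight=0.5, clip_min=-1.0):
    """Munchausen-augmented reward for a single transition.

    Args:
        reward: environment reward r for the transition.
        log_pi: log-probability of the action actually taken, log pi(a|s).
        alpha: entropy temperature scaling the log-policy term.
        weight: Munchausen reward weight (the hyper-parameter tuned in this PR).
        clip_min: lower clipping bound l_0 (log_pi can be very negative).
    Returns:
        r + weight * clip(alpha * log_pi, clip_min, 0)
    """
    # log_pi <= 0 for discrete actions, so the bonus is a (clipped) penalty.
    bonus = max(clip_min, min(0.0, alpha * log_pi))
    return reward + weight * bonus
```

With the defaults above, a transition with reward 1.0 and log_pi = -2.0 is clipped at -1.0, giving an augmented reward of 0.5; a deterministic action (log_pi = 0) leaves the reward unchanged.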
`normalize_entropy_reward` has no conflict with `munchausen_reward_weight > 0`, does it? Why assert against it?
```python
                (sum(nest.flatten(log_pi_rollout_a[0])),
                 sum(nest.flatten(log_pi_rollout_a[1]))))
        else:
            log_pi_rollout_a = sum(nest.flatten(log_pi_rollout_a))
```
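For context on the `sum(nest.flatten(...))` pattern: when the action space is a nest of independent components, each leaf holds that component's log-probability, and summing the flattened leaves yields the joint log pi(a|s). A self-contained sketch with a simplified stand-in for `nest.flatten` (real `alf.nest.flatten` also handles dicts and named tuples):

```python
def flatten(nested):
    """Simplified stand-in for alf.nest.flatten: return the leaves of a
    nested structure of lists/tuples, in left-to-right order."""
    if isinstance(nested, (list, tuple)):
        leaves = []
        for item in nested:
            leaves.extend(flatten(item))
        return leaves
    return [nested]

# Hypothetical per-component log-probs of one factored action:
# one discrete component and a two-dimensional continuous component.
log_pi_rollout_a = ((-0.7,), (-1.2, -0.3))

# Joint log-probability of independent components = sum of the leaves.
joint_log_pi = sum(flatten(log_pi_rollout_a))  # -0.7 + -1.2 + -0.3 = -2.2
```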
Why do you sum the `log_pi` values first here? (Unless each `log_alpha` corresponds to multiple `log_pi`s.)
Implementation of Munchausen RL.
Tested on Breakout. The reward weight is an important hyper-parameter to tune.
Currently, with a proper weight (e.g. 0.5) there is some initial gain, but performance eventually matches SAC.