Hi, thanks for sharing your work.

Currently I'm trying to reproduce the results in the pointmaze environment. I am wondering why there is a negation in the `visualize_reward` function in `vis/maze_vis.py` (line 144).

Also, I would like to know whether the `only_expert_state` option works in the pointmaze environment. If so, is there a suitable set of hyperparameters for it? Thank you!
Earlier in our experiments, we computed rewards as `gamma * V(s') - Q(s, a)`, i.e. the negation of how the reward is defined in the paper. We changed this later, but the pretrained policy used for visualization was trained under the old convention, so an extra negation is applied in `visualize_reward`.

(If you train and evaluate a new policy yourself, this extra negation is not needed.)
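For concreteness, here is a minimal sketch of the two sign conventions; the function names are illustrative, not the repo's actual API:

```python
# Hypothetical scalar example illustrating the sign flip; `reward_paper`
# and `reward_legacy` are made-up names, not functions from this repo.
gamma = 0.99

def reward_paper(q_sa, v_next):
    """Reward as defined in the paper: r(s, a) = Q(s, a) - gamma * V(s')."""
    return q_sa - gamma * v_next

def reward_legacy(q_sa, v_next):
    """Earlier convention the visualized policy was trained with."""
    return gamma * v_next - q_sa

# The two conventions differ only in sign, hence the extra negation
# when visualizing rewards for the legacy-trained policy.
assert reward_paper(1.0, 0.5) == -reward_legacy(1.0, 0.5)
```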
These are some hyperparameters that should work well:

```
agent=sac agent.actor_lr=3e-05 agent.init_temperature=0.01 agent.learnable_temperature=False env=pointmaze_right method.loss=v0 method.regularize=True num_actor_updates=4 num_seed_steps=0 only_expert_states=True train.batch=256 train.use_target=True
```
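For example, these would be passed as command-line overrides to the training script; the entry-point name below is an assumption, so substitute the repo's actual script:

```bash
# Script name is illustrative; replace with the repo's actual entry point.
python train_iq.py agent=sac agent.actor_lr=3e-05 agent.init_temperature=0.01 \
    agent.learnable_temperature=False env=pointmaze_right method.loss=v0 \
    method.regularize=True num_actor_updates=4 num_seed_steps=0 \
    only_expert_states=True train.batch=256 train.use_target=True
```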