Hi, thanks for sharing your work.

Currently I'm trying to reproduce the results in the pointmaze environment. I am wondering why there is a negation in the `visualize_reward` function in `vis/maze_vis.py` (line 144).

Also, I would like to know whether the `only_expert_state` option works in the pointmaze environment. If so, is there a suitable set of hyperparameters for it? Thank you!
Earlier in our experiments, we computed rewards as `gamma * V(s') - Q(s, a)`, i.e. the negation of how the reward is defined in the paper. We changed this later, but the pretrained policy used for visualization was trained under the old convention, so an extra negation is applied in `visualize_reward`.

(If you train and evaluate a new policy yourself, this extra negation is not needed.)
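For concreteness, here is a minimal sketch of the two sign conventions; the function names are illustrative, not the repo's actual API:

```python
# Hypothetical scalar example illustrating the sign flip; `reward_paper`
# and `reward_legacy` are made-up names, not functions from this repo.
gamma = 0.99

def reward_paper(q_sa, v_next):
    """Reward as defined in the paper: r(s, a) = Q(s, a) - gamma * V(s')."""
    return q_sa - gamma * v_next

def reward_legacy(q_sa, v_next):
    """Earlier convention the visualized policy was trained with."""
    return gamma * v_next - q_sa

# The two conventions differ only in sign, hence the extra negation
# when visualizing rewards for the legacy-trained policy.
assert reward_paper(1.0, 0.5) == -reward_legacy(1.0, 0.5)
```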
These are some hyperparameters that should work well:

```
agent=sac agent.actor_lr=3e-05 agent.init_temperature=0.01 agent.learnable_temperature=False env=pointmaze_right method.loss=v0 method.regularize=True num_actor_updates=4 num_seed_steps=0 only_expert_states=True train.batch=256 train.use_target=True
```
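For example, these would be passed as command-line overrides to the training script; the entry-point name below is an assumption, so substitute the repo's actual script:

```bash
# Script name is illustrative; replace with the repo's actual entry point.
python train_iq.py agent=sac agent.actor_lr=3e-05 agent.init_temperature=0.01 \
    agent.learnable_temperature=False env=pointmaze_right method.loss=v0 \
    method.regularize=True num_actor_updates=4 num_seed_steps=0 \
    only_expert_states=True train.batch=256 train.use_target=True
```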