
Issue on reproducing pointmaze experiments #4

Closed
wognl0402 opened this issue Mar 1, 2022 · 1 comment

Comments

@wognl0402

wognl0402 commented Mar 1, 2022

Hi, thanks for sharing your work.

Currently I'm trying to reproduce the results in the pointmaze environment. I am wondering why there is a negation in the visualize_reward function in vis/maze_vis.py (line 144).

Also, I would like to know whether the only_expert_states option works in the pointmaze environment. If so, is there a suitable set of hyperparameters for it? Thank you!

@Div99
Owner

Div99 commented Mar 8, 2022

Earlier in our experiments, we were computing rewards as gamma * V(s') - Q(s, a), i.e. the negation of how the reward is defined in the paper. We changed this later, but the policy used for visualization was trained with the old sign convention, so an extra negation is present.
(This is not needed when training and evaluating a new policy.)
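To make the sign convention concrete, here is a minimal sketch (not the repository's actual code; the function names, the discount value, and the sample numbers are illustrative assumptions) showing the paper's reward definition, the older negated convention, and why one extra negation maps between them:

```python
# Hypothetical illustration of the sign conventions described above.
# Paper's convention:   r(s, a) = Q(s, a) - gamma * V(s')
# Older convention:     r(s, a) = gamma * V(s') - Q(s, a)

GAMMA = 0.99  # assumed discount factor, for illustration only


def paper_reward(q_sa: float, v_next: float, gamma: float = GAMMA) -> float:
    """Reward as defined in the paper: Q(s, a) - gamma * V(s')."""
    return q_sa - gamma * v_next


def legacy_reward(q_sa: float, v_next: float, gamma: float = GAMMA) -> float:
    """Older convention the visualization policy was trained with."""
    return gamma * v_next - q_sa


# The two conventions differ only by sign, so negating the legacy
# value recovers the paper's definition:
q_sa, v_next = 1.5, 1.0
assert paper_reward(q_sa, v_next) == -legacy_reward(q_sa, v_next)
```

This is why the visualization code needs the extra negation: it renders rewards for a policy trained under the legacy sign, flipped back into the paper's convention.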

These are some hyperparameters that should work well:

```
agent=sac agent.actor_lr=3e-05 agent.init_temperature=0.01 agent.learnable_temperature=False env=pointmaze_right method.loss=v0 method.regularize=True num_actor_updates=4 num_seed_steps=0 only_expert_states=True train.batch=256 train.use_target=True
```

@Div99 Div99 closed this as completed Mar 8, 2022
2 participants