Couldn't reproduce the result on Mujoco suite. #6

Closed
sweetice opened this issue Jul 20, 2020 · 1 comment


sweetice commented Jul 20, 2020

Couldn't reproduce the result on the Mujoco suite.
Setting: We run the BEAR with the recommend settings: ** mmd_sigma = 20.0 , kernel_type = gaussian , num_samples_match = 5 , version = 0 or 2 , lagrange_thresh = 10.0 , `mode = auto**
The batch dataset is produced by training a DDPG agent for 1 million time steps. For reproducing, we use the DDPG code in BCQ repository.

We use the final buffer setting from the BCQ paper.
Here are the full results.
Note that "behavioral" denotes the evaluation of the DDPG agent during training.
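For context on what `mmd_sigma`, `kernel_type = gaussian`, and `num_samples_match` control, here is a minimal NumPy sketch of a Gaussian-kernel MMD estimate between two small action batches. This is illustrative only: the exact bandwidth convention in the BEAR code may differ (e.g. `exp(-d²/(2σ))` vs. `exp(-d²/(2σ²))`), and the variable names below are our own, not from the repository.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=20.0):
    # Pairwise Gaussian kernel between sample sets x (n, d) and y (m, d).
    # Bandwidth convention assumed here: exp(-||x - y||^2 / (2 * sigma)).
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma))

def mmd_squared(x, y, sigma=20.0):
    # Biased (V-statistic) estimate of squared MMD between the two sample sets.
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
# num_samples_match = 5 actions each, 6-dimensional (hypothetical action dim).
a = rng.normal(size=(5, 6))
b = rng.normal(size=(5, 6))
print(mmd_squared(a, b, sigma=20.0))
```

BEAR constrains the learned policy by penalizing this kind of MMD distance between sampled policy actions and sampled dataset actions, so a mismatch between the behavior data (here, a DDPG final buffer) and the assumed kernel bandwidth could plausibly affect results.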

Update (07/20): uploaded PNG files for easier reading.
[Result plots for Ant-v2, HalfCheetah-v2, Hopper-v2, InvertedDoublePendulum-v2, InvertedPendulum-v2, Reacher-v2, Swimmer-v2, and Walker2d-v2]

For clearer reading, the same plots as PDFs:
Ant-v2.pdf
HalfCheetah-v2.pdf
Hopper-v2.pdf
InvertedDoublePendulum-v2.pdf
InvertedPendulum-v2.pdf
Reacher-v2.pdf
Swimmer-v2.pdf
Walker2d-v2.pdf

aviralkumar2907 (Owner) commented:

Hi! In the BEAR paper, we didn't test the final buffer setting, so I am not sure the hyperparameters are optimal for it. I would recommend trying the new cleaned-up implementation of BEAR, where we have searched over hyperparameters: https://github.com/rail-berkeley/d4rl_evaluations, and would also recommend using the D4RL datasets (https://github.com/rail-berkeley/d4rl).
