I have some questions about the adjustment of experiment parameters. #8

Closed
chillybird opened this issue May 20, 2022 · 3 comments

@chillybird

I ran the default experiment "ant-v2, 2x4" with the default parameters and got the results in the first picture. I then changed the parameters (n_rollout_threads: 24, num_mini_batch: 4, ppo_epoch: 40) and got the results in the second picture.
[image]
I have also tried other parameter changes, but I have not been able to reach the performance reported in the paper for the "ant-v2, 2x4" experiment provided with the code.
Are there any rules of thumb or tricks for tuning the parameters of the HAPPO / HATRPO algorithms?
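
For context on how the three modified parameters interact, here is a minimal sketch of the usual PPO-style bookkeeping. It assumes the rollout batch holds n_rollout_threads * episode_length samples and is split evenly into num_mini_batch pieces on each of ppo_epoch passes; the repository's own buffer code may differ. The function name and episode_length=1000 are hypothetical, chosen only for illustration.

```python
def ppo_update_stats(n_rollout_threads, episode_length, num_mini_batch, ppo_epoch):
    """Return (batch_size, mini_batch_size, gradient_updates_per_iteration).

    Assumes an evenly split feed-forward batch; this is an illustration,
    not the repository's implementation.
    """
    batch_size = n_rollout_threads * episode_length
    mini_batch_size = batch_size // num_mini_batch
    # Each epoch makes one gradient step per mini-batch.
    updates = ppo_epoch * num_mini_batch
    return batch_size, mini_batch_size, updates


# Settings quoted in the comment above; episode_length=1000 is an assumed value.
modified = ppo_update_stats(n_rollout_threads=24, episode_length=1000,
                            num_mini_batch=4, ppo_epoch=40)
print(modified)  # (24000, 6000, 160)
```

Under these assumptions the modified settings take 160 gradient steps on each batch of rollouts, so the same data is reused many times before new rollouts are collected.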

@chillybird
Author

chillybird commented May 20, 2022

I also experimented with the pybullet.minitaur environment (a good episode return there is about 6.5). The first picture shows the result with HAPPO, and the second shows the result with MAPPO.
[image]
In the first picture, the red curve is the entropy and the blue curve is the reward.

@chillybird changed the title from "I have some questions about the adjustment of experimental parameters." to "I have some questions about the adjustment of experiment parameters." on May 20, 2022
@cyanrain7
Owner

If you want better performance on MuJoCo with HAPPO, set the actor learning rate to 5e-4 (--lr 5e-4) and num_mini_batch to 1 (--num_mini_batch 1). Our original experiments underestimated HAPPO's performance. The other parameters do not need to be changed. Hope this helps!
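
As a rough illustration of how the two suggested overrides would be read from the command line, here is a minimal sketch. Only the flag names --lr and --num_mini_batch come from the reply above; the parser itself, the help strings, and the None defaults are hypothetical stand-ins, not the repository's actual configuration code.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the training script's argument parser.
    parser = argparse.ArgumentParser(description="illustrative HAPPO overrides")
    # Defaults are placeholders, not the repository's real defaults.
    parser.add_argument("--lr", type=float, default=None,
                        help="actor learning rate")
    parser.add_argument("--num_mini_batch", type=int, default=None,
                        help="number of mini-batches per PPO epoch")
    return parser


# The overrides suggested in the reply, passed as they would appear on the CLI.
suggested = build_parser().parse_args(["--lr", "5e-4", "--num_mini_batch", "1"])
print(suggested.lr, suggested.num_mini_batch)  # 0.0005 1
```

With num_mini_batch set to 1, each PPO epoch takes a single gradient step over the whole rollout batch instead of several steps on smaller pieces.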

@chillybird
Author

Thank you for your reply. I will use these parameters in my next experiment.
