I have some questions about the adjustment of experiment parameters. #8

chillybird · 2022-05-20T02:10:28Z

I ran the default "ant-v2, 2x4" experiment with the default parameters and got the results in the first picture. I then changed the parameters (n_rollout_threads: 24, num_mini_batch: 4, ppo_epoch: 40) and got the results in the second picture.

I have also tried other changes to the experiment parameters, but I have not been able to reach the performance shown in the article for the "ant-v2, 2x4" experiment provided with the code.
So I would like to ask whether there are any rules or tips for tuning the parameters of the HAPPO / HATRPO algorithms.
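
For context on what the three modified settings control, here is a rough sketch. It assumes the common PPO-style layout in which each rollout batch holds `n_rollout_threads * episode_length` samples, is split into `num_mini_batch` chunks, and is swept `ppo_epoch` times. The helper name `ppo_update_budget` and the `episode_length` value of 1000 are hypothetical and chosen only for illustration; the repository's trainer should be checked before relying on these numbers.

```python
def ppo_update_budget(n_rollout_threads, episode_length, num_mini_batch, ppo_epoch):
    """Estimate batch sizes and gradient steps per rollout (assumed PPO-style layout)."""
    # Samples collected per rollout, assuming one sample per thread per step.
    batch_size = n_rollout_threads * episode_length
    # Each mini-batch receives an equal share of the rollout batch.
    mini_batch_size = batch_size // num_mini_batch
    # Every epoch makes one gradient step per mini-batch.
    gradient_steps = ppo_epoch * num_mini_batch
    return {
        "batch_size": batch_size,
        "mini_batch_size": mini_batch_size,
        "gradient_steps": gradient_steps,
    }


# Settings from the comment above; episode_length=1000 is an assumed value.
budget = ppo_update_budget(
    n_rollout_threads=24, episode_length=1000, num_mini_batch=4, ppo_epoch=40
)
print(budget)
```

Under these assumptions, raising ppo_epoch and num_mini_batch together multiplies the number of gradient steps taken on the same rollout data, which is one reason such changes can alter training stability.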

chillybird · 2022-05-20T02:21:51Z

I also experimented with the pybullet.minitaur environment (a good episode return there is about 6.5). The first picture is the result of using HAPPO, and the second is the result of using MAPPO.

The red curve in the first picture represents entropy and the blue one represents reward.

cyanrain7 · 2022-05-20T02:29:29Z

Well, if you want to get better performance on MuJoCo with HAPPO, you can set the actor learning rate to 5e-4 (--lr 5e-4) and set num_mini_batch to 1 (--num_mini_batch 1). The performance of HAPPO was underestimated in our original experiments. Hope this helps! (The other parameters do not need to be modified.)
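
As a minimal sketch of applying the two suggested overrides, the snippet below turns them into command-line arguments. Only the flag names `--lr` and `--num_mini_batch` and their values come from this reply; the helper `build_overrides` is hypothetical, and the training script to which the arguments are appended is not named here, so it is left out.

```python
import shlex


def build_overrides(overrides):
    """Turn a {flag_name: value} mapping into a flat list of CLI arguments."""
    args = []
    for name, value in overrides.items():
        args += [f"--{name}", str(value)]
    return args


# Values suggested in the reply above; all other parameters keep their defaults.
suggested = {"lr": "5e-4", "num_mini_batch": "1"}
cli_args = build_overrides(suggested)

# Append this string to whatever training command you already use.
print(shlex.join(cli_args))
```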

chillybird · 2022-05-20T03:12:06Z

Thank you for your reply. I will use these parameters for the next experiment.

chillybird changed the title ~~I have some questions about the adjustment of experimental parameters.~~ I have some questions about the adjustment of experiment parameters. May 20, 2022

cyanrain7 closed this as completed Jul 5, 2022
