
Multi-agent reinforcement learning algorithms #7

Closed
reso1 opened this issue Nov 18, 2021 · 12 comments

Comments

@reso1

reso1 commented Nov 18, 2021

I'm trying to use Isaac Gym for multi-agent reinforcement learning (MARL). Is there any future plan for this?

If not, can you give any suggestions on how I should integrate MARL algorithms into Isaac Gym? Should I start from the rl_games repo, or integrate other MARL-capable repos?

Thanks for your help!

@Denys88

Denys88 commented Nov 30, 2021

rl_games supports MARL.
You can use independent PPO, or PPO with a central value.
The StarCraft multi-agent env is a good example: https://github.com/Denys88/rl_games/blob/master/rl_games/envs/smac_env.py
In your env you just need to implement this function:

    def get_number_of_agents(self):
        return self.n_agents 

If you want to use a central value, you need to set
use_central_value = True
and return a dict:

{
    'obs': obs,
    'state': state
}

for the observations.
In this case the obs shape will be (envs_count * agents_count, *) and the state shape will be (envs_count, *).
You can take a look at a config example for this use case:
https://github.com/Denys88/rl_games/blob/master/rl_games/configs/smac/3m.yaml
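Putting those pieces together, a minimal env skeleton along these lines might look like the sketch below. This is only an illustration of the hooks described above; the class name, observation/state sizes, and the state_space attribute are placeholders, not verified rl_games API.

import gym
import numpy as np

class MultiAgentEnvSketch(gym.Env):
    """Hypothetical multi-agent env skeleton (names and shapes are made up)."""

    def __init__(self, **kwargs):
        self.n_agents = 4
        self.use_central_value = True
        # per-agent observation and shared global state (placeholder sizes)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(24,))
        self.state_space = gym.spaces.Box(-np.inf, np.inf, shape=(60,))
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,))

    def get_number_of_agents(self):
        # tells the trainer how many agents share one env instance
        return self.n_agents

    def reset(self):
        obs = np.zeros((self.n_agents, 24), dtype=np.float32)   # (agents, obs_dim)
        state = np.zeros(60, dtype=np.float32)                  # (state_dim,)
        return {'obs': obs, 'state': state} if self.use_central_value else obs

    def step(self, actions):
        # actions arrive with one row per agent; rewards/dones are per agent too
        obs = np.zeros((self.n_agents, 24), dtype=np.float32)
        state = np.zeros(60, dtype=np.float32)
        rewards = np.zeros(self.n_agents, dtype=np.float32)
        dones = np.zeros(self.n_agents, dtype=bool)
        obs_out = {'obs': obs, 'state': state} if self.use_central_value else obs
        return obs_out, rewards, dones, {}

Across envs_count vectorized copies, this yields the batched obs shape (envs_count * agents_count, *) and state shape (envs_count, *) mentioned above.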

I think you could try to create a simple MA env from Ant, for example where every leg is an agent :) I can help to adapt and test it with rl_games.
Thanks,
Denys

@reso1

reso1 commented Dec 1, 2021

Thanks for your reply and kind advice! I will try to implement this idea in Isaac Gym and share further results.

I've read the rl_games MARL code for the SMAC env, and it does support MARL. But I have two further questions about the MARL algorithm implementation in rl_games, based on my reading and understanding of the code.

  1. Is the multi-agent PPO implementation in rl_games identical to the MAPPO algorithm in this paper?
  2. Can the current rl_games MARL algorithm support heterogeneous agents with different action/observation spaces (i.e. different actor/critic nets)?

Best,
Jingtao

@Denys88

Denys88 commented Dec 1, 2021

  1. Almost. The value normalization code is a little different, and I didn't apply death masking. It should be easy to make it exactly the same.

  2. My goal was to achieve maximum performance on GPU, so I didn't add direct support for cases like this. But if you are using a discrete action space, you can provide different action masks to disable some actions per agent (sketched below). And right now only a different obs space for the agent and the critic is supported.
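To sketch the action-mask idea for heterogeneous discrete agents: pad every agent to one shared discrete action space and mask out the actions an agent doesn't have, so the policy never samples them. The per-agent action counts and helper name below are made up for illustration; the env-side hook and config flag used by rl_games are not shown here.

import numpy as np

N_ACTIONS = 6                   # shared (padded) discrete action space size
VALID_ACTIONS = {0: 6, 1: 4}    # hypothetical: agent 0 has 6 actions, agent 1 only 4

def build_action_masks():
    # True = action is available for that agent; masked-out entries are never sampled
    masks = np.zeros((len(VALID_ACTIONS), N_ACTIONS), dtype=bool)
    for agent_id, n_valid in VALID_ACTIONS.items():
        masks[agent_id, :n_valid] = True
    return masks  # shape (n_agents, N_ACTIONS)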

@reso1

reso1 commented Dec 2, 2021

Thanks a lot for the explanations! I've implemented the MA ant env (each leg corresponds to an agent); however, the training result is quite bad, so maybe MAPPO is just not that suitable for multi-joint robot control? Anyway, thanks again!

@Denys88

Denys88 commented Dec 2, 2021

Could you try to use a central value with the whole observation?
But I believe this case should work fine even with independent PPO. Can you share your env?

@reso1

reso1 commented Dec 3, 2021

Hi Denys, I actually used a central value for the task but didn't get good results. Maybe there are some problems in my code, and I would be very happy if you could help check it.

I've forked IsaacGymEnvs and committed my MA_Ant env; you can check this link and test it. The registered env task name is MA_Ant.

Apart from the new env class ma_ant, there are also two changes to the original IsaacGymEnvs repo:

  1. The ma_ant class inherits from the ma_vec_task class, where only the allocate_buffers function is modified to adapt to the buffer shape changes.
  2. In the get_env_info function of the RLGPUEnv class, the number of agents and use_central_value are added to the info dict, as sketched below.
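A rough sketch of what that second change could look like follows. The key names ('agents', 'use_central_value', 'state_space') simply mirror the wording above and are assumptions, not the verified keys rl_games expects.

def get_env_info(self):
    """Sketch of RLGPUEnv.get_env_info with the multi-agent additions (assumed keys)."""
    info = {
        'action_space': self.env.action_space,
        'observation_space': self.env.observation_space,
        # multi-agent additions described above:
        'agents': self.env.get_number_of_agents(),
        'use_central_value': getattr(self.env, 'use_central_value', False),
    }
    if info['use_central_value']:
        info['state_space'] = self.env.state_space
    return info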

@Denys88

Denys88 commented Dec 3, 2021

Thanks, I'll play with it tomorrow evening.
Just wondering, what best score did you achieve?

@reso1

reso1 commented Dec 3, 2021

The best reward of MA_Ant is below 10 (the single-agent Ant env is around 4k), but I did not fine-tune the training parameters used in MA_Ant, so I don't know whether there are some problems in my code, or it's just not suitable to treat each leg as an agent in MAPPO.

@Denys88

Denys88 commented Dec 4, 2021

@reso1 I've found a bug in your code:
expand_env_ids was buggy, so you actually always returned done=True for 3 of the 4 legs.
I rewrote it this way to make sure it works:

@torch.jit.script
def expand_env_ids(env_ids, n_agents):
    # type: (Tensor, int) -> Tensor
    # Expand per-env reset indices into per-agent indices, so every leg of a
    # resetting env is marked done (the strides below assume n_agents == 4).
    device = env_ids.device
    agent_env_ids = torch.zeros((n_agents * len(env_ids)), device=device, dtype=torch.long)
    agent_env_ids[0::4] = env_ids * n_agents + 0
    agent_env_ids[1::4] = env_ids * n_agents + 1
    agent_env_ids[2::4] = env_ids * n_agents + 2
    agent_env_ids[3::4] = env_ids * n_agents + 3
    return agent_env_ids
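For reference, a quick check of the mapping this produces (with n_agents = 4): envs 0 and 2 resetting expand to the per-agent indices 0..3 and 8..11.

import torch

env_ids = torch.tensor([0, 2], dtype=torch.long)
print(expand_env_ids(env_ids, 4))
# tensor([ 0,  1,  2,  3,  8,  9, 10, 11])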

And I added one more change to compute_ant_observations:

    # Append a one-hot agent (leg) index to each agent's observation.
    # `state` has shape (num_envs, obs_dim); the result has shape
    # (num_envs * 4, obs_dim + 4).
    obs_idxs = torch.eye(4, dtype=torch.float32, device=self.device)
    obs_idxs = obs_idxs.repeat(state.size()[0], 1)
    obs = state.repeat_interleave(4, dim=0)  # one copy of the env state per agent
    obs = torch.cat([obs, obs_idxs], dim=-1)

So the legs see slightly different observations. I think it is not a must for your case, but without it the maximum reward was 2000.
And I achieved a 4k+ reward in less than 2 minutes without using a central value (anyway, all legs see the same observations).
(screenshot: training reward curve)
As you can see, the movement is not as good as it could be and we need some tuning to make it perfect, but it works.
(GIF: the trained multi-agent ant walking)

@reso1

reso1 commented Dec 5, 2021

Cool, thanks for your kind help!

BTW, if you have any plan to formally integrate MARL into Isaac Gym using rl_games, I'm very glad to help :)

@reso1 reso1 closed this as completed Dec 5, 2021
@reso1

reso1 commented Dec 22, 2021 via email

Hi Bryan, the following is the pointer to the MA ant repo; you can check the closed issue in IsaacGymEnvs where I mentioned details about this environment: https://github.com/reso1/IsaacGymEnvs Enjoy!

(Quoting Bryan Chen's email of Dec 18, 2021: "Hi @reso1, could you please give me some pointers if I wanted to fork your MA ant repo to add a simple mat to the environment? In particular I would like to recreate the RoboSumo environment https://github.com/openai/robosumo. Thank you!")

@Frank-Dz

Hi ~ Are you going to share the repo? I am excited to play with it :D
