How to enable many agents to share the same policy? #796
Comments
Yes, but it requires some extra work. Something like:

```python
# Pseudocode: treat the 200 gold agents as part of the environment, and the
# 200 opponent agents as one logical agent whose obs/act/rew/done/info
# carry an extra `agent_id` dimension.
gold_agents = [GoldAgent(...) for _ in range(200)]
env = MAgents2Wrapper(gold_agents=gold_agents, ...)

# One shared actor-critic pair serves the whole opponent group.
actor = Actor(...)
critic = Critic(...)
mapolicy = MAPPOPolicy(actor, critic, ...)
collector = Collector(mapolicy, env, ...)
```
Thanks for your reply!
It's just pseudocode :)
I started using Tianshou recently and I was wondering about something similar, too! Let's say we are interested in a vanilla case where we just want all the agents to use the same policy (e.g. a DQNPolicy). Would it be correct to use MAPM as is, passing it a list of policies that all reference the same defined policy? Something like the sketch below. I'm asking because I went through various issues this morning, such as #121, #136, and #197; I thought the policy-sharing approach I mentioned above would be correct, but after going through those issues it's a little unclear to me how things currently stand. Thanks for your work and help, guys!
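To make the question concrete, here is a minimal sketch of what I mean, assuming Tianshou's `MultiAgentPolicyManager` (the MAPM above) and `PettingZooEnv`; `make_magent_env()`, the shapes, and the hyperparameters are placeholders, not real API:

```python
import torch
from tianshou.env import PettingZooEnv
from tianshou.policy import DQNPolicy, MultiAgentPolicyManager
from tianshou.utils.net.common import Net

env = PettingZooEnv(make_magent_env())  # make_magent_env() is a placeholder

# Placeholder shapes -- in practice read them off the env's spaces.
obs_shape, n_actions = (10, 10, 5), 21

net = Net(state_shape=obs_shape, action_shape=n_actions, hidden_sizes=[128, 128])
optim = torch.optim.Adam(net.parameters(), lr=1e-4)
shared = DQNPolicy(net, optim, discount_factor=0.99, target_update_freq=500)

# Pass the *same* policy object once per agent slot: every slot then
# shares one set of weights, while MAPM still routes data per agent.
mapm = MultiAgentPolicyManager([shared for _ in env.agents], env)
```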
MAPM can sit as an upper layer on top of a shared policy, but that introduces extra overhead and is less flexible. If you want to mix all the available, trainable agents' data into a single batch to update a shared-weight agent, you need to write something similar to #796 (comment); a rough sketch of that idea follows.
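For illustration, here is a rough sketch of that wrapper idea (all names are hypothetical, not a real Tianshou or MAgent2 API): stack the group's partial observations along a leading agent axis so the whole group looks like one batched agent.

```python
import numpy as np

# Hypothetical wrapper sketch: expose a group of agents that share one
# policy as a single "agent" whose obs/act/rew/done carry an agent axis.
class SharedGroupWrapper:
    def __init__(self, env, group):
        self.env = env
        self.group = group  # ids of the agents controlled by the shared policy

    def reset(self):
        obs = self.env.reset()
        return np.stack([obs[a] for a in self.group])  # (n_agents, *obs_shape)

    def step(self, actions):
        # `actions` has one row per group member.
        obs, rew, done, info = self.env.step(
            {a: actions[i] for i, a in enumerate(self.group)})
        return (np.stack([obs[a] for a in self.group]),
                np.array([rew[a] for a in self.group]),
                np.array([done[a] for a in self.group]),
                info)
```

At training time each row along the agent axis becomes an ordinary transition, so the buffer mixes all agents' experience as if it came from a single agent.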
It seems that each agent is currently forced to hold a distinct policy network. But in my project we have a lot of agents, and we want multiple agents to share one network.
For example, we have 200 good agents and 200 opponent agents in the environment, and each agent has only a partial observation.
We want each group (200 agents) to use the same network to infer actions from their distinct observations, and when we train the network, all 200 agents' experience is pooled as if it were a single agent's experience. Conceptually, something like the toy sketch below.
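(Pure illustration of the goal, with made-up shapes: one shared network, a batch of 200 partial observations in, 200 actions out.)

```python
import torch
import torch.nn as nn

obs_dim, n_actions = 64, 21  # made-up shapes for illustration

# One shared network for the whole group of 200 agents.
policy_net = nn.Sequential(
    nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))

obs = torch.randn(200, obs_dim)       # one partial observation per agent
actions = policy_net(obs).argmax(-1)  # (200,): one action per agent
# Training: all 200 agents' transitions go into a single shared batch.
```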
Is there any way to do that?
(Some background: we are using the MAgents2 environment, to be specific. This environment is now maintained by the foundation that supports PettingZoo, so it has been integrated with PettingZoo. And Tianshou seems to be the only framework that integrates with PettingZoo for now, so naturally we want to use Tianshou for our project.)