
How to enable many agents to share the same policy? #796

Closed
JosephZZ opened this issue Jan 14, 2023 · 5 comments
Labels
question Further information is requested

Comments

@JosephZZ

It seems that each agent is currently forced to hold a distinct policy network. But in my project we have a lot of agents, so we want multiple agents to share one network.

For example, we have 200 good agents and 200 opponent agents in the environment, and each agent has only a partial observation.
We want each group (200 agents) to use the same network to infer actions from their individual observations. When we train the network, all 200 agents' experiences are pooled as if they were a single agent's experience.

Is there any way to do that?

(Some background: we are using the MAgent2 environment, to be specific. This environment is now maintained by the same foundation that supports PettingZoo, so the env has been integrated with PettingZoo. Tianshou seems to be the only framework that also integrates with PettingZoo for now, so naturally we want to use Tianshou for our project.)

@Trinkle23897
Collaborator

Trinkle23897 commented Jan 15, 2023

Yes, but it requires some extra work.

# pseudocode sketch: treat gold_agents as part of the env and the opponent
# agents as one logical agent; the wrapper adds an extra `agent_id` dimension
# to obs/act/rew/done/info
gold_agents = [GoldAgent(...) for _ in range(200)]
env = MAgents2Wrapper(gold_agents=gold_agents, ...)

actor = Actor(...)
critic = Critic(...)
mapolicy = MAPPOPolicy(actor, critic, ...)
collector = Collector(mapolicy, env, ...)

For MAPPOPolicy, or whatever policy you choose:

  • First, treat it as one policy; all interfaces are the same as in the single-agent case.
  • Override the forward and learn functions: the input data contains the extra agent_id dimension mentioned above, and you need to handle it manually (see the sketch after this list).
  • Calling super().forward(...) and super().learn(...) may make your life easier.
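
A minimal sketch of such an override, assuming the wrapper stacks per-agent data along a second dimension so every field has shape (num_envs, num_agents, ...); SharedPPOPolicy and merge_agent_dim are illustrative names, not Tianshou API:

from tianshou.data import Batch
from tianshou.policy import PPOPolicy

def merge_agent_dim(batch: Batch) -> Batch:
    # reshape every leaf array from (num_envs, num_agents, ...) to (num_envs * num_agents, ...)
    out = Batch()
    for key, value in batch.items():
        if isinstance(value, Batch):
            out[key] = merge_agent_dim(value)
        else:
            out[key] = value.reshape(-1, *value.shape[2:])
    return out

class SharedPPOPolicy(PPOPolicy):
    # one shared actor/critic for every agent of a team
    def forward(self, batch, state=None, **kwargs):
        num_envs, num_agents = batch.obs.shape[:2]
        result = super().forward(merge_agent_dim(batch), state=state, **kwargs)
        # restore the per-agent layout so the env wrapper can dispatch actions
        result.act = result.act.reshape(num_envs, num_agents, *result.act.shape[1:])
        return result

    def learn(self, batch, batch_size, repeat, **kwargs):
        # merge all agents' transitions and update as if they were one agent's experience
        return super().learn(merge_agent_dim(batch), batch_size, repeat, **kwargs)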

@Trinkle23897 added the question (Further information is requested) label on Jan 15, 2023
@JosephZZ
Author

Thanks for your reply!
Where can I get GoldAgent and MAgents2Wrapper? I could not find them by searching the documentation or the code base.

@Trinkle23897
Collaborator

It's just pseudocode :)

@RaffaeleGalliera

I started using Tianshou recently and I was wondering about something similar, too!
@Trinkle23897 mentioned that some extra care for the agent_id dimension is needed if we want to implement MAPPO or whatever policy. Would it be something similar to what MAPM (MultiAgentPolicyManager) currently does to handle it in its implementation?

Also, let's say we are interested in a vanilla case where we just want all the agents to use the same policy (e.g. a DQNPolicy). Would it be correct to use MAPM as-is by passing a list that repeats the same reference to one defined policy (e.g. MultiAgentPolicyManager([policy for _ in range(num_agents)], env))? Looking at the MAPM implementation this sounds correct to me, but I might be missing something.

I'm asking because I went through various issues this morning, such as #121, #136, and #197. I thought the policy-sharing approach I mentioned above would be correct, but after going through those issues it's a little unclear to me how the current situation actually stands.

Thanks for your work and help guys!

@Trinkle23897
Collaborator

Trinkle23897 commented Jan 18, 2023

MAPM can serve as the upper layer of a shared policy, but that introduces extra overhead and is less flexible. If you want to mix all available and trainable agents' data into a single batch to update a shared-weight agent, you need to write something similar to #796 (comment).
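
For reference, the vanilla weight-sharing variant could look roughly like this (a sketch: make_magent2_env is a hypothetical helper and the shapes are placeholders; the key point is that every list entry is the same policy object):

import torch
from tianshou.env import PettingZooEnv
from tianshou.policy import DQNPolicy, MultiAgentPolicyManager
from tianshou.utils.net.common import Net

env = PettingZooEnv(make_magent2_env())  # hypothetical helper that builds the MAgent2/PettingZoo env
obs_dim, n_actions = 845, 21             # placeholders; read them from env.observation_space / env.action_space

net = Net(state_shape=obs_dim, action_shape=n_actions, hidden_sizes=[128, 128])
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
shared_policy = DQNPolicy(net, optim, discount_factor=0.99, estimation_step=3, target_update_freq=320)

# every list entry references the SAME object, so all agents share one set of weights;
# MAPM still splits and re-merges the batch per agent, which is the extra overhead mentioned above
mapm = MultiAgentPolicyManager([shared_policy for _ in range(len(env.agents))], env)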
