How to enable many agents to share the same policy? #796
Comments
Yes, but it requires some extra work. Something like:

```python
# Pseudocode: treat the 200 gold agents as part of the environment, and the
# 200 opponent agents as one logical agent whose obs/act/rew/done/info
# carry an extra `agent_id` dimension.
gold_agents = [GoldAgent(...) for _ in range(200)]
env = MAgents2Wrapper(gold_agents=gold_agents, ...)

# One shared actor-critic pair serves the whole opponent group.
actor = Actor(...)
critic = Critic(...)
mapolicy = MAPPOPolicy(actor, critic, ...)
collector = Collector(mapolicy, env, ...)
```
Thanks for your reply!
It's just pseudocode :)
I started using Tianshou recently and I was wondering about something similar, too! Let's say we are interested in a vanilla case where we just want all the agents to use the same policy (e.g. a DQNPolicy). Would it be correct to use MAPM as is, passing it a list of policies that all reference the same defined policy? Something like the sketch below. I'm asking because I went through various issues this morning, such as #121, #136, and #197; I thought the policy-sharing approach I mentioned above would be correct, but after going through those issues it's a little unclear to me how things currently stand. Thanks for your work and help, guys!
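To make the question concrete, here is a minimal sketch of what I mean, assuming Tianshou's `MultiAgentPolicyManager` (the MAPM above) and `PettingZooEnv`; `make_magent_env()`, the shapes, and the hyperparameters are placeholders, not real API:

```python
import torch
from tianshou.env import PettingZooEnv
from tianshou.policy import DQNPolicy, MultiAgentPolicyManager
from tianshou.utils.net.common import Net

env = PettingZooEnv(make_magent_env())  # make_magent_env() is a placeholder

# Placeholder shapes -- in practice read them off the env's spaces.
obs_shape, n_actions = (10, 10, 5), 21

net = Net(state_shape=obs_shape, action_shape=n_actions, hidden_sizes=[128, 128])
optim = torch.optim.Adam(net.parameters(), lr=1e-4)
shared = DQNPolicy(net, optim, discount_factor=0.99, target_update_freq=500)

# Pass the *same* policy object once per agent slot: every slot then
# shares one set of weights, while MAPM still routes data per agent.
mapm = MultiAgentPolicyManager([shared for _ in env.agents], env)
```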
MAPM can sit as an upper layer on top of a shared policy, but that introduces extra overhead and is less flexible. If you want to mix all the available, trainable agents' data into a single batch to update a shared-weight agent, you need to write something similar to #796 (comment); a rough sketch of that idea follows.
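For illustration, here is a rough sketch of that wrapper idea (all names are hypothetical, not a real Tianshou or MAgent2 API): stack the group's partial observations along a leading agent axis so the whole group looks like one batched agent.

```python
import numpy as np

# Hypothetical wrapper sketch: expose a group of agents that share one
# policy as a single "agent" whose obs/act/rew/done carry an agent axis.
class SharedGroupWrapper:
    def __init__(self, env, group):
        self.env = env
        self.group = group  # ids of the agents controlled by the shared policy

    def reset(self):
        obs = self.env.reset()
        return np.stack([obs[a] for a in self.group])  # (n_agents, *obs_shape)

    def step(self, actions):
        # `actions` has one row per group member.
        obs, rew, done, info = self.env.step(
            {a: actions[i] for i, a in enumerate(self.group)})
        return (np.stack([obs[a] for a in self.group]),
                np.array([rew[a] for a in self.group]),
                np.array([done[a] for a in self.group]),
                info)
```

At training time each row along the agent axis becomes an ordinary transition, so the buffer mixes all agents' experience as if it came from a single agent.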
It seems that each agent is currently forced to hold a distinct policy network. But in my project we have a lot of agents, and we want multiple agents to share one network.
For example, we have 200 good agents and 200 opponent agents in the environment, and each agent has only a partial observation.
We want each group (200 agents) to use the same network to infer actions from their distinct observations, and when we train the network, all 200 agents' experience is pooled as if it were a single agent's experience. Conceptually, something like the toy sketch below.
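(Pure illustration of the goal, with made-up shapes: one shared network, a batch of 200 partial observations in, 200 actions out.)

```python
import torch
import torch.nn as nn

obs_dim, n_actions = 64, 21  # made-up shapes for illustration

# One shared network for the whole group of 200 agents.
policy_net = nn.Sequential(
    nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))

obs = torch.randn(200, obs_dim)       # one partial observation per agent
actions = policy_net(obs).argmax(-1)  # (200,): one action per agent
# Training: all 200 agents' transitions go into a single shared batch.
```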
Is there any way to do that?
(Some background: we are using the MAgents2 environment, to be specific. This environment is now maintained by the foundation that supports PettingZoo, so it has been integrated with PettingZoo. And Tianshou seems to be the only framework that integrates with PettingZoo for now, so naturally we want to use Tianshou for our project.)