Multi-agent vectorized environment support for gym wrapper. #4120
Comments
Hi, the ray-rllib link is broken.
@Hsgngr I updated the link!!
As of now I have zeroed in on this snippet of code, which raises an exception whenever the number of agents is greater than one: ml-agents/gym-unity/gym_unity/envs/__init__.py, lines 267 to 272, at commit 3c2fa4d.
Is there any reason why this is the norm? Is it just to make sure that the environments are compatible with standard RL libraries like OpenAI's Baselines and Dopamine?
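For context, a guard of roughly this shape is what produces the exception. This is a hedged sketch only: the exception class, function name, and message below are illustrative stand-ins, not the actual ml-agents wrapper code.

```python
# Illustrative sketch of the kind of single-agent guard discussed
# above. The exception class and message are stand-ins, not the
# actual ml-agents gym wrapper code.
class UnityGymException(Exception):
    """Stand-in for the wrapper's own exception type."""

def check_single_agent(n_agents: int) -> None:
    # Gym's step() contract returns one (obs, reward, done, info)
    # tuple, which is why gym wrappers typically reject scenes
    # reporting more than one agent.
    if n_agents > 1:
        raise UnityGymException(
            f"Expected one agent, but the scene reports {n_agents}."
        )
```

The underlying constraint is Gym's single-agent `step()` contract: one observation, one reward, one done flag per call, which is exactly what Baselines and Dopamine assume.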
Hi @xiaomaogy,
using ml-agents/ml-agents-envs/mlagents_envs/environment.py, lines 338 to 345, at commit 20527d1
The issue that I am facing now is that when the episode ends, I only get the observations from the agent that is responsible for termination. My use case, however, is quite different: I want the episode to be agent-dependent, so that even if some agent "dies", the rest of the agents continue, and the dead agent respawns somewhere else in the map. Is this achievable? And could you give me some pointers on possible pitfalls I should look out for? Thanks in advance!!
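The per-agent episode semantics described here (one agent "dies" and respawns while the others keep going) can be sketched with RLlib-style dict returns, where each agent gets its own done flag and a separate `"__all__"` key controls the shared episode. The toy environment below is an illustration of that pattern, not ml-agents code; all names and dynamics are made up.

```python
from typing import Dict, Tuple

class MultiAgentRespawnEnv:
    """Toy environment: agents move on a number line; falling below
    zero 'kills' and respawns that agent without ending the shared
    episode. Illustrative only, not an ml-agents API."""

    def __init__(self, agent_ids):
        # Every agent starts at position 5.
        self.positions = {aid: 5 for aid in agent_ids}

    def step(self, actions: Dict[str, int]) -> Tuple[dict, dict, dict]:
        obs, rewards, dones = {}, {}, {}
        for aid, act in actions.items():
            self.positions[aid] += act
            died = self.positions[aid] < 0
            if died:
                self.positions[aid] = 5   # respawn elsewhere on the map
            obs[aid] = self.positions[aid]
            rewards[aid] = -1.0 if died else 0.0
            dones[aid] = died             # per-agent terminal flag
        dones["__all__"] = False          # shared episode keeps running
        return obs, rewards, dones
```

With this shape, `dones[aid]` closes only that agent's trajectory, so a trainer can finish that agent's episode while the environment keeps stepping for everyone else.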
This snippet is actually responsible for sending the observations when some agent reaches its terminal condition: ml-agents/gym-unity/gym_unity/envs/__init__.py, lines 175 to 180, at commit 3c2fa4d.
In a multi-agent/vectorized setting, would it be okay to return the observations/rewards/dones considering both?
P.S.: One workaround that I can think of for my particular use case is to not call the
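One way to picture "returning the observations/rewards/dones for all agents" is to batch the per-agent results into stacked arrays, the way vectorized wrappers usually do. This is a sketch under that assumption, not the wrapper's actual return format.

```python
import numpy as np

def batch_step_results(per_agent):
    """Stack per-agent (obs, reward, done) tuples into batched arrays.

    per_agent: list of (obs, reward, done) tuples, one entry per agent.
    Returns (obs, rewards, dones) with a leading agent dimension.
    Illustrative helper, not part of the ml-agents gym wrapper.
    """
    obs = np.stack([o for o, _, _ in per_agent])
    rewards = np.array([r for _, r, _ in per_agent], dtype=np.float32)
    dones = np.array([d for _, _, d in per_agent], dtype=bool)
    return obs, rewards, dones
```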
A question similar to the original poster's: is there any reason why multi-agent isn't supported? The code changes to make the gym-like API support it aren't difficult, so I'm trying to figure out whether I'm missing something or whether this is purely a conceptual difficulty.
Since I need this for my own purposes, I've added my own wrapper (based on Unity's wrapper), which can be found here: https://github.com/laszukdawid/ai-traineree/blob/master/ai_traineree/tasks.py#L101 (or with the associated commit laszukdawid/ai-traineree@39dcf31). I'd appreciate any reply from Unity's team. I'm planning on adding more support for multi-agent use cases and wouldn't mind contributing a bit.
@laszukdawid can you provide a simple Colab showing how to use your wrapper? I am in the dark with the Python API, and the Gym wrapper's documentation is outdated.
For log continuity:
Are there any updates on this issue? It would be great to see support for Ray's RLlib in ML-Agents, particularly for multi-agent reinforcement learning.
Sorry, we are not currently supporting multi-agent vectorized environments in the gym wrapper.
Understood, and thanks for the update, @xcao65! |
Is your feature request related to a problem? Please describe.
I recently installed ml-agents to use for a research project. My use case is a multi-agent scenario involving coordination among the agents. The gym wrapper, however, currently only allows single agents to be used.
It would also be awesome to have a bunch of environments running in parallel, handled by the gym wrapper!
Describe the solution you'd like
Currently the low-level Python API's UnityEnvironment class provides access to multiple agents. Exposing this through the gym wrapper would be really helpful. One implementation of a multi-agent environment that I personally like is in ray-rllib (here).
For multiple environments, a vectorized approach could work like OpenAI's VecEnv (link).
As I am going to be developing workarounds for my project anyway, I would like to contribute towards this goal. For now, I am going to develop the solution to stay as close as possible to ray-rllib's implementation. Inputs and critiques are welcome!
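The VecEnv-style idea mentioned above can be sketched as a small wrapper that steps several independent sub-environments in lockstep and auto-resets any that finish. The class name, structure, and auto-reset policy below are illustrative assumptions, not OpenAI's actual VecEnv implementation.

```python
import numpy as np

class SimpleVecEnv:
    """Minimal VecEnv-style sketch: steps N sub-environments in
    lockstep and auto-resets finished ones. Illustrative only."""

    def __init__(self, env_fns):
        # env_fns: callables that each construct one sub-environment.
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        # Stack per-env observations into one batched array.
        return np.stack([env.reset() for env in self.envs])

    def step(self, actions):
        obs, rewards, dones = [], [], []
        for env, act in zip(self.envs, actions):
            o, r, d = env.step(act)
            if d:
                o = env.reset()  # auto-reset a finished sub-env
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return np.stack(obs), np.array(rewards), np.array(dones)
```

A design note: returning the post-reset observation for finished sub-envs (as Baselines' VecEnv does) keeps the batch shape constant, which is what makes vectorized rollouts simple for the trainer.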