
[Draft] PettingZoo Support #45

Merged: 61 commits merged into LucasAlegre:master on Sep 29, 2021

Conversation

@jkterry1 (Contributor) commented Sep 12, 2021

This is a non-working draft that adds support for the PettingZoo API for multi-agent RL. This will let the environments be used with a number of other multi-agent libraries (SB3 and similar via SuperSuit, the ALL, Tianshou, etc.); RLlib also has native PettingZoo support. Feel free to take a look and hopefully finish this in the next week.
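
For context, a rough sketch of what driving the new environment through PettingZoo's standard AEC loop might look like (the make_env entry point comes up later in this thread; the constructor arguments shown are placeholders, not a final signature):

# Hypothetical usage sketch; make_env and its arguments are placeholders.
from sumo_rl import make_env

env = make_env(net_file='nets/4x4.net.xml', route_file='nets/4x4.rou.xml', use_gui=False)
env.reset()
for agent in env.agent_iter():
    observation, reward, done, info = env.last()  # standard PettingZoo AEC call
    action = env.action_spaces[agent].sample() if not done else None
    env.step(action)
env.close()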

@jkterry1 (Contributor, Author)

This PR also adds basic CI

@LucasAlegre (Owner)

> @LucasAlegre Also, in the current structure, the pettingzoo class inherits from the RLlib env class. This is not ideal because it means that people who are training on these environments with libraries other than RLlib (like me) still have to install it, and RLlib is an incredibly onerous dependency. Given that RLlib has full pettingzoo support (as do a lot of other MARL libraries), how would you feel about only having a pettingzoo environment and just changing the examples to use the pettingzoo class instead of having a pettingzoo-specific example?

That is something I was already thinking about!
In my last commit 1cbc8ea I removed the RLlib dependency. I also made the agents act synchronously, as I think it is better this way.

@LucasAlegre (Owner)

@jkterry1 pytest is failing because of:

/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pettingzoo/utils/save_observation.py:5: in <module>
    from PIL import Image
E   ModuleNotFoundError: No module named 'PIL'

I guess you should add PIL as a pettingzoo dependency.

@jkterry1 (Contributor, Author)

@LucasAlegre

Thanks a ton! A few things:

- The PIL issue will be fixed in the next pettingzoo release (in a few weeks); can you please add it as a dependency for now so the tests function properly?
- Did you see my comment about the expected behavior of rewards and observations above?
- What do you mean by "I also made the agents act synchronously, as I think it is better this way."?

@LucasAlegre (Owner)

> @LucasAlegre
>
> Thanks a ton! A few things:
>
> - The PIL issue will be fixed in the next pettingzoo release (in a few weeks); can you please add it as a dependency for now so the tests function properly?

Done!

> - Did you see my comment about the expected behavior of rewards and observations above?
> - What do you mean by "I also made the agents act synchronously, as I think it is better this way."?

I made all traffic signal agents act synchronously every 'delta' sumo time-steps. Then, if an agent can't change phase because 'min_green' seconds have not yet passed, it simply keeps the same phase no matter what the action is. This solves the problem of agents popping in and out of life, and thus addresses your comment about the expected behavior of rewards and observations.
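
A rough sketch of the gating rule being described, assuming placeholder attribute names (illustrative only, not sumo-rl's actual implementation):

# Illustrative sketch of the synchronous-update rule; names are placeholders.
def apply_action(signal, action, min_green):
    # Called for every traffic signal agent every 'delta' sumo time-steps.
    if signal.time_since_last_phase_change < min_green:
        # min_green not yet satisfied: ignore the action and keep the
        # current phase, so the agent never disappears from the step.
        return signal.current_phase
    signal.set_green_phase(action)  # otherwise apply the chosen phase
    return action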

@jkterry1 (Contributor, Author)

That sounds great, but just to confirm, what's the observation and reward for a light that can't change phase?

I'll run a last round of tests in the morning.

@LucasAlegre (Owner)

> That sounds great, but just to confirm, what's the observation and reward for a light that can't change phase?
>
> I'll run a last round of tests in the morning.

The observation is whatever it is at the moment, and so is the reward. You can think of it as a state in which all actions have the same effect. It does not affect learning.

@jkterry1 (Contributor, Author)

I don't believe that giving the agent whatever reward it happens to get at the moment, when it was not able to act, is expected behavior. This generally is not how reward works in reinforcement learning.

@LucasAlegre (Owner)

> I don't believe that giving the agent whatever reward it happens to get at the moment, when it was not able to act, is expected behavior. This generally is not how reward works in reinforcement learning.

Sorry if I was not clear. It is not giving whatever reward the agent can get at the moment; it receives the actual reward as a consequence of its action.
Example: suppose there are two actions/phases, A1 and A2. A1 is currently active, but min_green seconds have not yet passed. Now, if the agent selects A1 or A2, A1 will stay active either way, and the reward will be the difference between the delay before and after the action was executed (as in the README).

An analogy for this situation is a grid world where the agent is facing an upper-right corner wall. Whether it moves UP or RIGHT, the effect is the same and it receives the reward of the current cell.
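
In code, the delay-difference reward described above could look roughly like this (names are illustrative placeholders; see the README for the exact definition):

# Illustrative sketch of the delay-difference reward; names are placeholders.
def diff_waiting_time_reward(signal):
    waiting_now = signal.get_accumulated_waiting_time()
    reward = signal.last_waiting_time - waiting_now  # positive when delay decreased
    signal.last_waiting_time = waiting_now
    return reward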

@jkterry1 (Contributor, Author)

Hey, so I just sat down to run tests and added the specific learning file that I'll be using as "sb3.py" in experiments. A few questions and issues came up in the process:

- I know you said that you'd document the different environment xml files you created eventually, but in the meantime, how many agents are in the 4x4 environment I'm using for testing, so I can properly space evaluations?
- How many steps are you training these environments for, just so I have a reference? Maybe I'm blind, but I'm not seeing it defined in your rllib learning files.
- Looking at the code you've added, it seems like the way to create PZ environments is by importing "from sumo_rl import make_env" and calling make_env, while RLlib API environments are created by importing "from sumo_rl import SumoEnvironment" and instantiating SumoEnvironment. This is kind of confusing behavior to me. Additionally, if that's the case, there are unused "SumoEnvironmentPZ" imports.
- Some changes probably need to be made to the README in light of this PR, e.g. you have the line "The main class SumoEnvironment inherits MultiAgentEnv from RLlib".
- My test learning code fails, seemingly due to an error with how you added the pettingzoo internal wrappers:

jkterry@prophet:~/sumo-rl$ python3 experiments/sb3.py
/home/jkterry/.local/lib/python3.6/site-packages/supersuit/__init__.py:20: UserWarning: You're using SuperSuit 3.0, released 7/7/21. The entire codebase has been rewritten or refactored as part of this release. While we've tested it thoroughly, please ensure everything you're doing still works properly and report any issues at https://github.com/PettingZoo-Team/SuperSuit. This warning will be removed 2 months after release.
  warnings.warn("You're using SuperSuit 3.0, released 7/7/21. The entire codebase has been rewritten or refactored as part of this release. While we've tested it thoroughly, please ensure everything you're doing still works properly and report any issues at https://github.com/PettingZoo-Team/SuperSuit. This warning will be removed 2 months after release.")
Traceback (most recent call last):
  File "experiments/sb3.py", line 24, in <module>
    env = env.parallel_env()
  File "/home/jkterry/.local/lib/python3.6/site-packages/pettingzoo/utils/wrappers/order_enforcing.py", line 45, in __getattr__
    f"'{type(self).__name__}' object has no attribute '{value}'"
AttributeError: 'OrderEnforcingWrapper' object has no attribute 'parallel_env'

@LucasAlegre (Owner)

> Hey, so I just sat down to run tests and added the specific learning file that I'll be using as "sb3.py" in experiments. A few questions and issues came up in the process:
>
> - I know you said that you'd document the different environment xml files you created eventually, but in the meantime, how many agents are in the 4x4 environment I'm using for testing, so I can properly space evaluations?

There are 16 agents in the 4x4 grid.

> - How many steps are you training these environments for, just so I have a reference? Maybe I'm blind, but I'm not seeing it defined in your rllib learning files.

100k steps should be enough.

> - Looking at the code you've added, it seems like the way to create PZ environments is by importing "from sumo_rl import make_env" and calling make_env, while RLlib API environments are created by importing "from sumo_rl import SumoEnvironment" and instantiating SumoEnvironment. This is kind of confusing behavior to me. Additionally, if that's the case, there are unused "SumoEnvironmentPZ" imports.

To use the RLlib API I'm using the RLlib wrapper; check the file experiments/a3c_4x4grid.py in this PR. But I agree, I will update the imports.
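
For reference, a minimal sketch of wiring the PettingZoo env into RLlib with its PettingZooEnv wrapper (the make_env call and the config are placeholders; see experiments/a3c_4x4grid.py in this PR for the actual setup):

# Rough sketch of using RLlib's PettingZooEnv wrapper; the make_env call
# and the minimal config below are placeholders, not the code in this PR.
from ray import tune
from ray.rllib.env.pettingzoo_env import PettingZooEnv
from ray.tune.registry import register_env
from sumo_rl import make_env

register_env('4x4grid', lambda config: PettingZooEnv(make_env(net_file='nets/4x4.net.xml',
                                                              route_file='nets/4x4.rou.xml')))
tune.run('A3C', config={'env': '4x4grid', 'num_workers': 1})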

> - Some changes probably need to be made to the README in light of this PR, e.g. you have the line "The main class SumoEnvironment inherits MultiAgentEnv from RLlib".

I already did; it now says "The main class SumoEnvironment behaves like a MultiAgentEnv from RLlib." But I will improve the README when I add the new environments.

> - My test learning code fails, seemingly due to an error with how you added the pettingzoo internal wrappers:
>
> AttributeError: 'OrderEnforcingWrapper' object has no attribute 'parallel_env'

This is because the method .parallel_env() does not exist. Where should it come from?

@LucasAlegre (Owner)

@jkterry1 I also noticed that the RLlib PettingZooEnv wrapper does not work together with the OrderEnforcingWrapper:

File "/home/lucas/miniconda3/lib/python3.7/site-packages/ray/rllib/env/pettingzoo_env.py", line 72, in __init__
    self.agents = self.aec_env.agents
  File "/home/lucas/miniconda3/lib/python3.7/site-packages/pettingzoo/utils/wrappers/order_enforcing.py", line 42, in __getattr__
    raise AttributeError(f"{value} cannot be accessed before reset")
AttributeError: agents cannot be accessed before reset

@benblack769
So the parallel_env() issue: in pettingzoo, there are modules with env() and parallel_env() functions (https://github.com/PettingZoo-Team/PettingZoo/blob/master/test/example_envs/generated_agents_parallel_v0.py#L12). Environments don't spawn parallel environments; modules do. This isn't a requirement of the API, just a convention. If you want to turn an existing environment into a parallel environment, you can just use:

from pettingzoo.utils.conversions import to_parallel
to_parallel(env)
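
Once converted, the parallel env is then driven with dict-valued actions, roughly like this (the make_env construction is a placeholder for however the sumo-rl PettingZoo env is created):

# Sketch of a random rollout on the converted parallel env; make_env is a placeholder.
from pettingzoo.utils.conversions import to_parallel
from sumo_rl import make_env

parallel_env = to_parallel(make_env(net_file='nets/4x4.net.xml', route_file='nets/4x4.rou.xml'))
observations = parallel_env.reset()
while parallel_env.agents:
    # one action per live agent, all applied in the same step
    actions = {agent: parallel_env.action_spaces[agent].sample() for agent in parallel_env.agents}
    observations, rewards, dones, infos = parallel_env.step(actions)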

@benblack769
As for the rllib issue, what version of rllib are you using? That appears to be a fairly old version.

@jkterry1 (Contributor, Author)

@LucasAlegre Could you please add the parallel extension that ben described with the other changes?

@LucasAlegre (Owner)

> As for the rllib issue, what version of rllib are you using? That appears to be a fairly old version.

Updating solved the issue, thanks!

@LucasAlegre (Owner)

@jkterry1 @benblack769
The code in 'experiments/sb3.py' is running now, but there are the following "issues":

  • When using LIBSUMO, it is not possible to instantiate more than one simulation at the same time, which means that eval_callback can't work. If you use TRACI, I can easily implement multi-client support and this would be possible. But remember that TRACI is way slower than LIBSUMO, so I'm not sure whether this is advantageous.
  • As I explained before, render() does nothing. You need to instantiate the env with 'use_gui=True' to run the SUMO-GUI and watch the simulation.

I believe we could merge this PR, as there are already many changes. Next I can:

  • Add the new environments with their documentation (that should be really quick, I already have the sumo files).
  • Improve README and examples.
  • Implement multi-client support in case you really need it (this should require changing very few lines of code).

What do you think?

@jkterry1 (Contributor, Author) commented Sep 29, 2021

If you want to merge now and open a new PR for future changes, that's fine with me. I'll reply tonight about the render and environment duplication problems.

@LucasAlegre merged commit dd5bf4a into LucasAlegre:master on Sep 29, 2021
@jkterry1 mentioned this pull request on Sep 29, 2021
Labels: enhancement (New feature or request)