
[Draft] PettingZoo Support #45

Merged: 61 commits merged into LucasAlegre:master on Sep 29, 2021

Conversation

@jkterry1 (Contributor) commented Sep 12, 2021

This is a non-working draft that adds support for the PettingZoo API for multi-agent RL. This will let the environments be used with a number of other multi-agent libraries (SB3 and similar via SuperSuit, the ALL, Tianshou, etc.); RLlib also has native PettingZoo support. Feel free to take a look and hopefully finish this in the next week.
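
For context, a rough sketch of what driving the new environment through PettingZoo's standard AEC loop might look like (the make_env entry point comes up later in this thread; the constructor arguments shown are placeholders, not a final signature):

# Hypothetical usage sketch; make_env and its arguments are placeholders.
from sumo_rl import make_env

env = make_env(net_file='nets/4x4.net.xml', route_file='nets/4x4.rou.xml', use_gui=False)
env.reset()
for agent in env.agent_iter():
    observation, reward, done, info = env.last()  # standard PettingZoo AEC call
    action = env.action_spaces[agent].sample() if not done else None
    env.step(action)
env.close()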

@jkterry1 (Contributor, Author)

This PR also adds basic CI

@LucasAlegre (Owner)

> @LucasAlegre Also, in the current structure, the pettingzoo class inherits from the RLlib env class. This is not ideal because it means that people who are training on these environments with libraries other than RLlib (like me) still have to install it, and RLlib is an incredibly onerous dependency. Given that RLlib has full pettingzoo support (as do a lot of other MARL libraries), how would you feel about only having a pettingzoo environment and just changing the examples to use the pettingzoo class instead of having a pettingzoo-specific example?

That is something I was already thinking about!
In my last commit 1cbc8ea I removed the RLlib dependency. I also made the agents act synchronously, as I think it is better this way.

@LucasAlegre (Owner)

@jkterry1 pytest is failing because of:

/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pettingzoo/utils/save_observation.py:5: in <module>
    from PIL import Image
E   ModuleNotFoundError: No module named 'PIL'

I guess you should add PIL as a pettingzoo dependency.

@jkterry1 (Contributor, Author)

@LucasAlegre

Thanks a ton! A few things:

- The PIL issue will be fixed in the next pettingzoo release (in a few weeks); can you please add it as a dependency for now so the tests function properly?
- Did you see my comment about the expected behavior of rewards and observations above?
- What do you mean by "I also made the agents act synchronously, as I think it is better this way."?

@LucasAlegre (Owner)

> @LucasAlegre
>
> Thanks a ton! A few things:
>
> - The PIL issue will be fixed in the next pettingzoo release (in a few weeks); can you please add it as a dependency for now so the tests function properly?

Done!

> - Did you see my comment about the expected behavior of rewards and observations above?
> - What do you mean by "I also made the agents act synchronously, as I think it is better this way."?

I made all traffic signal agents act synchronously every 'delta' sumo time-steps. Then, if an agent can't change phase because 'min_green' seconds have not yet passed, it simply keeps the same phase no matter what the action is. This solves the problem of agents popping in and out of life, and thus addresses your comment about the expected behavior of rewards and observations.
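
A rough sketch of the gating rule being described, assuming placeholder attribute names (illustrative only, not sumo-rl's actual implementation):

# Illustrative sketch of the synchronous-update rule; names are placeholders.
def apply_action(signal, action, min_green):
    # Called for every traffic signal agent every 'delta' sumo time-steps.
    if signal.time_since_last_phase_change < min_green:
        # min_green not yet satisfied: ignore the action and keep the
        # current phase, so the agent never disappears from the step.
        return signal.current_phase
    signal.set_green_phase(action)  # otherwise apply the chosen phase
    return action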

@jkterry1 (Contributor, Author)

That sounds great, but just to confirm, what's the observation and reward for a light that can't change phase?

I'll run a last round of tests in the morning.

@LucasAlegre (Owner)

> That sounds great, but just to confirm, what's the observation and reward for a light that can't change phase?
>
> I'll run a last round of tests in the morning.

The observation is whatever it is at the moment, and so is the reward. You can think of it as a state in which all actions have the same effect. It does not affect learning.

@jkterry1 (Contributor, Author)

I don't believe that giving the agent whatever reward it happens to get at the moment, when it was not able to act, is expected behavior. This generally is not how reward works in reinforcement learning.

@LucasAlegre (Owner)

> I don't believe that giving the agent whatever reward it happens to get at the moment, when it was not able to act, is expected behavior. This generally is not how reward works in reinforcement learning.

Sorry if I was not clear. It is not giving whatever reward the agent can get at the moment; it receives the actual reward as a consequence of its action.
Example: suppose there are two actions/phases, A1 and A2. A1 is currently active, but min_green seconds have not yet passed. Now, if the agent selects A1 or A2, A1 will stay active either way, and the reward will be the difference between the delay before and after the action was executed (as in the README).

An analogy for this situation is a grid world where the agent is facing an upper-right corner wall. Whether it moves UP or RIGHT, the effect is the same and it receives the reward of the current cell.
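
In code, the delay-difference reward described above could look roughly like this (names are illustrative placeholders; see the README for the exact definition):

# Illustrative sketch of the delay-difference reward; names are placeholders.
def diff_waiting_time_reward(signal):
    waiting_now = signal.get_accumulated_waiting_time()
    reward = signal.last_waiting_time - waiting_now  # positive when delay decreased
    signal.last_waiting_time = waiting_now
    return reward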

@jkterry1 (Contributor, Author)

Hey, so I just sat down to run tests and added the specific learning file that I'll be using as "sb3.py" in experiments. A few questions and issues came up in the process:

- I know you said that you'd document the different environment xml files you created eventually, but in the meantime, how many agents are in the 4x4 environment I'm using for testing, so I can properly space evaluations?
- How many steps are you training these environments for, just so I have a reference? Maybe I'm blind, but I'm not seeing it defined in your rllib learning files.
- Looking at the code you've added, it seems like the way to create PZ environments is by importing "from sumo_rl import make_env" and calling make_env, while RLlib API environments are created by importing "from sumo_rl import SumoEnvironment" and instantiating SumoEnvironment. This is kind of confusing behavior to me. Additionally, if that's the case, there are unused "SumoEnvironmentPZ" imports.
- Some changes probably need to be made to the README in light of this PR, e.g. you have the line "The main class SumoEnvironment inherits MultiAgentEnv from RLlib".
- My test learning code fails, seemingly due to an error with how you added the pettingzoo internal wrappers:

jkterry@prophet:~/sumo-rl$ python3 experiments/sb3.py
/home/jkterry/.local/lib/python3.6/site-packages/supersuit/__init__.py:20: UserWarning: You're using SuperSuit 3.0, released 7/7/21. The entire codebase has been rewritten or refactored as part of this release. While we've tested it thoroughly, please ensure everything you're doing still works properly and report any issues at https://github.com/PettingZoo-Team/SuperSuit. This warning will be removed 2 months after release.
  warnings.warn("You're using SuperSuit 3.0, released 7/7/21. The entire codebase has been rewritten or refactored as part of this release. While we've tested it thoroughly, please ensure everything you're doing still works properly and report any issues at https://github.com/PettingZoo-Team/SuperSuit. This warning will be removed 2 months after release.")
Traceback (most recent call last):
  File "experiments/sb3.py", line 24, in <module>
    env = env.parallel_env()
  File "/home/jkterry/.local/lib/python3.6/site-packages/pettingzoo/utils/wrappers/order_enforcing.py", line 45, in __getattr__
    f"'{type(self).__name__}' object has no attribute '{value}'"
AttributeError: 'OrderEnforcingWrapper' object has no attribute 'parallel_env'

@LucasAlegre (Owner)

> Hey, so I just sat down to run tests and added the specific learning file that I'll be using as "sb3.py" in experiments. A few questions and issues came up in the process:
>
> - I know you said that you'd document the different environment xml files you created eventually, but in the meantime, how many agents are in the 4x4 environment I'm using for testing, so I can properly space evaluations?

There are 16 agents in the 4x4 grid.

> - How many steps are you training these environments for, just so I have a reference? Maybe I'm blind, but I'm not seeing it defined in your rllib learning files.

100k steps should be enough.

> - Looking at the code you've added, it seems like the way to create PZ environments is by importing "from sumo_rl import make_env" and calling make_env, while RLlib API environments are created by importing "from sumo_rl import SumoEnvironment" and instantiating SumoEnvironment. This is kind of confusing behavior to me. Additionally, if that's the case, there are unused "SumoEnvironmentPZ" imports.

To use the RLlib API I'm using the RLlib wrapper; check the file experiments/a3c_4x4grid.py in this PR. But I agree, I will update the imports.
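
For reference, a minimal sketch of wiring the PettingZoo env into RLlib with its PettingZooEnv wrapper (the make_env call and the config are placeholders; see experiments/a3c_4x4grid.py in this PR for the actual setup):

# Rough sketch of using RLlib's PettingZooEnv wrapper; the make_env call
# and the minimal config below are placeholders, not the code in this PR.
from ray import tune
from ray.rllib.env.pettingzoo_env import PettingZooEnv
from ray.tune.registry import register_env
from sumo_rl import make_env

register_env('4x4grid', lambda config: PettingZooEnv(make_env(net_file='nets/4x4.net.xml',
                                                              route_file='nets/4x4.rou.xml')))
tune.run('A3C', config={'env': '4x4grid', 'num_workers': 1})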

> - Some changes probably need to be made to the README in light of this PR, e.g. you have the line "The main class SumoEnvironment inherits MultiAgentEnv from RLlib".

I already did; it now says "The main class SumoEnvironment behaves like a MultiAgentEnv from RLlib." But I will improve the README when I add the new environments.

> - My test learning code fails, seemingly due to an error with how you added the pettingzoo internal wrappers:
>
> AttributeError: 'OrderEnforcingWrapper' object has no attribute 'parallel_env'

This is because the method .parallel_env() does not exist. Where should it come from?

@LucasAlegre (Owner)

@jkterry1 I also noticed that the RLlib PettingZooEnv wrapper does not work together with the OrderEnforcingWrapper:

File "/home/lucas/miniconda3/lib/python3.7/site-packages/ray/rllib/env/pettingzoo_env.py", line 72, in __init__
    self.agents = self.aec_env.agents
  File "/home/lucas/miniconda3/lib/python3.7/site-packages/pettingzoo/utils/wrappers/order_enforcing.py", line 42, in __getattr__
    raise AttributeError(f"{value} cannot be accessed before reset")
AttributeError: agents cannot be accessed before reset

@benblack769
So the parallel_env() issue: in pettingzoo, there are modules with env() and parallel_env() functions (https://github.com/PettingZoo-Team/PettingZoo/blob/master/test/example_envs/generated_agents_parallel_v0.py#L12). Environments don't spawn parallel environments; modules do. This isn't a requirement of the API, just a convention. If you want to turn an existing environment into a parallel environment, you can just use:

from pettingzoo.utils.conversions import to_parallel
to_parallel(env)
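
Once converted, the parallel env is then driven with dict-valued actions, roughly like this (the make_env construction is a placeholder for however the sumo-rl PettingZoo env is created):

# Sketch of a random rollout on the converted parallel env; make_env is a placeholder.
from pettingzoo.utils.conversions import to_parallel
from sumo_rl import make_env

parallel_env = to_parallel(make_env(net_file='nets/4x4.net.xml', route_file='nets/4x4.rou.xml'))
observations = parallel_env.reset()
while parallel_env.agents:
    # one action per live agent, all applied in the same step
    actions = {agent: parallel_env.action_spaces[agent].sample() for agent in parallel_env.agents}
    observations, rewards, dones, infos = parallel_env.step(actions)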

@benblack769
As for the rllib issue, what version of rllib are you using? That appears to be a fairly old version.

@jkterry1 (Contributor, Author)

@LucasAlegre Could you please add the parallel extension that ben described with the other changes?

@LucasAlegre (Owner)

> As for the rllib issue, what version of rllib are you using? That appears to be a fairly old version.

Updating solved the issue, thanks!

@LucasAlegre (Owner)

@jkterry1 @benblack769
The code in 'experiments/sb3.py' is running now, but there are the following "issues":

  • When using LIBSUMO, it is not possible to instantiate more than one simulation at the same time, which means that eval_callback can't work. If you use TRACI, I can easily implement multi-client support and this would be possible. But remember that TRACI is way slower than LIBSUMO, so I'm not sure whether this is advantageous.
  • As I explained before, render() does nothing. You need to instantiate the env with 'use_gui=True' to run the SUMO-GUI and watch the simulation.

I believe we could merge this PR, as there are already many changes. Next I can:

  • Add the new environments with their documentation (that should be really quick, I already have the sumo files).
  • Improve README and examples.
  • Implement multi-client support in case you really need it (this should require changing very few lines of code).

What do you think?

@jkterry1 (Contributor, Author) commented Sep 29, 2021

If you want to merge now and open a new PR for future changes, that's fine with me. I'll reply tonight about the render and environment duplication problems.

@LucasAlegre merged commit dd5bf4a into LucasAlegre:master on Sep 29, 2021
@jkterry1 mentioned this pull request on Sep 29, 2021
Labels: enhancement (New feature or request)