Merge pull request #419 from Rohan138/master
jkterry1 committed Jul 16, 2021
2 parents 07e96c6 + f35548d commit 383c152
Showing 23 changed files with 140 additions and 84 deletions.
10 changes: 9 additions & 1 deletion docs/mpe.md
@@ -11,7 +11,7 @@ pip install pettingzoo[mpe]

Multi Particle Environments (MPE) are a set of communication-oriented environments where particle agents can (sometimes) move, communicate, see each other, push each other around, and interact with fixed landmarks.

These environments are from [OpenAI's MPE](https://github.com/openai/multiagent-particle-envs) codebase, with several minor fixes, mostly related to making the action space discrete, making the rewards consistent and cleaning up the observation space of certain environments.
These environments are from [OpenAI's MPE](https://github.com/openai/multiagent-particle-envs) codebase, with several minor fixes, mostly related to making the action space discrete by default, making the rewards consistent and cleaning up the observation space of certain environments.

### Types of Environments

@@ -43,8 +43,16 @@ If an agent cannot see or observe the communication of a second agent, then the

### Action Space

Note: [OpenAI's MPE](https://github.com/openai/multiagent-particle-envs) uses continuous action spaces by default.

Discrete action space (Default):

The action space is a discrete action space representing the combinations of movements and communications an agent can perform. Agents that can move can choose between the 4 cardinal directions or do nothing. Agents that can communicate choose between 2 and 10 environment-dependent communication options, which broadcast a message to all agents that can hear it.
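
A minimal random-action loop over the default discrete spaces might look like the sketch below (it assumes `simple_reference_v2` and the AEC API of this release, where `last()` returns `(observation, reward, done, info)`):

```
from pettingzoo.mpe import simple_reference_v2

# Discrete movement/communication actions are the default.
env = simple_reference_v2.env(max_cycles=25)
env.reset()
for agent in env.agent_iter():
    obs, reward, done, info = env.last()
    # Sample a random index from this agent's Discrete space,
    # or pass None once the agent is done.
    action = None if done else env.action_spaces[agent].sample()
    env.step(action)
env.close()
```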

Continuous action space (Set by continuous_actions=True):

The action space is a continuous action space representing the movements and communication an agent can perform. Agents that can move can input a velocity between 0.0 and 1.0 for each of the four cardinal directions; opposing velocities (e.g. left and right) are summed together. Agents that can communicate output a continuous value over each communication channel they have access to.
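
A minimal sketch of driving a continuous agent is shown below; `simple_reference_v2` and the channel order `[no_action, move_left, move_right, move_down, move_up, ...]` are assumptions here, not guarantees:

```
import numpy as np
from pettingzoo.mpe import simple_reference_v2

env = simple_reference_v2.env(max_cycles=25, continuous_actions=True)
env.reset()
for agent in env.agent_iter():
    obs, reward, done, info = env.last()
    if done:
        action = None
    else:
        space = env.action_spaces[agent]  # Box(0.0, 1.0, ...)
        action = np.zeros(space.shape, dtype=np.float32)
        # Index 2 is assumed to be the "move_right" channel; since opposing
        # channels are summed, leaving "move_left" at 0.0 gives a net push right.
        action[2] = 1.0
    env.step(action)
env.close()
```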

### Rendering

Rendering displays the scene in a window that automatically grows if agents wander beyond its border. Communication is rendered at the bottom of the scene. The `render()` method also returns the pixel map of the rendered area.
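
A rough sketch of capturing that pixel map (assuming `simple_spread_v2` and the Gym-style `mode="rgb_array"` argument):

```
from pettingzoo.mpe import simple_spread_v2

env = simple_spread_v2.env(max_cycles=25)
env.reset()
for agent in env.agent_iter():
    obs, reward, done, info = env.last()
    env.step(None if done else env.action_spaces[agent].sample())
    # Draw the scene and keep the returned RGB frame for logging or video.
    frame = env.render(mode="rgb_array")
env.close()
```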
8 changes: 5 additions & 3 deletions docs/mpe/simple.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple"
agents: "1"
manual-control: "No"
action-shape: "(5)"
action-values: "Discrete(5)"
action-values: "Discrete(5)/Box(0.0, 1.0, (5,))"
observation-shape: "(4)"
observation-values: "(-inf,inf)"
import: "from pettingzoo.mpe import simple_v2"
@@ -22,10 +22,12 @@ Observation space: `[self_vel, landmark_rel_position]`
### Arguments

```
simple_v2.env(max_cycles=25)
simple_v2.env(max_cycles=25, continuous_actions=False)
```



`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous
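
A small sketch comparing the two modes (the agent name `agent_0` is assumed from the usual MPE naming):

```
from pettingzoo.mpe import simple_v2

disc_env = simple_v2.env(max_cycles=25, continuous_actions=False)
cont_env = simple_v2.env(max_cycles=25, continuous_actions=True)
disc_env.reset()
cont_env.reset()
print(disc_env.action_spaces["agent_0"])  # expected: Discrete(5)
print(cont_env.action_spaces["agent_0"])  # expected: Box(0.0, 1.0, (5,), float32)
```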

8 changes: 5 additions & 3 deletions docs/mpe/simple_adversary.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Adversary"
agents: "3"
manual-control: "No"
action-shape: "(5)"
action-values: "Discrete(5)"
action-values: "Discrete(5)/Box(0.0, 1.0, (5))"
observation-shape: "(8),(10)"
observation-values: "(-inf,inf)"
import: "from pettingzoo.mpe import simple_adversary_v2"
@@ -28,11 +28,13 @@ Adversary action space: `[no_action, move_left, move_right, move_down, move_up]`
### Arguments

```
simple_adversary_v2.env(N=2, max_cycles=25)
simple_adversary_v2.env(N=2, max_cycles=25, continuous_actions=False)
```



`N`: number of good agents and landmarks

`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous
8 changes: 5 additions & 3 deletions docs/mpe/simple_crypto.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Crypto"
agents: "2"
manual-control: "No"
action-shape: "(4)"
action-values: "Discrete(4)"
action-values: "Discrete(4)/Box(0.0, 1.0, (4))"
observation-shape: "(4),(8)"
observation-values: "(-inf,inf)"
import: "from pettingzoo.mpe import simple_crypto_v2"
@@ -35,9 +35,11 @@ For Bob and Eve, their communication is checked to be the 1 bit of information t
### Arguments

```
simple_crypto_v2.env(max_cycles=25)
simple_crypto_v2.env(max_cycles=25, continuous_actions=False)
```



`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous
6 changes: 3 additions & 3 deletions docs/mpe/simple_push.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Push"
agents: "2"
manual-control: "No"
action-shape: "(5)"
action-values: "Discrete(5)"
action-values: "Discrete(5)/Box(0.0, 1.0, (5,))"
observation-shape: "(8),(19)"
observation-values: "(-inf,inf)"
import: "from pettingzoo.mpe import simple_push_v2"
@@ -28,7 +28,7 @@ Adversary action space: `[no_action, move_left, move_right, move_down, move_up]`
### Arguments

```
simple_push_v2.env(max_cycles=25)
simple_push_v2.env(max_cycles=25, continuous_actions=False)
```


13 changes: 9 additions & 4 deletions docs/mpe/simple_reference.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Reference"
agents: "2"
manual-control: "No"
action-shape: "(50)"
action-values: "Discrete(50)"
action-values: "Discrete(50)/Box(0.0, 1.0, (15))"
observation-shape: "(21)"
observation-values: "(-inf,inf)"
average-total-reward: "-57.1"
@@ -22,19 +22,24 @@ Locally, the agents are rewarded by their distance to their target landmark. Glo

Agent observation space: `[self_vel, all_landmark_rel_positions, landmark_ids, goal_id, communication]`

Agent action space: `[say_0, say_1, say_2, say_3, say_4, say_5, say_6, say_7, say_8, say_9] X [no_action, move_left, move_right, move_down, move_up]`
Agent discrete action space: `[say_0, say_1, say_2, say_3, say_4, say_5, say_6, say_7, say_8, say_9] X [no_action, move_left, move_right, move_down, move_up]`

Where X is the Cartesian product (giving a total action space of 50).

Agent continuous action space: `[no_action, move_left, move_right, move_down, move_up, say_0, say_1, say_2, say_3, say_4, say_5, say_6, say_7, say_8, say_9]`
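
A short sketch of the bookkeeping behind these sizes (the exact index-to-pair mapping used internally is not assumed here):

```
from itertools import product

says = [f"say_{i}" for i in range(10)]
moves = ["no_action", "move_left", "move_right", "move_down", "move_up"]

# Discrete mode: the Cartesian product gives 10 * 5 = 50 joint actions.
joint_actions = list(product(says, moves))
assert len(joint_actions) == 50

# Continuous mode instead exposes one channel per primitive: 5 + 10 = 15,
# matching Box(0.0, 1.0, (15)) above.
```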

### Arguments


```
simple_reference_v2.env(local_ratio=0.5, max_cycles=25)
simple_reference_v2.env(local_ratio=0.5, max_cycles=25, continuous_actions=False)
```



`local_ratio`: Weight applied to local reward and global reward. Global reward weight will always be 1 - local reward weight.

`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous

8 changes: 5 additions & 3 deletions docs/mpe/simple_speaker_listener.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Speaker Listener"
agents: "2"
manual-control: "No"
action-shape: "(3),(5)"
action-values: "Discrete(3),(5)"
action-values: "Discrete(3),(5)/Box(0.0, 1.0, (3)), Box(0.0, 1.0, (5))"
observation-shape: "(3),(11)"
observation-values: "(-inf,inf)"
average-total-reward: "-80.9"
@@ -29,9 +29,11 @@ Listener action space: `[no_action, move_left, move_right, move_down, move_up]`
### Arguments

```
simple_speaker_listener_v2.env(max_cycles=25)
simple_speaker_listener_v2.env(max_cycles=25, continuous_actions=False)
```



`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous
8 changes: 5 additions & 3 deletions docs/mpe/simple_spread.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Spread"
agents: "3"
manual-control: "No"
action-shape: "(5)"
action-values: "Discrete(5)"
action-values: "Discrete(5)/Box(0.0, 1.0, (5))"
observation-shape: "(18)"
observation-values: "(-inf,inf)"
average-total-reward: "-115.6"
@@ -27,7 +27,7 @@ Agent action space: `[no_action, move_left, move_right, move_down, move_up]`
### Arguments

```
simple_spread_v2.env(N=3, local_ratio=0.5, max_cycles=25)
simple_spread_v2.env(N=3, local_ratio=0.5, max_cycles=25, continuous_actions=False)
```


@@ -37,3 +37,5 @@ simple_spread_v2.env(N=3, local_ratio=0.5, max_cycles=25)
`local_ratio`: Weight applied to local reward and global reward. Global reward weight will always be 1 - local reward weight.

`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous
8 changes: 5 additions & 3 deletions docs/mpe/simple_tag.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Tag"
agents: "4"
manual-control: "No"
action-shape: "(5)"
action-values: "Discrete(5)"
action-values: "Discrete(5)/Box(0.0, 1.0, (50))"
observation-shape: "(14),(16)"
observation-values: "(-inf,inf)"
import: "from pettingzoo.mpe import simple_tag_v2"
@@ -34,7 +34,7 @@ Agent and adversary action space: `[no_action, move_left, move_right, move_down,
### Arguments

```
simple_tag_v2.env(num_good=1, num_adversaries=3, num_obstacles=2 , max_cycles=25)
simple_tag_v2.env(num_good=1, num_adversaries=3, num_obstacles=2, max_cycles=25, continuous_actions=False)
```


@@ -47,3 +47,5 @@ simple_tag_v2.env(num_good=1, num_adversaries=3, num_obstacles=2 , max_cycles=25

`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous

11 changes: 7 additions & 4 deletions docs/mpe/simple_world_comm.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple World Comm"
agents: "6"
manual-control: "No"
action-shape: "(5),(20)"
action-values: "Discrete(5),(20)"
action-values: "Discrete(5),(20)/Box(0.0, 1.0, (5)), Box(0.0, 1.0, (9))"
observation-shape: "(28),(34)"
observation-values: "(-inf,inf)"
import: "from pettingzoo.mpe import simple_world_comm_v2"
@@ -31,16 +31,17 @@ Good agent action space: `[no_action, move_left, move_right, move_down, move_up]

Normal adversary action space: `[no_action, move_left, move_right, move_down, move_up]`

Adversary leader observation space: `[say_0, say_1, say_2, say_3] X [no_action, move_left, move_right, move_down, move_up]`
Adversary leader discrete action space: `[say_0, say_1, say_2, say_3] X [no_action, move_left, move_right, move_down, move_up]`

Where X is the Cartesian product (giving a total action space of 20).

Adversary leader continuous action space: `[no_action, move_left, move_right, move_down, move_up, say_0, say_1, say_2, say_3]`
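
A short sketch of inspecting these per-agent spaces (agent naming and sizes assumed from the table above):

```
from pettingzoo.mpe import simple_world_comm_v2

env = simple_world_comm_v2.env(continuous_actions=False)
env.reset()
for agent in env.agents:
    # The adversary leader should report Discrete(20); the purely moving
    # agents and adversaries should report Discrete(5).
    print(agent, env.action_spaces[agent])
```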

### Arguments

```
simple_world_comm.env(num_good=2, num_adversaries=4, num_obstacles=1,
num_food=2, max_cycles=25, num_forests=2)
num_food=2, max_cycles=25, num_forests=2, continuous_actions=False)
```


@@ -57,3 +58,5 @@ simple_world_comm.env(num_good=2, num_adversaries=4, num_obstacles=1,

`num_forests`: number of forests that can hide agents inside from being seen

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous

2 changes: 1 addition & 1 deletion pettingzoo/mpe/_mpe_utils/rendering.py
@@ -283,7 +283,7 @@ def set_text(self, text):
self.label = pyglet.text.Label(text,
font_name=font,
color=(0, 0, 0, 255),
font_size=25,
font_size=20,
x=0, y=self.idx * 40 + 20,
anchor_x="left", anchor_y="bottom")

