### Step 1: Install and import libraries

In [30]:
pip install momaland

Note: you may need to restart the kernel to use updated packages.


### Step 2: Create an environment

In [10]:
from momaland.envs.multiwalker import momultiwalker_v0

env=momultiwalker_v0.env()

### Step 3: Extract environment information

In [11]:
env.observation_spaces

{'walker_0': Box(-inf, inf, (31,), float32),
 'walker_1': Box(-inf, inf, (31,), float32),
 'walker_2': Box(-inf, inf, (31,), float32)}

In [12]:
env.action_spaces

{'walker_0': Box(-1.0, 1.0, (4,), float32),
 'walker_1': Box(-1.0, 1.0, (4,), float32),
 'walker_2': Box(-1.0, 1.0, (4,), float32)}

In [13]:
env.reward_spaces

{'walker_0': Box([  -0.46666667 -110.         -100.        ], [0.46666667 0.         0.        ], (3,), float32),
 'walker_1': Box([  -0.46666667 -110.         -100.        ], [0.46666667 0.         0.        ], (3,), float32),
 'walker_2': Box([  -0.46666667 -110.         -100.        ], [0.46666667 0.         0.        ], (3,), float32)}

### Step 4.1: AEC API Demo
Observation, rewards, termination, truncation, and info are returned by the `last()` function, as in PZ. Except the rewards are vectorial!

In [14]:
env.reset()
episode_rewards = []
for agent in env.agent_iter():
    # the rewards are vectors!
    observation, vec_reward, termination, truncation, info = env.last()
    episode_rewards.append(vec_reward)
    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()

    env.step(action)
env.close()
print(episode_rewards)

[array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.]), array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.]), array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.]), array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.]), array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.]), array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.]), array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.]), array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.]), array([0., 0., 0.], dtype=float32), array([0., 0., 0.], dtype=float32), array([0., 0., 0.]), array([0., 0., 0.], dtype=float32), array([0., 0., 0.]

### Step 4.2: Parallel API Demo
Environment is initialized with the `parallel_env()` function. Agents `step()` through the environment with their actions and receive observation/reward, similar to Gym. Key difference between AEC and Parallel is that actions and observations are dictionaries!

In [15]:
env = momultiwalker_v0.parallel_env()
obs, info = env.reset()
episode_rewards = []
while env.agents:
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, vec_rewards, terminations, truncations, infos = env.step(actions)
    episode_rewards.append(vec_rewards)
    print(vec_reward)
env.close()
print(episode_rewards)

[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]
[-9.98497009e-03 -1.03333336e+02  0.00000000e+00]


### Step 5: Training with MOMAPPO with OLS for Weight Generation for Cooperative Settings
Find the full single-file implementation [here](https://github.com/Farama-Foundation/momaland/blob/main/momaland/learning/continuous/cooperative_momappo.py).