# Multi-Agent Environments

The package provides two multi-agent environments:

* [GeneralSatelliteTasking](../api_reference/index.rst#bsk_rl.GeneralSatelliteTasking), 
  a [Gymnasium](https://gymnasium.farama.org)-based environment and the basis for all other environments.
* [ConstellationTasking](../api_reference/index.rst#bsk_rl.ConstellationTasking), which
  implements the [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/).

The latter is preferable for multi-agent RL (MARL) settings, as most algorithms are designed
for this kind of API.

## Configuring the Environment

For this example, a multi-satellite target imaging environment is used. The goal is
to maximize the value of unique images taken.

As usual, the satellite type is defined first.

In [1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"), 
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellite near-unlimited power and storage
resources and to place it in an 800 km orbit.

In [2]:
from bsk_rl.utils.orbital import random_orbit

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
    oe=lambda: random_orbit(alt=800),
)

## Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the
environment.

In [3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    log_level="INFO",
)
env.reset()

env.observation_space

2024-07-24 14:33:50,326 gym                  INFO  Resetting environment with seed=364059936
2024-07-24 14:33:50,327 scene.targets        INFO  Generating 1000 targets
2024-07-24 14:33:50,484 sats.satellite.EO-1  INFO  <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-07-24 14:33:50,507 sats.satellite.EO-2  INFO  <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-07-24 14:33:50,526 sats.satellite.EO-2  INFO  <0.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-07-24 14:33:50,546 sats.satellite.EO-3  INFO  <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-07-24 14:33:50,568 gym                  INFO  <0.00> Satellites requiring retasking: ['EO-1', 'EO-2', 'EO-3']
2024-07-24 14:33:50,568 gym                  INFO  <0.00> Environment reset

Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))

In [4]:
env.action_space

Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a tuple. The step will stop the first time any
satellite completes an action.

In [5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])

2024-07-24 14:33:50,578 gym                  INFO  <0.00> === STARTING STEP ===
2024-07-24 14:33:50,578 sats.satellite.EO-1  INFO  <0.00> EO-1: target index 7 tasked
2024-07-24 14:33:50,578 sats.satellite.EO-1  INFO  <0.00> EO-1: Target(tgt-312) tasked for imaging
2024-07-24 14:33:50,579 sats.satellite.EO-1  INFO  <0.00> EO-1: Target(tgt-312) window enabled: 463.0 to 600.0
2024-07-24 14:33:50,579 sats.satellite.EO-1  INFO  <0.00> EO-1: setting timed terminal event at 600.0
2024-07-24 14:33:50,580 sats.satellite.EO-2  INFO  <0.00> EO-2: target index 9 tasked
2024-07-24 14:33:50,580 sats.satellite.EO-2  INFO  <0.00> EO-2: Target(tgt-596) tasked for imaging
2024-07-24 14:33:50,581 sats.satellite.EO-2  INFO  <0.00> EO-2: Target(tgt-596) window enabled: 637.6 to 840.5
2024-07-24 14:33:50,581 sats.satellite.EO-2  INFO  <0.00> EO-2: setting timed terminal event at 840.5
2024-07-24 14:33:50,581 sats.satellite.EO-3  INFO  <0.00> EO-3: target index 8 tasked
2024-07-24 14:33:50,581 sats.satellite.EO-3  INFO  <0.00> EO-3: Target(tgt-266) tasked for imaging
2024-07-24 14:33:50,582 sats.satellite.EO-3  INFO  <0.00> EO-3: Target(tgt-266) window enabled: 183.2 to 391.1
2024-07-24 14:33:50,582 sats.satellite.EO-3  INFO  <0.00> EO-3: setting timed terminal event at 391.1
2024-07-24 14:33:50,619 sats.satellite.EO-3  INFO  <186.00> EO-3: imaged Target(tgt-266)
2024-07-24 14:33:50,621 data.base            INFO  <186.00> Data reward: {'EO-1': 0.0, 'EO-2': 0.0, 'EO-3': 0.980810596718567}
2024-07-24 14:33:50,625 sats.satellite.EO-3  INFO  <186.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-07-24 14:33:50,648 gym                  INFO  <186.00> Satellites requiring retasking: ['EO-3']
2024-07-24 14:33:50,648 gym                  INFO  <186.00> Step reward: 0.980810596718567

In [6]:
observation

(array([ 0.10266454, -0.03263158,  0.71533671, -0.00090465,  0.19540212,
         0.01484726,  0.55646603,  0.00134835,  0.85425056,  0.01822934,
         0.90008031,  0.03082943,  0.09468858,  0.03683628,  0.20711376,
         0.04860097,  0.7621766 ,  0.05569078,  0.28356637,  0.04676379]),
 array([ 3.16778664e-01, -1.55577221e-02,  3.13362512e-01, -4.51636228e-04,
         4.23607178e-01,  2.67257980e-02,  9.51511809e-01,  4.12349336e-02,
         3.77115401e-01,  9.30911051e-03,  5.05290328e-01,  2.01168371e-02,
         2.92767745e-01,  6.90253813e-02,  9.40201363e-01,  7.92329937e-02,
         8.79113048e-01,  1.07538683e-01,  3.07751315e-02,  1.18902085e-01]),
 array([ 1.29054501e-01, -1.86807584e-02,  4.57080002e-01,  6.81807648e-04,
         9.17148967e-01,  5.59774162e-03,  1.26501398e-01,  1.05559275e-02,
         8.53539237e-01,  3.68830475e-02,  8.78845414e-01,  6.58270845e-02,
         3.82020810e-01,  5.50301632e-02,  5.10273560e-01,  5.67723740e-02,
         9.94356149e
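
The flat vectors are easier to read once regrouped. The sketch below assumes, based on the `observation_spec` defined earlier and on the values shown, that each 20-element vector interleaves `priority` and normalized `opportunity_open` for the 10 upcoming targets. The helper `split_observation` is a hypothetical name used for illustration and is not part of bsk_rl.

```python
# Hypothetical helper (not part of bsk_rl): regroup a flat observation into
# (priority, opportunity_open) pairs, ASSUMING the two properties are interleaved.
OPPORTUNITY_OPEN_NORM = 5700.0  # the norm passed to obs.OpportunityProperties


def split_observation(flat, n_props=2):
    """Split a flat sequence into consecutive tuples of n_props values."""
    if len(flat) % n_props != 0:
        raise ValueError("observation length is not a multiple of n_props")
    return [tuple(flat[i:i + n_props]) for i in range(0, len(flat), n_props)]


# EO-1's observation, copied from the output above.
eo1 = [
    0.10266454, -0.03263158, 0.71533671, -0.00090465, 0.19540212,
    0.01484726, 0.55646603, 0.00134835, 0.85425056, 0.01822934,
    0.90008031, 0.03082943, 0.09468858, 0.03683628, 0.20711376,
    0.04860097, 0.7621766, 0.05569078, 0.28356637, 0.04676379,
]

pairs = split_observation(eo1)
priority, opens_normalized = pairs[7]  # the target EO-1 was tasked with (index 7)
seconds_until_open = opens_normalized * OPPORTUNITY_OPEN_NORM

print(len(pairs))
print(priority)
print(round(seconds_until_open, 1))
```

Under these assumptions, entry 7 of EO-1's observation has a priority of 0.20711376 and opens roughly 277 s after the current time of 186 s, which agrees with the logged window start of 463.0 s for `tgt-312`.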

At this point, either every satellite can be retasked, or satellites can continue their
previous action by passing `None` as the action. To see which satellites must be
retasked (i.e. their previous action is done and they have nothing more to do), look at
`info["requires_retasking"]`.

In [7]:
info["requires_retasking"]

['EO-3']

Based on this list, we decide here to only retask the satellite that needs it.

In [8]:
actions = [None, None, None]
# Names look like "EO-3": character 3 is the satellite number, so subtract 1 for the list index.
actions[int(info["requires_retasking"][0][3]) - 1] = 7
actions

[None, None, 7]
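
The index arithmetic above relies on the satellite names ending in a single digit. A more general sketch, assuming only that the order of satellite names matches the order of the action tuple, maps names to positions explicitly. The names `build_actions` and `choose` are hypothetical and used only for illustration.

```python
def build_actions(satellite_names, requires_retasking, choose):
    """Return one action per satellite; None means 'continue the current action'."""
    needs_new_action = set(requires_retasking)
    return [
        choose(name) if name in needs_new_action else None
        for name in satellite_names
    ]


names = ["EO-1", "EO-2", "EO-3"]
# Retask only the satellites listed in info["requires_retasking"].
actions = build_actions(names, ["EO-3"], choose=lambda name: 7)
print(actions)  # [None, None, 7]
```

Keeping the policy in a separate `choose` callable means the same helper works whether one satellite or all of them need a new action.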

In [9]:
observation, reward, terminated, truncated, info = env.step(actions)

2024-07-24 14:33:50,662 gym                  INFO  <186.00> === STARTING STEP ===
2024-07-24 14:33:50,662 sats.satellite.EO-3  INFO  <186.00> EO-3: target index 7 tasked
2024-07-24 14:33:50,662 sats.satellite.EO-3  INFO  <186.00> EO-3: Target(tgt-577) tasked for imaging
2024-07-24 14:33:50,663 sats.satellite.EO-3  INFO  <186.00> EO-3: Target(tgt-577) window enabled: 509.6 to 689.7
2024-07-24 14:33:50,663 sats.satellite.EO-3  INFO  <186.00> EO-3: setting timed terminal event at 689.7
2024-07-24 14:33:50,716 sats.satellite.EO-1  INFO  <466.00> EO-1: imaged Target(tgt-312)
2024-07-24 14:33:50,718 data.base            INFO  <466.00> Data reward: {'EO-1': 0.20711376136200743, 'EO-2': 0.0, 'EO-3': 0.0}
2024-07-24 14:33:50,721 sats.satellite.EO-1  INFO  <466.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-07-24 14:33:50,746 gym                  INFO  <466.00> Satellites requiring retasking: ['EO-1']
2024-07-24 14:33:50,746 gym                  INFO  <466.00> Step reward: 0.20711376136200743

With the Gym API, the episode terminates if any agent dies. To demonstrate this,
one satellite is forcibly killed.

In [10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2024-07-24 14:33:50,750 gym                  INFO  <466.00> === STARTING STEP ===
2024-07-24 14:33:50,750 sats.satellite.EO-1  INFO  <466.00> EO-1: target index 6 tasked
2024-07-24 14:33:50,750 sats.satellite.EO-1  INFO  <466.00> EO-1: Target(tgt-678) tasked for imaging
2024-07-24 14:33:50,751 sats.satellite.EO-1  INFO  <466.00> EO-1: Target(tgt-678) window enabled: 531.3 to 731.1
2024-07-24 14:33:50,751 sats.satellite.EO-1  INFO  <466.00> EO-1: setting timed terminal event at 731.1
2024-07-24 14:33:50,752 sats.satellite.EO-2  INFO  <466.00> EO-2: target index 7 tasked
2024-07-24 14:33:50,753 sats.satellite.EO-2  INFO  <466.00> EO-2: Target(tgt-15) tasked for imaging
2024-07-24 14:33:50,753 sats.satellite.EO-2  INFO  <466.00> EO-2: Target(tgt-15) window enabled: 912.2 to 1111.4
2024-07-24 14:33:50,753 sats.satellite.EO-2  INFO  <466.00> EO-2: setting timed terminal event at 1111.4
2024-07-24 14:33:50,754 sats.satellite.EO-3  INFO  <466.00> EO-3: target index 9 tasked
2024-07-24 14:33:50,754 sats.satellite.EO-3  INFO  <466.00> EO-3: Target(tgt-407) tasked for imaging
2024-07-24 14:33:50,754 sats.satellite.EO-3  INFO  <466.00> EO-3: Target(tgt-407) window enabled: 905.2 to 1071.3
2024-07-24 14:33:50,755 sats.satellite.EO-3  INFO  <466.00> EO-3: setting timed terminal event at 1071.3
2024-07-24 14:33:50,768 sats.satellite.EO-1  INFO  <534.00> EO-1: imaged Target(tgt-678)
2024-07-24 14:33:50,770 data.base            INFO  <534.00> Data reward: {'EO-1': 0.7742074134252116, 'EO-2': 0.0, 'EO-3': 0.0}
2024-07-24 14:33:50,773 gym                  INFO  <534.00> Step reward: -0.2257925865747884
2024-07-24 14:33:50,773 gym                  INFO  <534.00> Episode terminated: True
2024-07-24 14:33:50,773 gym                  INFO  <534.00> Episode truncated: False

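The step reward in the last log lines is negative even though EO-1 collected a positive data reward. The numbers are consistent with a fixed penalty of -1.0 being added for the dead satellite. That penalty value is an assumption inferred from the logged output, and the variable names below are illustrative only.

```python
# Values copied from the log output above.
data_reward = {"EO-1": 0.7742074134252116, "EO-2": 0.0, "EO-3": 0.0}
logged_step_reward = -0.2257925865747884

# ASSUMPTION: a penalty of -1.0 is added once for the satellite that died.
assumed_failure_penalty = -1.0
n_dead = 1

step_reward = sum(data_reward.values()) + n_dead * assumed_failure_penalty
print(abs(step_reward - logged_step_reward) < 1e-9)
```
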
## PettingZoo API

The [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/) environment,
ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo
documentation for a full description of the API. It returns per-agent values as
dictionaries keyed by agent name rather than as tuples.
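
As a rough illustration of that difference, the sketch below converts tuple-ordered values into an agent-keyed dictionary and back. It uses plain Python only and assumes the tuple order follows the order of the agent names. The helper names `to_agent_dict` and `to_tuple` are hypothetical.

```python
def to_agent_dict(agent_names, values):
    """Pair tuple-ordered values with agent names (PettingZoo-style dict)."""
    if len(agent_names) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agent_names, values))


def to_tuple(agent_names, per_agent, default=None):
    """Order agent-keyed values as a tuple; missing agents get the default."""
    return tuple(per_agent.get(name, default) for name in agent_names)


agents = ["EO-1", "EO-2", "EO-3"]
action_dict = to_agent_dict(agents, (7, 9, 8))
print(action_dict)  # {'EO-1': 7, 'EO-2': 9, 'EO-3': 8}
print(to_tuple(agents, {"EO-3": 7}))  # (None, None, 7)
```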

In [11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    log_level="INFO",
)
env.reset()

env.observation_spaces

2024-07-24 14:33:50,978 gym                  INFO  Resetting environment with seed=1015846936
2024-07-24 14:33:50,978 scene.targets        INFO  Generating 1000 targets
2024-07-24 14:33:51,131 sats.satellite.EO-1  INFO  <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-07-24 14:33:51,157 sats.satellite.EO-2  INFO  <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-07-24 14:33:51,177 sats.satellite.EO-2  INFO  <0.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-07-24 14:33:51,199 sats.satellite.EO-3  INFO  <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-07-24 14:33:51,225 gym                  INFO  <0.00> Satellites requiring retasking: ['EO-1', 'EO-2', 'EO-3']
2024-07-24 14:33:51,226 gym                  INFO  <0.00> Environment reset

{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}

In [12]:
env.action_spaces

{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the `agents`
property.

In [13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)

2024-07-24 14:33:51,234 gym                  INFO  <0.00> === STARTING STEP ===
2024-07-24 14:33:51,235 sats.satellite.EO-1  INFO  <0.00> EO-1: target index 7 tasked
2024-07-24 14:33:51,235 sats.satellite.EO-1  INFO  <0.00> EO-1: Target(tgt-510) tasked for imaging
2024-07-24 14:33:51,236 sats.satellite.EO-1  INFO  <0.00> EO-1: Target(tgt-510) window enabled: 104.5 to 293.9
2024-07-24 14:33:51,236 sats.satellite.EO-1  INFO  <0.00> EO-1: setting timed terminal event at 293.9
2024-07-24 14:33:51,237 sats.satellite.EO-2  INFO  <0.00> EO-2: target index 9 tasked
2024-07-24 14:33:51,237 sats.satellite.EO-2  INFO  <0.00> EO-2: Target(tgt-789) tasked for imaging
2024-07-24 14:33:51,237 sats.satellite.EO-2  INFO  <0.00> EO-2: Target(tgt-789) window enabled: 752.6 to 894.9
2024-07-24 14:33:51,238 sats.satellite.EO-2  INFO  <0.00> EO-2: setting timed terminal event at 894.9
2024-07-24 14:33:51,238 sats.satellite.EO-3  INFO  <0.00> EO-3: target index 8 tasked
2024-07-24 14:33:51,238 sats.satellite.EO-3  INFO  <0.00> EO-3: Target(tgt-172) tasked for imaging
2024-07-24 14:33:51,239 sats.satellite.EO-3  INFO  <0.00> EO-3: Target(tgt-172) window enabled: 235.3 to 446.2
2024-07-24 14:33:51,239 sats.satellite.EO-3  INFO  <0.00> EO-3: setting timed terminal event at 446.2
2024-07-24 14:33:51,260 sats.satellite.EO-1  INFO  <107.00> EO-1: imaged Target(tgt-510)
2024-07-24 14:33:51,262 data.base            INFO  <107.00> Data reward: {'EO-1': 0.3034631583157591, 'EO-2': 0.0, 'EO-3': 0.0}
2024-07-24 14:33:51,269 gym                  INFO  <107.00> Satellites requiring retasking: ['EO-1']
2024-07-24 14:33:51,271 gym                  INFO  <107.00> Step reward: {'EO-1': 0.3034631583157591, 'EO-2': 0.0, 'EO-3': 0.0}
2024-07-24 14:33:51,271 gym                  INFO  <107.00> Episode terminated: {'EO-1': False, 'EO-2': False, 'EO-3': False}
2024-07-24 14:33:51,271 gym                  INFO  <107.00> Episode truncated: {'EO-1': False, 'EO-2': False, 'EO-3': False}

In [14]:
observation

{'EO-1': array([ 0.86171052, -0.01877193,  0.69303712, -0.0135859 ,  0.16790374,
        -0.01304364,  0.33385166,  0.00310774,  0.60102755,  0.00870104,
         0.52261101,  0.02888215,  0.76429951,  0.03593432,  0.07664155,
         0.02073054,  0.35430813,  0.0209795 ,  0.50231586,  0.03031671]),
 'EO-2': array([ 9.05180563e-01, -1.00595601e-02,  3.15065049e-01, -6.81128220e-04,
         4.73979926e-01,  2.36974139e-02,  5.53744788e-01,  5.02846313e-02,
         2.50217878e-01,  7.63636114e-02,  5.56602328e-01,  8.65133317e-02,
         8.99994371e-01,  8.70776618e-02,  3.28329506e-02,  9.65382802e-02,
         4.08069922e-01,  1.13261186e-01,  5.48363445e-01,  1.05149522e-01]),
 'EO-3': array([ 0.11909024, -0.01877193,  0.97950646, -0.01877193,  0.82196041,
        -0.01155826,  0.1922363 ,  0.01060359,  0.78279305,  0.00933406,
         0.21389242,  0.02936151,  0.20356198,  0.02251592,  0.57044547,
         0.04668394,  0.183822  ,  0.06173636,  0.98970262,  0.03334367])}

Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API
is that it allows for individual agents to fail without terminating the entire environment.

In [15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents

['EO-2', 'EO-3']
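
The usual PettingZoo pattern of rebuilding the action dictionary from `env.agents` on every step handles this shrinking agent list naturally. The sketch below demonstrates the pattern against a toy stand-in, `ToyParallelEnv`, which is hypothetical: it only mimics the `agents` attribute and the five values returned by `step`, and it is neither bsk_rl nor a real PettingZoo environment.

```python
class ToyParallelEnv:
    """Hypothetical stand-in mimicking a parallel env's agents list and step()."""

    def __init__(self, agents, lifetimes):
        self.agents = list(agents)
        self._lifetimes = dict(lifetimes)  # number of steps each agent survives
        self._steps = 0

    def step(self, actions):
        # Expect exactly one action per live agent.
        if set(actions) != set(self.agents):
            raise ValueError("actions must be keyed by the live agents")
        self._steps += 1
        acted = list(self.agents)
        observations = {agent: 0.0 for agent in acted}
        rewards = {agent: 1.0 for agent in acted}
        terminated = {agent: self._steps >= self._lifetimes[agent] for agent in acted}
        truncated = {agent: False for agent in acted}
        infos = {agent: {} for agent in acted}
        # Terminated agents are dropped from the live list.
        self.agents = [agent for agent in acted if not terminated[agent]]
        return observations, rewards, terminated, truncated, infos


def rollout(env, choose_action, max_steps=100):
    """Step until no agents remain, rebuilding the action dict every step."""
    totals = {}
    steps = 0
    while env.agents and steps < max_steps:
        actions = {agent: choose_action(agent) for agent in env.agents}
        _, rewards, _, _, _ = env.step(actions)
        for agent, reward in rewards.items():
            totals[agent] = totals.get(agent, 0.0) + reward
        steps += 1
    return totals, steps


toy_env = ToyParallelEnv(
    ["EO-1", "EO-2", "EO-3"], lifetimes={"EO-1": 1, "EO-2": 3, "EO-3": 2}
)
totals, steps = rollout(toy_env, choose_action=lambda agent: 7)
print(totals, steps)
```

Because the action dictionary is derived from the live agent list each step, no special handling is needed when an agent drops out mid-episode.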

In [16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)

2024-07-24 14:33:51,282 gym                  INFO  <107.00> === STARTING STEP ===
2024-07-24 14:33:51,283 sats.satellite.EO-2  INFO  <107.00> EO-2: target index 7 tasked
2024-07-24 14:33:51,283 sats.satellite.EO-2  INFO  <107.00> EO-2: Target(tgt-349) tasked for imaging
2024-07-24 14:33:51,284 sats.satellite.EO-2  INFO  <107.00> EO-2: Target(tgt-349) window enabled: 657.3 to 826.2
2024-07-24 14:33:51,284 sats.satellite.EO-2  INFO  <107.00> EO-2: setting timed terminal event at 826.2
2024-07-24 14:33:51,284 sats.satellite.EO-3  INFO  <107.00> EO-3: target index 9 tasked
2024-07-24 14:33:51,285 sats.satellite.EO-3  INFO  <107.00> EO-3: Target(tgt-568) tasked for imaging
2024-07-24 14:33:51,285 sats.satellite.EO-3  INFO  <107.00> EO-3: Target(tgt-568) window enabled: 297.1 to 507.6
2024-07-24 14:33:51,285 sats.satellite.EO-3  INFO  <107.00> EO-3: setting timed terminal event at 507.6
2024-07-24 14:33:51,320 sats.satellite.EO-1  INFO  <294.00> EO-1: timed termination at 293.9 for Target(tgt-510) window
2024-07-24 14:33:51,321 data.base            INFO  <294.00> Data reward: {'EO-1': 0.0, 'EO-2': 0.0, 'EO-3': 0.0}
2024-07-24 14:33:51,330 gym                  INFO  <294.00> Step reward: {'EO-2': 0.0, 'EO-3': 0.0}
2024-07-24 14:33:51,330 gym                  INFO  <294.00> Episode terminated: {'EO-2': False, 'EO-3': False}
2024-07-24 14:33:51,330 gym                  INFO  <294.00> Episode truncated: {'EO-2': False, 'EO-3': False}
