# Multi-Agent Environments

Two multi-agent environments are provided in the package:

* [GeneralSatelliteTasking](../api_reference/index.rst#bsk_rl.GeneralSatelliteTasking), 
  a [Gymnasium](https://gymnasium.farama.org)-based environment and the basis for all other environments.
* [ConstellationTasking](../api_reference/index.rst#bsk_rl.ConstellationTasking), which
  implements the [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/).

The latter is preferable in multi-agent RL (MARL) settings, as most MARL algorithms are designed
for this kind of API.

## Configuring the Environment

For this example, a multi-satellite target imaging environment is used. The goal is
to maximize the total value of unique images taken.
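The rewarder used below, `data.UniqueImageReward`, encodes this goal. The sketch below is a rough mental model only, not the library's implementation. It credits a target's value (its priority) to the first satellite that images it, and repeat images of the same target earn nothing. The function `unique_image_rewards` and its inputs are hypothetical.

```python
def unique_image_rewards(events, priorities, already_imaged=()):
    """Credit each target's priority only to the first satellite that images it.

    events: (satellite_id, target_id) pairs in the order the images were taken.
    priorities: mapping from target_id to its value.
    already_imaged: targets imaged earlier in the episode (they earn nothing again).
    """
    imaged = set(already_imaged)
    rewards = {}
    for sat_id, target_id in events:
        rewards.setdefault(sat_id, 0.0)
        if target_id not in imaged:
            imaged.add(target_id)
            rewards[sat_id] += priorities[target_id]
    return rewards, imaged


priorities = {"tgt-1": 0.5, "tgt-2": 0.25}
events = [("EO-1", "tgt-1"), ("EO-2", "tgt-1"), ("EO-2", "tgt-2")]
rewards, imaged = unique_image_rewards(events, priorities)
print(rewards)  # {'EO-1': 0.5, 'EO-2': 0.25}
```

In this example, EO-2's image of `tgt-1` is worth nothing because EO-1 already captured it.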

As usual, the satellite type is defined first.

```python
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    # Observe the priority and the (normalized) opening time of the next 10 opportunities
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    # Each action selects one of the next 10 targets to image
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel
```

Satellite properties are set to give each satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a `sat_arg_randomizer` is created and passed to the environment. Here, it places the satellites in a trivial single-plane Walker-delta constellation.
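The sketch below gives some intuition for what a Walker-delta layout means. It computes nominal orbital angles using the standard i: t/p/f pattern:

* planes are spaced evenly in right ascension of the ascending node (RAAN);
* satellites are spaced evenly in anomaly within each plane;
* adjacent planes are offset in phase by f·360°/t.

The helper `walker_delta_angles` is hypothetical. It illustrates the geometry only and does not describe how `walker_delta_args` generates or randomizes its outputs.

```python
def walker_delta_angles(n_sats, n_planes, phasing=0, inc_deg=60.0):
    """Return (inclination, RAAN, anomaly) in degrees for a Walker-delta i: t/p/f layout."""
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    sats_per_plane = n_sats // n_planes
    angles = []
    for plane in range(n_planes):
        # Planes are spread evenly over 360 degrees of RAAN
        raan = 360.0 * plane / n_planes
        for slot in range(sats_per_plane):
            # Even in-plane spacing plus the inter-plane phase offset f * 360 / t
            anomaly = (360.0 * slot / sats_per_plane + 360.0 * phasing * plane / n_sats) % 360.0
            angles.append((inc_deg, raan, anomaly))
    return angles


# Three satellites in one plane, as in this example: anomalies 0, 120 and 240 degrees
for inc, raan, anomaly in walker_delta_angles(n_sats=3, n_planes=1):
    print(f"inc={inc:.1f} raan={raan:.1f} anomaly={anomaly:.1f}")
```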

```python
from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
```

## Gym API

GeneralSatelliteTasking represents observations and actions as tuples, with one element
per satellite.

```python
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn_type must inherit from LOSCommDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
```

```
2024-07-30 12:40:44,314 gym                            INFO       Resetting environment with seed=2142958712
2024-07-30 12:40:44,316 scene.targets                  INFO       Generating 1000 targets
2024-07-30 12:40:44,474 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-07-30 12:40:44,499 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-07-30 12:40:44,521 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-07-30 12:40:44,545 gym                            INFO       <0.00> Environment reset
```

```
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
```

```python
env.action_space
```

```
Tuple(Discrete(10), Discrete(10), Discrete(10))
```

Consequently, actions are passed as a sequence with one entry per satellite, in the same
order as the satellites list. Each step ends the first time any satellite completes its action.
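The stopping rule can be pictured with a simplified model:

* each satellite's current action would finish after some amount of simulated time;
* the step advances by the smallest of those durations;
* the satellites that finished need new actions;
* the rest keep running with less time remaining.

This sketch uses a hypothetical function name and precomputed durations. The real environment detects completion inside the Basilisk simulation. The example numbers are loosely modeled on the first step logged below.

```python
def advance_to_next_completion(completion_times):
    """Advance to the earliest action completion.

    completion_times: seconds from now until each satellite's action finishes.
    Returns the step duration, the satellites needing retasking, and the
    remaining time for every other satellite.
    """
    dt = min(completion_times.values())
    done = sorted(sat for sat, t in completion_times.items() if t == dt)
    remaining = {sat: t - dt for sat, t in completion_times.items() if t != dt}
    return dt, done, remaining


# EO-3 finishes first, so the step lasts 279 s and only EO-3 needs a new action
dt, done, remaining = advance_to_next_completion({"EO-1": 599.2, "EO-2": 518.0, "EO-3": 279.0})
print(dt, done)  # 279.0 ['EO-3']
```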

```python
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
```

```
2024-07-30 12:40:44,555 gym                            INFO       <0.00> === STARTING STEP ===
2024-07-30 12:40:44,555 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2024-07-30 12:40:44,555 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-89) tasked for imaging
2024-07-30 12:40:44,556 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-89) window enabled: 419.6 to 599.2
2024-07-30 12:40:44,556 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 599.2
2024-07-30 12:40:44,557 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2024-07-30 12:40:44,557 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-681) tasked for imaging
2024-07-30 12:40:44,558 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-681) window enabled: 312.2 to 518.0
2024-07-30 12:40:44,558 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 518.0
2024-07-30 12:40:44,558 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2024-07-30 12:40:44,558 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-552) tasked for imaging
2024-07-30 12:40:44,559 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-552) window enabled: 276.1 to 415.0
2024-07-30 12:40:44,559 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 415.0
2024-07-30 12:40:44,612 sats.satellite.EO-3            INFO       <279.00> EO-3: imaged Target(tgt-552)
2024-07-30 12:40:44,614 data.base                      INFO       <279.00> Data reward: {'EO-1': 0.0, 'EO-2': 0.0, 'EO-3': 0.23859949615113163}
2024-07-30 12:40:44,617 sats.satellite.EO-3            INFO       <279.00> EO-3: Satellite EO-3 requires retasking
2024-07-30 12:40:44,617 sats.satellite.EO-1            INFO       <279.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-07-30 12:40:44,644 gym                            INFO       <279.00> Step reward: 0.23859949615113163
```


```python
observation
```

```
(array([1.37631046e-01, 8.25957726e-04, 9.93327017e-01, 2.59839112e-03,
        4.43327632e-02, 1.09387039e-02, 2.99336879e-01, 1.87863675e-02,
        9.55771152e-01, 3.53284555e-02, 6.71606423e-01, 2.46697393e-02,
        3.00129423e-02, 3.45903777e-02, 1.16838470e-01, 3.83034528e-02,
        3.01878533e-01, 2.34009694e-02, 3.32968212e-01, 5.68963001e-02]),
 array([ 0.84520705, -0.02545068,  0.99367782, -0.01802423,  0.5491786 ,
        -0.00141144,  0.14200417, -0.00838391,  0.19905633, -0.00483771,
         0.59727657,  0.00582492,  0.96702527,  0.00627942,  0.3466027 ,
         0.01304223,  0.70516972,  0.00876351,  0.75052959,  0.0128825 ]),
 array([ 0.42470981,  0.01601218,  0.79412353, -0.00267924,  0.33689931,
        -0.00608316,  0.59228585, -0.00758756,  0.14755157,  0.01264897,
         0.10799259,  0.02166538,  0.86361365,  0.03694236,  0.89680041,
         0.05338038,  0.18554625,  0.03314139,  0.7880328 ,  0.03213432]))
```
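Each satellite's 20-element observation comes from the `OpportunityProperties` spec, which gives two properties for each of the next 10 opportunities. The printed values appear to alternate between priority-like values in [0, 1] and small values consistent with a time divided by 5700. One piece of evidence: 0.70516972, at an even position in EO-2's observation, reappears as a data reward EO-2 earns later in the logs. The sketch below therefore assumes an interleaved `[priority, opportunity_open, ...]` layout and undoes the `norm=5700.0` scaling. Treat the layout as an assumption to verify before relying on it. `split_opportunities` is a hypothetical helper.

```python
def split_opportunities(obs, n_ahead=10, open_norm=5700.0):
    """Split a flat observation into one record per upcoming opportunity.

    Assumes the layout [priority_0, open_0, priority_1, open_1, ...], where
    each open_i was divided by open_norm (the `norm` given in the obs spec).
    """
    if len(obs) != 2 * n_ahead:
        raise ValueError(f"expected {2 * n_ahead} values, got {len(obs)}")
    return [
        {"priority": obs[2 * i], "opportunity_open": obs[2 * i + 1] * open_norm}
        for i in range(n_ahead)
    ]


# EO-1's observation as printed above
eo1_obs = [
    1.37631046e-01, 8.25957726e-04, 9.93327017e-01, 2.59839112e-03,
    4.43327632e-02, 1.09387039e-02, 2.99336879e-01, 1.87863675e-02,
    9.55771152e-01, 3.53284555e-02, 6.71606423e-01, 2.46697393e-02,
    3.00129423e-02, 3.45903777e-02, 1.16838470e-01, 3.83034528e-02,
    3.01878533e-01, 2.34009694e-02, 3.32968212e-01, 5.68963001e-02,
]
for record in split_opportunities(eo1_obs)[:3]:
    print(record)
```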

At this point, either every satellite can be retasked, or individual satellites can continue
their previous action by passing `None` as their action. To see which satellites must be
retasked (i.e., their previous action is complete and they have nothing left to do), check
`"requires_retasking"` in each satellite's entry of `info`.

```python
info
```

```
{'EO-1': {'requires_retasking': False},
 'EO-2': {'requires_retasking': False},
 'EO-3': {'requires_retasking': True},
 'd_ts': 279.0}
```

Based on this information, only the satellite that requires retasking is given a new action here; the others are passed `None`.

```python
actions = [0 if info[sat.id]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
```

```
[None, None, 0]
```
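The same rule can be packaged as a small helper that takes a policy callable, so that only satellites flagged with `requires_retasking` are asked for a new action. `info` also carries environment-level entries such as `d_ts`, so the helper walks an explicit list of satellite IDs in satellite order rather than iterating over `info` itself. The names `retask_actions` and `choose_action` are hypothetical.

```python
def retask_actions(info, sat_ids, choose_action):
    """Return one action per satellite: a new one if retasking is required, else None.

    info: the info dict returned by env.step (per-satellite dicts plus extras like 'd_ts').
    sat_ids: satellite IDs in the same order as the environment's satellites.
    choose_action: callable mapping a satellite ID to a new action.
    """
    return [
        choose_action(sat_id) if info[sat_id]["requires_retasking"] else None
        for sat_id in sat_ids
    ]


info = {
    "EO-1": {"requires_retasking": False},
    "EO-2": {"requires_retasking": False},
    "EO-3": {"requires_retasking": True},
    "d_ts": 279.0,
}
print(retask_actions(info, ["EO-1", "EO-2", "EO-3"], lambda sat_id: 0))  # [None, None, 0]
```

With the environment above, `sat_ids` would be `[sat.id for sat in env.unwrapped.satellites]`.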

```python
observation, reward, terminated, truncated, info = env.step(actions)
```

```
2024-07-30 12:40:44,657 gym                            INFO       <279.00> === STARTING STEP ===
2024-07-30 12:40:44,658 sats.satellite.EO-3            INFO       <279.00> EO-3: target index 0 tasked
2024-07-30 12:40:44,658 sats.satellite.EO-3            INFO       <279.00> EO-3: Target(tgt-157) tasked for imaging
2024-07-30 12:40:44,659 sats.satellite.EO-3            INFO       <279.00> EO-3: Target(tgt-157) window enabled: 370.3 to 398.6
2024-07-30 12:40:44,659 sats.satellite.EO-3            INFO       <279.00> EO-3: setting timed terminal event at 398.6
2024-07-30 12:40:44,667 sats.satellite.EO-2            INFO       <315.00> EO-2: imaged Target(tgt-681)
2024-07-30 12:40:44,668 data.base                      INFO       <315.00> Data reward: {'EO-1': 0.0, 'EO-2': 0.5972765722074374, 'EO-3': 0.0}
2024-07-30 12:40:44,669 sats.satellite.EO-2            INFO       <315.00> EO-2: Satellite EO-2 requires retasking
2024-07-30 12:40:44,671 gym                            INFO       <315.00> Step reward: 0.5972765722074374
```


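In GeneralSatelliteTasking, the entire episode terminates as soon as any satellite dies. To
demonstrate this, one satellite is forcibly killed by replacing its `is_alive` method with a
mock that reports an empty battery.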

```python
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying by reporting an empty battery."""
    sat = env.unwrapped.satellites[0]
    # Write a battery status message with zero stored charge
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    sat.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    # Run the usual dynamics and flight software health checks against that state
    return sat.dynamics.is_alive(log_failure=log_failure) and sat.fsw.is_alive(
        log_failure=log_failure
    )

# Replace satellite 0's health check with the mock, then step the environment
env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])
```

```
2024-07-30 12:40:44,675 gym                            INFO       <315.00> === STARTING STEP ===
2024-07-30 12:40:44,675 sats.satellite.EO-1            INFO       <315.00> EO-1: target index 6 tasked
2024-07-30 12:40:44,676 sats.satellite.EO-1            INFO       <315.00> EO-1: Target(tgt-81) tasked for imaging
2024-07-30 12:40:44,676 sats.satellite.EO-1            INFO       <315.00> EO-1: Target(tgt-81) window enabled: 476.2 to 637.2
2024-07-30 12:40:44,677 sats.satellite.EO-1            INFO       <315.00> EO-1: setting timed terminal event at 637.2
2024-07-30 12:40:44,677 sats.satellite.EO-2            INFO       <315.00> EO-2: target index 7 tasked
2024-07-30 12:40:44,677 sats.satellite.EO-2            INFO       <315.00> EO-2: Target(tgt-571) tasked for imaging
2024-07-30 12:40:44,678 sats.satellite.EO-2            INFO       <315.00> EO-2: Target(tgt-571) window enabled: 329.0 to 538.4
2024-07-30 12:40:44,678 sats.satellite.EO-2            INFO       <315.00> EO-2: setting timed terminal event at 538.4
2024-07-30 12:40:44,678 sats.satellite.EO-3            INFO       <315.00> EO-3: target index 9 tasked
2024-07-30 12:40:44,679 sats.satellite.EO-3            INFO       <315.00> EO-3: Target(tgt-981) tasked for imaging
2024-07-30 12:40:44,679 sats.satellite.EO-3            INFO       <315.00> EO-3: Target(tgt-981) window enabled: 462.2 to 600.0
2024-07-30 12:40:44,680 sats.satellite.EO-3            INFO       <315.00> EO-3: setting timed terminal event at 600.0
2024-07-30 12:40:44,685 sats.satellite.EO-2            INFO       <338.00> EO-2: imaged Target(tgt-571)
2024-07-30 12:40:44,687 data.base                      INFO       <338.00> Data reward: {'EO-1': 0.0, 'EO-2': 0.7051697151785014, 'EO-3': 0.0}
2024-07-30 12:40:44,688 sats.satellite.EO-2            INFO       <338.00> EO-2: Satellite EO-2 requires retasking
2024-07-30 12:40:44,689 gym                            INFO       <338.00> Step reward: -0.29483028482149864
2024-07-30 12:40:44,689 gym                            INFO       <338.00> Episode terminated: True
2024-07-30 12:40:44,690 gym                            INFO       <338.00> Episode truncated: False
```

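The numbers in this step are consistent with a simple accounting. The logged step reward of -0.295 equals EO-2's data reward of 0.705 plus a penalty of -1.0 for the failed satellite. The earlier steps' step rewards equal the sum of the per-satellite data rewards. The Gym-style `terminated` flag is a single boolean that becomes `True` once any satellite is no longer alive.

The sketch below reproduces that arithmetic. The -1.0 penalty value is inferred from these logs rather than taken from the library, and `gym_step_outcome` is hypothetical.

```python
def gym_step_outcome(data_rewards, alive, failure_penalty=-1.0):
    """Combine per-satellite results into one scalar reward and one termination flag.

    data_rewards: per-satellite data reward for the step.
    alive: per-satellite health after the step.
    failure_penalty: added once per failed satellite (value inferred from the logs, an assumption).
    """
    n_failed = sum(1 for ok in alive.values() if not ok)
    reward = sum(data_rewards.values()) + failure_penalty * n_failed
    terminated = n_failed > 0  # any failure ends the whole Gym episode
    return reward, terminated


reward, terminated = gym_step_outcome(
    {"EO-1": 0.0, "EO-2": 0.7051697151785014, "EO-3": 0.0},
    {"EO-1": False, "EO-2": True, "EO-3": True},
)
print(round(reward, 6), terminated)  # -0.29483 True
```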

## PettingZoo API

ConstellationTasking, which implements the [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/),
is configured the same way as GeneralSatelliteTasking. The main difference is that observations,
actions, rewards, and termination flags are dictionaries keyed by agent name rather than tuples.
See the PettingZoo documentation for a full description of the API.
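When porting code between the two environments, it can help to convert between the tuple layout of GeneralSatelliteTasking and the agent-keyed dictionaries of ConstellationTasking. The hypothetical helpers below do this in plain Python. Agents missing from a dictionary are filled in with `None`, which mirrors the Gym environment's convention that `None` continues a satellite's previous action.

```python
def tuple_to_agent_dict(agents, values):
    """Map a per-satellite tuple (Gym layout) to an agent-keyed dict (PettingZoo layout)."""
    if len(agents) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agents, values))


def agent_dict_to_tuple(agents, values_by_agent, missing=None):
    """Map an agent-keyed dict back to a tuple ordered like `agents`.

    Agents absent from the dict get `missing`; with the default of None, a Gym-style
    action tuple leaves those satellites on their previous action.
    """
    return tuple(values_by_agent.get(agent, missing) for agent in agents)


agents = ["EO-1", "EO-2", "EO-3"]
print(tuple_to_agent_dict(agents, (7, 9, 8)))  # {'EO-1': 7, 'EO-2': 9, 'EO-3': 8}
print(agent_dict_to_tuple(agents, {"EO-3": 0}))  # (None, None, 0)
```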

```python
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn_type must inherit from LOSCommDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
```

```
2024-07-30 12:40:44,914 gym                            INFO       Resetting environment with seed=2742850712
2024-07-30 12:40:44,916 scene.targets                  INFO       Generating 1000 targets
2024-07-30 12:40:45,060 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-07-30 12:40:45,082 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-07-30 12:40:45,100 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-07-30 12:40:45,124 gym                            INFO       <0.00> Environment reset
```

```
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
```

```python
env.action_spaces
```

```
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}
```

Actions are passed as a dictionary; the agent names can be accessed through the `agents`
property.
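Hard-coding `env.agents[0]`, `env.agents[1]`, and so on works for a fixed example. A dictionary comprehension over the current agent list keeps working as agents are removed. In the sketch below, `build_actions` and the seeded random policy are hypothetical stand-ins for a trained policy. `n_actions=10` matches the `Discrete(10)` action spaces shown above. In PettingZoo, `env.action_space(agent).sample()` is a common placeholder policy as well.

```python
import random


def build_actions(agents, policy):
    """Return {agent: action} for every currently active agent."""
    return {agent: policy(agent) for agent in agents}


rng = random.Random(0)  # seeded so the example is repeatable
n_actions = 10  # each agent's action space is Discrete(10) in this example


def random_policy(agent):
    # Ignores the agent name and picks one of the next n_actions targets
    return rng.randrange(n_actions)


actions = build_actions(["EO-1", "EO-2", "EO-3"], random_policy)
print(actions)
```

With the environment above, this would be used as `env.step(build_actions(env.agents, random_policy))`.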

```python
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
```

```
2024-07-30 12:40:45,133 gym                            INFO       <0.00> === STARTING STEP ===
2024-07-30 12:40:45,133 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2024-07-30 12:40:45,134 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-866) tasked for imaging
2024-07-30 12:40:45,134 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-866) window enabled: 202.9 to 393.3
2024-07-30 12:40:45,134 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 393.3
2024-07-30 12:40:45,135 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2024-07-30 12:40:45,135 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-458) tasked for imaging
2024-07-30 12:40:45,136 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-458) window enabled: 210.1 to 397.8
2024-07-30 12:40:45,136 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 397.8
2024-07-30 12:40:45,136 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2024-07-30 12:40:45,136 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-135) tasked for imaging
2024-07-30 12:40:45,137 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-135) window enabled: 449.2 to 600.0
2024-07-30 12:40:45,137 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 600.0
2024-07-30 12:40:45,176 sats.satellite.EO-1            INFO       <205.00> EO-1: imaged Target(tgt-866)
2024-07-30 12:40:45,178 data.base                      INFO       <205.00> Data reward: {'EO-1': 0.6059547080333733, 'EO-2': 0.0, 'EO-3': 0.0}
2024-07-30 12:40:45,181 sats.satellite.EO-1            INFO       <205.00> EO-1: Satellite EO-1 requires retasking
2024-07-30 12:40:45,181 sats.satellite.EO-1            INFO       <205.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-07-30 12:40:45,206 sats.satellite.EO-2            INFO       <205.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-07-30 12:40:45,230 gym                            INFO       <205.00> Step reward: {'EO-1': 0.6059547080333733, 'EO-2': 0.0, 'EO-3': 0.0}
2024-07-30 12:40:45,230 gym                            INFO       <205.00> Episode terminated: {'EO-1': False, 'EO-2': False, 'EO-3': False}
2024-07-30 12:40:45,230 gym                            INFO       <205.00> Episode truncated: {'EO-1': False, 'EO-2': False, 'EO-3': False}
```


```python
observation
```

```
{'EO-1': array([ 0.76704749, -0.0236747 ,  0.37036486, -0.01564846,  0.96033236,
        -0.00559059,  0.01692447,  0.02981084,  0.90346699,  0.02030197,
         0.09099261,  0.00716401,  0.92480587,  0.02234665,  0.4189473 ,
         0.05915995,  0.62334272,  0.04972604,  0.22471615,  0.07475612]),
 'EO-2': array([ 1.11773252e-01, -1.77200346e-02,  2.99998923e-01,  3.38032121e-03,
         1.22763412e-02, -1.01367347e-02,  4.88912244e-01, -4.49631874e-04,
         6.51346560e-02,  8.95856042e-04,  8.15744721e-01,  2.48302837e-02,
         3.48180019e-02,  6.11291747e-02,  4.46263450e-01,  1.09503861e-01,
         1.65106055e-02,  1.16442466e-01,  6.11972032e-01,  1.37152888e-01]),
 'EO-3': array([ 0.79333587, -0.02478445,  0.53035796,  0.00819361,  0.30417879,
        -0.00496339,  0.61571795,  0.02761785,  0.20991683,  0.0316429 ,
         0.26797122,  0.03420781,  0.09587086,  0.04283784,  0.52230839,
         0.05943068,  0.43743559,  0.05048627,  0.2393317 ,  0.05307425])}
```

Besides compatibility with MARL algorithms, the main benefit of the PettingZoo API is that
individual agents can fail without terminating the entire environment: a failed agent is
removed from `env.agents`, and the remaining agents continue.

```python
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
```

```
['EO-2', 'EO-3']
```

```python
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
```

```
2024-07-30 12:40:45,241 gym                            INFO       <205.00> === STARTING STEP ===
2024-07-30 12:40:45,242 sats.satellite.EO-2            INFO       <205.00> EO-2: target index 7 tasked
2024-07-30 12:40:45,242 sats.satellite.EO-2            INFO       <205.00> EO-2: Target(tgt-332) tasked for imaging
2024-07-30 12:40:45,243 sats.satellite.EO-2            INFO       <205.00> EO-2: Target(tgt-332) window enabled: 829.2 to 1032.0
2024-07-30 12:40:45,243 sats.satellite.EO-2            INFO       <205.00> EO-2: setting timed terminal event at 1032.0
2024-07-30 12:40:45,243 sats.satellite.EO-3            INFO       <205.00> EO-3: target index 9 tasked
2024-07-30 12:40:45,244 sats.satellite.EO-3            INFO       <205.00> EO-3: Target(tgt-713) tasked for imaging
2024-07-30 12:40:45,244 sats.satellite.EO-3            INFO       <205.00> EO-3: Target(tgt-713) window enabled: 507.5 to 600.0
2024-07-30 12:40:45,245 sats.satellite.EO-3            INFO       <205.00> EO-3: setting timed terminal event at 600.0
2024-07-30 12:40:45,280 sats.satellite.EO-1            INFO       <394.00> EO-1: timed termination at 393.3 for Target(tgt-866) window
2024-07-30 12:40:45,282 data.base                      INFO       <394.00> Data reward: {'EO-1': 0.0, 'EO-2': 0.0, 'EO-3': 0.0}
2024-07-30 12:40:45,284 sats.satellite.EO-1            INFO       <394.00> EO-1: Satellite EO-1 requires retasking
2024-07-30 12:40:45,285 sats.satellite.EO-2            INFO       <394.00> EO-2: Finding opportunity windows from 1200.00 to 1800.00 seconds
2024-07-30 12:40:45,304 sats.satellite.EO-2            INFO       <394.00> EO-2: Finding opportunity windows from 1800.00 to 2400.00 seconds
2024-07-30 12:40:45,330 gym                            INFO       <394.00> Step reward: {'EO-2': 0.0, 'EO-3': 0.0}
2024-07-30 12:40:45,330 gym                            INFO       <394.00> Episode terminated: {'EO-2': False, 'EO-3': False}
2024-07-30 12:40:45,330 gym                            INFO       <394.00> Episode truncated: {'EO-2': False, 'EO-3': False}
```

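Because failed agents simply drop out of `env.agents`, a typical parallel-API rollout loop runs until that list is empty, building each action dictionary from the agents still active. The sketch below runs that loop against `ToyParallelEnv`. This is a hypothetical stand-in that only mimics the parallel API's call shapes: `reset` returns observations and infos, and `step` returns five agent-keyed dicts. It does not model satellites. The `rollout` function is intended for any environment that follows the same API.

```python
class ToyParallelEnv:
    """Hypothetical stand-in with the same call shapes as a PettingZoo parallel env."""

    def __init__(self, agents, fail_at, max_steps=5):
        self.possible_agents = list(agents)
        self.fail_at = dict(fail_at)  # agent -> step on which it "dies"
        self.max_steps = max_steps

    def reset(self, seed=None):
        self.t = 0
        self.agents = list(self.possible_agents)
        observations = {agent: self.t for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        return observations, infos

    def step(self, actions):
        self.t += 1
        active = list(actions)
        observations = {agent: self.t for agent in active}
        rewards = {agent: 1.0 for agent in active}
        terminations = {agent: self.fail_at.get(agent) == self.t for agent in active}
        truncations = {agent: self.t >= self.max_steps for agent in active}
        infos = {agent: {} for agent in active}
        # Failed or truncated agents leave the agent list; the rest keep going
        self.agents = [a for a in active if not (terminations[a] or truncations[a])]
        return observations, rewards, terminations, truncations, infos


def rollout(env, policy, seed=None):
    """Step until no agents remain; return each agent's total reward."""
    observations, infos = env.reset(seed=seed)
    totals = {agent: 0.0 for agent in env.possible_agents}
    while env.agents:
        actions = {agent: policy(observations[agent]) for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, reward in rewards.items():
            totals[agent] += reward
    return totals


toy_env = ToyParallelEnv(["EO-1", "EO-2", "EO-3"], fail_at={"EO-1": 1})
print(rollout(toy_env, policy=lambda obs: 0))  # {'EO-1': 1.0, 'EO-2': 5.0, 'EO-3': 5.0}
```

EO-1 fails on the first step and collects no further reward, while the other two agents keep acting until the episode is truncated.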