# Multi-Agent Environments

The package provides two multi-agent environments:

* [GeneralSatelliteTasking](../api_reference/index.rst#bsk_rl.GeneralSatelliteTasking), 
  a [Gymnasium](https://gymnasium.farama.org)-based environment and the basis for all other environments.
* [ConstellationTasking](../api_reference/index.rst#bsk_rl.ConstellationTasking), which
  implements the [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/).

The latter is preferable for multi-agent RL (MARL) settings, as most MARL algorithms are
designed for this kind of API.

## Configuring the Environment

This example uses a multi-satellite target imaging environment. The goal is to maximize
the total value of unique images taken.

As usual, the satellite type is defined first.

In [1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"), 
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give each satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a `sat_arg_randomizer` is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.

In [2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
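
For intuition about what a single-plane Walker-delta layout looks like, the standalone sketch below computes the standard Walker-delta phasing angles. It does not reproduce the internals of `walker_delta_args`; the function name `walker_delta_phasing` and its parameters are hypothetical and exist only for illustration.

```python
def walker_delta_phasing(n_sats, n_planes=1, rel_phasing=0):
    """Return (RAAN, in-plane anomaly) pairs in degrees for a Walker-delta pattern.

    Hypothetical helper for illustration only; not part of bsk_rl.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    sats_per_plane = n_sats // n_planes
    layout = []
    for plane in range(n_planes):
        # Planes are spread evenly in right ascension of the ascending node.
        raan = 360.0 * plane / n_planes
        for slot in range(sats_per_plane):
            # Satellites are spread evenly within a plane, with an extra
            # offset between adjacent planes set by the relative phasing.
            anomaly = (
                360.0 * slot / sats_per_plane
                + 360.0 * rel_phasing * plane / n_sats
            ) % 360.0
            layout.append((raan, anomaly))
    return layout


# Three satellites in one plane share an orbit and sit 120 degrees apart.
print(walker_delta_phasing(3, n_planes=1))
```

With a single plane, every satellite has the same node and the only difference between them is the position along the orbit, which is why the text calls this configuration trivial.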

## Gym API

`GeneralSatelliteTasking` exchanges actions and observations with the environment as
tuples, with one entry per satellite.

In [3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space

2024-09-11 09:20:15,357 gym                            INFO       Resetting environment with seed=25987008
2024-09-11 09:20:15,358 scene.targets                  INFO       Generating 1000 targets
2024-09-11 09:20:15,526 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-11 09:20:15,550 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-11 09:20:15,571 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-11 09:20:15,593 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-11 09:20:15,614 gym                            INFO       <0.00> Environment reset


Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))

In [4]:
env.action_space

Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a tuple (or list) with one entry per satellite. The
step ends as soon as any satellite completes its action.

In [5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])

2024-09-11 09:20:15,624 gym                            INFO       <0.00> === STARTING STEP ===
2024-09-11 09:20:15,624 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2024-09-11 09:20:15,624 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-167) tasked for imaging
2024-09-11 09:20:15,625 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-167) window enabled: 381.4 to 558.0
2024-09-11 09:20:15,625 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 558.0
2024-09-11 09:20:15,626 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2024-09-11 09:20:15,626 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-268) tasked for imaging
2024-09-11 09:20:15,627 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-268) window enabled: 603.4 to 803.3
2024-09-11 09:20:15,627 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 803.3
2024-09-11 09:20:15,627 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2024-09-11 09:20:15,627 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-401) tasked for imaging
2024-09-11 09:20:15,628 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-401) window enabled: 408.8 to 591.8
2024-09-11 09:20:15,628 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 591.8
2024-09-11 09:20:15,701 sats.satellite.EO-1            INFO       <384.00> EO-1: imaged Target(tgt-167)
2024-09-11 09:20:15,703 data.base                      INFO       <384.00> Data reward: {'EO-1': 0.5381284648556194, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-11 09:20:15,707 sats.satellite.EO-1            INFO       <384.00> EO-1: Satellite EO-1 requires retasking
2024-09-11 09:20:15,707 sats.satellite.EO-1            INFO       <384.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-11 09:20:15,731 sats.satellite.EO-3            INFO       <384.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-11 09:20:15,758 gym                            INFO       <384.00> Step reward: 0.5381284648556194


In [6]:
observation

(array([ 0.92439015, -0.0258305 ,  0.9758071 ,  0.03205828,  0.2907066 ,
         0.02643022,  0.07559279,  0.00374157,  0.17020381,  0.03158787,
         0.21351657,  0.02134166,  0.76973513,  0.04221848,  0.55815477,
         0.04413613,  0.51560163,  0.06060402,  0.59192149,  0.08233334]),
 array([ 0.06597468, -0.0132296 ,  0.78754443,  0.00767855,  0.5007186 ,
         0.01971264,  0.2153354 ,  0.02249443,  0.90520448,  0.03848525,
         0.97152065,  0.0592632 ,  0.98524112,  0.08112803,  0.71986469,
         0.08616652,  0.36460718,  0.09334229,  0.05876791,  0.12279916]),
 array([ 0.82723417, -0.02825309,  0.3590622 , -0.00395114,  0.3978725 ,
         0.01790092,  0.95391482,  0.00435569,  0.11215957,  0.02528452,
         0.06143226,  0.00967175,  0.77140286,  0.00815342,  0.47681183,
         0.05696725,  0.58880075,  0.05870218,  0.86364981,  0.07943668]))
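
Each 20-element vector can be read as 10 upcoming opportunities with two properties each. The sketch below decodes EO-1's vector under the assumption that values are interleaved as `[priority_0, open_0, priority_1, open_1, ...]` and that `opportunity_open` is a time relative to the current simulation time, scaled by the `norm=5700.0` from the observation spec. The helper `decode_opportunities` is hypothetical, not a bsk_rl function.

```python
N_AHEAD = 10        # n_ahead_observe from the observation spec
OPEN_NORM = 5700.0  # norm applied to "opportunity_open" in the observation spec


def decode_opportunities(flat_obs, n_ahead=N_AHEAD, open_norm=OPEN_NORM):
    """Split a flat observation into (priority, seconds_until_open) pairs.

    Assumes the interleaved layout [priority_0, open_0, priority_1, open_1, ...].
    """
    if len(flat_obs) != 2 * n_ahead:
        raise ValueError("observation length does not match 2 * n_ahead")
    return [
        (flat_obs[2 * i], flat_obs[2 * i + 1] * open_norm)
        for i in range(n_ahead)
    ]


# EO-1's observation, copied from the output above.
eo1_obs = [
    0.92439015, -0.0258305, 0.9758071, 0.03205828, 0.2907066,
    0.02643022, 0.07559279, 0.00374157, 0.17020381, 0.03158787,
    0.21351657, 0.02134166, 0.76973513, 0.04221848, 0.55815477,
    0.04413613, 0.51560163, 0.06060402, 0.59192149, 0.08233334,
]

for index, (priority, opens_in) in enumerate(decode_opportunities(eo1_obs)):
    print(f"opportunity {index}: priority={priority:.3f}, opens in {opens_in:7.1f} s")
```

Under these assumptions, a negative time means the window has already opened, and the action index passed to `env.step` selects one of these ten upcoming opportunities.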

At this point, either every satellite can be retasked, or satellites can continue their
previous action by passing `None` as the action. To see which satellites must be
retasked (i.e. their previous action is done and they have nothing more to do), look at
`"requires_retasking"` in each satellite's info.

In [7]:
info

{'EO-1': {'requires_retasking': True},
 'EO-2': {'requires_retasking': False},
 'EO-3': {'requires_retasking': False},
 'd_ts': 384.0}

Based on this information, only the satellite that requires retasking is given a new action here.

In [8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions

[0, None, None]
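
The same pattern can be wrapped in a small helper so that any per-satellite policy is applied only where retasking is needed. This is a minimal sketch: `retask_only_idle` and `choose_action` are hypothetical names, and the `example_info` dictionary simply repeats the `info` output shown above.

```python
def retask_only_idle(info, satellite_names, choose_action):
    """Return one action per satellite: a new action if idle, otherwise None.

    Passing None lets a satellite continue its previous action.
    """
    return [
        choose_action(name) if info[name]["requires_retasking"] else None
        for name in satellite_names
    ]


# Same structure as the info dictionary returned by the first step.
example_info = {
    "EO-1": {"requires_retasking": True},
    "EO-2": {"requires_retasking": False},
    "EO-3": {"requires_retasking": False},
    "d_ts": 384.0,
}

example_actions = retask_only_idle(
    example_info,
    ["EO-1", "EO-2", "EO-3"],
    choose_action=lambda name: 0,  # placeholder policy: always pick index 0
)
print(example_actions)  # [0, None, None]
```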

In [9]:
observation, reward, terminated, truncated, info = env.step(actions)

2024-09-11 09:20:15,772 gym                            INFO       <384.00> === STARTING STEP ===
2024-09-11 09:20:15,772 sats.satellite.EO-1            INFO       <384.00> EO-1: target index 0 tasked
2024-09-11 09:20:15,772 sats.satellite.EO-1            INFO       <384.00> EO-1: Target(tgt-816) tasked for imaging
2024-09-11 09:20:15,773 sats.satellite.EO-1            INFO       <384.00> EO-1: Target(tgt-816) window enabled: 236.8 to 405.6
2024-09-11 09:20:15,773 sats.satellite.EO-1            INFO       <384.00> EO-1: setting timed terminal event at 405.6
2024-09-11 09:20:15,778 sats.satellite.EO-1            INFO       <406.00> EO-1: timed termination at 405.6 for Target(tgt-816) window
2024-09-11 09:20:15,780 data.base                      INFO       <406.00> Data reward: {'EO-1': 0.0, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-11 09:20:15,781 sats.satellite.EO-1            INFO       <406.00> EO-1: Satellite EO-1 requires retasking
2024-09-11 09:20:15,782 gym                            INFO       <406.00> Step reward: 0.0


In this environment, the episode terminates if any agent dies. To demonstrate this,
one satellite is forcibly killed.

In [10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2024-09-11 09:20:15,786 gym                            INFO       <406.00> === STARTING STEP ===
2024-09-11 09:20:15,786 sats.satellite.EO-1            INFO       <406.00> EO-1: target index 6 tasked
2024-09-11 09:20:15,787 sats.satellite.EO-1            INFO       <406.00> EO-1: Target(tgt-24) tasked for imaging
2024-09-11 09:20:15,787 sats.satellite.EO-1            INFO       <406.00> EO-1: Target(tgt-24) window enabled: 635.6 to 841.9
2024-09-11 09:20:15,787 sats.satellite.EO-1            INFO       <406.00> EO-1: setting timed terminal event at 841.9
2024-09-11 09:20:15,788 sats.satellite.EO-2            INFO       <406.00> EO-2: target index 7 tasked
2024-09-11 09:20:15,788 sats.satellite.EO-2            INFO       <406.00> EO-2: Target(tgt-324) tasked for imaging
2024-09-11 09:20:15,789 sats.satellite.EO-2            INFO       <406.00> EO-2: Target(tgt-324) window enabled: 875.1 to 1080.7
2024-09-11 09:20:15,789 sats.satellite.EO-2            INFO       <406.00> EO-2: setting timed terminal event at 1080.7
2024-09-11 09:20:15,789 sats.satellite.EO-3            INFO       <406.00> EO-3: target index 9 tasked
2024-09-11 09:20:15,790 sats.satellite.EO-3            INFO       <406.00> EO-3: Target(tgt-971) tasked for imaging
2024-09-11 09:20:15,790 sats.satellite.EO-3            INFO       <406.00> EO-3: Target(tgt-971) window enabled: 869.8 to 1040.0
2024-09-11 09:20:15,790 sats.satellite.EO-3            INFO       <406.00> EO-3: setting timed terminal event at 1040.0
2024-09-11 09:20:15,835 sats.satellite.EO-1            INFO       <638.00> EO-1: imaged Target(tgt-24)
2024-09-11 09:20:15,837 data.base                      INFO       <638.00> Data reward: {'EO-1': 0.5581547655786475, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-11 09:20:15,840 sats.satellite.EO-1            INFO       <638.00> EO-1: Satellite EO-1 requires retasking
2024-09-11 09:20:15,841 gym                            INFO       <638.00> Step reward: -0.4418452344213525
2024-09-11 09:20:15,841 gym                            INFO       <638.00> Episode terminated: True
2024-09-11 09:20:15,842 gym                            INFO       <638.00> Episode truncated: False


## PettingZoo API

The [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/) environment,
`ConstellationTasking`, is largely the same as `GeneralSatelliteTasking`. See the PettingZoo
documentation for a full description of the API. The main difference is that per-agent
values (observations, actions, rewards, and termination flags) are grouped into
dictionaries keyed by agent name rather than tuples.

In [11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces

2024-09-11 09:20:16,030 gym                            INFO       Resetting environment with seed=698964008
2024-09-11 09:20:16,031 scene.targets                  INFO       Generating 1000 targets
2024-09-11 09:20:16,180 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-11 09:20:16,204 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-11 09:20:16,224 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-11 09:20:16,250 gym                            INFO       <0.00> Environment reset


{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}

In [12]:
env.action_spaces

{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the `agents`
property.

In [13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)

2024-09-11 09:20:16,258 gym                            INFO       <0.00> === STARTING STEP ===
2024-09-11 09:20:16,259 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2024-09-11 09:20:16,259 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-787) tasked for imaging
2024-09-11 09:20:16,259 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-787) window enabled: 417.3 to 597.4
2024-09-11 09:20:16,260 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 597.4
2024-09-11 09:20:16,260 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2024-09-11 09:20:16,260 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-748) tasked for imaging
2024-09-11 09:20:16,261 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-748) window enabled: 379.2 to 571.6
2024-09-11 09:20:16,261 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 571.6
2024-09-11 09:20:16,261 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2024-09-11 09:20:16,262 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-894) tasked for imaging
2024-09-11 09:20:16,262 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-894) window enabled: 343.3 to 480.0
2024-09-11 09:20:16,262 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 480.0
2024-09-11 09:20:16,328 sats.satellite.EO-3            INFO       <346.00> EO-3: imaged Target(tgt-894)
2024-09-11 09:20:16,330 data.base                      INFO       <346.00> Data reward: {'EO-1': 0.0, 'EO-2': 0.0, 'EO-3': 0.9497331569987684}
2024-09-11 09:20:16,333 sats.satellite.EO-3            INFO       <346.00> EO-3: Satellite EO-3 requires retasking
2024-09-11 09:20:16,334 sats.satellite.EO-1            INFO       <346.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-11 09:20:16,357 sats.satellite.EO-2            INFO       <346.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-11 09:20:16,385 sats.satellite.EO-3            INFO       <346.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-11 09:20:16,412 gym                            INFO       <346.00> Step reward: {'EO-1': 0.0, 'EO-2': 0.0, 'EO-3': 0.9497331569987684}
2024-09-11 09:20:16,412 gym                            INFO       <346.00> Episode terminated: {'EO-1': False, 'EO-2': False, 'EO-3': False}
2024-09-11 09:20:16,413 gym                            INFO       <346.00> Episode truncated: {'EO-1': False, 'EO-2': False, 'EO-3': False}


In [14]:
observation

{'EO-1': array([ 4.47414930e-02, -1.26998546e-02,  6.11742354e-01, -3.47691975e-04,
         9.45004884e-01,  1.25031390e-02,  7.35475167e-01,  4.38821431e-02,
         1.56877357e-01,  1.46635881e-02,  1.43002592e-01,  3.00665785e-02,
         6.67698350e-01,  2.88246490e-02,  5.49121977e-01,  4.84445699e-02,
         5.27375116e-01,  7.15403930e-02,  9.39748684e-01,  8.07412343e-02]),
 'EO-2': array([ 0.14040046, -0.02573272,  0.01332223, -0.0115793 ,  0.89221856,
         0.00326017,  0.15474838,  0.00582813,  0.79695345,  0.01328304,
         0.50684569,  0.03030545,  0.77030705,  0.03120217,  0.18387397,
         0.03053393,  0.12960873,  0.02203516,  0.65747167,  0.05894356]),
 'EO-3': array([ 0.07096022, -0.02589783,  0.08052276, -0.01332056,  0.01278707,
        -0.00431937,  0.03170583, -0.00384566,  0.12955341,  0.00637688,
         0.13227768,  0.02673708,  0.15239428,  0.03589975,  0.69182059,
         0.06117963,  0.11679984,  0.05862851,  0.5033205 ,  0.07064741])}
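
Because the two APIs differ mainly in how per-agent values are grouped, code written for one can often be adapted to the other with a thin conversion layer. The helpers below are a hypothetical sketch (they are not part of bsk_rl or PettingZoo) and assume the agent order matches the order in which satellites were passed to the environment.

```python
def tuple_to_agent_dict(agent_names, values):
    """Convert Gym-style per-satellite values into a dict keyed by agent name."""
    if len(agent_names) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agent_names, values))


def agent_dict_to_tuple(agent_names, mapping):
    """Convert a dict keyed by agent name back into an ordered tuple."""
    return tuple(mapping[name] for name in agent_names)


agent_names = ["EO-1", "EO-2", "EO-3"]

# The tuple-style action used with the Gym API, expressed for the PettingZoo API.
action_dict = tuple_to_agent_dict(agent_names, (7, 9, 8))
print(action_dict)  # {'EO-1': 7, 'EO-2': 9, 'EO-3': 8}

# The reverse direction, for example to feed rewards into tuple-based code.
print(agent_dict_to_tuple(agent_names, action_dict))  # (7, 9, 8)
```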

Beyond compatibility with MARL algorithms, the main benefit of the PettingZoo API is that
individual agents can fail without terminating the episode for the remaining agents.

In [15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents

['EO-2', 'EO-3']
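
Since the list of live agents can shrink during an episode, it is safer to build the action dictionary from the current agent list on every step than to hard-code agent names. The sketch below illustrates this with a greedy placeholder policy. The function names are hypothetical, and the policy assumes each observation is laid out as `[priority_0, open_0, priority_1, open_1, ...]`; the observation values are copied from the output printed earlier.

```python
def greedy_priority_action(observation):
    """Pick the index of the highest-priority upcoming opportunity.

    Assumes priorities sit at the even positions of the observation.
    """
    priorities = observation[0::2]
    return max(range(len(priorities)), key=priorities.__getitem__)


def actions_for_live_agents(live_agents, observations, policy):
    """Build a PettingZoo-style action dict covering only the live agents."""
    return {agent: policy(observations[agent]) for agent in live_agents}


# Observations for the two surviving agents, copied from the earlier output.
observations = {
    "EO-2": [0.14040046, -0.02573272, 0.01332223, -0.0115793, 0.89221856,
             0.00326017, 0.15474838, 0.00582813, 0.79695345, 0.01328304,
             0.50684569, 0.03030545, 0.77030705, 0.03120217, 0.18387397,
             0.03053393, 0.12960873, 0.02203516, 0.65747167, 0.05894356],
    "EO-3": [0.07096022, -0.02589783, 0.08052276, -0.01332056, 0.01278707,
             -0.00431937, 0.03170583, -0.00384566, 0.12955341, 0.00637688,
             0.13227768, 0.02673708, 0.15239428, 0.03589975, 0.69182059,
             0.06117963, 0.11679984, 0.05862851, 0.5033205, 0.07064741],
}

live_agents = ["EO-2", "EO-3"]  # what env.agents reports after EO-1 dies
print(actions_for_live_agents(live_agents, observations, greedy_priority_action))
# {'EO-2': 2, 'EO-3': 7}
```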

In [16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)

2024-09-11 09:20:16,424 gym                            INFO       <346.00> === STARTING STEP ===
2024-09-11 09:20:16,424 sats.satellite.EO-2            INFO       <346.00> EO-2: target index 7 tasked
2024-09-11 09:20:16,425 sats.satellite.EO-2            INFO       <346.00> EO-2: Target(tgt-523) tasked for imaging
2024-09-11 09:20:16,425 sats.satellite.EO-2            INFO       <346.00> EO-2: Target(tgt-523) window enabled: 520.0 to 697.4
2024-09-11 09:20:16,425 sats.satellite.EO-2            INFO       <346.00> EO-2: setting timed terminal event at 697.4
2024-09-11 09:20:16,426 sats.satellite.EO-3            INFO       <346.00> EO-3: target index 9 tasked
2024-09-11 09:20:16,426 sats.satellite.EO-3            INFO       <346.00> EO-3: Target(tgt-92) tasked for imaging
2024-09-11 09:20:16,427 sats.satellite.EO-3            INFO       <346.00> EO-3: Target(tgt-92) window enabled: 748.7 to 839.9
2024-09-11 09:20:16,427 sats.satellite.EO-3            INFO       <346.00> EO-3: setting timed terminal event at 839.9
2024-09-11 09:20:16,441 sats.satellite.EO-1            INFO       <420.00> EO-1: imaged Target(tgt-787)
2024-09-11 09:20:16,443 data.base                      INFO       <420.00> Data reward: {'EO-1': 0.9450048844759824, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-11 09:20:16,445 sats.satellite.EO-1            INFO       <420.00> EO-1: Satellite EO-1 requires retasking
2024-09-11 09:20:16,449 gym                            INFO       <420.00> Step reward: {'EO-2': 0.0, 'EO-3': 0.0}
2024-09-11 09:20:16,450 gym                            INFO       <420.00> Episode terminated: {'EO-2': False, 'EO-3': False}
2024-09-11 09:20:16,450 gym                            INFO       <420.00> Episode truncated: {'EO-2': False, 'EO-3': False}
