# Multi-Agent Environments

Two multi-agent environments are provided in the package:

* [GeneralSatelliteTasking](../api_reference/index.rst#bsk_rl.GeneralSatelliteTasking), 
  a [Gymnasium](https://gymnasium.farama.org)-based environment and the basis for all other environments.
* [ConstellationTasking](../api_reference/index.rst#bsk_rl.ConstellationTasking), which
  implements the [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/).

The latter is preferable for multi-agent reinforcement learning (MARL) settings, as most
MARL algorithms are designed for this kind of API.

## Configuring the Environment

For this example, a multi-satellite target imaging environment will be used. The goal is
to maximize the value of unique images taken.

As usual, the satellite type is defined first.

In [1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"), 
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellite near-unlimited power and storage
resources, and to place the satellite in an 800 km orbit.

In [2]:

from bsk_rl.utils.orbital import random_orbit

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
    oe=lambda: random_orbit(alt=800),
)

## Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the
environment.

In [3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    log_level="INFO",
)
env.reset()

env.observation_space

2024-06-19 14:41:02,732 gym                            INFO       Resetting environment with seed=3846705016
2024-06-19 14:41:02,732 scene.targets                  INFO       Generating 1000 targets
2024-06-19 14:41:02,892 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-06-19 14:41:02,914 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-06-19 14:41:02,932 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-06-19 14:41:02,950 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-06-19 14:41:02,972 gym                            INFO       <0.00> Satellites requiring retasking: ['EO-1_5000043568', 'EO-2_5000045248', 'EO-3_11856148224']
2024-06-19 14:41:02,973 gym                            INFO       <0.00> Environment reset

Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))

In [4]:
env.action_space

Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a tuple with one entry per satellite. The step ends
as soon as any satellite completes its action.

In [5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])

2024-06-19 14:41:02,982 gym                            INFO       <0.00> === STARTING STEP ===
2024-06-19 14:41:02,983 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2024-06-19 14:41:02,983 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-80) tasked for imaging
2024-06-19 14:41:02,984 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-80) window enabled: 581.8 to 778.5
2024-06-19 14:41:02,984 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 778.5
2024-06-19 14:41:02,984 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2024-06-19 14:41:02,984 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-932) tasked for imaging
2024-06-19 14:41:02,985 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-932) window enabled: 584.7 to 600.0
2024-06-19 14:41:02,985 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 600.0
2024-06-19 14:41:02,986 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2024-06-19 14:41:02,986 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-285) tasked for imaging
2024-06-19 14:41:02,986 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-285) window enabled: 457.9 to 587.3
2024-06-19 14:41:02,987 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 587.3
2024-06-19 14:41:02,987 sim.simulator                  INFO       <0.00> Running simulation at most to 1000000000.00 seconds
2024-06-19 14:41:03,076 sats.satellite.EO-3            INFO       <460.00> EO-3: imaged Target(tgt-285)
2024-06-19 14:41:03,078 data.base                      INFO       <460.00> Data reward: {'EO-1_5000043568': 0.0, 'EO-2_5000045248': 0.0, 'EO-3_11856148224': 0.496731720382655}
2024-06-19 14:41:03,083 sats.satellite.EO-2            INFO       <460.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-06-19 14:41:03,102 sats.satellite.EO-2            INFO       <460.00> EO-2: Finding opportunity windows from 1200.00 to 1800.00 seconds
2024-06-19 14:41:03,124 sats.satellite.EO-3            INFO       <460.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-06-19 14:41:03,144 gym                            INFO       <460.00> Satellites requiring retasking: ['EO-3_11856148224']
2024-06-19 14:41:03,145 gym                            INFO       <460.00> Step reward: 0.496731720382655

In [6]:
observation

(array([ 0.5262301 , -0.03022015,  0.28402522, -0.03579063,  0.38350178,
        -0.03109852,  0.39765738,  0.02136911,  0.89579007, -0.00978359,
         0.40971075,  0.03062524,  0.60757276,  0.07549993,  0.82740313,
         0.06661879,  0.63714539,  0.05866202,  0.41172308,  0.05572379]),
 array([0.27863449, 0.01223435, 0.29001943, 0.01815353, 0.52494685,
        0.02187046, 0.02328922, 0.07482971, 0.86722742, 0.12100024,
        0.08566248, 0.12243016, 0.75679172, 0.12717974, 0.26400625,
        0.15545994, 0.27951209, 0.14590077, 0.42070714, 0.1443381 ]),
 array([ 0.05204218, -0.01767668,  0.46197972,  0.00982897,  0.39575618,
        -0.00939553,  0.24479972, -0.00428013,  0.83055036, -0.00454   ,
         0.34339089,  0.03947019,  0.78233513,  0.08417281,  0.67326847,
         0.08212403,  0.91057798,  0.12312927,  0.08304434,  0.11884015]))

At this point, either every satellite can be retasked, or satellites can continue their
previous action by passing `None` as the action. To see which satellites must be
retasked (i.e. their previous action is done and they have nothing more to do), look at
`info["requires_retasking"]`.

In [7]:
info["requires_retasking"]

['EO-3_11856148224']

Based on this list, only the satellite that requires retasking is given a new action here; the others continue their current actions.

In [8]:
actions = [None, None, None]
actions[int(info["requires_retasking"][0][3]) - 1] = 7
actions

[None, None, 7]
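
The index arithmetic above works only because each satellite name carries a single digit at a fixed position. A more general sketch follows, assuming the satellite IDs are available in the same order in which the satellites were passed to the environment. The `satellite_ids` list is copied from the reset log, and `build_actions` is a hypothetical helper, not part of bsk_rl.

```python
def build_actions(satellite_ids, requires_retasking, choose_action):
    """Return one action per satellite, using None to continue the current action."""
    needs_retasking = set(requires_retasking)
    return [
        choose_action(sat_id) if sat_id in needs_retasking else None
        for sat_id in satellite_ids
    ]


# IDs as they appear in the reset log above (assumed to match satellite order).
satellite_ids = ["EO-1_5000043568", "EO-2_5000045248", "EO-3_11856148224"]
actions = build_actions(
    satellite_ids,
    requires_retasking=["EO-3_11856148224"],
    choose_action=lambda sat_id: 7,  # placeholder policy: always pick target index 7
)
```

Matching on the full ID avoids parsing the name and still works when several satellites require retasking at once.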

In [9]:
observation, reward, terminated, truncated, info = env.step(actions)

2024-06-19 14:41:03,158 gym                            INFO       <460.00> === STARTING STEP ===
2024-06-19 14:41:03,159 sats.satellite.EO-3            INFO       <460.00> EO-3: target index 7 tasked
2024-06-19 14:41:03,159 sats.satellite.EO-3            INFO       <460.00> EO-3: Target(tgt-254) tasked for imaging
2024-06-19 14:41:03,160 sats.satellite.EO-3            INFO       <460.00> EO-3: Target(tgt-254) window enabled: 928.1 to 1117.9
2024-06-19 14:41:03,160 sats.satellite.EO-3            INFO       <460.00> EO-3: setting timed terminal event at 1117.9
2024-06-19 14:41:03,160 sim.simulator                  INFO       <460.00> Running simulation at most to 1000000460.00 seconds
2024-06-19 14:41:03,184 sats.satellite.EO-1            INFO       <584.00> EO-1: imaged Target(tgt-80)
2024-06-19 14:41:03,186 data.base                      INFO       <584.00> Data reward: {'EO-1_5000043568': 0.3976573836286119, 'EO-2_5000045248': 0.0, 'EO-3_11856148224': 0.0}
2024-06-19 14:41:03,189 sats.satellite.EO-1            INFO       <584.00> EO-1: Finding opportunity windows from 1200.00 to 1800.00 seconds
2024-06-19 14:41:03,215 gym                            INFO       <584.00> Satellites requiring retasking: ['EO-1_5000043568']
2024-06-19 14:41:03,215 gym                            INFO       <584.00> Step reward: 0.3976573836286119

In this environment, the episode terminates if any agent dies. To demonstrate this,
one satellite is forcibly killed.

In [10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2024-06-19 14:41:03,220 gym                            INFO       <584.00> === STARTING STEP ===
2024-06-19 14:41:03,220 sats.satellite.EO-1            INFO       <584.00> EO-1: target index 6 tasked
2024-06-19 14:41:03,220 sats.satellite.EO-1            INFO       <584.00> EO-1: Target(tgt-339) tasked for imaging
2024-06-19 14:41:03,221 sats.satellite.EO-1            INFO       <584.00> EO-1: Target(tgt-339) window enabled: 797.1 to 1005.4
2024-06-19 14:41:03,221 sats.satellite.EO-1            INFO       <584.00> EO-1: setting timed terminal event at 1005.4
2024-06-19 14:41:03,221 sats.satellite.EO-2            INFO       <584.00> EO-2: target index 7 tasked
2024-06-19 14:41:03,221 sats.satellite.EO-2            INFO       <584.00> EO-2: Target(tgt-312) tasked for imaging
2024-06-19 14:41:03,222 sats.satellite.EO-2            INFO       <584.00> EO-2: Target(tgt-312) window enabled: 1346.1 to 1472.2
2024-06-19 14:41:03,222 sats.satellite.EO-2            INFO       <584.00> EO-2: setting timed terminal event at 1472.2
2024-06-19 14:41:03,222 sats.satellite.EO-3            INFO       <584.00> EO-3: target index 9 tasked
2024-06-19 14:41:03,223 sats.satellite.EO-3            INFO       <584.00> EO-3: Target(tgt-774) tasked for imaging
2024-06-19 14:41:03,223 sats.satellite.EO-3            INFO       <584.00> EO-3: Target(tgt-774) window enabled: 1189.2 to 1200.0
2024-06-19 14:41:03,223 sats.satellite.EO-3            INFO       <584.00> EO-3: setting timed terminal event at 1200.0
2024-06-19 14:41:03,224 sim.simulator                  INFO       <584.00> Running simulation at most to 1000000584.00 seconds
2024-06-19 14:41:03,265 sats.satellite.EO-1            INFO       <800.00> EO-1: imaged Target(tgt-339)
2024-06-19 14:41:03,267 data.base                      INFO       <800.00> Data reward: {'EO-1_5000043568': 0.7302026226885143, 'EO-2_5000045248': 0.0, 'EO-3_11856148224': 0.0}
2024-06-19 14:41:03,271 sats.satellite.EO-3            INFO       <800.00> EO-3: Finding opportunity windows from 1200.00 to 1800.00 seconds
2024-06-19 14:41:03,294 sats.satellite.EO-1            INFO       <800.00> EO-1: failed battery_valid check
2024-06-19 14:41:03,295 gym                            INFO       <800.00> Step reward: -0.2697973773114857
2024-06-19 14:41:03,295 gym                            INFO       <800.00> Episode terminated: True
2024-06-19 14:41:03,295 gym                            INFO       <800.00> Episode truncated: False
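
The step reward in this log is negative even though EO-1 collected a positive data reward. The short check below works through the arithmetic using the two logged values. The numbers are consistent with a penalty of about -1.0 being added when a satellite fails, but that value is inferred from the log rather than stated in this document.

```python
import math

# Values copied from the log of the final step above.
data_reward_total = 0.7302026226885143 + 0.0 + 0.0  # EO-1 + EO-2 + EO-3
step_reward_logged = -0.2697973773114857

# Whatever was added on top of the data reward during this step.
implied_penalty = step_reward_logged - data_reward_total

# The difference matches -1.0 to within floating-point error.
penalty_is_minus_one = math.isclose(implied_penalty, -1.0, abs_tol=1e-12)
```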


## PettingZoo API

The [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/) environment,
ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo
documentation for a full description of the API. It generally organizes inputs and
outputs as dictionaries keyed by agent, rather than as tuples.

In [11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    log_level="INFO",
)
env.reset()

env.observation_spaces

2024-06-19 14:41:03,504 gym                            INFO       Resetting environment with seed=323861720
2024-06-19 14:41:03,504 scene.targets                  INFO       Generating 1000 targets
2024-06-19 14:41:03,659 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-06-19 14:41:03,685 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-06-19 14:41:03,708 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-06-19 14:41:03,726 gym                            INFO       <0.00> Satellites requiring retasking: ['EO-1_5000044240', 'EO-2_11858113248', 'EO-3_11858112672']
2024-06-19 14:41:03,728 gym                            INFO       <0.00> Environment reset

{'EO-1_5000044240': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2_11858113248': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3_11858112672': Box(-1e+16, 1e+16, (20,), float64)}

In [12]:
env.action_spaces

{'EO-1_5000044240': Discrete(10),
 'EO-2_11858113248': Discrete(10),
 'EO-3_11858112672': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the `agents`
property.

In [13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)

2024-06-19 14:41:03,736 gym                            INFO       <0.00> === STARTING STEP ===
2024-06-19 14:41:03,736 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2024-06-19 14:41:03,737 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-576) tasked for imaging
2024-06-19 14:41:03,737 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-576) window enabled: 244.2 to 457.0
2024-06-19 14:41:03,738 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 457.0
2024-06-19 14:41:03,738 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2024-06-19 14:41:03,738 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-44) tasked for imaging
2024-06-19 14:41:03,739 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-44) window enabled: 356.6 to 568.8
2024-06-19 14:41:03,739 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 568.8
2024-06-19 14:41:03,739 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2024-06-19 14:41:03,739 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-362) tasked for imaging
2024-06-19 14:41:03,740 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-362) window enabled: 467.2 to 512.6
2024-06-19 14:41:03,740 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 512.6
2024-06-19 14:41:03,740 sim.simulator                  INFO       <0.00> Running simulation at most to 1000000000.00 seconds
2024-06-19 14:41:03,790 sats.satellite.EO-1            INFO       <247.00> EO-1: imaged Target(tgt-576)
2024-06-19 14:41:03,792 data.base                      INFO       <247.00> Data reward: {'EO-1_5000044240': 0.09602657711509033, 'EO-2_11858113248': 0.0, 'EO-3_11858112672': 0.0}
2024-06-19 14:41:03,797 sats.satellite.EO-3            INFO       <247.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-06-19 14:41:03,820 gym                            INFO       <247.00> Satellites requiring retasking: ['EO-1_5000044240']
2024-06-19 14:41:03,821 gym                            INFO       <247.00> Step reward: {'EO-1_5000044240': 0.09602657711509033, 'EO-2_11858113248': 0.0, 'EO-3_11858112672': 0.0}
2024-06-19 14:41:03,821 gym                            INFO       <247.00> Episode terminated: {'EO-1_5000044240': False, 'EO-2_11858113248': False, 'EO-3_11858112672': False}
2024-06-19 14:41:03,822 gym                            INFO       <247.00> Episode truncated: {'EO-1_5000044240': False, 'EO-2_11858113248': False, 'EO-3_11858112672': False}

In [14]:
observation

{'EO-1_5000044240': array([ 0.56920273, -0.02151521,  0.2297232 ,  0.00552812,  0.78461514,
        -0.01913435,  0.27567832,  0.00218598,  0.63697124,  0.01513787,
         0.39431237,  0.01633222,  0.49897218,  0.03226431,  0.10961557,
         0.01170529,  0.53807617,  0.05884056,  0.79886864,  0.02951799]),
 'EO-2_11858113248': array([ 0.44960104, -0.03470313,  0.30476257, -0.018098  ,  0.64042319,
         0.01451007,  0.89693457,  0.01316759,  0.10734452,  0.00934328,
         0.5349646 ,  0.0192207 ,  0.78408702,  0.0428556 ,  0.200825  ,
         0.05328849,  0.99644682,  0.03574838,  0.65767339,  0.06120948]),
 'EO-3_11858112672': array([ 0.5304967 , -0.03082372,  0.83496702, -0.00682569,  0.31186574,
        -0.00212665,  0.79837208,  0.00735195,  0.91402653,  0.03863149,
         0.13639769,  0.04140865,  0.17595261,  0.07536455,  0.51064093,
         0.06956332,  0.64879405,  0.06676183,  0.59261791,  0.07797086])}

Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API
is that it allows for individual agents to fail without terminating the entire environment.

In [15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents

['EO-2_11858113248', 'EO-3_11858112672']

In [16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)

2024-06-19 14:41:03,832 gym                            INFO       <247.00> === STARTING STEP ===
2024-06-19 14:41:03,833 sats.satellite.EO-2            INFO       <247.00> EO-2: target index 7 tasked
2024-06-19 14:41:03,833 sats.satellite.EO-2            INFO       <247.00> EO-2: Target(tgt-464) tasked for imaging
2024-06-19 14:41:03,834 sats.satellite.EO-2            INFO       <247.00> EO-2: Target(tgt-464) window enabled: 550.7 to 600.0
2024-06-19 14:41:03,834 sats.satellite.EO-2            INFO       <247.00> EO-2: setting timed terminal event at 600.0
2024-06-19 14:41:03,835 sats.satellite.EO-3            INFO       <247.00> EO-3: target index 9 tasked
2024-06-19 14:41:03,835 sats.satellite.EO-3            INFO       <247.00> EO-3: Target(tgt-781) tasked for imaging
2024-06-19 14:41:03,835 sats.satellite.EO-3            INFO       <247.00> EO-3: Target(tgt-781) window enabled: 691.4 to 904.3
2024-06-19 14:41:03,835 sats.satellite.EO-3            INFO       <247.00> EO-3: setting timed terminal event at 904.3
2024-06-19 14:41:03,836 sim.simulator                  INFO       <247.00> Running simulation at most to 1000000247.00 seconds
2024-06-19 14:41:03,877 sats.satellite.EO-1            INFO       <457.00> EO-1: timed termination at 457.0 for Target(tgt-576) window
2024-06-19 14:41:03,879 data.base                      INFO       <457.00> Data reward: {'EO-1_5000044240': 0.0, 'EO-2_11858113248': 0.0, 'EO-3_11858112672': 0.0}
2024-06-19 14:41:03,884 sats.satellite.EO-2            INFO       <457.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-06-19 14:41:03,909 gym                            INFO       <457.00> Step reward: {'EO-2_11858113248': 0.0, 'EO-3_11858112672': 0.0}
2024-06-19 14:41:03,910 gym                            INFO       <457.00> Episode terminated: {'EO-2_11858113248': False, 'EO-3_11858112672': False}
2024-06-19 14:41:03,910 gym                            INFO       <457.00> Episode truncated: {'EO-2_11858113248': False, 'EO-3_11858112672': False}
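
Because agents can drop out individually, a rollout loop over a parallel-API environment should rebuild the action dictionary from `env.agents` at every step. The following is a minimal sketch of such a loop. `run_episode` and `StubParallelEnv` are hypothetical names introduced only for illustration; the stub stands in for ConstellationTasking so the example runs without a simulator, and it mimics only the parts of the PettingZoo parallel API used here (`reset`, `step`, `agents`).

```python
def run_episode(env, policy, max_steps=100):
    """Step a parallel-API env until no agents remain; return summed rewards."""
    observations, infos = env.reset()
    totals = {}
    for _ in range(max_steps):
        if not env.agents:
            break
        # Only live agents receive actions; dead agents are no longer listed.
        actions = {agent: policy(agent, observations[agent]) for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, reward in rewards.items():
            totals[agent] = totals.get(agent, 0.0) + reward
    return totals


class StubParallelEnv:
    """Toy stand-in: 'sat-1' dies after the first step, 'sat-2' after the third."""

    def __init__(self):
        self.possible_agents = ["sat-1", "sat-2"]
        self.agents = []
        self._t = 0

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        self._t = 0
        return {a: 0 for a in self.agents}, {a: {} for a in self.agents}

    def step(self, actions):
        self._t += 1
        acting = list(self.agents)
        rewards = {a: 1.0 for a in acting}
        terminations = {a: a == "sat-1" or self._t >= 3 for a in acting}
        truncations = {a: False for a in acting}
        # Terminated agents are removed from the live-agent list.
        self.agents = [a for a in acting if not terminations[a]]
        observations = {a: self._t for a in acting}
        return observations, rewards, terminations, truncations, {a: {} for a in acting}


totals = run_episode(StubParallelEnv(), policy=lambda agent, obs: 0)
```

With the real environment, the same loop shape would apply, with the placeholder policy replaced by a trained or heuristic one.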
