# Multi-Agent Environments

Two multi-agent environments are provided in the package:

* [GeneralSatelliteTasking](../api_reference/index.rst#bsk_rl.GeneralSatelliteTasking), 
  a [Gymnasium](https://gymnasium.farama.org)-based environment and the basis for all other environments.
* [ConstellationTasking](../api_reference/index.rst#bsk_rl.ConstellationTasking), which
  implements the [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/).

The latter is preferable for multi-agent RL (MARL) settings, as most MARL algorithms are
designed around this kind of API.

## Configuring the Environment

For this example, a multi-satellite target imaging environment is used. The goal is
to maximize the value of unique images taken.

As usual, the satellite type is defined first.

In [1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"), 
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give each satellite near-unlimited power and storage
resources, and to place it in an orbit at 800 km altitude.

In [2]:

from bsk_rl.utils.orbital import random_orbit

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
    oe=lambda: random_orbit(alt=800),
)
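
Note that `oe` is given as a `lambda` rather than as a fixed value, which suggests the orbit is meant to be drawn anew each time the arguments are evaluated instead of being fixed once. The sketch below illustrates that general pattern in plain Python. It is not the package's implementation: `resolve_args`, `example_args`, and `altitude_km` are hypothetical names used only for illustration.

```python
import random


def resolve_args(args):
    """Call every callable value to draw a fresh parameter; copy the rest unchanged."""
    return {key: value() if callable(value) else value for key, value in args.items()}


rng = random.Random(0)  # seeded so the sketch is repeatable

example_args = dict(
    u_max=0.4,                                  # fixed value, reused as-is
    altitude_km=lambda: rng.uniform(500, 800),  # hypothetical randomized value
)

# Each call re-evaluates the callables, so the two draws differ.
episode_1 = resolve_args(example_args)
episode_2 = resolve_args(example_args)
print(episode_1["u_max"], episode_2["u_max"])
print(episode_1["altitude_km"], episode_2["altitude_km"])
```

Fixed entries stay identical across evaluations, while callable entries yield a new sample each time.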

## Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the
environment.

In [3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    log_level="INFO",
)
env.reset()

env.observation_space

2024-05-31 16:24:49,395 gym INFO Resetting environment with seed=3731917456
2024-05-31 16:24:49,395 scene.targets INFO Generating 1000 targets
2024-05-31 16:24:49,559 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-05-31 16:24:49,581 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-05-31 16:24:49,604 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-05-31 16:24:49,628 gym INFO <0.00> Satellites requiring retasking: ['EO-1_4594229376', 'EO-2_4594228272', 'EO-3_11383596672']
2024-05-31 16:24:49,628 gym INFO <0.00> Environment reset

Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))

In [4]:
env.action_space

Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a tuple (or list) with one entry per satellite, in
the order the satellites were given to the environment. The step ends the first time any
satellite completes its action.

In [5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])

2024-05-31 16:24:49,638 gym INFO <0.00> === STARTING STEP ===
2024-05-31 16:24:49,638 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2024-05-31 16:24:49,638 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-48) tasked for imaging
2024-05-31 16:24:49,639 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-48) window enabled: 128.7 to 311.2
2024-05-31 16:24:49,640 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 311.2
2024-05-31 16:24:49,640 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2024-05-31 16:24:49,640 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-358) tasked for imaging
2024-05-31 16:24:49,641 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-358) window enabled: 274.0 to 465.7
2024-05-31 16:24:49,641 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 465.7
2024-05-31 16:24:49,641 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2024-05-31 16:24:49,641 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-492) tasked for imaging
2024-05-31 16:24:49,642 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-492) window enabled: 360.4 to 445.8
2024-05-31 16:24:49,642 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 445.8
2024-05-31 16:24:49,643 sim.simulator INFO <0.00> Running simulation at most to 1000000000.00 seconds
2024-05-31 16:24:49,670 sats.satellite.EO-1 INFO <131.00> EO-1: imaged Target(tgt-48)
2024-05-31 16:24:49,672 data.base INFO <131.00> Data reward: {'EO-1_4594229376': 0.8490970043045871, 'EO-2_4594228272': 0.0, 'EO-3_11383596672': 0.0}
2024-05-31 16:24:49,677 gym INFO <131.00> Satellites requiring retasking: ['EO-1_4594229376']
2024-05-31 16:24:49,677 gym INFO <131.00> Step reward: 0.8490970043045871

In [6]:
observation

(array([ 0.92952453, -0.02171651,  0.09485104, -0.01888697,  0.69626008,
        -0.01323879,  0.0592699 , -0.01956626,  0.31841486,  0.02726043,
         0.26016688,  0.03952533,  0.32174418,  0.04452711,  0.60532024,
         0.05192149,  0.64168396,  0.05674144,  0.59718288,  0.04868392]),
 array([ 0.46677668, -0.02298246,  0.26183512, -0.00185962,  0.12937522,
        -0.00525075,  0.31902371,  0.00385037,  0.22778555,  0.00161488,
         0.94978415,  0.01525485,  0.83953352,  0.01127847,  0.54275875,
         0.02438937,  0.46276273,  0.02508408,  0.38153857,  0.02664706]),
 array([ 0.52066306, -0.02298246,  0.95595429, -0.01953964,  0.90617328,
        -0.00112268,  0.78384523,  0.01969529,  0.13597152,  0.01157903,
         0.63195448,  0.0099964 ,  0.0967253 ,  0.04024514,  0.21411318,
         0.03755702,  0.6472315 ,  0.04509715,  0.77396428,  0.0801271 ]))

At this point, either every satellite can be retasked, or satellites can continue their
previous action by passing `None` as the action. To see which satellites must be
retasked (i.e. their previous action is done and they have nothing more to do), look at
`info["requires_retasking"]`.

In [7]:
info["requires_retasking"]

['EO-1_4594229376']

Based on this list, we decide here to only retask the satellite that needs it.

In [8]:
actions = [None, None, None]
# Names look like "EO-1_<id>", so character 3 is the 1-based satellite number.
actions[int(info["requires_retasking"][0][3]) - 1] = 7
actions

[7, None, None]
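
Indexing into the name string works for this three-satellite example but would break for names of a different shape or for more than nine satellites. A slightly more general approach, sketched below, matches full names against the retasking list. `build_actions` and `choose_action` are hypothetical helpers, not part of the package, and the names are copied from the log output above.

```python
def build_actions(satellite_names, requires_retasking, choose_action):
    """Return one action per satellite; None means "continue the current action"."""
    needs_action = set(requires_retasking)
    return [
        choose_action(name) if name in needs_action else None
        for name in satellite_names
    ]


# Names in the same order as the satellites were given to the environment.
satellite_names = ["EO-1_4594229376", "EO-2_4594228272", "EO-3_11383596672"]
requires_retasking = ["EO-1_4594229376"]  # stands in for info["requires_retasking"]

# A trivial policy that always picks target index 7.
actions = build_actions(satellite_names, requires_retasking, lambda name: 7)
print(actions)  # [7, None, None]
```

Here `choose_action` stands in for whatever policy selects an action for a given satellite.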

In [9]:
observation, reward, terminated, truncated, info = env.step(actions)

2024-05-31 16:24:49,690 gym INFO <131.00> === STARTING STEP ===
2024-05-31 16:24:49,691 sats.satellite.EO-1 INFO <131.00> EO-1: target index 7 tasked
2024-05-31 16:24:49,691 sats.satellite.EO-1 INFO <131.00> EO-1: Target(tgt-565) tasked for imaging
2024-05-31 16:24:49,692 sats.satellite.EO-1 INFO <131.00> EO-1: Target(tgt-565) window enabled: 427.0 to 596.3
2024-05-31 16:24:49,692 sats.satellite.EO-1 INFO <131.00> EO-1: setting timed terminal event at 596.3
2024-05-31 16:24:49,692 sim.simulator INFO <131.00> Running simulation at most to 1000000131.00 seconds
2024-05-31 16:24:49,721 sats.satellite.EO-2 INFO <276.00> EO-2: imaged Target(tgt-358)
2024-05-31 16:24:49,722 data.base INFO <276.00> Data reward: {'EO-1_4594229376': 0.0, 'EO-2_4594228272': 0.46276272986490175, 'EO-3_11383596672': 0.0}
2024-05-31 16:24:49,726 sats.satellite.EO-1 INFO <276.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-05-31 16:24:49,752 sats.satellite.EO-2 INFO <276.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-05-31 16:24:49,776 gym INFO <276.00> Satellites requiring retasking: ['EO-2_4594228272']
2024-05-31 16:24:49,776 gym INFO <276.00> Step reward: 0.46276272986490175

With this API, the whole episode terminates if any single agent dies. To demonstrate
this, one satellite is forcibly killed.
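
In the step that follows, the log reports a data reward of about 0.5398 for EO-2 but a negative step reward of about -0.4602. The difference is consistent with a penalty of -1.0 being added for the failed satellite. That penalty value is an inference from the logged numbers, not something stated on this page, and `FAILURE_PENALTY` is a hypothetical name. The arithmetic can be checked as follows:

```python
import math

# Per-satellite data reward, copied from the log of the step below.
data_reward = {
    "EO-1_4594229376": 0.0,
    "EO-2_4594228272": 0.539840444868018,
    "EO-3_11383596672": 0.0,
}

FAILURE_PENALTY = -1.0  # assumed value, inferred from the logged step reward
n_failed = 1            # EO-1 fails its battery check in this step

step_reward = sum(data_reward.values()) + n_failed * FAILURE_PENALTY
print(f"{step_reward:.4f}")  # -0.4602

# Matches the logged step reward to floating-point precision.
assert math.isclose(step_reward, -0.46015955513198203, rel_tol=1e-12)
```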

In [10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2024-05-31 16:24:49,781 gym INFO <276.00> === STARTING STEP ===
2024-05-31 16:24:49,781 sats.satellite.EO-1 INFO <276.00> EO-1: target index 6 tasked
2024-05-31 16:24:49,781 sats.satellite.EO-1 INFO <276.00> EO-1: Target(tgt-266) tasked for imaging
2024-05-31 16:24:49,782 sats.satellite.EO-1 INFO <276.00> EO-1: Target(tgt-266) window enabled: 503.7 to 672.8
2024-05-31 16:24:49,782 sats.satellite.EO-1 INFO <276.00> EO-1: setting timed terminal event at 672.8
2024-05-31 16:24:49,783 sats.satellite.EO-2 INFO <276.00> EO-2: target index 7 tasked
2024-05-31 16:24:49,783 sats.satellite.EO-2 INFO <276.00> EO-2: Target(tgt-506) tasked for imaging
2024-05-31 16:24:49,784 sats.satellite.EO-2 INFO <276.00> EO-2: Target(tgt-506) window enabled: 425.6 to 618.7
2024-05-31 16:24:49,784 sats.satellite.EO-2 INFO <276.00> EO-2: setting timed terminal event at 618.7
2024-05-31 16:24:49,784 sats.satellite.EO-3 INFO <276.00> EO-3: target index 9 tasked
2024-05-31 16:24:49,784 sats.satellite.EO-3 INFO <276.00> EO-3: Target(tgt-739) tasked for imaging
2024-05-31 16:24:49,785 sats.satellite.EO-3 INFO <276.00> EO-3: Target(tgt-739) window enabled: 453.6 to 600.0
2024-05-31 16:24:49,785 sats.satellite.EO-3 INFO <276.00> EO-3: setting timed terminal event at 600.0
2024-05-31 16:24:49,785 sim.simulator INFO <276.00> Running simulation at most to 1000000276.00 seconds
2024-05-31 16:24:49,816 sats.satellite.EO-2 INFO <428.00> EO-2: imaged Target(tgt-506)
2024-05-31 16:24:49,818 data.base INFO <428.00> Data reward: {'EO-1_4594229376': 0.0, 'EO-2_4594228272': 0.539840444868018, 'EO-3_11383596672': 0.0}
2024-05-31 16:24:49,822 sats.satellite.EO-3 INFO <428.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-05-31 16:24:49,846 sats.satellite.EO-1 INFO <428.00> EO-1: failed battery_valid check
2024-05-31 16:24:49,847 gym INFO <428.00> Satellites requiring retasking: ['EO-2_4594228272']
2024-05-31 16:24:49,847 gym INFO <428.00> Step reward: -0.46015955513198203
2024-05-31 16:24:49,847 gym INFO <428.00> Episode terminated: True
2024-05-31 16:24:49,847 gym INFO <428.00> Episode truncated: False

## PettingZoo API

The [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/) environment,
ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo
documentation for a full description of the API. Spaces, actions, observations, rewards,
and termination flags are organized into dictionaries keyed by agent name rather than
tuples.

In [11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    log_level="INFO",
)
env.reset()

env.observation_spaces

2024-05-31 16:24:50,058 gym INFO Resetting environment with seed=100802160
2024-05-31 16:24:50,059 scene.targets INFO Generating 1000 targets
2024-05-31 16:24:50,214 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-05-31 16:24:50,235 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-05-31 16:24:50,256 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-05-31 16:24:50,280 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-05-31 16:24:50,301 gym INFO <0.00> Satellites requiring retasking: ['EO-1_11385528784', 'EO-2_11385529888', 'EO-3_11385529456']
2024-05-31 16:24:50,302 gym INFO <0.00> Environment reset

{'EO-1_11385528784': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2_11385529888': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3_11385529456': Box(-1e+16, 1e+16, (20,), float64)}

In [12]:
env.action_spaces

{'EO-1_11385528784': Discrete(10),
 'EO-2_11385529888': Discrete(10),
 'EO-3_11385529456': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the `agents`
property.

In [13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)

2024-05-31 16:24:50,311 gym INFO <0.00> === STARTING STEP ===
2024-05-31 16:24:50,312 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2024-05-31 16:24:50,312 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-225) tasked for imaging
2024-05-31 16:24:50,313 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-225) window enabled: 549.0 to 711.2
2024-05-31 16:24:50,313 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 711.2
2024-05-31 16:24:50,313 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2024-05-31 16:24:50,313 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-84) tasked for imaging
2024-05-31 16:24:50,314 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-84) window enabled: 280.2 to 446.0
2024-05-31 16:24:50,314 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 446.0
2024-05-31 16:24:50,314 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2024-05-31 16:24:50,315 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-905) tasked for imaging
2024-05-31 16:24:50,315 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-905) window enabled: 410.5 to 600.0
2024-05-31 16:24:50,315 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 600.0
2024-05-31 16:24:50,316 sim.simulator INFO <0.00> Running simulation at most to 1000000000.00 seconds
2024-05-31 16:24:50,373 sats.satellite.EO-2 INFO <283.00> EO-2: imaged Target(tgt-84)
2024-05-31 16:24:50,375 data.base INFO <283.00> Data reward: {'EO-1_11385528784': 0.0, 'EO-2_11385529888': 0.1736317989411481, 'EO-3_11385529456': 0.0}
2024-05-31 16:24:50,382 sats.satellite.EO-3 INFO <283.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-05-31 16:24:50,412 gym INFO <283.00> Satellites requiring retasking: ['EO-2_11385529888']
2024-05-31 16:24:50,413 gym INFO <283.00> Step reward: {'EO-1_11385528784': 0.0, 'EO-2_11385529888': 0.1736317989411481, 'EO-3_11385529456': 0.0}
2024-05-31 16:24:50,413 gym INFO <283.00> Episode terminated: {'EO-1_11385528784': False, 'EO-2_11385529888': False, 'EO-3_11385529456': False}
2024-05-31 16:24:50,413 gym INFO <283.00> Episode truncated: {'EO-1_11385528784': False, 'EO-2_11385529888': False, 'EO-3_11385529456': False}

In [14]:
observation

{'EO-1_11385528784': array([ 0.60338287, -0.03011611,  0.87323203,  0.00419951,  0.80172144,
         0.00750755,  0.37153971,  0.04032373,  0.61646756,  0.04666259,
         0.48161493,  0.04785926,  0.47384484,  0.07925763,  0.0035083 ,
         0.07716695,  0.00291903,  0.08894771,  0.50867957,  0.12433105]),
 'EO-2_11385529888': array([ 0.63433108, -0.03101471,  0.23779057, -0.01630323,  0.58209618,
        -0.02437096,  0.50745344, -0.02125649,  0.96829724, -0.01826611,
         0.9566993 , -0.00644389,  0.59672177,  0.01027947,  0.36671403,
         0.03685701,  0.84354774,  0.01354962,  0.41031534,  0.04520229]),
 'EO-3_11385529456': array([ 0.18204768, -0.01113924,  0.29690295,  0.00256507,  0.76019462,
         0.02354056,  0.53067777,  0.03641876,  0.32337915,  0.04132224,
         0.27812719,  0.023271  ,  0.96258472,  0.02236515,  0.05312964,
         0.03625353,  0.99471926,  0.07202423,  0.80033185,  0.05908512])}

Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API
is that it allows for individual agents to fail without terminating the entire environment.

In [15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents

['EO-2_11385529888', 'EO-3_11385529456']
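
Because dead agents drop out of `env.agents`, a rollout loop can rebuild the action dictionary from the live-agent list on every step. The sketch below shows that loop shape against a toy stand-in so it runs without a simulator. `ToyParallelEnv` and its `die_at` argument are hypothetical and only mimic the shape of the parallel API; they say nothing about how ConstellationTasking is implemented.

```python
import random


class ToyParallelEnv:
    """Hypothetical stand-in mimicking the shape of the PettingZoo parallel API."""

    def __init__(self, agent_names, die_at, max_steps=5):
        self.agents = list(agent_names)
        self._die_at = dict(die_at)  # agent name -> step at which it terminates
        self._max_steps = max_steps
        self._step = 0

    def step(self, actions):
        assert set(actions) == set(self.agents), "expected one action per live agent"
        self._step += 1
        acting = list(self.agents)
        observations = {a: [0.0] for a in acting}
        rewards = {a: 0.0 for a in acting}
        terminations = {a: self._die_at.get(a) == self._step for a in acting}
        truncations = {a: self._step >= self._max_steps for a in acting}
        infos = {a: {} for a in acting}
        # Finished agents are removed from the live-agent list.
        self.agents = [a for a in acting if not (terminations[a] or truncations[a])]
        return observations, rewards, terminations, truncations, infos


rng = random.Random(0)
toy_env = ToyParallelEnv(["EO-1", "EO-2", "EO-3"], die_at={"EO-1": 2})

history = []  # which agents acted at each step
while toy_env.agents:
    # Only live agents receive an action.
    actions = {agent: rng.randrange(10) for agent in toy_env.agents}
    observations, rewards, terminations, truncations, infos = toy_env.step(actions)
    history.append(sorted(actions))

for step_index, acted in enumerate(history, start=1):
    print(step_index, acted)
```

In the toy run, EO-1 terminates during the second step and the remaining two agents continue until the step limit truncates the episode.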

In [16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)

2024-05-31 16:24:50,424 gym INFO <283.00> === STARTING STEP ===
2024-05-31 16:24:50,425 sats.satellite.EO-2 INFO <283.00> EO-2: target index 7 tasked
2024-05-31 16:24:50,425 sats.satellite.EO-2 INFO <283.00> EO-2: Target(tgt-420) tasked for imaging
2024-05-31 16:24:50,426 sats.satellite.EO-2 INFO <283.00> EO-2: Target(tgt-420) window enabled: 493.1 to 549.7
2024-05-31 16:24:50,426 sats.satellite.EO-2 INFO <283.00> EO-2: setting timed terminal event at 549.7
2024-05-31 16:24:50,426 sats.satellite.EO-3 INFO <283.00> EO-3: target index 9 tasked
2024-05-31 16:24:50,427 sats.satellite.EO-3 INFO <283.00> EO-3: Target(tgt-405) tasked for imaging
2024-05-31 16:24:50,427 sats.satellite.EO-3 INFO <283.00> EO-3: Target(tgt-405) window enabled: 619.8 to 745.3
2024-05-31 16:24:50,427 sats.satellite.EO-3 INFO <283.00> EO-3: setting timed terminal event at 745.3
2024-05-31 16:24:50,428 sim.simulator INFO <283.00> Running simulation at most to 1000000283.00 seconds
2024-05-31 16:24:50,470 sats.satellite.EO-2 INFO <496.00> EO-2: imaged Target(tgt-420)
2024-05-31 16:24:50,472 data.base INFO <496.00> Data reward: {'EO-1_11385528784': 0.0, 'EO-2_11385529888': 0.3667140288589458, 'EO-3_11385529456': 0.0}
2024-05-31 16:24:50,477 sats.satellite.EO-2 INFO <496.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-05-31 16:24:50,505 gym INFO <496.00> Satellites requiring retasking: ['EO-2_11385529888']
2024-05-31 16:24:50,506 gym INFO <496.00> Step reward: {'EO-2_11385529888': 0.3667140288589458, 'EO-3_11385529456': 0.0}
2024-05-31 16:24:50,506 gym INFO <496.00> Episode terminated: {'EO-2_11385529888': False, 'EO-3_11385529456': False}
2024-05-31 16:24:50,506 gym INFO <496.00> Episode truncated: {'EO-2_11385529888': False, 'EO-3_11385529456': False}
