# Action Masking

© Crown-owned copyright 2025, Defence Science and Technology Laboratory UK

PrimAITE environments support action masking. The action mask indicates which of the agent's actions are applicable in the current environment state. For example, a node can only be turned on if it is currently turned off. Refer to the action masking configuration page of the user guide for more information.
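As a toy illustration of the idea (not PrimAITE's implementation), the sketch below derives a mask from a single node's power state. The action names and the `node_is_on` flag are hypothetical and chosen only to mirror the example in the paragraph above.

```python
# Toy illustration only: the action names and the state flag are hypothetical,
# not PrimAITE's real action map.
ACTIONS = ["do-nothing", "node-startup", "node-shutdown"]


def compute_mask(node_is_on: bool) -> list[bool]:
    """Return one boolean per action: True if the action is currently applicable."""
    return [
        True,            # do-nothing is treated as always applicable in this toy
        not node_is_on,  # a node can only be turned on if it is currently off
        node_is_on,      # a node can only be turned off if it is currently on
    ]


print(compute_mask(node_is_on=False))  # prints [True, True, False]
```

The mask has one entry per action, in the same order as the action list, so it can be zipped against the actions directly.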

In [1]:
!primaite setup

2025-03-24 09:50:34,098: Performing the PrimAITE first-time setup...
2025-03-24 09:50:34,099: Building the PrimAITE app directories...
2025-03-24 09:50:34,099: Building primaite_config.yaml...
2025-03-24 09:50:34,099: Rebuilding the demo notebooks...
/home/runner/primaite/4.0.0/notebooks/example_notebooks/Data-Manipulation-Customising-Red-Agent.ipynb
2025-03-24 09:50:34,100: Reset example notebook: /home/runner/primaite/4.0.0/notebooks/example_notebooks/Data-Manipulation-Customising-Red-Agent.ipynb
/home/runner/primaite/4.0.0/notebooks/example_notebooks/Command-and-Control-E2E-Demonstration.ipynb
2025-03-24 09:50:34,100: Reset example notebook: /home/runner/primaite/4.0.0/notebooks/example_notebooks/Command-and-Control-E2E-Demonstration.ipynb
/home/runner/primaite/4.0.0/notebooks/example_notebooks/Data-Manipulation-E2E-Demonstration.ipynb
2025-03-24 09:50:34,100: Reset example notebook: /home/runner/primaite/4.0.0/notebooks/example_notebooks/Data-Manipulation-E2E-Demonstration

In [2]:
from primaite.session.environment import PrimaiteGymEnv
from primaite.config.load import data_manipulation_config_path
from prettytable import PrettyTable

In [3]:
env = PrimaiteGymEnv(data_manipulation_config_path())
env.action_masking = True

2025-03-24 09:50:37,711: PrimaiteGymEnv RNG seed = None


The action mask is a list of booleans that specifies whether each action in the agent's action map is currently possible. The following cell prints every action alongside its current mask value:

In [4]:
act_table = PrettyTable(("number", "action", "parameters", "mask"))
mask = env.action_masks()
actions = env.agent.action_manager.action_map
max_str_len = 70
for act, mask_value in zip(actions.items(), mask):  # separate name avoids shadowing `mask`
    act_num, act_data = act
    act_type, act_params = act_data
    # Truncate long parameter strings so the table stays readable.
    act_params = s if len(s := str(act_params)) < max_str_len else f"{s[:max_str_len-3]}..."
    act_table.add_row((act_num, act_type, act_params, mask_value))
print(act_table)

+--------+------------------------+------------------------------------------------------------------------+------+
| number |         action         |                               parameters                               | mask |
+--------+------------------------+------------------------------------------------------------------------+------+
|   0    |       do-nothing       |                                   {}                                   |  1   |
|   1    |   node-service-scan    |       {'node_name': 'web_server', 'service_name': 'web-server'}        |  1   |
|   2    |   node-service-stop    |       {'node_name': 'web_server', 'service_name': 'web-server'}        |  1   |
|   3    |   node-service-start   |       {'node_name': 'web_server', 'service_name': 'web-server'}        |  0   |
|   4    |   node-service-pause   |       {'node_name': 'web_server', 'service_name': 'web-server'}        |  1   |
|   5    |  node-service-resume   |       {'node_name': 'web_server', 's
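To show how such a mask might be consumed, here is a minimal sketch that samples a random action from only the entries the mask allows. The mask values are copied from rows 0 to 4 of the table above; the helper `sample_valid_action` is hypothetical and not part of PrimAITE.

```python
import random

# Mask values for actions 0-4 as shown in the table
# (1 = currently applicable, 0 = not applicable).
example_mask = [1, 1, 1, 0, 1]


def sample_valid_action(mask, rng):
    """Pick uniformly among the action indices whose mask entry is truthy."""
    valid = [i for i, allowed in enumerate(mask) if allowed]
    if not valid:
        raise ValueError("mask allows no actions")
    return rng.choice(valid)


rng = random.Random(0)
action = sample_valid_action(example_mask, rng)
print(action)  # never 3, because node-service-start is masked out
```

A random baseline agent written this way avoids wasting steps on actions that cannot take effect in the current state.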

## Action masking for Stable Baselines3 agents
Maskable SB3 agents, such as MaskablePPO from sb3_contrib, automatically call the environment's action_masks method during the training loop.
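Conceptually, masking during training means that invalid actions receive zero probability before an action is sampled. The sketch below shows one common way to achieve this in plain Python: logits of masked-out actions are replaced with negative infinity before the softmax. It illustrates the idea only and is not sb3_contrib's implementation; the function name `masked_softmax` is hypothetical.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits where masked-out actions get probability zero."""
    # Replace logits of invalid actions with -inf so that exp() yields 0.
    masked = [l if m else float("-inf") for l, m in zip(logits, mask)]
    peak = max(masked)  # subtract the maximum for numerical stability
    if peak == float("-inf"):
        raise ValueError("mask allows no actions")
    exps = [math.exp(l - peak) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


probs = masked_softmax([2.0, 1.0, 0.5], [True, False, True])
print(probs)  # the second entry is exactly 0.0
```

Because the masked action has zero probability, it is never sampled, and the remaining probabilities still sum to one.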

In [5]:
from sb3_contrib import MaskablePPO



In [6]:
model = MaskablePPO("MlpPolicy", env, gamma=0.4, seed=32)
model.learn(1024)

2025-03-24 09:50:41,329: Resetting environment, episode 0, avg. reward: 0.0


2025-03-24 09:50:41,332: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_0.json


2025-03-24 09:50:42,775: Resetting environment, episode 1, avg. reward: -41.00000000000006


2025-03-24 09:50:42,777: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_1.json


2025-03-24 09:50:44,019: Resetting environment, episode 2, avg. reward: -42.650000000000055


2025-03-24 09:50:44,021: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_2.json


2025-03-24 09:50:45,315: Resetting environment, episode 3, avg. reward: -17.69999999999997


2025-03-24 09:50:45,317: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_3.json


2025-03-24 09:50:46,639: Resetting environment, episode 4, avg. reward: -50.65000000000007


2025-03-24 09:50:46,640: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_4.json


2025-03-24 09:50:48,305: Resetting environment, episode 5, avg. reward: -17.099999999999977


2025-03-24 09:50:48,306: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_5.json


2025-03-24 09:50:49,668: Resetting environment, episode 6, avg. reward: -27.44999999999994


2025-03-24 09:50:49,670: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_6.json


2025-03-24 09:50:50,930: Resetting environment, episode 7, avg. reward: -40.65000000000004


2025-03-24 09:50:50,932: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_7.json


2025-03-24 09:50:52,260: Resetting environment, episode 8, avg. reward: -14.299999999999983


2025-03-24 09:50:52,261: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_8.json


2025-03-24 09:50:53,555: Resetting environment, episode 9, avg. reward: -19.500000000000018


2025-03-24 09:50:53,557: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_9.json


2025-03-24 09:50:55,291: Resetting environment, episode 10, avg. reward: -61.3500000000001


2025-03-24 09:50:55,293: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_10.json


2025-03-24 09:50:56,621: Resetting environment, episode 11, avg. reward: -50.00000000000007


2025-03-24 09:50:56,622: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_11.json


2025-03-24 09:50:57,905: Resetting environment, episode 12, avg. reward: -20.74999999999996


2025-03-24 09:50:57,906: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_12.json


2025-03-24 09:50:59,194: Resetting environment, episode 13, avg. reward: -17.64999999999997


2025-03-24 09:50:59,195: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_13.json


2025-03-24 09:51:00,820: Resetting environment, episode 14, avg. reward: -44.20000000000005


2025-03-24 09:51:00,821: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_14.json


2025-03-24 09:51:02,131: Resetting environment, episode 15, avg. reward: -61.050000000000104


2025-03-24 09:51:02,133: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_15.json


2025-03-24 09:51:03,427: Resetting environment, episode 16, avg. reward: -28.49999999999995


2025-03-24 09:51:03,429: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_16.json


<sb3_contrib.ppo_mask.ppo_mask.MaskablePPO at 0x7f12341af550>

## Action masking for Ray RLLib agents
Ray uses a different API to obtain action masks, but this is handled by the PrimaiteRayEnv and PrimaiteRayMARLEnv classes.
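One way to picture the difference: rather than exposing a separate method, the mask travels inside the observation itself. The sketch below assumes a dictionary layout with an `"action_mask"` key (matching the `action_mask_key="action_mask"` setting used in this notebook) and an `"observations"` key for the flattened observation. Both key names and both helper functions are assumptions for illustration, not a description of PrimAITE's or RLlib's internals.

```python
# Assumed layout: the key names are illustrative assumptions, see the note above.
def pack_observation(flat_obs, mask):
    """Bundle a flat observation and its action mask into one dict."""
    return {"action_mask": list(mask), "observations": list(flat_obs)}


def unpack_observation(obs_dict):
    """Split the dict back into (flat observation, mask), as a masking module might."""
    return obs_dict["observations"], obs_dict["action_mask"]


packed = pack_observation([0.0, 1.0, 0.5], [1, 0, 1])
obs, mask = unpack_observation(packed)
print(obs, mask)
```

Under this layout the policy network only ever sees the `"observations"` part, while the mask is used to rule out invalid actions.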

In [7]:
from primaite.session.ray_envs import PrimaiteRayEnv
from ray.rllib.algorithms.ppo import PPOConfig
import yaml
from ray.rllib.examples.rl_modules.classes.action_masking_rlm import ActionMaskingTorchRLModule
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec


In [8]:
with open(data_manipulation_config_path(), 'r') as f:
    cfg = yaml.safe_load(f)
for agent in cfg['agents']:
    if agent["ref"] == "defender":
        agent['agent_settings']['flatten_obs'] = True
env_config = cfg


In [9]:
config = (
    PPOConfig()
    .api_stack(enable_rl_module_and_learner=True, enable_env_runner_and_connector_v2=True)
    .environment(env=PrimaiteRayEnv, env_config=cfg, action_mask_key="action_mask")
    .rl_module(rl_module_spec=SingleAgentRLModuleSpec(module_class = ActionMaskingTorchRLModule))
    .env_runners(num_env_runners=0)
    .training(train_batch_size=128)
)
algo = config.build()
results = algo.train()



`UnifiedLogger` will be removed in Ray 2.7.
  return UnifiedLogger(config, logdir, loggers=None)
The `JsonLogger interface is deprecated in favor of the `ray.tune.json.JsonLoggerCallback` interface and will be removed in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
The `CSVLogger interface is deprecated in favor of the `ray.tune.csv.CSVLoggerCallback` interface and will be removed in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
The `TBXLogger interface is deprecated in favor of the `ray.tune.tensorboardx.TBXLoggerCallback` interface and will be removed in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
2025-03-24 09:51:06,261: PrimaiteGymEnv RNG seed = None


2025-03-24 09:51:06,324: Resetting environment, episode 0, avg. reward: 0.0


2025-03-24 09:51:06,326: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_0.json


2025-03-24 09:51:07,473: Resetting environment, episode 1, avg. reward: -74.79999999999993


2025-03-24 09:51:07,475: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_1.json




## Action masking with MARL in Ray RLLib

Each agent has its own action mask, which is useful in multi-agent environments where agents are configured with different action spaces.
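As a small sketch of what per-agent masks look like, the example below keeps one mask per agent ID and lists the actions each agent may currently take. The agent names match the example config used in this notebook, but the mask values and lengths are made up for illustration, and `valid_actions_per_agent` is a hypothetical helper.

```python
# Hypothetical per-agent masks: values and lengths are invented for illustration.
masks = {
    "defender_1": [1, 1, 0, 1],
    "defender_2": [1, 0, 1, 1, 0, 1],
}


def valid_actions_per_agent(agent_masks):
    """Map each agent ID to the list of action indices its mask allows."""
    return {
        agent_id: [i for i, allowed in enumerate(mask) if allowed]
        for agent_id, mask in agent_masks.items()
    }


print(valid_actions_per_agent(masks))
# prints {'defender_1': [0, 1, 3], 'defender_2': [0, 2, 3, 5]}
```

Keeping the masks keyed by agent ID means agents with action spaces of different sizes never need to share a common mask length.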

The code snippets below demonstrate how to train multiple agents with action masks using the [UC2 MARL example config](./Training-an-RLLIB-MARL-System.ipynb).

In [10]:
from ray.rllib.core.rl_module.marl_module import MultiAgentRLModuleSpec
from primaite.session.ray_envs import PrimaiteRayMARLEnv
from primaite.config.load import data_manipulation_marl_config_path

In [11]:
with open(data_manipulation_marl_config_path(), 'r') as f:
    cfg = yaml.safe_load(f)
env_config = cfg


In [12]:
config = (
    PPOConfig()
    .multi_agent(
        policies={'defender_1','defender_2'}, # These names are the same as the agents defined in the example config.
        policy_mapping_fn=lambda agent_id, *args, **kwargs: agent_id,
        )
    .api_stack(enable_rl_module_and_learner=True, enable_env_runner_and_connector_v2=True)
    .environment(env=PrimaiteRayMARLEnv, env_config=cfg, action_mask_key="action_mask")
    .rl_module(rl_module_spec=MultiAgentRLModuleSpec(module_specs={
        "defender_1":SingleAgentRLModuleSpec(module_class=ActionMaskingTorchRLModule),
        "defender_2":SingleAgentRLModuleSpec(module_class=ActionMaskingTorchRLModule),
        }))
    .env_runners(num_env_runners=0)
    .training(train_batch_size=128)
)
algo = config.build()
results = algo.train()





2025-03-24 09:51:08,313: Resetting environment, episode 0, avg. reward: {'defender_1': 0.0, 'defender_2': 0.0}


2025-03-24 09:51:08,314: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_0.json


2025-03-24 09:51:08,426: step: 1, Rewards: {'defender_1': 0.65, 'defender_2': 0.65}


2025-03-24 09:51:08,458: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_2.json


2025-03-24 09:51:08,461: Resetting environment, episode 1, avg. reward: {'defender_1': 0.65, 'defender_2': 0.65}


2025-03-24 09:51:08,462: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_1.json


2025-03-24 09:51:08,584: step: 1, Rewards: {'defender_1': 0.7000000000000001, 'defender_2': 0.7000000000000001}


2025-03-24 09:51:08,598: step: 2, Rewards: {'defender_1': 0.7000000000000001, 'defender_2': 0.7000000000000001}


2025-03-24 09:51:08,614: step: 3, Rewards: {'defender_1': 0.45, 'defender_2': 0.45}


2025-03-24 09:51:08,628: step: 4, Rewards: {'defender_1': 0.45, 'defender_2': 0.45}


2025-03-24 09:51:08,640: step: 5, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,655: step: 6, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,668: step: 7, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,680: step: 8, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,694: step: 9, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,707: step: 10, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,720: step: 11, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,733: step: 12, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,746: step: 13, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,759: step: 14, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,772: step: 15, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,785: step: 16, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,798: step: 17, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,810: step: 18, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,822: step: 19, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,836: step: 20, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,848: step: 21, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,860: step: 22, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,873: step: 23, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,885: step: 24, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,898: step: 25, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,913: step: 26, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,928: step: 27, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,942: step: 28, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,956: step: 29, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,969: step: 30, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,983: step: 31, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:08,995: step: 32, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,009: step: 33, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,022: step: 34, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,036: step: 35, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,048: step: 36, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,065: step: 37, Rewards: {'defender_1': 0.35000000000000003, 'defender_2': 0.35000000000000003}


2025-03-24 09:51:09,081: step: 38, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,098: step: 39, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,113: step: 40, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,127: step: 41, Rewards: {'defender_1': -0.55, 'defender_2': -0.55}


2025-03-24 09:51:09,142: step: 42, Rewards: {'defender_1': -0.55, 'defender_2': -0.55}


2025-03-24 09:51:09,155: step: 43, Rewards: {'defender_1': -0.55, 'defender_2': -0.55}


2025-03-24 09:51:09,171: step: 44, Rewards: {'defender_1': -0.55, 'defender_2': -0.55}


2025-03-24 09:51:09,190: step: 45, Rewards: {'defender_1': -0.55, 'defender_2': -0.55}


2025-03-24 09:51:09,206: step: 46, Rewards: {'defender_1': -0.55, 'defender_2': -0.55}


2025-03-24 09:51:09,221: step: 47, Rewards: {'defender_1': -0.55, 'defender_2': -0.55}


2025-03-24 09:51:09,235: step: 48, Rewards: {'defender_1': -0.55, 'defender_2': -0.55}


2025-03-24 09:51:09,248: step: 49, Rewards: {'defender_1': -0.55, 'defender_2': -0.55}


2025-03-24 09:51:09,262: step: 50, Rewards: {'defender_1': -0.55, 'defender_2': -0.55}


2025-03-24 09:51:09,274: step: 51, Rewards: {'defender_1': -0.55, 'defender_2': -0.55}


2025-03-24 09:51:09,286: step: 52, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,299: step: 53, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,314: step: 54, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,328: step: 55, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,343: step: 56, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,356: step: 57, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,370: step: 58, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,382: step: 59, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,396: step: 60, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,410: step: 61, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,424: step: 62, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,436: step: 63, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,449: step: 64, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,462: step: 65, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,475: step: 66, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,487: step: 67, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,500: step: 68, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,512: step: 69, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,525: step: 70, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,539: step: 71, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,552: step: 72, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,564: step: 73, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,577: step: 74, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,589: step: 75, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,602: step: 76, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,614: step: 77, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,627: step: 78, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,640: step: 79, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,654: step: 80, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,668: step: 81, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,680: step: 82, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,693: step: 83, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,707: step: 84, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,720: step: 85, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,734: step: 86, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,746: step: 87, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,760: step: 88, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,773: step: 89, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,787: step: 90, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,799: step: 91, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,813: step: 92, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,825: step: 93, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,838: step: 94, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,854: step: 95, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,867: step: 96, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,881: step: 97, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,895: step: 98, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,907: step: 99, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,921: step: 100, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,933: step: 101, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,947: step: 102, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,960: step: 103, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,973: step: 104, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,985: step: 105, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:09,997: step: 106, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,008: step: 107, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,020: step: 108, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,033: step: 109, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,044: step: 110, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,056: step: 111, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,070: step: 112, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,084: step: 113, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,099: step: 114, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,112: step: 115, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,124: step: 116, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,137: step: 117, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,152: step: 118, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,165: step: 119, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,179: step: 120, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,192: step: 121, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,206: step: 122, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,220: step: 123, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,235: step: 124, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,250: step: 125, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,265: step: 126, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:51:10,279: step: 127, Rewards: {'defender_1': -0.55, 'defender_2': -0.55}


2025-03-24 09:51:10,291: step: 128, Rewards: {'defender_1': -0.55, 'defender_2': -0.55}


2025-03-24 09:51:10,296: Resetting environment, episode 2, avg. reward: {'defender_1': -20.99999999999998, 'defender_2': -20.99999999999998}


2025-03-24 09:51:10,297: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-50-34/agent_actions/episode_2.json
