# Train a Multi-Agent System Using RLlib

© Crown-owned copyright 2025, Defence Science and Technology Laboratory UK

This notebook demonstrates how to use `PrimaiteRayMARLEnv` to train a very basic multi-agent system with two PPO agents on the [UC2 scenario](./Data-Manipulation-E2E-Demonstration.ipynb).

#### First, import packages and read our config file

In [1]:
!primaite setup

2025-03-24 09:52:33,903: Performing the PrimAITE first-time setup...
2025-03-24 09:52:33,903: Building the PrimAITE app directories...
2025-03-24 09:52:33,903: Building primaite_config.yaml...
2025-03-24 09:52:33,903: Rebuilding the demo notebooks...
2025-03-24 09:52:33,927: Rebuilding the example notebooks...
2025-03-24 09:52:33,928: PrimAITE setup complete!


In [2]:
import yaml
import ray
from primaite import PRIMAITE_PATHS
from ray.rllib.algorithms.ppo import PPOConfig
from primaite.session.ray_envs import PrimaiteRayMARLEnv

E0000 00:00:1742809954.807608    6703 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1742809954.812034    6703 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1742809954.824305    6703 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1742809954.824315    6703 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1742809954.824317    6703 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1742809954.824319    6703 computation_placer.cc:177] computation placer already registered. Please check linka

In [3]:
with open(PRIMAITE_PATHS.user_config_path / 'example_config/data_manipulation_marl.yaml', 'r') as f:
    cfg = yaml.safe_load(f)
ray.init(local_mode=True)

2025-03-24 09:52:43,871	INFO worker.py:1788 -- Started a local Ray instance.


Python version: 3.10.16
Ray version: 2.32.0
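
Before building the algorithm config, it can help to check which agent names the loaded config defines, because the policy names passed to RLlib must match them. The sketch below is only an illustration: it uses a small stand-in dictionary rather than the real file, and it assumes that agents are listed under an `agents` key with `ref` and `team` fields. The `attacker` entry and the helper `learning_agent_names` are hypothetical.

```python
# Stand-in for the dict returned by yaml.safe_load(); the real config has many more keys.
# ASSUMPTION: agents live under "agents" and carry "ref" and "team" fields.
example_cfg = {
    "agents": [
        {"ref": "attacker", "team": "RED"},      # hypothetical non-learning agent
        {"ref": "defender_1", "team": "BLUE"},
        {"ref": "defender_2", "team": "BLUE"},
    ]
}


def learning_agent_names(cfg, team="BLUE"):
    """Collect the names of all agents that belong to the given team."""
    return {agent["ref"] for agent in cfg.get("agents", []) if agent.get("team") == team}


policies = learning_agent_names(example_cfg)
print(sorted(policies))
```

If the real config follows the same layout, calling `learning_agent_names(cfg)` on the loaded file should yield the two defender names.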


#### Create a Ray algorithm config which accepts our two agents

In [4]:
config = (
    PPOConfig()
    .multi_agent(
        # Policy names must match the agent names defined in the example config.
        policies={'defender_1', 'defender_2'},
        # Map each agent to the policy that shares its name.
        policy_mapping_fn=lambda agent_id, episode, worker, **kw: agent_id,
    )
    .environment(env=PrimaiteRayMARLEnv, env_config=cfg)
    .env_runners(num_env_runners=0)  # sample in the local process only
    .training(train_batch_size=128)
    .evaluation(evaluation_duration=1)
)
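
The `policy_mapping_fn` above returns the agent ID unchanged, so each defender is trained by the policy that carries its own name. The following pure-Python sketch reproduces that mapping without Ray; `map_agent_to_policy` is a hypothetical name used only for illustration.

```python
# Illustrative only: mirrors the identity mapping passed to .multi_agent().
policies = {"defender_1", "defender_2"}


def map_agent_to_policy(agent_id, *args, **kwargs):
    """Return the policy ID for an agent; here every agent uses its own policy."""
    return agent_id


assignments = {agent: map_agent_to_policy(agent) for agent in sorted(policies)}

# Every agent must resolve to a policy that was actually declared.
assert all(policy in policies for policy in assignments.values())
print(assignments)
```

Because the mapping is one-to-one, the two agents do not share network weights in this setup.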


#### Start the training
This example saves its outputs to a default Ray directory and uses mostly default settings.

In [5]:
algo = config.build()
results = algo.train()

`UnifiedLogger` will be removed in Ray 2.7.
  return UnifiedLogger(config, logdir, loggers=None)
The `JsonLogger interface is deprecated in favor of the `ray.tune.json.JsonLoggerCallback` interface and will be removed in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
The `CSVLogger interface is deprecated in favor of the `ray.tune.csv.CSVLoggerCallback` interface and will be removed in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
The `TBXLogger interface is deprecated in favor of the `ray.tune.tensorboardx.TBXLoggerCallback` interface and will be removed in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))


2025-03-24 09:52:46,372: Resetting environment, episode 0, avg. reward: {'defender_1': 0.0, 'defender_2': 0.0}


2025-03-24 09:52:46,376: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-52-39/agent_actions/episode_0.json


2025-03-24 09:52:46,498: step: 1, Rewards: {'defender_1': 0.65, 'defender_2': 0.65}


2025-03-24 09:52:46,519: step: 2, Rewards: {'defender_1': -0.09999999999999998, 'defender_2': -0.09999999999999998}


2025-03-24 09:52:46,538: step: 3, Rewards: {'defender_1': -0.04999999999999999, 'defender_2': -0.04999999999999999}


2025-03-24 09:52:46,554: step: 4, Rewards: {'defender_1': -0.04999999999999999, 'defender_2': -0.04999999999999999}


2025-03-24 09:52:46,568: step: 5, Rewards: {'defender_1': -0.04999999999999999, 'defender_2': -0.04999999999999999}


2025-03-24 09:52:46,583: step: 6, Rewards: {'defender_1': -0.04999999999999999, 'defender_2': -0.04999999999999999}


2025-03-24 09:52:46,596: step: 7, Rewards: {'defender_1': -0.04999999999999999, 'defender_2': -0.04999999999999999}


2025-03-24 09:52:46,610: step: 8, Rewards: {'defender_1': -0.04999999999999999, 'defender_2': -0.04999999999999999}


2025-03-24 09:52:46,623: step: 9, Rewards: {'defender_1': -0.04999999999999999, 'defender_2': -0.04999999999999999}


2025-03-24 09:52:46,637: step: 10, Rewards: {'defender_1': -0.04999999999999999, 'defender_2': -0.04999999999999999}


2025-03-24 09:52:46,650: step: 11, Rewards: {'defender_1': -0.04999999999999999, 'defender_2': -0.04999999999999999}


2025-03-24 09:52:46,665: step: 12, Rewards: {'defender_1': -0.04999999999999999, 'defender_2': -0.04999999999999999}


2025-03-24 09:52:46,678: step: 13, Rewards: {'defender_1': -0.04999999999999999, 'defender_2': -0.04999999999999999}


2025-03-24 09:52:46,693: step: 14, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,707: step: 15, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,721: step: 16, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,734: step: 17, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,747: step: 18, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,760: step: 19, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,775: step: 20, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,789: step: 21, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,802: step: 22, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,815: step: 23, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,829: step: 24, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,843: step: 25, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,857: step: 26, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,870: step: 27, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,883: step: 28, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,897: step: 29, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,910: step: 30, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,923: step: 31, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,935: step: 32, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,953: step: 33, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,966: step: 34, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,979: step: 35, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:46,993: step: 36, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,007: step: 37, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,019: step: 38, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,032: step: 39, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,045: step: 40, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,058: step: 41, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,071: step: 42, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,086: step: 43, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,099: step: 44, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,112: step: 45, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,126: step: 46, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,139: step: 47, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,151: step: 48, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,164: step: 49, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,178: step: 50, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,191: step: 51, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,204: step: 52, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,217: step: 53, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,231: step: 54, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,245: step: 55, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,260: step: 56, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,274: step: 57, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,288: step: 58, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,301: step: 59, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,315: step: 60, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,329: step: 61, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,343: step: 62, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,357: step: 63, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,371: step: 64, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,385: step: 65, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,399: step: 66, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,413: step: 67, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,426: step: 68, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,440: step: 69, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,453: step: 70, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,467: step: 71, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,481: step: 72, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,495: step: 73, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,509: step: 74, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,523: step: 75, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,538: step: 76, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,553: step: 77, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,571: step: 78, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,586: step: 79, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,601: step: 80, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,616: step: 81, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,629: step: 82, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,643: step: 83, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,656: step: 84, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,669: step: 85, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,682: step: 86, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,697: step: 87, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,711: step: 88, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,725: step: 89, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,738: step: 90, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,753: step: 91, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,767: step: 92, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,781: step: 93, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,796: step: 94, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,809: step: 95, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,824: step: 96, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,837: step: 97, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,851: step: 98, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,864: step: 99, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,878: step: 100, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,891: step: 101, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,904: step: 102, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,917: step: 103, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,930: step: 104, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,943: step: 105, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,958: step: 106, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,970: step: 107, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,982: step: 108, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:47,995: step: 109, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,007: step: 110, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,020: step: 111, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,033: step: 112, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,045: step: 113, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,058: step: 114, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,071: step: 115, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,085: step: 116, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,098: step: 117, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,111: step: 118, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,124: step: 119, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,138: step: 120, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,152: step: 121, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,165: step: 122, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,178: step: 123, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,191: step: 124, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,204: step: 125, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,219: step: 126, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,233: step: 127, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,247: step: 128, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:48,270: Resetting environment, episode 1, avg. reward: {'defender_1': -17.250000000000014, 'defender_2': -17.250000000000014}


2025-03-24 09:52:48,271: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-52-39/agent_actions/episode_1.json


2025-03-24 09:52:48,627: step: 1, Rewards: {'defender_1': 0.65, 'defender_2': 0.65}
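
The value logged as `avg. reward` when the environment resets appears to be the sum of the per-step rewards of the episode that just finished. The short check below rebuilds the reward sequence for `defender_1` in episode 0 from the log above (values rounded to two decimals) and sums it.

```python
import math

# Per-step rewards for defender_1 in episode 0, read off the log above:
# step 1 -> 0.65, step 2 -> -0.1, steps 3-13 -> -0.05, steps 14-128 -> -0.15.
episode_0_rewards = [0.65, -0.1] + [-0.05] * 11 + [-0.15] * 115

# One reward per logged step.
assert len(episode_0_rewards) == 128

episode_return = sum(episode_0_rewards)
print(round(episode_return, 2))  # -17.25

# Matches the value reported at the reset, up to floating-point error.
assert math.isclose(episode_return, -17.25, abs_tol=1e-9)
```

Both agents receive identical rewards at every step in this log, so the same total applies to `defender_2`.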




#### Evaluate the results

In [6]:
eval_results = algo.evaluate()  # avoid shadowing the built-in eval()

2025-03-24 09:52:49,266: step: 2, Rewards: {'defender_1': 0.65, 'defender_2': 0.65}


2025-03-24 09:52:49,285: step: 3, Rewards: {'defender_1': 0.4, 'defender_2': 0.4}


2025-03-24 09:52:49,297: step: 4, Rewards: {'defender_1': 0.4, 'defender_2': 0.4}


2025-03-24 09:52:49,310: step: 5, Rewards: {'defender_1': -0.09999999999999998, 'defender_2': -0.09999999999999998}


2025-03-24 09:52:49,323: step: 6, Rewards: {'defender_1': -0.09999999999999998, 'defender_2': -0.09999999999999998}


2025-03-24 09:52:49,336: step: 7, Rewards: {'defender_1': -0.09999999999999998, 'defender_2': -0.09999999999999998}


2025-03-24 09:52:49,350: step: 8, Rewards: {'defender_1': -0.09999999999999998, 'defender_2': -0.09999999999999998}


2025-03-24 09:52:49,364: step: 9, Rewards: {'defender_1': -0.09999999999999998, 'defender_2': -0.09999999999999998}


2025-03-24 09:52:49,376: step: 10, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,389: step: 11, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,402: step: 12, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,416: step: 13, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,429: step: 14, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,443: step: 15, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,455: step: 16, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,468: step: 17, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,481: step: 18, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,495: step: 19, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,507: step: 20, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,520: step: 21, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,534: step: 22, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,549: step: 23, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,562: step: 24, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,576: step: 25, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,589: step: 26, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,602: step: 27, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,615: step: 28, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,629: step: 29, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,643: step: 30, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,657: step: 31, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,677: step: 32, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,692: step: 33, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,707: step: 34, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,722: step: 35, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,735: step: 36, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,750: step: 37, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,766: step: 38, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,781: step: 39, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,796: step: 40, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,810: step: 41, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,824: step: 42, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,838: step: 43, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,853: step: 44, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,869: step: 45, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,885: step: 46, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,902: step: 47, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,916: step: 48, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,930: step: 49, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,945: step: 50, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,959: step: 51, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,973: step: 52, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:49,987: step: 53, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,002: step: 54, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,016: step: 55, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,030: step: 56, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,044: step: 57, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,058: step: 58, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,071: step: 59, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,085: step: 60, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,099: step: 61, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,112: step: 62, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,126: step: 63, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,139: step: 64, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,153: step: 65, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,167: step: 66, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,180: step: 67, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,194: step: 68, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,208: step: 69, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,221: step: 70, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,234: step: 71, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,248: step: 72, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,261: step: 73, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,274: step: 74, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,286: step: 75, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,300: step: 76, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,312: step: 77, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,324: step: 78, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,337: step: 79, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,350: step: 80, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,363: step: 81, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,376: step: 82, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,389: step: 83, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,403: step: 84, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,415: step: 85, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,428: step: 86, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,442: step: 87, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,456: step: 88, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,469: step: 89, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,482: step: 90, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,494: step: 91, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,507: step: 92, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,521: step: 93, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,534: step: 94, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,546: step: 95, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,560: step: 96, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,575: step: 97, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,589: step: 98, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,605: step: 99, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,620: step: 100, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,632: step: 101, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,646: step: 102, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,659: step: 103, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,672: step: 104, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,686: step: 105, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,702: step: 106, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,718: step: 107, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,733: step: 108, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,749: step: 109, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,763: step: 110, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,777: step: 111, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,791: step: 112, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,805: step: 113, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,817: step: 114, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,830: step: 115, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,844: step: 116, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,860: step: 117, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,873: step: 118, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,886: step: 119, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,900: step: 120, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,913: step: 121, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,928: step: 122, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,942: step: 123, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,954: step: 124, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,969: step: 125, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,982: step: 126, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:50,997: step: 127, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:51,010: step: 128, Rewards: {'defender_1': -0.14999999999999997, 'defender_2': -0.14999999999999997}


2025-03-24 09:52:51,033: Resetting environment, episode 2, avg. reward: {'defender_1': -16.250000000000025, 'defender_2': -16.250000000000025}


2025-03-24 09:52:51,034: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/09-52-39/agent_actions/episode_2.json


2025-03-24 09:52:51,392: step: 1, Rewards: {'defender_1': 0.65, 'defender_2': 0.65}
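
To compare episodes without reading the step-by-step output, the reset lines can be parsed directly. This is a minimal sketch that works on the log text shown above, not a PrimAITE or Ray API; `parse_reset_line` and `RESET_PATTERN` are hypothetical names.

```python
import ast
import re

# Reset lines copied verbatim from the output above.
reset_lines = [
    "2025-03-24 09:52:48,270: Resetting environment, episode 1, avg. reward: "
    "{'defender_1': -17.250000000000014, 'defender_2': -17.250000000000014}",
    "2025-03-24 09:52:51,033: Resetting environment, episode 2, avg. reward: "
    "{'defender_1': -16.250000000000025, 'defender_2': -16.250000000000025}",
]

RESET_PATTERN = re.compile(r"Resetting environment, episode (\d+), avg\. reward: (\{.*\})")


def parse_reset_line(line):
    """Return (episode_number_as_logged, rewards_by_agent), or None if the line does not match."""
    match = RESET_PATTERN.search(line)
    if match is None:
        return None
    # The reward dict is printed as a Python literal, so literal_eval parses it safely.
    return int(match.group(1)), ast.literal_eval(match.group(2))


parsed = [parse_reset_line(line) for line in reset_lines]
for episode, rewards in parsed:
    print(episode, {agent: round(value, 2) for agent, value in rewards.items()})

# Difference between the two reported totals for defender_1.
improvement = parsed[1][1]["defender_1"] - parsed[0][1]["defender_1"]
```

Here the second reported total is roughly 1.0 higher than the first, although two episodes are far too few to say anything about learning progress.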
