# Train a multi-agent system using RLlib

© Crown-owned copyright 2025, Defence Science and Technology Laboratory UK

This notebook demonstrates how to use `PrimaiteRayMARLEnv` to train a very basic multi-agent system with two PPO agents on the [UC2 scenario](./Data-Manipulation-E2E-Demonstration.ipynb).

#### First, import packages and read our config file

In [None]:
!primaite setup

In [None]:
import yaml
import ray
from primaite import PRIMAITE_PATHS
from ray.rllib.algorithms.ppo import PPOConfig
from primaite.session.ray_envs import PrimaiteRayMARLEnv

In [None]:
with open(PRIMAITE_PATHS.user_config_path / 'example_config/data_manipulation_marl.yaml', 'r') as f:
    cfg = yaml.safe_load(f)
ray.init(local_mode=True)
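
The policy IDs used in the next cell are hard-coded to match the agents in the example config. As a hedged sketch, they could instead be read from `cfg`, assuming the parsed YAML has a top-level `agents` list whose RL-controlled entries have a `ref` name and a `type` of `ProxyAgent`. Those key names and that type value are assumptions about the PrimAITE config schema (they may differ between PrimAITE versions), so check them against `data_manipulation_marl.yaml`. The `rl_agent_refs` helper is hypothetical, and `example_cfg` is a made-up stand-in for `cfg`.

```python
from typing import Any


def rl_agent_refs(cfg: dict[str, Any], rl_type: str = "ProxyAgent") -> list[str]:
    """Return the `ref` of every agent whose `type` equals `rl_type`.

    Assumes (not confirmed by this notebook) a top-level `agents` list of
    dicts, each with `ref` and `type` keys.
    """
    return [
        agent["ref"]
        for agent in cfg.get("agents", [])
        if agent.get("type") == rl_type
    ]


# Hypothetical stand-in for the parsed YAML; agent types other than
# "ProxyAgent" are placeholders, and the real file has many more fields.
example_cfg = {
    "agents": [
        {"ref": "green_user_1", "type": "ScriptedAgent"},
        {"ref": "red_attacker", "type": "ScriptedAgent"},
        {"ref": "defender_1", "type": "ProxyAgent"},
        {"ref": "defender_2", "type": "ProxyAgent"},
    ]
}

policy_ids = set(rl_agent_refs(example_cfg))
print(sorted(policy_ids))  # ['defender_1', 'defender_2']
```

In the notebook you would pass the real `cfg`, for example `policies=set(rl_agent_refs(cfg))`, so the policy set stays in sync with the config file instead of being typed out by hand.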

#### Create an RLlib PPO config with one policy per agent

In [None]:
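config = (
    PPOConfig()
    .multi_agent(
        # One policy per agent; these IDs match the agent names in the example config.
        policies={'defender_1', 'defender_2'},
        # Map each agent to the policy with the same name. Extra arguments that RLlib
        # may pass (e.g. `worker` on the older API stack) are absorbed by **kwargs.
        policy_mapping_fn=lambda agent_id, episode, **kwargs: agent_id,
    )
    .environment(env=PrimaiteRayMARLEnv, env_config=cfg)
    .env_runners(num_env_runners=0)  # Sample in the local process only.
    .training(train_batch_size=128)
    .evaluation(evaluation_duration=1)  # Evaluate over a single episode.
)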
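
Because each agent ID is also a policy ID, the mapping function is the identity on `agent_id`: `defender_1` is trained by the `defender_1` policy and `defender_2` by its own, so the two agents share no weights. The stdlib-only sketch below reproduces that mapping outside Ray and adds a sanity check that every agent lands on a declared policy. The `check_mapping` helper is illustrative and is not part of RLlib or PrimAITE.

```python
from typing import Any, Callable, Iterable


def policy_mapping_fn(agent_id: str, episode: Any = None, **kwargs: Any) -> str:
    # Identity mapping: each agent is trained by the policy with the same name.
    return agent_id


def check_mapping(
    agent_ids: Iterable[str],
    policies: set[str],
    mapping_fn: Callable[..., str],
) -> dict[str, str]:
    """Map each agent to a policy ID and raise if any ID was not declared."""
    mapping = {agent_id: mapping_fn(agent_id, None) for agent_id in agent_ids}
    missing = {pid for pid in mapping.values() if pid not in policies}
    if missing:
        raise ValueError(f"Agents mapped to undeclared policies: {sorted(missing)}")
    return mapping


policies = {"defender_1", "defender_2"}
print(check_mapping(["defender_1", "defender_2"], policies, policy_mapping_fn))
# {'defender_1': 'defender_1', 'defender_2': 'defender_2'}
```

A shared-policy setup would instead return one ID (say `'shared'`) for every agent and declare only that policy; separate policies, as here, let the two defenders learn different behaviours.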


#### Start the training
This example runs a single training iteration with mostly default settings and saves its outputs to Ray's default results directory.

In [None]:
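algo = config.build()  # Build the PPO algorithm from the config above.
results = algo.train()  # Run one training iteration; returns a dict of metrics.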
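
`algo.train()` returns a nested dictionary of metrics, and where the mean episode return sits inside it depends on the Ray version and API stack. The candidate paths below (`env_runners/episode_return_mean`, `env_runners/episode_reward_mean`, `episode_reward_mean`) are assumptions to verify against your installed Ray. The sketch walks each path and returns the first value it finds; the helper names are hypothetical and `sample_results` is made-up data, not real training output.

```python
from typing import Any, Optional, Sequence

# Assumed locations of the mean episode return; these vary across Ray versions.
CANDIDATE_PATHS: Sequence[tuple[str, ...]] = (
    ("env_runners", "episode_return_mean"),
    ("env_runners", "episode_reward_mean"),
    ("episode_reward_mean",),
)


def lookup(results: dict[str, Any], path: Sequence[str]) -> Optional[Any]:
    """Follow `path` through nested dicts, returning None if any key is missing."""
    node: Any = results
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def mean_episode_return(results: dict[str, Any]) -> Optional[float]:
    """Return the first candidate metric present in `results`, or None."""
    for path in CANDIDATE_PATHS:
        value = lookup(results, path)
        if value is not None:
            return float(value)
    return None


# Made-up example shaped like a newer Ray result dict (values are not real).
sample_results = {"training_iteration": 1, "env_runners": {"episode_return_mean": -3.5}}
print(mean_episode_return(sample_results))  # -3.5
```

To train for longer, you could call `algo.train()` in a loop and print `mean_episode_return(...)` of each result; a single iteration on a 128-step batch is unlikely to show much learning.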

#### Evaluate the results

In [None]:
eval_results = algo.evaluate()  # Avoid shadowing Python's built-in `eval`.
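
The dictionary returned by `algo.evaluate()` is also nested, and with two agents it may hold per-agent or per-policy entries next to the aggregate ones. Rather than assume exact key names, a small stdlib helper can flatten it into slash-separated paths and keep only reward-related entries. The `flatten` and `reward_metrics` helpers are hypothetical, and `sample_eval` is an illustrative structure, not real output from PrimAITE or Ray; actual keys depend on your Ray version.

```python
from typing import Any, Iterator


def flatten(d: dict[str, Any], prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (path, value) pairs for every non-dict leaf in a nested dict."""
    for key, value in d.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def reward_metrics(
    d: dict[str, Any], words: tuple[str, ...] = ("reward", "return")
) -> dict[str, Any]:
    """Keep only flattened entries whose path mentions one of `words`."""
    return {path: value for path, value in flatten(d) if any(w in path for w in words)}


# Made-up, illustrative structure; not real evaluation output.
sample_eval = {
    "evaluation": {
        "env_runners": {
            "episode_reward_mean": -1.0,
            "num_episodes": 1,
            "policy_reward_mean": {"defender_1": -0.4, "defender_2": -0.6},
        }
    }
}
for path, value in reward_metrics(sample_eval).items():
    print(path, value)
```

In the notebook, pass the dictionary returned by `algo.evaluate()` to `reward_metrics` to see each defender's evaluation reward side by side, whatever the exact key layout of your Ray version.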