# Training an Agent on UC7

© Crown-owned copyright 2025, Defence Science and Technology Laboratory UK

This notebook follows the same steps as [training an SB3 agent](./Training-an-SB3-Agent.ipynb), but trains the agent on the [use case 7 scenario](./UC7-E2E-Demo.ipynb) rather than [use case 2](./Data-Manipulation-E2E-Demonstration.ipynb). By default, the blue agent (`defender`) in `uc7_config.yaml` is set up to defend against Threat Actor Profile (TAP) 001, which is explored in more detail in the [UC7 TAP001 kill chain notebook](./UC7-TAP001-Kill-Chain-E2E.ipynb).


#### First, we run the PrimAITE setup, import the initial packages, and read in our configuration file.

In [1]:
!primaite setup

2025-03-24 09:57:44,637: Performing the PrimAITE first-time setup...
2025-03-24 09:57:44,637: Building the PrimAITE app directories...
2025-03-24 09:57:44,637: Building primaite_config.yaml...
2025-03-24 09:57:44,638: Rebuilding the demo notebooks...
2025-03-24 09:57:44,660: Rebuilding the example notebooks...
2025-03-24 09:57:44,662: PrimAITE setup complete!


In [2]:
import yaml
from primaite.session.environment import PrimaiteGymEnv
from primaite import PRIMAITE_PATHS
from prettytable import PrettyTable
from deepdiff.diff import DeepDiff

# Path to the UC7 example configuration in the PrimAITE user config directory
scenario_path = PRIMAITE_PATHS.user_config_path / "example_config/uc7_config.yaml"

In [3]:
gym = PrimaiteGymEnv(env_config=scenario_path)

2025-03-24 09:57:48,900: PrimaiteGymEnv RNG seed = None


In [4]:
from stable_baselines3 import PPO

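# Training hyperparameters
EPISODE_LEN = 128  # steps per episode
NUM_EPISODES = 10  # number of episodes to train for
NO_STEPS = EPISODE_LEN * NUM_EPISODES  # total training timesteps
BATCH_SIZE = 32  # minibatch size used for each PPO gradient step
LEARNING_RATE = 3e-4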


In [5]:
model = PPO(
    "MlpPolicy",
    gym,
    learning_rate=LEARNING_RATE,
    n_steps=NO_STEPS,
    batch_size=BATCH_SIZE,
    verbose=0,
    tensorboard_log="./PPO_UC7/",
)
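
To make the interaction between these hyperparameters concrete, the sketch below works through the arithmetic with the values used in this notebook. It assumes a single environment and Stable-Baselines3's default `n_epochs=10` for PPO, which the notebook does not override. The helper name `ppo_update_budget` is made up for illustration and is not part of PrimAITE or SB3.

```python
def ppo_update_budget(episode_len, num_episodes, batch_size, n_epochs=10):
    """Count rollouts and gradient steps for a single-environment PPO run.

    n_epochs=10 is an assumption: it mirrors the SB3 PPO default.
    """
    n_steps = episode_len * num_episodes   # rollout length (NO_STEPS in the notebook)
    total_timesteps = n_steps              # the notebook trains for exactly NO_STEPS
    rollouts = total_timesteps // n_steps  # collect-then-update cycles
    minibatches_per_epoch = n_steps // batch_size
    gradient_steps = rollouts * n_epochs * minibatches_per_epoch
    return {
        "n_steps": n_steps,
        "rollouts": rollouts,
        "minibatches_per_epoch": minibatches_per_epoch,
        "gradient_steps": gradient_steps,
    }


budget = ppo_update_budget(episode_len=128, num_episodes=10, batch_size=32)
print(budget)
```

Under these assumptions, training consists of one rollout followed by a single policy update, so all ten training episodes are collected with the initial policy. Choosing a smaller `n_steps` (for example `EPISODE_LEN`) would spread several updates across the same number of timesteps.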

In [6]:
model.learn(total_timesteps=NO_STEPS)

2025-03-24 09:57:52,384: Resetting environment, episode 0, avg. reward: 0.0


2025-03-24 09:58:05,306: Resetting environment, episode 1, avg. reward: 120.30312499999992


2025-03-24 09:58:14,320: Resetting environment, episode 2, avg. reward: 120.14375000000013


2025-03-24 09:58:27,043: Resetting environment, episode 3, avg. reward: 121.4515624999999


2025-03-24 09:58:36,466: Resetting environment, episode 4, avg. reward: 132.703125


2025-03-24 09:58:49,621: Resetting environment, episode 5, avg. reward: 130.85625000000016


2025-03-24 09:58:58,545: Resetting environment, episode 6, avg. reward: 123.87812500000005


2025-03-24 09:59:07,254: Resetting environment, episode 7, avg. reward: 124.36250000000038


2025-03-24 09:59:16,760: Resetting environment, episode 8, avg. reward: 140.02500000000006


2025-03-24 09:59:29,817: Resetting environment, episode 9, avg. reward: 127.27187499999992


2025-03-24 09:59:42,874: Resetting environment, episode 10, avg. reward: 122.03124999999993


<stable_baselines3.ppo.ppo.PPO at 0x7f94e9134280>
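
The reward figures in the log above can be easier to compare once they are pulled out of the text. The following is a minimal sketch that parses lines in the `Resetting environment, episode N, avg. reward: X` format with a regular expression. The sample lines are copied from the output above, and `parse_rewards` is a hypothetical helper, not a PrimAITE function.

```python
import re

# Matches the "episode N, avg. reward: X" part of each reset log line
LOG_PATTERN = re.compile(r"episode (\d+), avg\. reward: (-?\d+(?:\.\d+)?)")


def parse_rewards(log_text):
    """Map the logged episode number to the reward reported on that line."""
    return {int(ep): float(reward) for ep, reward in LOG_PATTERN.findall(log_text)}


sample_log = """\
2025-03-24 09:57:52,384: Resetting environment, episode 0, avg. reward: 0.0
2025-03-24 09:58:05,306: Resetting environment, episode 1, avg. reward: 120.30312499999992
2025-03-24 09:59:16,760: Resetting environment, episode 8, avg. reward: 140.02500000000006
"""

rewards = parse_rewards(sample_log)
best_episode = max(rewards, key=rewards.get)  # episode number with the highest logged reward
print(best_episode, rewards[best_episode])
```

For longer runs, the TensorBoard logs written to `./PPO_UC7/` are likely a more convenient way to follow the same trend.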

In [7]:
model.save("PrimAITE-PPO-UC7-example-agent")

In [8]:
# PPO.load is a class method that builds the model from the saved file,
# so there is no need to construct a PPO instance first.
eval_model = PPO.load("PrimAITE-PPO-UC7-example-agent", env=gym)

In [9]:
from stable_baselines3.common.evaluation import evaluate_policy

# Returns (mean_reward, std_reward) over the evaluation episodes
evaluate_policy(eval_model, gym, n_eval_episodes=1)

2025-03-24 09:59:44,792: Resetting environment, episode 11, avg. reward: 0.0


2025-03-24 09:59:54,727: Resetting environment, episode 12, avg. reward: 140.51406249999997


(140.51406466960907, 0.0)
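
The tuple returned by `evaluate_policy` is, by default, the mean and standard deviation of the per-episode rewards. The standard deviation of `0.0` above follows from evaluating a single episode. The sketch below reproduces that summary with the standard library. The function name `summarise_returns` and the multi-episode values are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pstdev


def summarise_returns(episode_returns):
    """Return (mean, population standard deviation) of per-episode returns."""
    return fmean(episode_returns), pstdev(episode_returns)


# One evaluation episode: the spread is necessarily zero
single = summarise_returns([140.5])

# Several (made-up) episode returns: the spread becomes informative
several = summarise_returns([120.0, 130.0, 140.0])

print(single)
print(several)
```

Increasing `n_eval_episodes` in the call above would therefore give a more informative estimate of how consistently the agent performs.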