# Training an Agent on UC7

© Crown-owned copyright 2025, Defence Science and Technology Laboratory UK

This notebook mirrors the [training an SB3 agent](./Training-an-SB3-Agent.ipynb) notebook, except that it trains an agent on the [use case 7 scenario](./UC7-E2E-Demo.ipynb) rather than [use case 2](./Data-Manipulation-E2E-Demonstration.ipynb). By default, the blue agent (`defender`) in `uc7_config.yaml` is set up to defend against Threat Actor Profile (TAP) 001, which is explored in more detail in the [UC7 TAP001 kill chain notebook](./UC7-TAP001-Kill-Chain-E2E.ipynb).


#### First, we import the initial packages and read in our configuration file.

In [None]:
!primaite setup

In [None]:
import yaml
from primaite.session.environment import PrimaiteGymEnv
from primaite import PRIMAITE_PATHS
from prettytable import PrettyTable
from deepdiff.diff import DeepDiff

# Path to the UC7 example scenario config in the PrimAITE user config directory
scenario_path = PRIMAITE_PATHS.user_config_path / "example_config/uc7_config.yaml"

In [None]:
gym = PrimaiteGymEnv(env_config=scenario_path)

In [None]:
from stable_baselines3 import PPO

EPISODE_LEN = 128                      # steps per episode
NUM_EPISODES = 10                      # episodes to train for
NO_STEPS = EPISODE_LEN * NUM_EPISODES  # total training timesteps
BATCH_SIZE = 32                        # PPO minibatch size
LEARNING_RATE = 3e-4                   # optimiser learning rate

In [None]:
model = PPO(
    "MlpPolicy",
    gym,
    learning_rate=LEARNING_RATE,
    n_steps=NO_STEPS,
    batch_size=BATCH_SIZE,
    verbose=0,
    tensorboard_log="./PPO_UC7/",
)
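
Because `n_steps` is set to `NO_STEPS` and `model.learn` is called with the same number of timesteps, training here amounts to a single rollout followed by one round of PPO updates. The sketch below works through that arithmetic in plain Python. It assumes a single environment and Stable-Baselines3's default of 10 optimisation epochs per rollout (`n_epochs`, which this notebook does not set), so treat the counts as illustrative rather than measured.

```python
import math

# Values copied from the notebook's hyperparameter cell.
EPISODE_LEN = 128
NUM_EPISODES = 10
NO_STEPS = EPISODE_LEN * NUM_EPISODES
BATCH_SIZE = 32

# Assumptions (not set in the notebook): one environment, and the
# Stable-Baselines3 PPO default of 10 epochs per rollout.
N_ENVS = 1
N_EPOCHS = 10

# Transitions collected before each policy update (n_steps * number of envs).
rollout_size = NO_STEPS * N_ENVS

# Rollouts needed to reach total_timesteps=NO_STEPS.
rollouts = math.ceil(NO_STEPS / rollout_size)

# Minibatches in one pass over the rollout buffer.
minibatches_per_epoch = rollout_size // BATCH_SIZE

# Total minibatch gradient updates over the whole training run.
gradient_updates = rollouts * N_EPOCHS * minibatches_per_epoch

print(f"rollout size:          {rollout_size}")
print(f"rollouts:              {rollouts}")
print(f"minibatches per epoch: {minibatches_per_epoch}")
print(f"gradient updates:      {gradient_updates}")
```

Under these assumptions the agent collects 1280 transitions once and then performs 400 minibatch updates. Choosing a smaller `n_steps` while keeping `total_timesteps` fixed would instead give several smaller rollouts with an update after each.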

In [None]:
model.learn(total_timesteps=NO_STEPS)

In [None]:
model.save("PrimAITE-PPO-UC7-example-agent")

In [None]:
# PPO.load is a classmethod that returns a new model, so no prior PPO(...) instance is needed
eval_model = PPO.load("PrimAITE-PPO-UC7-example-agent", env=gym)

In [None]:
from stable_baselines3.common.evaluation import evaluate_policy

evaluate_policy(eval_model, gym, n_eval_episodes=1)
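
By default, `evaluate_policy` returns the mean and standard deviation of the total reward per episode. The plain-Python sketch below reproduces that summary on hypothetical episode returns (the numbers are invented purely for illustration and are not results from this scenario) to show why a single evaluation episode always reports a standard deviation of zero.

```python
from statistics import fmean, pstdev


def summarise(episode_rewards):
    """Return (mean, population standard deviation) of per-episode total rewards."""
    return fmean(episode_rewards), pstdev(episode_rewards)


# Hypothetical per-episode returns, for illustration only.
single_episode = [12.5]
several_episodes = [12.5, 9.0, 14.0, 10.5]

mean_1, std_1 = summarise(single_episode)
mean_n, std_n = summarise(several_episodes)

print(f"1 episode:  mean={mean_1:.2f}, std={std_1:.2f}")
print(f"4 episodes: mean={mean_n:.2f}, std={std_n:.2f}")
```

With `n_eval_episodes=1`, as in the cell above, the reported spread carries no information. Raising `n_eval_episodes` gives a more useful estimate of how variable the agent's performance is.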