# Training an SB3 Agent

© Crown-owned copyright 2024, Defence Science and Technology Laboratory UK

This notebook demonstrates how to use PrimAITE to create and train a Stable-Baselines3 (SB3) PPO agent from a pre-defined configuration file.

#### First, we import the initial packages and read in our configuration file.

In [None]:
!primaite setup

In [1]:
from primaite.game.game import PrimaiteGame
from primaite.session.environment import PrimaiteGymEnv
from primaite.game.agent.scripted_agents import probabilistic_agent
import yaml

cls identifier: AbstractScriptedAgent
cls identifier: ProxyAgent


In [2]:
from primaite.config.load import data_manipulation_config_path

In [3]:
with open(data_manipulation_config_path(), 'r') as f:
    cfg = yaml.safe_load(f)
for agent in cfg['agents']:
    if agent['ref'] == 'defender':
        agent['agent_settings']['flatten_obs']=True
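
The cell above edits the loaded configuration in place so that only the `defender` agent has `flatten_obs` switched on. This is presumably because SB3's `MlpPolicy` expects a single flat observation vector rather than a nested one. As a minimal sketch of what the loop does, here is the same logic run against a small hand-written dictionary. The dictionary is a hypothetical stand-in that models only the keys used above, not the real contents of the data manipulation config.

```python
# Hypothetical stand-in for the loaded YAML: only the keys the notebook touches are modelled.
cfg = {
    "agents": [
        {"ref": "client_2_green_user", "agent_settings": {}},
        {"ref": "defender", "agent_settings": {"flatten_obs": False}},
    ]
}

# Same loop as the notebook cell: enable observation flattening for the defender only.
for agent in cfg["agents"]:
    if agent["ref"] == "defender":
        agent["agent_settings"]["flatten_obs"] = True

# Look the defender up again to confirm that the change was made in place.
defender = next(a for a in cfg["agents"] if a["ref"] == "defender")
print("defender flatten_obs:", defender["agent_settings"]["flatten_obs"])
```

Because `yaml.safe_load` returns ordinary dictionaries and lists, the change is visible to anything that later receives `cfg`, including the environment constructor.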

Using the given configuration, we generate the environment our agent will train in.

In [4]:
gym = PrimaiteGymEnv(env_config=cfg)

agent_cfg: {'ref': 'client_2_green_user', 'team': 'GREEN', 'type': 'ProbabilisticAgent', 'agent_settings': {'action_probabilities': {0: 0.3, 1: 0.6, 2: 0.1}}, 'action_space': {'action_map': {0: {'action': 'do_nothing', 'options': {}}, 1: {'action': 'node_application_execute', 'options': {'node_name': 'client_2', 'application_name': 'WebBrowser'}}, 2: {'action': 'node_application_execute', 'options': {'node_name': 'client_2', 'application_name': 'DatabaseClient'}}}}, 'reward_function': {'reward_components': [{'type': 'WEBPAGE_UNAVAILABLE_PENALTY', 'weight': 0.25, 'options': {'node_hostname': 'client_2'}}, {'type': 'GREEN_ADMIN_DATABASE_UNREACHABLE_PENALTY', 'weight': 0.05, 'options': {'node_hostname': 'client_2'}}]}}
agent_type: ProbabilisticAgent
cls._registry: {'AbstractScriptedAgent': <class 'primaite.game.agent.interface.AbstractScriptedAgent'>, 'ProxyAgent': <class 'primaite.game.agent.interface.ProxyAgent'>}


KeyError: 'ProbabilisticAgent'
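
The output above shows a lookup for `'ProbabilisticAgent'` failing against a registry that holds only `AbstractScriptedAgent` and `ProxyAgent`. The sketch below illustrates the general pattern behind this kind of error: a base class that records subclasses by identifier, and a lookup that raises `KeyError` for any identifier that has not been registered yet. The class and method names here are hypothetical and are not PrimAITE's actual implementation.

```python
class AgentBase:
    """Hypothetical base class that records subclasses by identifier."""

    _registry = {}

    def __init_subclass__(cls, identifier=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if identifier is not None:
            # Registration happens when the subclass definition is executed.
            AgentBase._registry[identifier] = cls

    @classmethod
    def from_type(cls, agent_type):
        # A plain dict lookup: unknown identifiers raise KeyError.
        return cls._registry[agent_type]


class ProxyAgent(AgentBase, identifier="ProxyAgent"):
    pass


try:
    AgentBase.from_type("ProbabilisticAgent")
except KeyError as err:
    missing = err.args[0]
    print("not registered:", missing)


# Once the subclass definition has been executed, the same lookup succeeds.
class ProbabilisticAgent(AgentBase, identifier="ProbabilisticAgent"):
    pass


print(AgentBase.from_type("ProbabilisticAgent").__name__)
```

If the library follows a similar pattern, the class behind the configured agent type has to be defined and registered before the environment is built. Whether that is the cause of the error here is an assumption that the notebook output alone does not confirm.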

Let's define the training parameters for the agent.

In [None]:
from stable_baselines3 import PPO

EPISODE_LEN = 128
NUM_EPISODES = 5
NO_STEPS = EPISODE_LEN * NUM_EPISODES
BATCH_SIZE = 32
LEARNING_RATE = 3e-4
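
To see what these numbers imply, the sketch below works through the arithmetic in plain Python. It relies on two assumptions that the notebook does not state: a single, non-vectorised environment, and SB3's default of 10 optimisation epochs per PPO update. It also uses the fact that the later cells pass `NO_STEPS` as both `n_steps` and `total_timesteps`.

```python
EPISODE_LEN = 128
NUM_EPISODES = 5
NO_STEPS = EPISODE_LEN * NUM_EPISODES
BATCH_SIZE = 32

N_ENVS = 1     # assumption: one non-vectorised environment
N_EPOCHS = 10  # assumption: SB3's default n_epochs for PPO, not set in this notebook

# With n_steps=NO_STEPS, one rollout collects NO_STEPS transitions per environment.
rollout_size = NO_STEPS * N_ENVS
minibatches_per_epoch = rollout_size // BATCH_SIZE
leftover = rollout_size % BATCH_SIZE  # non-zero would mean a truncated final minibatch
gradient_steps_per_update = minibatches_per_epoch * N_EPOCHS

# With total_timesteps=NO_STEPS, training stops after this many rollouts (ceiling division).
rollouts = -(-NO_STEPS // rollout_size)

print("steps per rollout:", rollout_size)
print("minibatches per epoch:", minibatches_per_epoch, "leftover:", leftover)
print("gradient steps per update:", gradient_steps_per_update)
print("rollouts in training run:", rollouts)
```

Under these assumptions the whole training run is a single rollout of 640 steps followed by one policy update, so this configuration is best read as a quick demonstration rather than a full training run.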

In [None]:
model = PPO(
    'MlpPolicy',
    gym,
    learning_rate=LEARNING_RATE,
    n_steps=NO_STEPS,
    batch_size=BATCH_SIZE,
    verbose=0,
    tensorboard_log="./PPO_UC2/",
)

With the agent configured, let's train for our defined number of episodes.

In [None]:
model.learn(total_timesteps=NO_STEPS)

Next, let's save the agent to a zip file that can be used in future evaluation.

In [None]:
model.save("PrimAITE-PPO-example-agent")

Now, we load the saved agent so that it can be evaluated.

In [None]:
# PPO.load rebuilds the model from the saved zip, so no separate PPO(...) constructor call is needed.
eval_model = PPO.load("PrimAITE-PPO-example-agent", gym)

Finally, evaluate the agent.

In [None]:
from stable_baselines3.common.evaluation import evaluate_policy

evaluate_policy(eval_model, gym, n_eval_episodes=10)
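
By default, `evaluate_policy` returns a `(mean_reward, std_reward)` tuple computed over the total reward of each evaluation episode. The sketch below reproduces that summary in plain Python for a short list of episode rewards. The reward values are hypothetical placeholders, not results from this notebook.

```python
from statistics import fmean, pstdev

# Hypothetical per-episode total rewards, standing in for what an evaluation run would collect.
episode_rewards = [-12.0, -8.0, -10.0, -6.0]

mean_reward = fmean(episode_rewards)
# Population standard deviation (divides by N), matching numpy's default np.std behaviour.
std_reward = pstdev(episode_rewards)

print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```

A large standard deviation relative to the mean suggests that more evaluation episodes are needed before comparing agents.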