# Using Episode Schedules

© Crown-owned copyright 2025, Defence Science and Technology Laboratory UK

PrimAITE can load a different variation of a scenario in each episode. This can be used to increase
domain randomisation and reduce overfitting, or to set up curriculum learning in which agents are trained on progressively more complicated tasks.

A fixed scenario is defined by a single YAML config file. An episode schedule is instead defined by a
directory containing several config files that work together.

## Demonstration

Run `primaite setup` to copy the example config files into the correct directory. Then import the required modules and define the location of the example config directory.

In [1]:
!primaite setup

2025-03-24 10:01:41,310: Performing the PrimAITE first-time setup...
2025-03-24 10:01:41,310: Building the PrimAITE app directories...
2025-03-24 10:01:41,310: Building primaite_config.yaml...
2025-03-24 10:01:41,310: Rebuilding the demo notebooks...
2025-03-24 10:01:41,333: Rebuilding the example notebooks...
2025-03-24 10:01:41,335: PrimAITE setup complete!


In [2]:
import yaml
from primaite.session.environment import PrimaiteGymEnv
from primaite import PRIMAITE_PATHS
from prettytable import PrettyTable
scenario_path = PRIMAITE_PATHS.user_config_path / "example_config/scenario_with_placeholders"

### Base Scenario File
Let's view the contents of the base scenario file:

It contains all the base settings that stay fixed across episodes, including the `io_settings`, the `game` settings, the network layout and the blue agent definition. The `agents` list contains two placeholders, `*greens` and `*reds`, which are YAML aliases that the variation files fill in.

In [3]:
with open(scenario_path/"scenario.yaml") as f:
    print(f.read())

metadata:
    version: 3.0

io_settings:
  save_agent_actions: true
  save_step_metadata: false
  save_pcap_logs: false
  save_sys_logs: false


game:
  max_episode_length: 128
  ports:
  - HTTP
  - POSTGRES_SERVER
  protocols:
  - ICMP
  - TCP
  - UDP
  thresholds:
    nmne:
      high: 10
      medium: 5
      low: 0

agents:
  - *greens
  - *reds

  - ref: defender
    team: BLUE
    type: proxy-agent
    observation_space:
      type: custom
      options:
        components:
          - type: nodes
            label: NODES
            options:
              routers: []
              hosts:
                - hostname: client
                - hostname: server
              num_services: 1
              num_applications: 1
              num_folders: 1
              num_files: 1
              num_nics: 1
              include_num_access: false
              include_nmne: true

          - type: links
            label: LINKS
            options:
              link_references:
       

### Schedule File
Let's view the contents of the schedule file:

This file references the base scenario file and defines which variations are loaded in each episode. In this instance there are four episodes, numbered 0 to 3: episode 0 uses `greens_0` and `reds_0`, episode 1 uses `greens_0` and `reds_1`, and so on.

In [4]:
with open(scenario_path/"schedule.yaml") as f:
    print(f.read())

base_scenario: scenario.yaml
schedule:
  0:
    - greens_0.yaml
    - reds_0.yaml
  1:
    - greens_0.yaml
    - reds_1.yaml
  2:
    - greens_1.yaml
    - reds_1.yaml
  3:
    - greens_2.yaml
    - reds_2.yaml
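
The placeholders in the base scenario only resolve once the variation files listed for an episode have defined the matching anchors. The sketch below illustrates that idea with plain PyYAML. It is not PrimAITE's loader: the two strings are trimmed-down, hypothetical stand-ins for the real files, and both the concatenation order and the `flatten_agents` helper are assumptions made for illustration.

```python
import yaml

# Hypothetical, trimmed-down stand-ins for a pair of variation files.
variation_text = """\
greens: &greens []
reds: &reds
  - ref: red_A
    team: RED
"""

# Hypothetical, trimmed-down stand-in for the base scenario.
base_text = """\
agents:
  - *greens
  - *reds
  - ref: defender
    team: BLUE
"""

# Assumption: a YAML anchor must be defined before any alias that uses it,
# so the variation text is placed ahead of the base scenario text.
combined = yaml.safe_load(variation_text + "\n" + base_text)


def flatten_agents(entries):
    """Expand list-valued placeholders into one flat list of agent dicts."""
    flat = []
    for entry in entries:
        if isinstance(entry, list):
            flat.extend(entry)  # a placeholder that resolved to a list of agents
        else:
            flat.append(entry)  # an agent defined inline in the base scenario
    return flat


agent_refs = [agent["ref"] for agent in flatten_agents(combined["agents"])]
print(agent_refs)  # ['red_A', 'defender']
```

Because each alias resolves to a whole list, the parsed `agents` entry contains nested lists. The sketch flattens them so that an empty variation contributes no agents at all.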



### Green Agent Variation Files

There are three variants of the green agent setup. In `greens_0` there are no green agents; in `greens_1` there is a green agent that executes the database client application 80% of the time; and in `greens_2` there is a green agent that executes the database client application 5% of the time.

The only differences between `greens_1` and `greens_2` are the agent name and the action probabilities.

In [5]:
with open(scenario_path/"greens_0.yaml") as f:
    print(f.read())

# No green agents present
greens: &greens []



In [6]:
with open(scenario_path/"greens_1.yaml") as f:
    print(f.read())

agents: &greens
  - ref: green_A
    team: GREEN
    type: probabilistic-agent
    agent_settings:
      action_probabilities:
        0: 0.2
        1: 0.8

    action_space:
      action_map:
        0:
          action: do-nothing
          options: {}
        1:
          action: node-application-execute
          options:
            node_name: client
            application_name: database-client

    reward_function:
      reward_components:
        - type: green-admin-database-unreachable-penalty
          weight: 1.0
          options:
            node_hostname: client



In [7]:
with open(scenario_path/"greens_2.yaml") as f:
    print(f.read())

agents: &greens
  - ref: green_B
    team: GREEN
    type: probabilistic-agent
    agent_settings:
      action_probabilities:
        0: 0.95
        1: 0.05

    action_space:
      action_map:
        0:
          action: do-nothing
          options: {}
        1:
          action: node-application-execute
          options:
            node_name: client
            application_name: database-client

    reward_function:
      reward_components:
        - type: green-admin-database-unreachable-penalty
          weight: 1.0
          options:
            node_hostname: client
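
The `action_probabilities` in these files give, for each action ID in `action_map`, the chance that the green agent picks it on a given step. The following is a minimal sketch of that kind of weighted sampling using only the standard library. It does not use PrimAITE's agent class, and the seeded generator and the sample size are choices made for this illustration.

```python
import random
from collections import Counter

# Values copied from greens_1.yaml.
action_map = {0: "do-nothing", 1: "node-application-execute"}
action_probabilities = {0: 0.2, 1: 0.8}

rng = random.Random(0)  # seeded only so the sketch is repeatable
action_ids = list(action_probabilities)
weights = list(action_probabilities.values())

# Draw one action ID per step, weighted by the configured probabilities.
sampled_ids = rng.choices(action_ids, weights=weights, k=10_000)
counts = Counter(action_map[action_id] for action_id in sampled_ids)

execute_share = counts["node-application-execute"] / len(sampled_ids)
print(f"node-application-execute share: {execute_share:.2f}")
```

Over many steps the share of `node-application-execute` should sit close to 0.8. Swapping in the `greens_2` probabilities (0.95 and 0.05) should bring it close to 0.05.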



### Red Agent Variation Files

There are three variants of the red agent setup. In `reds_0` there are no red agents; in `reds_1` there is a red agent that executes every 10 steps, starting at step 10; and in `reds_2` there is a red agent that executes roughly every 2 steps, starting at step 3, with a variance of 1 step.

In [8]:
with open(scenario_path/"reds_0.yaml") as f:
    print(f.read())

# No red agents present
reds: &reds []



In [9]:
with open(scenario_path/"reds_1.yaml") as f:
    print(f.read())

reds: &reds
  - ref: red_A
    team: RED
    type: red-database-corrupting-agent

    agent_settings:
      possible_start_nodes: [client,]
      target_application: data-manipulation-bot
      start_step: 10
      frequency: 10
      variance: 0



In [10]:
with open(scenario_path/"reds_2.yaml") as f:
    print(f.read())

reds: &reds
  - ref: red_B
    team: RED
    type: red-database-corrupting-agent

    agent_settings:
      possible_start_nodes: [client_1]
      target_application: data-manipulation-bot
      start_step: 3
      frequency: 2
      variance: 1
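
To get a feel for how `start_step`, `frequency` and `variance` interact, here is a small sketch that lists the steps at which such an agent would act. The timing rule is an assumption made for illustration, not PrimAITE's implementation: the first execution is taken to be exactly `start_step`, and each later one follows after `frequency` steps plus a random offset between `-variance` and `+variance`.

```python
import random


def execution_steps(start_step, frequency, variance, n_steps, rng):
    """Return the steps below n_steps at which a periodic agent would act."""
    steps = []
    step = start_step
    while step < n_steps:
        steps.append(step)
        jitter = rng.randint(-variance, variance)
        # Always advance by at least one step so the loop terminates.
        step += max(1, frequency + jitter)
    return steps


rng = random.Random(0)  # seeded only so the sketch is repeatable

# Settings copied from reds_1.yaml and reds_2.yaml, over a 21-step window.
reds_1_steps = execution_steps(10, 10, 0, 21, rng)
reds_2_steps = execution_steps(3, 2, 1, 21, rng)

print("reds_1:", reds_1_steps)  # [10, 20] because the variance is 0
print("reds_2:", reds_2_steps)
```

With a variance of 0 the `reds_1` timing is fully deterministic. With the `reds_2` settings, consecutive executions are between 1 and 3 steps apart.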



## Running the simulation

Create the environment using the variable config.

In [11]:
env = PrimaiteGymEnv(env_config=scenario_path)

2025-03-24 10:01:44,891: PrimaiteGymEnv RNG seed = None


### Episode 0
Let's run the episodes to verify that the agents are changing as expected. In episode 0, there should be no green or red agents, just the defender blue agent.

In [12]:
print(f"Current episode number: {env.episode_counter}")
print(f"Agents present: {list(env.game.agents.keys())}")

Current episode number: 0
Agents present: ['defender']


### Episode 1
When we reset the environment, it moves on to episode 1, which keeps `greens_0` (no green agents) and brings in `reds_1` for the red agent definition.


In [13]:
env.reset()
print(f"Current episode number: {env.episode_counter}")
print(f"Agents present: {list(env.game.agents.keys())}")

2025-03-24 10:01:44,904: Resetting environment, episode 0, avg. reward: 0.0


2025-03-24 10:01:44,905: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/10-01-41/agent_actions/episode_0.json


Current episode number: 1
Agents present: ['red_A', 'defender']


### Episode 2
When we reset the environment again, it moves on to episode 2, which brings in `greens_1` and `reds_1` for the green and red agent definitions. Let's verify the agent names and check that the agents act at the configured rates.

Most green actions will be `node-application-execute` (80% probability), while red will `do-nothing` except at steps 10 and 20.

In [14]:
env.reset()
print(f"Current episode number: {env.episode_counter}")
print(f"Agents present: {list(env.game.agents.keys())}")
for i in range(21):
    env.step(0)

table = PrettyTable()
table.field_names = ["step", "Green Action", "Red Action"]
for i in range(21):
    green_action = env.game.agents['green_A'].history[i].action
    red_action = env.game.agents['red_A'].history[i].action
    table.add_row([i, green_action, red_action])
print(table)

2025-03-24 10:01:44,941: Resetting environment, episode 1, avg. reward: 0.0


2025-03-24 10:01:44,942: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/10-01-41/agent_actions/episode_1.json


Current episode number: 2
Agents present: ['green_A', 'red_A', 'defender']


+------+--------------------------+--------------------------+
| step |       Green Action       |        Red Action        |
+------+--------------------------+--------------------------+
|  0   | node-application-execute |        do-nothing        |
|  1   | node-application-execute |        do-nothing        |
|  2   | node-application-execute |        do-nothing        |
|  3   | node-application-execute |        do-nothing        |
|  4   |        do-nothing        |        do-nothing        |
|  5   |        do-nothing        |        do-nothing        |
|  6   | node-application-execute |        do-nothing        |
|  7   | node-application-execute |        do-nothing        |
|  8   | node-application-execute |        do-nothing        |
|  9   | node-application-execute |        do-nothing        |
|  10  | node-application-execute | node-application-execute |
|  11  | node-application-execute |        do-nothing        |
|  12  | node-application-execute |        do-nothing  

### Episode 3
When we reset the environment again, it moves on to episode 3, which brings in `greens_2` and `reds_2` for the green and red agent definitions. Let's verify the agent names and check that the agents act at the configured rates.

Now, green will perform `node-application-execute` only 5% of the time, while red will perform `node-application-execute` more frequently than before.

In [15]:
env.reset()
print(f"Current episode number: {env.episode_counter}")
print(f"Agents present: {list(env.game.agents.keys())}")
for i in range(21):
    env.step(0)

table = PrettyTable()
table.field_names = ["step", "Green Action", "Red Action"]
for i in range(21):
    green_action = env.game.agents['green_B'].history[i].action
    red_action = env.game.agents['red_B'].history[i].action
    table.add_row([i, green_action, red_action])
print(table)

2025-03-24 10:01:45,015: Resetting environment, episode 2, avg. reward: 0.0


2025-03-24 10:01:45,016: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/10-01-41/agent_actions/episode_2.json


Current episode number: 3
Agents present: ['green_B', 'red_B', 'defender']
+------+--------------+--------------------------+
| step | Green Action |        Red Action        |
+------+--------------+--------------------------+
|  0   |  do-nothing  |        do-nothing        |
|  1   |  do-nothing  |        do-nothing        |
|  2   |  do-nothing  |        do-nothing        |
|  3   |  do-nothing  | node-application-execute |
|  4   |  do-nothing  | node-application-execute |
|  5   |  do-nothing  |        do-nothing        |
|  6   |  do-nothing  |        do-nothing        |
|  7   |  do-nothing  | node-application-execute |
|  8   |  do-nothing  |        do-nothing        |
|  9   |  do-nothing  | node-application-execute |
|  10  |  do-nothing  | node-application-execute |
|  11  |  do-nothing  |        do-nothing        |
|  12  |  do-nothing  | node-application-execute |
|  13  |  do-nothing  |        do-nothing        |
|  14  |  do-nothing  |        do-nothing        |
|  15  

### Further Episodes

Since the schedule definition only goes up to episode 3, if we reset the environment again, we run out of episodes. The environment will simply loop back to the beginning, but it produces a warning message to make users aware that the episodes are being repeated.

In [16]:
env.reset(); # semicolon suppresses jupyter outputting the observation space.


2025-03-24 10:01:45,076: Resetting environment, episode 3, avg. reward: 0.0


2025-03-24 10:01:45,077: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/10-01-41/agent_actions/episode_3.json
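
The looping behaviour described above can be pictured as a lookup that wraps around the schedule. This is a sketch under assumptions, not PrimAITE's code: `variations_for` is a hypothetical helper, the wrap-around is modelled with a modulo, and the warning text is invented for illustration.

```python
import warnings

# The schedule from schedule.yaml, written out as a plain dict.
schedule = {
    0: ["greens_0.yaml", "reds_0.yaml"],
    1: ["greens_0.yaml", "reds_1.yaml"],
    2: ["greens_1.yaml", "reds_1.yaml"],
    3: ["greens_2.yaml", "reds_2.yaml"],
}


def variations_for(episode, schedule):
    """Return the variation files for an episode, looping past the end."""
    if episode >= len(schedule):
        # Hypothetical warning text; the real message may differ.
        warnings.warn(f"Episode {episode} is beyond the schedule, repeating episodes.")
    return schedule[episode % len(schedule)]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for episode in range(6):
        print(episode, variations_for(episode, schedule))

print(f"warnings raised: {len(caught)}")  # 2, one each for episodes 4 and 5
```

Under this model, episode 4 reuses the files listed for episode 0 and episode 5 reuses those for episode 1.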




## Other uses

Since the episode schedules make use of yaml aliases and anchors, it's possible to use them in any part of the config, not just agent definitions. For instance, we can vary the simulation setup by changing what software is installed on hosts, how that software is configured, or even change the nodes themselves.

As an example, we will vary what software is installed on nodes in a basic test network.

In [17]:
mini_scenario_path = PRIMAITE_PATHS.user_config_path / "example_config/mini_scenario_with_simulation_variation"


Let's open the base scenario to see the placeholders. client_1 and server both have placeholders in the software installed on them. The server has a placeholder called `*server_services` and the client has `*client_applications`.

In [18]:
with open(mini_scenario_path/"base_scenario.yaml") as f:
    print(f.read())

metadata:
    version: 3.0

game:
  max_episode_length: 128
  ports: []
  protocols: []

agents:
  - ref: RL_Agent
    type: proxy-agent

    action_space:
      action_map:
        0:
          action: do-nothing
          options: {}
        1:
          action: node-shutdown
          options:
            node_name: client_1
        2:
          action: node-shutdown
          options:
            node_name: server
        3:
          action: node-startup
          options:
            node_name: client_1
        4:
          action: node-startup
          options:
            node_name: server
        5:
          action: host-nic-disable
          options:
            node_name: client_1
            nic_num: 1
        6:
          action: host-nic-disable
          options:
            node_name: server
            nic_num: 1
        7:
          action: host-nic-enable
          options:
            node_name: client_1
            nic_num: 1
        8:
          action: host-nic

In the 0th episode, `simulation_variant_1.yaml` is loaded in and the server gets a `database-service`, while client_1 gets `database-client`.

In [19]:
with open(mini_scenario_path/"simulation_variant_1.yaml") as f:
    print(f.read())

server_services: &server_services
  - type: database-service

client_applications: &client_applications
  - type: database-client



In [20]:
env = PrimaiteGymEnv(env_config=mini_scenario_path)
print(f"Episode: {env.episode_counter}")
env.game.simulation.network.get_node_by_hostname('server').software_manager.show()
env.game.simulation.network.get_node_by_hostname('client_1').software_manager.show()

2025-03-24 10:01:45,170: PrimaiteGymEnv RNG seed = None


Episode: 0
+---------------------------------------------------------------------------------------+
|                                server Software Manager                                |
+----------------------+-------------+-----------------+--------------+------+----------+
| Name                 | Type        | Operating State | Health State | Port | Protocol |
+----------------------+-------------+-----------------+--------------+------+----------+
| arp                  | Service     | RUNNING         | GOOD         | 219  | udp      |
| icmp                 | Service     | RUNNING         | GOOD         | None | icmp     |
| dns-client           | Service     | RUNNING         | GOOD         | 53   | tcp      |
| ntp-client           | Service     | RUNNING         | GOOD         | 123  | udp      |
| web-browser          | Application | RUNNING         | GOOD         | 80   | tcp      |
| nmap                 | Application | RUNNING         | GOOD         | None | none     |

In the 1st episode, `simulation_variant_2.yaml` is loaded in, so the server gets an `ftp-server` and client_1 gets a `ransomware-script`.

In [21]:
with open(mini_scenario_path/"simulation_variant_2.yaml") as f:
    print(f.read())

server_services: &server_services
  - type: ftp-server

client_applications: &client_applications
  - type: ransomware-script



In [22]:
env.reset()
print(f"Episode: {env.episode_counter}")
env.game.simulation.network.get_node_by_hostname('server').software_manager.show()
env.game.simulation.network.get_node_by_hostname('client_1').software_manager.show()

2025-03-24 10:01:45,186: Resetting environment, episode 0, avg. reward: 0.0


2025-03-24 10:01:45,188: Saving agent action log to /home/runner/primaite/4.0.0/sessions/2025-03-24/10-01-41/agent_actions/episode_0.json


Episode: 1
+---------------------------------------------------------------------------------------+
|                                server Software Manager                                |
+----------------------+-------------+-----------------+--------------+------+----------+
| Name                 | Type        | Operating State | Health State | Port | Protocol |
+----------------------+-------------+-----------------+--------------+------+----------+
| arp                  | Service     | RUNNING         | GOOD         | 219  | udp      |
| icmp                 | Service     | RUNNING         | GOOD         | None | icmp     |
| dns-client           | Service     | RUNNING         | GOOD         | 53   | tcp      |
| ntp-client           | Service     | RUNNING         | GOOD         | 123  | udp      |
| web-browser          | Application | RUNNING         | GOOD         | 80   | tcp      |
| nmap                 | Application | RUNNING         | GOOD         | None | none     |