# NOTEBOOK COUNTERMEASURES: USING DEEPSEEK TO GENERATE MITIGATIONS

## 1. OBJECTIVES

This notebook generates IoT-specific mitigations from the general mitigations listed in CAPEC, using a DeepSeek LLM.

## 2. IMPORTS AND SETUP
Import the Python libraries required for data handling, model loading, and text generation, then log in to the Hugging Face Hub.

In [1]:
# Pandas import
import pandas as pd
# Imports required by the LLM
from huggingface_hub import interpreter_login
from unsloth import FastLanguageModel
from transformers import GenerationConfig
import torch
# Auxiliary imports
import re
from tqdm import tqdm
import json
import os
# Login to Hugging Face Hub
interpreter_login()

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


## 3. EdgeIIoT ATTACKS AND CAPEC EQUIVALENCES
Create a dataframe that maps each EdgeIIoT attack to its CAPEC equivalent.

In [2]:
# Map each attack name to its corresponding CAPEC ID and name
attack_mapping = [
    {"attack_name": "DDoS_UDP", "capec_id": "486", "capec_name": "UDP Flood"},
    {"attack_name": "DDoS_ICMP", "capec_id": "487", "capec_name": "ICMP Flood"},
    {"attack_name": "SQL_injection","capec_id": "66", "capec_name": "SQL Injection"},
    {"attack_name": "DDoS_TCP",  "capec_id": "482", "capec_name": "TCP Flood"},
    {"attack_name": "Vulnerability_scanner", "capec_id": "310", "capec_name": "Scanning for vulnerable software"},
    {"attack_name": "Password",  "capec_id": "49", "capec_name": "Password Brute Force"},
    {"attack_name": "DDoS_HTTP", "capec_id": "488", "capec_name": "HTTP Flood"},
    {"attack_name": "Uploading", "capec_id": "242", "capec_name": "Code Injection"},
    {"attack_name": "Backdoor",  "capec_id": "523", "capec_name": "Malicious Software Implanted"},
    {"attack_name": "Port_Scanning",  "capec_id": "300", "capec_name": "Port scanning"},
    {"attack_name": "XSS","capec_id": "63", "capec_name": "Cross Site Scripting (XSS)"},
    {"attack_name": "Ransomware", "capec_id": "542", "capec_name": "Targeted malware"},
    {"attack_name": "Fingerprinting",  "capec_id": "224", "capec_name": "Fingerprinting"},
    {"attack_name": "MITM",  "capec_id": "94", "capec_name": "Man-In-The-Middle Attack"}
]

# Build the dataframe and show its first rows
df_mapping = pd.DataFrame(attack_mapping)
df_mapping.head()

| | attack_name | capec_id | capec_name |
|---|---|---|---|
| 0 | DDoS_UDP | 486 | UDP Flood |
| 1 | DDoS_ICMP | 487 | ICMP Flood |
| 2 | SQL_injection | 66 | SQL Injection |
| 3 | DDoS_TCP | 482 | TCP Flood |
| 4 | Vulnerability_scanner | 310 | Scanning for vulnerable software |


## 3. LOAD CAPEC DATA
Load the CAPEC data (downloaded from: https://capec.mitre.org/data/downloads.html)

In [3]:
# Load the CAPEC dataset
capec_df = pd.read_csv("../data/capec/capec-dataset.csv", sep=";", quotechar='"', encoding="utf-8")
capec_df.head()

Unnamed: 0,ID,Name,Abstraction,Status,Description,Alternate Terms,Likelihood Of Attack,Typical Severity,Related Attack Patterns,Execution Flow,Prerequisites,Skills Required,Resources Required,Indicators,Consequences,Mitigations,Example Instances,Related Weaknesses,Taxonomy Mappings,Notes
0,1,Accessing Functionality Not Properly Constrain...,Standard,Draft,"In applications, particularly web applications...",,High,High,::NATURE:ChildOf:CAPEC ID:122::NATURE:CanPrece...,::STEP:1:PHASE:Explore:DESCRIPTION:[Survey] Th...,::The application must be navigable in a manne...,::SKILL:In order to discover unrestricted reso...,::None: No specialized resources are required ...,,::SCOPE:Confidentiality:SCOPE:Access Control:S...,"::In a J2EE setting, administrators can associ...",::Implementing the Model-View-Controller (MVC)...,::276::285::434::693::732::1191::1193::1220::1...,TAXONOMY NAME:ATTACK:ENTRY ID:1574.010:ENTRY N...,
1,10,Buffer Overflow via Environment Variables,Detailed,Draft,This attack pattern involves causing a buffer ...,,High,High,::NATURE:ChildOf:CAPEC ID:100::,::STEP:1:PHASE:Explore:DESCRIPTION:[Identify t...,::The application uses environment variables.:...,::SKILL:An attacker can simply overflow a buff...,,"::If the application does bound checking, it s...",::SCOPE:Availability:TECHNICAL IMPACT:Unreliab...,::Do not expose environment variable to the us...,::A buffer overflow in sccw allows local users...,::120::302::118::119::74::99::20::680::733::697::,TAXONOMY NAME:OWASP Attacks:ENTRY NAME:Buffer ...,
2,100,Overflow Buffers,Standard,Draft,Buffer Overflow attacks target improper or mis...,,High,Very High,::NATURE:ChildOf:CAPEC ID:123::,::STEP:1:PHASE:Explore:DESCRIPTION:[Identify t...,::Targeted software performs buffer operations...,"::SKILL:In most cases, overflowing a buffer do...",::None: No specialized resources are required ...,::An attack designed to leverage a buffer over...,::SCOPE:Availability:TECHNICAL IMPACT:Unreliab...,::Use a language or compiler that performs aut...,::The most straightforward example is an appli...,::120::119::131::129::805::680::,TAXONOMY NAME:WASC:ENTRY ID:07:ENTRY NAME:Buff...,
3,101,Server Side Include (SSI) Injection,Detailed,Draft,An attacker can use Server Side Include (SSI) ...,,High,High,::NATURE:ChildOf:CAPEC ID:253::NATURE:CanPrece...,::STEP:1:PHASE:Explore:DESCRIPTION:[Determine ...,::A web server that supports server side inclu...,::SKILL:The attacker needs to be aware of SSI ...,::None: No specialized resources are required ...,,::SCOPE:Confidentiality:TECHNICAL IMPACT:Read ...,::Set the OPTIONS IncludesNOEXEC in the global...,::Consider a website hosted on a server that p...,::97::74::20::,TAXONOMY NAME:WASC:ENTRY ID:36:ENTRY NAME:SSI ...,
4,102,Session Sidejacking,Detailed,Draft,Session sidejacking takes advantage of an unen...,,High,High,::NATURE:ChildOf:CAPEC ID:593::,::STEP:1:PHASE:Explore:DESCRIPTION:[Detect Unp...,::An attacker and the victim are both using th...,::SKILL:Easy to use tools exist to automate th...,"::A packet sniffing tool, such as wireshark, c...",,::SCOPE:Confidentiality:SCOPE:Access Control:S...,::Make sure that HTTPS is used to communicate ...,::The attacker and the victim are using the sa...,::294::522::523::319::614::,,


In [4]:
# print the number of columns in the CAPEC dataset
print("Number of columns:",len(capec_df.columns))

Number of columns: 20


## 4. CLEAN CAPEC DATA
Clean the CAPEC data by replacing the `::` and `:KEY:` delimiters inside each column with readable separators.

In [5]:
# Create a function to clean the CAPEC dataset, specifically the data with unnecessary chars in each column
def clean_capec_columns(df):
    # --- 1. Related Attack Patterns ---
    if "Related Attack Patterns" in df.columns:
        df["Related Attack Patterns"] = df["Related Attack Patterns"] \
            .fillna("") \
            .str.strip(":") \
            .str.replace("::", "\n", regex=False)

    # --- 2. Execution Flow ---
    if "Execution Flow" in df.columns:
        df["Execution Flow"] = df["Execution Flow"] \
            .fillna("") \
            .str.strip(":") \
            .str.replace("::STEP:", "\nSTEP:", regex=False) \
            .str.replace(":PHASE:", " | PHASE: ", regex=False) \
            .str.replace(":DESCRIPTION:", "\n    DESC: ", regex=False) \
            .str.replace(":TECHNIQUE:", "\n    - TECHNIQUE: ", regex=False)

    # --- 3. Prerequisites ---
    if "Prerequisites" in df.columns:
        df["Prerequisites"] = df["Prerequisites"] \
            .fillna("") \
            .str.strip(":") \
            .str.replace("::", "\n", regex=False)

    # --- 4. Skills Required ---
    if "Skills Required" in df.columns:
        df["Skills Required"] = df["Skills Required"] \
            .fillna("") \
            .str.strip(":") \
            .str.replace("::", "\n", regex=False) \
            .str.replace("SKILL:", "- Skill: ", regex=False) \
            .str.replace("LEVEL:", "  Level: ", regex=False)

    # --- 5. Resources Required ---
    if "Resources Required" in df.columns:
        df["Resources Required"] = df["Resources Required"] \
            .fillna("") \
            .str.strip(":") \
            .str.replace("::", "\n", regex=False)

    # --- 6. Indicators ---
    if "Indicators" in df.columns:
        df["Indicators"] = df["Indicators"] \
            .fillna("") \
            .str.strip(":") \
            .str.replace("::", "\n", regex=False)

    # --- 7. Consequences ---
    if "Consequences" in df.columns:
        df["Consequences"] = df["Consequences"] \
            .fillna("") \
            .str.strip(":") \
            .str.replace("::", "\n", regex=False) \
            .str.replace("SCOPE:", "- Scope: ", regex=False) \
            .str.replace("TECHNICAL IMPACT:", "  Impact: ", regex=False) \
            .str.replace("NOTE:", "  Note: ", regex=False)

    # --- 8. Mitigations ---
    if "Mitigations" in df.columns:
        df["Mitigations"] = df["Mitigations"] \
            .fillna("") \
            .str.strip(":") \
            .str.replace("::", "\n", regex=False)

    # --- 9. Example Instances ---
    if "Example Instances" in df.columns:
        df["Example Instances"] = df["Example Instances"] \
            .fillna("") \
            .str.strip(":") \
            .str.replace("::", "\n", regex=False)

    # --- 10. Related Weaknesses ---
    if "Related Weaknesses" in df.columns:
        df["Related Weaknesses"] = df["Related Weaknesses"] \
            .fillna("") \
            .str.strip(":") \
            .str.replace("::", ", ", regex=False)

    # --- 11. Taxonomy Mappings ---
    if "Taxonomy Mappings" in df.columns:
        df["Taxonomy Mappings"] = df["Taxonomy Mappings"] \
            .fillna("") \
            .str.strip(":") \
            .str.replace("::", "\n", regex=False) \
            .str.replace("TAXONOMY NAME:", "- Taxonomy: ", regex=False) \
            .str.replace("ENTRY ID:", "  ID: ", regex=False) \
            .str.replace("ENTRY NAME:", "  Name: ", regex=False)

    return df

In [6]:
# Clean the CAPEC dataset by removing the unnecessary characters from the columns
capec_df = clean_capec_columns(capec_df)

In [7]:
# print the number of columns in the CAPEC dataset cleaned
print("Number of columns:",len(capec_df.columns))

Number of columns: 20
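
The raw CAPEC export wraps list-like cells in `::` delimiters, which is what the cleaning step strips and replaces. As a minimal sketch of that idea on a single cell, assuming plain Python strings rather than a pandas column, the hypothetical helper `split_capec_items` below (not part of the notebook) turns one delimited value into a list. The sample string is a toy value, not real CAPEC text.

```python
def split_capec_items(raw):
    """Split a '::'-delimited CAPEC cell into a list of items (hypothetical helper)."""
    if raw is None:
        return []
    # str.strip(":") removes every leading/trailing colon, like .str.strip(":") in pandas
    text = str(raw).strip(":")
    return [part.strip() for part in text.split("::") if part.strip()]


sample = "::First mitigation.::Second mitigation.::"  # toy value, not real CAPEC text
items = split_capec_items(sample)
print(items)
```

Note that stripping colons from both ends would also remove a legitimate trailing colon from the last item, so this approach only suits cells that follow the `::item::item::` layout.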


In [8]:
# print the data cleaned
capec_df.head()

Unnamed: 0,ID,Name,Abstraction,Status,Description,Alternate Terms,Likelihood Of Attack,Typical Severity,Related Attack Patterns,Execution Flow,Prerequisites,Skills Required,Resources Required,Indicators,Consequences,Mitigations,Example Instances,Related Weaknesses,Taxonomy Mappings,Notes
0,1,Accessing Functionality Not Properly Constrain...,Standard,Draft,"In applications, particularly web applications...",,High,High,NATURE:ChildOf:CAPEC ID:122\nNATURE:CanPrecede...,STEP:1 | PHASE: Explore\n DESC: [Survey] Th...,The application must be navigable in a manner ...,- Skill: In order to discover unrestricted res...,None: No specialized resources are required to...,,- Scope: Confidentiality:- Scope: Access Contr...,"In a J2EE setting, administrators can associat...",Implementing the Model-View-Controller (MVC) w...,"276, 285, 434, 693, 732, 1191, 1193, 1220, 129...",- Taxonomy: ATTACK: ID: 1574.010: Name: Hija...,
1,10,Buffer Overflow via Environment Variables,Detailed,Draft,This attack pattern involves causing a buffer ...,,High,High,NATURE:ChildOf:CAPEC ID:100,STEP:1 | PHASE: Explore\n DESC: [Identify t...,The application uses environment variables.\nA...,- Skill: An attacker can simply overflow a buf...,,"If the application does bound checking, it sho...",- Scope: Availability: Impact: Unreliable Exe...,Do not expose environment variable to the user...,A buffer overflow in sccw allows local users t...,"120, 302, 118, 119, 74, 99, 20, 680, 733, 697",- Taxonomy: OWASP Attacks: Name: Buffer Overf...,
2,100,Overflow Buffers,Standard,Draft,Buffer Overflow attacks target improper or mis...,,High,Very High,NATURE:ChildOf:CAPEC ID:123,STEP:1 | PHASE: Explore\n DESC: [Identify t...,Targeted software performs buffer operations.\...,"- Skill: In most cases, overflowing a buffer d...",None: No specialized resources are required to...,An attack designed to leverage a buffer overfl...,- Scope: Availability: Impact: Unreliable Exe...,Use a language or compiler that performs autom...,The most straightforward example is an applica...,"120, 119, 131, 129, 805, 680",- Taxonomy: WASC: ID: 07: Name: Buffer Overf...,
3,101,Server Side Include (SSI) Injection,Detailed,Draft,An attacker can use Server Side Include (SSI) ...,,High,High,NATURE:ChildOf:CAPEC ID:253\nNATURE:CanPrecede...,STEP:1 | PHASE: Explore\n DESC: [Determine ...,A web server that supports server side include...,- Skill: The attacker needs to be aware of SSI...,None: No specialized resources are required to...,,- Scope: Confidentiality: Impact: Read Data\n...,Set the OPTIONS IncludesNOEXEC in the global a...,Consider a website hosted on a server that per...,"97, 74, 20",- Taxonomy: WASC: ID: 36: Name: SSI Injectio...,
4,102,Session Sidejacking,Detailed,Draft,Session sidejacking takes advantage of an unen...,,High,High,NATURE:ChildOf:CAPEC ID:593,STEP:1 | PHASE: Explore\n DESC: [Detect Unp...,An attacker and the victim are both using the ...,- Skill: Easy to use tools exist to automate t...,"A packet sniffing tool, such as wireshark, can...",,- Scope: Confidentiality:- Scope: Access Contr...,Make sure that HTTPS is used to communicate wi...,The attacker and the victim are using the same...,"294, 522, 523, 319, 614",,


In [9]:
# print row corresponding to the CAPEC ID 66
row = capec_df[capec_df["ID"] == 66].iloc[0]

for col in capec_df.columns:
    print(f"\n{col}")
    print("—" * (len(col) + 4))
    print(row[col])


ID
——————
66

Name
————————
SQL Injection

Abstraction
———————————————
Standard

Status
——————————
Draft

Description
———————————————
This attack exploits target software that constructs SQL statements based on user input. An attacker crafts input strings so that when the target software constructs SQL statements based on the input, the resulting SQL statement performs actions other than those the application intended. SQL Injection results from failure of the application to appropriately validate input.

Alternate Terms
———————————————————
nan

Likelihood Of Attack
————————————————————————
High

Typical Severity
————————————————————
High

Related Attack Patterns
———————————————————————————
NATURE:ChildOf:CAPEC ID:248

Execution Flow
——————————————————
STEP:1 | PHASE: Explore
    DESC: [Survey application] The attacker first takes an inventory of the functionality exposed by the application.
    - TECHNIQUE: Spider web sites for all available links
    - TECHNIQUE: Sniff network commu

## 5. GENERATE PROMPTS
Build the prompts that ask the LLM for the specific mitigations.

In [10]:
# Convert the 'capec_id' and 'ID' columns to string type
df_mapping['capec_id'] = df_mapping['capec_id'].astype(str)
capec_df['ID'] = capec_df['ID'].astype(str)

# Merge the two dataframes on the 'capec_id' and 'ID' columns
merged_df = pd.merge(df_mapping, capec_df, left_on='capec_id', right_on='ID')

# Function to generate the prompt for the LLM
def construir_prompt_instruct(row):
    # After cleaning, the mitigations are separated by newlines, so split on "\n"
    mitigations_list = row['Mitigations'].split('\n')
    mitigations_formatted = '\n'.join(f"- {m.strip()}" for m in mitigations_list if m.strip())

    instruction = (
        "You are a cybersecurity engineer specialized in securing IoT systems.\n"
        "Your task is to generate **exactly 5** mitigations for the described attack — no more, no less.\n"
        "Each mitigation must:\n"
        "- Be specific, technical, and directly applicable to IoT environments (e.g., edge devices, embedded systems).\n"
        "- Include practical actions such as configuration steps, firewall rules, commands, or code snippets.\n"
        "- Be written on a single line and start with a dash (`-`).\n"
        "- Do NOT include labels like 'Mitigation', 'Bullet', or numbering (e.g., 1., 2., etc.).\n"
        "- Do NOT include any introductions, explanations, summaries, or extra comments.\n"
        "Just output 5 lines, each starting with a dash (`-`), and stop."
    )

    context = (
        f"A security analyst has detected an attack of type: {row['attack_name']}.\n\n"
        f"This attack is defined by CAPEC-{row['ID']}: {row['Name']}.\n"
        f"Description: {row['Description']}\n\n"
        f"General mitigations include:\n{mitigations_formatted}\n\n"
        f"The context is an IoT environment, which may include:\n"
        f"- Edge devices (e.g. sensors or gateways)\n"
        f"- Embedded systems with lightweight APIs\n"
        f"- Web interfaces built with Python, C or Lua\n"
        f"- Communication over MQTT, HTTP or CoAP\n"
    )

    final_prompt = (
        f"### Instruction:\n{instruction}\n"
        f"### Input:\n{context}\n"
        f"### Response:\n<think>\n-"
    )
    return final_prompt

# generate the prompts for each row in the merged dataframe
prompts = merged_df.apply(construir_prompt_instruct, axis=1).tolist()

# Print an example of the generated prompt
print(prompts[2])


### Instruction:
You are a cybersecurity engineer specialized in securing IoT systems.
Your task is to generate **exactly 5** mitigations for the described attack — no more, no less.
Each mitigation must:
- Be specific, technical, and directly applicable to IoT environments (e.g., edge devices, embedded systems).
- Include practical actions such as configuration steps, firewall rules, commands, or code snippets.
- Be written on a single line and start with a dash (`-`).
- Do NOT include labels like 'Mitigation', 'Bullet', or numbering (e.g., 1., 2., etc.).
- Do NOT include any introductions, explanations, summaries, or extra comments.
Just output 5 lines, each starting with a dash (`-`), and stop.
### Input:
A security analyst has detected an attack of type: SQL_injection.

This attack is defined by CAPEC-66: SQL Injection.
Description: This attack exploits target software that constructs SQL statements based on user input. An attacker crafts input strings so that when the target softw
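
The inner merge used in this section drops, without warning, any attack whose CAPEC ID is missing from the CAPEC dataset, so one prompt could be lost unnoticed. A minimal sketch of a coverage check follows. The helper name `find_unmapped`, the attack `Made_up_attack`, and the ID `999` are hypothetical and only serve to trigger the check.

```python
def find_unmapped(attack_mapping, known_ids):
    """Return the attack names whose CAPEC ID is absent from known_ids."""
    # Compare as strings so that 66 and "66" are treated as the same ID
    known = {str(i) for i in known_ids}
    return [m["attack_name"] for m in attack_mapping if str(m["capec_id"]) not in known]


# Toy data: "999" is a made-up ID, not a statement about CAPEC
toy_mapping = [
    {"attack_name": "SQL_injection", "capec_id": "66"},
    {"attack_name": "Made_up_attack", "capec_id": "999"},
]
toy_ids = [1, 10, 66, 100]

missing = find_unmapped(toy_mapping, toy_ids)
print(missing)
```

In the notebook, the same check could be run as `find_unmapped(attack_mapping, capec_df["ID"])` before merging; an empty list would mean every attack has a matching CAPEC row.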

## 6. LOAD LLM
Load the DeepSeek LLM used to generate the specific mitigations.

In [11]:
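max_seq_length = 2048  # maximum sequence length in tokens
dtype = None  # None lets Unsloth detect the dtype automatically
load_in_4bit = True  # load the weights with 4-bit quantization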

# Load deepseek model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)

==((====))==  Unsloth 2025.2.15: Fast Llama patching. Transformers: 4.49.0.
   \\   /|    GPU: NVIDIA GeForce RTX 4060 Laptop GPU. Max memory: 7.996 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096, padding_idx=128004)
    (layers): ModuleList(
      (0): LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((409
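
A rough back-of-the-envelope estimate helps explain why `load_in_4bit = True` matters on the roughly 8 GB GPU reported above. The sketch below uses the layer shapes from the printed model and rests on two assumptions that the truncated printout does not confirm: that the model has 32 decoder layers, and that the token embeddings and output head stay in 16-bit precision. It ignores quantization constants, normalization weights, activations, and the KV cache, so the real footprint will be higher.

```python
# Shapes copied from the printed LlamaDecoderLayer above
hidden, kv_dim, mlp_dim, vocab = 4096, 1024, 14336, 128256
num_layers = 32  # assumption: the printout is truncated before the layer count

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q_proj and o_proj, plus k_proj and v_proj
mlp = 3 * hidden * mlp_dim                        # gate_proj, up_proj, down_proj
linear_params = num_layers * (attn + mlp)

GIB = 1024 ** 3
linear_gib = linear_params * 0.5 / GIB    # 4 bits = 0.5 byte per weight
embed_gib = 2 * vocab * hidden * 2 / GIB  # assumption: embeddings and output head in 16-bit

print(f"4-bit linear layers: {linear_gib:.2f} GiB")
print(f"16-bit embeddings and output head: {embed_gib:.2f} GiB")
```

Under these assumptions the weights need a little over 5 GiB, whereas the same linear layers in 16-bit precision would need four times as much and would not fit on this GPU.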

## 7. GENERATE MITIGATIONS
Generate mitigations for each attack and store them in a list

In [13]:
# Generation configuration
generation_config = GenerationConfig(
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    max_new_tokens=200,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
)

# List to store the mitigation results
mitigation_results = []

# Iterate over the prompts and generate mitigations
for i, prompt in enumerate(tqdm(prompts, desc="Generating mitigations")):
    # Tokenize the prompt
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda" if torch.cuda.is_available() else "cpu")

    # Generate response from the model
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs.input_ids,
            attention_mask=inputs.attention_mask,
            generation_config=generation_config,
            use_cache=True
        )

    # Decode the generated tokens
    response_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

    # Extract the text after "<think>" and before "</think>"
    extracted = ""
    if "<think>" in response_text:
        extracted = response_text.split("<think>")[-1]
        if "</think>" in extracted:
            extracted = extracted.split("</think>")[0]
    else:
        extracted = response_text.replace(prompt, "")

    # Clean the extracted text
    extracted = extracted.strip()

    # Save the result
    mitigation_results.append({
        "prompt": prompt.strip(),
        "response": extracted
    })

# Show an example of the generated mitigation
print(" - Generated mitigation example:\n", mitigation_results[0]["response"])

Generating mitigations: 100%|██████████| 14/14 [02:03<00:00,  8.85s/it]

 - Generated mitigation example:
 - - Use a stateless firewall to block all incoming UDP traffic except for known trusted services like DNS, NTP, or VoIP, ensuring that unexpected or unused ports do not receive traffic.  
- - Implement rate limiting on all IoT edge devices by setting per-device bandwidth limits or packet rate thresholds to prevent excessive traffic from being sent to or received from a single source.  
- - Deploy Network Address Translation (NAT) on IoT gateways to obscure the internal IP addresses, making it harder for attackers to determine the actual source of the traffic.  
- - Configure firewalls to drop UDP packets with a length greater than the maximum allowed size for the protocol, preventing malformed or oversized packets from causing issues.  
- - Utilize ingress/egress filtering on IoT gateways to only allow communication with known, trusted endpoints, blocking any unrecognized sources or destinations.  

In [None]:
# Print the mitigations for each attack
for i in range(len(mitigation_results)):
    print(f"\nAttack {merged_df.iloc[i]['attack_name']}:")
    print(mitigation_results[i]["response"])


Attack DDoS_UDP:
- Implement rate limiting on all IoT edge devices to ensure that each device cannot send more than X packets per second to prevent overwhelming the network. For example, set a limit of 1000 packets/second via firewall rules or network configurations.
- Deploy an intrusion detection system (IDS) specifically tailored for IoT environments to monitor for abnormal UDP traffic patterns indicative of a DDoS attack. Configure it to alert on sudden spikes in traffic or unusual activity beyond normal usage thresholds.
- Use stateful firewalls with advanced features like dynamic packet filtering to block unsolicited UDP traffic from unknown sources and only allow known legitimate services. This helps reduce the risk of being overwhelmed by spoofed packets.
- Encrypt and sign IoT firmware updates before distribution to ensure integrity and authenticity. This prevents malicious actors from tampering with or injecting malicious code into the firmware, which could be used to launch
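
The extraction step in the generation loop can be isolated as a small pure function, which makes it testable without loading the model. The sketch below mirrors that logic; the function name `extract_answer` and the toy strings are hypothetical and do not come from a real model run.

```python
def extract_answer(response_text, prompt):
    """Return the text after the last '<think>' tag, cut at '</think>' if present."""
    if "<think>" in response_text:
        extracted = response_text.split("<think>")[-1]
        # split() returns the whole string when '</think>' is absent
        extracted = extracted.split("</think>")[0]
    else:
        # Fall back to removing the echoed prompt from the decoded text
        extracted = response_text.replace(prompt, "")
    return extracted.strip()


toy_prompt = "### Response:\n<think>\n-"
toy_output = toy_prompt + " Block unused UDP ports.\n- Rate-limit MQTT clients.\n</think>\nDone."
print(extract_answer(toy_output, toy_prompt))
```

Because the prompt itself ends with `<think>` followed by a dash, the first extracted line keeps that leading dash, which is why the cleaning section later has to collapse doubled dashes.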

## 8. CLEAN GENERATED MITIGATIONS
Clean the generated mitigations, for example, by eliminating formatting errors

In [None]:
def extract_5_clean_bullets(text):
    bullets = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("-") and len(line) > 2:
            # Replace multiple dashes with a single dash and remove leading spaces
            cleaned = re.sub(r"^(-\s*)+", "- ", line)
            bullets.append(cleaned.strip())
        # Stop once the five requested mitigations have been collected
        if len(bullets) == 5:
            break
    return bullets

In [None]:
filtered_mitigations = []

for i, item in enumerate(mitigation_results):
    bullets = extract_5_clean_bullets(item["response"])
    filtered_mitigations.append({
        "attack": merged_df.iloc[i]['attack_name'],
        "mitigations": bullets
    })

In [None]:
# print cleaned mitigations
for item in filtered_mitigations:
    print(f"\n🛡️ Attack: {item['attack']}")
    for bullet in item["mitigations"]:
        print(bullet)



🛡️ Attack: DDoS_UDP
- Implement rate limiting on all IoT edge devices to ensure that each device cannot send more than X packets per second to prevent overwhelming the network. For example, set a limit of 1000 packets/second via firewall rules or network configurations.
- Deploy an intrusion detection system (IDS) specifically tailored for IoT environments to monitor for abnormal UDP traffic patterns indicative of a DDoS attack. Configure it to alert on sudden spikes in traffic or unusual activity beyond normal usage thresholds.
- Use stateful firewalls with advanced features like dynamic packet filtering to block unsolicited UDP traffic from unknown sources and only allow known legitimate services. This helps reduce the risk of being overwhelmed by spoofed packets.
- Encrypt and sign IoT firmware updates before distribution to ensure integrity and authenticity. This prevents malicious actors from tampering with or injecting malicious code into the firmware, which could be used to lau
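
Since the prompt asks for exactly five mitigations, it may be worth checking how many bullets survive the cleaning step. The sketch below reimplements the bullet cleaning with a configurable limit and reports whether the expected count was reached. The name `clean_bullets`, the `limit` parameter, and the toy input are hypothetical.

```python
import re


def clean_bullets(text, limit=5):
    """Collect up to `limit` dash-prefixed lines, collapsing repeated leading dashes."""
    bullets = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("-") and len(line) > 2:
            # "- - text" and "-- text" both become "- text"
            bullets.append(re.sub(r"^(-\s*)+", "- ", line).strip())
        if len(bullets) == limit:
            break
    return bullets


toy = "- - First\n-- Second\nnot a bullet\n- Third"
result = clean_bullets(toy)
print(result)
print(len(result) == 5)  # False here: the toy input only contains three bullets
```

A response that yields fewer than five bullets could then be flagged for regeneration instead of being stored as is.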

## 9. STORE MITIGATIONS
Store the mitigations in a JSON Lines (`.jsonl`) file, one attack per line.

In [None]:
# Save the filtered mitigations to a JSONL file
output_file = os.path.join("..", "data", "capec", "filtered_mitigations.jsonl")

with open(output_file, "w", encoding="utf-8") as f:
    for item in filtered_mitigations:
        json.dump(item, f)
        f.write("\n")
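
To confirm that the file can be consumed later, the JSON Lines format can be read back line by line. The sketch below is a self-contained round trip that writes to a temporary directory rather than the notebook's real output path; the record and the file name `toy_mitigations.jsonl` are made-up placeholders.

```python
import json
import os
import tempfile

# Toy record with the same keys as the stored items
records = [{"attack": "DDoS_UDP", "mitigations": ["- Toy mitigation A", "- Toy mitigation B"]}]

path = os.path.join(tempfile.mkdtemp(), "toy_mitigations.jsonl")

# Write one JSON object per line
with open(path, "w", encoding="utf-8") as f:
    for item in records:
        f.write(json.dumps(item) + "\n")

# Read the file back, skipping any blank lines
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(loaded == records)
```

Each line is an independent JSON document, so the file can be appended to or streamed without parsing it as a whole.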