# CRISP-DM


## Problem Understanding
A central step in the ORCA pipeline is the automated mapping of natural language threat descriptions to structured attack patterns from the Common Attack Pattern Enumeration and Classification (CAPEC) framework. This process, referred to as Threat-to-CAPEC Mapping, aims to translate unstructured textual threat inputs into standardized, machine-readable representations that describe how an attacker might exploit a given vulnerability.

This mapping is critical for enabling further steps in the security analysis process, such as correlating threats with known vulnerabilities (CWEs, CVEs), assessing risk using scoring systems like CVSS, and informing mitigation strategies. However, the task poses several inherent challenges:

- Ambiguity: Threat descriptions are often informal, incomplete, or context-dependent.
- Terminology mismatch: Natural language inputs may not directly align with the technical vocabulary used in CAPEC definitions.
- Granularity: A single threat may correspond to multiple CAPECs at varying levels of abstraction, requiring semantic reasoning to determine relevance.

Solving this problem involves designing a system capable of understanding the semantics of threat descriptions and reliably identifying the most appropriate CAPEC entries. This step must balance accuracy, scalability, and interpretability to support reliable, automated security assessments within the broader ORCA framework.
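To make the mapping task concrete, the following is a minimal sketch of a purely lexical baseline: it ranks catalogue entries by token overlap (Jaccard similarity) with a threat description. It is not the ORCA method. The catalogue IDs (`CAPEC-X1` etc.), their descriptions, the stopword list, and the function names are hypothetical placeholders chosen for illustration; only the threat sentence is taken from the data set.

```python
import re

# Hypothetical placeholder entries, NOT real CAPEC records.
CAPEC_SAMPLE = {
    "CAPEC-X1": "adversary floods a target with requests to exhaust resources and deny service",
    "CAPEC-X2": "adversary intercepts messages on a communication interface to read or modify them",
    "CAPEC-X3": "adversary guesses passwords by brute force to gain access",
}

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "to", "and", "or", "of", "for",
             "by", "with", "can", "on", "them", "between"}


def tokenize(text):
    """Return the set of lower-cased alphabetic tokens, without stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_capec(threat_text, catalogue, top_k=2):
    """Rank catalogue entries by token overlap with the threat text."""
    threat_tokens = tokenize(threat_text)
    scored = [(cid, jaccard(threat_tokens, tokenize(desc)))
              for cid, desc in catalogue.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


threat = ("Threat actor can gain access to the messaging across the E2 "
          "interface for a MiTM attack to read messages.")
print(rank_capec(threat, CAPEC_SAMPLE))
```

A baseline like this only rewards shared surface words, so it is exposed to exactly the terminology mismatch described above: "MiTM" and "intercepts" never match each other, which is why semantic approaches are likely needed.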

## Data Understanding

In [21]:
# imports
import pandas as pd
import json

In [18]:
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

Load the threat data from a JSON file.

In [5]:
# Load threat data from JSON file into a DataFrame
df_threats = pd.read_json('data/all_threats.json')
df_threats.head()

Unnamed: 0,Threat ID,Threat title,Threat Description,Threat type,Impact type,Threat agent,Vulnerability,Threatened Asset,Affected Components
0,T-O-RAN-01,An attacker exploits insecure designs or lack ...,Unauthenticated/unauthorized access to O-RAN c...,,,All,[Outdated component from the lack of update or...,All,All
1,T-O-RAN-02,An attacker exploits misconfigured or poorly c...,Unauthenticated/unauthorized access to O-RAN c...,,,All,[Errors from the lack of configuration change ...,All,All
2,T-O-RAN-03,Attacks from the internet to penetrate O-RAN n...,Web servers serving O-RAN functional and manag...,,,All,[Errors in the design and implementation of th...,All,All
3,T-O-RAN-04,An attacker attempts to jam the airlink signal...,DDoS attacks on O-RAN systems: The 5G evolutio...,,,All,[Failure to address overload situations],"ASSET-D-06, ASSET-D-18","O-RU, airlink with UE, O-DU"
4,T-O-RAN-05,An attacker penetrates and compromises the O-R...,"O-RAN’s Fronthaul, O1, O2, A1, and E2 manageme...",,,All,[Improper or missing authentication and author...,All,"rApps, xApps, O-RU, O-DU, O-CU, Near-RT RIC, N..."


In [None]:
# Display DataFrame information
df_threats.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 182 entries, 0 to 181
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Threat ID            182 non-null    object
 1   Threat title         182 non-null    object
 2   Threat Description   182 non-null    object
 3   Threat type          82 non-null     object
 4   Impact type          82 non-null     object
 5   Threat agent         182 non-null    object
 6   Vulnerability        182 non-null    object
 7   Threatened Asset     182 non-null    object
 8   Affected Components  182 non-null    object
dtypes: object(9)
memory usage: 12.9+ KB


There are 182 threats described by 9 columns. Each threat has a unique ID, a title, a description, a threat agent, one or more vulnerabilities (saved in a list), a threatened asset, and the affected components. The threat type and impact type are populated for only 82 of the 182 threats.
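The two observations above (missing values in some columns, and vulnerabilities stored as lists) can be checked directly. The sketch below uses a two-row stand-in DataFrame that mirrors part of the schema, with shortened values, so that it runs without `data/all_threats.json`; the variable names are assumptions for illustration.

```python
import pandas as pd

# Stand-in rows mirroring part of the schema shown above; illustrative only,
# not a substitute for the real data file.
sample = pd.DataFrame([
    {"Threat ID": "T-O-RAN-04", "Threat type": None,
     "Vulnerability": ["Failure to address overload situations"]},
    {"Threat ID": "T-O-RAN-05", "Threat type": None,
     "Vulnerability": ["Improper or missing authentication and authorization processes",
                       "Improper prevention of key reuse"]},
])

# Number of missing values per column.
missing_per_column = sample.isna().sum()

# Number of vulnerabilities listed for each threat.
vuln_counts = sample["Vulnerability"].apply(len)

# One row per (threat, vulnerability) pair, convenient for counting.
long_form = sample.explode("Vulnerability")

print(missing_per_column)
print(vuln_counts.tolist())
print(long_form[["Threat ID", "Vulnerability"]])
```

Applied to `df_threats`, the same three calls would show which columns are sparse and how many vulnerabilities each threat carries.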

In [12]:
df_threats.describe()

Unnamed: 0,Threat ID,Threat title,Threat Description,Threat type,Impact type,Threat agent,Vulnerability,Threatened Asset,Affected Components
count,182,182,182,82,82,182,182,182,182
unique,182,181,182,16,17,2,116,58,55
top,T-O-RAN-01,External attacker exploits authentication weak...,Unauthenticated/unauthorized access to O-RAN c...,Spoofing,Authenticity,All,[weak mutual authentication],"ASSET-D-12, ASSET-D-13, ASSET-D-14, ASSET-D-15...",All
freq,1,2,1,28,26,178,12,24,19
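The `describe()` output reports 181 unique titles for 182 rows, with the most frequent title occurring twice, so two threats share a title. A minimal sketch for locating such rows is shown below; the IDs and titles in the stand-in frame are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in data; the real frame is df_threats.
titles = pd.DataFrame({
    "Threat ID": ["T-A", "T-B", "T-C"],
    "Threat title": ["Title one", "Title two", "Title one"],
})

# keep=False marks every member of a duplicate group, not only the repeats.
dupes = titles[titles.duplicated("Threat title", keep=False)]
print(dupes)
```

Running the same filter on `df_threats` would list the two threats that share a title, so that their descriptions can be compared.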


In [26]:
df_threats.iloc[0]

Threat ID

In [22]:
print(json.dumps(df_threats.iloc[0].to_dict(), indent=2, ensure_ascii=False))

{
  "Threat ID": "T-O-RAN-01",
  "Threat title": "An attacker exploits insecure designs or lack of adaption in O-RAN components",
  "Threat Description": "Unauthenticated/unauthorized access to O-RAN components could possibly be achieved via the different O-RAN interfaces, depending upon the design of the hardware-software O-RAN system and how different functions are segregated within the O-RAN system. \nO-RAN components might be vulnerable if: \n• Outdated component from the lack of update or patch management,\n• Poorly design architecture,\n• Missing appropriate security hardening,\n• Unnecessary or insecure function/protocol/component.\nAn attacker could, in such case, either inject malwares and/or manipulate existing software, harm the O-RAN components, create a performance issue by manipulation of parameters, or reconfigure the O-RAN components and disable the security features with the purpose of eavesdropping or wiretapping on various CUS & M planes, reaching northbound systems,

In [28]:
from IPython.display import HTML, display

display(HTML('''
<style>
/* White background for output area */
.output_area {
    background: white !important;
    color: black !important;
}

/* Optional: White background for DataFrame cells */
.dataframe {
    background-color: white !important;
    color: black !important;
}
</style>
'''))



In [29]:
display(df_threats)

Unnamed: 0,Threat ID,Threat title,Threat Description,Threat type,Impact type,Threat agent,Vulnerability,Threatened Asset,Affected Components
0,T-O-RAN-01,An attacker exploits insecure designs or lack of adaption in O-RAN components,"Unauthenticated/unauthorized access to O-RAN components could possibly be achieved via the different O-RAN interfaces, depending upon the design of the hardware-software O-RAN system and how different functions are segregated within the O-RAN system. \nO-RAN components might be vulnerable if: \n• Outdated component from the lack of update or patch management,\n• Poorly design architecture,\n• Missing appropriate security hardening,\n• Unnecessary or insecure function/protocol/component.\nAn attacker could, in such case, either inject malwares and/or manipulate existing software, harm the O-RAN components, create a performance issue by manipulation of parameters, or reconfigure the O-RAN components and disable the security features with the purpose of eavesdropping or wiretapping on various CUS & M planes, reaching northbound systems, attack broader network to cause denial-of-service, steal unprotected private keys, certificates, hash values, or other type of breaches.\nIn addition, O-RAN components could be software providing network functions, so they are likely to be vulnerable to software flaws: it could be possible to bypass firewall restrictions or to take advantage of a buffer overflow to execute arbitrary commands, etc.",,,All,"[Outdated component from the lack of update or patch management, Poorly design architecture, Missing appropriate security hardening, Unnecessary or insecure function/protocol/component]",All,All
1,T-O-RAN-02,An attacker exploits misconfigured or poorly configured O-RAN components,"Unauthenticated/unauthorized access to O-RAN components could possibly be achieved via the different O-RAN interfaces, depending upon the configuration of the hardware-software O-RAN system. \nO-RAN components might be vulnerable if: \n• Errors from the lack of configuration change management,\n• Misconfigured or poorly configured O-RAN components,\n• Improperly configured permissions,\n• Unnecessary features are enabled (e.g. unnecessary ports, services, accounts, or privileges),\n• Default accounts and their passwords still enabled and unchanged,\n• Security features are disabled or not configured securely.\nAn attacker could, in such case, either inject malwares and/or manipulate existing software, harm the O-RAN components, create a performance issue by manipulation of parameters, or reconfigure the O-RAN components and disable the security features with the purpose of eavesdropping or wiretapping on various CUS & M planes, reaching northbound systems, attack broader network to cause denial-of-service, steal unprotected private keys, certificates, hash values, or other type of breaches.",,,All,"[Errors from the lack of configuration change management, Misconfigured or poorly configured O-RAN components, Improperly configured permissions, Unnecessary features are enabled (e.g. unnecessary ports, services, accounts, or privileges), Default accounts and their passwords still enabled and unchanged, Security features are disabled or not configured securely]",All,All
2,T-O-RAN-03,Attacks from the internet to penetrate O-RAN network boundary,"Web servers serving O-RAN functional and management services should provide adequate protection. \nAn attacker that have access to the uncontrolled O-RAN network could:\n• Bypass the information flow control policy implemented by the firewall,\n• And/or attack O-RAN components in the trusted networks by taking advantage of particularities and errors in the design and implementation of the network protocols (IP, TCP, UDP, application protocols),\n• Use of incorrect or exceeded TCP sequence numbers,\n• Perform brute force attacks on FTP passwords,\n• Use of improper HTTP user sessions,\n• Etc.\nThe effects of such attacks may include:\n• An intrusion, meaning unauthorized access to O-RAN components,\n• Blocking, flooding or restarting an O-RAN component causing a denial of service,\n• Flooding of network equipment, causing a denial of service,\n• Etc.",,,All,"[Errors in the design and implementation of the network protocols (HTTP, P, TCP, UDP, application protocols)]",All,All
3,T-O-RAN-04,An attacker attempts to jam the airlink signal through IoT devices,"DDoS attacks on O-RAN systems: The 5G evolution means billions of things, collectively referred to as IoT, will be using the 5G O-RAN. Thus, IoT could increase the risk of O-RAN resource overload by way of DDoS attacks. Attackers create a botnet army by infecting many (millions/billions) IoT devices with a “remote-reboot” malware. Attackers instruct the malware to reboot all devices in a specific or targeted 5G coverage area at the same time.",,,All,[Failure to address overload situations],"ASSET-D-06, ASSET-D-18","O-RU, airlink with UE, O-DU"
4,T-O-RAN-05,"An attacker penetrates and compromises the O-RAN system through the open O-RAN’s Fronthaul, O1, O2, A1, and E2","O-RAN’s Fronthaul, O1, O2, A1, and E2 management interfaces are the new open interfaces that allow software programmability of RAN. These interfaces may not be secured to industry best practices.\nO-RAN components might be vulnerable if: \n• Improper or missing authentication and authorization processes,\n• Improper or missing ciphering and integrity checks of sensitive data exchanged over O-RAN interfaces,\n• Improper or missing replay protection of sensitive data exchanged over O-RAN interfaces,\n• Improper prevention of key reuse,\n• Improper implementation,\n• Improperly validate inputs, respond to error conditions in both the submitted data as well as out of sequence protocol steps.\nAn attacker could, in such case, cause denial-of-service, data tampering or information disclosure, etc.\nNOTE: O-RAN interfaces allow use of TLS or SSH. Industry best practices mandate the use of TLS (v1.2 or higher) or SSH certificate-based authentication. An implementation that implements TLS version lower than 1.2 or a SSH password authentication, may become the key source of vulnerability that a malicious code will exploit to compromise the O-RAN system.",,,All,"[Improper or missing authentication and authorization processes, Improper prevention of key reuse, Improper or missing replay protection of sensitive data exchanged over O-RAN interfaces, Improper or missing ciphering and integrity checks of sensitive data exchanged over O-RAN interfaces]",All,"rApps, xApps, O-RU, O-DU, O-CU, Near-RT RIC, Non-RT RIC"
...,...,...,...,...,...,...,...,...,...
177,T-E2-02,Malicious actor monitors messaging across E2 interface,Threat actor can gain access to the messaging across the E2 interface for a MiTM attack to read messages.,,,All,[Missing or weak confidentiality protection],ASSET-C-40,E2 interface
178,T-E3-03,Malicious actor modifies messaging across E2 interface,Threat actor can gain access to the messaging across the E2 interface for a MiTM attack to modify or inject messages. This can result in the Near-RT RIC and/or the E2 Nodes receiving malicious messages.,,,All,[Lack of integrity verification],ASSET-C-40,E2 interface
179,T-Y1-01,Untrusted Near-RT-RIC and Y1 consumers,"A Malicious Y1 consumer communicates with a Near-RT-RIC over the Y1 interface, or a malicious Near-RT-RIC communicates with a Y1 consumer over the Y1 interface, due to weak mutual authentication.",,,All,[weak mutual authentication],ASSET-C-42,Y1interface
180,T-Y1-02,Malicious actor monitors messaging across Y1 interface,Threat actor can gain access to the messaging across the Y1 interface for a MiTM attack to read messages.,,,All,[Missing or weak confidentiality protection],ASSET-C-42,Y1 interface
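In the table above, `Affected Components` (and likewise `Threatened Asset`) holds several values in one comma-separated string, for example `O-RU, airlink with UE, O-DU`. The following sketch, using a small stand-in Series with values copied from the output, shows one possible way to split such strings into lists and count individual components; the variable names are assumptions.

```python
import pandas as pd

# Values copied from the displayed table; the real column is
# df_threats["Affected Components"].
components = pd.Series([
    "O-RU, airlink with UE, O-DU",
    "E2 interface",
    "All",
])

# Split on commas and strip surrounding whitespace from each part.
component_lists = components.str.split(",").apply(
    lambda parts: [part.strip() for part in parts]
)

# Frequency of each individual component across all threats.
component_counts = component_lists.explode().value_counts()

print(component_lists.tolist())
print(component_counts)
```

Spelling variants such as `Y1interface` and `Y1 interface`, both of which appear in the data, would be counted as separate components unless they are normalized first.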
