# **Cyber-Attack Analysis: Methodology and Execution**

## **1. Introduction**
This notebook follows the methodology outlined in the research flowchart to analyze cyber-attacks using data from the CISSM website. It includes exploratory data analysis, distribution analysis, advanced statistical analysis, and predictive modeling.

---

## **2. Data Import**
- Source: CISSM website (includes over 14,000 recorded cyber-attack events from 2014 to 2024)
- Data Cleaning and Preprocessing

---

## **3. Exploratory Data Analysis (EDA)**
### **3.1 Distribution Analysis**
- Cyber-attack frequency across all industries  
- Cyber-attack type distribution across all industries  
- Cyber-attack actor-type distribution across all industries  

---

## **4. Cyber-Attack Actor and Motive Analysis**
### **4.1 Motive Analysis**
- Distribution of cyber-attack motives across industries  

### **4.2 Geographic Distribution**
- Distribution of affected countries  
- Distribution of actor countries  

### **4.3 Industry-Specific Analysis**
- Filtering public administration and healthcare-related attacks by type & motive  
- Addressing discrepancies vs. hypothesis  

---

## **5. Advanced Statistical Analysis**
- **Cyber-attacks vs. GDP**  
- **Cyber-attacks vs. Gini Coefficient**  

---

## **6. Time-Series Analysis**
### **6.1 Visualization**
- Historical frequency of cyber-attacks  

### **6.2 Predictive Modeling**
- ARIMA model for cyber-attack frequency prediction  

---

## **7. Discussion and Conclusion**
- Interpretation of results  
- Key insights  
- Future research directions  


In [1]:
# import packages
import pandas as pd
import numpy as np
import seaborn as sns
import plotly
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

**Data Import**

In [None]:
df = pd.read_csv("data/cissm_data.csv", encoding="latin-1")

**3. EDA Distribution**
The cells below build the summary DataFrames used throughout the analysis.

In [4]:
## event_counts stores the number of events of each type across all industries
event_counts = df.groupby("event_type").size().reset_index(name="counts")
event_counts

Unnamed: 0,event_type,counts
0,Disruptive,4360
1,Exploitive,7000
2,Mixed,2500
3,Undetermined,181


In [5]:
## motive_counts stores motive over all industries
motive_counts = df.groupby("motive").size().reset_index(name="counts")
motive_counts

Unnamed: 0,motive,counts
0,Financial,8034
1,Industrial-Espionage,94
2,Personal Attack,93
3,Political-Espionage,679
4,Protest,1680
5,"Protest,Financial",1
6,Protest;Political-Espionage,1
7,Sabotage,340
8,Undetermined,3119
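
Two rows in the table above carry combined labels (`Protest,Financial` and `Protest;Political-Espionage`). One way to handle them is sketched below, assuming each combined label should count toward every motive it names. The `split_motive_counts` helper and the sample rows are hypothetical and not part of the notebook.

```python
import pandas as pd


def split_motive_counts(frame: pd.DataFrame, column: str = "motive") -> pd.Series:
    """Count motives after splitting labels joined by ',' or ';' (hypothetical helper)."""
    parts = (
        frame[column]
        .dropna()
        .str.split(r"[,;]", regex=True)  # one list of labels per row
        .explode()                        # one row per individual label
        .str.strip()
    )
    return parts.value_counts().sort_index()


# Hypothetical sample rows, not taken from the CISSM data
sample = pd.DataFrame(
    {"motive": ["Financial", "Protest,Financial", "Protest;Political-Espionage", "Protest"]}
)
split_counts = split_motive_counts(sample)
print(split_counts)
```

With this approach a row that names two motives is counted twice, so the split counts no longer sum to the number of events.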


In [6]:
## actor_counts stores actor types over all industries
actor_counts = df.groupby("actor_type").size().reset_index(name="counts")
actor_counts

Unnamed: 0,actor_type,counts
0,Criminal,10676
1,Hacktivist,1909
2,Hacktvist,17
3,Hobbyist,191
4,Nation-State,803
5,Terrorist,30
6,Undetermined,415
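
The table above lists `Hacktvist` (17 rows) separately from `Hacktivist`, which looks like a spelling variant of the same actor type. A minimal sketch of folding such variants together before counting, assuming the two labels are meant to be identical; the `actor_fixes` mapping and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows, not taken from the CISSM data
sample = pd.DataFrame(
    {"actor_type": ["Hacktivist", "Hacktvist", "Criminal", "Hacktvist"]}
)

# Map known misspellings to the canonical label (assumed mapping; extend as needed)
actor_fixes = {"Hacktvist": "Hacktivist"}

cleaned = sample.assign(actor_type=sample["actor_type"].replace(actor_fixes))
cleaned_counts = cleaned.groupby("actor_type").size().reset_index(name="counts")
print(cleaned_counts)
```

Applying the same replacement to `df["actor_type"]` before the `groupby` would leave a single `Hacktivist` row in `actor_counts`.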


In [7]:
## monthly_counts stores the number of events per month, event_type, and industry
monthly_counts = (
    df.groupby(["month", "event_type", "industry"]).size().reset_index(name="count")
)

In [8]:
## dfEventType stores occurrences of each event type within each industry
dfEventType = df.groupby("industry")["event_type"].value_counts().unstack(fill_value=0)

# Rename columns for clarity
dfEventType = dfEventType.rename(
    columns={
        "Undetermined": "undetermined_eventType_count",
        "Mixed": "mixed_eventType_count",
        "Exploitive": "exploititive_eventType_count",
        "Disruptive": "disruptive_eventType_count",
    }
)

# Calculate total counts for each industry
dfEventType["industry_counts"] = dfEventType.sum(axis=1)

# Reset index to make 'industry' a regular column
dfEventType = dfEventType.reset_index()

# Sort dfEventType by industry_counts in ascending order
dfEventType = dfEventType.sort_values(by="industry_counts", ascending=True)
dfEventType.head()

event_type,industry,disruptive_eventType_count,exploititive_eventType_count,mixed_eventType_count,undetermined_eventType_count,industry_counts
12,Medusa,0,0,1,0,1
2,"Agriculture, Forestry, Fishing and Hunting",6,3,9,0,18
10,Management of Companies and Enterprises,5,8,7,0,20
8,Health Care and Social Services,4,11,11,1,27
4,Construction,9,10,27,1,47


In [9]:
## dfActorType stores occurrences of each actor_type within each industry
dfActorType = df.groupby("industry")["actor_type"].value_counts().unstack(fill_value=0)
dfActorType["industry_counts"] = dfActorType.sum(axis=1)
dfActorType = dfActorType.reset_index()
dfActorType = dfActorType.sort_values(by="industry_counts", ascending=True)
dfActorType.head()

actor_type,industry,Criminal,Hacktivist,Hacktvist,Hobbyist,Nation-State,Terrorist,Undetermined,industry_counts
12,Medusa,1,0,0,0,0,0,0,1
2,"Agriculture, Forestry, Fishing and Hunting",13,4,0,0,1,0,0,18
10,Management of Companies and Enterprises,16,2,0,0,2,0,0,20
8,Health Care and Social Services,27,0,0,0,0,0,0,27
4,Construction,39,7,0,1,0,0,0,47


In [10]:
## dfMotive stores occurrences of each motive type within each industry
dfMotive = df.groupby("industry")["motive"].value_counts().unstack(fill_value=0)
dfMotive["industry_counts"] = dfMotive.sum(axis=1)
dfMotive = dfMotive.reset_index()
dfMotive = dfMotive.sort_values(by="industry_counts", ascending=True)
dfMotive.rename(
    columns={"industry": "Sector Title", "Undetermined": "Undetermined Motive"},
    inplace=True,
)
dfMotive.head()

motive,Sector Title,Financial,Industrial-Espionage,Personal Attack,Political-Espionage,Protest,"Protest,Financial",Protest;Political-Espionage,Sabotage,Undetermined Motive,industry_counts
12,Medusa,1,0,0,0,0,0,0,0,0,1
2,"Agriculture, Forestry, Fishing and Hunting",10,0,0,0,4,0,0,1,3,18
10,Management of Companies and Enterprises,13,0,0,1,2,0,0,1,3,20
8,Health Care and Social Services,27,0,0,0,0,0,0,0,0,27
4,Construction,34,0,0,0,8,0,0,0,5,47
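
The three cells above repeat one pattern: count a categorical column per industry, add a row total, and sort by that total. A possible way to factor it out is sketched below using `pd.crosstab`. The `counts_by_industry` helper is hypothetical, and the sample rows are made up for illustration.

```python
import pandas as pd


def counts_by_industry(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Cross-tabulate `column` against industry and append a row total (hypothetical helper)."""
    table = pd.crosstab(frame["industry"], frame[column])
    table["industry_counts"] = table.sum(axis=1)
    return table.sort_values("industry_counts", ascending=True).reset_index()


# Hypothetical sample rows, not taken from the CISSM data
sample = pd.DataFrame(
    {
        "industry": ["Utilities", "Utilities", "Construction", "Utilities"],
        "event_type": ["Mixed", "Disruptive", "Mixed", "Mixed"],
    }
)
event_table = counts_by_industry(sample, "event_type")
print(event_table)
```

Under these assumptions, `counts_by_industry(df, "actor_type")` and `counts_by_industry(df, "motive")` should produce tables of the same shape as `dfActorType` and `dfMotive`, before any column renaming.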


In [11]:
## combine dfEventType, dfActorType, and dfMotive type into final merged df

dfActorType = dfActorType.sort_values(by="industry_counts", ascending=True)
dfEventType = dfEventType.sort_values(by="industry_counts", ascending=True)
dfMotive = dfMotive.sort_values(by="industry_counts", ascending=True)
dfActorType.rename(columns={"industry": "Sector Title"}, inplace=True)
dfEventType.rename(columns={"industry": "Sector Title"}, inplace=True)
dfMotive.rename(columns={"industry": "Sector Title"}, inplace=True)

merged_df = pd.merge(dfEventType, dfActorType, on="Sector Title", how="outer")
final_merged_df = pd.merge(merged_df, dfMotive, on="Sector Title", how="outer")
final_merged_df = final_merged_df.sort_values(by="industry_counts_y", ascending=True)
final_merged_df.head()

Unnamed: 0,Sector Title,disruptive_eventType_count,exploititive_eventType_count,mixed_eventType_count,undetermined_eventType_count,industry_counts_x,Criminal,Hacktivist,Hacktvist,Hobbyist,...,Financial,Industrial-Espionage,Personal Attack,Political-Espionage,Protest,"Protest,Financial",Protest;Political-Espionage,Sabotage,Undetermined Motive,industry_counts
0,Medusa,0,0,1,0,1,1,0,0,0,...,1,0,0,0,0,0,0,0,0,1
1,"Agriculture, Forestry, Fishing and Hunting",6,3,9,0,18,13,4,0,0,...,10,0,0,0,4,0,0,1,3,18
2,Management of Companies and Enterprises,5,8,7,0,20,16,2,0,0,...,13,0,0,1,2,0,0,1,3,20
3,Health Care and Social Services,4,11,11,1,27,27,0,0,0,...,27,0,0,0,0,0,0,0,0,27
4,Construction,9,10,27,1,47,39,7,0,1,...,34,0,0,0,8,0,0,0,5,47
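
The merge above leaves three total columns (`industry_counts_x`, `industry_counts_y`, and an unsuffixed `industry_counts`) that are easy to confuse. A small sketch of naming them explicitly with the `suffixes` argument of `pd.merge`, and of guarding against duplicated sectors with `validate`; the two input frames are hypothetical stand-ins for `dfEventType` and `dfActorType`.

```python
import pandas as pd

# Hypothetical stand-ins for dfEventType and dfActorType
events = pd.DataFrame(
    {"Sector Title": ["Utilities", "Construction"], "Mixed": [2, 1], "industry_counts": [3, 1]}
)
actors = pd.DataFrame(
    {"Sector Title": ["Utilities", "Construction"], "Criminal": [3, 1], "industry_counts": [3, 1]}
)

merged = pd.merge(
    events,
    actors,
    on="Sector Title",
    how="outer",
    suffixes=("_event", "_actor"),  # readable names instead of _x / _y
    validate="one_to_one",          # raises if a sector appears twice on either side
)

# The two totals should agree when both tables were built from the same rows
totals_match = (merged["industry_counts_event"] == merged["industry_counts_actor"]).all()
print(merged.columns.tolist(), totals_match)
```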


In [12]:
## create lists of values for certain columns for use throughout
listEvent = [
    "disruptive_eventType_count",
    "exploititive_eventType_count",
    "mixed_eventType_count",
    "undetermined_eventType_count",
]

listActor = [
    "Criminal",
    "Hacktivist",
    "Hobbyist",
    "Nation-State",
    "Terrorist",
    "Undetermined",
]

listMotive = [
    "Financial",
    "Industrial-Espionage",
    "Personal Attack",
    "Political-Espionage",
    "Protest",
    "Protest,Financial",
    "Protest;Political-Espionage",
    "Sabotage",
    "Undetermined Motive",
]

industry_types = [
    "Agriculture, Forestry, Fishing and Hunting",
    "Mining",
    "Utilities",
    "Construction",
    "Manufacturing",
    "Wholesale Trade",
    "Retail Trade",
    "Transportation and Warehousing",
    "Information",
    "Finance and Insurance",
    "Real Estate Rental and Leasing",
    "Professional, Scientific, and Technical Services",
    "Management of Companies and Enterprises",
    "Administrative and Support and Waste Management and Remediation Services",
    "Educational Services",
    "Health Care and Social Assistance",
    "Arts, Entertainment, and Recreation",
    "Accommodation and Food Services",
    "Other Services (except Public Administration)",
    "Public Administration",
]
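
The `head()` outputs earlier in the notebook show the labels `Medusa` and `Health Care and Social Services`, neither of which appears in `industry_types`, so any `isin(industry_types)` filter silently drops those rows. A minimal sketch of checking for such mismatches; the shortened `expected` list and the sample rows are hypothetical.

```python
import pandas as pd

# Shortened, hypothetical version of industry_types
expected = ["Construction", "Health Care and Social Assistance", "Public Administration"]

# Hypothetical sample rows using labels seen in the outputs above
sample = pd.DataFrame(
    {
        "industry": [
            "Construction",
            "Health Care and Social Services",
            "Medusa",
            "Public Administration",
        ]
    }
)

observed = set(sample["industry"].dropna().unique())
unexpected = sorted(observed - set(expected))  # labels in the data but not in the list
missing = sorted(set(expected) - observed)     # labels in the list but not in the data
print("unexpected:", unexpected)
print("missing:", missing)
```

Running the same comparison with `df["industry"]` and the full `industry_types` list would show which labels need to be corrected or mapped before filtering.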


In [14]:
## create timeDf, which stores monthly event counts
## create grouped_df, which stores monthly counts per event_type
## create industryDf, which stores monthly counts per industry

# Convert event_date to datetime (format="mixed" requires pandas >= 2.0)
df["event_date"] = pd.to_datetime(df["event_date"], format="mixed")

# Keep only rows with a recognized event_type
event_types = ["Undetermined", "Disruptive", "Exploitive", "Mixed"]
filtered_df = df[df["event_type"].isin(event_types)]

# Group by month, then count occurrences
# (freq="M" means month-end; pandas >= 2.2 prefers the alias "ME")
timeDf = (
    filtered_df.groupby([pd.Grouper(key="event_date", freq="M")])
    .size()
    .reset_index(name="counts")
)

# Group by month and event_type, then count occurrences
grouped_df = (
    filtered_df.groupby([pd.Grouper(key="event_date", freq="M"), "event_type"])
    .size()
    .reset_index(name="counts")
)

# Keep only rows whose industry is listed in industry_types
filtered_df = df[df["industry"].isin(industry_types)]

# Group by month and industry, then count occurrences
industryDf = (
    filtered_df.groupby([pd.Grouper(key="event_date", freq="M"), "industry"])
    .size()
    .reset_index(name="counts")
)
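
When counts are grouped by month and a second key such as `industry`, a month with no events for a given industry may simply have no row, which leaves gaps in the per-industry series. A sketch of building a gap-free monthly table with explicit zeros, assuming a time-series model downstream needs evenly spaced observations; the sample rows are hypothetical, and monthly periods are used here instead of `pd.Grouper`.

```python
import pandas as pd

# Hypothetical sample rows, not taken from the CISSM data
sample = pd.DataFrame(
    {
        "event_date": pd.to_datetime(
            ["2020-01-05", "2020-01-20", "2020-03-02", "2020-03-15"]
        ),
        "industry": ["Utilities", "Construction", "Utilities", "Utilities"],
    }
)

# One row per month, one column per industry, zeros for unobserved combinations
wide = (
    sample.assign(month=sample["event_date"].dt.to_period("M"))
    .groupby(["month", "industry"])
    .size()
    .unstack(fill_value=0)
)

# Add months in which no industry had any event at all
all_months = pd.period_range(wide.index.min(), wide.index.max(), freq="M")
wide = wide.reindex(all_months, fill_value=0)
print(wide)
```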

In [15]:
# df_health contains only the data of attacks on the Health Care and Social Assistance industry
df_health = df[df["industry"] == "Health Care and Social Assistance"]

In [16]:
# df_publicAdmin contains only data of attacks on the Public Administration industry
df_publicAdmin = df[df["industry"] == "Public Administration"]

**Graph**

In [17]:
## add all

**Cyber-Attack Actor & Motive Analysis**

**Gini & GDP Binning**
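
The notebook does not show how the bins are defined, so the following is only an illustrative sketch of binning a Gini coefficient with `pd.cut` and summing attacks per bin. The column names, bin edges, labels, and values are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical per-country values, for illustration only
countries = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D", "E", "F"],
        "gini": [25.0, 30.0, 35.0, 40.0, 45.0, 50.0],
        "attacks": [10, 20, 30, 40, 50, 60],
    }
)

# Assumed bin edges; intervals are right-closed: (0, 30], (30, 40], (40, 100]
countries["gini_bin"] = pd.cut(
    countries["gini"], bins=[0, 30, 40, 100], labels=["low", "medium", "high"]
)

attacks_per_bin = countries.groupby("gini_bin", observed=False)["attacks"].sum()
print(attacks_per_bin)
```

`pd.qcut` could be used instead if equally populated bins are preferred over fixed edges. The same pattern would apply to a GDP column.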

**Time Series Analysis**
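
The outline names an ARIMA model for frequency prediction, but the notebook does not show the model order or the library used. To illustrate the idea, here is a bare-bones ARIMA(1,1,0) without a constant, fitted by conditional least squares in NumPy. This is a stand-in written for this sketch, not the notebook's model, and `forecast_arima_110` is a hypothetical helper.

```python
import numpy as np


def forecast_arima_110(series, steps):
    """Fit ARIMA(1,1,0) without a constant by least squares and forecast `steps` ahead."""
    y = np.asarray(series, dtype=float)
    if y.size < 3:
        raise ValueError("need at least 3 observations")
    d = np.diff(y)                 # first difference (the d=1 part)
    x, target = d[:-1], d[1:]      # regress each difference on the previous one
    denom = float(x @ x)
    phi = float(x @ target) / denom if denom > 0 else 0.0

    forecasts = []
    last_level, last_diff = y[-1], d[-1]
    for _ in range(steps):
        last_diff = phi * last_diff         # AR(1) step on the differenced series
        last_level = last_level + last_diff  # integrate back to the original scale
        forecasts.append(last_level)
    return phi, np.array(forecasts)


# Hypothetical monthly counts with constant growth
phi, fc = forecast_arima_110([10, 12, 14, 16], steps=2)
print(phi, fc)
```

For real use, a library implementation such as the `ARIMA` class in `statsmodels.tsa.arima.model` would handle order selection, estimation, and diagnostics, and would be the more sensible choice.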
