# An Analysis of the Cybercrime Landscape in an AI World

![Banner](./assets/banner.jpeg)

## Topic
*What problem are you (or your stakeholder) trying to address?*
📝 <!-- Answer Below -->
#### <span style = 'color:green'>Understanding whether AI availability has contributed to rising cybercrime rates and attack sophistication.</span>

## Project Question
*What specific question are you seeking to answer with this project?*
*This is not the same as the questions you ask to limit the scope of the project.*
📝 <!-- Answer Below -->
#### <span style = 'color:green'>Is there a measurable correlation between AI accessibility and changes in cybercrime trends?</span>

## What would an answer look like?
*What is your hypothesized answer to your question?*
📝 <!-- Answer Below -->
#### <span style = 'color: green'>AI availability has likely contributed to an increase in cybercrime volume and sophistication, as these tools lower technical barriers for attackers and eliminate traditional red flags such as misspellings in phishing emails.</span>

## Data Sources
*What 3 data sources have you identified for this project?*
*How are you going to relate these datasets?*
📝 <!-- Answer Below -->
* **Cyber Events Database:** A collection of publicly available information on cyber events.
    * https://cissm.umd.edu/research-impact/publications/cyber-events-database-home
* **Global Cybersecurity Threats (2015-2024):** A comprehensive dataset tracking cybersecurity incidents, attack vectors, threat 
    * https://www.kaggle.com/datasets/atharvasoundankar/global-cybersecurity-threats-2015-2024
* **AI incident database:** Documenting the times when things go wrong with AI solutions
    * https://www.kaggle.com/datasets/konradb/ai-incident-database
* **Epoch AI:** A database of over 3,200 models that tracks key factors driving machine learning progress
    * https://epoch.ai/data/ai-models 

## Approach and Analysis
*What is your approach to answering your project question?*
*How will you use the identified data to answer your project question?*
📝 <!-- Start Discussing the project here; you can add as many code cells as you need -->
<br>
#### <span style = 'color:green'>ChatGPT was publicly released in November 2022, so we use that date as the dividing point for comparing metrics before and after widespread AI availability. The Global Cybersecurity Threats dataset provides volume and attack-type trends, the Cyber Events Database adds incident-level context on motives and actors, and the AI Incident Database identifies specific cases involving AI. Together they let us test whether AI availability correlates with changes in cybercrime patterns.</span>

In [3]:
# Imports
import pandas as pd
import numpy as np

In [4]:
# Load datasets

# AI Incident Database
ai_incidents = pd.read_csv('data/AI_incidents_database.csv')

# Global Cybersecurity Threats
cyber_threats = pd.read_csv('data/Global_Cybersecurity_Threats_2015_2024.csv')

# CISSM Cyber Events Database
cyber_events = pd.read_csv('data/CISSM_Cyber_Events_Database_2014_Oct_2025.csv')

# Epoch AI Model Tracking
epoch_ai_models = pd.read_csv('data/epoch_ai_models.csv')

In [5]:
# Check first few rows of each dataset

display("AI Incidents Database")
display(ai_incidents.head())

display("Global Cybersecurity Threats")
display(cyber_threats.head())

display("CISSM Cyber Events Database")
display(cyber_events.head())

display("Epoch AI Model Tracking")
display(epoch_ai_models.head())

'AI Incidents Database'

Unnamed: 0,_id,incident_id,date,reports,Alleged deployer of AI system,Alleged developer of AI system,Alleged harmed or nearly harmed parties,description,title
0,ObjectId(625763de343edc875fe63a15),23,2017-11-08,"[242,243,244,245,246,247,248,249,250,253,254,2...","[""navya"",""keolis-north-america""]","[""navya"",""keolis-north-america""]","[""navya"",""keolis-north-america"",""bus-passengers""]",A self-driving public shuttle by Keolis North ...,Las Vegas Self-Driving Bus Involved in Accident
1,ObjectId(625763dc343edc875fe63a02),4,2018-03-18,"[629,630,631,632,633,634,635,636,637,638,639,6...","[""uber""]","[""uber""]","[""elaine-herzberg"",""pedestrians""]",An Uber autonomous vehicle (AV) in autonomous ...,Uber AV Killed Pedestrian in Arizona
2,ObjectId(625763db343edc875fe639ff),1,2015-05-19,"[1,2,3,4,5,6,7,8,9,10,11,12,14,15]","[""youtube""]","[""youtube""]","[""children""]",YouTube’s content filtering and recommendation...,Google’s YouTube Kids App Presents Inappropria...
3,ObjectId(625763de343edc875fe63a10),18,2015-04-04,"[130,131,132,133,134,135,136,137,138,1367,1368]","[""google""]","[""google""]","[""women""]",Google Image returns results that under-repres...,Gender Biases of Google Image Search
4,ObjectId(625763dd343edc875fe63a0a),12,2016-07-21,[42],"[""microsoft-research"",""boston-university""]","[""microsoft-research"",""google"",""boston-univers...","[""women"",""minority-groups""]",Researchers from Boston University and Microso...,Common Biases of Vector Embeddings


'Global Cybersecurity Threats'

Unnamed: 0,Country,Year,Attack Type,Target Industry,Financial Loss (in Million $),Number of Affected Users,Attack Source,Security Vulnerability Type,Defense Mechanism Used,Incident Resolution Time (in Hours)
0,China,2019,Phishing,Education,80.53,773169,Hacker Group,Unpatched Software,VPN,63
1,China,2019,Ransomware,Retail,62.19,295961,Hacker Group,Unpatched Software,Firewall,71
2,India,2017,Man-in-the-Middle,IT,38.65,605895,Hacker Group,Weak Passwords,VPN,20
3,UK,2024,Ransomware,Telecommunications,41.44,659320,Nation-state,Social Engineering,AI-based Detection,7
4,Germany,2018,Man-in-the-Middle,IT,74.41,810682,Insider,Social Engineering,VPN,68


'CISSM Cyber Events Database'

Unnamed: 0,slug,original_method,event_date,reported_date,year,month,actor,actor_type,organization,industry_code,...,opec,gulf_coop,g7,g20,aukus,csto,oecd,osce,five_eyes,change_log
0,1f72c2eb8ab303e4,1,2014-01-01,,2014,1,Undetermined,Criminal,Barry University,61,...,0,0,1,1,1,0,1,1,1,
1,ecac8b3e60a2f72f,1,2014-01-01,,2014,1,Undetermined,Criminal,Record Assist LLC,54,...,0,0,1,1,1,0,1,1,1,
2,3bbe0695e2d019f3,1,2014-01-01,,2014,1,Syrian Electronic Army,Hacktivist,Skype's Social Media,54,...,0,0,1,1,1,0,1,1,1,
3,6100014f6ca84b3d,1,2014-01-02,,2014,1,Undetermined,Criminal,Snapchat,51,...,0,0,1,1,1,0,1,1,1,
4,3a94b8cf6dde1f66,1,2014-01-03,,2014,1,DERP Trolling,Undetermined,Battle.net,51,...,0,0,1,1,1,0,1,1,1,


'Epoch AI Model Tracking'

Unnamed: 0,Model,Domain,Task,Organization,Authors,Publication date,Reference,Link,Citations,Notability criteria,...,Training compute cost (2023 USD),Utilization notes,Numerical format,Frontier model,Training power draw (W),Training compute estimation method,Hugging Face developer id,Post-training compute (FLOP),Post-training compute notes,Hardware utilization (HFU)
0,Claude Opus 4.5,"Language,Multimodal,Vision","Code generation,Language modeling/generation,Q...",Anthropic,,2025-11-24,Introducing Claude Opus 4.5,https://www.anthropic.com/news/claude-opus-4-5,,SOTA improvement,...,,,,,,,,,,
1,Gemini 3 Pro,"Multimodal,Language,Vision",Language modeling/generation,Google DeepMind,,2025-11-18,A new era of intelligence with Gemini 3,https://blog.google/products/gemini/gemini-3/,,Significant use,...,,,,,,,,,,
2,GPT-5.1,"Multimodal,Language,Vision","Language modeling/generation,Question answering",OpenAI,,2025-11-13,"GPT-5.1: A smarter, more conversational ChatGPT",https://openai.com/index/gpt-5-1/,,Significant use,...,,,,,,,,,,
3,Kimi K2 Thinking,Language,"Language modeling/generation,Question answerin...",Moonshot,,2025-11-06,Introducing Kimi K2 Thinking,https://moonshotai.github.io/Kimi-K2/thinking,,SOTA improvement,...,,,,,,Comparison with other models,moonshotai,,,
4,Gen-0,Robotics,Robotic manipulation,Generalist,,2025-11-04,GEN-0 / Embodied Foundation Models That Scale ...,https://generalistai.com/blog/nov-04-2025-GEN-0,,,...,,,,,,,,,,


In [8]:
# Get info about each dataset
display("Dataset Shapes:")
display("AI Incidents:", ai_incidents.shape)
display("Global Cybersecurity Threats:", cyber_threats.shape)
display("CISSM Cyber Events:", cyber_events.shape)
display("Epoch AI Model Tracking:")
display(epoch_ai_models.shape)
display("AI Incidents Database Info")
display(ai_incidents.info())
display("Global Cybersecurity Threats Info")
display(cyber_threats.info())
display("CISSM Cyber Events Database Info")
display(cyber_events.info())
display("Epoch AI Model Tracking Info")
display(epoch_ai_models.info())

'Dataset Shapes:'

'AI Incidents:'

(514, 9)

'Global Cybersecurity Threats:'

(3000, 10)

'CISSM Cyber Events:'

(15789, 44)

'Epoch AI Model Tracking:'

(3204, 56)

'AI Incidents Database Info'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 514 entries, 0 to 513
Data columns (total 9 columns):
 #   Column                                   Non-Null Count  Dtype 
---  ------                                   --------------  ----- 
 0   _id                                      514 non-null    object
 1   incident_id                              514 non-null    int64 
 2   date                                     514 non-null    object
 3   reports                                  514 non-null    object
 4   Alleged deployer of AI system            514 non-null    object
 5   Alleged developer of AI system           514 non-null    object
 6   Alleged harmed or nearly harmed parties  514 non-null    object
 7   description                              514 non-null    object
 8   title                                    514 non-null    object
dtypes: int64(1), object(8)
memory usage: 36.3+ KB


None

'Global Cybersecurity Threats Info'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 10 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   Country                              3000 non-null   object 
 1   Year                                 3000 non-null   int64  
 2   Attack Type                          3000 non-null   object 
 3   Target Industry                      3000 non-null   object 
 4   Financial Loss (in Million $)        3000 non-null   float64
 5   Number of Affected Users             3000 non-null   int64  
 6   Attack Source                        3000 non-null   object 
 7   Security Vulnerability Type          3000 non-null   object 
 8   Defense Mechanism Used               3000 non-null   object 
 9   Incident Resolution Time (in Hours)  3000 non-null   int64  
dtypes: float64(1), int64(3), object(6)
memory usage: 234.5+ KB


None

'CISSM Cyber Events Database Info'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15789 entries, 0 to 15788
Data columns (total 44 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   slug             15789 non-null  object
 1   original_method  15789 non-null  int64 
 2   event_date       15789 non-null  object
 3   reported_date    1233 non-null   object
 4   year             15789 non-null  int64 
 5   month            15789 non-null  int64 
 6   actor            15789 non-null  object
 7   actor_type       15789 non-null  object
 8   organization     15789 non-null  object
 9   industry_code    15789 non-null  int64 
 10  industry         15789 non-null  object
 11  motive           15789 non-null  object
 12  event_type       15789 non-null  object
 13  event_subtype    15789 non-null  object
 14  magnitude        477 non-null    object
 15  duration         477 non-null    object
 16  scope            477 non-null    object
 17  ip               870 non-null  

None

'Epoch AI Model Tracking Info'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3204 entries, 0 to 3203
Data columns (total 56 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   Model                               3204 non-null   object 
 1   Domain                              3120 non-null   object 
 2   Task                                3086 non-null   object 
 3   Organization                        3123 non-null   object 
 4   Authors                             2454 non-null   object 
 5   Publication date                    3186 non-null   object 
 6   Reference                           3047 non-null   object 
 7   Link                                3170 non-null   object 
 8   Citations                           1268 non-null   float64
 9   Notability criteria                 905 non-null    object 
 10  Notability criteria notes           773 non-null    object 
 11  Parameters                          2070 no

None

In [9]:
# Count null or missing values
display("AI Incidents Database Missing Values")
display(ai_incidents.isnull().sum())
display("Global Cybersecurity Threats Missing Values")
display(cyber_threats.isnull().sum())
display("CISSM Cyber Events Database Missing Values")
display(cyber_events.isnull().sum())
display("Epoch AI Model Tracking Missing Values")
display(epoch_ai_models.isnull().sum())

'AI Incidents Database Missing Values'

_id                                        0
incident_id                                0
date                                       0
reports                                    0
Alleged deployer of AI system              0
Alleged developer of AI system             0
Alleged harmed or nearly harmed parties    0
description                                0
title                                      0
dtype: int64

'Global Cybersecurity Threats Missing Values'

Country                                0
Year                                   0
Attack Type                            0
Target Industry                        0
Financial Loss (in Million $)          0
Number of Affected Users               0
Attack Source                          0
Security Vulnerability Type            0
Defense Mechanism Used                 0
Incident Resolution Time (in Hours)    0
dtype: int64

'CISSM Cyber Events Database Missing Values'

slug                   0
original_method        0
event_date             0
reported_date      14556
year                   0
month                  0
actor                  0
actor_type             0
organization           0
industry_code          0
industry               0
motive                 0
event_type             0
event_subtype          0
magnitude          15312
duration           15312
scope              15312
ip                 14919
org_data           14945
cust_data          14931
description            1
source_url             4
country                0
actor_country          0
state               8174
county              8336
nato                   0
eu                     0
shanghai_coop          0
oas                    0
mercosur               0
au                     0
ecowas                 0
asean                  0
opec                   0
gulf_coop              0
g7                     0
g20                    0
aukus                  0
csto                   0


'Epoch AI Model Tracking Missing Values'

Model                                    0
Domain                                  84
Task                                   118
Organization                            81
Authors                                750
Publication date                        18
Reference                              157
Link                                    34
Citations                             1936
Notability criteria                   2299
Notability criteria notes             2431
Parameters                            1134
Parameters notes                      1498
Training compute (FLOP)               1836
Training compute notes                1611
Training dataset size (gradients)     1891
Dataset size notes                    1577
Training time (hours)                 2663
Training time notes                   2612
Training hardware                     2047
Approach                              2904
Confidence                               0
Abstract                               395
Epochs     

In [8]:
# Data cleaning and preprocessing

# AI Incidents
# Convert the date string to datetime and extract year and month for time-based filtering later
ai_incidents['date'] = pd.to_datetime(ai_incidents['date'])
ai_incidents['year'] = ai_incidents['date'].dt.year
ai_incidents['month'] = ai_incidents['date'].dt.month

# Select only columns relevant for analysis
ai_incidents_clean = ai_incidents[[
    'incident_id', 'date', 'year', 'month', 'title', 'description',
    'Alleged deployer of AI system', 'Alleged developer of AI system'
]].copy()

# Global Cybersecurity Threats
# Select relevant columns for trend and impact analysis
cyber_threats_clean = cyber_threats[[
    'Year', 'Country', 'Attack Type', 'Target Industry',
    'Financial Loss (in Million $)', 'Number of Affected Users',
    'Attack Source', 'Security Vulnerability Type'
]].copy()

# CISSM Cyber Events
# Convert event_date to datetime for time-based filtering later (year and month columns already exist)
cyber_events['event_date'] = pd.to_datetime(cyber_events['event_date'])

# Select columns relevant to motive, actor, and event classification
cyber_events_clean = cyber_events[[
    'event_date', 'year', 'month', 'actor_type', 'motive',
    'event_type', 'event_subtype', 'industry', 'country', 'description'
]].copy()

display("Data cleaning and preprocessing completed.")
display("Cleaned AI Incidents Dataset")
display(ai_incidents_clean.head())
display("Cleaned Global Cybersecurity Threats Dataset")
display(cyber_threats_clean.head())
display("Cleaned CISSM Cyber Events Dataset")
display(cyber_events_clean.head())

'Data cleaning and preprocessing completed.'

'Cleaned AI Incidents Dataset'

Unnamed: 0,incident_id,date,year,month,title,description,Alleged deployer of AI system,Alleged developer of AI system
0,23,2017-11-08,2017,11,Las Vegas Self-Driving Bus Involved in Accident,A self-driving public shuttle by Keolis North ...,"[""navya"",""keolis-north-america""]","[""navya"",""keolis-north-america""]"
1,4,2018-03-18,2018,3,Uber AV Killed Pedestrian in Arizona,An Uber autonomous vehicle (AV) in autonomous ...,"[""uber""]","[""uber""]"
2,1,2015-05-19,2015,5,Google’s YouTube Kids App Presents Inappropria...,YouTube’s content filtering and recommendation...,"[""youtube""]","[""youtube""]"
3,18,2015-04-04,2015,4,Gender Biases of Google Image Search,Google Image returns results that under-repres...,"[""google""]","[""google""]"
4,12,2016-07-21,2016,7,Common Biases of Vector Embeddings,Researchers from Boston University and Microso...,"[""microsoft-research"",""boston-university""]","[""microsoft-research"",""google"",""boston-univers..."


'Cleaned Global Cybersecurity Threats Dataset'

Unnamed: 0,Year,Country,Attack Type,Target Industry,Financial Loss (in Million $),Number of Affected Users,Attack Source,Security Vulnerability Type
0,2019,China,Phishing,Education,80.53,773169,Hacker Group,Unpatched Software
1,2019,China,Ransomware,Retail,62.19,295961,Hacker Group,Unpatched Software
2,2017,India,Man-in-the-Middle,IT,38.65,605895,Hacker Group,Weak Passwords
3,2024,UK,Ransomware,Telecommunications,41.44,659320,Nation-state,Social Engineering
4,2018,Germany,Man-in-the-Middle,IT,74.41,810682,Insider,Social Engineering


'Cleaned CISSM Cyber Events Dataset'

Unnamed: 0,event_date,year,month,actor_type,motive,event_type,event_subtype,industry,country,description
0,2014-01-01,2014,1,Criminal,Undetermined,Exploitive,Exploitation of End Hosts,Educational Services,United States of America,Barry University notifies patients of its Foot...
1,2014-01-01,2014,1,Criminal,Undetermined,Exploitive,Exploitation of Application Server,"Professional, Scientific, and Technical Services",United States of America,Record Assist LLC notifies of an unauthorized ...
2,2014-01-01,2014,1,Hacktivist,Protest,Disruptive,Message Manipulation,"Professional, Scientific, and Technical Services",United States of America,The Syrian Electronic Army hacks Skype's Twitt...
3,2014-01-02,2014,1,Criminal,Undetermined,Exploitive,Exploitation of Application Server,Information,United States of America,Greyhat hackers publish the partial phone numb...
4,2014-01-03,2014,1,Undetermined,Undetermined,Disruptive,External Denial of Service,Information,United States of America,"The servers for Steam, Origin, Battle.net, and..."


In [9]:
# Get info and check for missing values in cleaned datasets
display("Dataset Shapes After Cleaning:")
display("AI Incidents:", ai_incidents_clean.shape)
display("Global Cybersecurity Threats:", cyber_threats_clean.shape)
display("CISSM Cyber Events:", cyber_events_clean.shape)

display("AI Incidents Database Info")
display(ai_incidents_clean.info())
display(ai_incidents_clean.isna().sum())

display("Global Cybersecurity Threats Info")
display(cyber_threats_clean.info())
display(cyber_threats_clean.isna().sum())

display("CISSM Cyber Events Database Info")
display(cyber_events_clean.info())
display(cyber_events_clean.isna().sum())


'Dataset Shapes After Cleaning:'

'AI Incidents:'

(514, 8)

'Global Cybersecurity Threats:'

(3000, 8)

'CISSM Cyber Events:'

(15789, 10)

'AI Incidents Database Info'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 514 entries, 0 to 513
Data columns (total 8 columns):
 #   Column                          Non-Null Count  Dtype         
---  ------                          --------------  -----         
 0   incident_id                     514 non-null    int64         
 1   date                            514 non-null    datetime64[ns]
 2   year                            514 non-null    int32         
 3   month                           514 non-null    int32         
 4   title                           514 non-null    object        
 5   description                     514 non-null    object        
 6   Alleged deployer of AI system   514 non-null    object        
 7   Alleged developer of AI system  514 non-null    object        
dtypes: datetime64[ns](1), int32(2), int64(1), object(4)
memory usage: 28.2+ KB


None

incident_id                       0
date                              0
year                              0
month                             0
title                             0
description                       0
Alleged deployer of AI system     0
Alleged developer of AI system    0
dtype: int64

'Global Cybersecurity Threats Info'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 8 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Year                           3000 non-null   int64  
 1   Country                        3000 non-null   object 
 2   Attack Type                    3000 non-null   object 
 3   Target Industry                3000 non-null   object 
 4   Financial Loss (in Million $)  3000 non-null   float64
 5   Number of Affected Users       3000 non-null   int64  
 6   Attack Source                  3000 non-null   object 
 7   Security Vulnerability Type    3000 non-null   object 
dtypes: float64(1), int64(2), object(5)
memory usage: 187.6+ KB


None

Year                             0
Country                          0
Attack Type                      0
Target Industry                  0
Financial Loss (in Million $)    0
Number of Affected Users         0
Attack Source                    0
Security Vulnerability Type      0
dtype: int64

'CISSM Cyber Events Database Info'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15789 entries, 0 to 15788
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   event_date     15789 non-null  datetime64[ns]
 1   year           15789 non-null  int64         
 2   month          15789 non-null  int64         
 3   actor_type     15789 non-null  object        
 4   motive         15789 non-null  object        
 5   event_type     15789 non-null  object        
 6   event_subtype  15789 non-null  object        
 7   industry       15789 non-null  object        
 8   country        15789 non-null  object        
 9   description    15788 non-null  object        
dtypes: datetime64[ns](1), int64(2), object(7)
memory usage: 1.2+ MB


None

event_date       0
year             0
month            0
actor_type       0
motive           0
event_type       0
event_subtype    0
industry         0
country          0
description      1
dtype: int64

In [10]:
# Begin exploratory data analysis 
display("Begin exploratory data analysis")

# Understand the time span for each dataset
display("Date Ranges")
display(f"AI Incidents: {ai_incidents_clean['year'].min()} - {ai_incidents_clean['year'].max()}")
display(f"Cyber Threats: {cyber_threats_clean['Year'].min()} - {cyber_threats_clean['Year'].max()}")
display(f"Cyber Events: {cyber_events_clean['year'].min()} - {cyber_events_clean['year'].max()}")

# Yearly Incident Counts
display("AI Incidents by Year")
display(ai_incidents_clean.groupby('year').size().reset_index(name='count'))

display("Cyber Threats by Year")
display(cyber_threats_clean.groupby('Year').size().reset_index(name='count'))

display("Cyber Events by Year")
display(cyber_events_clean.groupby('year').size().reset_index(name='count'))

# Categories of types of attacks, motives, and actors
display("Cyber Threats - Attack Types")
display(cyber_threats_clean['Attack Type'].value_counts())

display("Cyber Events - Event Types")
display(cyber_events_clean['event_type'].value_counts())

display("Cyber Events - Actor Types")
display(cyber_events_clean['actor_type'].value_counts())

display("Cyber Events - Motives")
display(cyber_events_clean['motive'].value_counts())


'Begin exploratory data analysis'

'Date Ranges'

'AI Incidents: 1983 - 2023'

'Cyber Threats: 2015 - 2024'

'Cyber Events: 2014 - 2025'

'AI Incidents by Year'

Unnamed: 0,year,count
0,1983,1
1,1992,1
2,1996,1
3,1998,1
4,1999,1
5,2003,4
6,2006,1
7,2007,1
8,2008,3
9,2009,2


'Cyber Threats by Year'

Unnamed: 0,Year,count
0,2015,277
1,2016,285
2,2017,319
3,2018,310
4,2019,263
5,2020,315
6,2021,299
7,2022,318
8,2023,315
9,2024,299


'Cyber Events by Year'

Unnamed: 0,year,count
0,2014,633
1,2015,857
2,2016,1104
3,2017,810
4,2018,824
5,2019,1067
6,2020,1748
7,2021,1431
8,2022,2566
9,2023,2484


'Cyber Threats - Attack Types'

Attack Type
DDoS                 531
Phishing             529
SQL Injection        503
Ransomware           493
Malware              485
Man-in-the-Middle    459
Name: count, dtype: int64

'Cyber Events - Event Types'

event_type
Exploitive      8043
Disruptive      4824
Mixed           2700
Undetermined     201
Disruptive        21
Name: count, dtype: int64

'Cyber Events - Actor Types'

actor_type
Criminal        11824
Hacktivist       2106
Nation-State      973
Undetermined      623
Hobbyist          198
Nation-state       35
Terrorist          30
Name: count, dtype: int64

'Cyber Events - Motives'

motive
Financial                                   9173
Undetermined                                3329
Protest                                     1876
Political-Espionage                          787
Sabotage                                     382
Industrial-Espionage                         111
Personal Attack                               93
Political-espionage                           26
Reputation                                     5
Political-Espionage,Industrial-Espionage       3
Political-Espionage,Sabotage                   2
Protest,Financial                              1
Protest,Political-Espionage                    1
Name: count, dtype: int64
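
The value counts above list `Disruptive` twice under `event_type`, and show both `Nation-State` / `Nation-state` and `Political-Espionage` / `Political-espionage`. This suggests stray whitespace and inconsistent casing in the raw labels, although the exact cause is an assumption here. A minimal sketch of one way to merge such variants before grouping follows; `normalize_labels` and the toy frame are hypothetical and not part of the notebook.

```python
import pandas as pd


def normalize_labels(series: pd.Series) -> pd.Series:
    """Trim whitespace, collapse repeated spaces, and title-case each label."""
    return (
        series.astype("string")
        .str.strip()
        .str.replace(r"\s+", " ", regex=True)
        .str.title()
    )


# Toy rows that mimic the label variants seen in the value counts (assumed causes).
toy = pd.DataFrame({
    "event_type": ["Disruptive", "Disruptive ", "Exploitive"],
    "actor_type": ["Nation-State", "Nation-state", "Criminal"],
    "motive": ["Political-Espionage", "Political-espionage", "Financial"],
})

for col in ["event_type", "actor_type", "motive"]:
    toy[col] = normalize_labels(toy[col])

print(toy["event_type"].value_counts())
print(toy["actor_type"].value_counts())
```

Applying the same loop to `cyber_events_clean` before the era comparison would keep a single category from being split across two rows. Title-casing is a blunt tool, so it is worth checking the resulting labels before relying on them.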

In [11]:
# Define the analysis window and the AI era
# The AI era is split at ChatGPT's public release (November 2022)
# Pre-AI: 2015-2022 / Post-AI: 2023+ (split by calendar year, so the last weeks of 2022 count as pre-AI)
# More data from 2024 onward would help, but the comparison is limited to what these datasets cover

# AI Incidents Dataset
# Filter analysis window and add era column
ai_incidents_clean = ai_incidents_clean[ai_incidents_clean['year'] >= 2015].copy()
ai_incidents_clean['ai_era'] = np.where(ai_incidents_clean['year'] >= 2023, 'post', 'pre')

# Global Cybersecurity Threats
# Add era column
cyber_threats_clean['ai_era'] = np.where(cyber_threats_clean['Year'] >= 2023, 'post', 'pre')

# CISSM Cyber Events Database
# Filter analysis window and add era column
cyber_events_clean = cyber_events_clean[cyber_events_clean['year'] >= 2015].copy()
cyber_events_clean['ai_era'] = np.where(cyber_events_clean['year'] >= 2023, 'post', 'pre')

# Verify Era Distribution
display("AI Incidents by Era")
display(ai_incidents_clean['ai_era'].value_counts())

display("Cyber Threats by Era")
display(cyber_threats_clean['ai_era'].value_counts())

display("Cyber Events by Era")
display(cyber_events_clean['ai_era'].value_counts())

'AI Incidents by Era'

ai_era
pre     424
post     45
Name: count, dtype: int64

'Cyber Threats by Era'

ai_era
pre     2386
post     614
Name: count, dtype: int64

'Cyber Events by Era'

ai_era
pre     10407
post     4749
Name: count, dtype: int64
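
The raw era counts are hard to compare directly because the pre-AI window spans eight calendar years (2015-2022) while the post-AI window is much shorter. A minimal sketch of one alternative follows: split on the release date itself for datasets that carry full dates, then compare events per observed month. The cutoff of 2022-11-30 and the helper names `add_era_by_date` and `events_per_month` are assumptions for illustration, not part of the notebook.

```python
import pandas as pd

# Assumed cutoff: ChatGPT's public release on 2022-11-30.
CHATGPT_RELEASE = pd.Timestamp("2022-11-30")


def add_era_by_date(df: pd.DataFrame, date_col: str) -> pd.DataFrame:
    """Return a copy with an 'ai_era' column based on the exact date."""
    out = df.copy()
    is_post = out[date_col] >= CHATGPT_RELEASE
    out["ai_era"] = is_post.map({True: "post", False: "pre"})
    return out


def events_per_month(df: pd.DataFrame, date_col: str) -> pd.Series:
    """Average events per calendar month that has at least one event, by era."""
    months = df[date_col].dt.to_period("M")
    grouped = df.assign(_month=months).groupby("ai_era")["_month"]
    return grouped.size() / grouped.nunique()


# Toy dates only; the real input would be cyber_events_clean['event_date'].
toy = pd.DataFrame({"event_date": pd.to_datetime([
    "2022-10-05", "2022-10-20", "2022-11-15",
    "2022-12-01", "2023-01-10", "2023-01-11", "2023-01-12",
])})
toy = add_era_by_date(toy, "event_date")
rates = events_per_month(toy, "event_date")
print(rates)
```

Months with no recorded events are not counted in the denominator, so this rate slightly overstates activity in sparse periods. The Global Cybersecurity Threats dataset only has a `Year` column, so the year-based split remains the only option there.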

In [16]:
# Try to understand impact and severity of incidents across eras

# Financial Impact
display("Cyber Threats - Average Financial Loss by Era")
display(cyber_threats_clean.groupby('ai_era')['Financial Loss (in Million $)'].mean().reset_index(name='avg_loss_million'))

# The financial loss figures looked suspicious, so inspect the distribution more closely
# After digging into the data, it appears to be generated for illustration purposes, so we only show summary statistics and sample values
display("Financial Loss - Summary Statistics")
display(cyber_threats_clean['Financial Loss (in Million $)'].describe())

display("Financial Loss - Sample Values")
display(cyber_threats_clean['Financial Loss (in Million $)'].head(20))

display("Cyber Threats - Average Affected Users by Era")
display(cyber_threats_clean.groupby('ai_era')['Number of Affected Users'].mean().reset_index(name='avg_affected_users'))

display("Cyber Threats - Attack Types by Era")
display(cyber_threats_clean.groupby(['ai_era', 'Attack Type']).size().reset_index(name='count'))

# Event Types & Motives
display("Cyber Events - Event Types by Era")
display(cyber_events_clean.groupby(['ai_era', 'event_type']).size().reset_index(name='count'))

display("Cyber Events - Motives by Era")
display(cyber_events_clean.groupby(['ai_era', 'motive']).size().reset_index(name='count'))

display("Cyber Events - Actor Types by Era")
display(cyber_events_clean.groupby(['ai_era', 'actor_type']).size().reset_index(name='count'))


'Cyber Threats - Average Financial Loss by Era'

Unnamed: 0,ai_era,avg_loss_million
0,post,51.127638
1,pre,50.329648


'Financial Loss - Summary Statistics'

count    3000.000000
mean       50.492970
std        28.791415
min         0.500000
25%        25.757500
50%        50.795000
75%        75.630000
max        99.990000
Name: Financial Loss (in Million $), dtype: float64

'Financial Loss - Sample Values'

0     80.53
1     62.19
2     38.65
3     41.44
4     74.41
5     98.24
6     33.26
7     59.23
8     16.88
9     69.14
10    88.67
11    38.81
12    30.56
13    58.37
14    48.01
15    64.31
16    13.04
17    93.14
18    14.01
19    36.45
Name: Financial Loss (in Million $), dtype: float64
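
One quick way to support the suspicion that these values are generated is to compare the summary statistics with what a uniform distribution over the observed range would produce. The sketch below uses the `describe()` figures printed above; treating the data as uniform is an assumption being checked, not an established fact, and `uniform_moments` is a hypothetical helper.

```python
import math


def uniform_moments(low: float, high: float) -> tuple[float, float]:
    """Mean and standard deviation of a continuous uniform distribution on [low, high]."""
    mean = (low + high) / 2
    std = (high - low) / math.sqrt(12)
    return mean, std


# Values copied from the describe() output above.
observed = {"min": 0.50, "max": 99.99, "mean": 50.492970, "std": 28.791415}

expected_mean, expected_std = uniform_moments(observed["min"], observed["max"])

print(f"mean: observed {observed['mean']:.2f} vs uniform {expected_mean:.2f}")
print(f"std:  observed {observed['std']:.2f} vs uniform {expected_std:.2f}")
```

If the observed and expected values are close, as they appear to be here, the column behaves like random draws over a fixed range rather than like real-world losses, which tend to be heavily right-skewed.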

'Cyber Threats - Average Affected Users by Era'

Unnamed: 0,ai_era,avg_affected_users
0,post,500630.249186
1,pre,505727.341157


'Cyber Threats - Attack Types by Era'

Unnamed: 0,ai_era,Attack Type,count
0,post,DDoS,105
1,post,Malware,98
2,post,Man-in-the-Middle,97
3,post,Phishing,108
4,post,Ransomware,103
5,post,SQL Injection,103
6,pre,DDoS,426
7,pre,Malware,387
8,pre,Man-in-the-Middle,362
9,pre,Phishing,421


'Cyber Events - Event Types by Era'

Unnamed: 0,ai_era,event_type,count
0,post,Disruptive,972
1,post,Disruptive,21
2,post,Exploitive,2470
3,post,Mixed,1178
4,post,Undetermined,108
5,pre,Disruptive,3590
6,pre,Exploitive,5215
7,pre,Mixed,1509
8,pre,Undetermined,93


'Cyber Events - Motives by Era'

Unnamed: 0,ai_era,motive,count
0,post,Financial,3354
1,post,Industrial-Espionage,22
2,post,Personal Attack,1
3,post,Political-Espionage,245
4,post,"Political-Espionage,Industrial-Espionage",3
5,post,"Political-Espionage,Sabotage",2
6,post,Political-espionage,26
7,post,Protest,514
8,post,Reputation,5
9,post,Sabotage,82


'Cyber Events - Actor Types by Era'

Unnamed: 0,ai_era,actor_type,count
0,post,Criminal,3652
1,post,Hacktivist,482
2,post,Hobbyist,18
3,post,Nation-State,329
4,post,Nation-state,35
5,post,Terrorist,2
6,post,Undetermined,231
7,pre,Criminal,7824
8,pre,Hacktivist,1402
9,pre,Hobbyist,154


## Resources and References
*What resources and references have you used for this project?*
📝 <!-- Answer Below -->

In [12]:
# ⚠️ Make sure you run this cell at the end of your notebook before every submission!
!jupyter nbconvert --to python source.ipynb

[NbConvertApp] Converting notebook source.ipynb to python
[NbConvertApp] Writing 9035 bytes to source.py
