# An Analysis of the Cybercrime Landscape in an AI World

![Banner](./assets/banner.jpeg)

## Topic
*What problem are you (or your stakeholder) trying to address?*
📝 <!-- Answer Below -->
#### <span style = 'color:green'>Understanding whether AI availability has contributed to rising cybercrime rates and attack sophistication.</span>

## Project Question
*What specific question are you seeking to answer with this project?*
*This is not the same as the questions you ask to limit the scope of the project.*
📝 <!-- Answer Below -->
#### <span style = 'color:green'>Is there a measurable correlation between AI accessibility and changes in cybercrime trends?</span>

## What would an answer look like?
*What is your hypothesized answer to your question?*
📝 <!-- Answer Below -->
#### <span style = 'color: green'>AI availability has likely contributed to an increase in cybercrime volume and sophistication, as these tools lower technical barriers for attackers and eliminate traditional red flags such as misspellings in phishing emails.</span>

## Data Sources
*What 3 data sources have you identified for this project?*
*How are you going to relate these datasets?*
📝 <!-- Answer Below -->
* **Cyber Events Database:** Publicly available information on cyber events, compiled by CISSM.
    * https://cissm.umd.edu/research-impact/publications/cyber-events-database-home
* **Global Cybersecurity Threats (2015-2024):** A comprehensive dataset tracking cybersecurity incidents, attack vectors, threat 
    * https://www.kaggle.com/datasets/atharvasoundankar/global-cybersecurity-threats-2015-2024
* **AI Incident Database:** Documents the times when things go wrong with AI solutions.
    * https://www.kaggle.com/datasets/konradb/ai-incident-database

## Approach and Analysis
*What is your approach to answering your project question?*
*How will you use the identified data to answer your project question?*
📝 <!-- Start Discussing the project here; you can add as many code cells as you need -->
<br>
#### <span style = 'color:green'>ChatGPT was publicly released in November 2022, so we use that date as the dividing point for comparing metrics before and after widespread AI availability. The Global Cybersecurity Threats dataset provides volume and attack-type trends, the Cyber Events Database adds incident-level context on motives and actors, and the AI Incident Database identifies specific cases of AI use. Together these let us test whether AI availability correlates with changes in cybercrime patterns.</span>
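
A minimal sketch of the before/after comparison described above, not the final analysis. It assumes 30 November 2022 (ChatGPT's public release date) as the cutoff. The helper `mean_monthly_events` and the toy `event_date` values are hypothetical stand-ins for the real datasets.

```python
import pandas as pd

# Assumed cutoff: ChatGPT's public release date.
CHATGPT_RELEASE = pd.Timestamp("2022-11-30")


def mean_monthly_events(df, date_col, cutoff=CHATGPT_RELEASE):
    """Average number of events per month, before and after the cutoff."""
    # Unparseable dates become NaT and are dropped.
    dates = pd.to_datetime(df[date_col], errors="coerce").dropna()
    frame = pd.DataFrame({
        "month": dates.dt.to_period("M"),
        "period": (dates >= cutoff).map({False: "pre", True: "post"}),
    })
    # Count events per (period, month), then average the monthly counts.
    monthly_counts = frame.groupby(["period", "month"]).size()
    return monthly_counts.groupby(level="period").mean()


# Toy data for illustration only; not taken from any of the three datasets.
toy = pd.DataFrame({
    "event_date": [
        "2022-01-05", "2022-01-20", "2022-02-10",
        "2023-03-01", "2023-03-15", "2023-03-20", "2023-04-02",
    ]
})
result = mean_monthly_events(toy, "event_date")
print(result)
```

Months with no events do not appear in the counts, so this average only covers months that have at least one record. A difference between the two periods would show an association, not proof that AI availability caused the change.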

In [1]:
# Imports
import pandas as pd
from IPython.display import display  # explicit import; Jupyter also provides display() implicitly

In [12]:
# Load datasets

# AI Incident Database
ai_incidents = pd.read_csv('data/AI_incidents_database.csv')

# Global Cybersecurity Threats
cyber_threats = pd.read_csv('data/Global_Cybersecurity_Threats_2015_2024.csv')

# CISSM Cyber Events Database
cyber_events = pd.read_csv('data/CISSM_Cyber_Events_Database_2014_Oct_2025.csv')

In [6]:
# Check first few rows of each dataset

display("AI Incidents Database")
display(ai_incidents.head())

display("\nGlobal Cybersecurity Threats")
display(cyber_threats.head())

display("\nCISSM Cyber Events Database")
display(cyber_events.head())

'AI Incidents Database'

Unnamed: 0,_id,incident_id,date,reports,Alleged deployer of AI system,Alleged developer of AI system,Alleged harmed or nearly harmed parties,description,title
0,ObjectId(625763de343edc875fe63a15),23,2017-11-08,"[242,243,244,245,246,247,248,249,250,253,254,2...","[""navya"",""keolis-north-america""]","[""navya"",""keolis-north-america""]","[""navya"",""keolis-north-america"",""bus-passengers""]",A self-driving public shuttle by Keolis North ...,Las Vegas Self-Driving Bus Involved in Accident
1,ObjectId(625763dc343edc875fe63a02),4,2018-03-18,"[629,630,631,632,633,634,635,636,637,638,639,6...","[""uber""]","[""uber""]","[""elaine-herzberg"",""pedestrians""]",An Uber autonomous vehicle (AV) in autonomous ...,Uber AV Killed Pedestrian in Arizona
2,ObjectId(625763db343edc875fe639ff),1,2015-05-19,"[1,2,3,4,5,6,7,8,9,10,11,12,14,15]","[""youtube""]","[""youtube""]","[""children""]",YouTube’s content filtering and recommendation...,Google’s YouTube Kids App Presents Inappropria...
3,ObjectId(625763de343edc875fe63a10),18,2015-04-04,"[130,131,132,133,134,135,136,137,138,1367,1368]","[""google""]","[""google""]","[""women""]",Google Image returns results that under-repres...,Gender Biases of Google Image Search
4,ObjectId(625763dd343edc875fe63a0a),12,2016-07-21,[42],"[""microsoft-research"",""boston-university""]","[""microsoft-research"",""google"",""boston-univers...","[""women"",""minority-groups""]",Researchers from Boston University and Microso...,Common Biases of Vector Embeddings


'\nGlobal Cybersecurity Threats'

Unnamed: 0,Country,Year,Attack Type,Target Industry,Financial Loss (in Million $),Number of Affected Users,Attack Source,Security Vulnerability Type,Defense Mechanism Used,Incident Resolution Time (in Hours)
0,China,2019,Phishing,Education,80.53,773169,Hacker Group,Unpatched Software,VPN,63
1,China,2019,Ransomware,Retail,62.19,295961,Hacker Group,Unpatched Software,Firewall,71
2,India,2017,Man-in-the-Middle,IT,38.65,605895,Hacker Group,Weak Passwords,VPN,20
3,UK,2024,Ransomware,Telecommunications,41.44,659320,Nation-state,Social Engineering,AI-based Detection,7
4,Germany,2018,Man-in-the-Middle,IT,74.41,810682,Insider,Social Engineering,VPN,68


'\nCISSM Cyber Events Database'

Unnamed: 0,slug,original_method,event_date,reported_date,year,month,actor,actor_type,organization,industry_code,...,opec,gulf_coop,g7,g20,aukus,csto,oecd,osce,five_eyes,change_log
0,1f72c2eb8ab303e4,1,2014-01-01,,2014,1,Undetermined,Criminal,Barry University,61,...,0,0,1,1,1,0,1,1,1,
1,ecac8b3e60a2f72f,1,2014-01-01,,2014,1,Undetermined,Criminal,Record Assist LLC,54,...,0,0,1,1,1,0,1,1,1,
2,3bbe0695e2d019f3,1,2014-01-01,,2014,1,Syrian Electronic Army,Hacktivist,Skype's Social Media,54,...,0,0,1,1,1,0,1,1,1,
3,6100014f6ca84b3d,1,2014-01-02,,2014,1,Undetermined,Criminal,Snapchat,51,...,0,0,1,1,1,0,1,1,1,
4,3a94b8cf6dde1f66,1,2014-01-03,,2014,1,DERP Trolling,Undetermined,Battle.net,51,...,0,0,1,1,1,0,1,1,1,


In [10]:
# Get info about each dataset
# Note: DataFrame.info() prints its summary and returns None,
# so wrapping it in display() also shows a trailing None.
display("AI Incidents Database Info")
display(ai_incidents.info())
display("Global Cybersecurity Threats Info")
display(cyber_threats.info())
display("CISSM Cyber Events Database Info")
display(cyber_events.info())

'AI Incidents Database Info'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 514 entries, 0 to 513
Data columns (total 9 columns):
 #   Column                                   Non-Null Count  Dtype 
---  ------                                   --------------  ----- 
 0   _id                                      514 non-null    object
 1   incident_id                              514 non-null    int64 
 2   date                                     514 non-null    object
 3   reports                                  514 non-null    object
 4   Alleged deployer of AI system            514 non-null    object
 5   Alleged developer of AI system           514 non-null    object
 6   Alleged harmed or nearly harmed parties  514 non-null    object
 7   description                              514 non-null    object
 8   title                                    514 non-null    object
dtypes: int64(1), object(8)
memory usage: 36.3+ KB


None

'Global Cybersecurity Threats Info'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 10 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   Country                              3000 non-null   object 
 1   Year                                 3000 non-null   int64  
 2   Attack Type                          3000 non-null   object 
 3   Target Industry                      3000 non-null   object 
 4   Financial Loss (in Million $)        3000 non-null   float64
 5   Number of Affected Users             3000 non-null   int64  
 6   Attack Source                        3000 non-null   object 
 7   Security Vulnerability Type          3000 non-null   object 
 8   Defense Mechanism Used               3000 non-null   object 
 9   Incident Resolution Time (in Hours)  3000 non-null   int64  
dtypes: float64(1), int64(3), object(6)
memory usage: 234.5+ KB


None

'CISSM Cyber Events Database Info'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15789 entries, 0 to 15788
Data columns (total 44 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   slug             15789 non-null  object
 1   original_method  15789 non-null  int64 
 2   event_date       15789 non-null  object
 3   reported_date    1233 non-null   object
 4   year             15789 non-null  int64 
 5   month            15789 non-null  int64 
 6   actor            15789 non-null  object
 7   actor_type       15789 non-null  object
 8   organization     15789 non-null  object
 9   industry_code    15789 non-null  int64 
 10  industry         15789 non-null  object
 11  motive           15789 non-null  object
 12  event_type       15789 non-null  object
 13  event_subtype    15789 non-null  object
 14  magnitude        477 non-null    object
 15  duration         477 non-null    object
 16  scope            477 non-null    object
 17  ip               870 non-null  

None
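
The summaries above show that `date` and `event_date` are stored as `object` (strings), while the Global Cybersecurity Threats data only carries an integer `Year`. One possible way to relate the datasets is therefore at yearly granularity. The sketch below assumes that approach. `yearly_counts` is a hypothetical helper and the toy frames are invented stand-ins, not rows from the real files.

```python
import pandas as pd


def yearly_counts(df, name, date_col=None, year_col=None):
    """Return row counts per calendar year as a Series named `name`."""
    if year_col is not None:
        years = df[year_col]
    else:
        # Parse string dates; unparseable values become NaT and are dropped.
        years = pd.to_datetime(df[date_col], errors="coerce").dt.year
    return years.dropna().astype(int).value_counts().sort_index().rename(name)


# Toy stand-ins that mimic the column names seen in the real datasets.
ai_toy = pd.DataFrame({"date": ["2017-11-08", "2018-03-18", "2018-07-01"]})
threats_toy = pd.DataFrame({"Year": [2017, 2018, 2018, 2019]})

# Align both series on year; years missing from one source become 0.
combined = (
    pd.concat(
        [
            yearly_counts(ai_toy, "ai_incidents", date_col="date"),
            yearly_counts(threats_toy, "cyber_threats", year_col="Year"),
        ],
        axis=1,
    )
    .fillna(0)
    .astype(int)
    .sort_index()
)
print(combined)
```

The same helper could be pointed at `event_date` in the CISSM data, which would give three yearly series on a shared index for correlation or plotting.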

In [14]:
# Count null or missing values
display("AI Incidents Database Missing Values")
display(ai_incidents.isnull().sum())
display("Global Cybersecurity Threats Missing Values")
display(cyber_threats.isnull().sum())
display("CISSM Cyber Events Database Missing Values")
display(cyber_events.isnull().sum())

'AI Incidents Database Missing Values'

_id                                        0
incident_id                                0
date                                       0
reports                                    0
Alleged deployer of AI system              0
Alleged developer of AI system             0
Alleged harmed or nearly harmed parties    0
description                                0
title                                      0
dtype: int64

'Global Cybersecurity Threats Missing Values'

Country                                0
Year                                   0
Attack Type                            0
Target Industry                        0
Financial Loss (in Million $)          0
Number of Affected Users               0
Attack Source                          0
Security Vulnerability Type            0
Defense Mechanism Used                 0
Incident Resolution Time (in Hours)    0
dtype: int64

'CISSM Cyber Events Database Missing Values'

slug                   0
original_method        0
event_date             0
reported_date      14556
year                   0
month                  0
actor                  0
actor_type             0
organization           0
industry_code          0
industry               0
motive                 0
event_type             0
event_subtype          0
magnitude          15312
duration           15312
scope              15312
ip                 14919
org_data           14945
cust_data          14931
description            1
source_url             4
country                0
actor_country          0
state               8174
county              8336
nato                   0
eu                     0
shanghai_coop          0
oas                    0
mercosur               0
au                     0
ecowas                 0
asean                  0
opec                   0
gulf_coop              0
g7                     0
g20                    0
aukus                  0
csto                   0
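
The counts above suggest that several CISSM columns are mostly empty: `reported_date` is missing in 14556 of 15789 rows (roughly 92%), and `magnitude`, `duration`, and `scope` in 15312 rows (roughly 97%). A minimal sketch of flagging such columns by their share of missing values follows. The 50% threshold, the `sparse_columns` helper, and the toy values are assumptions for illustration.

```python
import pandas as pd


def sparse_columns(df, threshold=0.5):
    """Return the missing-value share of columns above `threshold`."""
    # isnull().mean() gives the fraction of missing values per column.
    missing_share = df.isnull().mean()
    return missing_share[missing_share > threshold].sort_values(ascending=False)


# Toy frame for illustration only; the values are invented.
toy = pd.DataFrame({
    "event_date": ["2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04"],
    "reported_date": [None, None, None, "2014-01-05"],
    "motive": ["Financial", "Protest", "Financial", "Undetermined"],
})

sparse = sparse_columns(toy)
dense = toy.drop(columns=sparse.index)
print(sparse)
print(list(dense.columns))
```

Whether to drop or keep a sparse column depends on the question. A column like `reported_date` may still be useful for the small subset of rows where it is present.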


## Resources and References
*What resources and references have you used for this project?*
📝 <!-- Answer Below -->

In [2]:
# ⚠️ Make sure you run this cell at the end of your notebook before every submission!
!jupyter nbconvert --to python source.ipynb

[NbConvertApp] Converting notebook source.ipynb to python
[NbConvertApp] Writing 1271 bytes to source.py
