## Imports

Import the libraries needed for the analysis.

In [70]:
import pandas as pd

## Data Overview
Basic exploration of data, such as columns, missing values, and data types.


In [None]:
file = "data/Global_Cybersecurity_Threats_2015-2024.csv"
df = pd.read_csv(file)
df.head()

Unnamed: 0,Country,Year,Attack Type,Target Industry,Financial Loss (in Million $),Number of Affected Users,Attack Source,Security Vulnerability Type,Defense Mechanism Used,Incident Resolution Time (in Hours)
0,China,2019,Phishing,Education,80.53,773169,Hacker Group,Unpatched Software,VPN,63
1,China,2019,Ransomware,Retail,62.19,295961,Hacker Group,Unpatched Software,Firewall,71
2,India,2017,Man-in-the-Middle,IT,38.65,605895,Hacker Group,Weak Passwords,VPN,20
3,UK,2024,Ransomware,Telecommunications,41.44,659320,Nation-state,Social Engineering,AI-based Detection,7
4,Germany,2018,Man-in-the-Middle,IT,74.41,810682,Insider,Social Engineering,VPN,68


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 10 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   Country                              3000 non-null   object 
 1   Year                                 3000 non-null   int64  
 2   Attack Type                          3000 non-null   object 
 3   Target Industry                      3000 non-null   object 
 4   Financial Loss (in Million $)        3000 non-null   float64
 5   Number of Affected Users             3000 non-null   int64  
 6   Attack Source                        3000 non-null   object 
 7   Security Vulnerability Type          3000 non-null   object 
 8   Defense Mechanism Used               3000 non-null   object 
 9   Incident Resolution Time (in Hours)  3000 non-null   int64  
dtypes: float64(1), int64(3), object(6)
memory usage: 234.5+ KB


In [4]:
columns = ["Country","Year","Attack Type","Target Industry","Loss", "Affected Users","Attack Source","Vulnerability Type","Defense Mechanism Used","Incident Resolution Time"]
df.columns = columns
df.head()

Unnamed: 0,Country,Year,Attack Type,Target Industry,Loss,Affected Users,Attack Source,Vulnerability Type,Defense Mechanism Used,Incident Resolution Time
0,China,2019,Phishing,Education,80.53,773169,Hacker Group,Unpatched Software,VPN,63
1,China,2019,Ransomware,Retail,62.19,295961,Hacker Group,Unpatched Software,Firewall,71
2,India,2017,Man-in-the-Middle,IT,38.65,605895,Hacker Group,Weak Passwords,VPN,20
3,UK,2024,Ransomware,Telecommunications,41.44,659320,Nation-state,Social Engineering,AI-based Detection,7
4,Germany,2018,Man-in-the-Middle,IT,74.41,810682,Insider,Social Engineering,VPN,68


In [5]:
df.shape

(3000, 10)

In [6]:
df.isnull().sum()

Unnamed: 0,0
Country,0
Year,0
Attack Type,0
Target Industry,0
Loss,0
Affected Users,0
Attack Source,0
Vulnerability Type,0
Defense Mechanism Used,0
Incident Resolution Time,0


In [7]:
df.isna().sum()

Unnamed: 0,0
Country,0
Year,0
Attack Type,0
Target Industry,0
Loss,0
Affected Users,0
Attack Source,0
Vulnerability Type,0
Defense Mechanism Used,0
Incident Resolution Time,0


## Range of years of attack
Using max() and min(), we can see that the cyberthreats recorded in the dataset span the years 2015 to 2024.

❗❗ This means that attacks before 2015 and attacks in 2025 are not taken into consideration.

In [72]:
latest_attack = df['Year'].max()
oldest_attack = df['Year'].min()
print(f"Latest attack: {latest_attack}")
print(f"Oldest attack: {oldest_attack}")

Latest attack: 2024
Oldest attack: 2015
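
The minimum and maximum only bound the range; they do not show whether every year in between is present. A minimal sketch of that check follows, using a hypothetical toy `years` series (the values are made up, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data standing in for df["Year"] (values are made up).
years = pd.Series([2015, 2016, 2018, 2018], name="Year")

# Every year we would expect between the oldest and latest record.
expected_years = set(range(int(years.min()), int(years.max()) + 1))

# Years inside the range that never appear in the data.
missing_years = sorted(expected_years - set(years.tolist()))
print(f"Years without any record: {missing_years}")
```

On the real data, replacing `years` with `df["Year"]` should run the same check.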


## Number of attacks per attack type per year
Table showing the number of attacks of each type (Ransomware, Phishing, etc.) in each year (2015 to 2024)

In [10]:
attacks_count = df.groupby("Year")["Attack Type"].value_counts()
attacks_count

Year,Attack Type,count
2015,Malware,51
2015,DDoS,50
2015,Ransomware,47
2015,Phishing,46
2015,SQL Injection,42
2015,Man-in-the-Middle,41
2016,Phishing,55
2016,DDoS,53
2016,Man-in-the-Middle,47
2016,SQL Injection,47
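
The long list above can be easier to read as a grid with one row per year and one column per attack type. A possible way to get that layout is `pd.crosstab`, sketched here on a hypothetical toy frame named `sample` (the values are made up, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up).
sample = pd.DataFrame({
    "Year": [2015, 2015, 2015, 2016, 2016],
    "Attack Type": ["Malware", "DDoS", "Malware", "Phishing", "DDoS"],
})

# Rows are years, columns are attack types, cells are incident counts.
# Combinations that never occur are filled with 0.
attacks_table = pd.crosstab(sample["Year"], sample["Attack Type"])
print(attacks_table)
```

With the real data, the same call would be `pd.crosstab(df["Year"], df["Attack Type"])`.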


## Attacks number in each country
There are 10 countries in this dataset. This table shows how many attacks of each type were committed in each country during that period.

In [12]:
attack_per_country = df.groupby("Country")["Attack Type"].value_counts()
attack_per_country

Country,Attack Type,count
Australia,DDoS,61
Australia,Malware,61
Australia,Phishing,51
Australia,Man-in-the-Middle,45
Australia,Ransomware,40
Australia,SQL Injection,39
Brazil,DDoS,61
Brazil,SQL Injection,57
Brazil,Phishing,54
Brazil,Malware,51


## Total attacks in the last 5 years
Total attacks from 2020 to 2024

In [13]:
attacks_last_5_years = df[df["Year"] >= 2020]
attacks_last_5_years.shape[0]

1546

## SQL injection attacks in last 5 years

❗❗ This is just to showcase the attacks of a single type committed during that period

❗❗ We can replace SQL Injection with any of the other attack types

In [14]:
sqli_in_5_years = df[(df["Year"]>=2020) & (df["Attack Type"]=="SQL Injection")]
sqli_in_5_years

Unnamed: 0,Country,Year,Attack Type,Target Industry,Loss,Affected Users,Attack Source,Vulnerability Type,Defense Mechanism Used,Incident Resolution Time
30,UK,2022,SQL Injection,Education,66.24,678876,Hacker Group,Social Engineering,AI-based Detection,11
37,Japan,2021,SQL Injection,Retail,82.52,214372,Insider,Unpatched Software,Encryption,12
41,India,2021,SQL Injection,IT,98.09,826976,Nation-state,Zero-day,VPN,57
78,Russia,2022,SQL Injection,Banking,60.25,662517,Unknown,Social Engineering,Firewall,40
89,India,2020,SQL Injection,Education,62.50,550656,Nation-state,Weak Passwords,Encryption,4
...,...,...,...,...,...,...,...,...,...,...
2985,Russia,2023,SQL Injection,Telecommunications,79.71,358439,Insider,Unpatched Software,Antivirus,48
2988,USA,2022,SQL Injection,IT,37.94,691377,Hacker Group,Social Engineering,Antivirus,44
2996,Brazil,2023,SQL Injection,Telecommunications,30.28,892843,Hacker Group,Zero-day,VPN,26
2998,UK,2022,SQL Injection,IT,32.17,379954,Insider,Unpatched Software,Firewall,9
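
Since the attack type and the starting year are meant to be swappable, the filter can be wrapped in a small helper. This is only a sketch: `attacks_since` is a hypothetical function name and `sample` is made-up toy data, not part of the original notebook.

```python
import pandas as pd

def attacks_since(data, attack_type, start_year):
    """Return the rows of `data` for one attack type from `start_year` onward."""
    mask = (data["Year"] >= start_year) & (data["Attack Type"] == attack_type)
    return data[mask]

# Hypothetical toy data standing in for the real dataset (values are made up).
sample = pd.DataFrame({
    "Year": [2019, 2020, 2021, 2022],
    "Attack Type": ["SQL Injection", "SQL Injection", "Phishing", "SQL Injection"],
})

recent_sqli = attacks_since(sample, "SQL Injection", 2020)
print(f"Matching attacks: {len(recent_sqli)}")
```

On the real data, `attacks_since(df, "Phishing", 2020)` would give the same kind of table for phishing.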


## Impact of attacks

This analysis highlights the impact and effects of these attacks on the countries and the people living in them.

The analyses below cover:
1. The attacks with the highest loss (two attacks, one in Australia and one in China, each caused a loss of 99.99 million USD)
2. The highest number of affected users, recorded in the USA in a ransomware attack
3. Highest financial loss per attack type
4. Highest financial loss per country
5. Highest financial loss per industry

In [15]:
highest_loss_attack = df[df["Loss"] == df["Loss"].max()]
highest_loss_attack

Unnamed: 0,Country,Year,Attack Type,Target Industry,Loss,Affected Users,Attack Source,Vulnerability Type,Defense Mechanism Used,Incident Resolution Time
1806,Australia,2017,SQL Injection,IT,99.99,672966,Hacker Group,Zero-day,Firewall,13
2030,China,2024,DDoS,Banking,99.99,755185,Unknown,Weak Passwords,Encryption,20


In [16]:
highest_affected_users_attacks = df[df["Affected Users"] == df["Affected Users"].max()]
highest_affected_users_attacks

Unnamed: 0,Country,Year,Attack Type,Target Industry,Loss,Affected Users,Attack Source,Vulnerability Type,Defense Mechanism Used,Incident Resolution Time
2045,USA,2023,Ransomware,Government,93.34,999635,Nation-state,Weak Passwords,Firewall,24


In [17]:
highest_loss_in_each_attacks = df.groupby("Attack Type")["Loss"].max()
highest_loss_in_each_attacks

Attack Type,Loss
DDoS,99.99
Malware,99.72
Man-in-the-Middle,99.71
Phishing,99.98
Ransomware,99.9
SQL Injection,99.99


In [18]:
highest_loss_per_country = df.groupby("Country")["Loss"].max()
highest_loss_per_country

Country,Loss
Australia,99.99
Brazil,99.9
China,99.99
France,99.78
Germany,99.98
India,99.72
Japan,99.83
Russia,99.53
UK,99.45
USA,99.88


In [19]:
highest_loss_per_industry = df.groupby("Target Industry")["Loss"].max()
highest_loss_per_industry

Target Industry,Loss
Banking,99.99
Education,99.83
Government,99.72
Healthcare,99.97
IT,99.99
Retail,99.78
Telecommunications,99.98
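
The grouped `max()` calls above return only the loss value. To see the full record of the costliest incident in each group, one option is `idxmax()` combined with `.loc`. The sketch below uses a hypothetical toy frame `sample` with made-up values. Note that `idxmax()` keeps only the first row when several rows tie for the maximum.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up).
sample = pd.DataFrame({
    "Attack Type": ["DDoS", "DDoS", "Phishing", "Phishing"],
    "Country": ["China", "USA", "UK", "India"],
    "Loss": [10.5, 99.0, 42.0, 7.25],
})

# For each attack type, the index label of the row with the largest Loss.
worst_idx = sample.groupby("Attack Type")["Loss"].idxmax()

# Look those rows up to recover every column of the costliest incidents.
worst_incidents = sample.loc[worst_idx]
print(worst_incidents)
```

Grouping by `"Country"` or `"Target Industry"` instead should work the same way on the real data.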


## Comparisons

These analyses showcase several comparisons:

* The percentage increase/decrease of each attack type (Ransomware, Phishing, etc.) in the USA between 2023 and 2024
* Ratio of attacks in the USA to attacks in the UK
* Ratio of attacks in the USA to attacks in India
* Ratio of attacks in the UK to attacks in India
* Ratio of attacks committed by hacker groups to attacks committed by nation-states


❗❗ Any of the years and countries can be replaced with other values

❗❗ The percentage increase or decrease from year x to year y is calculated using this formula:

((Number of attacks in year y - Number of attacks in year x) / Number of attacks in year x) * 100

In [38]:
attack_types = df[df['Country'] == 'USA']['Attack Type'].unique()

for attack in attack_types:
    usa_2023 = df[(df["Country"] == "USA") & (df["Attack Type"] == attack) & (df["Year"] == 2023)].shape[0]
    usa_2024 = df[(df["Country"] == "USA") & (df["Attack Type"] == attack) & (df["Year"] == 2024)].shape[0]

    if usa_2023 == 0:
        print(f'For {attack}, no data in 2023 to calculate change.')
        continue

    difference = ((usa_2024 - usa_2023) / usa_2023) * 100

    if difference < 0:
        print(f'{attack} attacks in USA in 2023 decreased by {abs(difference):.2f}% in 2024')
    elif difference > 0:
        print(f'{attack} attacks in USA in 2023 increased by {abs(difference):.2f}% in 2024')
    else:
        print(f'{attack} attacks in USA in 2023 have not increased nor decreased in 2024')


DDoS attacks in USA in 2023 decreased by 71.43% in 2024
Ransomware attacks in USA in 2023 decreased by 60.00% in 2024
Malware attacks in USA in 2023 decreased by 25.00% in 2024
SQL Injection attacks in USA in 2023 decreased by 40.00% in 2024
Phishing attacks in USA in 2023 increased by 250.00% in 2024
Man-in-the-Middle attacks in USA in 2023 have not increased nor decreased in 2024
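
The same percentage-change formula can also be applied to all attack types at once, without a loop. The sketch below is one possible vectorized version on a hypothetical toy frame `sample` (made-up values). Attack types with no incidents in the base year come out as `NaN` rather than being skipped.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up).
sample = pd.DataFrame({
    "Country": ["USA"] * 7 + ["UK"],
    "Year": [2023, 2023, 2024, 2023, 2024, 2024, 2024, 2023],
    "Attack Type": ["DDoS", "DDoS", "DDoS", "Phishing",
                    "Phishing", "Phishing", "Malware", "DDoS"],
})

# Count incidents per attack type (rows) and year (columns) for one country.
usa = sample[sample["Country"] == "USA"]
counts = pd.crosstab(usa["Attack Type"], usa["Year"])

# Mask a zero baseline as NaN so the division never divides by zero.
base = counts[2023].where(counts[2023] > 0)

# ((attacks in year y - attacks in year x) / attacks in year x) * 100
pct_change = (counts[2024] - base) / base * 100
print(pct_change.round(2))
```

This assumes both years appear at least once for the chosen country; otherwise `counts[2023]` or `counts[2024]` would raise a `KeyError`.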


In [55]:
attacks_USA = df[df["Country"] == "USA"].shape[0]
attacks_UK = df[df["Country"] == "UK"].shape[0]
attacks_India = df[df["Country"] == "India"].shape[0]

ratio_attacks_us_uk = attacks_USA / attacks_UK
ratio_attacks_us_india = attacks_USA / attacks_India
ratio_attacks_uk_india = attacks_UK / attacks_India

print(f'The ratio of attacks in USA to UK is: {ratio_attacks_us_uk:.2f}')
print(f'The ratio of attacks in USA to India is: {ratio_attacks_us_india:.2f}')
print(f'The ratio of attacks in UK to India is: {ratio_attacks_uk_india:.2f}')

The ratio of attacks in USA to UK is: 0.89
The ratio of attacks in USA to India is: 0.93
The ratio of attacks in UK to India is: 1.04


In [60]:
attacks_by_groups = df[df["Attack Source"] == "Hacker Group"].shape[0]
attacks_by_nations = df[df["Attack Source"] == "Nation-state"].shape[0]

ratio_groups_nation = attacks_by_groups/attacks_by_nations
print(f'The ratio of attacks done by hacker groups to attacks done by nations is {ratio_groups_nation:.2f}')

The ratio of attacks done by hacker groups to attacks done by nations is 0.86
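
All of the ratios above follow one pattern: count the rows for two values of a column, then divide. A small helper can make that pattern reusable. This is a sketch only: `count_ratio` is a hypothetical function name and `sample` holds made-up toy values.

```python
import pandas as pd

def count_ratio(data, column, numerator, denominator):
    """Ratio of the row counts for two values of `column`."""
    counts = data[column].value_counts()
    return counts[numerator] / counts[denominator]

# Hypothetical toy data standing in for the real dataset (values are made up).
sample = pd.DataFrame({
    "Country": ["USA", "USA", "UK", "UK", "UK", "UK", "India"],
})

usa_to_uk = count_ratio(sample, "Country", "USA", "UK")
print(f"The ratio of attacks in USA to UK is: {usa_to_uk:.2f}")
```

On the real data, `count_ratio(df, "Attack Source", "Hacker Group", "Nation-state")` would reproduce the hacker-group comparison. Passing each pair of names explicitly also makes it harder to print the wrong ratio by mistake.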


## Impacts on government

This analysis shows how many attacks were committed against the government in each country, specifically those committed by hacker groups.

### Importance of this analysis
1. Assess National Cybersecurity Threat Levels:
Tracking how many attacks target government institutions in each country helps measure the cyber threat level faced by that nation’s critical infrastructure and public services. This insight can inform governments about their vulnerabilities and the scale of threats they must defend against.

2. Raise Public Awareness and Transparency:
Publicly reporting and analyzing such attacks improves transparency and helps citizens understand the cyber risks their governments face, fostering informed public discussions and supporting cybersecurity awareness initiatives.

3. Allocate Resources and Improve Defenses:
Governments can use this data to prioritize their cyber defense budgets and resources, focusing more attention and investment on countries or sectors experiencing higher rates of attacks, which enables more effective and timely countermeasures.

In [68]:
countries = df['Country'].unique()

for country in countries:
    number_of_attacks_country = df[(df["Attack Source"] == "Hacker Group") & (df['Target Industry'] == "Government") & (df["Country"] == country)].shape[0]
    print(f'Governmental attacks done by hacker groups in {country}: {number_of_attacks_country}')

Governmental attacks done by hacker groups in China: 6
Governmental attacks done by hacker groups in India: 9
Governmental attacks done by hacker groups in UK: 12
Governmental attacks done by hacker groups in Germany: 11
Governmental attacks done by hacker groups in France: 11
Governmental attacks done by hacker groups in Australia: 9
Governmental attacks done by hacker groups in Russia: 15
Governmental attacks done by hacker groups in Brazil: 7
Governmental attacks done by hacker groups in Japan: 9
Governmental attacks done by hacker groups in USA: 12
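
The per-country loop above could also be written as a single filter followed by `value_counts()`, which returns all the counts in one Series that is easy to sort or plot. The sketch below uses a hypothetical toy frame `sample` with made-up values. The `reindex` step is an assumption about the desired output: it keeps countries with zero matching attacks in the result.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up).
sample = pd.DataFrame({
    "Country": ["UK", "UK", "USA", "USA", "India"],
    "Attack Source": ["Hacker Group", "Hacker Group", "Hacker Group",
                      "Insider", "Hacker Group"],
    "Target Industry": ["Government", "Government", "Government",
                        "Government", "IT"],
})

# Keep only attacks on government targets carried out by hacker groups.
mask = (sample["Attack Source"] == "Hacker Group") & (sample["Target Industry"] == "Government")

# Count per country, then add back countries that had no matching rows.
gov_attacks = (
    sample.loc[mask, "Country"]
    .value_counts()
    .reindex(sample["Country"].unique(), fill_value=0)
)
print(gov_attacks)
```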
