# Cybersecurity Events Analysis (2004-2024)

## Comparative Study of Cyber Events Datasets
- **Advisens Events**: 2004-2023 (Hackmageddon dataset)
- **Malware Attacks**: 2015-2024 (in billions)
- **Ransomware Attacks**: 2017-2023 (events)

**Objective**: Analyze trends, patterns, and correlations across different types of cyber events

## 1. Import Required Libraries

In [20]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
from pathlib import Path

# Style configuration
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (16, 8)
plt.rcParams['font.size'] = 10

print("‚úì Libraries successfully imported")
print("  - pandas, numpy, scipy")
print("  - matplotlib, seaborn for static visualizations")
print("  - plotly for interactive charts")
print("  - Ready for cyber events analysis!")

‚úì Libraries successfully imported
  - pandas, numpy, scipy
  - matplotlib, seaborn for static visualizations
  - plotly for interactive charts
  - Ready for cyber events analysis!


## 2. Load and Explore Advisens Events Data

In [21]:
# File paths
advisens_path = Path('../../data/cyber_events/events_Advisen.csv')
malware_path = Path('../../data/cyber_events/malwares_exploring_topics.csv')
ransomware_path = Path('../../data/cyber_events/randsomware_exploring_topics.csv')

# Load Advisens events data
df_advisens = pd.read_csv(advisens_path)

print("="*70)
print("ADVISENS INCIDENTS DATA (Hackmageddon)")
print("="*70)
print(f"\n‚úì File loaded: {advisens_path}")
print(f"\nDataFrame structure:")
print(f"  Shape: {df_advisens.shape}")
print(f"  Columns: {df_advisens.columns.tolist()}")
print(f"  Data types:\n{df_advisens.dtypes}")

print(f"\nFirst and last 3 rows:")
print(pd.concat([df_advisens.head(3), df_advisens.tail(3)]))

print(f"\nSummary Statistics:")
print(df_advisens.describe())

print(f"\nYear range: {df_advisens['Date'].min():.0f} - {df_advisens['Date'].max():.0f}")
print(f"Total events: {df_advisens['nb_cyber_event'].sum():,.0f}")

ADVISENS INCIDENTS DATA (Hackmageddon)

‚úì File loaded: ..\..\data\cyber_events\events_Advisen.csv

DataFrame structure:
  Shape: (20, 2)
  Columns: ['Date', 'nb_cyber_event']
  Data types:
Date                int64
nb_cyber_event    float64
dtype: object

First and last 3 rows:
    Date  nb_cyber_event
0   2004      232.258064
1   2005      425.806452
2   2006     1161.290323
17  2021    11535.483870
18  2022    10877.419350
19  2023    11303.225810

Summary Statistics:
             Date  nb_cyber_event
count    20.00000       20.000000
mean   2013.50000     6392.903226
std       5.91608     4038.720472
min    2004.00000      232.258064
25%    2008.75000     2729.032258
50%    2013.50000     7741.935484
75%    2018.25000     9425.806452
max    2023.00000    12000.000000

Year range: 2004 - 2023
Total events: 127,858


## 3. Load and Explore Malware Attacks Data

In [22]:
# Load malware data
df_malware = pd.read_csv(malware_path)
df_malware['date'] = df_malware['date'].astype(int)

print("\n" + "="*70)
print("MALWARE ATTACKS DATA")
print("="*70)
print(f"\n‚úì File loaded: {malware_path}")
print(f"\nDataFrame structure:")
print(f"  Shape: {df_malware.shape}")
print(f"  Columns: {df_malware.columns.tolist()}")
print(f"  Data types:\n{df_malware.dtypes}")

print(f"\nData:")
print(df_malware)

print(f"\nSummary Statistics:")
print(df_malware['malware_attacks_billions'].describe())

print(f"\nYear range: {df_malware['date'].min():.0f} - {df_malware['date'].max():.0f}")
print(f"Total malware attacks (billions): {df_malware['malware_attacks_billions'].sum():.2f}")
print(f"Average per year: {df_malware['malware_attacks_billions'].mean():.2f} billion")


MALWARE ATTACKS DATA

‚úì File loaded: ..\..\data\cyber_events\malwares_exploring_topics.csv

DataFrame structure:
  Shape: (10, 2)
  Columns: ['date', 'malware_attacks_billions']
  Data types:
date                          int64
malware_attacks_billions    float64
dtype: object

Data:
   date  malware_attacks_billions
0  2015                      8.20
1  2016                      7.90
2  2017                      8.60
3  2018                     10.50
4  2019                      9.90
5  2020                      5.60
6  2021                      5.40
7  2022                      5.50
8  2023                      6.06
9  2024                      6.54

Summary Statistics:
count    10.000000
mean      7.420000
std       1.872942
min       5.400000
25%       5.715000
50%       7.220000
75%       8.500000
max      10.500000
Name: malware_attacks_billions, dtype: float64

Year range: 2015 - 2024
Total malware attacks (billions): 74.20
Average per year: 7.42 billion


## 4. Load and Explore Ransomware Attacks Data

In [23]:
# Load ransomware data
df_ransomware = pd.read_csv(ransomware_path)
df_ransomware['Year'] = df_ransomware['Year'].astype(int)

print("\n" + "="*70)
print("RANSOMWARE ATTACKS DATA")
print("="*70)
print(f"\n‚úì File loaded: {ransomware_path}")
print(f"\nDataFrame structure:")
print(f"  Shape: {df_ransomware.shape}")
print(f"  Columns: {df_ransomware.columns.tolist()}")
print(f"  Data types:\n{df_ransomware.dtypes}")

print(f"\nData:")
print(df_ransomware)

print(f"\nSummary Statistics:")
print(df_ransomware['Number of Ransomware Attacks (billion)'].describe())

print(f"\nYear range: {df_ransomware['Year'].min():.0f} - {df_ransomware['Year'].max():.0f}")
print(f"Total ransomware attacks: {df_ransomware['Number of Ransomware Attacks (billion)'].sum():,.2f}")
print(f"Average per year: {df_ransomware['Number of Ransomware Attacks (billion)'].mean():.2f}")


RANSOMWARE ATTACKS DATA

‚úì File loaded: ..\..\data\cyber_events\randsomware_exploring_topics.csv

DataFrame structure:
  Shape: (7, 2)
  Columns: ['Year', 'Number of Ransomware Attacks (billion)']
  Data types:
Year                                        int64
Number of Ransomware Attacks (billion)    float64
dtype: object

Data:
   Year  Number of Ransomware Attacks (billion)
0  2017                                  186.30
1  2018                                  206.40
2  2019                                  187.91
3  2020                                  304.64
4  2021                                  623.25
5  2022                                  493.33
6  2023                                  317.59

Summary Statistics:
count      7.000000
mean     331.345714
std      168.113518
min      186.300000
25%      197.155000
50%      304.640000
75%      405.460000
max      623.250000
Name: Number of Ransomware Attacks (billion), dtype: float64

Year range: 2017 - 2023
Total ransomwa

## 4.5. Load and Explore Statista Cyber Events Data

In [24]:
# Load statista data
statista_path = Path('../../data/cyber_events/statista_cyber_event.csv')

df_statista = pd.read_csv(statista_path)

print("\n" + "="*70)
print("STATISTA CYBER INCIDENTS DATA")
print("="*70)
print(f"\n‚úì File loaded: {statista_path}")
print(f"\nDataFrame structure:")
print(f"  Shape: {df_statista.shape}")
print(f"  Columns: {df_statista.columns.tolist()}")
print(f"  Data types:\n{df_statista.dtypes}")

print(f"\nData:")
print(df_statista)

print(f"\nYear range: {df_statista['Date'].min():.0f} - {df_statista['Date'].max():.0f}")
print(f"Total events (millions): {df_statista['nb_cyber_event_million'].sum():.2f}")
print(f"Average per year: {df_statista['nb_cyber_event_million'].mean():.2f} million")


STATISTA CYBER INCIDENTS DATA

‚úì File loaded: ..\..\data\cyber_events\statista_cyber_event.csv

DataFrame structure:
  Shape: (9, 2)
  Columns: ['Date', 'nb_cyber_event_million']
  Data types:
Date                        int64
nb_cyber_event_million    float64
dtype: object

Data:
   Date  nb_cyber_event_million
0  2016                    4.32
1  2017                    5.75
2  2018                    7.06
3  2019                   10.49
4  2020                   18.07
5  2021                   19.23
6  2022                   16.80
7  2023                   16.73
8  2024                   15.16

Year range: 2016 - 2024
Total events (millions): 113.61
Average per year: 12.62 million


## 5. Data Cleaning and Preprocessing

In [25]:
print("="*70)
print("DATA CLEANING AND PREPROCESSING")
print("="*70)

# Check for missing values
print("\nüìã Missing values check:")
print(f"  Advisens: {df_advisens.isnull().sum().sum()} missing values")
print(f"  Malware: {df_malware.isnull().sum().sum()} missing values")
print(f"  Ransomware: {df_ransomware.isnull().sum().sum()} missing values")

# Check for duplicates
print(f"\nüîç Duplicates check:")
print(f"  Advisens: {df_advisens.duplicated().sum()} duplicate rows")
print(f"  Malware: {df_malware.duplicated().sum()} duplicate rows")
print(f"  Ransomware: {df_ransomware.duplicated().sum()} duplicate rows")

# Standardize column names
df_advisens_clean = df_advisens.copy()
df_advisens_clean.columns = ['Year', 'Events']

df_malware_clean = df_malware.copy()
df_malware_clean.columns = ['Year', 'Malware_Attacks']

df_ransomware_clean = df_ransomware.copy()
df_ransomware_clean.columns = ['Year', 'Ransomware_Attacks']

df_statista_clean = df_statista.copy()
df_statista_clean.columns = ['Year', 'Statista_Events']

print(f"\n‚úì Column names standardized:")
print(f"  Advisens: {df_advisens_clean.columns.tolist()}")
print(f"  Malware: {df_malware_clean.columns.tolist()}")
print(f"  Ransomware: {df_ransomware_clean.columns.tolist()}")
print(f"  Statista: {df_statista_clean.columns.tolist()}")

# Data types
df_advisens_clean['Year'] = df_advisens_clean['Year'].astype(int)
df_malware_clean['Year'] = df_malware_clean['Year'].astype(int)
df_ransomware_clean['Year'] = df_ransomware_clean['Year'].astype(int)
df_statista_clean['Year'] = df_statista_clean['Year'].astype(int)

print(f"\n‚úì Data types standardized to integers")
print(f"  Advisens - Year dtype: {df_advisens_clean['Year'].dtype}")
print(f"  Malware - Year dtype: {df_malware_clean['Year'].dtype}")
print(f"  Ransomware - Year dtype: {df_ransomware_clean['Year'].dtype}")
print(f"  Statista - Year dtype: {df_statista_clean['Year'].dtype}")

print(f"\n‚úÖ Data preprocessing completed!")

DATA CLEANING AND PREPROCESSING

üìã Missing values check:
  Advisens: 0 missing values
  Malware: 0 missing values
  Ransomware: 0 missing values

üîç Duplicates check:
  Advisens: 0 duplicate rows
  Malware: 0 duplicate rows
  Ransomware: 0 duplicate rows

‚úì Column names standardized:
  Advisens: ['Year', 'Events']
  Malware: ['Year', 'Malware_Attacks']
  Ransomware: ['Year', 'Ransomware_Attacks']
  Statista: ['Year', 'Statista_Events']

‚úì Data types standardized to integers
  Advisens - Year dtype: int64
  Malware - Year dtype: int64
  Ransomware - Year dtype: int64
  Statista - Year dtype: int64

‚úÖ Data preprocessing completed!


In [26]:
# Check for missing values
print("\nüìã Missing values check:")
print(f"  Advisens: {df_advisens.isnull().sum().sum()} missing values")
print(f"  Malware: {df_malware.isnull().sum().sum()} missing values")
print(f"  Ransomware: {df_ransomware.isnull().sum().sum()} missing values")
print(f"  Statista: {df_statista.isnull().sum().sum()} missing values")

# Check for duplicates
print(f"\nüîç Duplicates check:")
print(f"  Advisens: {df_advisens.duplicated().sum()} duplicate rows")
print(f"  Malware: {df_malware.duplicated().sum()} duplicate rows")
print(f"  Ransomware: {df_ransomware.duplicated().sum()} duplicate rows")
print(f"  Statista: {df_statista.duplicated().sum()} duplicate rows")


üìã Missing values check:
  Advisens: 0 missing values
  Malware: 0 missing values
  Ransomware: 0 missing values
  Statista: 0 missing values

üîç Duplicates check:
  Advisens: 0 duplicate rows
  Malware: 0 duplicate rows
  Ransomware: 0 duplicate rows
  Statista: 0 duplicate rows


## 6. Descriptive Statistics and Analysis

In [27]:
print("\n" + "="*70)
print("DESCRIPTIVE STATISTICS - KEY PATTERNS")
print("="*70)

# Advisens analysis
print("\nüìä ADVISENS INCIDENTS ANALYSIS (2004-2023):")
print(f"  Total events: {df_advisens_clean['Events'].sum():,.0f}")
print(f"  Average per year: {df_advisens_clean['Events'].mean():.0f}")
print(f"  Min year: {df_advisens_clean[df_advisens_clean['Events'] == df_advisens_clean['Events'].min()]['Year'].values[0]:.0f} ({df_advisens_clean['Events'].min():.0f} events)")
print(f"  Max year: {df_advisens_clean[df_advisens_clean['Events'] == df_advisens_clean['Events'].max()]['Year'].values[0]:.0f} ({df_advisens_clean['Events'].max():.0f} events)")
print(f"  Growth (2004‚Üí2023): {((df_advisens_clean.iloc[-1]['Events'] - df_advisens_clean.iloc[0]['Events']) / df_advisens_clean.iloc[0]['Events'] * 100):.1f}%")
print(f"  Standard deviation: {df_advisens_clean['Events'].std():.0f}")

# Malware analysis
print(f"\nüìä MALWARE ATTACKS ANALYSIS (2015-2024):")
print(f"  Total attacks: {df_malware_clean['Malware_Attacks'].sum():.2f} billion")
print(f"  Average per year: {df_malware_clean['Malware_Attacks'].mean():.2f} billion")
print(f"  Min year: {df_malware_clean[df_malware_clean['Malware_Attacks'] == df_malware_clean['Malware_Attacks'].min()]['Year'].values[0]:.0f} ({df_malware_clean['Malware_Attacks'].min():.2f}B)")
print(f"  Max year: {df_malware_clean[df_malware_clean['Malware_Attacks'] == df_malware_clean['Malware_Attacks'].max()]['Year'].values[0]:.0f} ({df_malware_clean['Malware_Attacks'].max():.2f}B)")
print(f"  Growth (2015‚Üí2024): {((df_malware_clean.iloc[-1]['Malware_Attacks'] - df_malware_clean.iloc[0]['Malware_Attacks']) / df_malware_clean.iloc[0]['Malware_Attacks'] * 100):.1f}%")
print(f"  Standard deviation: {df_malware_clean['Malware_Attacks'].std():.2f}")

# Ransomware analysis
print(f"\nüìä RANSOMWARE ATTACKS ANALYSIS (2017-2023):")
print(f"  Total attacks: {df_ransomware_clean['Ransomware_Attacks'].sum():,.2f} billion")
print(f"  Average per year: {df_ransomware_clean['Ransomware_Attacks'].mean():.2f} billion")
print(f"  Min year: {df_ransomware_clean[df_ransomware_clean['Ransomware_Attacks'] == df_ransomware_clean['Ransomware_Attacks'].min()]['Year'].values[0]:.0f} ({df_ransomware_clean['Ransomware_Attacks'].min():.2f})")
print(f"  Max year: {df_ransomware_clean[df_ransomware_clean['Ransomware_Attacks'] == df_ransomware_clean['Ransomware_Attacks'].max()]['Year'].values[0]:.0f} ({df_ransomware_clean['Ransomware_Attacks'].max():.2f})")
print(f"  Growth (2017‚Üí2023): {((df_ransomware_clean.iloc[-1]['Ransomware_Attacks'] - df_ransomware_clean.iloc[0]['Ransomware_Attacks']) / df_ransomware_clean.iloc[0]['Ransomware_Attacks'] * 100):.1f}%")
print(f"  Standard deviation: {df_ransomware_clean['Ransomware_Attacks'].std():.2f}")

# Common year analysis
common_malware_ransomware = set(df_malware_clean['Year']) & set(df_ransomware_clean['Year'])
print(f"\nüîÑ COMMON YEAR PERIODS:")
print(f"  Advisens & Malware: {sorted(set(df_advisens_clean['Year']) & set(df_malware_clean['Year']))}")
print(f"  Advisens & Ransomware: {sorted(set(df_advisens_clean['Year']) & set(df_ransomware_clean['Year']))}")
print(f"  Malware & Ransomware: {sorted(common_malware_ransomware)}")


DESCRIPTIVE STATISTICS - KEY PATTERNS

üìä ADVISENS INCIDENTS ANALYSIS (2004-2023):
  Total events: 127,858
  Average per year: 6393
  Min year: 2004 (232 events)
  Max year: 2020 (12000 events)
  Growth (2004‚Üí2023): 4766.7%
  Standard deviation: 4039

üìä MALWARE ATTACKS ANALYSIS (2015-2024):
  Total attacks: 74.20 billion
  Average per year: 7.42 billion
  Min year: 2021 (5.40B)
  Max year: 2018 (10.50B)
  Growth (2015‚Üí2024): -20.2%
  Standard deviation: 1.87

üìä RANSOMWARE ATTACKS ANALYSIS (2017-2023):
  Total attacks: 2,319.42 billion
  Average per year: 331.35 billion
  Min year: 2017 (186.30)
  Max year: 2021 (623.25)
  Growth (2017‚Üí2023): 70.5%
  Standard deviation: 168.11

üîÑ COMMON YEAR PERIODS:
  Advisens & Malware: [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
  Advisens & Ransomware: [2017, 2018, 2019, 2020, 2021, 2022, 2023]
  Malware & Ransomware: [2017, 2018, 2019, 2020, 2021, 2022, 2023]


## 7. Visualize Event Trends Over Time

In [28]:
# Create comprehensive visualization with Plotly
fig = go.Figure()

# Add Advisens events trace (primary axis)
fig.add_trace(go.Scatter(
    x=df_advisens_clean['Year'],
    y=df_advisens_clean['Events'],
    mode='lines+markers',
    name="Advisen's Events",
    line=dict(color='#1f77b4', width=3),
    marker=dict(size=8, color='#1f77b4', line=dict(color='white', width=1)),
    yaxis='y1',
    hovertemplate='<b>Year: %{x}</b><br>Events: %{y:,.0f}<extra></extra>'
))

# Add Statista events trace (secondary axis)
fig.add_trace(go.Scatter(
    x=df_statista_clean['Year'],
    y=df_statista_clean['Statista_Events'],
    mode='lines+markers',
    name="Statista Cyber Events",
    line=dict(color='#2ca02c', width=3),
    marker=dict(size=8, color='#2ca02c', line=dict(color='white', width=1)),
    yaxis='y2',
    hovertemplate='<b>Year: %{x}</b><br>Events: %{y:.2f}M<extra></extra>'
))

# Create secondary y-axes
fig.update_layout(
    title={
        'text': '<b>Cybersecurity Events Trends (2004-2023)</b>',
        'x': 0.5,
        'xanchor': 'center',
        'font': {'size': 20, 'color': '#222222'}
    },
    xaxis=dict(
        title=dict(text='<b>Year</b>', font=dict(size=14)),
        showgrid=True,
        gridwidth=1,
        gridcolor='#e8e8e8',
        dtick=1,
        tickangle=-45,
        range=[2003.5, 2023.5]
    ),
    yaxis=dict(
        title=dict(text='<b>Advisen\'s Events</b>', font=dict(size=12, color='#1f77b4')),
        tickfont=dict(color='#1f77b4')
    ),
    yaxis2=dict(
        title=dict(text='<b>Statista Events (Millions)</b>', font=dict(size=12, color='#2ca02c')),
        tickfont=dict(color='#2ca02c'),
        overlaying='y',
        side='right'
    ),
    plot_bgcolor='#ffffff',
    paper_bgcolor='#ffffff',
    hovermode='x unified',
    width=1100,
    height=700,
    margin=dict(l=100, r=100, t=150, b=150)
)

fig.update_xaxes(showline=True, linewidth=1, linecolor='#cccccc')
fig.update_yaxes(showline=True, linewidth=1, linecolor='#cccccc')

# Create outputs directory if it doesn't exist
output_dir = Path('./outputs')
output_dir.mkdir(parents=True, exist_ok=True)

# Save HTML and PNG versions
fig.write_html(output_dir / 'cyber_events_trends.html')
fig.write_image(output_dir / 'cyber_events_trends.png', width=1100, height=700)
print("‚úì Chart saved: ./outputs/cyber_events_trends.html")
print("‚úì Chart saved: ./outputs/cyber_events_trends.png")

fig.show()

‚úì Chart saved: ./outputs/cyber_events_trends.html
‚úì Chart saved: ./outputs/cyber_events_trends.png


Between 2016 and 2018, the trends are not at all the same, which highlights the fact that it is difficult to find consistent data on this subject.

In [29]:
# Visualize Malware Attacks Trends
fig_malware = go.Figure()

# Add Malware attacks trace
fig_malware.add_trace(go.Scatter(
    x=df_malware_clean['Year'],
    y=df_malware_clean['Malware_Attacks'],
    mode='lines+markers',
    name='Malware Attacks',
    line=dict(color='#ff7f0e', width=3),
    marker=dict(size=8, color='#ff7f0e', line=dict(color='white', width=1)),
    yaxis='y1',
    hovertemplate='<b>Year: %{x}</b><br>Attacks: %{y:.2f}B<extra></extra>'
))

# Update layout
fig_malware.update_layout(
    title={
        'text': '<b>Malware Attacks Trends (2015-2024)</b>',
        'x': 0.5,
        'xanchor': 'center',
        'font': {'size': 20, 'color': '#222222'}
    },
    xaxis=dict(
        title=dict(text='<b>Year</b>', font=dict(size=14)),
        showgrid=True,
        gridwidth=1,
        gridcolor='#e8e8e8',
        dtick=1,
        tickangle=-45,
        range=[2014.5, 2024.5]
    ),
    yaxis=dict(
        title=dict(text='<b>Malware Attacks (Billions)</b>', font=dict(size=12, color='#ff7f0e')),
        tickfont=dict(color='#ff7f0e')
    ),
    plot_bgcolor='#ffffff',
    paper_bgcolor='#ffffff',
    hovermode='x unified',
    width=1100,
    height=700,
    margin=dict(l=100, r=100, t=150, b=150)
)

fig_malware.update_xaxes(showline=True, linewidth=1, linecolor='#cccccc')
fig_malware.update_yaxes(showline=True, linewidth=1, linecolor='#cccccc')

# Create outputs directory if it doesn't exist
output_dir = Path('./outputs')
output_dir.mkdir(parents=True, exist_ok=True)

# Save HTML and PNG versions
fig_malware.write_html(output_dir / 'cyber_events_trends_malware.html')
fig_malware.write_image(output_dir / 'cyber_events_trends_malware.png', width=1100, height=700)
print("‚úì Chart saved: ./outputs/cyber_events_trends_malware.html")
print("‚úì Chart saved: ./outputs/cyber_events_trends_malware.png")

fig_malware.show()


‚úì Chart saved: ./outputs/cyber_events_trends_malware.html
‚úì Chart saved: ./outputs/cyber_events_trends_malware.png


In [30]:
# Visualize Ransomware Attacks Trends
fig_ransomware = go.Figure()

# Add Ransomware attacks trace
fig_ransomware.add_trace(go.Scatter(
    x=df_ransomware_clean['Year'],
    y=df_ransomware_clean['Ransomware_Attacks'],
    mode='lines+markers',
    name='Ransomware Attacks',
    line=dict(color='#d62728', width=3),
    marker=dict(size=8, color='#d62728', line=dict(color='white', width=1)),
    yaxis='y1',
    hovertemplate='<b>Year: %{x}</b><br>Attacks: %{y:.2f}<extra></extra>'
))

# Update layout
fig_ransomware.update_layout(
    title={
        'text': '<b>Ransomware Attacks Trends (2017-2023)</b>',
        'x': 0.5,
        'xanchor': 'center',
        'font': {'size': 20, 'color': '#222222'}
    },
    xaxis=dict(
        title=dict(text='<b>Year</b>', font=dict(size=14)),
        showgrid=True,
        gridwidth=1,
        gridcolor='#e8e8e8',
        dtick=1,
        tickangle=-45,
        range=[2016.5, 2023.5]
    ),
    yaxis=dict(
        title=dict(text='<b>Ransomware Attacks (billions)</b>', font=dict(size=12, color='#d62728')),
        tickfont=dict(color='#d62728')
    ),
    plot_bgcolor='#ffffff',
    paper_bgcolor='#ffffff',
    hovermode='x unified',
    width=1100,
    height=700,
    margin=dict(l=100, r=100, t=150, b=150)
)

fig_ransomware.update_xaxes(showline=True, linewidth=1, linecolor='#cccccc')
fig_ransomware.update_yaxes(showline=True, linewidth=1, linecolor='#cccccc')

# Create outputs directory if it doesn't exist
output_dir = Path('./outputs')
output_dir.mkdir(parents=True, exist_ok=True)

# Save HTML and PNG versions
fig_ransomware.write_html(output_dir / 'cyber_events_trends_ransomware.html')
fig_ransomware.write_image(output_dir / 'cyber_events_trends_ransomware.png', width=1100, height=700)
print("‚úì Chart saved: ./outputs/cyber_events_trends_ransomware.html")
print("‚úì Chart saved: ./outputs/cyber_events_trends_ransomware.png")

fig_ransomware.show()


‚úì Chart saved: ./outputs/cyber_events_trends_ransomware.html
‚úì Chart saved: ./outputs/cyber_events_trends_ransomware.png


In [31]:
# Load and visualize Statista Cyber Event data
statista_path = Path('../../data/cyber_events/statista_cyber_event.csv')

df_statista = pd.read_csv(statista_path)

print("\n" + "="*70)
print("STATISTA CYBER INCIDENTS DATA")
print("="*70)
print(f"\n‚úì File loaded: {statista_path}")
print(f"\nDataFrame structure:")
print(f"  Shape: {df_statista.shape}")
print(f"  Columns: {df_statista.columns.tolist()}")
print(f"  Data types:\n{df_statista.dtypes}")

print(f"\nData:")
print(df_statista)

print(f"\nYear range: {df_statista['Date'].min():.0f} - {df_statista['Date'].max():.0f}")
print(f"Total events (millions): {df_statista['nb_cyber_event_million'].sum():.2f}")
print(f"Average per year: {df_statista['nb_cyber_event_million'].mean():.2f} million")

# Visualize Statista Cyber Events Trends
fig_statista = go.Figure()

# Add Statista events trace
fig_statista.add_trace(go.Scatter(
    x=df_statista['Date'],
    y=df_statista['nb_cyber_event_million'],
    mode='lines+markers',
    name='Statista Cyber Events',
    line=dict(color='#2ca02c', width=3),
    marker=dict(size=8, color='#2ca02c', line=dict(color='white', width=1)),
    yaxis='y1',
    hovertemplate='<b>Year: %{x:.0f}</b><br>Events: %{y:.2f}M<extra></extra>'
))

# Update layout
fig_statista.update_layout(
    title={
        'text': '<b>Statista Cyber Events Trends (2016-2024)</b>',
        'x': 0.5,
        'xanchor': 'center',
        'font': {'size': 20, 'color': '#222222'}
    },
    xaxis=dict(
        title=dict(text='<b>Year</b>', font=dict(size=14)),
        showgrid=True,
        gridwidth=1,
        gridcolor='#e8e8e8',
        dtick=1,
        tickangle=-45,
        range=[2015.5, 2024.5]
    ),
    yaxis=dict(
        title=dict(text='<b>Cyber Events (Millions)</b>', font=dict(size=12, color='#2ca02c')),
        tickfont=dict(color='#2ca02c')
    ),
    plot_bgcolor='#ffffff',
    paper_bgcolor='#ffffff',
    hovermode='x unified',
    width=1100,
    height=700,
    margin=dict(l=100, r=100, t=150, b=150)
)

fig_statista.update_xaxes(showline=True, linewidth=1, linecolor='#cccccc')
fig_statista.update_yaxes(showline=True, linewidth=1, linecolor='#cccccc')

# Create outputs directory if it doesn't exist
output_dir = Path('./outputs')
output_dir.mkdir(parents=True, exist_ok=True)

# Save HTML and PNG versions
fig_statista.write_html(output_dir / 'cyber_events_trends_statista.html')
fig_statista.write_image(output_dir / 'cyber_events_trends_statista.png', width=1100, height=700)
print("‚úì Chart saved: ./outputs/cyber_events_trends_statista.html")
print("‚úì Chart saved: ./outputs/cyber_events_trends_statista.png")

fig_statista.show()


STATISTA CYBER INCIDENTS DATA

‚úì File loaded: ..\..\data\cyber_events\statista_cyber_event.csv

DataFrame structure:
  Shape: (9, 2)
  Columns: ['Date', 'nb_cyber_event_million']
  Data types:
Date                        int64
nb_cyber_event_million    float64
dtype: object

Data:
   Date  nb_cyber_event_million
0  2016                    4.32
1  2017                    5.75
2  2018                    7.06
3  2019                   10.49
4  2020                   18.07
5  2021                   19.23
6  2022                   16.80
7  2023                   16.73
8  2024                   15.16

Year range: 2016 - 2024
Total events (millions): 113.61
Average per year: 12.62 million
‚úì Chart saved: ./outputs/cyber_events_trends_statista.html
‚úì Chart saved: ./outputs/cyber_events_trends_statista.png


In [32]:
# Create comprehensive multi-dataset comparison visualization
print("\n" + "="*70)
print("MULTI-DATASET COMPARISON VISUALIZATION")
print("="*70)

fig_comparison = make_subplots(
    rows=2, cols=2,
    subplot_titles=("Advisens Events (2004-2023)", 
                    "Malware Attacks (2015-2024)",
                    "Ransomware Attacks (2017-2023)",
                    "Statista Cyber Events (2016-2024)"),
    specs=[[{}, {}], [{}, {}]],
    vertical_spacing=0.15,
    horizontal_spacing=0.12
)

# 1. Advisens Events
fig_comparison.add_trace(
    go.Scatter(
        x=df_advisens_clean['Year'],
        y=df_advisens_clean['Events'],
        mode='lines+markers',
        name="Advisens",
        line=dict(color='#1f77b4', width=2.5),
        marker=dict(size=6),
        hovertemplate='<b>%{x}</b><br>Events: %{y:,.0f}<extra></extra>'
    ),
    row=1, col=1
)

# 2. Malware Attacks
fig_comparison.add_trace(
    go.Scatter(
        x=df_malware_clean['Year'],
        y=df_malware_clean['Malware_Attacks'],
        mode='lines+markers',
        name="Malware",
        line=dict(color='#ff7f0e', width=2.5),
        marker=dict(size=6),
        hovertemplate='<b>%{x}</b><br>Attacks: %{y:.2f}B<extra></extra>'
    ),
    row=1, col=2
)

# 3. Ransomware Attacks
fig_comparison.add_trace(
    go.Scatter(
        x=df_ransomware_clean['Year'],
        y=df_ransomware_clean['Ransomware_Attacks'],
        mode='lines+markers',
        name="Ransomware",
        line=dict(color='#d62728', width=2.5),
        marker=dict(size=6),
        hovertemplate='<b>%{x}</b><br>Attacks: %{y:.2f}B<extra></extra>'
    ),
    row=2, col=1
)

# 4. Statista Cyber Events
fig_comparison.add_trace(
    go.Scatter(
        x=df_statista['Date'],
        y=df_statista['nb_cyber_event_million'],
        mode='lines+markers',
        name="Statista",
        line=dict(color='#2ca02c', width=2.5),
        marker=dict(size=6),
        hovertemplate='<b>%{x:.0f}</b><br>Events: %{y:.2f}M<extra></extra>'
    ),
    row=2, col=2
)

# Update axes labels
fig_comparison.update_xaxes(title_text="Year", row=1, col=1)
fig_comparison.update_xaxes(title_text="Year", row=1, col=2)
fig_comparison.update_xaxes(title_text="Year", row=2, col=1)
fig_comparison.update_xaxes(title_text="Year", row=2, col=2)

fig_comparison.update_yaxes(title_text="Events", row=1, col=1)
fig_comparison.update_yaxes(title_text="Attacks (Billions)", row=1, col=2)
fig_comparison.update_yaxes(title_text="Attacks (Billions)", row=2, col=1)
fig_comparison.update_yaxes(title_text="Events (Millions)", row=2, col=2)

# Update layout
fig_comparison.update_layout(
    title_text="<b>Cyber Events Multi-Dataset Comparison</b>",
    height=1000,
    width=1400,
    showlegend=False,
    plot_bgcolor='#ffffff',
    paper_bgcolor='#ffffff',
    margin=dict(l=80, r=80, t=120, b=80)
)

# Save comparison chart
fig_comparison.write_html(output_dir / 'cyber_events_comparison_all_datasets.html')
fig_comparison.write_image(output_dir / 'cyber_events_comparison_all_datasets.png', width=1400, height=1000)
print("‚úì Chart saved: ./outputs/cyber_events_comparison_all_datasets.html")
print("‚úì Chart saved: ./outputs/cyber_events_comparison_all_datasets.png")

fig_comparison.show()


MULTI-DATASET COMPARISON VISUALIZATION
‚úì Chart saved: ./outputs/cyber_events_comparison_all_datasets.html
‚úì Chart saved: ./outputs/cyber_events_comparison_all_datasets.png
