# Visualizing Cybersecurity Incidents
**Goal**: transform numbers into impactful visuals.

**Uses**:
* Plotly's Dash (for creating local dashboards)
* Hugging Face Hub (for data)

**More about Dash**:
* [Dash App Examples](https://plotly.com/examples/)
* [User Guides](https://dash.plotly.com/minimal-app)
* [More about Jupyter Support for Dash](https://github.com/plotly/jupyter-dash?tab=readme-ov-file)
* [Dash Bootstrap Themes](https://hellodash.pythonanywhere.com/adding-themes/color-modes)

Note: as of Dash v2.11, Jupyter support is built into the main Dash package.

**The data set**: https://huggingface.co/datasets/vinitvek/cybersecurityattacks

## Environment Setup

In [1]:
# Installations
%pip install --quiet pandas dash "plotly[express]" ipywidgets nbformat dash-bootstrap-components fsspec huggingface_hub dash-bootstrap-templates

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Libraries
# dcc (Dash Core Components) provides dcc.Graph, which renders interactive graphs.
from dash import Dash, html, dcc, callback, Output, Input, dash_table
import plotly.express as px  # plotly.express builds the interactive graphs
import pandas as pd
import numpy as np
import dash_bootstrap_components as dbc
import plotly.io as pio
from dash_bootstrap_templates import load_figure_template
from itertools import cycle
from plotly.express.colors import qualitative

In [3]:
# Download data set from HuggingFace using Pandas
df = pd.read_csv("hf://datasets/vinitvek/cybersecurityattacks/collab dataset.csv")

## Brief Data Exploration, Understanding

In [4]:
df.head(n=5)

Unnamed: 0,slug,event_date,event_year,affected_country,affected_organization,affected_industry,afftected_industry_code,event_type,event_subtype,motive,description,actor,actor_type,actor_country,source_url
0,babb843cbce5db9e,2023-12-31 00:00:00,2023,United Kingdom of Great Britain and Northern I...,Radioactive Waste Management,Administrative and Support and Waste Managemen...,56,Undetermined,Undetermined,Undetermined,Threat actors try to break into Radioactive Wa...,Undetermined,Criminal,Undetermined,https://www.theguardian.com/business/2023/dec/...
1,581e011d5c37c281,2023-12-31 00:00:00,2023,Belarus,BelTA,Information,51,Disruptive,Undetermined,Protest,Belarusian hacktivists from the Belarusian Cyb...,Belarusian Cyber-Partisans,Hacktivist,Belarus,https://www.bankinfosecurity.com/hacktivists-s...
2,fa79c150aac3cf77,2023-12-30 00:00:00,2023,United States of America,Xerox Business Solutions,Administrative and Support and Waste Managemen...,56,Mixed,Exploitation of Application Server,Financial,The U.S. division of Xerox Business Solutions ...,INC Ransom,Criminal,Undetermined,https://www.bleepingcomputer.com/news/security...
3,4d12747a4dd52156,2023-12-30 00:00:00,2023,Iran (Islamic Republic of),SnappFood,Accommodation and Food Services,72,Mixed,Exploitation of Application Server,Financial,Irleaks claims to have broken into the systems...,Irleaks,Criminal,Undetermined,https://www.darkreading.com/cyberattacks-data-...
4,1079752e8fe90b4d,2023-12-29 00:00:00,2023,Canada,Memorial University of Newfoundland,Educational Services,61,Disruptive,Undetermined,Financial,Memorial University of Newfoundland (MUN) is h...,Undetermined,Criminal,Undetermined,https://www.bleepingcomputer.com/news/security...


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13407 entries, 0 to 13406
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   slug                     13407 non-null  object
 1   event_date               13407 non-null  object
 2   event_year               13407 non-null  int64 
 3   affected_country         13407 non-null  object
 4   affected_organization    13407 non-null  object
 5   affected_industry        13407 non-null  object
 6   afftected_industry_code  13407 non-null  int64 
 7   event_type               13407 non-null  object
 8   event_subtype            13407 non-null  object
 9   motive                   13407 non-null  object
 10  description              13407 non-null  object
 11  actor                    13407 non-null  object
 12  actor_type               13407 non-null  object
 13  actor_country            13407 non-null  object
 14  source_url               13407 non-null  object

In [6]:
df.describe()

Unnamed: 0,event_year,afftected_industry_code
count,13407.0,13407.0
mean,2019.703886,63.197434
std,2.803879,18.867469
min,2014.0,11.0
25%,2017.0,51.0
50%,2020.0,61.0
75%,2022.0,81.0
max,2023.0,99.0


In [7]:
df.nunique(axis=0)

slug                       13407
event_date                  3130
event_year                    10
affected_country             163
affected_organization      12252
affected_industry             22
afftected_industry_code       42
event_type                     4
event_subtype                 86
motive                        10
description                11693
actor                       1135
actor_type                     6
actor_country                 82
source_url                 10768
dtype: int64

## Some Data Cleaning

In [8]:
# Check for duplicate rows (the empty result below shows there are none to remove)
duplicate_rows_mask = df.duplicated()
df[duplicate_rows_mask]

Unnamed: 0,slug,event_date,event_year,affected_country,affected_organization,affected_industry,afftected_industry_code,event_type,event_subtype,motive,description,actor,actor_type,actor_country,source_url
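
Because the mask above selects no rows, nothing needs dropping here. If duplicates did turn up, a minimal sketch of removing them might look like the following. The toy frame and its values are invented for illustration and are not part of the data set.

```python
import pandas as pd

# Toy frame (hypothetical values) with one exact duplicate row
toy = pd.DataFrame({
    "slug": ["a1", "b2", "b2"],
    "event_year": [2022, 2023, 2023],
})

# duplicated() flags every repeat of an earlier row (the first copy stays False)
dup_mask = toy.duplicated()

# drop_duplicates() keeps the first occurrence of each row by default
deduped = toy.drop_duplicates().reset_index(drop=True)

print(dup_mask.tolist())
print(len(toy), "->", len(deduped))
```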


In [9]:
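# Geographic correction: rows recorded under 'Antarctica' are reassigned to 'Netherlands'.
# actor_country is updated first, because the filter depends on the original affected_country.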
df.loc[df['affected_country'] == 'Antarctica', 'actor_country'] = 'Netherlands'
df.loc[df['affected_country'] == 'Antarctica', 'affected_country'] = 'Netherlands'
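
The two assignments above depend on their order: the second line rewrites `affected_country`, so the filter has to be evaluated for `actor_country` first. One way to sidestep that, sketched here on hypothetical toy rows, is to compute the mask once and reuse it.

```python
import pandas as pd

# Toy rows (hypothetical values) mirroring the two columns being corrected
toy = pd.DataFrame({
    "affected_country": ["Antarctica", "Canada"],
    "actor_country": ["Undetermined", "Undetermined"],
})

# Evaluate the condition once, before any column is modified
is_antarctica = toy["affected_country"] == "Antarctica"

# Assign both columns in a single .loc call using the saved mask
toy.loc[is_antarctica, ["affected_country", "actor_country"]] = "Netherlands"

print(toy)
```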

In [10]:
# Replace all non-ASCII characters with a space
df['description'] = df['description'].str.replace(r'[^\x00-\x7F]+', ' ', regex=True)

# Remove leftover runs of two or more single-quote characters (optionally separated by whitespace)
df['description'] = df['description'].str.replace(r"(?:'\s*){2,}", '', regex=True)
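
To see what the two patterns do, here is a small sketch that applies them to a single made-up string with the standard `re` module instead of the pandas `.str.replace` used above. The sample text is hypothetical.

```python
import re

# Hypothetical description containing curly quotes and stray quote marks
raw = "Group “X” leaked data '' '' from the café"

# Step 1: each run of non-ASCII characters becomes a single space
step1 = re.sub(r"[^\x00-\x7F]+", " ", raw)

# Step 2: runs of two or more single quotes (optionally space-separated) are removed
step2 = re.sub(r"(?:'\s*){2,}", "", step1)

print(repr(step2))
```

Note that step 1 can leave double spaces behind where a non-ASCII character sat next to an existing space; the notebook does not collapse those.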

## Dash App
(Local Dashboard Creation)

### Dashboard Prep.

In [11]:
# Use Plotly's built-in dark template as the default for all figures
pio.templates.default = "plotly_dark"

# create colour mapping for each country in data set
unique_countries = list(set(df['affected_country'].unique().tolist() + df['actor_country'].unique().tolist()))
palette = cycle(qualitative.Alphabet)  # use a large qualitative palette
color_map = {country: color for country, color in zip(unique_countries, palette)}

# Remove time, keep only date
df['event_date'] = pd.to_datetime(df['event_date'])
df['date'] = df['event_date'].dt.date

# Sort by date (ascending)
df = df.sort_values(by='date')

# Reset index after sorting
df = df.reset_index(drop=True)

# Get all available years and countries for the dropdowns
years = sorted(df['event_year'].unique().tolist())
available_countries = sorted(unique_countries)
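
The colour mapping above relies on `zip` pairing each country with the next colour from a cycling palette. Since the palette is shorter than the list of countries, colours repeat. A stdlib-only sketch of that behaviour, using a hypothetical three-colour palette and five names:

```python
from itertools import cycle

# Hypothetical three-colour palette and five categories
palette = ["#AA0DFE", "#3283FE", "#85660D"]
categories = ["Belarus", "Canada", "Iran", "Netherlands", "Undetermined"]

# zip() stops when `categories` is exhausted; cycle() restarts the palette as needed
color_map = dict(zip(categories, cycle(palette)))

for name, colour in color_map.items():
    print(name, colour)
```

Here the fourth category wraps around and shares a colour with the first, which is the same effect neighbouring countries can show on the map.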

In [12]:
# create map data
ac = df['actor_country'].value_counts().reset_index()
afc = df['affected_country'].value_counts().reset_index()

# Step 1: Find affected countries that never appear as an actor country
missing_items = list(set(afc.affected_country.tolist()) - set(df['actor_country']))

# Step 2: Create rows with a count of 0 for those countries
new_rows = pd.DataFrame({'actor_country': missing_items, 'count': 0})

# Step 3: Append them to the actor-country counts
ac = pd.concat([ac, new_rows], ignore_index=True)

# sort alphabetically
afc = afc.sort_values(by='affected_country')
ac = ac.sort_values(by='actor_country')

# rename cols
afc = afc.rename(columns={'affected_country': 'country', 'count': 'number_incidences'})
ac = ac.rename(columns={'actor_country': 'country', 'count': 'number_attacks_launched'})

# merge both dfs
map_data = pd.merge(ac, afc, on='country')

# make choropleth map
choro_graph = px.choropleth(
    map_data,
    locations='country',     # Column with country names (matches locationmode below)
    locationmode='country names',
    color='country',
    projection='equirectangular',     # Map projection style
    color_discrete_map=color_map,
    hover_data={'number_incidences': True, 'number_attacks_launched':True}
)

choro_graph = choro_graph.update_layout(showlegend=False,  margin=dict(l=0, r=0, t=0, b=0), )
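
The padding-then-merge steps above keep every affected country and give a count of 0 to those that never appear as an actor. A left merge followed by `fillna(0)` is one alternative that should give the same rows. This is a sketch on hypothetical toy data, not the notebook's method; `rename_axis(...).reset_index(name=...)` is used so the column names do not depend on the pandas version.

```python
import pandas as pd

# Toy events (hypothetical values)
events = pd.DataFrame({
    "affected_country": ["Canada", "Canada", "Belarus", "Iran"],
    "actor_country": ["Belarus", "Undetermined", "Belarus", "Undetermined"],
})

# Count incidents per affected country and attacks per actor country
incidents = (events["affected_country"].value_counts()
             .rename_axis("country").reset_index(name="number_incidences"))
launched = (events["actor_country"].value_counts()
            .rename_axis("country").reset_index(name="number_attacks_launched"))

# Left merge keeps every affected country; unmatched actor counts become 0
map_counts = incidents.merge(launched, on="country", how="left")
map_counts["number_attacks_launched"] = (
    map_counts["number_attacks_launched"].fillna(0).astype(int)
)
map_counts = map_counts.sort_values("country").reset_index(drop=True)

print(map_counts)
```

As in the notebook, actor-only labels such as "Undetermined" drop out because they are not affected countries.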


### Dashboard creation

In [None]:
app = Dash(__name__, external_stylesheets=[dbc.themes.CYBORG])

app.layout = html.Div([
    html.Div([

        ### Dashboard title ###
        html.Div([
            html.H1("Cybersecurity Events from Around the World", style={
                "color": "#FFFFFF",
                "fontSize": "54px",
                "fontFamily": "Arial",
                "textAlign": "center",
            }),
            html.P(
                dcc.Markdown("Data collected from 2014-2023 | [[source]](https://huggingface.co/datasets/vinitvek/cybersecurityattacks)"),
                style={"fontSize": "24px", 
                       "color": "#E1E1E1", 
                       "marginTop": "-10px", 
                       "fontFamily": "Calibri", 
                       "textAlign": "center",}),
        ]),

        html.Div([
                html.Div([
                    # Country dropdown
                    html.Div([
                        html.Label("Select a country:", style={'paddingRight': '8px'}),
                        dcc.Dropdown(
                            id='country-dropdown',
                            options=[{'label': country, 'value': country} for country in available_countries],
                            value=None,
                            style={'width': '256px',}
                        ), 
                    ], style={'paddingRight': '16px', 'zIndex': 9999}),
                    # Year dropdown
                    html.Div([
                        html.Label("Select a year:", style={'paddingRight': '8px'}),
                        dcc.Dropdown(
                            id='year-dropdown',
                            options=[{'label': year, 'value': year} for year in years],
                            value=None,
                            style={'width': '128px', }
                        ),
                    ], style={'zIndex':9999}),
                ], style={'display': 'flex', 'justifyContent': 'center', 'marginBottom': '8px'}),
        ]),

        html.Div([
            html.P("World Map: Countries which experienced cybersecurity events (coloured)",
                style={
                    'color': 'white',
                    'fontSize': '12px',
                    'fontFamily': 'monospace',
                    'fontWeight':'bold',
                    'width':'1000px',
                    'position':'absolute',
                    'padding':'5px',
                    'backgroundColor':'black',
                    'border': '0.5px solid black',
                    'right':'45%',
                    'zIndex': 9}
            ),

            # Map Graph
            html.Div([
                dcc.Graph(id='world-map',
                        figure=choro_graph)
            ], style={'position': 'relative', 'margin': '4px', 'paddingTop': '30px', 'height': '500px', 'width': '1016px'}),

            # Data Table
            html.Div([
                dash_table.DataTable(
                    id='country-table',
                    columns=[{"name": col, "id": col} for col in ['affected_country', 'date', 'affected_organization', 'description']],
                    fixed_rows={'headers': True},
                    data=df.to_dict('records'),
                    page_action='none',
                    style_table={
                        'overflowY': 'auto',   # Vertical scroll
                        'overflowX':'auto'
                    },
                    style_data={
                        'backgroundColor': "#1A1A1A",  # 👈 row background
                        'color': 'white'              # text color
                    },
                    style_header={
                        'zIndex':'9999',
                        'backgroundColor': "#000000",    # 👈 header row bg
                        'color': 'white',
                        'fontWeight': 'bold'
                    },
                    style_cell={
                        'fontSize': '12px',
                        'minWidth':'196px',
                        'maxWidth':'196px',
                        'textAlign': 'left',
                        'whiteSpace': 'normal',  # Wrap text if needed
                        'border': '1px solid #000000'
                    },
                )
            ], style={'position':'relative', 'margin':'4px', }),

        ], style={'display': 'flex', 'justifyContent': 'center', 'alignItems': 'stretch', 'marginRight': '24px', 'marginLeft': '24px'}),

        ### Start of row 2 ###

        html.Div([
            html.Div([
                html.Div([
                    # Bar graph of events and subevents
                    html.Div([
                        dcc.Graph(id='events-bar-graph')
                        ], style={'position':'relative', 'margin':'4px', 'height':'600px', 'width':'1016px'}), 

                    html.Div([
                        #top graph
                        html.Div([
                            dcc.Graph(id='actors-responsible-bar-graph')
                        ], style={'position':'relative', 'margin':'4px', 'height':'300px', 'width':'784px'}), 
                        #bottom graph
                        html.Div([
                            dcc.Graph(id='motives-bar-graph')
                        ], style={'position':'relative', 'margin':'4px', 'height':'300px', 'width':'784px'}), 
                    ], style={'display': 'flex', 'flexDirection': 'column'})
                ], style={'display': 'flex', 'justifyContent':'center',}),

            ]),

        ], style={'display': 'flex', 'justifyContent': 'center', 'alignItems': 'stretch', 'marginRight': '24px', 'marginLeft': '24px'}),

    ], style={'padding':'24px'})
])

# Callback to update the table and the three bar graphs from the country and year dropdowns
@app.callback(
    Output('country-table', 'data'),
    Output('events-bar-graph', 'figure'),
    Output('actors-responsible-bar-graph', 'figure'),
    Output('motives-bar-graph', 'figure'),
    Input('country-dropdown', 'value'),
    Input('year-dropdown', 'value')
)
def update_elements(country, year):
    
    if (country is None) and (year is None):
        table_data = df.to_dict('records')
        fig_data = df
        fig_title = "Global Cyberattack Details 2014-2023"
    elif (country is None):
        table_data =  df[df.event_year == year].to_dict('records')
        fig_data =  df[df.event_year == year]
        fig_title = "Global Cyberattack Details, " + str(year)
    elif (year is None):
        table_data = df[df.affected_country == country].to_dict('records')
        fig_data = df[df.affected_country == country]
        fig_title = "Global Cyberattack Details, " + country
    else:
        table_data = df[(df['affected_country'] == country) & (df['event_year'] == year)].to_dict('records')
        fig_data = df[(df['affected_country'] == country) & (df['event_year'] == year)]
        fig_title = "Global Cyberattack Details for " + country + ", " + str(year)
    
    actors_data = fig_data['actor_country'].value_counts().reset_index()
    motives_data = fig_data['motive'].value_counts().reset_index()

    fig = px.bar(fig_data, x="event_type", color='event_subtype', title=fig_title, hover_data={"actor_country": True, "motive": True}).update_layout(legend_title_text='Cyberattack Sub-Type', xaxis_title='Cyberattack Type')
    fig2 = px.bar(actors_data, x="actor_country", y="count", title="Cyberattack Origins").update_layout(xaxis_title='Country', yaxis_title='# of Attacks')
    fig3 = px.bar(motives_data, x="motive", y="count", title="Cyberattack Motives").update_layout(xaxis_title='Attack Motive', yaxis_title='Count')

    return table_data, fig, fig2, fig3

# run app
app.run(jupyter_mode="tab", debug=True)

Dash app running on http://127.0.0.1:8050/
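
The four branches in `update_elements` differ only in which filters apply, so the selection can also be built up as a single boolean mask. The helper below, `filter_events`, is a hypothetical name and a sketch on toy data, not part of the notebook; it assumes the same `affected_country` and `event_year` columns.

```python
import pandas as pd

def filter_events(events, country=None, year=None):
    """Return the rows matching the optional country and year selections."""
    # Start with every row selected, then narrow for each filter that is set
    mask = pd.Series(True, index=events.index)
    if country is not None:
        mask &= events["affected_country"] == country
    if year is not None:
        mask &= events["event_year"] == year
    return events[mask]

# Toy events (hypothetical values)
toy = pd.DataFrame({
    "affected_country": ["Canada", "Canada", "Belarus"],
    "event_year": [2022, 2023, 2023],
})

print(len(filter_events(toy)))
print(len(filter_events(toy, country="Canada")))
print(len(filter_events(toy, country="Canada", year=2023)))
```

With a helper like this, the callback would filter once and derive both the table records and the figure data from the same result.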

