# Visualizing Cybersecurity Incidents
### Goal: transform numbers into impactful visuals.
### Uses:
* Plotly's Dash (for creating local dashboards)
* Hugging Face Hub (for data)

### More about Dash:
* [Dash App Examples](https://plotly.com/examples/)
* [User Guides](https://dash.plotly.com/minimal-app)
* [More about Jupyter Support for Dash](https://github.com/plotly/jupyter-dash?tab=readme-ov-file)
* [Dash Bootstrap Themes](https://hellodash.pythonanywhere.com/adding-themes/color-modes)

Note: as of Dash v2.11, Jupyter support is built into the main Dash package.

### Data set
https://huggingface.co/datasets/vinitvek/cybersecurityattacks


## Environment Setup

In [2]:
# Installations
%pip install --quiet pandas dash "plotly[express]" ipywidgets nbformat dash-bootstrap-components fsspec huggingface_hub dash-bootstrap-templates

Note: you may need to restart the kernel to use updated packages.


In [3]:
# Libraries
# dcc (Dash Core Components) provides dcc.Graph, which renders interactive graphs.
from dash import Dash, html, dcc, callback, Output, Input, dash_table
# plotly.express builds the interactive figures.
import plotly.express as px
import pandas as pd
import numpy as np
import dash_bootstrap_components as dbc
import plotly.io as pio
from dash_bootstrap_templates import load_figure_template
from itertools import cycle
from plotly.express.colors import qualitative

In [4]:
# Download data set from HuggingFace using Pandas
df = pd.read_csv("hf://datasets/vinitvek/cybersecurityattacks/collab dataset.csv")

## Brief Data Exploration, Understanding

In [5]:
df.head(n=5)

Unnamed: 0,slug,event_date,event_year,affected_country,affected_organization,affected_industry,afftected_industry_code,event_type,event_subtype,motive,description,actor,actor_type,actor_country,source_url
0,babb843cbce5db9e,2023-12-31 00:00:00,2023,United Kingdom of Great Britain and Northern I...,Radioactive Waste Management,Administrative and Support and Waste Managemen...,56,Undetermined,Undetermined,Undetermined,Threat actors try to break into Radioactive Wa...,Undetermined,Criminal,Undetermined,https://www.theguardian.com/business/2023/dec/...
1,581e011d5c37c281,2023-12-31 00:00:00,2023,Belarus,BelTA,Information,51,Disruptive,Undetermined,Protest,Belarusian hacktivists from the Belarusian Cyb...,Belarusian Cyber-Partisans,Hacktivist,Belarus,https://www.bankinfosecurity.com/hacktivists-s...
2,fa79c150aac3cf77,2023-12-30 00:00:00,2023,United States of America,Xerox Business Solutions,Administrative and Support and Waste Managemen...,56,Mixed,Exploitation of Application Server,Financial,The U.S. division of Xerox Business Solutions ...,INC Ransom,Criminal,Undetermined,https://www.bleepingcomputer.com/news/security...
3,4d12747a4dd52156,2023-12-30 00:00:00,2023,Iran (Islamic Republic of),SnappFood,Accommodation and Food Services,72,Mixed,Exploitation of Application Server,Financial,Irleaks claims to have broken into the systems...,Irleaks,Criminal,Undetermined,https://www.darkreading.com/cyberattacks-data-...
4,1079752e8fe90b4d,2023-12-29 00:00:00,2023,Canada,Memorial University of Newfoundland,Educational Services,61,Disruptive,Undetermined,Financial,Memorial University of Newfoundland (MUN) is h...,Undetermined,Criminal,Undetermined,https://www.bleepingcomputer.com/news/security...


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13407 entries, 0 to 13406
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   slug                     13407 non-null  object
 1   event_date               13407 non-null  object
 2   event_year               13407 non-null  int64 
 3   affected_country         13407 non-null  object
 4   affected_organization    13407 non-null  object
 5   affected_industry        13407 non-null  object
 6   afftected_industry_code  13407 non-null  int64 
 7   event_type               13407 non-null  object
 8   event_subtype            13407 non-null  object
 9   motive                   13407 non-null  object
 10  description              13407 non-null  object
 11  actor                    13407 non-null  object
 12  actor_type               13407 non-null  object
 13  actor_country            13407 non-null  object
 14  source_url               13407 non-null  object

In [6]:
df.describe()

Unnamed: 0,event_year,afftected_industry_code
count,13407.0,13407.0
mean,2019.703886,63.197434
std,2.803879,18.867469
min,2014.0,11.0
25%,2017.0,51.0
50%,2020.0,61.0
75%,2022.0,81.0
max,2023.0,99.0


In [7]:
df.nunique()

slug                       13407
event_date                  3130
event_year                    10
affected_country             163
affected_organization      12252
affected_industry             22
afftected_industry_code       42
event_type                     4
event_subtype                 86
motive                        10
description                11693
actor                       1135
actor_type                     6
actor_country                 82
source_url                 10768
dtype: int64
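
The unique counts above hint at which columns suit filters or color groupings: `event_type` (4 values) and `actor_type` (6 values) are small enough for a legend, while `slug` is unique per row. A rough sketch of picking such columns automatically, on a toy frame with an arbitrary cut-off (both are assumptions, not part of the data set):

```python
import pandas as pd

# Toy stand-in for the incident data (values are illustrative only)
toy = pd.DataFrame({
    "slug": ["a1", "b2", "c3", "d4"],
    "event_type": ["Mixed", "Disruptive", "Mixed", "Mixed"],
    "actor_type": ["Criminal", "Hacktivist", "Criminal", "Criminal"],
})

MAX_CATEGORIES = 3  # hypothetical cut-off for "small enough to use as a filter"

# nunique() returns one count per column; keep the columns under the cut-off
cardinality = toy.nunique()
low_cardinality = cardinality[cardinality <= MAX_CATEGORIES].index.tolist()

print(low_cardinality)
```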

In [9]:
df.drop(columns=['slug', 'affected_organization','description','source_url'])

Unnamed: 0,event_date,event_year,affected_country,affected_industry,afftected_industry_code,event_type,event_subtype,motive,actor,actor_type,actor_country
0,2023-12-31 00:00:00,2023,United Kingdom of Great Britain and Northern I...,Administrative and Support and Waste Managemen...,56,Undetermined,Undetermined,Undetermined,Undetermined,Criminal,Undetermined
1,2023-12-31 00:00:00,2023,Belarus,Information,51,Disruptive,Undetermined,Protest,Belarusian Cyber-Partisans,Hacktivist,Belarus
2,2023-12-30 00:00:00,2023,United States of America,Administrative and Support and Waste Managemen...,56,Mixed,Exploitation of Application Server,Financial,INC Ransom,Criminal,Undetermined
3,2023-12-30 00:00:00,2023,Iran (Islamic Republic of),Accommodation and Food Services,72,Mixed,Exploitation of Application Server,Financial,Irleaks,Criminal,Undetermined
4,2023-12-29 00:00:00,2023,Canada,Educational Services,61,Disruptive,Undetermined,Financial,Undetermined,Criminal,Undetermined
...,...,...,...,...,...,...,...,...,...,...,...
13402,2014-01-02 00:00:00,2014,United States of America,Information,51,Exploitive,Exploitation of Application Server,Undetermined,Undetermined,Criminal,Undetermined
13403,2014-01-01 00:00:00,2014,United States of America,Educational Services,61,Exploitive,Exploitation of End Hosts,Undetermined,Undetermined,Criminal,Undetermined
13404,2014-01-01 00:00:00,2014,United States of America,"Professional, Scientific, and Technical Services",54,Exploitive,Exploitation of Application Server,Undetermined,Undetermined,Criminal,Undetermined
13405,2014-01-01 00:00:00,2014,United States of America,"Professional, Scientific, and Technical Services",54,Disruptive,Message Manipulation,Protest,Syrian Electronic Army,Hacktivist,Syrian Arab Republic
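
Note that `drop` returns a new frame and leaves `df` itself untouched, which is why the dropped columns are still available to the dashboard later on. A small sketch on a made-up two-row frame:

```python
import pandas as pd

# Made-up frame with one column to drop
toy = pd.DataFrame({"slug": ["a1", "b2"], "event_year": [2023, 2014]})

# drop() returns a new DataFrame; the original keeps all its columns
trimmed = toy.drop(columns=["slug"])

print(list(toy.columns))
print(list(trimmed.columns))
```

To make the change stick, the result would have to be assigned back, for example `toy = toy.drop(columns=["slug"])`.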


## Dash App
(Local Dashboard Creation)

In [5]:
# Recode 'Antarctica' entries as 'Netherlands'.
# A single-step .loc assignment (row mask + column name) updates df itself and
# avoids pandas' chained-assignment warning.
df.loc[df['affected_country'] == 'Antarctica', 'affected_country'] = 'Netherlands'

# Define the color palette and map it to countries
unique_countries = df['affected_country'].unique()
palette = cycle(qualitative.Alphabet)  # use a large qualitative palette; cycle() repeats it as needed

color_map = {country: color for country, color in zip(unique_countries, palette)}

# Use Plotly's dark template as the default for all figures
pio.templates.default = "plotly_dark"
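
Assigning through `.loc` in a single step, with the row mask and the column name together, updates the frame itself. A toy sketch (the three rows are made up):

```python
import pandas as pd

toy = pd.DataFrame({"affected_country": ["Canada", "Antarctica", "Belarus"]})

# Boolean mask selecting the rows to recode
mask = toy["affected_country"] == "Antarctica"

# One indexing operation: rows via the mask, column by name
toy.loc[mask, "affected_country"] = "Netherlands"

print(toy["affected_country"].tolist())
```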

In [6]:
duplicate_rows_mask = df.duplicated()
df[duplicate_rows_mask]

Unnamed: 0,slug,event_date,event_year,affected_country,affected_organization,affected_industry,afftected_industry_code,event_type,event_subtype,motive,description,actor,actor_type,actor_country,source_url


In [13]:
# Remove time, keep only date
df['event_date'] = pd.to_datetime(df['event_date'])
df['date'] = df['event_date'].dt.date

# Sort by date (ascending)
df = df.sort_values(by='date')

# Optional: Reset index after sorting
df = df.reset_index(drop=True)

# Filter options: years and alphabetically sorted country names
years = df.event_year.unique().tolist()
available_countries = df.affected_country.unique().tolist()
available_countries.sort()

# Build the choropleth map
choro_graph = px.choropleth(
    df,
    locations='affected_country',     # Column with country names (see locationmode)
    locationmode='country names',
    color='affected_country',
    projection='equirectangular',     # Map projection style
    color_discrete_map=color_map,
)

# Hide the legend and remove the figure margins
choro_graph = choro_graph.update_layout(showlegend=False, margin=dict(l=0, r=0, t=0, b=0))
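
The date handling in this cell can be tried in isolation. The sketch below repeats the same steps (parse, strip the time, sort, reset the index) on three made-up rows:

```python
import pandas as pd

# Three made-up rows, deliberately out of order
toy = pd.DataFrame({
    "event_date": ["2023-12-31 00:00:00", "2014-01-01 00:00:00", "2020-06-15 00:00:00"],
    "event_year": [2023, 2014, 2020],
})

# Parse the strings, then keep only the calendar date
toy["event_date"] = pd.to_datetime(toy["event_date"])
toy["date"] = toy["event_date"].dt.date

# Sort ascending and renumber the rows
toy = toy.sort_values(by="date").reset_index(drop=True)

# unique() keeps first-seen order, so after sorting the years come out ascending
years = toy["event_year"].unique().tolist()
print(years)
```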


In [None]:
app = Dash(__name__, external_stylesheets=[dbc.themes.CYBORG])

app.layout = html.Div([
    html.Div([

        ### Dashboard title ###
        html.Div([
            html.H1("Cybersecurity Failures", style={
                "color": "#FFFFFF",
                "fontSize": "64px",
                "fontFamily": "Arial",
                "textAlign": "center",
            }),
            html.P(
                dcc.Markdown("Data collected from 2014-2023 [[source]](https://huggingface.co/datasets/vinitvek/cybersecurityattacks)"),
                style={"fontSize": "24px", 
                       "color": "#E1E1E1", 
                       "marginTop": "-10px", 
                       "fontFamily": "Calibri", 
                       "textAlign": "center",}),
        ]),

        html.Div([
            html.H2("Click a country on the map to get more specific information on its cybersecurity incidents.", style={
                'height': '40px',
                'backgroundColor': "#52495FB3",
                'border': '1px solid #000000',
                'alignItems': 'center',
                'textAlign': 'center',
                'fontSize': '16px',
                'borderRadius': '8px',  # Optional: slightly rounded corners
                'padding':'8px',
                'width':'1920px'
                }
            )], style={'display':'flex', 'justifyContent': 'center', } 
        ),

        html.Div([
            html.P("World Map: All countries which experienced cybersecurity failures (Coloured)",
                style={
                    "color": "white",
                    "fontSize": "12px",
                    "fontFamily": "monospace",
                    'fontWeight':'bold',
                    "marginTop":'8px',
                    'width':'1008px',
                    "position":"absolute",
                    'padding':'5px',
                    'backgroundColor':'black',
                    'border': '0.5px solid black',
                    'right':'47%',
                    'zIndex': 9999}
            ),

            # Map Graph
            html.Div([
                dcc.Graph(id='world-map',
                        figure=choro_graph)
            ], style={'position':'relative', 'margin':'4px', 'paddingTop':'30px', 'height':'500px', 'width':'1000px'}),

            # Data Table
            html.Div([
                dash_table.DataTable(
                    id='country-table',
                    columns=[{"name": col, "id": col} for col in ['affected_country', 'date', 'affected_organization', 'description']],
                    fixed_rows={'headers': True},
                    data=df.to_dict('records'),
                    page_action='none',
                    style_table={
                        'overflowY': 'auto',   # Vertical scroll
                        'overflowX':'auto'
                    },
                    style_data={
                        'backgroundColor': "#1A1A1A",  # 👈 row background
                        'color': 'white'              # text color
                    },
                    style_header={
                        'zIndex':'9999',
                        'backgroundColor': "#000000",    # 👈 header row bg
                        'color': 'white',
                        'fontWeight': 'bold'
                    },
                    style_cell={
                        'fontSize': '12px',
                        'minWidth':'196px',
                        'maxWidth':'196px',
                        'textAlign': 'left',
                        'whiteSpace': 'normal',  # Wrap text if needed
                        'border': '1px solid #000000'
                    },
                )
            ], style={'position':'relative', 'margin':'4px', }),

            # Year Filter
            html.Div([
                html.H4("Select Year",
                        style={'fontFamily':'monospace',
                               'fontSize':'14px',
                               'fontWeight':'bold',
                               'backgroundColor':'#000000',
                               'padding':'4px'}),

                dcc.RadioItems(
                    id='year-filter',
                    options=[{'label': year, 'value': year} for year in years],
                    value=None,
                    style={
                        'backgroundColor': "#0e0d0d",  # dark background
                        'color': 'white',             # text color
                        'border': '1px solid #1A1A1A',
                        'padding':'8px',
                    },
                    labelStyle={
                        'display': 'block',
                        'marginBottom': '8px',
                        'fontSize': '16px',
                        'cursor': 'pointer'
                    }  
                )], style={'position': 'relative', 'margin':'8px', 'zIndex': 9999, }),

        ], style={'display': 'flex', 'justifyContent':'center', 'padding':'16px', 'alignItems':'stretch', 'border': '1px solid red', 'marginRight':'24px', 'marginLeft':'24px'}),

        ### Start of section 2 ###

        html.Div([
            html.H2("More information regarding the affected countries.", style={
                'height': '40px',
                'backgroundColor': "#52495FB3",
                'border': '1px solid #000000',
                'alignItems': 'center',
                'textAlign': 'center',
                'fontSize': '16px',
                'borderRadius': '8px',  # Optional: slightly rounded corners
                'padding':'8px',
                'width':'1920px'
                }
            )], style={'display':'flex', 'justifyContent': 'center', } 
        ),

        html.Div([
            html.Div([
                html.H2("Pick a country and year to specify graphs and tables",
                        style={
                            'fontSize':'16px',
                            'fontFamily':'monospace'
                        }),

                html.Div([
                    # Country dropdown
                    html.Div([
                        html.Label("Select a Country:", style={'paddingRight':'8px',}),
                        dcc.Dropdown(
                            id='country-dropdown',
                            options=[{'label': country, 'value': country} for country in available_countries],
                            value='Canada',
                            style={'width': '128px',}
                        ), 
                    ], style={'paddingRight':'16px'}),
                    # Year dropdown
                    html.Div([
                        html.Label("Select a year:", style={'paddingRight':'8px'}),
                        dcc.Dropdown(
                            id='year-dropdown',
                            options=[{'label': year, 'value': year} for year in years],
                            value=2023,
                            style={'width': '128px'}
                        ),
                    ]),
                ], style={'display': 'flex', 'justifyContent':'left'}),

                # Data Table
                html.Div([
                    dash_table.DataTable(
                        id='subtype-event-table',
                        columns=[{"name": col, "id": col} for col in ['event_subtype', 'count']],
                        fixed_rows={'headers': True},
                        data=[],
                        page_action='none',
                        style_table={
                            'overflowY': 'auto',   # Vertical scroll
                            'overflowX':'auto'
                        },
                        style_data={
                            'backgroundColor': "#1A1A1A",  # 👈 row background
                            'color': 'white'              # text color
                        },
                        style_header={
                            'zIndex':'9999',
                            'backgroundColor': "#000000",    # 👈 header row bg
                            'color': 'white',
                            'fontWeight': 'bold'
                        },
                        style_cell={
                            'fontSize': '12px',
                            'minWidth':'196px',
                            'maxWidth':'196px',
                            'textAlign': 'left',
                            'whiteSpace': 'normal',  # Wrap text if needed
                            'border': '1px solid #000000'
                        })
                    ], style={'position':'relative', 'margin':'4px', }),
                
            ])

        ], style={'display': 'flex', 'justifyContent':'center', 'padding':'16px', 'alignItems':'stretch', 'border': '1px solid red', 'marginRight':'24px', 'marginLeft':'24px'}),

    ], style={'padding':'24px', 'border': '1px solid red'})
])

# Callback to update table based on click + year
@app.callback(
    Output('country-table', 'data'),
    Input('world-map', 'clickData'),
    Input('year-filter', 'value')
)
def update_section1(clickData, selected_year):
    if (clickData is None) and (selected_year is None):
        table_data = df.to_dict('records')
    elif (clickData is None):
        table_data =  df[df.event_year == selected_year].to_dict('records')
    elif (selected_year is None):
        country_clicked = clickData['points'][0]['location']
        table_data = df[df.affected_country == country_clicked].to_dict('records')
    else:
        country_clicked = clickData['points'][0]['location']
        table_data = df[(df['affected_country'] == country_clicked) & (df['event_year'] == selected_year)].to_dict('records')
    
    return table_data

@app.callback(
    Output('subtype-event-table', 'data'),
    Input('country-dropdown', 'value'),
    Input('year-dropdown', 'value')
)
def update_section2(country, year):
    if (country is None) and (year is None):
        table_data = df['event_subtype'].value_counts().reset_index()
    elif (country is None):
        table_data =  df[df.event_year == year]['event_subtype'].value_counts().reset_index()
    elif (year is None):
        table_data = df[df.affected_country == country]['event_subtype'].value_counts().reset_index()
    else:
        table_data = df[(df['affected_country'] == country) & (df['event_year'] == year)]['event_subtype'].value_counts().reset_index()
    
    return table_data.to_dict('records')

# run app
app.run(jupyter_mode="tab", debug=True)

Dash app running on http://127.0.0.1:8050/
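
Both callbacks branch four ways over the same two optional filters. One possible way to collapse that logic is sketched below. The helper names `clicked_country` and `filter_events` are hypothetical, the rows are made up, and the `click` dictionary only mimics the part of `clickData` that the callback reads.

```python
import pandas as pd


def clicked_country(click_data):
    """Return the clicked location from a clickData-style dict, or None."""
    if not click_data:
        return None
    return click_data["points"][0]["location"]


def filter_events(frame, country=None, year=None):
    """Apply whichever of the two filters is set; None means no filter."""
    mask = pd.Series(True, index=frame.index)
    if country is not None:
        mask &= frame["affected_country"] == country
    if year is not None:
        mask &= frame["event_year"] == year
    return frame[mask]


# Made-up rows for illustration
toy = pd.DataFrame({
    "affected_country": ["Canada", "Canada", "Belarus"],
    "event_year": [2022, 2023, 2023],
    "event_subtype": ["Undetermined", "Undetermined", "Message Manipulation"],
})

click = {"points": [{"location": "Canada"}]}  # mimics a map click
subset = filter_events(toy, country=clicked_country(click), year=2023)
print(subset.to_dict("records"))
```

With a helper like this, each callback body could reduce to one filter call followed by `to_dict('records')` or `value_counts().reset_index()`.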



In [20]:
df[df.event_year == 2023]['motive'].value_counts().reset_index()

Unnamed: 0,motive,count
0,Financial,1732
1,Undetermined,267
2,Protest,257
3,Political-Espionage,79
4,Espionage,20
5,Sabotage,17
6,Personal Attack,1
7,Industrial-Espionage,1


In [27]:
t = df[df.event_year == 2022]
t = t[t.affected_country == 'Canada']
t['motive'].value_counts().reset_index()

Unnamed: 0,motive,count
0,Financial,41
1,Undetermined,3
2,Protest,3
3,Political-Espionage,2
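
Raw counts are easier to compare across countries as shares. A short sketch using `value_counts(normalize=True)`, with the series rebuilt by hand from the four counts in the table above:

```python
import pandas as pd

# Rebuild the motive column from the counts shown above (Canada, 2022)
motives = pd.Series(
    ["Financial"] * 41
    + ["Undetermined"] * 3
    + ["Protest"] * 3
    + ["Political-Espionage"] * 2,
    name="motive",
)

# normalize=True returns fractions of the total; convert to percentages
shares = motives.value_counts(normalize=True).mul(100).round(1)
print(shares)
```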


In [28]:
t['actor_country'].value_counts()

actor_country
Undetermined          34
Russian Federation    15
Name: count, dtype: int64

In [None]:
df.columns

Index(['slug', 'event_date', 'event_year', 'affected_country',
       'affected_organization', 'affected_industry', 'afftected_industry_code',
       'event_type', 'event_subtype', 'motive', 'description', 'actor',
       'actor_type', 'actor_country', 'source_url', 'date'],
      dtype='object')