# 740076443 BEMM461 Executive Dashboard - CW2

## Introduction
This Jupyter Notebook contains my Dashboard of visualisations, created for an executive audience: decision makers at the central level (the Home Office) and police executives at the local level. The notebook details my code, thought process, challenges and considerations. It aims to provide a commentary on the data analysis and on the steps needed to create a Dashboard.

## Table of Links
### Table
| Description | Link |
| -- | -- |
| Reflective blog |https://ele.exeter.ac.uk/mod/oublog/view.php?id=3234123|
| Chosen Dataset |https://www.data.gov.uk/dataset/44aba3b3-b9e4-4b0c-ad30-6ae2dcd9a9b9/police-use-of-force|
| Secondary Dataset | https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/datasets/policeforceareadatatables |

## Table of Contents
1. Executive Summary
2. Project Dashboard
3. Background to the Project
4. Articulation of Decision Making Process
5. Review of Analytics Methods Chosen
6. Review of Available Tools
7. Review of Chosen Datasets 
8. Visualisation of Data with Accompanying Code
9. Reflective Evaluation
10. Conclusion

## 1. Executive Summary

The aim of this Dashboard is to provide UK Home Office decision makers, at the local and central level, with key insights into the current policing of the UK population along various dimensions. The Dashboard is therefore intended for police and justice executive offices across the UK, including decision makers in the Metropolitan Police. In the terms of Sedrakyan et al. (2019), these are stakeholders at the “mega-level (governance)” and “macro-level (institution)”.
This allows their decisions to be better evidenced and informed across multiple factors, metrics and considerations. Examples include the police resources that need to be deployed regionally relative to crime incidence, areas of target focus such as London, age and gender groups to monitor, and the specific tactics being deployed by the police at multiple levels. This intentional “drill down” display (Bach et al., 2022) targets a “detailed understanding of user needs” at the “high level and low level” (Brath and Peters, 2004).

This was explicitly done to reduce cognitive load and elicit effective and easy to interpret insights at a glance – favouring a self-sufficient “visual language” (Kirk, 2020).

The Dashboard consists of multiple dimensions, variables and strata tracking specific metrics; for example, the view of UK knife crime by region over multiple years is particularly intuitive and instructive.

An intentional mix of visualisations was chosen, creating variety for the purpose of specificity, ease of understanding and aesthetics, according to the message each one conveys and the data it represents. In this way, both the “intended structure of the data” and “perceptual structure of the visualisation” (Sedrakyan et al., 2019) were considered.

This Dashboard notebook aims to set out these considerations along multiple dimensions: audience, dashboard purpose, data source, data structure, data suitability and the intentions behind the design of each visualisation.

As such, I will review and critique the chosen analytics methods and evaluate the tools used.

A link to the original data source is provided for access, evidence and information.

The notebook presents the Python code used to create this Dashboard from start to finish, as a reference for my thought process and decision making.
My reflective blog is linked to evidence the week-by-week development of the project: ideas for the topic, data sourcing, data manipulation and data visualisation, along with challenges and changes.

My conclusion summarises the notebook's activities and provides key findings and outcomes along with academic references.

## 2. Project Dashboard

My Dashboard, built with the Dash framework (dash.plotly.com), is provided below. It is a web-based dashboard that can be hosted within the Jupyter Notebook.

My Dashboard was intended for executives: decision makers who need profound insight with minimal demands on cognition, processing or prior understanding. Whilst Schwendimann et al. (2016) maintain there is “still no consensus on what constitutes a dashboard”, let alone an effective one, I maintain that a ‘good’ dashboard can be constructed by considering ‘good’ design principles and, equally importantly, the audience. Tufte’s (1997) view of “excellence in statistical graphics consist[ing] of complex ideas communicated with clarity, precision and efficiency” arguably renders that lack of consensus moot.

Dash is an ideal framework for facilitating this: a web-based medium hosting up-to-date graphics packages which are loaded in according to necessary breadth and depth. This intrinsically removes barriers to access and understanding, and is especially potent when real-time data can be displayed. In this way, “cognitive scalability” (Yoghourdjian et al., 2018) is considered in each visualisation. Whilst the “structure is not known a priori” (Manovich, 2011), a Dashboard is crucially an interactive, open and considered medium.

The information was structured and displayed with human cognitive abilities in mind. This is according to principles of psychology, memory, perception, understanding, best design principles and aesthetics.  In this way, each graph was considered to be “understandable first and beautiful after that” (Cairo, 2012). 
Summarily, the concept of weaving together “three applications in a seamless fashion” – namely “monitoring, analysis and management” (Eckerson, 2010) was considered throughout the Dashboarding process. This was achieved through various levels of summary depth and consideration of audience utility, requirements, usability and interaction.


In [1]:
import dash
from dash import dcc, html
import dash_bootstrap_components as dbc
from dash.dependencies import Input, Output
import plotly.express as px
import pandas as pd
import dash_leaflet as dl
import requests
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import os

# Load the Police Use of Force workbook (primary dataset)
FullPoliceData = pd.ExcelFile('FullPoliceData.xlsx')

# Create a Dash app with Bootstrap styling
app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])

#Reading in 3rd page of the Excel sheet for analysis
Incidents_DataFrame = pd.read_excel(FullPoliceData, 2, index_col=0, skiprows = 5)

#Removing non-data
Incidents_DataFrame.drop(Incidents_DataFrame.tail(7).index,inplace=True)
Incidents_DataFrame.drop(Incidents_DataFrame.tail(2).index,inplace=True)

#Removing Missing Values
Incidents_DataFrame.dropna(inplace=True)

#Filling Missing Values
Incidents_DataFrame.bfill(inplace=True)

#Resetting Index
Incidents_DataFrame = Incidents_DataFrame.reset_index()

#Filling remaining Missing Values
Incidents_DataFrame.ffill(inplace=True)

#'Type of force' is kept as a regular column because the treemap below uses it in its path

#Making Sub Data Frame
forcesplit = [Incidents_DataFrame.loc[[i]] for i in Incidents_DataFrame.index] 

#Removing subtotal and header rows which would skew data
incident_rows_to_drop = ['Total Restraint', 'Total Unarmed skills', 'Total Other equipment',
                         'Total Less lethal weapons', 'Total Firearms', 'Total Other',
                         'Total Not reported', 'Handcuffing, of which', 'Baton, of which',
                         'Irritant spray, of which', 'CED, of which']
Incidents_DataFrame = Incidents_DataFrame[~Incidents_DataFrame['Tactic'].isin(incident_rows_to_drop)]

#Reading in second dataset for visualisation
Age_Type_DataFrame = pd.read_excel(FullPoliceData, 3, skiprows = 4)

#Cleaning second dataframe and renaming columns
Age_Type_DataFrame.drop(Age_Type_DataFrame.tail(8).index,inplace=True)
Age_Type_DataFrame = Age_Type_DataFrame.rename(columns={"Unnamed: 0": "Type of force", "Unnamed: 1": "Tactic", "Unnamed: 10": "Total incidents per tactic"})
Age_Type_DataFrame = Age_Type_DataFrame.drop(columns=['Unnamed: 9'])

#Moving the first row to the end of the DataFrame
Age_Type_DataFrame = Age_Type_DataFrame.iloc[[*np.arange(1, len(Age_Type_DataFrame)), *np.arange(1)]]

#Cleaning second DataFrame
Age_Type_DataFrame = Age_Type_DataFrame.drop([1,2,35])
Age_Type_DataFrame.drop(Age_Type_DataFrame.tail(1).index,inplace=True)
Age_Type_DataFrame['Type of force'] =  Age_Type_DataFrame['Type of force'].ffill()
age_rows_to_drop = ['Total Restraint', 'Handcuffing, of which', 'Total Unarmed skills',
                    'Total Other equipment', 'Baton, of which', 'Irritant spray, of which',
                    'Total Less lethal weapons', 'AEP, of which', 'Total Other', 'Dog use, of which']
Age_Type_DataFrame = Age_Type_DataFrame[~Age_Type_DataFrame['Tactic'].isin(age_rows_to_drop)]

#Reshaping second DataFrame from wide to long format (one row per tactic and age group)
Age_Type_DataFrame_long = Age_Type_DataFrame.melt(id_vars=["Type of force", "Tactic", "Total incidents per tactic"], 
                  value_vars=["Under 11 years", "11 - 17 years", "18 - 34 years", "35 - 49 years", "50 - 64 years", "65 and over", "Not reported"],
                  var_name="Age Group", value_name="Incident Count")

Age_Type_DataFrame_long = Age_Type_DataFrame_long.drop(Age_Type_DataFrame_long[Age_Type_DataFrame_long['Tactic'] == 'Tactical communication (with other tactic)'].index)
Gender_DF = pd.read_excel(FullPoliceData, 4, skiprows=4)
Gender_DF = Gender_DF.rename(columns={"Unnamed: 0": "Type of force", "Unnamed: 1": "Tactic", "Unnamed: 7": "Total incidents per tactic"})
Gender_DF = Gender_DF.drop(columns=['Unnamed: 6'])
Gender_DF = Gender_DF.drop(Gender_DF.index[34:44])
Gender_DF['Other'] = Gender_DF['Other'] + Gender_DF['Not reported']
Gender_DF = Gender_DF.drop(columns=['Not reported'])
Gender_DF = Gender_DF.drop([0,1,2])
Gender_DF = Gender_DF.drop([44])
Gender_DF_Long = Gender_DF.melt(id_vars=["Type of force", "Tactic", "Total incidents per tactic"], 
                  value_vars=["Male", "Female", "Other"],
                  var_name="Gender", value_name="Incident Count")
gender_rows_to_drop = ['Total Restraint', 'Handcuffing, of which', 'Total Unarmed skills',
                       'Total Other equipment', 'Baton, of which', 'Total Less lethal weapons',
                       'Total Other']
Gender_DF_Long = Gender_DF_Long[~Gender_DF_Long['Tactic'].isin(gender_rows_to_drop)]
UK_Regions_Incidents = pd.read_excel(FullPoliceData, 13, skiprows = 3)
UK_Regions_Incidents = UK_Regions_Incidents.drop(UK_Regions_Incidents.index[54:57])
UK_Regions_Incidents = UK_Regions_Incidents[~UK_Regions_Incidents['Police force'].str.contains('total', case=False, na=False)]
region_totals = UK_Regions_Incidents.groupby('Region')['Total'].sum().reset_index()
region_totals['Region'] = region_totals['Region'].replace('Yorkshire and the Humber', 'Yorkshire and The Humber')
FullPoliceTimeSeries = pd.ExcelFile('policeforceareatablesfinal.xlsx')
Knife_TimeSeries = pd.read_excel(FullPoliceTimeSeries, 8, skiprows = 7, header = 1)
Knife_TimeSeries = Knife_TimeSeries.drop(columns=['Area Codes'])
Knife_TimeSeries = Knife_TimeSeries.rename(columns={"Apr 2010 to Mar 2011": "2010", "% involving a knife Apr 2010\n to Mar 2011": "2010 Knife %", "Apr 2011 to Mar 2012\n[note 10]": "2011", "% involving a knife Apr 2011\n to Mar 2012": "2011 Knife %"})
Knife_TimeSeries = Knife_TimeSeries.rename(columns={"Apr 2012 to Mar 2013": "2012", "% involving a knife Apr 2012 \nto Mar 2013": "2012 Knife %", "Apr 2013 to Mar 2014": "2013", "% involving a knife Apr 2013\n to Mar 2014": "2013 Knife %"})
Knife_TimeSeries = Knife_TimeSeries.rename(columns={"Apr 2014 to Mar 2015": "2014", "% involving a knife Apr 2014\n to Mar 2015": "2014 Knife %", "Apr 2015 to Mar 2016": "2015", "% involving a knife Apr 2015\n to Mar 2016": "2015 Knife %"})
Knife_TimeSeries = Knife_TimeSeries.rename(columns={"Apr 2016 to Mar 2017": "2016", "% involving a knife Apr 2016\n to Mar 2017": "2016 Knife %", "Apr 2017 to Mar 2018": "2017", "% involving a knife Apr 2017\n to Mar 2018": "2017 Knife %"})
Knife_TimeSeries = Knife_TimeSeries.rename(columns={"Apr 2018 to Mar 2019": "2018", "% involving a knife Apr 2018\n to Mar 2019": "2018 Knife %", "Apr 2019 to Mar 2020": "2019", "% involving a knife Apr 2019 to Mar 2020": "2019 Knife %"})
Knife_TimeSeries = Knife_TimeSeries.rename(columns={"Apr 2020 to Mar 2021": "2020", "% involving a knife Apr 2020 to Mar 2021": "2020 Knife %", "Apr 2021 to Mar 2022": "2021", "% involving a knife Apr 2021 to Mar 2022": "2021 Knife %"})
Knife_TimeSeries = Knife_TimeSeries.rename(columns={"Jul 2022 to Jun 2023\n[note 3]": "2022", "% involving a knife Jul 2022 to Jun 2023": "2022 Knife %", "Jul 2023 to Jun 2024\n[note 3]": "2023", "% involving a knife Jul 2023 to Jun 2024": "2023 Knife %"})
Knife_TimeSeries = Knife_TimeSeries.drop(columns=['Jul 2023 to Jun 2024 compared with previous year % change'])
Knife_TimeSeries = Knife_TimeSeries.drop([0,1,3])
#Stripping footnote markers and normalising area names
area_name_fixes = {
    "ENGLAND [note 3, 9]": "England",
    "Northumbria [note 11]": "Northumbria",
    "Greater Manchester [note 3, 9]": "Manchester",
    "Leicestershire [note 6]": "Leicestershire",
    "Essex [note 11]": "Essex",
    "Surrey [note 7][note 11]": "Surrey",
    "Norfolk [note 11]": "Norfolk",
    "Sussex [note 11][note 12]": "Sussex",
    "Suffolk [note 11]": "Suffolk",
    "Thames Valley [note 11]": "Thames Valley",
    "Avon and Somerset [note 11]": "Avon and Somerset",
    "WALES": "Wales",
}
Knife_TimeSeries["Area name"] = Knife_TimeSeries["Area name"].replace(area_name_fixes)
Knife_TimeSeries["2022"] = pd.to_numeric(Knife_TimeSeries["2022"], errors = 'coerce')
Knife_TimeSeries["2022 Knife %"] = pd.to_numeric(Knife_TimeSeries["2022 Knife %"], errors = 'coerce')
Knife_TimeSeries["2023"] = pd.to_numeric(Knife_TimeSeries["2023"], errors = 'coerce')
Knife_TimeSeries["2023 Knife %"] = pd.to_numeric(Knife_TimeSeries["2023 Knife %"], errors = 'coerce')
UK_regions = ['North East ', 'North West', 'Yorkshire and The Humber', 'West Midlands', 'East Midlands', 'East', 'South East', 'South West', 'London']

#Keeping only the regional rows (the year columns are already renamed above)
UK_Knife_Small_DF = Knife_TimeSeries[Knife_TimeSeries['Area name'].isin(UK_regions)]

UK_Knife_Small_Long = pd.melt(UK_Knife_Small_DF, id_vars=["Area name"], value_vars=[str(year) for year in range(2010, 2024)],  var_name="Year", value_name="Knife Incidents")

UK_Knife_Small_Long['Year'] = UK_Knife_Small_Long['Year'].astype(int)

UK_Knife_Small_Long['Knife Incidents'] = pd.to_numeric(UK_Knife_Small_Long['Knife Incidents'], errors='coerce')

#first graph
fig_bubble = px.scatter(Age_Type_DataFrame_long, 
                        x='Tactic', 
                        y='Incident Count', 
                        size='Incident Count', 
                        color='Age Group', 
                        title='Bubble Plot of Incident Count by Tactic and Age Group', 
                        labels={'Incident Count': 'Incident Count', 'Tactic': 'Tactic'},
                        size_max=30)

#second graph
fig_bar = px.bar(Gender_DF_Long, 
                 x='Gender', 
                 y='Incident Count', 
                 color = 'Tactic',
                 title='Gender breakdown of Police method used', 
                 labels={'Type of force': 'Type of Force', 'Total': 'Total Incidents'})


#third graph
Treemap_Tactic = px.treemap(Incidents_DataFrame, 
                            path=[px.Constant("all"), 'Type of force', 'Tactic'], 
                            values='Total',
                            title='Population breakdown of all Tactic types',  
                            color='Tactic')
Treemap_Tactic.update_layout(margin=dict(t=50, l=25, r=25, b=25))

#fourth graph
Pairplot_Tactic = sns.pairplot(data = Gender_DF, hue = 'Tactic')
pairplot_image_path = os.path.join(os.getcwd(), 'assets', 'pairplot_tactic.png')
os.makedirs(os.path.dirname(pairplot_image_path), exist_ok=True)
Pairplot_Tactic.savefig(pairplot_image_path, dpi=300)
plt.close()

#fifth graph
plt.figure(figsize=(12, 8))
TimeSeries_Knife = sns.lineplot(x="Year", y="Knife Incidents", hue="Area name", data=UK_Knife_Small_Long, marker="o")
plt.title("Regional number of Knife Incidents over time")
time_series_image_path = os.path.join(os.getcwd(), 'assets', 'time_series_knife.png')
os.makedirs(os.path.dirname(time_series_image_path), exist_ok=True)
plt.savefig(time_series_image_path, dpi=300)
plt.close()


#geojson 
geojson_url = "https://raw.githubusercontent.com/codeforgermany/click_that_hood/master/public/data/united-kingdom-regions.geojson"
response = requests.get(geojson_url, timeout=30)
response.raise_for_status()  # fail early if the download did not succeed
geojson_data = response.json()


# Layout of the app
app.layout = html.Div([
    # Title
    html.H1("UK Police Incidents Interactive Dashboard", style={'textAlign': 'center'}),

    # Map
    html.H3("UK Police Incident Areas and Values", style={'textAlign': 'center'}),
    dl.Map(
        center=[51.505, -0.09], zoom=6, children=[
            dl.TileLayer(),
            dl.GeoJSON(data=geojson_data, id="geojson-layer")
        ], style={'height': '500px', 'width': '100%'}
    ),
# New Bubble Chart
    html.H3("Incident Count by Tactic and Age Group", style={'textAlign': 'center'}),
    dcc.Graph(
        id='age-group-bubble-chart',
        figure=fig_bubble
    ),
# New Bar Chart for Force Incidents by Type
    html.H3("Gender breakdown of police methods used", style={'textAlign': 'center'}),
    dcc.Graph(
        id='force-incidents-bar-chart',
        figure=fig_bar
    ),
#Treemap
   html.H3("Incidents by Tactic and Type of Force", style={'textAlign': 'center'}),
    dcc.Graph(
        id='treemap-tactic',
        figure=Treemap_Tactic
    ),

#Pairplot
    html.H1("Pairplot of Tactics against Gender", style={'textAlign': 'center'}),
    # Embed the saved pairplot image in the dashboard
    html.Div([
     html.Img(src='/assets/pairplot_tactic.png', style={'width': '80%', 'margin': '0 auto', 'display': 'block'})
    ]),

#TimeSeries Knife Crime
    html.H1("Knife Incidents Time Series"),
    html.Img(src='/assets/time_series_knife.png', style={'width': '80%', 'height': 'auto'})
])


# Run the app inline in the Jupyter notebook (assumes Dash 2.11 or later, where Jupyter support is built in)
app.run(jupyter_mode="inline", port=8501)


## 3. Background to the Project

For the project I knew I wanted to create a dashboard for executives from a socioeconomic and/or demographic perspective in the UK.

For example, I was originally going to look into how benefits payments varied across regions and similar demographic dimensions, to generate insights into the types of people requiring certain amounts of government money. Public executives at the central and local level would again have been provided with micro and macro insights to help inform their policy, at large or specifically. Good depictions “must consider potential users if they are to be effective” (Purchase, 2000), and this was considered from the start of the Dashboard.

However, I located this very contemporary dataset on Policing on the Government website when conducting my original searches. I then selected it for its recency, breadth and high relevance to the new topic, along with its provenance as an authoritative source of data, with data collection conducted by the ONS. I also decided to pursue this topic out of personal interest, i.e. to what extent UK police used firearms and other methods and which regions exhibited higher levels of crime. This is detailed in Week 2 of my blog.

As such, the background to the project was deliberative but focussed and intentional in its generation of topic, aim, objectives and construal of information and insight displayed.

## 4. Articulation of Decision Making Process

Main focus of the project:

- Building on from the topic's exposition, my main focus was to provide micro and macro insights at various levels of depth across various strata according to importance, relevance and targetability to the executives. Details of this are found in my Week 1 blog post.

- My general focus throughout was on ensuring comprehensibility and the ability to translate and communicate large quantities of data into actionable insight without assistance from natural language. Considerations of this are detailed in my Week 3 blog post.

Specific decision making:

Data Source:
- I had to choose datasets according to provenance, accuracy, reliability and relevance.

- As such, I also located a secondary Gov UK source, which supports the results of the original dataset, broadens the information displayed and adds a time-based aspect to my Dashboard. As Rosli and Cabrera (2015) outline, “by consciously using Gestalt theory to motivate the design process”, the “spectator [is made] a more active participant” through reducing cognitive load and enhancing comprehensibility.

1. Variables:
- Consequently, I had to make various initial decisions with important implications.
- For example, the data source was particularly rich in breadth and depth: there was a large number of observations and variables, spread across more than a dozen worksheets in the Excel workbook.
- This meant I had to make the executive decision to include some variables and data points and exclude others, not only for specificity to the project but also in terms of what constitutes an effective dashboard and its constituent visualisations.
- This decision was made according to the suitability of the data: the quantity available to be modelled, and which data points would provide the most insight for the executive audience. This is also detailed in my Week 4 blog post.

2. Manipulation:
- I had to create various DataFrames and subsets of the data according to the messages I wanted to convey.
- This meant cleaning the data, handling missing values, recasting variables and transforming the data from wide to long format, amongst other techniques.
- As such, this was iterative and guided both by the insights I wanted to provide and by the messages the data intrinsically lent itself to through its values and structure.

3. Visualisation:
- I knew my Dashboard was going to consist of 6 graphics. This meant I had to deliberate and decide on the various mediums and displays appropriate for each story I wanted to tell, as well as taking into consideration precepts of design and psychology.
- As such, I decided on each graph's form as I analysed the data, not prescriptively. Had I decided beforehand, some data points would have been displayed in mediums that are not necessarily the best at telling their story. As Tufte and Graves-Morris (2014) outline, “graphical excellence” necessitates “a narrative quality: a story to tell about the data”. Knaflic (2015) reinforces this intentional construction, in that visualisation “[is] typically the only part of the analytical process your audience ever sees”.
- As such, I knew I wanted a wide variety of displays with elements of interactivity and intuition.
- For example, I think the interactive bubble plot is particularly effective at telling an initial story of the distribution and skewness, while also encouraging engagement and investigation because it can be zoomed into.
- The line graph was also very effective at displaying how London has become a centre point for knife crime, and the area map reinforces this with the easy-to-comprehend colour scheme shown below.
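
As an illustration of the reshaping described under Manipulation, here is a minimal sketch of forward-filling group labels and moving from wide to long format with Pandas. The column names mirror those used in the notebook, but the figures are invented and the three-row table is only a stand-in for a real worksheet.

```python
import pandas as pd

# Invented figures; the layout mimics a worksheet where the force type is
# only written on the first row of each group (merged cells in Excel).
wide = pd.DataFrame({
    "Type of force": ["Restraint", None, "Firearms"],
    "Tactic": ["Handcuffing", "Ground restraint", "Firearms aimed"],
    "11 - 17 years": [10, 2, 0],
    "18 - 34 years": [40, 5, 3],
})

# Forward-fill the group label so every row carries its force type.
wide["Type of force"] = wide["Type of force"].ffill()

# melt() stacks the age-group columns into (Age Group, Incident Count) pairs,
# giving one row per tactic and age group.
long_df = wide.melt(
    id_vars=["Type of force", "Tactic"],
    value_vars=["11 - 17 years", "18 - 34 years"],
    var_name="Age Group",
    value_name="Incident Count",
)
```

The long format is what lets a plotting call map `Age Group` to colour and `Incident Count` to size without naming each age column separately.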

Skills and libraries needed throughout:

1. Cleaning
- This took the form of the Pandas library for base commands; its features of Dataframes and general useful commands meant it was a great all-round library.

2. Manipulation
- Numpy has important numerical functions that meant it was vital to have loaded in.

3. Visualisation
- I really enjoyed working with Plotnine to create some really interesting graphs in a simple grammar of graphics.
- However, Matplotlib, Seaborn and Plotly were all loaded in for their own respective nuances and assistance in creating graphs; i.e. Plotly was compatible with Dash and had a simple syntax as well.
- I had to work with Folium and GeoPandas in creating the map, which required libraries including 'os', 'requests' and 'io' to read in the GeoJson file. The Week 4 blog post details the creation of, and solution for, the interactive map.


## 5. Review of Analytics Methods Chosen

- Analytics methods

1. Descriptive Analytics
- Descriptive Analytics took the form of gaining a strong statistical understanding of each dataset and dimension.
- This meant understanding the data I was working with: its form, shape, distribution, range, outlier values and so forth.
- These analytics were vital because they informed how I then manipulated the data with Pandas.
- Descriptive Analytics took the form of: Descriptive (Summary) Statistics, Box Plots and Bar Plots.

2. Dimensionality Reduction
- This took the form of reducing the number of variables I was working with.
- Here I created smaller DataFrames based on each question I wanted to answer: i.e. was crime more skewed towards a certain age group or gender.
- This dimensionality reduction meant I was dealing with the right amount of data and minimised confounding results.

3. Predictive Analytics
- It would have provided an interesting dimension to feature an element of prediction in the Dashboard. However, this was constrained by the data available and the need to communicate extant trends to the audience.
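
The descriptive and reduction steps above can be sketched together in a few lines of Pandas. This is only an illustration under stated assumptions: the table is invented, the tactic names are placeholders, and the 1.5 × IQR rule is one common convention for flagging outliers rather than the specific check used in the project.

```python
import pandas as pd

# Invented figures standing in for one worksheet of the workbook.
incidents = pd.DataFrame({
    "Tactic": ["Total Restraint", "Handcuffing (compliant)", "Ground restraint",
               "Unarmed skills", "Irritant spray drawn",
               "Total Firearms", "Firearms aimed"],
    "Male":   [445, 400, 45, 60, 30, 7, 7],
    "Female": [65, 60, 5, 15, 4, 0, 0],
})

# Reduction: keep only the columns needed for the gender question and
# exclude subtotal rows so that incidents are not counted twice.
subtotal_rows = ["Total Restraint", "Total Firearms"]
gender_view = incidents.loc[~incidents["Tactic"].isin(subtotal_rows),
                            ["Tactic", "Male", "Female"]]

# Description: summary statistics plus the conventional 1.5 * IQR outlier rule.
summary = gender_view["Male"].describe()
q1 = gender_view["Male"].quantile(0.25)
q3 = gender_view["Male"].quantile(0.75)
iqr = q3 - q1
outliers = gender_view[(gender_view["Male"] < q1 - 1.5 * iqr) |
                       (gender_view["Male"] > q3 + 1.5 * iqr)]
```

In this toy table the handcuffing row is flagged, which is the kind of dominant value that motivates a treemap or a bubble plot rather than a chart with a single linear axis.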

As such the following graphs were selected:
1. Bar Plot - Great for directly comparing the values of one variable across categories. Here, plotting Male, Female and Other immediately displayed important insight: i.e. males are more likely to offend and the reporting of 'Other' is minimal.
2. Line Graph - Ideal for showing trends over time, this was suitable for displaying the relationship between regions' knife crime and clearly showed London as the most affected region.
3. Bubble Plot - Adding value to all graphs, this clearly shows that the highest offenders are youths. The nature of the graphic allows the executives to dig deeper into different dimensions; i.e. other age groups or types of force used. It also showed that the data source could potentially be improved by creating more age groups instead of one very large category of 18-34.
4. Interactive Geographic Map - This is shown below; despite extensive efforts I was not able to add the GeoJson layer of values to the map. Nonetheless, it demonstrated that London was clearly the highest affected area of crime and would provide interesting insight if the dashboard became connected to real-time data.
5. Tree Map - This display was very effective at reducing quite a broad variable with different implications into a singular intuitive graphic.
6. Pairplot - Whilst containing multiple graphs, it importantly demonstrated the distribution of each value underpinning all the data being displayed, as well as displaying new information of its own.

As Berinato (2023) argues, “to judge a chart’s value… you need to know more” than “whether you used the right chart type”. Accordingly, the nature of these graphs was informed by the structure of the data, the story I wanted to tell, and the user’s needs, wants and cognition.
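
One possible route towards the values layer that could not be added to the map is to write each region's total into the GeoJSON feature properties before the data is handed to the map component, so that any styling or hover logic can read the value from there. The sketch below uses only the standard library. `attach_totals`, the `name` property key and the sample figures are assumptions for illustration; the real GeoJSON file may label its regions differently, which is exactly what the returned list of unmatched names is meant to reveal.

```python
import copy

def attach_totals(geojson, totals, name_key="name", value_key="total"):
    """Return a copy of `geojson` with a value stored on each matched feature,
    plus the list of feature names that had no matching total."""
    enriched = copy.deepcopy(geojson)
    # Compare names case-insensitively and ignore stray whitespace.
    lookup = {name.strip().lower(): value for name, value in totals.items()}
    unmatched = []
    for feature in enriched["features"]:
        name = feature["properties"].get(name_key, "")
        key = name.strip().lower()
        if key in lookup:
            feature["properties"][value_key] = lookup[key]
        else:
            unmatched.append(name)
    return enriched, unmatched

# Hypothetical three-region collection with invented totals.
sample = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"name": "London"}, "geometry": None},
    {"type": "Feature", "properties": {"name": "Yorkshire and The Humber"}, "geometry": None},
    {"type": "Feature", "properties": {"name": "Scotland"}, "geometry": None},
]}
totals = {"London": 300, "Yorkshire and the Humber ": 120}

enriched, unmatched = attach_totals(sample, totals)
```

Normalising case and whitespace inside the lookup avoids the need for one-off fixes such as replacing 'Yorkshire and the Humber' by hand, and the unmatched list makes silent gaps in the map visible.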


## 6. Review of Available Tools

1. Python was the optimal tool for the task. Its simplicity and power, but crucially its ability to integrate and import the relevant external libraries, were vital. As Toasa et al. (2018) argue, “no matter how great the technology”, the “dashboard’s success as a medium of communication is a product of design”. Accordingly, Python was the best tool for facilitating the medium of communication in its adaptability, scale, usability and functionality.

2. Plotnine
- Nonetheless, Plotnine was surprisingly not compatible with Dash. This meant I had to convert the two graphs I made in Plotnine into Plotly graphs or images to be displayed.
- If I were to do the project again or be able to choose any tool, I'd want to code some in R or at least be able to use Plotnine freely.
3. Arc GIS
- Again this was not compatible with Dash. Whilst it was possible to load most of the interactive map in without it, I felt the standard GeoJson format and syntax was more customisable and better at displaying desired information than the format it took in the Dashboard. By this, Arc GIS is now what I will use for future geographical visualisation.
4. Stata
- I'd be very interested to try and integrate Stata into the process.
Stata would be suitable for regression, larger datasets and integrating multiple datasets. It would also bring more credence to the statistical side of Analytics.

5. R and GeoJSON would be suitable because they'd complement existing libraries while boosting comprehensibility and aesthetics with no downside. In fact, Plotnine is remarkably easy to use and modify. It can be hard to align all aspects when using GeoJSON files, especially region codes, but for further specific analysis, for example across a map of all London boroughs, this would be very useful.

However, as with all tools, the dashboard’s “goal of informing while not distracting users” (Janes et al., 2013) means that story, suitability and the nature of the data are all pertinent considerations in tool selection.
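
The Plotnine workaround described above (displaying a Plotnine graph in Dash as an image) can be sketched as follows. This is a minimal illustration rather than the code used in the Dashboard: the helper name `figure_to_data_uri` and the placeholder `age_plot` are hypothetical, and it assumes the chart is first drawn to a Matplotlib figure, which is then encoded as a PNG data URI that a Dash `html.Img` component can show.

```python
import base64
from io import BytesIO


def figure_to_data_uri(fig, dpi=150):
    """Render a Matplotlib-style figure to a base64-encoded PNG data URI."""
    buffer = BytesIO()
    # Any object exposing Matplotlib's savefig(file, format=..., dpi=...) works here.
    fig.savefig(buffer, format="png", dpi=dpi)
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return f"data:image/png;base64,{encoded}"


# Hypothetical usage inside a Dash layout (requires plotnine and dash;
# `age_plot` is a placeholder for an existing plotnine ggplot object):
#   from dash import html
#   fig = age_plot.draw()  # plotnine draws the chart as a Matplotlib Figure
#   image = html.Img(src=figure_to_data_uri(fig))
```

The trade-off of this approach is that the chart becomes a static picture, so any interactivity has to come from the surrounding Dash components instead.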


## 7. Review of Chosen Datasets 

1. Police use of Force: 2018 Data.Gov.UK - Link 1
- This was the most recent data available from the best possible source: the government. As such, it proved authoritative, with a great depth of variables and breadth across multiple datasheets. It also contained the metrics most relevant to my goal of informing executives through the information displayed in my Dashboard.


2. Crime in England and Wales: Police Force Area June 2024 ONS - Link 2
- Similarly, this complementary secondary data source was also very recent and allowed my Dashboard to have a time series aspect. It came from a different branch of government but proved just as statistically reliable and suitable for my aims.

Both datasets were ideal to work with: there were minimal missing values and a large amount of data, which allowed me to be selective and specific while keeping the results statistically robust. A lot of transformation was needed to extract the data points and information relevant to my narrative, but this is not a weakness of the datasets. Naturally, more entries would still have increased the scope and explainability of the analysis.


## 8. Visualisation of Data with Accompanying Code
A key visualisation of the Dashboard is the coloured map shown below: it sets the scene of the story by displaying how London is the epicentre of crime, as well as its relative standing compared to neighbouring regions. The Dashboard then concludes with a time series depiction of London’s markedly increased knife crime rate, moving to a micro focus from the initial macro-level insight.
Secondly, the interactive bubble plot is key in my Dashboard; it allows executives to zoom in as they wish: “overview first, zoom and filter, then details on demand” (Sedrakyan et al, 2019). 
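
Because the map joins the police totals to the GeoJSON file by region name, a small spelling difference (such as 'Yorkshire and the Humber' against 'Yorkshire and The Humber') can leave a region without a value. A minimal sketch of a check for this is given below; the function name and the sample lists are illustrative assumptions, and it compares names ignoring case and spacing rather than correcting individual names by hand.

```python
def find_unmatched_regions(data_regions, geojson_regions):
    """Return names found in one source but not the other, ignoring case and spacing."""
    def normalise(name):
        return " ".join(name.split()).casefold()

    data_keys = {normalise(name): name for name in data_regions}
    geo_keys = {normalise(name): name for name in geojson_regions}

    missing_from_map = sorted(data_keys[k] for k in data_keys.keys() - geo_keys.keys())
    missing_from_data = sorted(geo_keys[k] for k in geo_keys.keys() - data_keys.keys())
    return missing_from_map, missing_from_data


# Illustrative sample names only, not the full datasets
data_regions = ["London", "Yorkshire and the Humber", "Wales"]
geojson_regions = ["London", "Yorkshire and The Humber", "Wales", "Scotland"]

missing_from_map, missing_from_data = find_unmatched_regions(data_regions, geojson_regions)
print(missing_from_map)   # []
print(missing_from_data)  # ['Scotland']
```

Running a check like this before the merge makes it clear which regions would be left uncoloured and which map regions have no data.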


In [2]:
#Full Map Visualisation 
import folium
import geopandas as gpd
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import pandas as pd
import requests
from io import StringIO

#Reading in fourth datasheet for sixth graph (sheet index 13, skipping the three header rows)
UK_Regions_Incidents = pd.read_excel(FullPoliceData, sheet_name=13, skiprows=3)

#Cleaning fourth dataset
UK_Regions_Incidents = UK_Regions_Incidents.drop(UK_Regions_Incidents.index[54:57])
UK_Regions_Incidents = UK_Regions_Incidents[~UK_Regions_Incidents['Police force'].str.contains('total', case=False, na=False)]

#Making smaller dataframe
region_totals = UK_Regions_Incidents.groupby('Region')['Total'].sum().reset_index()
region_totals['Region'] = region_totals['Region'].replace('Yorkshire and the Humber', 'Yorkshire and The Humber')

#importing geojson file for map
geojson_url = 'https://raw.githubusercontent.com/codeforgermany/click_that_hood/master/public/data/united-kingdom-regions.geojson'

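#geojson code: download once, then parse into a GeoDataFrame and a plain dictionary
response = requests.get(geojson_url, timeout=30)
response.raise_for_status()  # stop early if the download failed
gdf = gpd.read_file(StringIO(response.text))
geojson_data = response.json()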

#selecting regions for the map visual
regions_of_interest = ["East Midlands", "East of England", "London", "North East", "North West", "South East", "South West", "Wales", "West Midlands", "Yorkshire and The Humber"]

#Setting up geojson: keep regions of interest that also appear in the police data
in_scope = gdf['name'].isin(regions_of_interest) & gdf['name'].isin(region_totals['Region'])
gdf_filtered = gdf[in_scope]
gdf_filtered = gdf_filtered.merge(region_totals, left_on='name', right_on='Region', how='left')

#values for map
min_total = gdf_filtered['Total'].min()
max_total = gdf_filtered['Total'].max()

#colour scale
norm = mcolors.Normalize(vmin=min_total, vmax=max_total)
cmap = plt.cm.ScalarMappable(norm=norm, cmap='RdYlGn_r')

#initialising map
m = folium.Map(location=[52.0, -1.5], zoom_start=6)


#map colour parameters
def style_function(feature):
    region = feature['properties']['name']
    total = gdf_filtered[gdf_filtered['name'] == region]['Total'].values[0]
    color = cmap.to_rgba(total)  # Get the color from the colormap
    return {
        'fillColor': mcolors.to_hex(color),  # Convert RGBA to Hex color
        'color': 'black',
        'weight': 1,
        'fillOpacity': 0.7
    }

#removing unneeded metadata columns before passing the data to folium
gdf_filtered = gdf_filtered.drop(columns=['created_at', 'updated_at'], errors='ignore')

#placing region values onto map
folium.GeoJson(gdf_filtered, style_function=style_function).add_to(m)
m

In [3]:
fig_bubble

In [4]:
Treemap_Tactic

## 9. Reflective Evaluation

Good progress was made throughout the project. An iterative, intentional and considered approach was deployed from start to finish. As such, the process was informed by methodical searching, sourcing and manipulation. My outcomes were tailored: they considered audience, academia and suitability in constructing a comprehensive and considered Dashboard.

The main challenge was the non-integration of Plotnine and GeoJSON with Dash, which meant my graphs had to be altered to become compatible. Setting up Dash was initially unfamiliar as well, but not an obstacle.

For improvements, I will try to source even more data to consider a predictive element. I will also use ArcGIS to enable macro and micro level geographical visualisations. I am also keen to enhance interaction: whilst I had some, crucially “the interacti[ve] component involves the dialogue between the user and the system” and “is an essential part of infovis” (Yi et al, 2007), meaning that further interactivity would help ensure “sufficient assistance” to the user (Kirk, 2020).


## 10. Conclusion
Overall, this project was successful in displaying appropriate, large data in a tailored, aesthetic and intuitive format. The outcome was a functioning Dashboard that would be useful to decision makers in a professional environment. It provides a visual, interactive and perceptive dimension to information comprehension which encourages engagement and consideration. The project is significant in that it has provided a rewarding challenge and encouraged me to consider many aspects of data analysis I hadn't previously. For example, I now appreciate the thought and design process behind a real-time data representation and how powerful this can be in terms of interfacing, comprehensibility and usefulness in information communication, as opposed to stand-alone static graphics. In this way, despite alleged “little design guidance for dashboards” (Bach et al., 2022), I have produced a specific and considered Dashboard, hopefully proving to be effective, which would be refined through user feedback and further consideration of audience needs and the story of the data.

## References

Bach, B., Freeman, E., Abdul-Rahman, A., Turkay, C., Khan, S., Fan, Y., & Chen, M. (2022). Dashboard design patterns. IEEE Transactions on Visualization and Computer Graphics, 29(1), 342-352. https://doi.org/10.1109/TVCG.2022.3143372

Berinato, S. (2023). Good charts, updated and expanded: The HBR guide to making smarter, more persuasive data visualizations. Harvard Business Press.

Brath, R., & Peters, M. (2004). Dashboard design: Why design is important. DM Direct, 85, 1011285-1.

Cairo, A. (2012). The functional art: An introduction to information graphics and visualization. New Riders.

Eckerson, W. W. (2010). Performance dashboards: measuring, monitoring, and managing your business. John Wiley & Sons.

Elias, M., & Bezerianos, A. (2011). Exploration views: Understanding dashboard creation and customization for visualization novices. In Human-Computer Interaction–INTERACT 2011: 13th IFIP TC 13 International Conference (pp. 274-291). Springer Berlin Heidelberg.

Evergreen, S. D. (2019). Effective data visualization: The right chart for the right data. SAGE Publications.

Franklin, A., Gantela, S., Shifarraw, S., Johnson, T. R., Robinson, D. J., King, B. R., ... & Okafor, N. G. (2017). Dashboard visualizations: Supporting real-time throughput decision-making. Journal of Biomedical Informatics, 71, 211-221. https://doi.org/10.1016/j.jbi.2017.06.009

Gov UK (2018). Police use of force. https://www.data.gov.uk/dataset/44aba3b3-b9e4-4b0c-ad30-6ae2dcd9a9b9/police-use-of-force

Janes, A., Sillitti, A., & Succi, G. (2013). Effective dashboard design. Cutter IT Journal, 26(1), 17-24.

Keogh, A., Johnston, W., Ashton, M., Sett, N., Mullan, R., Donnelly, S., ... & Caulfield, B. (2020). “It’s not as simple as just looking at one chart”: A qualitative study exploring clinician’s opinions on various visualization strategies to represent longitudinal actigraphy data. Digital Biomarkers, 4(Suppl. 1), 87-99. https://doi.org/10.1159/000510670

Kirk, A. (2020). Data visualisation literacy–learning to see. Revista de Contabilidad y Dirección, 31, 37-48.

Knaflic, C. N. (2015). Storytelling with data: A data visualization guide for business professionals. John Wiley & Sons.

Manovich, L. (2011). What is visualisation?. Visual Studies, 26(1), 36-49. https://doi.org/10.1080/1472586X.2011.558741

ONS (2024). Crime in England and Wales: Police Force Area data tables. https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/datasets/policeforceareadatatables

Park, Y., & Jo, I. H. (2015). Development of the learning analytics dashboard to support students' learning performance. Journal of Universal Computer Science, 21(1), 110-133.

Platts, K., & Tan, K. H. (2004). Strategy visualisation: Knowing, understanding, and formulating. Management Decision, 42(5), 667-676. https://doi.org/10.1108/00251740410539676

Purchase, H. C. (2000). Effective information visualisation: A study of graph drawing aesthetics and algorithms. Interacting with Computers, 13(2), 147-162. https://doi.org/10.1016/S0953-5438(00)00021-4

Rosli, M. H. W., & Cabrera, A. (2015). Gestalt principles in multimodal data representation. IEEE Computer Graphics and Applications, 35(2), 80-87. https://doi.org/10.1109/MCGA.2015.33

Sedrakyan, G., Mannens, E., & Verbert, K. (2019). Guiding the choice of learning dashboard visualizations: Linking dashboard design and data visualization concepts. Journal of Computer Languages, 50, 19-38. https://doi.org/10.1016/j.jcl.2019.01.001

Tufte, E. R. (1997). Visual explanations: Images and quantities, evidence and narrative. Graphics Press.

Tufte, E., & Graves-Morris, P. (2014). The visual display of quantitative information. In Diagrammatik-Reader. Grundlegende Texte aus Theorie und Geschichte (pp. 219-230). De Gruyter.

Toasa, R., Maximiano, M., Reis, C., & Guevara, D. (2018, June). Data visualization techniques for real-time information—A custom and dynamic dashboard for analyzing surveys' results. In 2018 13th Iberian Conference on Information Systems and Technologies (CISTI) (pp. 1-7). IEEE. https://doi.org/10.23919/CISTI.2018.8399294

Ware, C. (2021). Visual thinking for information design. Morgan Kaufmann.

Yi, J. S., Kang, Y. A., Stasko, J. T., & Jacko, J. A. (2007). Toward a deeper understanding of the role of information visualization. IEEE Transactions on Visualization and Computer Graphics, 13(6), 1224-1231. https://doi.org/10.1109/TVCG.2007.70512

Yoghourdjian, V., Archambault, D., Diehl, S., Dwyer, T., Klein, K., Purchase, H. C., & Wu, H. Y. (2018). Exploring the limits of complexity: A survey of empirical studies on graph visualization. Visual Informatics, 2(4), 264-282. https://doi.org/10.1016/j.visinf.2018.10.002