# Simple Sankey Diagrams with Plotly: An Example

Sankey diagrams visualize the flow of some quantity between stages: each link is drawn with a width proportional to the amount flowing along it. They were originally used to show heat flow in engines. Today they are used, for instance, to show how voters moved between parties from one election to the next.

For more background on Sankey diagrams, see the [Wikipedia article](https://en.wikipedia.org/wiki/Sankey_diagram).

The code below is based on the [Plotly Sankey diagram documentation](https://plotly.com/python/sankey-diagram/) and [this Towards Data Science article](https://towardsdatascience.com/sankey-diagram-basics-with-pythons-plotly-7a13d557401a).

The underlying data is ProPublica's [NYPD complaints data](https://www.propublica.org/datastore/dataset/civilian-complaints-against-new-york-city-police-officers) on civilian complaints against New York City police officers. I stored a copy on my Google Drive for this notebook.

In [2]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [1]:
# Mount Google Drive so that the CSV read below is available under /content/drive
from google.colab import drive
drive.mount('/content/drive')

In [3]:
df = pd.read_csv("/content/drive/My Drive/CCRB-Complaint-Data_202007271729/allegations_202007271729.csv")

In [4]:
# Sankey diagrams need a source list, a target list, and a value (flow size) for each link.
# To build these, summarize the data: how did each officer's rank change between the time
# the complaint was filed (rank_incident) and today (rank_now)?

# count() returns the number of rows (non-null unique_mos_id entries) per rank transition,
# not the number of distinct officers.
df_grouped = df[["rank_incident", "rank_now", "unique_mos_id"]].groupby(["rank_incident", "rank_now"]).count()
df_grouped = df_grouped.reset_index()
df_grouped

| | rank_incident | rank_now | unique_mos_id |
|---|---|---|---|
| 0 | Captain | Captain | 83 |
| 1 | Captain | Chiefs and other ranks | 41 |
| 2 | Captain | Deputy Inspector | 28 |
| 3 | Captain | Inspector | 30 |
| 4 | Chiefs and other ranks | Chiefs and other ranks | 2 |
| 5 | Deputy Inspector | Chiefs and other ranks | 23 |
| 6 | Deputy Inspector | Deputy Inspector | 24 |
| 7 | Deputy Inspector | Inspector | 49 |
| 8 | Detective | Captain | 3 |
| 9 | Detective | Chiefs and other ranks | 25 |

(Output truncated to the first 10 rows.)
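Because `count()` counts rows, the flow sizes above count allegations if, as the file name `allegations_….csv` suggests, each row is one allegation. If the goal is the number of distinct officers per rank transition, `nunique()` is the aggregation to use. The sketch below compares both on a small, made-up DataFrame (the toy values are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data: one row per allegation, so the same officer
# (unique_mos_id) can appear several times.
toy = pd.DataFrame({
    "unique_mos_id": [101, 101, 101, 202, 303],
    "rank_incident": ["Captain", "Captain", "Captain", "Captain", "Sergeant"],
    "rank_now": ["Inspector", "Inspector", "Inspector", "Captain", "Lieutenant"],
})

keys = ["rank_incident", "rank_now"]

# count(): number of rows (allegations) per rank transition
allegations = toy.groupby(keys)["unique_mos_id"].count().reset_index(name="n_allegations")

# nunique(): number of distinct officers per rank transition
officers = toy.groupby(keys)["unique_mos_id"].nunique().reset_index(name="n_officers")

flows = allegations.merge(officers, on=keys)
print(flows)
```

Officer 101 contributes three allegations but only one officer to the Captain to Inspector flow, so the two aggregations can give quite different link widths.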


In [6]:
# The source and target need to be provided in a numerical form via the following dictionary (rank in increasing order)
rank_dict = {
    "Police Officer": 0,
    "Detective": 1,
    "Sergeant": 2,
    "Lieutenant": 3,
    "Captain": 4,
    "Deputy Inspector": 5,
    "Inspector": 6,
    "Chiefs and other ranks": 7,
}

In [8]:
df_grouped["rank_incident_code"] = df_grouped["rank_incident"].apply(lambda x: rank_dict[x])
df_grouped["rank_now_code"] = df_grouped["rank_now"].apply(lambda x: rank_dict[x])
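An alternative to `.apply(lambda x: rank_dict[x])` is `Series.map(rank_dict)`. It produces the same codes for known ranks, but turns any rank string missing from the dictionary into `NaN` instead of raising a `KeyError`, which makes it easy to list the offending rows. A sketch on hypothetical rows ("Commissioner" is a made-up rank that is deliberately absent from `rank_dict`):

```python
import pandas as pd

# Same rank ordering as rank_dict in the notebook
rank_dict = {"Police Officer": 0, "Detective": 1, "Sergeant": 2, "Lieutenant": 3,
             "Captain": 4, "Deputy Inspector": 5, "Inspector": 6,
             "Chiefs and other ranks": 7}

# Hypothetical grouped rows; "Commissioner" is not a key of rank_dict
grouped = pd.DataFrame({
    "rank_incident": ["Captain", "Detective", "Sergeant"],
    "rank_now": ["Inspector", "Captain", "Commissioner"],
})

# map() looks each value up in the dict and yields NaN for unknown labels
grouped["rank_incident_code"] = grouped["rank_incident"].map(rank_dict)
grouped["rank_now_code"] = grouped["rank_now"].map(rank_dict)

# Rows where either code could not be mapped
unmapped = grouped[grouped[["rank_incident_code", "rank_now_code"]].isna().any(axis=1)]
print(unmapped)
```

Note that a column containing `NaN` becomes float, so the codes should be cast back to `int` once all ranks are mapped.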

In [10]:
source = df_grouped["rank_incident_code"].values.tolist()
target = df_grouped["rank_now_code"].values.tolist()
value = df_grouped["unique_mos_id"].values.tolist()
label = list(rank_dict.keys())

# data to dict, dict to sankey
link = dict(source = source, target = target, value = value)
node = dict(label = label, pad=50, thickness=5)
data = go.Sankey(link = link, node=node)
# plot
fig = go.Figure(data)
fig.show()
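The cell above passes Plotly three parallel lists: link `i` runs from node `source[i]` to node `target[i]` with width `value[i]`, and every node index must point into `label`. The helper below is a hypothetical sketch (not part of Plotly) that builds the same trace as a plain figure dictionary and checks these invariants first; a dict of this shape can be passed to `go.Figure`.

```python
def sankey_figure_dict(source, target, value, label, pad=50, thickness=5):
    """Build a plain-dict Plotly figure with a single Sankey trace.

    Link i runs from node source[i] to node target[i] with width value[i];
    node indices refer to positions in `label`.
    """
    if not (len(source) == len(target) == len(value)):
        raise ValueError("source, target and value must have the same length")
    n_nodes = len(label)
    for idx in list(source) + list(target):
        if not 0 <= idx < n_nodes:
            raise ValueError(f"node index {idx} has no label")
    return {
        "data": [{
            "type": "sankey",
            "node": {"label": list(label), "pad": pad, "thickness": thickness},
            "link": {"source": list(source), "target": list(target), "value": list(value)},
        }]
    }

# Hypothetical two-node example: one link of width 83 from node 0 to node 1
fig_dict = sankey_figure_dict([0], [1], [83], ["Captain", "Inspector"])
# With plotly installed: go.Figure(fig_dict).show()
```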

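While this shows what we want to see, it is not quite in the form I would like. Source and target share the same coding, so "Captain" at the time of the incident and "Captain" today are the same node, and a link from a rank to the same rank starts and ends at that node. To separate the two, shift the target codes by the number of ranks so that the current ranks get their own set of nodes: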
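The shift can be sketched in plain Python: with `n` labels, source nodes keep indices `0..n-1`, target nodes move to `n..2n-1`, and the label list is doubled so every index still has a name. The function name `split_source_target` and the shortened label list are hypothetical, for illustration only.

```python
def split_source_target(source, target, labels):
    """Give every target its own node by shifting target indices by len(labels).

    Source nodes keep indices 0..n-1, target nodes get n..2n-1, and the label
    list is doubled so that each index still has a label.
    """
    offset = len(labels)
    shifted_target = [t + offset for t in target]
    doubled_labels = list(labels) + list(labels)
    return shifted_target, doubled_labels

labels = ["Police Officer", "Detective", "Sergeant"]  # hypothetical, shortened
source = [0, 0, 2]
target = [0, 2, 2]  # 0 -> 0 and 2 -> 2 start and end at the same shared node

new_target, new_labels = split_source_target(source, target, labels)
print(new_target)  # [3, 5, 5]
print(new_labels)
```

After the shift no link starts and ends at the same node, and `new_labels[5]` is still "Sergeant", now standing for the current rank.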

In [12]:
df_grouped["rank_now_code"] = df_grouped["rank_now"].apply(lambda x: rank_dict[x] + len(rank_dict))  # offset by the number of ranks (8)

In [13]:
source = df_grouped["rank_incident_code"].values.tolist()
target = df_grouped["rank_now_code"].values.tolist()
value = df_grouped["unique_mos_id"].values.tolist()
label = list(rank_dict.keys())+list(rank_dict.keys()) #need to double the labels as well

# data to dict, dict to sankey
link = dict(source = source, target = target, value = value)
node = dict(label = label, pad=50, thickness=5)
data = go.Sankey(link = link, node=node)
# plot
fig = go.Figure(data)
fig.show()

That already looks quite nice! We see that almost no officer got a demotion (apart from a few Sergeants and Detectives).
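The demotion observation can also be checked numerically by keeping only flows whose current rank code is lower than the incident rank code. This relies on the ordering in `rank_dict` (for example, that "Detective" ranks above "Police Officer"). The rows below are hypothetical and use unshifted codes; on the notebook's `df_grouped` after the shift, compare against `rank_now_code - len(rank_dict)` instead.

```python
import pandas as pd

# Hypothetical rows shaped like df_grouped (codes follow rank_dict, unshifted)
grouped = pd.DataFrame({
    "rank_incident": ["Captain", "Detective", "Sergeant", "Sergeant"],
    "rank_now": ["Inspector", "Police Officer", "Sergeant", "Detective"],
    "rank_incident_code": [4, 1, 2, 2],
    "rank_now_code": [6, 0, 2, 1],
    "unique_mos_id": [30, 2, 50, 4],
})

# A demotion is a flow whose current rank code is below the incident rank code
demotions = grouped[grouped["rank_now_code"] < grouped["rank_incident_code"]]

# Fraction of the total flow (unique_mos_id counts) that goes to a lower rank
share = demotions["unique_mos_id"].sum() / grouped["unique_mos_id"].sum()
print(demotions[["rank_incident", "rank_now", "unique_mos_id"]])
print(f"share of flow that is a demotion: {share:.1%}")
```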

In [25]:
# to do: color
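One possible way to tackle the color to-do: Plotly's Sankey trace accepts a list of colors for the links (`link.color`, one entry per link). The hypothetical helper below colors each link by whether the rank code went up, stayed the same, or went down; the specific RGBA values are arbitrary choices. It expects unshifted codes, so with the notebook's shifted targets you would pass `df_grouped["rank_now_code"] - len(rank_dict)`.

```python
def link_colors(source_codes, target_codes,
                up="rgba(44, 160, 44, 0.5)",
                same="rgba(150, 150, 150, 0.4)",
                down="rgba(214, 39, 40, 0.6)"):
    """Color each link by whether the rank code went up, stayed, or went down.

    Expects unshifted rank codes (0..7 from rank_dict), not the +8 target codes.
    """
    colors = []
    for s, t in zip(source_codes, target_codes):
        if t > s:
            colors.append(up)      # promotion
        elif t == s:
            colors.append(same)    # same rank
        else:
            colors.append(down)    # demotion
    return colors

# Hypothetical links: Captain -> Inspector, Captain -> Captain, Sergeant -> Detective
colors = link_colors([4, 4, 2], [6, 4, 1])
# With plotly: link = dict(source=source, target=target, value=value, color=colors)
```

Semi-transparent link colors keep overlapping flows readable, which is why the sketch uses RGBA rather than opaque colors.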