## Intro

The purpose of today's class is to explore data using **interactive visualizations**. Interactivity is a key part of modern dataviz. It lets the users of your visualizations get their own feel for the data, and it makes for richer visualizations, where the people who use your work can uncover more of the data by exploring.


In [1]:
import pandas as pd
pd.set_option('display.float_format', '{:.7f}'.format)
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import bokeh
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.io import show

## Part 1: Video Lectures and Reading

Starting this week, we'll be playing around with *explanatory data visualization*. Roughly speaking this means using data visualization to communicate your results to others. Thus, there are new things to think about. 

Until today we have worked with static data visualization. However, exploratory data analysis means being able to explore the multi-faceted nature of data, and *interactive dataviz* is a handy tool for doing just that! It lets you play with the data: Toggle the view. Zoom. Drag. Show more details. All those things. They are a key part of modern data visualization.

The video below provides context about these points.

We start with the video and then read a bit from a scientific article about types of explanatory dataviz. (*The video is from an old version of the class that used D3, so just ignore those parts.*)

[![IMAGE ALT TEXT HERE](https://img.youtube.com/vi/yHKYMGwefso/0.jpg)](https://www.youtube.com/watch?v=yHKYMGwefso)

> *Exercises*: Explanatory data visualization

> * What are the three key elements to keep in mind when you design an explanatory visualization?
><font face="Comic sans MS" size="3" color="blue"> (1) Start with the question: what is it that we want to communicate? (2) Allow exploration: making the visualization interactive makes our audience more engaged. (3) Know your readers: technical people and non-technical managers have different levels of understanding.</font>


> * In the video I talk about (1) *overview first*,  (2) *zoom and filter*,  (3) *details on demand*. 
>   - Go online and find a visualization that follows these principles (don't use one from the video). 
>   - Explain how your visualization achieves (1)-(3). It might be useful to use screenshots to illustrate your explanation.
> * Explain in your own words: How is explanatory data analysis different from exploratory data analysis?

## Part 2: Interactive visualizations with Bokeh



To really master interactive visualizations, you will need to work with JavaScript, especially [D3](https://d3js.org). Given the limited time available for this class, we can't squeeze that in. But luckily Python has some pretty good options for interactive visualizations. You can find a range of different options [here](https://mode.com/blog/python-interactive-plot-libraries/).

Today, we'll explore [`Bokeh`](https://docs.bokeh.org/en/latest/), which brings lots of nice interactive functionality to Python. To work with Bokeh, we first set up our system:

1. If you haven't installed it yet, please do so. You can simply follow [these steps](https://docs.bokeh.org/en/latest/docs/first_steps/installation.html).

2. To include Bokeh in your notebooks you can follow the [Bokeh: Using with Jupyter](https://docs.bokeh.org/en/latest/docs/user_guide/output/jupyter.html#jupyter) guide. Come back to this one when you need it.

3. We aim to give you a gentle start with Bokeh, so I am going to include more example code than usual in the following.
   * **HINT 1**: If you're not an experienced Python user, I recommend going to the [official user's guide](https://docs.bokeh.org/en/latest/docs/user_guide.html#userguide) and working through it. Start by clicking "Introduction" in the linked page. That page has a glossary, a section on output methods, stuff on settings, and interfaces that you can scroll through. The next page, *Basic Plotting*, is where the action is. Spend some time working through that.
   * **HINT 2**: And by "working through it", I mean copy, paste, and run the code in your own notebook. 
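
As a rough sketch of what the setup above leads to: once Bokeh is installed, a figure is built in plain Python and then handed to `show()`. In a notebook, calling `output_notebook()` once beforehand asks Bokeh to render inline below the cell. The toy data and the helper name `make_demo_plot` are made up for illustration.

```python
from bokeh.plotting import figure


def make_demo_plot():
    """Build a tiny line plot with a few interactive tools (toy data)."""
    p = figure(title="Demo", tools="pan,wheel_zoom,reset")
    p.line([1, 2, 3, 4], [3, 1, 4, 2], line_width=2)
    return p


demo = make_demo_plot()

# In a notebook, run these lines to render the plot inline:
# from bokeh.io import output_notebook, show
# output_notebook()
# show(demo)
```

The display calls are left as comments so that the snippet also runs outside Jupyter.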

Ok. Let's get started. First a general announcement on the data.

> **Announcement**
> * During this entire lecture, as always, we are going to work with the SF Crime Data. 
> * We will use data for the **period 2010-2017**.


Now, to get you in the mood here's a little gif to illustrate what the goal of this exercise is:

![Movie](https://github.com/suneman/socialdata2023/blob/main/files/week8_1.gif?raw=true)

If the gif isn't displaying on your system, you can download it [here](https://github.com/suneman/socialdata2023/blob/main/files/week8_1.gif) and display locally.

> ***Exercise***: Recreate the results from **Week 2** as an interactive visualisation (shown in the gif). To complete the exercise, follow the steps below to create your own version of the dataviz.

### Data prep

A key step is to set up the data right. So for this one, we'll be pretty strict about the steps. The workflow is

1. Take the data for the period of 2010-2017 and group it by hour-of-the-day.
2. We would like to be able to easily compare how the distributions of crimes differ from each other, rather than their absolute numbers, so we will work on *normalized data*:
    * To normalize the data within a crime category, you simply divide the count for each hour by the total number of crimes of that type. (To give a concrete example for the `ASSAULT` category: divide the number of assaults in the 1st hour by the total number of assaults, then divide the number of assaults in the 2nd hour by the total number of assaults, and so on.)
    * Your life will be easiest if you organize your dataframe as shown in [this helpful screenshot](https://github.com/suneman/socialdata2023/blob/main/files/W6_Part2_data.png).
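
The two steps above can be sketched on a toy dataframe. The column names `Year`, `Hour`, and `Category` are assumptions about how the data is organized, and the rows are invented purely to keep the arithmetic easy to check by hand.

```python
import pandas as pd

# Toy stand-in for the crime data; column names and rows are assumptions.
toy = pd.DataFrame({
    "Year": [2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018],
    "Hour": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
    "Category": ["ASSAULT"] * 5 + ["BURGLARY"] * 5,
})

# Step 1: keep 2010-2017 and count incidents per (hour, category).
in_period = toy[toy["Year"].between(2010, 2017)]
counts = (
    in_period.groupby(["Hour", "Category"])
    .size()
    .unstack("Category", fill_value=0)
)

# Step 2: divide each column by its total so every category sums to 1.
normalized = counts / counts.sum()
print(normalized)
```

Here `ASSAULT` has 1 incident in hour 0 and 3 in hour 1 within the period, so its normalized values are 0.25 and 0.75. Note that `fill_value=0` is a choice made for this sketch: without it, hours with no incidents for a category show up as `NaN` instead of 0.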

In [2]:
# Load the data that we will group by hour-of-the-day.
# Raw strings keep the backslashes in the Windows path from being read as escape sequences.
data_link = r'D:\DTU\SP 2023\Social Data Viz\Data'
df = pd.read_pickle(data_link + r'\ordered_Police_Department_Incident_Reports__Historical.pickle')

In [6]:
#df[(df['Category']=='BURGLARY') & (df['Hour']==1)].count

In [3]:
df_grouped = df.groupby(['Hour', 'Category']).size().unstack('Category')
df_grouped.head()

| Hour | ARSON | ASSAULT | BAD CHECKS | BRIBERY | BURGLARY | DISORDERLY CONDUCT | DRIVING UNDER THE INFLUENCE | DRUG/NARCOTIC | DRUNKENNESS | EMBEZZLEMENT | ... | SEX OFFENSES, NON FORCIBLE | STOLEN PROPERTY | SUICIDE | SUSPICIOUS OCC | TREA | TRESPASS | VANDALISM | VEHICLE THEFT | WARRANTS | WEAPON LAWS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 249.0 | 8878.0 | 160.0 | 42.0 | 3386.0 | 517.0 | 705.0 | 3686.0 | 817.0 | 636.0 | ... | 9.0 | 534.0 | 46.0 | 5008.0 | NaN | 550.0 | 6352.0 | 4695.0 | 3959.0 | 1087.0 |
| 1 | 237.0 | 8074.0 | 3.0 | 26.0 | 2186.0 | 380.0 | 647.0 | 2242.0 | 774.0 | 12.0 | ... | NaN | 379.0 | 43.0 | 2147.0 | NaN | 393.0 | 4451.0 | 3206.0 | 2696.0 | 815.0 |
| 2 | 259.0 | 7258.0 | 2.0 | 27.0 | 2358.0 | 341.0 | 642.0 | 1817.0 | 656.0 | 8.0 | ... | NaN | 317.0 | 30.0 | 1957.0 | 2.0 | 436.0 | 4210.0 | 2541.0 | 2245.0 | 712.0 |
| 3 | 255.0 | 3557.0 | 3.0 | 20.0 | 2412.0 | 216.0 | 286.0 | 1235.0 | 258.0 | 13.0 | ... | NaN | 246.0 | 23.0 | 1366.0 | NaN | 359.0 | 2819.0 | 1572.0 | 1739.0 | 439.0 |
| 4 | 215.0 | 2216.0 | 3.0 | 10.0 | 2143.0 | 158.0 | 106.0 | 900.0 | 132.0 | 13.0 | ... | NaN | 222.0 | 15.0 | 931.0 | NaN | 260.0 | 1879.0 | 1215.0 | 1325.0 | 308.0 |


In [5]:
df_normal = df_grouped.apply(lambda x: x / x.sum(), axis=0)
df_normal.head()

| Hour | ARSON | ASSAULT | BAD CHECKS | BRIBERY | BURGLARY | DISORDERLY CONDUCT | DRIVING UNDER THE INFLUENCE | DRUG/NARCOTIC | DRUNKENNESS | EMBEZZLEMENT | ... | SEX OFFENSES, NON FORCIBLE | STOLEN PROPERTY | SUICIDE | SUSPICIOUS OCC | TREA | TRESPASS | VANDALISM | VEHICLE THEFT | WARRANTS | WEAPON LAWS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0659079 | 0.0544135 | 0.1748634 | 0.0539153 | 0.0380573 | 0.0525513 | 0.1267986 | 0.0316797 | 0.0844357 | 0.2164001 | ... | 0.2093023 | 0.0477084 | 0.0361351 | 0.0647095 | NaN | 0.0294417 | 0.0567482 | 0.0376326 | 0.0403017 | 0.0532922 |
| 1 | 0.0627316 | 0.0494858 | 0.0032787 | 0.0333761 | 0.0245698 | 0.0386257 | 0.1163669 | 0.0192691 | 0.0799917 | 0.004083 | ... | NaN | 0.0338604 | 0.0337785 | 0.0277419 | NaN | 0.0210374 | 0.0397649 | 0.0256975 | 0.0274447 | 0.0399569 |
| 2 | 0.0685548 | 0.0444845 | 0.0021858 | 0.0346598 | 0.026503 | 0.0346615 | 0.1154676 | 0.0156164 | 0.0677966 | 0.002722 | ... | NaN | 0.0283213 | 0.0235664 | 0.0252869 | 0.1428571 | 0.0233392 | 0.0376118 | 0.0203673 | 0.0228536 | 0.0349071 |
| 3 | 0.067496 | 0.021801 | 0.0032787 | 0.0256739 | 0.02711 | 0.0219557 | 0.0514388 | 0.0106143 | 0.0266639 | 0.0044233 | ... | NaN | 0.021978 | 0.0180676 | 0.0176504 | NaN | 0.0192174 | 0.0251847 | 0.0126003 | 0.0177026 | 0.0215228 |
| 4 | 0.0569084 | 0.0135819 | 0.0032787 | 0.012837 | 0.0240865 | 0.0160602 | 0.0190647 | 0.0077351 | 0.013642 | 0.0044233 | ... | NaN | 0.0198338 | 0.0117832 | 0.0120297 | NaN | 0.0139179 | 0.0167868 | 0.0097388 | 0.0134882 | 0.0151003 |


1. First, let's convert our **Pandas Dataframe** to **Bokeh ColumnDataSource**: 

In [7]:
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource, FactorRange, Legend
from bokeh.palettes import Category20

# Create a list of strings representing the hours of the day (0 to 23)
# and use it to create the FactorRange for the categorical x-axis.
hours = [str(i) for i in range(24)]
factors = FactorRange(factors=hours)

# Create a ColumnDataSource from df_normal. The "Hour" index is converted to
# strings so that its values match the factors of the categorical x-axis.
plot_data = df_normal.copy()
plot_data.index = plot_data.index.astype(str)
source = ColumnDataSource(plot_data)

# List the crime types that will be plotted. A list (rather than a set)
# keeps the colors and the legend order the same on every run.
focuscrimes = ['WEAPON LAWS', 'PROSTITUTION', 'DRIVING UNDER THE INFLUENCE', 'ROBBERY', 'BURGLARY', 'ASSAULT', 'DRUNKENNESS',
               'DRUG/NARCOTIC', 'TRESPASS', 'LARCENY/THEFT', 'VANDALISM', 'VEHICLE THEFT', 'STOLEN PROPERTY', 'DISORDERLY CONDUCT']

# Create a figure object with axis labels and interactive tools.
p = figure(title="Hourly Crimes", x_axis_label="hour", y_axis_label="Distribution of crime",
           x_range=factors, tools="pan,wheel_zoom,box_zoom,reset", toolbar_location="above")

# Pick one color per crime type.
colors = Category20[len(focuscrimes)]

# Create an empty dictionary to store the bar renderers
# and an empty list to store the legend items.
bar = {}
items = []

# Iterate through each crime type and create a corresponding set of bars.
for index, crime in enumerate(focuscrimes):
    # Add the bars to the figure; they start out muted (almost invisible).
    bar[crime] = p.vbar(x="Hour", top=crime, source=source, bottom=0, width=0.6,
                        muted=True, muted_alpha=0.01,
                        line_color="black", fill_color=colors[index], alpha=0.75)

    # Add the bars to the legend items list.
    items.append((crime, [bar[crime]]))

# Create a legend using the items list and place it to the right of the plot.
legend = Legend(items=items, location=(0, -15))
p.add_layout(legend, 'right')

# Set the size of the plot (plot_width and plot_height were removed in Bokeh 3)
# and make a click on a legend entry mute or unmute the corresponding bars.
p.width = 600
p.height = 500
p.legend.click_policy = "mute"

# Specify the output file before displaying the plot.
output_file("Focuscrime.html")

# Display the plot.
show(p)
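
To see what the conversion step in the cell above does, here is a minimal sketch on an invented two-row dataframe (the name `toy_normal` and its values are made up). When a dataframe is passed to `ColumnDataSource`, the named index becomes a regular column, which is why glyphs can refer to it as `x="Hour"` and to each crime type by its column name, spaces included.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Invented stand-in for df_normal: two hours, two crime types.
toy_normal = pd.DataFrame(
    {"ASSAULT": [0.25, 0.75], "WEAPON LAWS": [0.5, 0.5]},
    index=pd.Index(["0", "1"], name="Hour"),
)
toy_source = ColumnDataSource(toy_normal)

# The index shows up as a column next to the crime categories.
column_names = sorted(toy_source.data.keys())
hour_values = list(toy_source.data["Hour"])
print(column_names)
print(hour_values)
```

Inspecting `source.data` like this is a quick way to check that the column names you pass to `vbar` actually exist.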

## Part 3: Narrative Dataviz

Let's finish up with some reading.

*Reading*: [Narrative Visualization: Telling Stories with Data](http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf) by Edward Segel and Jeffrey Heer. We'll read sections 1-3 today. (And the rest a bit later.)

When you get to section 3 it's fun to open up the examples mentioned by the authors in a browser and explore them as you read the text. 

> *Exercise*: Answer a couple of questions about the paper.
> 
> * What is the *Oxford English Dictionary's* definition of a narrative?
> * What is your favorite visualization among the examples in section 3? Explain why in a few words.