# Week 6

## Part 1: Video Lectures and Reading

> *Exercises*: Explanatory data visualization. 
> * What are the three key elements to keep in mind when you design an explanatory visualization?
> * In the video I talk about (1) *overview first*,  (2) *zoom and filter*,  (3) *details on demand*. 
>   - Go online and find a visualization that follows these principles (don't use one from the video). 
>   - Explain how your chosen visualization achieves (1)-(3). It might be useful to use screenshots to illustrate your explanation.
> * Explain in your own words: How is explanatory data analysis different from exploratory data analysis?

## Part 2: Interactive visualizations with Bokeh

> **Announcement**
> * During this entire lecture, as always, we are going to work with the SF Crime Data. 
> * We will use data for the **period 2014-2024** (Jan 1st 2014 to Dec 31 2024).
> * We'll consider only the 10 focus crimes.

![Movie](https://raw.githubusercontent.com/suneman/socialdata2025/main/files/Week6_1.gif)


> ***Exercise***: Recreate a new version of the results from **Week 2** (with updated dates) as an interactive visualisation (shown in the gif). To complete the exercise, follow the steps below to create your own version of the dataviz.


### Data prep

A key step is to set up the data correctly, so for this one we'll be pretty strict about the steps. The workflow is:

1. Take the data for the period of 2014-2024 and group it by hour-of-the-day.
2. We would like to easily compare how the distributions of the crimes differ from each other, not their absolute numbers, so we will work on *normalized data*:
    * To normalise the data within a crime category, simply divide the count for each hour by the total number of crimes of that type. (To give a concrete example: in the `DRUG/NARCOTIC` category, divide the number of drug/narcotics counts in the 1st hour by the total number of drug/narcotics arrests, then divide the number of drug/narcotics counts in the 2nd hour by the total number of drug/narcotics arrests, and so on.)
    *  Your life will be easiest if you organize your dataframe as shown in [this helpful screenshot](https://github.com/suneman/socialdata2025/blob/main/files/W6_Part2_data.png).
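
The normalisation step above can be tried on a tiny table before running it on the full dataset. The sketch below uses a made-up, six-row `toy` dataframe (the rows are hypothetical, not SF data) and assumes the same `Hour` and `Category` column names as the real file. It only illustrates the group, pivot, and divide pattern.

```python
import pandas as pd

# Hypothetical toy incidents: one row per incident (not real SF data)
toy = pd.DataFrame({
    "Hour": [0, 0, 1, 1, 1, 2],
    "Category": ["ASSAULT", "ROBBERY", "ASSAULT", "ASSAULT", "ROBBERY", "ROBBERY"],
})

# Count incidents per (hour, category) and pivot categories into columns
counts = toy.groupby(["Hour", "Category"]).size().unstack("Category", fill_value=0)

# Divide every column by its own total, so each crime type sums to 1
fractions = counts.div(counts.sum(axis=0), axis=1)

print(fractions)
print(fractions.sum(axis=0))  # sanity check: one total per crime type
```

A quick look at `fractions.sum(axis=0)` is a cheap way to confirm that the division ran along the intended axis.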


In [9]:
import matplotlib.pyplot as plt
import pandas as pd
import os
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.io import output_notebook
from bokeh.palettes import Category10

# Loading the data 
data_path = os.path.abspath(os.path.join(os.pardir, "data"))
cleaned_data_path = os.path.join(data_path, "Police_Department_Incident_Reports_Complete.csv")
df = pd.read_csv(cleaned_data_path)

# Define focus crimes
focuscrimes = set([
    'WEAPON LAWS', 'PROSTITUTION', 'ROBBERY', 'BURGLARY', 'ASSAULT',
    'DRUG/NARCOTIC', 'LARCENY/THEFT', 'VANDALISM', 'VEHICLE THEFT', 'STOLEN PROPERTY'
])

# Filter data for focus crimes
df_focus = df[df['Category'].isin(focuscrimes)]

# Filter the data by year (2014 to 2024)
df_filtered = df_focus[(df_focus['Year'] >= 2014) & (df_focus['Year'] <= 2024)]

# Group by "Hour" and "Category", then pivot
counts_by_hour_cat = df_filtered.groupby(['Hour', 'Category']).size().unstack('Category', fill_value=0)

# Normalize each column (so each crime's column sums to 1)
normalized_by_crime = counts_by_hour_cat.div(counts_by_hour_cat.sum(axis=0), axis=1)

# Show the first few rows of the normalized table
print(normalized_by_crime.head())

Category   ASSAULT  BURGLARY  DRUG/NARCOTIC  LARCENY/THEFT  PROSTITUTION  \
Hour                                                                       
0         0.049173  0.054816       0.030810       0.043089      0.097418   
1         0.043368  0.036834       0.018128       0.023819      0.057143   
2         0.037827  0.047308       0.016143       0.015806      0.047160   
3         0.020558  0.056967       0.012731       0.012179      0.033391   
4         0.014968  0.056660       0.007842       0.009060      0.015835   

Category   ROBBERY  STOLEN PROPERTY  VANDALISM  VEHICLE THEFT  WEAPON LAWS  
Hour                                                                        
0         0.046937         0.048101   0.054332       0.040039     0.053924  
1         0.047697         0.027750   0.034215       0.022259     0.041757  
2         0.045445         0.027504   0.031800       0.018575     0.036115  
3         0.028045         0.022940   0.028029       0.015122     0.027156  
4    

In [28]:
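# Reset the index so that "Hour" becomes a column, not the index
df_bokeh = normalized_by_crime.reset_index()

# Convert each hour to a string so Bokeh treats the hours as categorical factors.
# The "Hour" column in the data source must hold the same strings as the FactorRange;
# numeric values on a categorical axis would not line up with the factors.
df_bokeh['Hour'] = df_bokeh['Hour'].astype(str)
hours_as_str = list(df_bokeh['Hour'])

# Convert DataFrame to a ColumnDataSource for Bokeh
source = ColumnDataSource(df_bokeh)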

# Create a figure with a FactorRange on the x-axis
p = figure(
    x_range=FactorRange(*hours_as_str),
    width=900,
    height=500,
    title="Normalized Crime Counts by Hour",
    toolbar_location='right'
)

# Identify which columns to plot (all except "Hour")
crime_columns = df_bokeh.columns.drop("Hour")
colors = Category10[len(crime_columns)]  # Choose a color palette

# Add vbars for each crime category.
# Bars are semi-transparent when active; muted_alpha=0.3 makes them fainter when muted
renderers = []
for i, crime in enumerate(crime_columns):
    r = p.vbar(
        x='Hour',        # Use the "Hour" column for x-axis
        top=crime,       # Use the crime column for the height
        width=3 / len(crime_columns),  # Narrow bars; all categories share the same x position and overlap
        source=source,
        legend_label=crime,
        color=colors[i % len(colors)],
        fill_alpha=0.5,
        muted_alpha=0.3, 
        alpha=0.6
    )
    renderers.append(r)

# Use the "mute" click policy so that clicking a legend item toggles the muted state
p.legend.click_policy = "mute"

# Format the axes
p.xaxis.axis_label = "Hour of the Day"
p.yaxis.axis_label = "Normalized Fraction"

# Display the plot in the notebook
output_notebook()
show(p)
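
If you would rather have the bars for different crimes sit next to each other within each hour instead of overlapping, one option is Bokeh's `dodge` transform. The sketch below is a minimal illustration on made-up numbers: `toy_source`, `toy_plot`, the two-colour list, and the 80% slot width are all assumptions chosen for the example, not part of the exercise solution.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import dodge

# Hypothetical toy fractions for three hours and two crime types
hours = ["0", "1", "2"]  # factors on a categorical axis must be strings
toy_source = ColumnDataSource(data={
    "Hour": hours,
    "ASSAULT": [0.5, 0.3, 0.2],
    "ROBBERY": [0.2, 0.3, 0.5],
})

toy_plot = figure(x_range=hours, width=500, height=300,
                  title="Toy example: dodged bars with a muting legend")

crimes = ["ASSAULT", "ROBBERY"]
toy_colors = ["#1f77b4", "#ff7f0e"]
bar_width = 0.8 / len(crimes)  # the bars together fill 80% of each hour slot

toy_renderers = []
for i, crime in enumerate(crimes):
    # Shift each crime's bars sideways so they do not overlap within a slot
    offset = (i - (len(crimes) - 1) / 2) * bar_width
    r = toy_plot.vbar(
        x=dodge("Hour", offset, range=toy_plot.x_range),
        top=crime,
        width=bar_width,
        source=toy_source,
        legend_label=crime,
        color=toy_colors[i],
        alpha=0.8,
        muted_alpha=0.2,  # used only while the legend entry is muted
    )
    toy_renderers.append(r)

# Clicking a legend entry fades that crime's bars instead of removing them
toy_plot.legend.click_policy = "mute"
```

To display it in a notebook, call `output_notebook()` followed by `show(toy_plot)`. With ten crime types the dodged bars become quite thin, so overlapping bars with a muting legend may still be the more readable choice.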


## Part 3: Narrative Dataviz

> *Exercise*: Answer a couple of questions about the paper.
> 
> * What is the *Oxford English Dictionary's* definition of a narrative?
> * What is your favorite visualization among the examples in section 3? Explain why in a few words.