# Week 6 - exercises

## Part 1: Video Lectures and Reading
### Exercises: Explanatory data visualization. <font color=gray>It's OK to use LLMs here if you can</font>.
### What are the three key elements to keep in mind when you design an explanatory visualization?
1. Start with a question: What is it you want to communicate?
2. Allow exploration
3. Know your readers (design for your audience)
### In the video I talk about (1) *overview first*,  (2) *zoom and filter*,  (3) *details on demand*. 
* ##### Go online and find a visualization that follows these principles (don't use one from the video).     
https://pudding.cool/2018/02/waveforms/
* ##### Explain how your visualization achieves (1)-(3). It might be useful to use screenshots to illustrate your explanation.
**1) Overview first** is achieved by the intro text, which introduces the main concept: exploring sound waves.     

![](sinewave1.png)    

**2) Zoom and filter** is achieved by letting the user change the amplitude and frequency of the sine wave, with both a visual and an audio demonstration of the effect on the wave.    

![](sinewave2.png)    
![](sinewave4.png)    

### Explain in your own words: How is explanatory data analysis different from exploratory data analysis?
Exploratory Data Analysis is the initial step in data analysis, where you're essentially getting to know the data. You're exploring it without having specific questions in mind, looking for patterns, outliers, or anomalies to understand its structure, components, and relationships.     
Explanatory Data Analysis, on the other hand, is more about communicating your findings from the data, often after hypotheses have been tested or models have been built. It's the process of using the data to tell a story, to explain the insights and findings you've discovered during or after the exploratory phase.    

## Part 2: Interactive visualizations with Bokeh

In [1]:
#%pip install bokeh

In [11]:
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.io import output_notebook

In [28]:
df = pd.read_csv("../data/Police_Department_Incident_Reports__Historical_2003_to_May_2018_20240129.csv")
focuscrimes = {'WEAPON LAWS', 'PROSTITUTION', 'DRIVING UNDER THE INFLUENCE', 'ROBBERY', 'BURGLARY', 'ASSAULT', 'DRUNKENNESS', 'DRUG/NARCOTIC', 'TRESPASS', 'LARCENY/THEFT', 'VANDALISM', 'VEHICLE THEFT', 'STOLEN PROPERTY', 'DISORDERLY CONDUCT'}
focuscrimes = pd.Series(sorted(focuscrimes))
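
The next cell counts incidents per category and hour, then divides each count by its category's total so that every crime type becomes a distribution over the 24 hours. A minimal sketch of that count-then-normalize pattern is shown here on a made-up toy table (the rows and the names `toy`, `counts`, `shares` are hypothetical and not taken from the SF dataset):

```python
import pandas as pd

# Hypothetical toy incidents (not from the SF dataset): one row per incident.
toy = pd.DataFrame({
    "Category": ["ASSAULT", "ASSAULT", "ASSAULT", "ASSAULT", "ROBBERY", "ROBBERY"],
    "Hour":     [0, 0, 1, 2, 0, 2],
})

# Count incidents per (Category, Hour) pair.
counts = toy.groupby(["Category", "Hour"]).size().reset_index(name="Count")

# Divide each count by its category's total, so every category sums to 1.
totals = counts.groupby("Category")["Count"].transform("sum")
counts["Normalized"] = counts["Count"] / totals

# Hours as rows, categories as columns; (Category, Hour) pairs with no
# incidents are absent from `counts` and therefore show up as NaN.
shares = counts.pivot(index="Hour", columns="Category", values="Normalized")

# Sanity check: each column should sum to 1 (sum() skips NaN by default).
column_sums = shares.sum()
print(shares)
print(column_sums)
```

If a category has no incidents in some hour, the pivot leaves a NaN there; `shares.fillna(0)` would be one way to turn those into explicit zeros before plotting.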

In [33]:
# df and focuscrimes are defined in the cells above

# Ensure the 'Date' column is in datetime format
df['Date'] = pd.to_datetime(df['Date'])

df = df[df['Category'].isin(focuscrimes)]

# Filter the DataFrame for the years 2010 to 2017
# (.copy() avoids a SettingWithCopyWarning when the 'Hour' column is added below)
df = df[df['Date'].dt.year.between(2010, 2017)].copy()

# Assuming 'Time' is a string like 'HH:MM', we convert it to a datetime format and then extract the hour
df['Hour'] = pd.to_datetime(df['Time'], format='%H:%M').dt.hour

# Group by 'Category' and 'Hour' and count the incidents
grouped = df.groupby(['Category', 'Hour']).size().reset_index(name='Count')

# Calculate the total number of incidents for each 'Category'
category_totals = grouped.groupby('Category')['Count'].transform('sum')

# Normalize the counts
grouped['Normalized'] = grouped['Count'] / category_totals

# Pivot the table to get Categories as columns and Hours as rows
pivot_table = grouped.pivot(index='Hour', columns='Category', values='Normalized')

pivot_table

| Hour | ASSAULT | BURGLARY | DISORDERLY CONDUCT | DRIVING UNDER THE INFLUENCE | DRUG/NARCOTIC | DRUNKENNESS | LARCENY/THEFT | PROSTITUTION | ROBBERY | STOLEN PROPERTY | TRESPASS | VANDALISM | VEHICLE THEFT | WEAPON LAWS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.055468 | 0.040191 | 0.052282 | 0.121869 | 0.035064 | 0.080276 | 0.039479 | 0.129656 | 0.056201 | 0.044247 | 0.027969 | 0.054945 | 0.035913 | 0.054413 |
| 1 | 0.049745 | 0.027653 | 0.038354 | 0.114539 | 0.020654 | 0.077235 | 0.025431 | 0.095748 | 0.060538 | 0.033786 | 0.021191 | 0.038576 | 0.024113 | 0.039486 |
| 2 | 0.044837 | 0.031432 | 0.032569 | 0.098656 | 0.016746 | 0.07014 | 0.015607 | 0.060436 | 0.061111 | 0.029686 | 0.025391 | 0.035994 | 0.018234 | 0.032891 |
| 3 | 0.023267 | 0.032765 | 0.018642 | 0.047954 | 0.012489 | 0.027367 | 0.009971 | 0.036367 | 0.037957 | 0.023325 | 0.021382 | 0.026022 | 0.011841 | 0.022737 |
| 4 | 0.014025 | 0.029379 | 0.014999 | 0.01741 | 0.009279 | 0.014393 | 0.006543 | 0.019501 | 0.023943 | 0.020356 | 0.015559 | 0.017797 | 0.010011 | 0.016662 |
| 5 | 0.011857 | 0.025644 | 0.06021 | 0.01069 | 0.005284 | 0.005068 | 0.006631 | 0.01019 | 0.019642 | 0.016398 | 0.038851 | 0.01467 | 0.009991 | 0.007377 |
| 6 | 0.015573 | 0.022892 | 0.119777 | 0.012523 | 0.010982 | 0.009122 | 0.010058 | 0.008433 | 0.017849 | 0.017246 | 0.075601 | 0.017059 | 0.015274 | 0.011542 |
| 7 | 0.022212 | 0.032284 | 0.101564 | 0.008552 | 0.024759 | 0.020069 | 0.015137 | 0.005095 | 0.016559 | 0.024314 | 0.07436 | 0.021645 | 0.022757 | 0.021609 |
| 8 | 0.033324 | 0.048688 | 0.074566 | 0.008247 | 0.0324 | 0.017231 | 0.025859 | 0.004919 | 0.019749 | 0.02799 | 0.064147 | 0.031377 | 0.032932 | 0.024212 |
| 9 | 0.035732 | 0.044997 | 0.053139 | 0.013134 | 0.038252 | 0.021488 | 0.031942 | 0.002811 | 0.02233 | 0.031807 | 0.05651 | 0.029629 | 0.033159 | 0.031849 |
