# Applied Data Visualization – Homework 5
*https://www.dataviscourse.net/2023-applied/*


In this homework we will create charts using Vega-Altair. 



## Your Info and Submission Instructions

* *First name:* Kolton
* *Last name:* Hauck
* *Email:* kolton.hauck@utah.edu
* *UID:* u1019364



For your submission, please do the following things: 
* **rename the file to `HW5_lastname.ipynb`**
* **include all files that you need to run the homework, including the data file provided** 
* **don't use absolute paths; use a relative path to the same directory for referencing data**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Need for this homework
import altair as alt

plt.style.use('default')
# This next line tells jupyter to render the images inline
%matplotlib inline
import matplotlib_inline
# This renders your figures as vector graphics AND gives you an option to download a PDF too
matplotlib_inline.backend_inline.set_matplotlib_formats('svg', 'pdf')

# Part 1: Avalanche Calendar

In this assignment, we will create an interactive visualization using Vega-Altair that has two linked views:
1. A calendar-like heatmap (day x month) of the average number of avalanches by date, and
2. A bar chart of total avalanche count by year.

Chart requirements:
 - The bar chart should display the counts across all dates by default, but should be filtered to only the selected date (or dates) by clicking on the heatmap. 
 - When hovering over a date, a tooltip should appear with the date and the value (average number of avalanches on that date).

See the video below for an example of interaction:

![An example of output interactivity](calendar_example.gif)

Hints:
- Similar to HW 2, you will need to create a data set with *all valid dates*, not only those that appear in the data. We need to account for zeros!
- We highly recommend browsing the Vega-Altair example gallery for help: https://altair-viz.github.io/gallery/index.html
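
The first hint can be sketched in pandas as follows. This is an illustration under assumptions, not the required approach: the `toy` frame, the two-year date range, and the `DailyCount` name are made up for the example. The idea is to count avalanches per date, reindex onto every calendar day so missing dates become zeros, and only then average by month and day.

```python
import pandas as pd

# Hypothetical toy data: one row per reported avalanche
toy = pd.DataFrame({'Date': pd.to_datetime(
    ['2020-01-01', '2020-01-01', '2020-01-03', '2021-01-01'])})

# Count avalanches per date, then reindex onto every calendar day in the range
daily = toy.groupby('Date').size()
all_days = pd.date_range('2020-01-01', '2021-12-31', freq='D')
daily = daily.reindex(all_days, fill_value=0).rename('DailyCount')

# Average per (month, day) across years, with the zero days included
calendar = (
    daily.rename_axis('Date').reset_index()
    .assign(Month=lambda d: d['Date'].dt.month, Day=lambda d: d['Date'].dt.day)
    .groupby(['Month', 'Day'], as_index=False)['DailyCount'].mean()
)
```

In this toy data, January 1 averages 1.5 (two avalanches in 2020, one in 2021), while January 3 averages 0.5 because the 2021 zero is counted. Without the reindex step it would wrongly average to 1.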

In [30]:
# Read in data
avy_df = pd.read_csv('./avalanches.csv')

# Convert dates to the correct format
avy_df['Date'] = pd.to_datetime(avy_df['Date'])

# Filter out 2009, it's incomplete
avy_df['Year'] = avy_df['Date'].dt.year.astype('Int64')
avy_df = avy_df[avy_df['Year']>2009]

# Create day/month columns (Date is already a datetime at this point)
avy_df['Day'] = avy_df['Date'].dt.day
avy_df['Month'] = avy_df['Date'].dt.month_name()
avy_df['Year'] = avy_df['Date'].dt.year

In [28]:
# Example DataFrame (replace this with your actual DataFrame)
data = {
    'date': ['2023-01-05', '2023-03-15', '2023-07-20', '2023-12-10', '2023-11-25', '2023-04-28', '2023-01-21', '2023-03-15', '2023-09-20', '2023-12-10', '2023-11-25', '2023-02-28'],
}
df = pd.DataFrame(data)

# Convert the 'date' column to datetime
df['date'] = pd.to_datetime(df['date'])

# Extract day and month into separate columns
df['day'] = df['date'].dt.day
df['month'] = df['date'].dt.month_name()

average_count = df.groupby(['day', 'month']).groups
average_count

{(5, 'January'): [0], (10, 'December'): [3, 9], (15, 'March'): [1, 7], (20, 'July'): [2], (20, 'September'): [8], (21, 'January'): [6], (25, 'November'): [4, 10], (28, 'April'): [5], (28, 'February'): [11]}

In [59]:
alt.data_transformers.enable("vegafusion")

# Create the heatmap using Altair
heatmap = alt.Chart(avy_df).transform_aggregate(
    # Count avalanches per calendar date first
    DailyCount='count()',
    groupby=['Date']
).mark_rect().encode(
    # The aggregate keeps only its groupby fields, so the 'Day' column is gone;
    # derive day and month from 'Date' with time units instead
    x=alt.X('date(Date):O', title='Day'),
    y=alt.Y('month(Date):O', title='Month'),
    # Note: this averages only over dates that appear in the data (zeros not yet included)
    color='mean(DailyCount):Q',
    tooltip=[
        alt.Tooltip('monthdate(Date):T', title='Date', format='%b %e'),
        alt.Tooltip('mean(DailyCount):Q', title='Avg', format='.2f')
    ]
).properties(
    width=400,
    height=300
)

heatmap


alt.Chart(...)

In [31]:
alt.Chart(avy_df).mark_bar().encode(
    x='Year:O',
    y=alt.Y('count():Q', title='Count Avalanches')
)

# Part 2: Bubble Chart, Revisited

For this assignment we are recreating the Bubble Chart from Homework 3 but this time using exclusively Vega-Altair. 

A refresher on the requirements:
- Each `Discipline` bubble and label should be colored according to the `Sport` variable. You can pick your own colors, as long as they are discernible.
- Each bubble's size should depend on the number of gold medals awarded. (This can be calculated as the number of unique `Event`-`Gender` pairs in the data set.)
- There should be a label noting that the 1940 and 1944 Olympic Games were not held (due to World War II).

Plus an additional requirement:
- When hovering over a bubble, a tooltip with all the underlying data should appear.

![A bubble grid chart of medals for winter olympics](bubble_chart.svg)

We are giving you the code necessary to prepare the data set, since you have done that already for HW3. This is primarily an exercise in precise formatting using Vega-Altair.

Hints:
- Notice that we converted the `Year` variable to a datetime. Think about how you can leverage that in your encoding.
- There is a variety of ways to create an annotation box. One way to do it is to create a "dummy" DataFrame that you use to `.encode()` your `.mark_rect()` and `.mark_text()`. Another could be to use the `alt.datum()` command (see https://altair-viz.github.io/user_guide/encodings/index.html#datum-and-value).
- Check out `.mark_rect()` properties here: https://altair-viz.github.io/user_guide/marks/rect.html
- Again, we recommend the Vega-Altair example gallery for help: https://altair-viz.github.io/gallery/index.html

In [60]:
medals_df = pd.read_csv('./winter.csv')

# Convert Year to a datetime
medals_df['Year'] = medals_df['Year'].apply(lambda x: pd.to_datetime(f"{x}-01-01"))

# Concatenate Gender & Event to get unique gender/event variable
medals_df['Gender_Event'] = medals_df['Gender'] + medals_df['Event']

# Count the number of unique events in every year-discipline
medals_df_grouped = (
    medals_df
    .groupby(['Year', 'City', 'Sport', 'Discipline', 'Country', 'Medal'])
    .agg(Count = ('Gender_Event', 'nunique'))
    .reset_index()
)

In [129]:
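scatter = alt.Chart(medals_df).mark_point(filled=True).encode(
    # Temporal x-axis, so the 1940/1944 gap appears on the axis itself
    x=alt.X('year(Year):T', axis=alt.Axis(title='', domain=False, labelAngle=-90)),
    y=alt.Y('Discipline:O', axis=alt.Axis(title='', domain=False, ticks=False), sort=alt.SortField(field='Sport')),
    color=alt.Color('Sport:O', scale=alt.Scale(scheme='rainbow'), legend=None),
    # distinct() counts unique Gender-Event pairs rather than individual medal rows
    size=alt.Size('distinct(Gender_Event):Q', scale=alt.Scale(range=[25, 125]), legend=None)
)

# Dummy frame for the annotation box over the 1940 and 1944 games (bounds are approximate)
dummy = pd.DataFrame({
    'start': pd.to_datetime(['1938-01-01']),
    'end': pd.to_datetime(['1946-01-01']),
})

# Encode the box from real fields in `dummy`; '1948:O' is not a field in any data set
annotation = alt.Chart(dummy).mark_rect(fill='red', opacity=0.5).encode(
    x=alt.X('start:T'),
    x2=alt.X2('end'),
)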

layered = alt.layer(annotation, scatter)
layered.configure_axisY(
    orient='right'
).configure_axisX(
    orient='top'
).configure_view(
    stroke='transparent'
)

# Bonus: Bubble Chart Interactivity

For bonus points, add any non-trivial interactivity to the Bubble Chart from Part 2. See our second Altair lecture for inspiration.

Options include brushing with another linked view, a widget that modifies the chart's appearance, a widget that filters the underlying data, etc.

# Grading Scheme

* Part 1: 5 points
* Part 2: 5 points
* Bonus: 2 points