# **SI649 W25 Altair Interaction Homework #4**

# Overview

We will create 4 visualizations about winners of the FIFA soccer World Cup.

This assignment is worth **10** points in total:
* 2 points for each visualization (x 4)
* 2 points for embedding all the visualizations in a single HTML file

For this lab, please write Altair code to answer the questions. It's fine if your visualization looks slightly different from the example (e.g., a value of 1.1 instead of 1.0, orange instead of red, or different titles, chart width/height, and mark size/opacity). The screenshots were taken in Visual Studio Code with dark mode turned on, so the color choices are based on this configuration. Please adapt the colors for readability if you use another mode, e.g., light mode.

When you are finished, upload your .ipynb notebook and HTML file to Canvas.

## Resources:
- Altair Interactive charts gallery: [https://altair-viz.github.io/gallery/index.html#interactive-charts](https://altair-viz.github.io/gallery/index.html#interactive-charts)

## General Hints: 
* We recommend that you finish all the static charts before adding interactions. 
* If you see duplicated axes, use `axis=None` to get rid of unnecessary axes.  
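* `resolve_scale` controls whether the charts in a compound chart share a scale (`'shared'`) or each get their own (`'independent'`); `resolve_axis` does the same for axes.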


In [99]:
# start with the setup

# suppress warnings about future deprecations
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import pandas as pd
pd.options.mode.chained_assignment = None

import altair as alt
import numpy as np
import pprint
import datetime as dt
from vega_datasets import data
import matplotlib.pyplot as plt

# Solve a javascript error by explicitly setting the renderer
alt.renderers.enable('jupyterlab')


RendererRegistry.enable('jupyterlab')

In [100]:
#load data
df1 = pd.read_csv("https://raw.githubusercontent.com/jfjelstul/worldcup/refs/heads/master/data-csv/goals.csv")
df2 = pd.read_csv("https://raw.githubusercontent.com/jfjelstul/worldcup/refs/heads/master/data-csv/tournaments.csv")
df3 = pd.read_csv("https://raw.githubusercontent.com/jfjelstul/worldcup/refs/heads/master/data-csv/matches.csv")

#Hint: The fields are described here: https://github.com/jfjelstul/worldcup/blob/master/codebook/csv/variables.csv

In [101]:
#Create List of World Cup (WC) winners

#Only consider WCs since 1950 and simplify DataFrame by removing and renaming columns
tournament_ids = ["WC-" + str(x) for x in range(1950,2023,4)] 
df_wc_winners = df2[df2["tournament_id"].isin(tournament_ids)][["tournament_id", "year", "winner"]].replace(['West Germany'],'Germany').reset_index(drop=True)

wc_winners = list(set(df_wc_winners["winner"].to_list()))
wc_winners.sort()

wc_winners

['Argentina',
 'Brazil',
 'England',
 'France',
 'Germany',
 'Italy',
 'Spain',
 'Uruguay']

In [102]:
#Simplify DataFrame on matches at WC tournaments by removing and renaming columns
match_cols = ["tournament_id", "match_id", "home_team_name","away_team_name","home_team_score","away_team_score", "home_team_win", "away_team_win", "draw"]
df4 = df3[df3["tournament_id"].isin(tournament_ids)][match_cols].replace(['West Germany'],'Germany')

In [103]:
#Prepare data for Part 1

#Create DataFrame with goals WC winners scored at WC matches since 1950
df5 = df4[df4["away_team_name"].isin(wc_winners)].rename(columns={'away_team_name': 'team', 'away_team_score': 'score'}).drop(columns=["home_team_name", "home_team_score"])
df6 = df4[df4["home_team_name"].isin(wc_winners)].rename(columns={'home_team_name': 'team', 'home_team_score': 'score'}).drop(columns=["away_team_name", "away_team_score"])
df7 = pd.concat([df5,df6]).sort_values(by=['tournament_id','match_id']).drop(columns=["home_team_win", "away_team_win", "draw"])
df_goals_winners_per_match = df7.merge(df_wc_winners, on="tournament_id")

df_goals_winners_per_match.head()

| | tournament_id | match_id | team | score | year | winner |
|---|---|---|---|---|---|---|
| 0 | WC-1950 | M-1950-01 | Brazil | 4 | 1950 | Uruguay |
| 1 | WC-1950 | M-1950-03 | England | 2 | 1950 | Uruguay |
| 2 | WC-1950 | M-1950-04 | Spain | 3 | 1950 | Uruguay |
| 3 | WC-1950 | M-1950-05 | Italy | 2 | 1950 | Uruguay |
| 4 | WC-1950 | M-1950-06 | Brazil | 2 | 1950 | Uruguay |


In [104]:
#Prepare data for Part 2

#Simplify DataFrame on WC matches by removing and renaming columns
df8 = df4[df4["away_team_name"].isin(wc_winners) & df4["home_team_name"].isin(wc_winners)].reset_index(drop=True)

#For the two halves of the heatmap, copy and add mirrored match data (i.e. away and home team are reversed)
df_wc_winner_matchups = pd.concat([df8,df8.rename(columns={'home_team_name': 'away_team_name', 'away_team_name': 'home_team_name', 
                                                                             'home_team_score': 'away_team_score', 'away_team_score': 'home_team_score', 
                                                                             'home_team_win': 'away_team_win', 'away_team_win': 'home_team_win'})])
#Add a year field
df_wc_winner_matchups['year'] = pd.to_numeric(df_wc_winner_matchups['tournament_id'].str.replace("WC-", ''), downcast='integer', errors='coerce')

df_wc_winner_matchups.head()

| | tournament_id | match_id | home_team_name | away_team_name | home_team_score | away_team_score | home_team_win | away_team_win | draw | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | WC-1950 | M-1950-13 | Spain | England | 1 | 0 | 1 | 0 | 0 | 1950 |
| 1 | WC-1950 | M-1950-18 | Uruguay | Spain | 2 | 2 | 0 | 0 | 1 | 1950 |
| 2 | WC-1950 | M-1950-19 | Brazil | Spain | 6 | 1 | 1 | 0 | 0 | 1950 |
| 3 | WC-1950 | M-1950-22 | Uruguay | Brazil | 2 | 1 | 1 | 0 | 0 | 1950 |
| 4 | WC-1954 | M-1954-20 | Uruguay | England | 4 | 2 | 1 | 0 | 0 | 1954 |


In [105]:
#Prepare data for Part 3 and 4

#Simplify DataFrame so it lists at which minute teams scored a goal (own goals are ignored in this exercise)
df_goals = df1[["goal_id", "tournament_id", "match_id", "team_name", "minute_label"]].copy()  # copy so the new column is not written to a slice of df1
df_goals['minute'] = pd.to_numeric(df_goals['minute_label'].str.replace("'", ''), downcast='integer', errors='coerce')
df_goals = df_goals.drop(columns=["minute_label"])

df_goals.head()

| | goal_id | tournament_id | match_id | team_name | minute |
|---|---|---|---|---|---|
| 0 | G-0001 | WC-1930 | M-1930-01 | France | 19.0 |
| 1 | G-0002 | WC-1930 | M-1930-01 | France | 40.0 |
| 2 | G-0003 | WC-1930 | M-1930-01 | France | 43.0 |
| 3 | G-0004 | WC-1930 | M-1930-01 | Mexico | 70.0 |
| 4 | G-0005 | WC-1930 | M-1930-01 | France | 87.0 |
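
The `minute` column above prints as floats (19.0 rather than 19). A likely reason is that some labels cannot be parsed as a plain number once the apostrophes are removed: `errors='coerce'` turns those into `NaN`, and a column containing `NaN` cannot be downcast to an integer type. The sketch below illustrates this with made-up labels; the stoppage-time format `45'+2'` is an assumption about the dataset, not something shown in this notebook.

```python
import pandas as pd

# Hypothetical minute labels; "45'+2'" stands in for a stoppage-time goal
labels = pd.Series(["19'", "45'+2'", "90'"])

# Same parsing steps as in the cell above
minutes = pd.to_numeric(labels.str.replace("'", ""), downcast="integer", errors="coerce")

# "45+2" is not a number, so it becomes NaN and the column stays float
print(minutes.isna().tolist())
print(minutes.dtype)
```

Rows with a missing `minute` are worth keeping in mind later, for example when filtering with `datum.minute != null` in Visualization 4.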


# Visualization 1: Goals of World Cup Winners since 1950

We will replicate the following visualization: <br>
<img src="https://raw.githubusercontent.com/grill/SI649-hw-interaction/main/line.png?raw=true" alt="drawing" width="500"/>




**Description of the visualization (static):**
*   Use *df_goals_winners_per_match* for this exercise
*   This visualization has 3 components: **bar chart**, **vertical line**, and **average value text** 
*   All 3 components share the same x axis, which displays the *average* number of goals per match of teams that won the World Cup since the 1950s.
*   The bar chart has a low opacity because we want to add interactions (see the next cell). 

**Description of the visualization (interactivity):**
1. When hovering over bars, the associated average score will show up as tooltips. <br>
<img src="https://raw.githubusercontent.com/grill/SI649-hw-interaction/main/line_tooltip.gif?raw=true" alt="drawing" width="500"/>
2. Brushing over the bars will change the opacity of the bars.
3. Brushing over the bars will generate different average score value lines. <br>
<img src="https://raw.githubusercontent.com/grill/SI649-hw-interaction/main/line_hover.gif?raw=true" alt="drawing" width="500"/>

**Sample style settings (optional):**
Here's a list of default style settings we used to generate the graph.
* Original opacity for the bar chart: 0.6. 
* Brushed opacity: 1
* Bar height = 15
* Vertical line size = 3, color = "firebrick"
* Text color='firebrick', fontSize=12, align='left', dx=7
* After building the compound chart, use the following line to disable the border: `.configure_view(strokeWidth=0)`

Hint:
* We recommend getting all static components working before writing any interactivity. 
* Add one interaction at a time and test whether or not it works. 
* To add an interaction other than a tooltip or zooming, you need four steps (review the in-class demo). 
* A selection is used in two scenarios: 1) inside a *condition*, which is used in `encode`; 2) inside `transform_filter`. In this visualization, you will implement both. Think through which one you will use where before trying to build this.

In [106]:
data_1 = df_goals_winners_per_match.copy()
data_1 = data_1[data_1['team'] == data_1['winner']]
data_1['record'] = data_1['year'].astype(str) + ' ' + data_1['winner']
data_1 = data_1.groupby(['record'])['score'].mean().reset_index()
data_1['score'] = data_1['score'].round(2)
data_1.head()

| | record | score |
|---|---|---|
| 0 | 1950 Uruguay | 3.75 |
| 1 | 1954 Germany | 4.17 |
| 2 | 1958 Brazil | 2.67 |
| 3 | 1962 Brazil | 2.33 |
| 4 | 1966 England | 1.83 |


In [107]:
brush = alt.selection_interval(encodings=['y'])
opacity_condition = alt.condition(brush, alt.value(1), alt.value(0.6))
bar = alt.Chart(data_1).mark_bar(height=15, opacity=0.6).encode(
    x= alt.X('score:Q', axis=alt.Axis(title='')),
    y= alt.Y('record:N', axis=alt.Axis(title='')),
    tooltip = 'score:Q',
    opacity = opacity_condition
).add_params(
    brush
)
line = alt.Chart(data_1).mark_rule(color="firebrick",size=3).encode(
    x = alt.X('score:Q', aggregate='average', axis=alt.Axis(title=''))
).transform_filter(
    brush
)

text = alt.Chart(data_1).mark_text(color="firebrick",fontSize=12,align='left',dx=7).encode(
    x = alt.X('score:Q', aggregate='average', axis=alt.Axis(title='')),
    text = alt.Text('score:Q', aggregate='average',format=".2f")
).transform_filter(
    brush
)

c1 = (bar + line + text).properties(
    title = alt.TitleParams(
        "Goals of World Cup Winners since 1950", 
    subtitle="Average Number of Goals per Match of Winning Teams",
    ),
    width=400,
    height=400
)
c1.save('c1.html')
c1


<VegaLite 5 object>

If you see this message, it means the renderer has not been properly enabled
for the frontend that you are using. For more information, see
https://altair-viz.github.io/user_guide/display_frontends.html#troubleshooting


# Visualization 2: Matchups of World Cup winners

We will replicate the following visualization: <br>
<img src="https://raw.githubusercontent.com/grill/SI649-hw-interaction/main/heat.png?raw=true" alt="drawing" width="500"/>

**Description of the visualization (static):**
*   Use *df_wc_winner_matchups* for this exercise
*   This visualization has 2 components: **heatmap** and **text charts** 
*   Look in the example gallery for inspiration for how to build the 1st and 2nd component (especially in sections "Simple Charts", "Advanced Calculations" and "Case Studies"): https://altair-viz.github.io/gallery/

**Description of the visualization (interactivity):**
1. When brushing over colored boxes in the heatmap, the associated text will be filtered/updated. <br>
<img src="https://raw.githubusercontent.com/grill/SI649-hw-interaction/main/heat_interaction.gif?raw=true" alt="drawing" width="500"/>

**Sample style settings (optional):**
Here's a list of default style settings we used to generate the graph.
* The range of the color scale is set to ['red', "white", 'green'] and domainMid is set to 0. 
* The label angle on the x axis of the heatmap is set to 50.
* There is a spacing of 5 between the concatenated text charts. Also, don't forget to share the y axis.
* The x scale of the text charts that display the ⚽ emoji is set to the domain -1 to 8.
* The text size of the ⚽ is set to 30, and the baselines of the right and left text charts are set to 'middle'.
* After building the compound chart, use the following line to disable the border: `.configure_view(strokeWidth=0)`


In [108]:
soccer_ball = "⚽"
df = df_wc_winner_matchups.copy()
df["matchup"] = df.apply(lambda row: "-".join(sorted([row["away_team_name"], row["home_team_name"]])),axis=1)
df["match_label"] = df["year"].astype(str) + " (" + df["home_team_score"].astype(str) + " : " + df["away_team_score"].astype(str) + ")"
def get_team1_name(row):
    return row["matchup"].split("-")[0]
def get_team2_name(row):
    return row["matchup"].split("-")[1]
def get_team1_score(row):
    return row["home_team_score"] if row["home_team_name"] == row["matchup"].split("-")[0] else row["away_team_score"]
def get_team2_score(row):
    return row["home_team_score"] if row["home_team_name"] == row["matchup"].split("-")[1] else row["away_team_score"]

df["team1_name"] = df.apply(get_team1_name, axis=1)
df["team2_name"] = df.apply(get_team2_name, axis=1)
df["team1_score"] = df.apply(get_team1_score, axis=1)
df["team2_score"] = df.apply(get_team2_score, axis=1)

df["match_label2"] = (df["year"].astype(str)+ " ("+ df["team1_score"].astype(str)+ " : "+ df["team2_score"].astype(str)+ ")")
df_grouped = df.groupby(
    ["matchup", "home_team_name", "away_team_name"]
).agg(
    home_wins=("home_team_win", "sum"),
    away_wins=("away_team_win", "sum"),
    draws=("draw", "sum")
).reset_index()
df_grouped["win_diff"] = - df_grouped["home_wins"] + df_grouped["away_wins"]

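# selection_point replaces the deprecated selection_single in Altair 5; empty=True selects everything when nothing is hovered
selection = alt.selection_point(fields=["matchup"], on="mouseover", empty=True)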
color = alt.Color("win_diff:Q",scale=alt.Scale(domainMid=0, range=["red", "white", "green"]),title="Win Ratio")

heatmap = alt.Chart(df_grouped).mark_rect().encode(
    x=alt.X("home_team_name:N", title="", sort="-y", axis=alt.Axis(labelAngle=50)),
    y=alt.Y("away_team_name:N", title="", sort="-x"),
    color=color,
    tooltip=["matchup"]
).add_params(selection).properties(width=400, height=400)

df = df.drop_duplicates(subset=["match_id"])

left_chart = alt.Chart(df).mark_text(align='left',baseline='middle',fontSize=30,text=soccer_ball,dx=-15).encode(
    y=alt.Y("match_label2:N", title="", axis=None),
    x=alt.X("team1_score:Q", scale=alt.Scale(domain=[8, -1]), title="score")
).transform_filter(selection).properties(width=150, height=300)

right_chart = alt.Chart(df).mark_text(align='right',baseline='middle',fontSize=30,text=soccer_ball,dx=15).encode(
    y=alt.Y("match_label2:N", title="", axis=alt.Axis(ticks=False,domain=False, labelAlign="right", labelPadding=20)),
    x=alt.X("team2_score:Q", scale=alt.Scale(domain=[-1, 8]), title="score")
).transform_filter(selection).properties(width=150, height=300)

bottom_chart = (
    left_chart | right_chart
).resolve_scale(y='shared').resolve_axis(y='shared')

df_title = df.drop_duplicates(subset=["matchup"]).copy()
df_title["title_text"] = df_title["matchup"].str.replace("-", " vs. ")

title_chart = alt.Chart(df_title).mark_text(fontSize=20,align='center',baseline='middle',color='white').encode(
    text=alt.Text("title_text:N")
).transform_filter(selection).properties(width=400, height=30)

bottom_combined = alt.vconcat(title_chart,bottom_chart)

final_chart = alt.vconcat(heatmap,bottom_combined).configure_view(strokeWidth=0)

c2 = final_chart.properties(title = alt.TitleParams("Matchups of World Cup Winners", ))
c2.save('c2.html')
c2





# Visualization 3: Timing of World Cup Goals

We will replicate the following visualization: <br>
<img src="https://raw.githubusercontent.com/grill/SI649-hw-interaction/main/linegoal.png?raw=true" alt="drawing" width="500"/>

**Description of the visualization (static):**
*   Use *df_goals* for this exercise
*   This visualization has 4 components: **line chart**, **vertical line**, **points** and **texts** 

**Description of the visualization (interactive):**
1. Enable zooming and panning along the x-axis. (The gif below only displays the line chart.) <br>
<img src="https://raw.githubusercontent.com/grill/SI649-hw-interaction/main/linegoal_zoom.gif?raw=true" alt="drawing" width="500"/>
2. Display a vertical line that moves with the mouse. This will require you to add an additional chart component (let's call it **vLine**). <br>
<img src="https://raw.githubusercontent.com/grill/SI649-hw-interaction/main/linegoal_moveline.gif?raw=true" alt="drawing" width="500"/>
3. Display the intersection of the **vLine** with the **line chart** as 1 circle (let's call this circle **intersection dot**). 
4. When hovering over this **intersection dot**, display *how many goals were scored in this minute* in a text label.   <br>
<img src="https://raw.githubusercontent.com/grill/SI649-hw-interaction/main/linegoal_points.gif?raw=true" alt="drawing" width="500"/>

**Sample style settings (optional):**
Here's a list of default style settings we used to generate the graph.

* line chart size = 2.5
* vLine: size=3, color="lightgray", initial opacity = 0 
* intersection dot: size=90
* text label: fontSize=14, align='left', dx=7
* x scale: domain 0 - 80

**Hint**


* We only want to enable zooming and panning along the x-axis.
*  There are multiple ways of implementing the **vLine**. Here is one of them: 
> 1) Use `mark_rule` to generate a line for every single data point and set these lines' opacity to 0.

> 2) When the mouse hovers over a line, display it by changing its opacity. 

*  The implementation of the **intersection dots** is similar to that of the **vLine**. Do you need a new selection/condition for the **intersection dots**?

In [109]:
# Zoom/pan: bind an interval selection to the x scale only
zoom = alt.selection_interval(bind='scales', encodings=['x'])
# Hover: select the minute nearest to the mouse; nothing is selected while the mouse is outside the chart
nearest = alt.selection_point(empty=False, nearest=True, on='mousemove', fields=['minute'], clear='mouseout')
base = alt.Chart(df_goals).transform_aggregate(count='count()', groupby=['minute'])
line = base.mark_line(strokeWidth=2.5).encode(
    x=alt.X('minute:Q', title="Minute"),
    y=alt.Y('count:Q', title="Goals", scale=alt.Scale(domain=[0,80]))
).add_params(zoom).properties(width=800, height=400)
vLine = base.mark_rule(color="lightgray", size=3).encode(
    x='minute:Q',
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_params(nearest)
intersection_dot = base.mark_circle(size=90).encode(
    x='minute:Q',
    y='count:Q'
).transform_filter(nearest)
text_label = base.mark_text(align='left', dx=7, fontSize=14, color='white').encode(
    x='minute:Q',
    y='count:Q',
    text=alt.Text('count:Q')
).transform_filter(nearest)

c3 = (line+vLine+intersection_dot+text_label).properties(title = alt.TitleParams("Timing of World Cup Goals"))
c3.save('c3.html')
c3





# Visualization 4: Zooming in on Goals of World Cup winners since 1950

We will replicate the following visualization: <br>
<img src="https://raw.githubusercontent.com/grill/SI649-hw-interaction/main/zoom.png?raw=true" alt="drawing" width="800"/>

**Description of the visualization (static):**
*   Use *df_goals* for this exercise
*   This visualization has 2 components: **scatter chart original** and **line chart zoomed in** 

**Description of the visualization (interactivity):**
1. Build drop-down selections for the home and away team. Theoretically, two teams will be shown at any given time. <br>
<img src="https://raw.githubusercontent.com/grill/SI649-hw-interaction/main/zoom_select.gif?raw=true" alt="drawing" width="400"/>
2. Brushing over the scatter chart will change the color of the points. <br>
<img src="https://raw.githubusercontent.com/grill/SI649-hw-interaction/main/zoom_mouse_select.gif?raw=true" alt="drawing" width="400"/>
3. Brushing over the scatter chart will filter the data to the associated time interval, which is then shown as a line chart. <br>
<img src="https://raw.githubusercontent.com/grill/SI649-hw-interaction/main/zoom_interaction.gif?raw=true" alt="drawing" width="800"/>


**Sample style settings (optional):**
Here's a list of default style settings we used to generate the graph.

* scatter chart size = 1
* line chart size = 2.5 
* Original color="lightgray"
* y scale: domain 0 - 15

**Hint**

* You will have 2 types of selection. One for the team selection and one for the time selection. Ensure that these two interactions work independently before merging them together. 

In [110]:
team_names = sorted(df_goals['team_name'].unique())
home_team_param = alt.param(
    value='Brazil',
    name="home_team",
    bind=alt.binding_select(options=team_names, name="Select Home Team")
)
away_team_param = alt.param(
    value='Spain',
    name="away_team",
    bind=alt.binding_select(options=team_names, name="Select Away Team")
)

base = alt.Chart(df_goals).transform_filter(
    "(datum.team_name == home_team || datum.team_name == away_team) && datum.minute != null"
)
scatter_data = base.transform_aggregate(
    goals='count()',
    groupby=['team_name','minute']
)
brush = alt.selection_interval(encodings=['x', 'y'])
scatter = (
    scatter_data.mark_circle(size=40)
    .encode(
        x=alt.X('minute:Q', scale=alt.Scale(domain=[0,120]), title='Minute'),
        y=alt.Y('goals:Q', scale=alt.Scale(domain=[0,15]), title='Goals'),
        color=alt.condition(brush, 'team_name:N', alt.value('lightgray'))
    )
    .add_params(home_team_param, away_team_param, brush)
    .properties(width=400, height=150)
)

line = (
    scatter_data.transform_filter(brush)
    .mark_line(strokeWidth=2.5)
    .encode(
        x=alt.X('minute:Q', title='Minute'),
        y=alt.Y('goals:Q', scale=alt.Scale(domain=[0,15]), title='Goals'),
        color=alt.Color('team_name:N', legend=alt.Legend(title='Teams'))
    )
    .properties(width=400, height=150)
)
c4 = (alt.hconcat(scatter, line).configure_view(strokeWidth=0).properties(title='Zooming in on Goals of World Cup winners since 1950'))
c4.save('c4.html')
c4



# Final part

Export all of your visualizations to HTML, then put them all into a single HTML file (as we covered in the lab this week).

Upload this .html file to Canvas, along with this notebook.
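
One possible way to do the combining step with only the Python standard library is sketched below. It is not necessarily the approach covered in the lab: the helper name `combine_html`, the iframe height, and the page title are all assumptions made for illustration. Each saved chart page is embedded through an `<iframe srcdoc=...>` attribute, so the result is a single file.

```python
import html
from pathlib import Path


def combine_html(chart_files, out_file, title="SI649 Homework 4"):
    """Embed each saved chart page in one HTML file (hypothetical helper)."""
    frames = []
    for name in chart_files:
        page = Path(name).read_text(encoding="utf-8")
        # The whole page goes into an attribute value, so it must be escaped
        frames.append(
            f'<iframe srcdoc="{html.escape(page, quote=True)}" '
            'style="width:100%;height:650px;border:0"></iframe>'
        )
    body = "\n".join(frames)
    document = (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body>\n{body}\n</body></html>\n"
    )
    Path(out_file).write_text(document, encoding="utf-8")
    return out_file
```

Calling `combine_html(["c1.html", "c2.html", "c3.html", "c4.html"], "hw4.html")` after the four `save` calls above would then write one page with the charts stacked vertically. The charts still load the Vega libraries from the web, so viewing the file requires an internet connection.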