# Effects of Filtering on Point Differential Visualizations

## Introduction

Basketball has become a data-driven sport, with coaches and fans reviewing statistics before, during, and after games. One of the most useful statistics in basketball is the point differential, which is calculated with the following formula:

$$ \text{Point Differential} = \text{Team} - \text{Opponent} $$

Here, Team represents the score of the team of interest and Opponent represents the score of the opposing team. If, for example, the team of interest scored 100 points and the opponent scored 93 points, then the game had a point differential of seven. If the team of interest scored 88 points and the opponent scored 90 points, then the game had a point differential of negative two. (Note: basketball games cannot end in a tie, so the point differential is never zero.)
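The calculation can be sketched in a few lines of Python. The function name `point_differential` is illustrative and is not part of the notebook's code; the scores are the two examples given above.

```python
def point_differential(team_score: int, opponent_score: int) -> int:
    """Return the team of interest's score minus the opponent's score."""
    return team_score - opponent_score


# The two example games described in the text
win = point_differential(100, 93)   # positive value: margin of victory
loss = point_differential(88, 90)   # negative value: margin of defeat
print(win, loss)
```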

Visualizations displaying point differentials provide a wealth of information, such as the outcome of a game and the margin of victory or defeat; however, they can be enhanced through the addition of other variables. One such variable is game location, which might be of interest to coaches and fans because it can affect a team's performance. Game location is organized into three categories: home, away, and neutral. For the purpose of this study, events occurring at the team of interest's arena were labeled as home games, events occurring at the opponent's arena were labeled as away games, and events occurring at any other location were labeled as neutral site games.
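The labeling rule can be expressed as a small function. This is only a sketch of the rule as described: the function name, the arena names, and the label strings `"Home"`, `"Away"`, and `"Neutral"` are assumptions made for illustration, and the dataset used in this project may encode location differently.

```python
def label_location(venue: str, team_arena: str, opponent_arena: str) -> str:
    """Classify a game as Home, Away, or Neutral based on where it was played."""
    if venue == team_arena:
        return "Home"
    if venue == opponent_arena:
        return "Away"
    # Any venue that belongs to neither team is a neutral site
    return "Neutral"


# Hypothetical arena names, used only to exercise the three branches
print(label_location("Arena A", "Arena A", "Arena B"))
print(label_location("Arena B", "Arena A", "Arena B"))
print(label_location("Arena C", "Arena A", "Arena B"))
```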

The goal of this project was to gain user feedback about a filter for game location when using a point differential visualization. The following evaluation questions were developed to guide the project:
- Is a filter for game location useful when analyzing point differential data?
- Will users utilize the filter when using the visualization tool? 

## Modules Used

In [1]:
import pandas as pd
import altair as alt
from tabulate import tabulate

## Data

The data used to create the visualization were retrieved from sports-reference.com and summarized the Colorado Buffaloes' 2020-2021 basketball season. The data were downloaded as a CSV file, stored in a GitHub repository, then loaded into Google Colaboratory using pandas.

In [2]:
# URL from GitHub repository
url = 'https://raw.githubusercontent.com/CJTAYL/data/main/cu_20_21.csv?token=GHSAT0AAAAAAB4X6E3OE7K7SYWMRFCDYWAQY5I4SPQ'

# Object containing data
team = pd.read_csv(url, index_col=0)

## Visualization and Key Features

The visualization was a bar graph created using Altair. The key features of the visualization were (a) a y-axis that displayed positive and negative values, (b) bars that were color-coded by game location, (c) a filter for game location, and (d) a tooltip that displayed the opponent, point differential, and game location.

Point differentials were communicated using position (i.e., position on the y-axis) and game locations were communicated using color-coding. Specifically, positive values (i.e., bars above 0 on the y-axis) indicated the margin of victory and negative values (i.e., bars below 0 on the y-axis) indicated the margin of defeat.

Away games were represented by blue bars, home games were represented by orange bars, and neutral site games were represented by pink bars. 

The filter for game location was activated through the legend on the visualization. To activate the filter, the user clicked on one of the three location categories in the legend. After activation, the bars in the selected category maintained their color and the bars in the other categories were shown with increased transparency.
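The effect of the filter on each bar can be sketched in plain Python. This is an illustration of the behavior, not Altair's internals: the function name `bar_opacity` and the label strings are assumptions, while the opacity values 1 and 0.1 are the ones passed to `alt.condition` in the visualization code.

```python
from typing import Optional


def bar_opacity(bar_location: str, selected_location: Optional[str]) -> float:
    """Return the opacity of a bar given the legend category currently selected.

    A value of None means nothing is selected, in which case every bar
    is drawn at full opacity.
    """
    if selected_location is None or bar_location == selected_location:
        return 1.0
    # Bars outside the selected category are faded rather than removed
    return 0.1


# Hypothetical sequence of game locations, with "Home" clicked in the legend
locations = ["Home", "Away", "Neutral", "Home"]
opacities = [bar_opacity(loc, "Home") for loc in locations]
print(opacities)
```

Fading the other bars instead of hiding them keeps the full season visible as context while drawing attention to the selected category.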

The tooltip was activated by hovering over an individual bar with the mouse.

The visualization is displayed below. 

In [7]:
# Code for filtering feature
selection = alt.selection_point(fields = ["LOCATION"], bind = "legend")

# Code for visualization
alt.Chart(team).mark_bar().encode(
    x = alt.X("DATE:T", axis = alt.Axis(labelAngle=45)),
    y = "POINT_DIFF:Q",
    color = "LOCATION:N",
    opacity = alt.condition(selection, alt.value(1), alt.value(.1)),
    tooltip=["OPP", "POINT_DIFF", "LOCATION"]
).properties(
    title="CU Basketball 2020 - 2021",
    width=800,
    height=400
).add_params(
    selection
)

## Evaluation

Qualitative methods were used to answer the evaluation questions. Three participants were recruited from a pool of family members and friends of the tool’s creator. The respondents’ demographic information is provided in the table below. 

In [4]:
# Dictionary with respondent demographic information
info = { 'Respondent': [1, 2, 3],
            'Age': [39, 37, 36],
            'Sex': ['Female', 'Male', 'Male'],
            'Race': ['White', 'White', 'White']}

# Create dataframe with respondent information
respondent_df = pd.DataFrame(info)

# Create table using tabulate
print(tabulate(respondent_df, headers= 'keys', showindex= False, tablefmt= 'pretty'))


+------------+-----+--------+-------+
| Respondent | Age |  Sex   | Race  |
+------------+-----+--------+-------+
|     1      | 39  | Female | White |
|     2      | 37  |  Male  | White |
|     3      | 36  |  Male  | White |
+------------+-----+--------+-------+


The respondents were emailed written directions along with hyperlinks to the tool and to a Google Forms document containing the evaluation questions. The directions asked the respondents to use the visualization for approximately 5 minutes, then complete the Google Forms document, which consisted of two statements rated on a 5-point Likert scale. The response options were 1: Strongly Disagree, 2: Disagree, 3: Neither Agree nor Disagree, 4: Agree, and 5: Strongly Agree.

The statements were: 

1.	I find the filtering feature useful. 
2.	If available, I would use the filtering feature when reviewing point differential data. 

The results of the survey are displayed below:


In [5]:
# Aggregated data from respondents
data = [['1. Strongly Disagree', 0], ['2. Disagree', 0], ['3. Neither Agree nor Disagree' ,0], ['4. Agree', 0], ['5. Strongly Agree', 100]]

# Creating dataframe of respondent data from first statement
df = pd.DataFrame(data, columns=['Response', 'Percent'])

# Graphing the results of the first statement
p1 = alt.Chart(df).mark_bar().encode(
    y = alt.Y('Response', title = ''),
    x = alt.X('Percent', title = 'Percent of Respondents')
).properties(
    title = 'Statement 1: I find the filtering feature useful.',
    width = 600
)

# Creating dataframe of respondent data from second statement
df2 = pd.DataFrame(data, columns=['Response', 'Percent'])

# Graphing the results of the second statement
p2 = alt.Chart(df2).mark_bar().encode(
    y = alt.Y('Response', title = ''),
    x = alt.X('Percent', title = 'Percent of Respondents')
).properties(
    title = 'Statement 2: If available, I would use the filtering feature when reviewing point differential data.',
    width = 600
)

# Stacking the two graphs vertically (in Altair, & concatenates vertically and | concatenates horizontally)
p1 & p2


All respondents rated both statements as “Strongly Agree”.
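The percentages plotted above were entered by hand. As a minimal sketch, they could instead be derived from the raw ratings. The function name `percent_by_response` and the `LIKERT_LABELS` mapping are illustrative, and the ratings list simply encodes the reported result that all three respondents chose "Strongly Agree".

```python
from collections import Counter

# Labels matching the response categories used in the charts
LIKERT_LABELS = {
    1: "1. Strongly Disagree",
    2: "2. Disagree",
    3: "3. Neither Agree nor Disagree",
    4: "4. Agree",
    5: "5. Strongly Agree",
}


def percent_by_response(ratings):
    """Return [label, percent] pairs for every Likert option, including unused ones."""
    if not ratings:
        raise ValueError("ratings must contain at least one response")
    counts = Counter(ratings)
    total = len(ratings)
    return [[label, 100 * counts[score] / total]
            for score, label in LIKERT_LABELS.items()]


# Three respondents, all of whom selected 5 (Strongly Agree)
ratings = [5, 5, 5]
summary = percent_by_response(ratings)
print(summary)
```

The resulting list has the same shape as the `data` list above, so it could be passed to `pd.DataFrame` with the same column names.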

## Conclusion

The results of the survey indicate that the filter for game location was well-liked and would be used if available. Although the results of the evaluation were encouraging, they should be interpreted with caution due to two limiting factors. First, only three individuals used the tool. Second, the respondents had a personal relationship with the tool's creator. Both factors may have introduced bias and limit the generality of the results. Future iterations of the project will address these limitations by including a larger, more diverse pool of independent respondents. If the results of the evaluation are replicated, an evaluation of the filter's effects on users' ability to interpret data quickly and accurately will be commissioned.