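In [1]:
import pandas as pd
import dash
# Dash >= 2.0 ships dcc and html inside the dash package; the standalone
# dash_core_components / dash_html_components packages are deprecated.
from dash import dcc, html
from dash.dependencies import Input, Output, State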

First, we need to import our dataset:

In [2]:
box_scores = pd.read_csv('clean_boxscores.csv', index_col=0)
box_scores

,GAME_NUMBER,GAME_ID,TEAM_ID,TEAM_ABBREVIATION,PLAYER_ID,PLAYER_NAME,MIN,FGM,FGA,FG_PCT,...,DREB,REB,AST,STL,BLK,TO,PF,PTS,PLUS_MINUS,TIME_PLAYED
0,1,22300061,1610612747,LAL,1627752,Taurean Prince,29:53,6,8,75.0,...,2,3,1,0,1,1,0,18,-14,29.883333
1,1,22300061,1610612747,LAL,2544,LeBron James,29:01,10,16,62.5,...,7,8,5,1,0,0,1,21,7,29.016667
2,1,22300061,1610612747,LAL,203076,Anthony Davis,34:09,6,17,35.3,...,7,8,4,0,2,2,3,17,-17,34.150000
3,1,22300061,1610612747,LAL,1630559,Austin Reaves,31:20,4,11,36.4,...,4,8,4,2,0,2,2,14,-14,31.333333
4,1,22300061,1610612747,LAL,1626156,D'Angelo Russell,36:11,4,12,33.3,...,4,4,7,1,0,3,3,11,1,36.183333
5,1,22300061,1610612747,LAL,1629060,Rui Hachimura,14:38,3,10,30.0,...,1,3,0,0,0,0,2,6,-8,14.633333
6,1,22300061,1610612747,LAL,1629216,Gabe Vincent,22:18,3,8,37.5,...,0,1,2,1,0,2,3,6,-17,22.300000
7,1,22300061,1610612747,LAL,1629637,Jaxson Hayes,6:54,0,0,0.0,...,1,1,0,0,0,0,1,0,-7,6.900000
8,1,22300061,1610612747,LAL,1629629,Cam Reddish,17:38,2,4,50.0,...,2,4,0,0,1,0,2,7,7,17.633333
9,1,22300061,1610612747,LAL,1626174,Christian Wood,15:28,3,4,75.0,...,3,4,0,0,0,1,1,7,2,15.466667


We should organize this dataset so that it reflects its structure: each game has two teams, and each team has multiple players. We can use pandas' groupby() method to achieve this. However, groupby() only produces a DataFrame once we apply an aggregation, so we need to pick one. We use the first() aggregator, which returns the first non-null value of each column within each group. This loses nothing, because every group should contain exactly one row: the last grouping key is PLAYER_ID, and a player can appear at most once in any given game (GAME_ID).
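To make the "one row per group" assumption concrete, here is a minimal sketch on a few rows copied from the output above, with only a handful of columns kept. It checks that no (GAME_ID, PLAYER_ID) pair repeats before calling first(), so the aggregation is purely a reshape. The names `keys` and `formatted` are illustrative, not taken from the notebook.

```python
import pandas as pd

# Toy rows with values copied from the box score output above.
box_scores = pd.DataFrame({
    'GAME_NUMBER': [1, 1, 1],
    'GAME_ID': [22300061, 22300061, 22300061],
    'TEAM_ID': [1610612747, 1610612747, 1610612743],
    'PLAYER_ID': [2544, 203076, 203999],
    'PLAYER_NAME': ['LeBron James', 'Anthony Davis', 'Nikola Jokic'],
    'PTS': [21, 17, 29],
})

keys = ['GAME_NUMBER', 'GAME_ID', 'TEAM_ID', 'PLAYER_ID']

# first() is only a lossless reshape if no player appears twice in a game.
assert not box_scores.duplicated(subset=['GAME_ID', 'PLAYER_ID']).any()

formatted = box_scores.groupby(keys).first()

# Every original row survives, now addressed by a four-level MultiIndex.
print(len(formatted) == len(box_scores))  # True
print(formatted)
```

Note that groupby() sorts by its keys by default, which is why the DEN rows (TEAM_ID 1610612743) come before the LAL rows in the notebook output that follows.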

NOTE: we could drop the GAME_ID column and use GAME_NUMBER instead, but it's good practice to preserve the original identifiers. This data was extracted using the NBA API, and if we wanted to combine it with other data from that source, we would need GAME_ID to join the tables.
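As an illustration of why GAME_ID matters for joins, here is a sketch that merges box score rows with a game-level table keyed by GAME_ID. The `games` table and its `EXTRA_INFO` column are hypothetical placeholders standing in for other NBA API data, not real API output.

```python
import pandas as pd

# A few box score rows (values from the output above).
box_scores = pd.DataFrame({
    'GAME_NUMBER': [1, 1],
    'GAME_ID': [22300061, 22300061],
    'PLAYER_NAME': ['LeBron James', 'Nikola Jokic'],
    'PTS': [21, 29],
})

# HYPOTHETICAL game-level table. It shares GAME_ID with our data;
# GAME_NUMBER exists only in our own dataset, so it cannot be the join key.
games = pd.DataFrame({
    'GAME_ID': [22300061],
    'EXTRA_INFO': ['placeholder'],
})

# validate='many_to_one' raises a MergeError if GAME_ID repeats in `games`.
joined = box_scores.merge(games, on='GAME_ID', how='left', validate='many_to_one')
print(joined[['GAME_ID', 'PLAYER_NAME', 'EXTRA_INFO']])
```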

In [3]:
formatted_box_scores = box_scores.groupby(['GAME_NUMBER', 'GAME_ID', 'TEAM_ID', 'PLAYER_ID']).first()
formatted_box_scores

GAME_NUMBER,GAME_ID,TEAM_ID,PLAYER_ID,TEAM_ABBREVIATION,PLAYER_NAME,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,...,DREB,REB,AST,STL,BLK,TO,PF,PTS,PLUS_MINUS,TIME_PLAYED
1,22300061,1610612743,201599,DEN,DeAndre Jordan,0:00,0,0,0.0,0,0,0.0,0,...,0,0,0,0,0,0,0,0,0,0.000000
1,22300061,1610612743,202704,DEN,Reggie Jackson,24:04,3,8,37.5,2,5,40.0,0,...,3,3,1,1,0,2,0,8,11,24.066667
1,22300061,1610612743,203200,DEN,Justin Holiday,0:00,0,0,0.0,0,0,0.0,0,...,0,0,0,0,0,0,0,0,0,0.000000
1,22300061,1610612743,203484,DEN,Kentavious Caldwell-Pope,36:14,8,12,66.7,2,3,66.7,2,...,1,2,1,3,1,3,5,20,10,36.233333
1,22300061,1610612743,203932,DEN,Aaron Gordon,34:58,7,11,63.6,1,2,50.0,0,...,5,7,5,2,1,0,0,15,6,34.966667
1,22300061,1610612743,203999,DEN,Nikola Jokic,36:15,12,22,54.5,3,5,60.0,2,...,10,13,11,1,1,2,2,29,15,36.250000
1,22300061,1610612743,1627750,DEN,Jamal Murray,34:14,8,13,61.5,3,5,60.0,2,...,2,2,6,0,1,1,3,21,3,34.233333
1,22300061,1610612743,1629008,DEN,Michael Porter Jr.,30:07,5,13,38.5,2,9,22.2,0,...,10,12,2,2,0,0,1,12,12,30.116667
1,22300061,1610612743,1629618,DEN,Jalen Pickett,0:44,1,1,100.0,0,0,0.0,0,...,0,0,0,0,0,0,0,2,0,0.733333
1,22300061,1610612743,1630192,DEN,Zeke Nnaji,11:45,1,3,33.3,0,1,0.0,2,...,0,0,1,0,0,1,2,4,-3,11.750000


Now the dataset is formatted in a way that preserves chronology, by using GAME_NUMBER as the outermost index level, while also breaking the data down into a [game, team, player] hierarchy that should make it easier to navigate with our dashboard. Let's look at the data from the first game, for example:
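Here is a sketch of how each level of the hierarchy can be reached, assuming a four-row toy frame built from values in the box score output above. `xs()` selects by index level name, which is convenient when the dashboard knows a game number and a team but not the GAME_ID. The variable names are illustrative only.

```python
import pandas as pd

rows = pd.DataFrame({
    'GAME_NUMBER': [1, 1, 1, 1],
    'GAME_ID': [22300061] * 4,
    'TEAM_ID': [1610612743, 1610612743, 1610612747, 1610612747],
    'PLAYER_ID': [203999, 1627750, 2544, 203076],
    'TEAM_ABBREVIATION': ['DEN', 'DEN', 'LAL', 'LAL'],
    'PLAYER_NAME': ['Nikola Jokic', 'Jamal Murray', 'LeBron James', 'Anthony Davis'],
    'PTS': [29, 21, 21, 17],
})
formatted = rows.groupby(['GAME_NUMBER', 'GAME_ID', 'TEAM_ID', 'PLAYER_ID']).first()

# Game level: .loc on the outer label drops GAME_NUMBER from the index.
game_1 = formatted.loc[1]

# Team level: the team IDs in game 1, i.e. what a team picker could offer.
team_ids = game_1.index.get_level_values('TEAM_ID').unique().tolist()

# Player level: one team's players in one game, skipping the GAME_ID level.
lakers = formatted.xs((1, 1610612747), level=['GAME_NUMBER', 'TEAM_ID'])

print(team_ids)  # [1610612743, 1610612747]
print(lakers['PLAYER_NAME'].tolist())
```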

In [4]:
formatted_box_scores.loc[1]

GAME_ID,TEAM_ID,PLAYER_ID,TEAM_ABBREVIATION,PLAYER_NAME,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,...,DREB,REB,AST,STL,BLK,TO,PF,PTS,PLUS_MINUS,TIME_PLAYED
22300061,1610612743,201599,DEN,DeAndre Jordan,0:00,0,0,0.0,0,0,0.0,0,...,0,0,0,0,0,0,0,0,0,0.0
22300061,1610612743,202704,DEN,Reggie Jackson,24:04,3,8,37.5,2,5,40.0,0,...,3,3,1,1,0,2,0,8,11,24.066667
22300061,1610612743,203200,DEN,Justin Holiday,0:00,0,0,0.0,0,0,0.0,0,...,0,0,0,0,0,0,0,0,0,0.0
22300061,1610612743,203484,DEN,Kentavious Caldwell-Pope,36:14,8,12,66.7,2,3,66.7,2,...,1,2,1,3,1,3,5,20,10,36.233333
22300061,1610612743,203932,DEN,Aaron Gordon,34:58,7,11,63.6,1,2,50.0,0,...,5,7,5,2,1,0,0,15,6,34.966667
22300061,1610612743,203999,DEN,Nikola Jokic,36:15,12,22,54.5,3,5,60.0,2,...,10,13,11,1,1,2,2,29,15,36.25
22300061,1610612743,1627750,DEN,Jamal Murray,34:14,8,13,61.5,3,5,60.0,2,...,2,2,6,0,1,1,3,21,3,34.233333
22300061,1610612743,1629008,DEN,Michael Porter Jr.,30:07,5,13,38.5,2,9,22.2,0,...,10,12,2,2,0,0,1,12,12,30.116667
22300061,1610612743,1629618,DEN,Jalen Pickett,0:44,1,1,100.0,0,0,0.0,0,...,0,0,0,0,0,0,0,2,0,0.733333
22300061,1610612743,1630192,DEN,Zeke Nnaji,11:45,1,3,33.3,0,1,0.0,2,...,0,0,1,0,0,1,2,4,-3,11.75


You might notice that, because GAME_NUMBER begins at 1, the outermost level of our index is effectively 1-based rather than 0-based. If we wanted to change this, we could have subtracted 1 from box_scores['GAME_NUMBER'] before the groupby() operation. However, this shouldn't be necessary, since .loc selects rows by label rather than by position.
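If 0-based game numbers were preferred, the shift could be applied to a copy before grouping, as in this minimal sketch. The toy frame is made up for illustration (the game 2 rows in particular are invented), and only the GAME_NUMBER column matters here.

```python
import pandas as pd

# Toy rows; the game 2 entries are invented for illustration.
box_scores = pd.DataFrame({
    'GAME_NUMBER': [1, 1, 2, 2],
    'PLAYER_ID': [2544, 203999, 2544, 203999],
    'PLAYER_NAME': ['LeBron James', 'Nikola Jokic', 'LeBron James', 'Nikola Jokic'],
})

# assign() returns a new DataFrame, so box_scores keeps its 1-based numbers.
zero_based = box_scores.assign(GAME_NUMBER=box_scores['GAME_NUMBER'] - 1)
formatted = zero_based.groupby(['GAME_NUMBER', 'PLAYER_ID']).first()

print(formatted.index.get_level_values('GAME_NUMBER').unique().tolist())  # [0, 1]
print(formatted.loc[0])  # the first game, now under label 0
```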

Now that we have loaded the dataset, we can initialize our dashboard:

In [5]:
app = dash.Dash()

Next, we need to define the functionality of the dashboard. Ideally, we would have the date of each game and allow the user to first select a date, then select a game from the list of games played on that day. Unfortunately, our dataset does not include game dates. Instead, we will use a single dropdown menu listing each game's matchup, since the teams involved are the most informative detail when browsing box scores. Because two teams play each other multiple times in a season, we will also include the GAME_NUMBER to tell those games apart. After that, the user can select a collection of players from either team and compare their performance on a single stat of their choosing.
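One possible way to build the matchup labels is sketched below. It assumes the `box_scores` columns shown earlier and produces the list of `{'label': ..., 'value': ...}` dicts that `dcc.Dropdown` accepts as `options`. The helper name `game_options` and the "Game N: AAA vs. BBB" label format are hypothetical choices, not necessarily what this notebook goes on to use.

```python
import pandas as pd

# Toy rows with values from the box score output above.
box_scores = pd.DataFrame({
    'GAME_NUMBER': [1, 1, 1, 1],
    'GAME_ID': [22300061] * 4,
    'TEAM_ABBREVIATION': ['LAL', 'LAL', 'DEN', 'DEN'],
    'PLAYER_NAME': ['LeBron James', 'Anthony Davis', 'Nikola Jokic', 'Jamal Murray'],
})

def game_options(df):
    """Hypothetical helper: one dropdown option per game, labeled by matchup."""
    options = []
    for game_number, game in df.groupby('GAME_NUMBER'):
        # Sorted so the label does not depend on row order in the CSV.
        teams = sorted(game['TEAM_ABBREVIATION'].unique())
        options.append({
            'label': f"Game {game_number}: {' vs. '.join(teams)}",
            # Plain int so the value is JSON-serializable for Dash.
            'value': int(game_number),
        })
    return options

print(game_options(box_scores))
# [{'label': 'Game 1: DEN vs. LAL', 'value': 1}]
```

Because the abbreviations are sorted alphabetically, the label says nothing about home or away. The options could then be passed as, for example, `dcc.Dropdown(id='game-dropdown', options=game_options(box_scores))`, where the id is a made-up placeholder.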

Let's start with the dropdown menu that allows you to select a game: