In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats
from scipy.stats import binom

In [2]:
import sportsdataverse.nba

One common explanation for the Lakers' underperformance is that injuries plagued the team and kept it from reaching its full potential. LeBron himself has been quoted as saying, "The reason we were not very good together is we weren’t on the damn floor together." In the quote, "we" refers to three players in particular: LeBron James, Anthony Davis, and Russell Westbrook. We will examine the relative performance of the Lakers with and without key combinations of players to determine whether certain player absences had a significant effect on the team's performance.

This time, since we need up-to-date individual game data from the 2022 season, we use data provided by sportsdataverse, a Python package for working with sports data. The documentation is available at https://py.sportsdataverse.org/.

In [None]:
nba_df = sportsdataverse.nba.load_nba_player_boxscore(seasons=range(2022, 2023))

In [4]:
nba_df.tail()

Unnamed: 0,athlete_display_name,team_short_display_name,min,fg,fg3,ft,oreb,dreb,reb,ast,...,athlete_position_abbreviation,team_name,team_logo,team_id,team_abbreviation,team_color,game_id,season,season_type,game_date
27501,Stephen Curry,Warriors,40,10-27,6-17,3-4,0,7,7,5,...,PG,Warriors,https://a.espncdn.com/i/teamlogos/nba/500/gs.png,9,GS,003da5,401434775,2022,3,2022-05-14
27502,Klay Thompson,Warriors,42,11-22,8-14,0-0,1,7,8,2,...,SG,Warriors,https://a.espncdn.com/i/teamlogos/nba/500/gs.png,9,GS,003da5,401434775,2022,3,2022-05-14
27503,Nemanja Bjelica,Warriors,8,0-1,0-1,0-0,2,0,2,2,...,PF,Warriors,https://a.espncdn.com/i/teamlogos/nba/500/gs.png,9,GS,003da5,401434775,2022,3,2022-05-14
27504,Damion Lee,Warriors,11,1-2,1-2,0-0,2,0,2,0,...,SG,Warriors,https://a.espncdn.com/i/teamlogos/nba/500/gs.png,9,GS,003da5,401434775,2022,3,2022-05-14
27505,Jordan Poole,Warriors,24,4-15,2-11,2-3,0,3,3,2,...,SG,Warriors,https://a.espncdn.com/i/teamlogos/nba/500/gs.png,9,GS,003da5,401434775,2022,3,2022-05-14


Having loaded the NBA player box score data for the 2022 season into a pandas DataFrame, we now drop the columns we won't be working with.

In [5]:
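# .copy() makes the column subset an independent DataFrame, so later column assignments don't trigger SettingWithCopyWarning.
nba_df = nba_df[['athlete_display_name', 'min', 'team_name', 'pts', 'season', 'game_date', 'game_id']].copy()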
nba_df.tail()

Unnamed: 0,athlete_display_name,min,team_name,pts,season,game_date,game_id
27501,Stephen Curry,40,Warriors,29,2022,2022-05-14,401434775
27502,Klay Thompson,42,Warriors,30,2022,2022-05-14,401434775
27503,Nemanja Bjelica,8,Warriors,0,2022,2022-05-14,401434775
27504,Damion Lee,11,Warriors,3,2022,2022-05-14,401434775
27505,Jordan Poole,24,Warriors,12,2022,2022-05-14,401434775


We also need to do some data cleaning. By default, the points and minutes columns are stored as strings, but we need points as integers and minutes as floats. Additionally, if a player on a team doesn't play in a game, they have no row for that game_id. However, if a player does play minutes in a game but doesn't score any points, their pts are registered as '--'. We handle this by mapping '--' to 0 as we convert each column.

In [6]:
nba_df['pts'] = nba_df['pts'].apply(lambda x: 0 if x == '--' else int(x))
nba_df['min'] = nba_df['min'].apply(lambda x: 0 if x == '--' else float(x))
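
The same cleaning step can also be written with `pd.to_numeric`, which avoids a per-row lambda. The following is a minimal sketch on a small hypothetical sample (the `sample` frame and its values are assumptions made for illustration, not rows from the dataset); it treats any unparseable entry, not just '--', as zero.

```python
import pandas as pd

# Hypothetical sample mirroring the string-typed 'pts' and 'min' columns.
sample = pd.DataFrame({
    "pts": ["29", "--", "12"],
    "min": ["40", "--", "24"],
})

# errors="coerce" turns unparseable entries such as '--' into NaN,
# which fillna(0) then replaces with zero before the final cast.
sample["pts"] = pd.to_numeric(sample["pts"], errors="coerce").fillna(0).astype(int)
sample["min"] = pd.to_numeric(sample["min"], errors="coerce").fillna(0).astype(float)

print(sample)
```

Note that coercion silently zeroes every malformed value, so the explicit `'--'` check used in the cell above is the stricter choice if unexpected strings should raise an error.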

Since the data doesn't contain a column that records the winner of a game, we write a function that determines the winning team by comparing the total points scored by the teams in the game. We also implement helper functions that compute the win rate for a team in a specific set of games. This will be important later as we want to examine how the Lakers performed when only specific groups of players were playing.

In [7]:
# This function selects the rows corresponding to a particular game_id.
# It then sums the points scored by each team's players and returns the name of the team with the higher total.
def get_game_winner(game_id):
    nba_game = nba_df[nba_df['game_id'] == game_id]
    teams = nba_game['team_name'].unique()
    t0, t1 = teams[0], teams[1]
    if nba_game[nba_game['team_name'] == t0]['pts'].sum() > nba_game[
        nba_game['team_name'] == t1]['pts'].sum():
        return t0
    else:
        return t1

def get_number_of_games(team, df=nba_df):
    nba_team = df[df['team_name'] == team]
    return len(nba_team['game_id'].unique())

# This function gets the number of wins for a given team by calling get_game_winner on each game in game_ids.
# If the winning team matches the team in question, True is stored in the list. Summing the list's True values
# gives the number of wins.
def get_number_of_wins(team, game_ids):
    return sum([get_game_winner(game_id) == team for game_id in game_ids])

# We compute the winrate of the team for a set of games. We must be careful to only include games the team actually played in.
def compute_win_rate(team, game_ids):
    return get_number_of_wins(team, game_ids) / len(game_ids)
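
Since `get_game_winner` filters the full DataFrame once per game, the same idea can be expressed as a single grouped aggregation. The sketch below is an alternative under stated assumptions: the `toy` frame, its scores, and the helper name `winners_by_game` are hypothetical and exist only to illustrate the technique.

```python
import pandas as pd

# Hypothetical two-game box score with the same columns used above.
toy = pd.DataFrame({
    "game_id":   [1, 1, 1, 1, 2, 2, 2, 2],
    "team_name": ["Lakers", "Lakers", "Warriors", "Warriors",
                  "Lakers", "Lakers", "Suns", "Suns"],
    "pts":       [30, 25, 20, 28, 18, 22, 27, 19],
})

def winners_by_game(df):
    # Sum player points into one total per (game, team).
    totals = df.groupby(["game_id", "team_name"], as_index=False)["pts"].sum()
    # After sorting by points, the last row kept for each game is the higher-scoring team.
    best = totals.sort_values("pts").drop_duplicates("game_id", keep="last")
    return best.set_index("game_id")["team_name"]

toy_winners = winners_by_game(toy)
lakers_games = toy.loc[toy["team_name"] == "Lakers", "game_id"].unique()
lakers_win_rate = (toy_winners.loc[lakers_games] == "Lakers").mean()
print(toy_winners.to_dict(), lakers_win_rate)
```

Computing all winners once and then indexing by game_id keeps the win-rate calculation for any subset of games to a single lookup.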

Our goal is to be able to consider the games where all members of a specific set of players are playing. The function get_game_ids_for_set_of_players() enables us to extract such games. Once we have these game_ids, we can use our previous functions to compute winrates.

In [8]:
# This function returns the box score rows for a single player on a given team in a given season
def get_games_with_player(player_name, team_name, season):
    return nba_df[(nba_df['athlete_display_name'] == player_name) & (nba_df['season'] == season) & (nba_df['team_name'] == team_name)]

def get_game_ids_for_player(player_name, team_name, season):
    return nba_df[(nba_df['athlete_display_name'] == player_name) & (nba_df['season'] == season) & (nba_df['team_name'] == team_name)]['game_id'].unique()

# This function returns game_ids for games where all members of a set of players played in that game
# It does this by iteratively calling get_game_ids_for_player on each player in the set and taking the intersection with the current result set.
def get_game_ids_for_set_of_players(player_names, team_name, season):
    shared_game_ids = set(get_game_ids_for_player(player_names[0], team_name, season))
    for p in player_names:
        shared_game_ids = shared_game_ids.intersection(set(get_game_ids_for_player(p, team_name, season)))
    return shared_game_ids

# This function allows us to get back the df for a set of games from the game_ids.
def get_games_with_game_id(game_ids):
    return nba_df[nba_df['game_id'].isin(game_ids)]

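# team_name and season are passed through, since get_game_ids_for_set_of_players requires them.
def get_games_with_set_of_players(player_names, team_name, season):
    return get_games_with_game_id(get_game_ids_for_set_of_players(player_names, team_name, season))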

def get_game_ids_for_team(team_name, season):
    return nba_df[(nba_df['team_name'] == team_name) & (nba_df['season'] == season)]['game_id'].unique()
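
To make the intersection logic concrete, here is a small self-contained sketch using plain Python sets. The player labels and game_ids are hypothetical, and `shared_games` is an illustrative helper rather than one of the notebook's functions.

```python
# Hypothetical game_id collections for three players (not taken from the data).
games_by_player = {
    "Player A": [101, 102, 103, 104],
    "Player B": [102, 103, 104, 105],
    "Player C": [103, 104, 106],
}

def shared_games(player_names, games_by_player):
    # set.intersection accepts any number of sets, so no seed value or explicit loop is needed.
    return set.intersection(*(set(games_by_player[p]) for p in player_names))

both_a_b = shared_games(["Player A", "Player B"], games_by_player)
all_three = shared_games(["Player A", "Player B", "Player C"], games_by_player)

# Set difference gives the games one player appeared in without the other.
a_without_b = set(games_by_player["Player A"]) - both_a_b
print(both_a_b, all_three, a_without_b)
```

The set difference on the last step is the same operation that later separates games with one player but not the other. This sketch assumes a non-empty list of players; an empty list would raise a `TypeError`.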

To start, let's look at the number of games that specific players played in.

In [9]:
print(len(get_game_ids_for_team("Lakers", 2022)))
print(len(get_game_ids_for_player("Russell Westbrook", "Lakers", 2022)))
print(len(get_game_ids_for_player("LeBron James", "Lakers", 2022)))
print(len(get_game_ids_for_player("Anthony Davis", "Lakers", 2022)))

82
78
56
40


We can see that out of the 82 total games that the Lakers played, Russell Westbrook played in 78 of them. On the other hand, LeBron played in only 56 games, and Davis played in just 40. Evidently, LeBron and Davis missed a substantial number of games. But did these absences truly hurt the Lakers and prevent them from winning? In particular, we want to examine the effect of LeBron and Davis playing together, as LeBron himself cited the absence of the duo as a reason for poor performance. Since we are already examining Russell Westbrook separately, and since he played in 78 of 82 games, we will not consider him here, even though he did miss 4 games due to injury or otherwise.

One thing we might be concerned about is that players contribute uneven amounts to the games they play in. In other words, maybe LeBron plays only a couple of minutes in many games and is unable to fully contribute. By examining the minutes played in each game by LeBron and Davis, we can compute their mean minutes played as well as the standard deviation. We can see that, by and large, when they do play, LeBron and Davis are on the court for a substantial portion of the game: roughly 75% of the 48 total minutes.

In [10]:
lebron_minutes = np.asarray(nba_df[(nba_df['season'] == 2022) & (nba_df['athlete_display_name'] == "LeBron James")]['min'])
davis_minutes = np.asarray(nba_df[(nba_df['season'] == 2022) & (nba_df['athlete_display_name'] == "Anthony Davis")]['min'])
print(lebron_minutes.mean(), lebron_minutes.std())
print(davis_minutes.mean(), davis_minutes.std())


37.21052631578947 4.174848845274092
35.15 7.384273830242213


With our helper functions written and some exploratory analysis completed, we can begin to examine the winrates for the Lakers depending on who was playing. 

In [11]:
print("The Lakers\' overall win rate in the 2022 season was:", compute_win_rate('Lakers', get_game_ids_for_team('Lakers', 2022)))

with_lebron = get_game_ids_for_set_of_players(['LeBron James'], 'Lakers', 2022)
with_davis = get_game_ids_for_set_of_players(['Anthony Davis'], 'Lakers', 2022)
with_lebron_and_davis = get_game_ids_for_set_of_players(['LeBron James', 'Anthony Davis'], 'Lakers', 2022)

with_lebron_without_davis = with_lebron - with_lebron_and_davis
with_davis_without_lebron = with_davis - with_lebron_and_davis

print("The Lakers' winrate with LeBron James and Anthony Davis is: ", compute_win_rate('Lakers', with_lebron_and_davis))
print("The Lakers' winrate with LeBron James but without Anthony Davis is: ", compute_win_rate('Lakers', with_lebron_without_davis))
print("The Lakers' winrate with Anthony Davis but without LeBron James is: ", compute_win_rate('Lakers', with_davis_without_lebron))

The Lakers' overall win rate in the 2022 season was: 0.4024390243902439
The Lakers' winrate with LeBron James and Anthony Davis is:  0.5
The Lakers' winrate with LeBron James but without Anthony Davis is:  0.4117647058823529
The Lakers' winrate with Anthony Davis but without LeBron James is:  0.3333333333333333


We start by looking at how many games the Lakers won overall: they won 33 out of 82 games for a winrate of about 40%. When LeBron plays without Davis, the Lakers' winrate (about 41%) is slightly above their overall winrate for the season, and when Davis plays without LeBron, the team's winrate drops to 1/3. On the other hand, when both play together, the winrate of the team rises to 50%. Thus, one might surmise that if LeBron and Davis had been able to play together every game, the Lakers might have achieved a 50% winrate for the season.

But we should ask ourselves, is this result statistically significant?

We first reweight the two winrates, with LeBron but without Davis and with Davis but without LeBron, by the proportion of games that each represents. We do this to combine them into a single winrate that reflects each player's individual contribution to the team without the other.
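
A weighted average of this kind is algebraically the same as pooling the two sets of games and dividing total wins by total games. The sketch below checks that equivalence on hypothetical counts (3 wins in 10 games and 1 win in 5 games are made-up numbers, and both function names are illustrative).

```python
def weighted_win_rate(wins_a, games_a, wins_b, games_b):
    # Weight each subset's win rate by its share of the combined games.
    total = games_a + games_b
    return (wins_a / games_a) * (games_a / total) + (wins_b / games_b) * (games_b / total)

def pooled_win_rate(wins_a, games_a, wins_b, games_b):
    # Total wins divided by total games across both subsets.
    return (wins_a + wins_b) / (games_a + games_b)

# Hypothetical counts chosen only to illustrate the identity.
weighted = weighted_win_rate(3, 10, 1, 5)
pooled = pooled_win_rate(3, 10, 1, 5)
print(weighted, pooled)
```

The two values agree up to floating-point rounding, because the games in each subset cancel out of the weighted form.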

In [12]:
wr_lebron_no_davis = compute_win_rate('Lakers', with_lebron_without_davis)
wr_davis_no_lebron = compute_win_rate('Lakers', with_davis_without_lebron)

wr_lebron_no_davis *= (len(with_lebron_without_davis) / (len(with_lebron_without_davis) + len(with_davis_without_lebron)))
wr_davis_no_lebron *= (len(with_davis_without_lebron) / (len(with_lebron_without_davis) + len(with_davis_without_lebron)))

print(wr_lebron_no_davis + wr_davis_no_lebron)

0.3846153846153846


Doing so yields a winrate of about 38.5%. For our statistical model, we model the Lakers' season with a binomial distribution. You can read more about the binomial distribution here: http://www.stat.yale.edu/Courses/1997-98/101/binom.htm. As it applies here, we assume the Lakers have a fixed probability of winning each game, and that the result of each game is independent of the results of all others. We take that probability to be the 38.5% we calculated, so we assume that the team's performance (its probability of victory in each game) is determined purely by the individual contributions of Davis and LeBron. Our question then is: what is the probability that the 50% win rate achieved when LeBron and Davis were playing together simply happened by chance? Or did the combination of the two players create a synergy that led to the strong performance?
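
Stated as a formula, if $X$ is the number of wins in $n$ independent games, each won with probability $p$, then $X$ follows a binomial distribution and the probability of at least $k$ wins is

$$
P(X \ge k) = \sum_{i=k}^{n} \binom{n}{i} p^{i} (1-p)^{n-i} = 1 - P(X \le k-1).
$$

For the question posed here, $p \approx 0.385$, and a 50% win rate over an 82-game season corresponds to $n = 82$ and $k = 41$.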

Our null hypothesis can thus be summarized as follows: **The combination of LeBron and Davis playing together yielded no improvement in the Lakers' performance over the performance attributable to LeBron's and Davis's individual contributions.**

We should consider whether a binomial distribution is really appropriate for modelling the team's games. We can certainly treat each game as a Bernoulli trial, with the outcomes "win" and "not win" for the Lakers. Each game is, in theory, not influenced by previous games, so we can treat the outcomes of games as independent events. Indeed, the idea of the "hot hand," where a string of successes implies a greater chance of future success, is a fallacy when considering independent trials. However, some have argued that the effect can exist in practice, as the psychological effects of the results of prior games or events can influence future performance. In other words, the events aren't truly independent. Whether the "hot hand" exists empirically is still up for debate. Here, however, we will assume that the outcomes of games are truly independent, allowing us to apply the binomial distribution.

You can look to the following articles for further discussion and analysis of the "hot hand" effect:

Gilovich, Thomas; Tversky, A.; Vallone, R. (1985). "The Hot Hand in Basketball: On the Misperception of Random Sequences". Cognitive Psychology. 17 (3): 295–314. doi:10.1016/0010-0285(85)90010-6. S2CID 317235.

https://marketing.wharton.upenn.edu/wp-content/uploads/2018/11/Paper-Joshua-Miller.pdf

Roney, Christopher J. R.; Trick, Lana M. (2009). "Sympathetic Magic and Perceptions of Randomness: The Hot Hand Versus the Gambler's Fallacy".

We can use SciPy to help us with the computation.

In [14]:
n, p = 82, 0.3846153846153846 # 82 games in a season, 38.46%
print("P(Lakers win 41 or more games) = ", 1 - stats.binom.cdf(40, n, p))

P(Lakers win 41 or more games) =  0.02200429595851483


We see that if the Lakers' games are modelled with a binomial distribution with a per-game win probability of about 38.5%, there is only a roughly 2.2% probability (below the 5% threshold) that the Lakers win 41 or more of 82 games (a winrate of at least 50%). Therefore, we reject the null hypothesis and conclude that the combination of LeBron James and Anthony Davis has a statistically significant (positive) impact on the Lakers' performance. By extension, we agree that the fact that they were not able to consistently play together had a detrimental effect on the Lakers' 2022 season.
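
The tail probability used in this test can also be computed with SciPy's survival function, `binom.sf`, which returns P(X > k) directly instead of subtracting the CDF from 1. The following is a minimal sketch; the helper name `prob_at_least` is an assumption introduced for illustration.

```python
from scipy.stats import binom

def prob_at_least(k, n, p):
    # P(X >= k) for X ~ Binomial(n, p); sf(k - 1) is P(X > k - 1).
    return binom.sf(k - 1, n, p)

p_null = 0.3846153846153846  # per-game win probability under the null hypothesis
tail = prob_at_least(41, 82, p_null)
print("P(Lakers win 41 or more games) =", tail)
```

Because the number of games is a parameter, the same helper could also be applied to a different sample size, for example the number of games in which both players actually appeared.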
