# NBA Draft Analysis - Which NBA team selects the best players relative to the average player at their respective draft position?

## Index
1. [Introduction](#Introduction)
2. [Data](#Data)
3. [Player Analysis](#player_analysis)
4. [Team Analysis by Season](#team_season_analysis)
5. [Team Analysis](#team_analysis)
6. [Aggregated Team Analysis](#aggr_team_analysis)
7. [Conclusions, Limitations, and Further Research Ideas](#conclusions)

## Introduction
The NBA Draft is an annual event in the National Basketball Association (NBA) where teams select eligible players to join their rosters. It has consisted of two rounds since 1989, with teams choosing players in a predetermined order that is primarily based on the previous season's standings. The draft allows teams to acquire new talent, including young prospects from college or international leagues, and gives emerging basketball stars the opportunity to realize their dreams of playing in the NBA.

For example, in 2003, the famous basketball player LeBron James was the first overall pick by the Cleveland Cavaliers. His selection was the culmination of immense hype and anticipation, as he was considered a generational talent straight out of high school. The NBA Draft is a pivotal moment for the league's future, as it shapes the composition of teams and can have a significant impact on the sport's landscape.

A team’s choice during the draft can make or break its season, making it crucial to have effective tools for selecting the best players from the available pool. With this analysis, we aim to understand which team selects the best players relative to the average player at their respective draft position. To do so, we will get the historical data of the drafts for the NBA seasons from 1996 until 2023. This project focuses on NBA player and team statistics analysis and visualizations, offering interactive tools to explore and compare player and team data, and ultimately aiding in the assessment of draft choices, individual player achievements, and team success over different seasons.

## Data
Information about the [draft selections](../data/raw/nba_draft_data_bbref_raw.csv) was manually exported from the draft section of [Basketball Reference](https://www.basketball-reference.com/draft/). This data also contains advanced statistics for each player, aggregated to their careers. After combining the exported data for all draft years into one file, we ended up with 22 variables for each player drafted between 1996 and 2022. The draft year 2023 was excluded for this analysis because at the time this research was conducted, the 2023-24 NBA season had not started.

The [raw data](../data/raw/nba_stats_1996-2022_raw.csv) on further player performance metrics is accessed using the [NBA API](https://github.com/swar/nba_api/blob/master/docs/nba_api/stats/examples.md) and contains a variety of advanced statistics for all official NBA players from 1996 until 2023. The statistics include the player ID, the player's name, the number of years played, points scored and many more. The NBA API offers 265 variables, but we use only 13 of them. For brevity, the data uses many abbreviations, which are explained in a [reference table](../references/abbreviation_reference_table.md).

We need to import the following libraries to manage the data handling and visualizations.

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import ipywidgets as widgets
import plotly.express as px
from IPython.display import display, HTML, clear_output
from tabulate import tabulate
from jinja2 import Template

We load the [player career average](../data/interim/player_career_avg.csv) data into a DataFrame called df. This data was obtained by combining the career-level statistics from Basketball Reference with weighted averages of the season-level statistics for each player. For more details, see the [season2career_stats.ipynb notebook](../src/data_processing/season2career_stats.ipynb).

In [2]:
# load in the data
df = pd.read_csv('../data/processed/player_career_avg.csv', index_col=0)

We make a copy of the DataFrame so that it can be filtered by draft year. With the slider below you can adjust the range of draft years taken into account in the analysis.
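The filtering behind the slider boils down to a boolean mask on the Season column. The following is a minimal sketch on a made-up toy frame; the helper name filter_seasons and all values are placeholders for illustration and are not part of the notebook.

```python
import pandas as pd

# toy data; players and draft years are illustrative only
toy = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D'],
    'Season': [1996, 2003, 2010, 2022],
})

def filter_seasons(frame, start_season, end_season):
    """Keep rows whose draft year lies in [start_season, end_season]."""
    # Series.between is inclusive on both ends by default
    mask = frame['Season'].between(start_season, end_season)
    return frame[mask].copy()

subset = filter_seasons(toy, 2000, 2015)
print(subset['Player'].tolist())
```

Returning a copy keeps the original frame untouched, so the range can be widened again later without reloading the data.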

In [3]:
# widget styling
slider_style = {'description_width': 'initial'}
slider_css = """
.widget-label { 
    min-width: fit-content !important; 
} 
"""

# Creating a widget box for applying the CSS
slider_style_widget = widgets.HTML(
    value="<style>" + slider_css + "</style>"
)

In [4]:
# copy the dataframe
df_seasons_filtered = df.copy()

# create a range with all draft years from 1996 to 2022 (range() excludes the end value 2023)
seasons = range(1996, 2023)

# create a slider widget to select the seasons
season_slider = widgets.SelectionRangeSlider(
    options=seasons,
    index=(0, len(seasons)-1),
    description='Select Seasons to Analyze',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    layout=widgets.Layout(width='100%'),
)

# get the subset of the dataframe that matches the selected seasons
def get_seasons(season_slider):
    # declare global variable
    global df_seasons_filtered
    # get values from slider
    start_season = int(season_slider['new'][0])
    end_season = int(season_slider['new'][1])
    df_seasons_filtered = df[(df['Season'] >= start_season) & (df['Season'] <= end_season)]

# observe the slider widget to get the selected seasons
season_slider.observe(get_seasons, names='value')

# display the slider widget
display(slider_style_widget, season_slider)

HTML(value='<style>\n.widget-label { \n    min-width: fit-content !important; \n} \n</style>')

SelectionRangeSlider(continuous_update=False, description='Select Seasons to Analyze', index=(0, 26), layout=L…

**Select a penalty for unbounded stats**

Drafted players who never played a game in the NBA are perceived as bad draft picks but never accumulated any stats. Excluding them from the analysis would raise the average at their draft position, because it would then be computed only from players who did reach the NBA, and those players would look worse relative to that average. The empty stats are therefore replaced by a "bad stat", as never playing a game in the NBA objectively makes that player a bad selection. For total points, assists and rebounds, 0 is the absolute minimum and therefore a suitable replacement. All other stats are unbounded and have no definite minimum value. The minimum observed in our data would be strongly affected by outliers, which is why we opted for a percentile that can be chosen dynamically using the slider below. For DEF_RATING, where lower values are better, the mirrored percentile (1 minus the chosen value) is used. We recommend a percentile between 1% and 5%.
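To illustrate why a low percentile is a more stable penalty than the raw minimum, here is a small sketch with invented BPM-like values. The numbers are hypothetical and only serve to show the effect of a single outlier.

```python
import pandas as pd

# hypothetical values: one extreme outlier (-40.0) plus 100 evenly spaced values from -5.0 to 4.9
values = pd.Series([-40.0] + [i / 10 for i in range(-50, 50)])

penalty_min = values.min()            # driven entirely by the single outlier
penalty_pct = values.quantile(0.02)   # 2nd percentile, barely affected by the outlier

print("minimum:", penalty_min)
print("2nd percentile:", round(penalty_pct, 2))
```

The minimum jumps to the outlier, while the 2nd percentile stays close to the bulk of the low values.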

In [5]:
# set standard penalty percentile
penalty_percentile = 0.02

# create a float slider widget to select the penalty percentile
penalty_percentile_slider = widgets.FloatSlider(
    value=0.02,
    min=0,
    max=1.0,
    step=0.01,
    description='Penalty Percentile:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='.2f',
    layout=widgets.Layout(width='100%')
)

# set the penalty percentile
def set_penalty_percentile(penalty_percentile_slider):
    # declare global variable
    global penalty_percentile
    # get value from slider
    penalty_percentile = penalty_percentile_slider['new']

# observe the slider widget to get the penalty percentile
penalty_percentile_slider.observe(set_penalty_percentile, names='value')

# display the slider widget
display(slider_style_widget, penalty_percentile_slider)

HTML(value='<style>\n.widget-label { \n    min-width: fit-content !important; \n} \n</style>')

FloatSlider(value=0.02, continuous_update=False, description='Penalty Percentile:', layout=Layout(width='100%'…

> ### **IMPORTANT**
> Re-run all cells after this one if you changed the penalty percentile and/or the season range

The next cell shows your active selections for the season range and penalty percentile. It also creates a list called relevant_stats containing the player performance metrics that we base our analysis on. Finally, the missing stats of the drafted players who never played in the NBA are replaced with the chosen penalty values.
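As a rough sketch of the replacement step, the example below fills missing values column by column on a toy table. The data is invented, and the penalty of 0.25 is deliberately large so that the effect is visible on five rows.

```python
import numpy as np
import pandas as pd

# toy career table; the NaN row stands for a drafted player without NBA games
toy = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D', 'E'],
    'PTS': [12.0, 8.0, 4.0, 2.0, np.nan],
    'BPM': [3.0, 0.0, -2.0, -5.0, np.nan],
    'DEF_RATING': [104.0, 108.0, 110.0, 114.0, np.nan],
})
penalty = 0.25  # illustrative value, not the recommended 1% to 5%

fill_values = {
    'PTS': 0,                                               # bounded below by zero
    'BPM': toy['BPM'].quantile(penalty),                    # low end is bad
    'DEF_RATING': toy['DEF_RATING'].quantile(1 - penalty),  # high end is bad
}

# fillna with a dict applies a separate fill value to each column
filled = toy.fillna(value=fill_values)
print(filled.tail(1))
```

Series.quantile ignores missing values, so the penalty is computed only from players who actually recorded the stat.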

In [6]:
# get min and max seasons from df_seasons_filtered
start_season = df_seasons_filtered['Season'].min()
end_season = df_seasons_filtered['Season'].max()

print("Your selected penalty percentile for unbounded stats is", penalty_percentile)
print("Your selected seasons are", start_season , "to", end_season)

relevant_stats = ['PTS', 'TRB', 'AST', 'WS', 'WS/48', 'BPM', 'VORP', 'PIE', 'OFF_RATING', 'DEF_RATING', 'NET_RATING']

# counting stats have a natural minimum of 0
na_fill_values = {'PTS': 0, 'TRB': 0, 'AST': 0}
# unbounded stats are penalized with the chosen low percentile
for stat in ['WS', 'WS/48', 'BPM', 'VORP', 'PIE', 'OFF_RATING', 'NET_RATING']:
    na_fill_values[stat] = df_seasons_filtered[stat].quantile(penalty_percentile)
# lower DEF_RATING is better, so the penalty is the mirrored (high) percentile
na_fill_values['DEF_RATING'] = df_seasons_filtered['DEF_RATING'].quantile(1 - penalty_percentile)

# fill the NaNs with the respective entry in the na_fill_values dict for each column
df_career_na_filled = df_seasons_filtered.fillna(value=na_fill_values)

Your selected penalty percentile for unbounded stats is 0.02
Your selected seasons are 1996 to 2022


Next, the average value for each player statistic is calculated per draft position. The averages are stored in the df_avg dataframe which is indexed by the draft position (1-60).

In [7]:
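# group the df_career_na_filled dataframe by 'Pk' and calculate the average for each relevant stat
df_avg = df_career_na_filled.groupby('Pk')[relevant_stats].mean(numeric_only=True)
df_avg = df_avg.reset_index()

# index the averages by draft position so that they can be looked up by 'Pk', even if a pick number is missing in the selected seasons
df_avg.index = df_avg['Pk'].to_numpy()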

To compare the performance of a player to the average player at their draft position, we calculate the difference between the player's statistics and the previously calculated average values. For most statistics, a positive difference signifies that this player performed better than the average selection at their pick. An exception is DEF_RATING, for which lower values are better, meaning that a negative difference is desirable.
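The difference calculation can be sketched on a toy table as follows. The players and values are invented; only the column names follow the notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D'],
    'Pk': [1, 1, 2, 2],
    'WS': [100.0, 60.0, 30.0, 10.0],
    'DEF_RATING': [105.0, 109.0, 112.0, 108.0],
})

# average per draft position, indexed by the pick number itself
avg_by_pick = toy.groupby('Pk')[['WS', 'DEF_RATING']].mean()

# map each player's pick to the average at that pick and subtract it
for stat in ['WS', 'DEF_RATING']:
    toy[stat + '_diff'] = toy[stat] - toy['Pk'].map(avg_by_pick[stat])

print(toy[['Player', 'Pk', 'WS_diff', 'DEF_RATING_diff']])
```

Player A beats the average first pick on both metrics: a positive WS difference and a negative DEF_RATING difference.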

In [8]:
# for each relevant stat, create a new column in the df_career_na_filled dataframe with the difference between the player's stat and the average for that stat for their draft position
for stat in relevant_stats:
    df_career_na_filled[stat + '_diff'] = df_career_na_filled[stat] - df_career_na_filled['Pk'].map(df_avg[stat])

## Player Analysis
<a id='player_analysis'></a>
**Calculate ranks for all players above a minimum amount of games played**

This section ranks each player's performance above or below the average player at the same draft position for each statistic. Statistics that are normalized per 48 minutes can be distorted by players who played very little but performed well in that limited time (e.g., a player who only ever played two minutes at the end of a blowout game but scored four points). To avoid such players being ranked very highly, a minimum number of games can be set using the slider below; all players who do not meet this requirement are excluded from the ranking. Be aware that these players are only excluded from the individual player analysis, not from the team analysis further below.
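A minimal sketch of ranking with a games threshold, using invented values; the variable min_games_toy is a placeholder for this example and is separate from the notebook's min_games.

```python
import pandas as pd

toy = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D'],
    'G': [900, 3, 400, 150],
    'WS/48_diff': [0.05, 0.30, 0.08, -0.02],
})
min_games_toy = 82  # hypothetical threshold

# only eligible players are ranked; the assignment aligns on the index,
# so players below the threshold end up with NaN instead of a rank
eligible = toy['G'] >= min_games_toy
toy['WS/48_rank'] = toy.loc[eligible, 'WS/48_diff'].rank(ascending=False, method='min')

print(toy)
```

Player B has the largest difference but only three games, so B receives no rank rather than rank 1.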

In [9]:
# set standard minimum number of games played
min_games = 82

# create an int slider widget to select the minimum number of games played
min_games_slider = widgets.IntSlider(
    value=82,
    min=0,
    max=500,
    step=1,
    description='Minimum Games Played:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d',
    layout=widgets.Layout(width='100%')
)

# set the minimum number of games played
def set_min_games(min_games_slider):
    # declare global variable
    global min_games
    # get value from slider
    min_games = min_games_slider['new']

# observe the slider widget to get the minimum number of games played
min_games_slider.observe(set_min_games, names='value')

# display the slider widget
display(min_games_slider)

IntSlider(value=82, continuous_update=False, description='Minimum Games Played:', layout=Layout(width='100%'),…

> ### **IMPORTANT**
> Re-run all cells from here after changing the minimum games

The cell below prints the active minimum games requirement and calculates the player ranks for each performance metric.

In [10]:
print("You chose", min_games, "as the minimum number of games played.")

# create a new column for each relevant stat with the rank of the player's difference to the draft position average, only using players with at least min_games games
eligible = df_career_na_filled['G'] >= min_games
for stat in relevant_stats:
    # lower DEF_RATING is better, so it is ranked ascending; all other stats are ranked descending
    ascending = (stat == 'DEF_RATING')
    df_career_na_filled[stat + '_rank'] = df_career_na_filled.loc[eligible, stat + '_diff'].rank(ascending=ascending, method='min')

You chose 82 as the minimum number of games played.


**Show Performance of Selected Player**

The visualization below allows interactively selecting a player. Basic information about the chosen player as well as each performance metric, the corresponding difference to the average selection at their draft pick and the player's rank are displayed.

In [11]:
# create a combobox widget with all 'Player' values
player_widget = widgets.Combobox(
    placeholder='Choose a Player',
    options=df_career_na_filled['Player'].unique().tolist(),
    description='Player:',
    ensure_option=True,
    disabled=False,
    layout=widgets.Layout(width='100%')
)

# Define the output area to display additional information
output_player_ranks = widgets.Output()

# Function to update the output area based on the selected player
def on_value_change_player_ranks(change):
    output_player_ranks.clear_output()
    selected_player = change['new']
    with output_player_ranks:
        # get player data
        player_data = df_career_na_filled[df_career_na_filled['Player'] == selected_player]

        # nothing to display if the combobox was cleared
        if player_data.empty:
            return

        # display player information
        display(HTML(f"<h3>Player: {player_data['Player'].values[0]}</h3>"))
        display(HTML(f"<p>Year: {player_data['Season'].values[0]} Pick: {player_data['Pk'].values[0]} - Drafted by: {player_data['Tm'].values[0]}</p>"))
        # drafted players without NBA appearances may have no games/years on record
        games = int(player_data['G'].fillna(0).values[0])
        years = int(player_data['Yrs'].fillna(0).values[0])
        display(HTML(f"<p>Played {games} games in {years} years</p>"))
        
        # create a table with the player's ranks, total value and diff for each relevant stat
        stats_table = [['Stat', 'Rank', 'Raw Stat', 'Difference to Average for Draft Position']]
        for stat in relevant_stats:
            # players below the minimum games threshold have no numeric rank
            rank = pd.to_numeric(player_data[stat + '_rank'], errors='coerce').values[0]
            rank = 'Not Ranked' if pd.isna(rank) else int(rank)
            stats_table.append([stat, rank, round(player_data[stat].values[0], 2), round(player_data[stat + '_diff'].values[0], 2)])

        # transpose the table
        stats_table = list(map(list, zip(*stats_table)))
            
        display(HTML(tabulate(stats_table, tablefmt="html")))

# Observe changes in the value of the combobox and call the function
player_widget.observe(on_value_change_player_ranks, names='value')

# Display the widgets
display(player_widget)
display(output_player_ranks)

Combobox(value='', description='Player:', ensure_option=True, layout=Layout(width='100%'), options=('Allen Ive…

Output()

**Player Analysis: Performance Metric Differences vs Draft Pick**

In this scatter plot, the difference to the average selection at a given draft pick for a selected performance metric is plotted against the draft picks. If desired, a player can be searched for and is then highlighted in the visualization. Hovering over a dot shows additional information.

In [12]:
# Filter players based on the minimum games played
filtered_df = df_career_na_filled[df_career_na_filled['G'] >= min_games]

# create a combobox widget with all 'Player' values
player_widget_diff_scatter = widgets.Combobox(
    placeholder='Choose a Player',
    options=filtered_df['Player'].unique().tolist(),
    description='Player:',
    ensure_option=True,
    disabled=False
)

# create a dropdown widget with all relevant stats
stat_dropdown_diff_scatter = widgets.Dropdown(
    options=relevant_stats,
    description='Select Stat:',
    disabled=False,
)

hover_template_player = Template('<b>{{player_name}}</b><br><br>'
                          'Pick: {{pick}} ({{season}}) <br>' +
                          '{{selected_stat}} Rank: {{rank}} <br>' +
                          '{{selected_stat}} diff. to avg player at pick {{pick}}: {{diff}}<br>' +
                          'Career {{selected_stat}}: {{raw_stat}}<br><extra></extra>')

# Define the output area to display the scatter plot
output_diff_scatter = widgets.Output()

# Function to update the output area based on the selected stat
def on_value_change_diff_scatter(change):
    clear_output()
    output_diff_scatter.clear_output()
    selected_stat = stat_dropdown_diff_scatter.value
    selected_player = player_widget_diff_scatter.value

    display(player_widget_diff_scatter)
    display(stat_dropdown_diff_scatter)
    with output_diff_scatter:
        # scatter plot for all players
        fig = px.scatter(filtered_df, x='Pk', y=f'{selected_stat}_diff', hover_name='Player',
                         hover_data={'Pk': True, f'{selected_stat}_diff': ':.1f', 'Player': False, selected_stat: ':.1f', f'{selected_stat}_rank': True, 'Season': True})
        
        # red dot for selected player
        if selected_player in filtered_df['Player'].values:
            highlighted_player = filtered_df[filtered_df['Player'] == selected_player]
            fig.add_trace(px.scatter(highlighted_player, x='Pk', y=f'{selected_stat}_diff', hover_name='Player',
                                     hover_data={'Pk': True, f'{selected_stat}_diff': ':.1f', 'Player': False, selected_stat: ':.1f', f'{selected_stat}_rank': True, 'Season': True},
                                     color_discrete_sequence=['red']).data[0])
        
        # styling the figure
        fig.update_traces(hovertemplate=hover_template_player.render(player_name='%{hovertext}',
            pick='%{x}',
            season='%{customdata[3]}',
            selected_stat=selected_stat,
            diff='%{y:.2f}',
            rank='%{customdata[2]}',
            raw_stat='%{customdata[1]:.2f}'))
        fig.update_traces(marker=dict(size=12), showlegend=False)
        fig.update_layout(title=f'{selected_stat} Difference vs Draft Pick for Players with at least {min_games} Games Played',
                          xaxis_title='Draft Pick', yaxis_title=f'{selected_stat} Difference')
        fig.show()
    
# Observe changes in the value of the dropdown and call the function
stat_dropdown_diff_scatter.observe(on_value_change_diff_scatter, names='value')
player_widget_diff_scatter.observe(on_value_change_diff_scatter, names='value')

# Display the dropdown and the output area
display(player_widget_diff_scatter)
display(stat_dropdown_diff_scatter)
display(output_diff_scatter)

Combobox(value='', description='Player:', ensure_option=True, options=('Allen Iverson', 'Marcus Camby', 'Share…

Dropdown(description='Select Stat:', options=('PTS', 'TRB', 'AST', 'WS', 'WS/48', 'BPM', 'VORP', 'PIE', 'OFF_R…

Output()

**Player Analysis: Performance Metric vs Draft Pick**

Similarly to the visualization above, the chosen statistics of individual players are plotted against their draft position. However, instead of showing the differences, this plot uses the raw statistics. The yellow dots represent the average statistic for that draft pick.

In [13]:
# create a combobox widget with all 'Player' values
player_widget_total_scatter = widgets.Combobox(
    placeholder='Choose a Player',
    options=filtered_df['Player'].unique().tolist(),
    description='Player:',
    ensure_option=True,
    disabled=False
)

# create a dropdown widget with all relevant stats
stat_dropdown_total_scatter = widgets.Dropdown(
    options=relevant_stats,
    description='Select Stat:',
    disabled=False,
)

hover_template_avg_player = Template('<b>Pick: {{pick}}</b><br><br>'
                          'Average {{selected_stat}}: {{raw_stat}} <br><extra></extra>')

# Define the output area to display the scatter plot
output_total_scatter = widgets.Output()

# Function to update the output area based on the selected stat
def on_value_change_total_scatter(change):
    clear_output()
    output_total_scatter.clear_output()
    selected_stat = stat_dropdown_total_scatter.value
    selected_player = player_widget_total_scatter.value

    display(player_widget_total_scatter)
    display(stat_dropdown_total_scatter)
    with output_total_scatter:
        fig = px.scatter(filtered_df, x='Pk', y=selected_stat, hover_name='Player',
                         hover_data={'Pk': True, selected_stat: True, 'Player': False, f'{selected_stat}_diff': True, f'{selected_stat}_rank': True, 'Season': True})
        
        if selected_player in filtered_df['Player'].values:
            highlighted_player = filtered_df[filtered_df['Player'] == selected_player]
            fig.add_trace(px.scatter(highlighted_player, x='Pk', y=selected_stat, hover_name='Player',
                                    hover_data={'Pk': True, selected_stat: True, 'Player': False, f'{selected_stat}_diff': True, f'{selected_stat}_rank': True, 'Season': True},
                                    color_discrete_sequence=['red']).data[0])
        
        # styling the figure
        fig.update_traces(hovertemplate=hover_template_player.render(player_name='%{hovertext}',
            pick='%{x}',
            season='%{customdata[3]}',
            selected_stat=selected_stat,
            diff='%{customdata[1]:.2f}',
            rank='%{customdata[2]}',
            raw_stat='%{y:.2f}'))
        
        # TODO: style this differently
        # add a yellow dot for each average in the df_avg dataframe
        fig.add_trace(px.scatter(df_avg, x='Pk', y=selected_stat, hover_name='Pk',
                                 hover_data={selected_stat: True, 'Pk': True}, 
                                 color_discrete_sequence=['yellow']).data[0])
        
        # styling the figure only for the yellow dots
        fig.update_traces(hovertemplate=hover_template_avg_player.render(pick='%{hovertext}',
            selected_stat=selected_stat,
            raw_stat='%{y:.2f}'),
            selector=dict(marker_color='yellow'))
        
        fig.update_traces(marker=dict(size=12), showlegend=False)
        fig.update_layout(title=f'{selected_stat} vs Draft Pick for Players with at least {min_games} Games Played',
                          xaxis_title='Draft Pick', yaxis_title=selected_stat)
        fig.show()
    
# Observe changes in the value of the dropdown and call the function
stat_dropdown_total_scatter.observe(on_value_change_total_scatter, names='value')
player_widget_total_scatter.observe(on_value_change_total_scatter, names='value')

# Display the dropdown and the output area
display(player_widget_total_scatter)
display(stat_dropdown_total_scatter)
display(output_total_scatter)

Combobox(value='', description='Player:', ensure_option=True, options=('Allen Iverson', 'Marcus Camby', 'Share…

Dropdown(description='Select Stat:', options=('PTS', 'TRB', 'AST', 'WS', 'WS/48', 'BPM', 'VORP', 'PIE', 'OFF_R…

Output()

## Team Analysis by Season
<a id='team_season_analysis'></a>

In this section, the analysis is aggregated to the team level in order to answer the initial research question: which NBA team is best at drafting?

Since some teams have relocated and/or rebranded since 1996, they may have different abbreviations over time in the data. However, we want to conduct our analysis on a franchise level and therefore replace all outdated abbreviations with the most current one for each franchise.
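The effect of the abbreviation replacement can be sketched on a toy frame. The mapping below is a subset of the one used in the next cell; the players are placeholders, and the Franchise column is introduced only for this illustration.

```python
import pandas as pd

# subset of the relocations/rebrands handled in the notebook
franchise_map = {'SEA': 'OKC', 'NJN': 'BRK', 'VAN': 'MEM'}

picks = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D'],
    'Tm': ['SEA', 'OKC', 'NJN', 'BOS'],
})

# abbreviations missing from the mapping are left unchanged by replace()
picks['Franchise'] = picks['Tm'].replace(franchise_map)
print(picks.groupby('Franchise').size())
```

Without the replacement, the picks made under SEA and OKC would be counted as two separate teams even though they belong to the same franchise.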

In [14]:
# function to replace outdated abbreviations with the current ones
def clean_team_names(df):
    team_dict = {'CHH': 'CHO', 'CHA': 'CHO', 'NJN': 'BRK', 'NOH': 'NOP', 'NOK': 'NOP', 'SEA': 'OKC', 'VAN': 'MEM', 'WSB': 'WAS'}
    df['Tm'] = df['Tm'].replace(team_dict)
    return df

In this cell, we calculate the average performance metrics of all players selected by a team in the same draft year. This allows us to visualize the draft performance of each team over time. The blue dots in the scatter plot below represent the draft performance of a team in the respective season. Using the dropdown widget, a team can be selected and its performance is shown in red. Optionally, the individual players that were picked by the selected team can also be visualized (yellow dots).

An orange dot means that the yellow and red dots overlap exactly. In other words, the chosen team selected only one player in the respective draft year, so the player's values and the team average are equal.
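The aggregation per team and draft year, including the single-pick case described above, can be sketched like this. The team codes AAA and BBB and all values are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    'Tm': ['AAA', 'AAA', 'AAA', 'BBB'],
    'Season': [1999, 1999, 2001, 1999],
    'Player': ['A', 'B', 'C', 'D'],
    'WS_diff': [30.0, -10.0, 12.0, 5.0],
})

# mean difference and number of picks per team and draft year
team_season = (
    toy.groupby(['Tm', 'Season'])['WS_diff']
       .agg(['mean', 'count'])
       .reset_index()
)

# with a single pick, the team average equals that player's value
single_pick = team_season[team_season['count'] == 1]
print(team_season)
```

For AAA in 2001 and BBB in 1999, the team average coincides with the only selected player's difference, which is the situation shown as an orange dot in the plot.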

In [15]:
# group the df_career_na_filled by team and season and calculate the average
df_team_avg_by_season = clean_team_names(df_career_na_filled).groupby(['Tm', 'Season']).mean(numeric_only=True)
df_team_avg_by_season = df_team_avg_by_season.reset_index()

In [16]:
hover_template_team_season = Template('<b>{{team_name}}</b><br><br>'
                          'Season: {{season}} <br>' +
                          '{{selected_stat}} diff to avg players: {{diff}}<br>' +
                          '{{selected_stat}}: {{stat}} <br><extra></extra>')

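# change all relevant_stat ranks to 'Not Ranked' if the value is NaN in the df_career_na_filled dataframe
for stat in relevant_stats:
    # cast to object first so that the text label can be stored next to the numeric ranks
    rank_col = df_career_na_filled[stat + '_rank'].astype(object)
    df_career_na_filled[stat + '_rank'] = rank_col.where(rank_col.notna(), 'Not Ranked')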

# create a scatter plot with the average of each team for each season. The seasons should be the x axis and the selected relevant stat differences should be the y axis
def create_team_scatter(selected_stat, selected_team, show_players):
    fig = px.scatter(df_team_avg_by_season, x='Season', y=f'{selected_stat}_diff', hover_name='Tm',
                     hover_data={'Tm': True, selected_stat: True, f'{selected_stat}_diff': True})
    
    # add a red dot for all stats of the selected team
    fig.add_trace(px.scatter(df_team_avg_by_season[df_team_avg_by_season['Tm'] == selected_team], x='Season', y=f'{selected_stat}_diff', hover_name='Tm',
                                hover_data={'Tm': True, selected_stat: True, f'{selected_stat}_diff': True},
                                color_discrete_sequence=['red']).data[0])
    
    # styling the figure
    fig.update_traces(hovertemplate=hover_template_team_season.render(team_name='%{hovertext}',
        season='%{x}',
        diff='%{y:.2f}',
        stat='%{customdata[1]:.2f}',
        selected_stat=selected_stat))
    
    if show_players:
        # add a yellow dot for each player of the selected team
        fig.add_trace(px.scatter(df_career_na_filled[df_career_na_filled['Tm'] == selected_team], x='Season', y=f'{selected_stat}_diff', hover_name='Player',
                                    hover_data={'Player': True, selected_stat: True, f'{selected_stat}_diff': True, f'{selected_stat}_rank': True, 'Pk': True},
                                    color_discrete_sequence=['yellow']).data[0])
        # make yellow dots semi-transparent
        fig.update_traces(marker=dict(size=12, opacity=0.7), showlegend=False, selector=dict(marker_color='yellow'))
    
    # add hover info for the yellow dots
    fig.update_traces(hovertemplate=hover_template_player.render(player_name='%{hovertext}',
        pick='%{customdata[3]}',
        season='%{x}',
        selected_stat=selected_stat,
        diff='%{y:.2f}',
        rank='%{customdata[2]}',
        raw_stat='%{customdata[1]:.2f}'),
        selector=dict(marker_color='yellow'))

    
    fig.update_traces(marker=dict(size=12), showlegend=False)
    fig.update_layout(title=f'{selected_stat} Difference vs Season for Teams',
                      xaxis_title='Season', yaxis_title=f'{selected_stat} Difference')
    fig.show()

# create a dropdown widget with all relevant stats
stat_dropdown_team_scatter = widgets.Dropdown(
    options=relevant_stats,
    description='Select Stat:',
    disabled=False,
)

# create a dropdown widget with all teams
team_dropdown_team_scatter = widgets.Dropdown(
    options=df_team_avg_by_season['Tm'].unique().tolist(),
    description='Select Team:',
    disabled=False,
)

# create a checkbox widget to toggle between showing players or not
player_checkbox_team_scatter = widgets.Checkbox(
    value=True,
    description='Show Players selected by Team',
    disabled=False,
    indent=False
)

# Define the output area to display the scatter plot
output_team_scatter = widgets.Output()

# Observe changes in the value of the dropdown and call the function
def on_value_change_team_scatter(change):
    clear_output()
    output_team_scatter.clear_output()
    selected_stat = stat_dropdown_team_scatter.value
    selected_team = team_dropdown_team_scatter.value
    show_players = player_checkbox_team_scatter.value

    display(stat_dropdown_team_scatter)
    display(team_dropdown_team_scatter)
    display(player_checkbox_team_scatter)
    with output_team_scatter:
        create_team_scatter(selected_stat, selected_team, show_players)

stat_dropdown_team_scatter.observe(on_value_change_team_scatter, names='value')
team_dropdown_team_scatter.observe(on_value_change_team_scatter, names='value')
player_checkbox_team_scatter.observe(on_value_change_team_scatter, names='value')

# Display the dropdown and the output area
display(stat_dropdown_team_scatter)
display(team_dropdown_team_scatter)
display(player_checkbox_team_scatter)
display(output_team_scatter)

Dropdown(description='Select Stat:', options=('PTS', 'TRB', 'AST', 'WS', 'WS/48', 'BPM', 'VORP', 'PIE', 'OFF_R…

Dropdown(description='Select Team:', options=('ATL', 'BOS', 'BRK', 'CHI', 'CHO', 'CLE', 'DAL', 'DEN', 'DET', '…

Checkbox(value=True, description='Show Players selected by Team', indent=False)

Output()

## Team Analysis
<a id='team_analysis'></a>

In this section, each team's draft performance is aggregated over the entire selected time period. In the scatter plot below, the average differences for the selected statistic are plotted against the average draft position of each team.

Be aware that the minimum games restriction does not apply to this section.
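A minimal sketch of the team-level aggregation, again with invented team codes and values. Every pick enters the average, regardless of games played.

```python
import pandas as pd

toy = pd.DataFrame({
    'Tm': ['AAA', 'AAA', 'BBB', 'BBB'],
    'Pk': [5, 45, 20, 30],
    'G': [700, 0, 300, 40],
    'VORP_diff': [4.0, -1.0, 1.5, 0.5],
})

# average draft position and average difference per team, with no filter on 'G'
team_avg = toy.groupby('Tm')[['Pk', 'VORP_diff']].mean().reset_index()
print(team_avg)
```

Both toy teams drafted at an average position of 25, but AAA's selections exceeded the expectation for their picks by a larger margin.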

In [17]:
df_team_avg = clean_team_names(df_career_na_filled).groupby('Tm').mean(numeric_only=True)
df_team_avg = df_team_avg.reset_index()

In [18]:
# define hover template for team vs avg pick scatter plot
hover_template_team = Template('<b>{{team_name}}</b><br><br>' +
                            'Average Pick: {{pick}} <br>' +
                            '{{selected_stat}} diff to avg players: {{diff}}<br>' +
                            '{{selected_stat}}: {{stat}} <br><extra></extra>')

# create a scatter plot with the difference of the selected stat for each team. The y axis should be the selected stat and the x axis should be the average Pk
def teams_vs_avg_pick_scatter(selected_stat):
    fig = px.scatter(df_team_avg, x='Pk', y=f'{selected_stat}_diff', hover_name='Tm',
                     hover_data={'Tm': True, selected_stat: True, f'{selected_stat}_diff': True, 'Pk': True})
    
    # styling the figure
    fig.update_traces(hovertemplate=hover_template_team.render(team_name='%{hovertext}',
        pick='%{x}',
        diff='%{y:.2f}',
        stat='%{customdata[1]:.2f}',
        selected_stat=selected_stat))
    
    fig.update_traces(marker=dict(size=12), showlegend=False)
    fig.update_layout(title=f'{selected_stat} Difference vs Average Draft Position for Teams',
                      xaxis_title='Average Draft Position', yaxis_title=f'{selected_stat} Difference')
    fig.show()

# create a dropdown widget with all relevant stats
stat_dropdown_avg_position_scatter = widgets.Dropdown(
    options=relevant_stats,
    description='Select Stat:',
    disabled=False,
)

# Define the output area to display the scatter plot
output_avg_position_scatter = widgets.Output()

# Observe changes in the value of the dropdown and call the function
def on_value_change_avg_position_scatter(change):
    clear_output()
    output_avg_position_scatter.clear_output()
    selected_stat = stat_dropdown_avg_position_scatter.value

    display(stat_dropdown_avg_position_scatter)
    with output_avg_position_scatter:
        teams_vs_avg_pick_scatter(selected_stat)

stat_dropdown_avg_position_scatter.observe(on_value_change_avg_position_scatter, names='value')

# Display the dropdown and the output area
display(stat_dropdown_avg_position_scatter)
display(output_avg_position_scatter)

The table below shows the team with the highest difference for each statistic.

In [19]:
# create a best_team_by_stat dict with all stats as keys and the best team for that stat as value
best_team_by_stat = {}
for stat in relevant_stats:
    # idxmax returns the row label of the largest difference; the maximum is used for every stat, including DEF_RATING
    best_team_by_stat[stat] = df_team_avg.loc[df_team_avg[f'{stat}_diff'].idxmax(), 'Tm']

# transform the dict into a dataframe
df_best_team_by_stat = pd.DataFrame.from_dict(best_team_by_stat, orient='index', columns=['Team'])

# display a title for the table
display(HTML("<h3>Best Team by Stat</h3>"))
# display the dataframe as a table
display(HTML(tabulate(df_best_team_by_stat, tablefmt="html")))

| Stat | Team |
|---|---|
| PTS | SAS |
| TRB | TOR |
| AST | SAS |
| WS | SAS |
| WS/48 | DEN |
| BPM | DEN |
| VORP | SAS |
| PIE | DEN |
| OFF_RATING | DEN |
| DEF_RATING | SAS |
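
The cell above picks the team with the maximum difference for every statistic, while the ranking step in the next section treats a lower `DEF_RATING` as better. A minimal sketch of how the direction of each statistic could be taken into account is shown below. The `lower_is_better` set, the `best_team` helper, and the toy values are assumptions made for illustration and are not part of the notebook.

```python
import pandas as pd

# Toy team-level table (hypothetical values)
toy_team_avg = pd.DataFrame({
    "Tm": ["AAA", "BBB", "CCC"],
    "PTS_diff": [1.5, -0.5, 0.2],
    "DEF_RATING_diff": [0.8, -1.2, 0.1],
})

# Assumption: stats for which a smaller value means better performance
lower_is_better = {"DEF_RATING"}

def best_team(df, stat):
    """Return the team with the best difference for `stat`, respecting its direction."""
    col = df.set_index("Tm")[f"{stat}_diff"]
    return col.idxmin() if stat in lower_is_better else col.idxmax()

best = {stat: best_team(toy_team_avg, stat) for stat in ["PTS", "DEF_RATING"]}
print(best)  # {'PTS': 'AAA', 'DEF_RATING': 'BBB'}
```

Whether this changes the result for the real data depends on the actual `DEF_RATING_diff` values, which are not reproduced here.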


# Aggregated Team Analysis
<a id='aggr_team_analysis'></a>

In this section, we attempt to aggregate all statistics by creating a weighted average rank for each team. The weights for each statistic can be set using the sliders below.

In [20]:
# create a dict to assign a weight to each stat
stat_weights = {'PTS': 1, 'TRB': 1, 'AST': 1, 'WS': 1, 'WS/48': 1, 'BPM': 1, 'VORP': 1, 'PIE': 1, 'OFF_RATING': 1, 'DEF_RATING': 1, 'NET_RATING': 1}

# for each stat, create a slider widget to select the weight
stat_weight_sliders = {}
for stat in stat_weights:
    stat_weight_sliders[stat] = widgets.FloatSlider(
        value=1,
        min=0,
        max=1.0,
        step=0.1,
        description=f'{stat} Weight:',
        disabled=False,
        continuous_update=False,
        orientation='horizontal',
        readout=True,
        readout_format='.2f',
        style=slider_style,
        layout=widgets.Layout(width='600px')
    )

# update the weight of a stat whenever its slider changes
def set_stat_weight(change):
    # the slider description has the form '<stat> Weight:', so its first token is the stat name
    stat_weights[change['owner'].description.split()[0]] = change['new']

# observe the slider widget to get the weight for each stat
for stat in stat_weight_sliders:
    stat_weight_sliders[stat].observe(set_stat_weight, names='value')

# display the slider widget
for stat in stat_weight_sliders:
    display(stat_weight_sliders[stat])

In [21]:
# calculate the team rank for each stat from the df_team_avg dataframe
for stat in relevant_stats:
    if stat == 'DEF_RATING': # lower DEF_RATING is better
        df_team_avg[stat + '_rank'] = df_team_avg[stat + '_diff'].rank(ascending=True, method='min')
    else:
        df_team_avg[stat + '_rank'] = df_team_avg[stat + '_diff'].rank(ascending=False, method='min')

# calculate the weighted average rank for each team: weight each stat rank, then normalize by the sum of the weights
df_team_avg['Weighted Average Rank'] = 0.0
for stat in relevant_stats:
    df_team_avg['Weighted Average Rank'] += df_team_avg[stat + '_rank'] * stat_weights[stat]

# dividing by the total weight keeps the result on the rank scale even when some sliders are set below 1
df_team_avg['Weighted Average Rank'] /= sum(stat_weights[stat] for stat in relevant_stats)

# sort the df_team_avg dataframe by the weighted average rank
df_team_avg = df_team_avg.sort_values('Weighted Average Rank', ascending=True)
# reset the index
df_team_avg = df_team_avg.reset_index(drop=True)
# add 1 to every index
df_team_avg.index += 1
# rename the index column to 'Rank'
df_team_avg.index.names = ['Rank']
# rename the 'Tm' column to 'Team'
df_team_avg = df_team_avg.rename(columns={'Tm': 'Team'})

# round the weighted average rank to 2 decimals
df_team_avg['Weighted Average Rank'] = df_team_avg['Weighted Average Rank'].round(2)

# show the 'Team' and 'Weighted Average Rank' columns of df_team_avg
df_team_avg[['Team', 'Weighted Average Rank']]

| Rank | Team | Weighted Average Rank |
|---|---|---|
| 1 | CLE | 6.0 |
| 2 | UTA | 8.45 |
| 3 | DEN | 8.55 |
| 4 | MIL | 8.55 |
| 5 | TOR | 9.09 |
| 6 | PHI | 9.18 |
| 7 | LAL | 10.0 |
| 8 | BOS | 10.36 |
| 9 | SAS | 10.55 |
| 10 | IND | 11.45 |
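
To make the weighted average rank concrete, here is a small sketch on invented data. It ranks each team per statistic (ascending for stats where lower is better), multiplies the ranks by their weights, and normalizes by the sum of the weights, which is the conventional definition of a weighted mean. The team codes, values, and the `weights` and `lower_is_better` names are placeholders chosen for this illustration.

```python
import pandas as pd

# Toy per-team differences (hypothetical values)
toy = pd.DataFrame({
    "Team": ["AAA", "BBB", "CCC"],
    "PTS_diff": [2.0, 1.0, 0.0],
    "DEF_RATING_diff": [-1.0, 0.5, 0.0],
})

weights = {"PTS": 1.0, "DEF_RATING": 0.5}   # example slider settings
lower_is_better = {"DEF_RATING"}            # assumption for this sketch

weighted_sum = 0.0
for stat, w in weights.items():
    # rank 1 is best: ascending order only for lower-is-better stats
    ranks = toy[f"{stat}_diff"].rank(ascending=stat in lower_is_better, method="min")
    weighted_sum = weighted_sum + ranks * w

# normalize by the total weight so the result stays on the rank scale
toy["Weighted Average Rank"] = weighted_sum / sum(weights.values())
print(toy.sort_values("Weighted Average Rank")[["Team", "Weighted Average Rank"]])
```

In this toy case, `AAA` is ranked first on both statistics and therefore gets a weighted average rank of exactly 1.0, regardless of the chosen weights.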


# Conclusions, Limitations, and Further Research Ideas
<a id='conclusions'></a>

Using the standard parameters (time range: 1996-2022, penalty percentile: 0.02, equal-weighted statistics for aggregation), the best NBA team at drafting is the **Cleveland Cavaliers**!

Even when the aggregation weights are changed drastically, the top teams remain relatively constant. A savvy basketball fan will quickly notice that the high ranking of some teams can be traced to individual selections that greatly outperformed their peers at the same draft position (e.g., CLE: LeBron James, DEN: Nikola Jokic). We attribute this sensitivity to the comparatively short time period available for analysis, which leads to our first avenue for further research: obtaining the same statistics for a longer period than the NBA API provides, or switching to a set of statistics that is available for extended periods, could make the analysis more robust and less sensitive to individual players.
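
One way to probe this sensitivity is a leave-one-out check: recompute each team's average difference after removing its single best pick. The sketch below uses invented numbers and placeholder names and is only meant to illustrate the idea, not to reproduce the notebook's results.

```python
import pandas as pd

# Toy per-player differences (hypothetical values): team AAA has one outlier pick
toy = pd.DataFrame({
    "Tm": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Player": ["star", "a2", "a3", "b1", "b2", "b3"],
    "WS_diff": [60.0, -5.0, -4.0, 6.0, 5.0, 4.0],
})

# Team averages using every pick
full_mean = toy.groupby("Tm")["WS_diff"].mean()

# Row labels of each team's best pick, then averages without those rows
best_idx = toy.groupby("Tm")["WS_diff"].idxmax()
without_best = toy.drop(index=best_idx.tolist()).groupby("Tm")["WS_diff"].mean()

print(full_mean)
print(without_best)
```

With all picks included, `AAA` leads (17.0 vs. 5.0); without each team's best pick, the order flips (-4.5 vs. 4.5). A large shift of this kind would indicate that a ranking rests on a single selection.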

A limitation of many of the performance metrics we used is that they grow with the length of a player's career. Career totals such as points, rebounds, and assists can only increase over time, so a hypothetical mediocre player who has completed a 10-year career will likely show better totals than a recently drafted superstar with only a couple of NBA seasons, even though most people would consider the superstar the better draft selection. We considered several options to address this limitation but concluded that each one introduces new drawbacks. Normalizing all stats to a per-game basis might disadvantage players with long careers, whose averages are pulled down by weaker seasons in their mid-to-late thirties (e.g., Dirk Nowitzki). Alternatively, one could use only a subset of the available statistics and exclude cumulative metrics such as the ones mentioned above. However, this increases the reliance on individual advanced statistics. Since there is no single true measure of player performance, combining multiple statistics is a crucial part of the analysis, and it would suffer if certain metrics were excluded. Future researchers could attempt to find other solutions to this issue.
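
The trade-off between career totals and per-game values can be illustrated with a short sketch. The two careers below are invented; the numbers are placeholders and do not describe real players.

```python
# Hypothetical careers: a long-serving role player and a recently drafted star
careers = {
    "veteran role player": {"games": 800, "PTS": 6400},
    "young star": {"games": 150, "PTS": 3600},
}

# Career totals favor the longer career ...
best_by_total = max(careers, key=lambda name: careers[name]["PTS"])

# ... while per-game values favor the more productive player
per_game = {name: c["PTS"] / c["games"] for name, c in careers.items()}
best_by_per_game = max(per_game, key=per_game.get)

print(best_by_total, best_by_per_game, per_game)
```

Here the veteran leads on total points (6400 vs. 3600), while the young star leads per game (24.0 vs. 8.0), which is exactly the tension described above.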

Further research ideas:
- Investigate the relationship between team success and draft performance
- Conduct ablation studies to investigate the impact of different weighting and/or fully excluding certain metrics
- Break down the analysis on a positional basis (e.g., "Which team is best at drafting guards?")
- Extend the time period
- Add and/or change the performance metrics
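
As a starting point for the ablation idea above, the following sketch drops one statistic at a time and records which team then has the best equal-weighted average rank. The per-stat ranks and team codes are invented placeholders, and the helper names `average_rank` and `top_team` are assumptions made for this illustration.

```python
# Hypothetical per-stat ranks (1 = best) of three placeholder teams within a larger league
ranks = {
    "AAA": {"PTS": 1, "WS": 12, "BPM": 14},
    "BBB": {"PTS": 10, "WS": 8, "BPM": 6},
    "CCC": {"PTS": 20, "WS": 5, "BPM": 11},
}
stats = ["PTS", "WS", "BPM"]

def average_rank(team_ranks, used_stats):
    """Equal-weighted mean of a team's ranks over the stats in use."""
    return sum(team_ranks[s] for s in used_stats) / len(used_stats)

def top_team(used_stats):
    """Team with the lowest (best) average rank over the stats in use."""
    return min(ranks, key=lambda team: average_rank(ranks[team], used_stats))

baseline = top_team(stats)
# Leave one stat out at a time and see whether the top team changes
ablation = {dropped: top_team([s for s in stats if s != dropped]) for dropped in stats}

print(baseline)  # BBB
print(ablation)  # {'PTS': 'BBB', 'WS': 'AAA', 'BPM': 'AAA'}
```

In this toy example the top team changes when `WS` or `BPM` is removed, which would suggest that the ranking depends noticeably on those metrics.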