# Compiling draft data

It'd be useful to have a dataframe with the columns `PLAYER_ID, DRAFT_POSITION` giving where each player was drafted (with something like NaN, which we can handle later, for players who were never drafted).
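As a sketch of that target shape: a left join of every player ID onto the drafted players leaves NaN for anyone who was never drafted. The IDs, picks, and variable names below are made up for illustration and are not part of the real data.

```python
import pandas as pd

# Hypothetical IDs for every player of interest, drafted or not
all_players = pd.DataFrame({"PLAYER_ID": [11, 22, 33]})

# Hypothetical draft results; player 33 was never drafted
drafted = pd.DataFrame({"PLAYER_ID": [11, 22], "DRAFT_POSITION": [1, 2]})

# Left join keeps every player; players with no draft row get NaN
draft_positions = all_players.merge(drafted, on="PLAYER_ID", how="left")
print(draft_positions)
```

Because of the NaN, `DRAFT_POSITION` comes out as a float column here rather than an integer one.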

There is a Kaggle dataset here (https://www.kaggle.com/datasets/mattop/nba-draft-basketball-player-data-19892021) which has data from Basketball Reference through the 2021 draft.

In this notebook we'll load that data and combine it with the 2022 and 2023 draft tables from Basketball Reference (which I exported as CSV on the website and copy-pasted into CSV files). Because the 2024 season is ongoing, we won't be able to use it for this project.

In [1]:
import numpy as np
import pandas as pd

In [2]:
kaggle_df = pd.read_csv("nbaplayersdraft.csv")
df_2022 = pd.read_csv("draft_2022.csv")
df_2023 = pd.read_csv("draft_2023.csv")

In [3]:
print(kaggle_df.columns)
print(df_2022.columns)

Index(['id', 'year', 'rank', 'overall_pick', 'team', 'player', 'college',
       'years_active', 'games', 'minutes_played', 'points', 'total_rebounds',
       'assists', 'field_goal_percentage', '3_point_percentage',
       'free_throw_percentage', 'average_minutes_played', 'points_per_game',
       'average_total_rebounds', 'average_assists', 'win_shares',
       'win_shares_per_48_minutes', 'box_plus_minus',
       'value_over_replacement'],
      dtype='object')
Index(['Rk', 'Pk', 'Tm', 'Player', 'College', 'Yrs', 'G', 'MP', 'PTS', 'TRB',
       'AST', 'FG%', '3P%', 'FT%', 'MP.1', 'PTS.1', 'TRB.1', 'AST.1', 'WS',
       'WS/48', 'BPM', 'VORP'],
      dtype='object')


In [13]:
# Keep just "player" and "overall_pick" from the Kaggle data
k_df = kaggle_df[["player", "overall_pick"]].copy()

In [14]:
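# Same two columns from the 2022 Basketball Reference export, renamed to match the Kaggle column names
df_22 = df_2022[["Player", "Pk"]].copy()
df_22.rename(columns={"Player":"player", "Pk":"overall_pick"}, inplace=True)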

In [15]:
# And again for the 2023 export
df_23 = df_2023[["Player", "Pk"]].copy()
df_23.rename(columns={"Player":"player", "Pk":"overall_pick"}, inplace=True)
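
The 2022 and 2023 cells above repeat the same select-and-rename step. A minimal sketch of a shared helper, assuming every Basketball Reference export has `Player` and `Pk` columns as in the column listing earlier. The helper name `to_player_pick` and the toy rows are hypothetical and only there for illustration.

```python
import pandas as pd

# Basketball Reference column names mapped to the Kaggle names
BBREF_TO_KAGGLE = {"Player": "player", "Pk": "overall_pick"}

def to_player_pick(bbref_df):
    """Return a new frame with only the player and pick columns, renamed to the Kaggle names."""
    return bbref_df[list(BBREF_TO_KAGGLE)].rename(columns=BBREF_TO_KAGGLE)

# Toy stand-in for a Basketball Reference export (made-up rows, not real draft data)
example_export = pd.DataFrame(
    {"Rk": [1, 2], "Pk": [1, 2], "Tm": ["AAA", "BBB"], "Player": ["Player A", "Player B"]}
)
example_slim = to_player_pick(example_export)
print(example_slim.columns.tolist())
```

Keeping the mapping in one dictionary means a later draft year only needs one more call to the helper.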

In [16]:
# Stack the three tables; each keeps its own row labels, so index values repeat across sources
total_df = pd.concat([k_df, df_22, df_23])

In [17]:
total_df

Unnamed: 0,player,overall_pick
0,Pervis Ellison,1.0
1,Danny Ferry,2.0
2,Sean Elliott,3.0
3,Glen Rice,4.0
4,J.R. Reid,5.0
...,...,...
55,Tarik Biberovic,56.0
56,Trayce Jackson-Davis,57.0
57,Chris Livingston,58.0
58,,
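
The listing above ends with an empty row labelled 58, and because each source table keeps its own row labels, the labels repeat across the combined frame. A sketch of one way to tidy this, assuming that fully empty rows carry no information. The frames here are made-up stand-ins, not the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for two source tables; the second ends with a blank row
first = pd.DataFrame({"player": ["Player A", "Player B"], "overall_pick": [1.0, 2.0]})
second = pd.DataFrame({"player": ["Player C", np.nan], "overall_pick": [1.0, np.nan]})

combined = (
    pd.concat([first, second], ignore_index=True)  # renumber rows from 0
    .dropna(how="all")                             # drop rows where every column is missing
    .reset_index(drop=True)                        # close any gaps left by dropped rows
)
print(combined)
```

With unique row labels, selecting a row by label with `.loc` returns exactly one row instead of one per source table.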


# Get `nba_api` player IDs

In [81]:
from nba_api.stats.static import players

In [102]:
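def find_nba_id(x):
    # Rows without a usable name (e.g. a blank CSV row) have nothing to look up
    if not isinstance(x.player, str):
        return np.nan

    # The lookup returns a list of player dicts; an empty list means no match
    matches = players.find_players_by_full_name(x.player)

    if len(matches) == 0:
        return np.nan

    # If several players match, this takes the first one
    player_id = matches[0]["id"]

    return player_id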

In [103]:
# Nullable Int64 keeps the IDs as integers while still allowing missing values
total_df["PLAYER_ID"] = total_df.apply(find_nba_id, axis=1).astype("Int64")

In [104]:
players.find_players_by_full_name("Reggie Turner")

[]

In [105]:
total_df[total_df["PLAYER_ID"].isna()]

Unnamed: 0,player,overall_pick,PLAYER_ID
41,Michael Cutright,42.0,
43,Reggie Cross,44.0,
46,Reggie Turner,47.0,
47,Junie Lewis,48.0,
52,Jeff Hodge,53.0,
...,...,...,...
55,Luke Travers,56.0,
57,Hugo Besson,58.0,
30,James Nnaji,31.0,
46,Mojave King,47.0,
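
Some names in the list above may be missing only because of spelling differences such as accents, punctuation, or suffixes, while others may simply not be in the `nba_api` player list at all. A rough sketch of an exact match on normalized names, assuming player records shaped like dicts with `id` and `full_name` keys. `normalize_name`, `build_lookup`, and the sample records are hypothetical and not part of `nba_api`.

```python
import re
import unicodedata

def normalize_name(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    no_punct = re.sub(r"[^a-z0-9 ]", "", ascii_only.lower())
    return " ".join(no_punct.split())

def build_lookup(player_records):
    """Map normalized full name -> id; names shared by several players are left out."""
    lookup, ambiguous = {}, set()
    for record in player_records:
        key = normalize_name(record["full_name"])
        if key in lookup:
            ambiguous.add(key)
        lookup[key] = record["id"]
    for key in ambiguous:
        del lookup[key]
    return lookup

# Made-up records for illustration only (not real players or IDs)
sample_records = [
    {"id": 101, "full_name": "José Example"},
    {"id": 102, "full_name": "A.B. Sample"},
    {"id": 103, "full_name": "Same Name"},
    {"id": 104, "full_name": "Same Name"},
]
lookup = build_lookup(sample_records)
print(lookup.get(normalize_name("Jose Example")))
print(lookup.get(normalize_name("AB Sample")))
print(lookup.get(normalize_name("Same Name")))
```

Leaving ambiguous names out of the lookup means they stay unmatched and can be resolved by hand, rather than silently picking one of several players with the same name.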


In [107]:
total_df

Unnamed: 0,player,overall_pick,PLAYER_ID
0,Pervis Ellison,1.0,442
1,Danny Ferry,2.0,198
2,Sean Elliott,3.0,251
3,Glen Rice,4.0,779
4,J.R. Reid,5.0,462
...,...,...,...
53,Jalen Slawson,54.0,1641771
54,Isaiah Wong,55.0,1631209
55,Tarik Biberovic,56.0,
56,Trayce Jackson-Davis,57.0,1631218
