# Compiling draft data

It would be useful to have a dataframe with the format `PLAYER_ID, DRAFT_POSITION` that records where each player was drafted, with a missing value (e.g. `NaN`, which we can handle later) for players who were never drafted.
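A minimal sketch of that target shape, using made-up player IDs (101, 102 and 103 are hypothetical). It also shows one option for the missing value: pandas' nullable `Int64` dtype keeps draft positions as whole numbers and stores an undrafted player's pick as `<NA>`, instead of turning the whole column into floats.

```python
import numpy as np
import pandas as pd

# Hypothetical IDs for illustration only; player 103 was never drafted.
target = pd.DataFrame(
    {
        "PLAYER_ID": [101, 102, 103],
        "DRAFT_POSITION": [1, 45, np.nan],  # the NaN makes this a float column
    }
)

# Nullable integer dtype: 1 and 45 stay integers, the missing pick becomes <NA>.
target["DRAFT_POSITION"] = target["DRAFT_POSITION"].astype("Int64")
print(target.dtypes)
```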

There is a Kaggle dataset (https://www.kaggle.com/datasets/mattop/nba-draft-basketball-player-data-19892021) with draft data from Basketball Reference, covering the 1989 through 2021 drafts.

In this notebook we'll load that data and combine it with Basketball Reference data for the 2022 and 2023 drafts, which I exported as CSV from the website and pasted into CSV files. Because the 2024 season is ongoing, we won't be able to use it for this project.

```python
import numpy as np
import pandas as pd
```

```python
# Kaggle data (drafts through 2021) plus the two Basketball Reference exports
kaggle_df = pd.read_csv("nbaplayersdraft.csv")
df_2022 = pd.read_csv("draft_2022.csv")
df_2023 = pd.read_csv("draft_2023.csv")
```

```python
print(kaggle_df.columns)
print(df_2022.columns)
```

```
Index(['id', 'year', 'rank', 'overall_pick', 'team', 'player', 'college',
       'years_active', 'games', 'minutes_played', 'points', 'total_rebounds',
       'assists', 'field_goal_percentage', '3_point_percentage',
       'free_throw_percentage', 'average_minutes_played', 'points_per_game',
       'average_total_rebounds', 'average_assists', 'win_shares',
       'win_shares_per_48_minutes', 'box_plus_minus',
       'value_over_replacement'],
      dtype='object')
Index(['Rk', 'Pk', 'Tm', 'Player', 'College', 'Yrs', 'G', 'MP', 'PTS', 'TRB',
       'AST', 'FG%', '3P%', 'FT%', 'MP.1', 'PTS.1', 'TRB.1', 'AST.1', 'WS',
       'WS/48', 'BPM', 'VORP'],
      dtype='object')
```
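The two column lists line up position by position (Kaggle just has the extra `id` and `year` at the front), and pandas has suffixed the repeated per-game headers as `MP.1`, `PTS.1`, and so on. Reading the correspondence off that ordering, a full rename map might look like the sketch below. The pairing is inferred from the printed order, not from any documentation, so it is worth spot-checking a few values before relying on it.

```python
import pandas as pd

# Basketball Reference export header -> Kaggle column name, inferred from the
# order of the two printed column lists (Kaggle also has "id" and "year").
BBREF_TO_KAGGLE = {
    "Rk": "rank",
    "Pk": "overall_pick",
    "Tm": "team",
    "Player": "player",
    "College": "college",
    "Yrs": "years_active",
    "G": "games",
    "MP": "minutes_played",
    "PTS": "points",
    "TRB": "total_rebounds",
    "AST": "assists",
    "FG%": "field_goal_percentage",
    "3P%": "3_point_percentage",
    "FT%": "free_throw_percentage",
    "MP.1": "average_minutes_played",
    "PTS.1": "points_per_game",
    "TRB.1": "average_total_rebounds",
    "AST.1": "average_assists",
    "WS": "win_shares",
    "WS/48": "win_shares_per_48_minutes",
    "BPM": "box_plus_minus",
    "VORP": "value_over_replacement",
}

# Example: an empty frame with the export's headers, renamed to Kaggle names.
empty_export = pd.DataFrame(columns=list(BBREF_TO_KAGGLE))
renamed = empty_export.rename(columns=BBREF_TO_KAGGLE)
print(list(renamed.columns))
```

With the notebook's data this would be `df_2022.rename(columns=BBREF_TO_KAGGLE)`, which would make it possible to keep more than just the name and pick if that turns out to be useful.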


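```python
# Keep just "player", "id" and "overall_pick" from the Kaggle data.
# Note: in the output further down, the first five picks have id 1.0 to 5.0, so
# "id" may be a sequential row number rather than a Basketball Reference player ID.
k_df = kaggle_df[["player", "id", "overall_pick"]].rename(columns={"id": "BBREF_id"})
```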
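Because the `BBREF_id` values in the output further down (1.0, 2.0, ... for picks 1, 2, ...) look like a running counter, it may be worth checking that before treating them as player identifiers. A minimal sketch with a hypothetical helper, `is_running_counter`, that tests whether a column is exactly 1, 2, ..., n in row order:

```python
import pandas as pd


def is_running_counter(values: pd.Series) -> bool:
    """Return True if `values` is exactly 1, 2, ..., len(values) in row order."""
    # 1.0 == 1 in Python, so a float column of whole numbers also passes.
    return values.tolist() == list(range(1, len(values) + 1))


# Toy data; with the notebook's frame you would call is_running_counter(kaggle_df["id"]).
print(is_running_counter(pd.Series([1, 2, 3, 4])))  # True
print(is_running_counter(pd.Series([10, 20, 30])))  # False
```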

```python
# The export has no player-ID column, so BBREF_id is left missing
df_22 = (
    df_2022[["Player", "Pk"]]
    .rename(columns={"Player": "player", "Pk": "overall_pick"})
    .assign(BBREF_id=np.nan)
)
```

```python
df_23 = (
    df_2023[["Player", "Pk"]]
    .rename(columns={"Player": "player", "Pk": "overall_pick"})
    .assign(BBREF_id=np.nan)
)
```
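The 2022 and 2023 cells differ only in their input file, so one option is a small helper. The name `load_bbref_draft` is hypothetical, and the sketch assumes both exports use the same `Player` and `Pk` headers, as the 2022 columns printed above do. The demo reads a tiny made-up CSV from memory instead of a real export.

```python
import io

import numpy as np
import pandas as pd


def load_bbref_draft(source) -> pd.DataFrame:
    """Hypothetical helper: read one Basketball Reference draft export and
    return it with the columns player, overall_pick, BBREF_id."""
    raw = pd.read_csv(source)
    return (
        raw[["Player", "Pk"]]
        .rename(columns={"Player": "player", "Pk": "overall_pick"})
        .assign(BBREF_id=np.nan)  # the export carries no player-ID column
    )


# Made-up rows standing in for a pasted export, trimmed to a few columns.
sample_csv = io.StringIO("Rk,Pk,Player\n1,1,Player A\n2,2,Player B\n")
print(load_bbref_draft(sample_csv))
```

In the notebook this would read `df_22 = load_bbref_draft("draft_2022.csv")`, and likewise for 2023.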

```python
# Stack the three frames; each keeps its own original row index
total_df = pd.concat([k_df, df_22, df_23])
```
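By default `pd.concat` keeps each input's own row labels, which is why the last rows of the output below are labelled 55 to 58 (positions within the 2023 frame) and why the same label can occur several times in `total_df`. A minimal illustration on toy frames, with `ignore_index=True` as one way to get a fresh 0..n-1 index:

```python
import pandas as pd

a = pd.DataFrame({"player": ["A", "B"]})
b = pd.DataFrame({"player": ["C", "D"]})

stacked = pd.concat([a, b])                        # keeps each frame's labels
renumbered = pd.concat([a, b], ignore_index=True)  # builds a new 0..n-1 index

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```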

```python
total_df
```

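|     | player | BBREF_id | overall_pick |
|----:|--------|---------:|-------------:|
| 0 | Pervis Ellison | 1.0 | 1.0 |
| 1 | Danny Ferry | 2.0 | 2.0 |
| 2 | Sean Elliott | 3.0 | 3.0 |
| 3 | Glen Rice | 4.0 | 4.0 |
| 4 | J.R. Reid | 5.0 | 5.0 |
| ... | ... | ... | ... |
| 55 | Tarik Biberovic | NaN | 56.0 |
| 56 | Trayce Jackson-Davis | NaN | 57.0 |
| 57 | Chris Livingston | NaN | 58.0 |
| 58 | NaN | NaN | NaN |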
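The final row (label 58) has no player, no ID and no pick, so it is most likely an empty or separator line carried over from the pasted 2023 CSV rather than a real draftee. A sketch of dropping such rows, shown on a toy frame with made-up names rather than on `total_df` itself:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "player": ["Player A", "Player B", np.nan],
        "BBREF_id": [np.nan, np.nan, np.nan],
        "overall_pick": [1.0, 2.0, np.nan],
    }
)

# Keep only rows that actually name a player, then renumber the index.
cleaned = toy.dropna(subset=["player"]).reset_index(drop=True)
print(cleaned)
```

Applied to the notebook, the equivalent line would be `total_df = total_df.dropna(subset=["player"]).reset_index(drop=True)`, which also removes the duplicated index labels left by `pd.concat`.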


# Convert `BBREF_id` to `nba_api` ID