# Retrieving Draft Data

We will be using `nba_api`, a third-party Python package (not part of the standard library) that lets us retrieve different categories of data, depending on the endpoint, directly from the official NBA stats site (https://www.nba.com/stats).

The draft is when NBA teams pick new players, most of them fresh out of college, though some come from overseas clubs or other leagues. In this notebook we'll retrieve two datasets: past drafts and information about players (such as height and weight). We will use the `drafthistory` and `draftcombinestats` endpoints included in `nba_api` to do this. We will then merge these two datasets (when cleaning the data), which will let us analyse one of our research questions in more depth.

In [15]:
import pandas as pd
from nba_api.stats.endpoints import drafthistory
from nba_api.stats.endpoints import draftcombinestats
import matplotlib.pyplot as plt
import os

## Collecting draft data

In [25]:
draft_data = drafthistory.DraftHistory()
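Like any network request, the call above can fail or time out. Below is a minimal sketch of a retry wrapper we could put around it. The helper name `fetch_with_retry` and its parameters are our own invention for illustration and are not part of `nba_api`.

```python
import time


def fetch_with_retry(fetch, attempts=3, wait_seconds=2.0):
    """Call fetch() and retry on any exception, pausing between attempts.

    Hypothetical helper for illustration; not part of nba_api.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as error:  # broad on purpose: network errors vary
            last_error = error
            if attempt < attempts:
                time.sleep(wait_seconds)
    # Every attempt failed, so surface the most recent error.
    raise last_error


# Possible usage with the endpoint from this notebook (not executed here):
# draft_data = fetch_with_retry(lambda: drafthistory.DraftHistory())
```

Passing a function rather than a finished object lets the wrapper create a fresh request on each attempt.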

In [26]:
draft_df = draft_data.get_data_frames()[0]
draft_df

,PERSON_ID,PLAYER_NAME,SEASON,ROUND_NUMBER,ROUND_PICK,OVERALL_PICK,DRAFT_TYPE,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_ABBREVIATION,ORGANIZATION,ORGANIZATION_TYPE,PLAYER_PROFILE_FLAG
0,1641705,Victor Wembanyama,2023,1,1,1,Draft,1610612759,San Antonio,Spurs,SAS,Metropolitans 92 (France),Other Team/Club,1
1,1641706,Brandon Miller,2023,1,2,2,Draft,1610612766,Charlotte,Hornets,CHA,Alabama,College/University,1
2,1630703,Scoot Henderson,2023,1,3,3,Draft,1610612757,Portland,Trail Blazers,POR,Ignite (G League),Other Team/Club,1
3,1641708,Amen Thompson,2023,1,4,4,Draft,1610612745,Houston,Rockets,HOU,Overtime Elite,Other Team/Club,1
4,1641709,Ausar Thompson,2023,1,5,5,Draft,1610612765,Detroit,Pistons,DET,Overtime Elite,Other Team/Club,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8252,79312,Herb Wilkinson,1947,0,0,0,Draft,1610610034,St. Louis,Bombers,BOM,Iowa,College/University,0
8253,79284,Jack Stone,1947,0,0,0,Draft,1610610025,Chicago,Stags,CHS,Kansas State,College/University,0
8254,79313,Frank Broyles,1947,0,0,0,Draft,1610610035,Toronto,Huskies,HUS,Georgia Tech,College/University,0
8255,79283,Hank Decker,1947,0,0,0,Draft,1610610025,Chicago,Stags,CHS,West Texas A&M,College/University,0


In [29]:
draft_df.columns

Index(['PERSON_ID', 'PLAYER_NAME', 'SEASON', 'ROUND_NUMBER', 'ROUND_PICK',
       'OVERALL_PICK', 'DRAFT_TYPE', 'TEAM_ID', 'TEAM_CITY', 'TEAM_NAME',
       'TEAM_ABBREVIATION', 'ORGANIZATION', 'ORGANIZATION_TYPE',
       'PLAYER_PROFILE_FLAG'],
      dtype='object')

As we can see, we have data on past drafts going all the way back to 1947. We likely won't use all of this data and will only include drafts from 1990 onwards (refer to the `clean_draft_data` notebook).
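As a rough sketch of the 1990 cutoff (the actual filtering lives in the `clean_draft_data` notebook and may differ), the example below filters on `SEASON`. It uses two rows copied from the output above, reduced to three columns. We assume here that `SEASON` may arrive as strings, so we convert it to numbers before comparing. The names `sample_df` and `recent_df` are only illustrative.

```python
import pandas as pd

# Two rows taken from the draft output above, reduced to a few columns.
sample_df = pd.DataFrame(
    {
        "PERSON_ID": [1641705, 79312],
        "PLAYER_NAME": ["Victor Wembanyama", "Herb Wilkinson"],
        "SEASON": ["2023", "1947"],  # assumption: values may be strings
    }
)

# Convert to numbers first so the comparison works for either dtype.
season = pd.to_numeric(sample_df["SEASON"], errors="coerce")

# Keep only drafts from 1990 onwards and renumber the rows.
recent_df = sample_df[season >= 1990].reset_index(drop=True)
print(recent_df)
```

Only the 2023 row survives the filter; the 1947 row is dropped.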

Let's save this data to a CSV file:

In [16]:
output_dir = os.path.join("..", "data", "raw", "draft_data")
os.makedirs(output_dir, exist_ok=True)
output_filename = os.path.join(output_dir, "draft_data.csv")
draft_df.to_csv(output_filename, index=False)
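Passing `index=False` keeps the row index out of the file. The small sketch below shows why that matters when the file is read back: without it, pandas writes the index as an unnamed first column. It uses an in-memory buffer instead of a real file, and the sample frame holds two rows from the output above with only two columns kept.

```python
import io

import pandas as pd

sample_df = pd.DataFrame(
    {
        "PLAYER_NAME": ["Victor Wembanyama", "Brandon Miller"],
        "OVERALL_PICK": [1, 2],
    }
)

# Default behaviour: the row index is written as an extra first column.
with_index = io.StringIO()
sample_df.to_csv(with_index)
with_index.seek(0)
columns_with_index = list(pd.read_csv(with_index).columns)

# With index=False only the real columns are written.
without_index = io.StringIO()
sample_df.to_csv(without_index, index=False)
without_index.seek(0)
columns_without_index = list(pd.read_csv(without_index).columns)

print(columns_with_index)     # ['Unnamed: 0', 'PLAYER_NAME', 'OVERALL_PICK']
print(columns_without_index)  # ['PLAYER_NAME', 'OVERALL_PICK']
```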

## Collecting more detailed information about every player

In [18]:
# 'All Time' requests draft combine measurements for all seasons rather than a single one
player_additional_info = draftcombinestats.DraftCombineStats(season_all_time='All Time')
player_additional_info_df = player_additional_info.get_data_frames()[0]

In [19]:
player_additional_info_df

,SEASON,PLAYER_ID,FIRST_NAME,LAST_NAME,PLAYER_NAME,POSITION,HEIGHT_WO_SHOES,HEIGHT_WO_SHOES_FT_IN,HEIGHT_W_SHOES,HEIGHT_W_SHOES_FT_IN,...,STANDING_REACH_FT_IN,BODY_FAT_PCT,HAND_LENGTH,HAND_WIDTH,STANDING_VERTICAL_LEAP,MAX_VERTICAL_LEAP,LANE_AGILITY_TIME,MODIFIED_LANE_AGILITY_TIME,THREE_QUARTER_SPRINT,BENCH_PRESS
0,2002,2403,Nene,,Nene,PF,81.25,6' 9.25'',,,...,9' 1'',,,,30.0,34.0,10.73,,3.19,16.0
1,2021,1630546,Max,Abmas,Max Abmas,PG,70.50,5'10.5'',71.75,5'11.75'',...,7'10.0'',5.50,8.0,7.75,28.5,32.5,10.90,3.49,3.12,
2,2007,12204,Mohamed,Abukar,Mohamed Abukar,SF,80.00,6' 8'',81.75,6' 9.75'',...,8' 7'',8.90,,,30.5,35.0,11.78,,3.37,15.0
3,2020,1630173,Precious,Achiuwa,Precious Achiuwa,PF,79.50,6'7.50'',80.70,6'8.75'',...,9'0.50'',6.70,9.0,10.00,,,,,,
4,2005,101165,Alex,Acker,Alex Acker,SG,75.75,6' 3.75'',76.75,6' 4.75'',...,8' 6.5'',,,,28.0,32.0,11.67,,3.35,11.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1628,2012,203092,Tyler,Zeller,Tyler Zeller,PF-C,83.25,6' 11.25'',84.50,7' 0.5'',...,8' 8.5'',6.40,9.0,9.75,30.0,34.0,11.13,,3.40,16.0
1629,2013,203469,Cody,Zeller,Cody Zeller,C-PF,82.75,6' 10.75'',84.25,7' 0.25'',...,8' 10'',4.75,8.5,10.50,35.5,37.5,10.82,2.69,3.15,17.0
1630,2022,1630855,Fanbo,Zeng,Fanbo Zeng,SF-PF,,,,,...,,,,,,,,,,
1631,2016,1627757,Stephen,Zimmerman,Stephen Zimmerman,C,82.25,6' 10.25'',83.75,6' 11.75'',...,9' 0.5'',11.15,9.0,9.00,26.0,31.0,12.08,3.16,3.43,


In [20]:
player_additional_info_df.columns

Index(['SEASON', 'PLAYER_ID', 'FIRST_NAME', 'LAST_NAME', 'PLAYER_NAME',
       'POSITION', 'HEIGHT_WO_SHOES', 'HEIGHT_WO_SHOES_FT_IN',
       'HEIGHT_W_SHOES', 'HEIGHT_W_SHOES_FT_IN', 'WEIGHT', 'WINGSPAN',
       'WINGSPAN_FT_IN', 'STANDING_REACH', 'STANDING_REACH_FT_IN',
       'BODY_FAT_PCT', 'HAND_LENGTH', 'HAND_WIDTH', 'STANDING_VERTICAL_LEAP',
       'MAX_VERTICAL_LEAP', 'LANE_AGILITY_TIME', 'MODIFIED_LANE_AGILITY_TIME',
       'THREE_QUARTER_SPRINT', 'BENCH_PRESS'],
      dtype='object')

In [21]:
# saving data

output_dir = os.path.join("..", "data", "raw", "player_extra_info")
os.makedirs(output_dir, exist_ok=True)
output_filename = os.path.join(output_dir, "player_moreinfo.csv")
player_additional_info_df.to_csv(output_filename, index=False)
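The merge of the two datasets mentioned at the start of this notebook happens during cleaning, so what follows is only a sketch of one way it could look. We assume that `PERSON_ID` in the draft data and `PLAYER_ID` in the combine data refer to the same player identifier; this should be checked before relying on it. The rows below are toy values made up for illustration, not real data.

```python
import pandas as pd

# Toy rows, NOT real data: the IDs, names, and measurements are made up.
draft_toy = pd.DataFrame(
    {
        "PERSON_ID": [1, 2, 3],
        "PLAYER_NAME": ["Player A", "Player B", "Player C"],
        "OVERALL_PICK": [1, 2, 3],
    }
)
combine_toy = pd.DataFrame(
    {
        "PLAYER_ID": [1, 3],
        "HEIGHT_WO_SHOES": [80.0, 75.5],
        "WEIGHT": [220.0, 190.0],
    }
)

# Left join: keep every drafted player, even without combine measurements.
# indicator=True adds a `_merge` column showing where each row matched.
merged = draft_toy.merge(
    combine_toy,
    how="left",
    left_on="PERSON_ID",
    right_on="PLAYER_ID",
    indicator=True,
)
print(merged[["PLAYER_NAME", "HEIGHT_WO_SHOES", "_merge"]])
```

A left join is used here because not every drafted player has combine measurements. The `_merge` column makes it easy to count how many draft rows found a match.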