## Data Scrape - Draft Picks

This notebook collects the results of every NBA draft from 2008 through 2019 from basketball-reference.com. The draft results will be used to remove undrafted players from my single_season data set.

### Contents

- [Imports](#Imports)
- [Test Scrape](#Test-Scrape)
- [Complete Scrape](#Complete-Scrape)

### Imports

In [1]:
# Import libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup
import time

### Test Scrape

In [2]:
#2018 NBA Draft URL
url = 'https://www.basketball-reference.com/draft/NBA_2018.html'
rec = requests.get(url)

In [3]:
rec.status_code

200

In [4]:
soup = BeautifulSoup(rec.content, 'lxml')

In [5]:
#setting variable equal to table with all the data
table = soup.find('table', {'class': 'sortable stats_table'})

In [6]:
#creating list of column names for the dataframe
#taken from the table's second header row (index 1)
columns = [th.text for th in table.find_all('tr')[1].find_all('th')]

In [7]:
#example scrape for one player
[td.text for td in table.find('tbody').find_all('tr')[0].find_all('td')]

['1',
 'PHO',
 'Deandre Ayton',
 'University of Arizona',
 '1',
 '71',
 '2183',
 '1159',
 '729',
 '125',
 '.585',
 '.000',
 '.746',
 '30.7',
 '16.3',
 '10.3',
 '1.8',
 '5.8',
 '.128',
 '0.2',
 '1.2']
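The column list and the player row above come from different cell types, which is why the first column name is dropped later with `columns[1:]`. The sketch below reproduces that on a toy table. The HTML is a simplified assumption about the page layout (a grouping row, a column-name row, and body rows whose rank sits in a `th` cell), and it uses the built-in `'html.parser'` rather than `'lxml'` only to keep the sketch dependency-light.

```python
from bs4 import BeautifulSoup

# Toy HTML standing in for the draft page (layout is an assumption):
# a grouping header row, a column-name row, and one body row whose
# first cell (the rank) is a <th> rather than a <td>.
html = """
<table class="sortable stats_table">
  <thead>
    <tr><th></th><th colspan="3">Round 1</th></tr>
    <tr><th>Rk</th><th>Pk</th><th>Tm</th><th>Player</th></tr>
  </thead>
  <tbody>
    <tr><th>1</th><td>1</td><td>PHO</td><td>Deandre Ayton</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'class': 'sortable stats_table'})

# Column names come from the <th> cells of the second header row.
columns = [th.text for th in table.find_all('tr')[1].find_all('th')]

# Player values come from <td> cells only, so the rank <th> is skipped.
first_row = [td.text for td in table.find('tbody').find_all('tr')[0].find_all('td')]

# The header list is one item longer than the row; columns[1:] lines them up.
print(columns[1:], first_row)
```

Because each row is one item shorter than the header list, slicing off the first column name keeps names and values aligned when the dataframe is built.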

### Complete Scrape

Scrape every NBA draft from 2008 through 2019.

In [8]:
# list to hold each player and their stats
draft_list = []

# loop through draft by year
for i in range(2008,2020):
    print(f'Scraping {i} draft')
    #url for the draft includes the specific year
    url = f'https://www.basketball-reference.com/draft/NBA_{i}.html'
    rec = requests.get(url)
    
    #parse the page only if the status code is 200
    if rec.status_code == 200:
        soup = BeautifulSoup(rec.content, 'lxml')
        table = soup.find('table', {'class': 'sortable stats_table'})
        
        for player in table.find('tbody').find_all('tr'):
            player_info = [td.text for td in player.find_all('td')]
            draft_list.append(player_info)
        time.sleep(1)
    else:
        print(f'website error for {i} draft: status code {rec.status_code}')
        

Scraping 2008 draft
Scraping 2009 draft
Scraping 2010 draft
Scraping 2011 draft
Scraping 2012 draft
Scraping 2013 draft
Scraping 2014 draft
Scraping 2015 draft
Scraping 2016 draft
Scraping 2017 draft
Scraping 2018 draft
Scraping 2019 draft
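The parse step inside the loop could also be pulled into a small helper, which makes it testable without hitting the website. The sketch below is one possible shape, not the code used above: `parse_draft_rows` is a hypothetical name, the sample HTML is toy data, and skipping rows without `td` cells rests on the assumption that such rows are repeated headers inside the table body.

```python
from bs4 import BeautifulSoup


def parse_draft_rows(html):
    """Return one list of <td> strings per player row, or [] if no table is found."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', {'class': 'sortable stats_table'})
    if table is None or table.find('tbody') is None:
        return []

    rows = []
    for tr in table.find('tbody').find_all('tr'):
        cells = [td.text for td in tr.find_all('td')]
        if cells:  # skip rows that contain only <th> cells
            rows.append(cells)
    return rows


# Toy page: two player rows separated by a header-only row.
sample_html = """
<table class="sortable stats_table">
  <tbody>
    <tr><th>1</th><td>1</td><td>AAA</td></tr>
    <tr class="thead"><th>Rk</th><th>Pk</th><th>Tm</th></tr>
    <tr><th>2</th><td>2</td><td>BBB</td></tr>
  </tbody>
</table>
"""

rows = parse_draft_rows(sample_html)
missing = parse_draft_rows('<p>no table here</p>')
print(rows, missing)
```

With a helper like this, the loop body would reduce to `draft_list.extend(parse_draft_rows(rec.content))`, and a page without the expected table would yield no rows instead of raising an `AttributeError`.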


In [9]:
#creating dataframe from the scraped data
draft_df = pd.DataFrame(draft_list, columns = columns[1:])

In [10]:
draft_df.head()

| | Pk | Tm | Player | College | Yrs | G | MP | PTS | TRB | AST | ... | 3P% | FT% | MP.1 | PTS.1 | TRB.1 | AST.1 | WS | WS/48 | BPM | VORP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | CHI | Derrick Rose | University of Memphis | 10 | 546 | 18104 | 10281 | 1910 | 3056 | ... | 0.304 | 0.824 | 33.2 | 18.8 | 3.5 | 5.6 | 37.2 | 0.099 | 0.3 | 10.5 |
| 1 | 2 | MIA | Michael Beasley | Kansas State University | 11 | 609 | 13903 | 7568 | 2861 | 788 | ... | 0.349 | 0.759 | 22.8 | 12.4 | 4.7 | 1.3 | 15.6 | 0.054 | -3.2 | -4.3 |
| 2 | 3 | MIN | O.J. Mayo | University of Southern California | 8 | 547 | 16919 | 7574 | 1706 | 1607 | ... | 0.373 | 0.82 | 30.9 | 13.8 | 3.1 | 2.9 | 21.8 | 0.062 | -0.8 | 5.1 |
| 3 | 4 | SEA | Russell Westbrook | University of California, Los Angeles | 11 | 821 | 28330 | 18859 | 5760 | 6897 | ... | 0.308 | 0.801 | 34.5 | 23.0 | 7.0 | 8.4 | 96.9 | 0.164 | 6.6 | 61.6 |
| 4 | 5 | MEM | Kevin Love | University of California, Los Angeles | 11 | 657 | 21023 | 12006 | 7397 | 1519 | ... | 0.37 | 0.827 | 32.0 | 18.3 | 11.3 | 2.3 | 78.2 | 0.179 | 2.7 | 24.9 |


In [11]:
#dropping nulls
draft_df.dropna(inplace = True)

In [12]:
#checking for null values
draft_df.isnull().sum().sum()

0
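It may help to see what `dropna` actually removes here. The sketch below uses toy rows, not scraped data, and assumes the nulls come from table rows that produced an empty list (no `td` cells). Such rows are padded with nulls when the dataframe is built, while a cell that is present but empty stays an empty string and is not treated as null.

```python
import pandas as pd

columns = ['Pk', 'Tm', 'Player']
rows = [
    ['1', 'AAA', 'Player A'],  # ordinary player row
    [],                        # row that had no <td> cells
    ['2', 'BBB', ''],          # cell present but empty
]

# Short rows are padded with nulls so every row has three columns.
toy_df = pd.DataFrame(rows, columns=columns)

# Count rows in which every value is null.
null_rows = int(toy_df.isnull().all(axis=1).sum())

# dropna removes the padded row but keeps the row with the empty string.
cleaned = toy_df.dropna()
print(null_rows, cleaned.shape)
```

If empty strings should also count as missing, they would need to be converted first, for example with `replace('', pd.NA)`, before calling `dropna`.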

In [13]:
draft_df.shape

(720, 21)

In [14]:
#saving file as csv to access in other notebooks
draft_df.to_csv('../Data_Files/draftpicks.csv')
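A rough sketch of how another notebook might use the saved file to keep only drafted players follows. The frames are toy stand-ins, an in-memory buffer replaces `draftpicks.csv`, and matching on a `Player` column in the single_season data is an assumption about that data set. Because the CSV is written with its index, reading it back with `index_col=0` avoids an extra `Unnamed: 0` column.

```python
import io

import pandas as pd

# Toy stand-in for draft_df (the real frame has 21 columns).
draft_df = pd.DataFrame({'Pk': ['1', '2'], 'Player': ['Player A', 'Player B']})

# Round-trip through CSV text; the buffer stands in for draftpicks.csv.
buffer = io.StringIO()
draft_df.to_csv(buffer)
buffer.seek(0)
drafted = pd.read_csv(buffer, index_col=0)  # reuse the saved index

# Hypothetical single-season frame; the 'Player' column name is an assumption.
single_season = pd.DataFrame({
    'Player': ['Player A', 'Player C', 'Player B'],
    'PTS': [10, 20, 30],
})

# Keep only the rows whose player appears in the draft results.
drafted_only = single_season[single_season['Player'].isin(drafted['Player'])]
print(drafted_only)
```

Matching on names alone can mis-handle players who share a name or whose names are spelled differently across sources, so a stable player identifier would be a safer key if one is available.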