<img src="assets/br_logo.png" width="400px">

## Web Scraping [Basketball-Reference.com](http://www.basketball-reference.com) Using Selenium

Example Player Page:
http://www.basketball-reference.com/players/b/bryanko01.html
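
The example URL suggests that a player page sits in a directory named after the first character of the player ID (`b` for `bryanko01`). Below is a minimal sketch that builds such a URL, assuming the pattern holds for other players. `player_url` is a hypothetical helper and is not part of the `web_scrape` module.

```python
BASE_URL = "http://www.basketball-reference.com/players"


def player_url(player_id: str) -> str:
    """Build a player page URL from an ID such as 'bryanko01'.

    Assumption: the directory is the first character of the ID,
    as in the example player page URL.
    """
    if not player_id:
        raise ValueError("player_id must be a non-empty string")
    return f"{BASE_URL}/{player_id[0]}/{player_id}.html"


print(player_url("bryanko01"))
# http://www.basketball-reference.com/players/b/bryanko01.html
```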

 

### Using the `web_scrape` module

**`web_scrape`**

This module contains all the code needed to scrape www.Basketball-Reference.com. It can scrape a player's career averages from four tables on that player's page. Each scraping function must be given the URL of a specific player's page:
- `get_per_game()`
- `get_100()`
- `get_shooting()`
- `get_advanced()`

To obtain the column names, call the matching column functions below. Combine the column lists in the same order as the scraped tables so that names and values line up:
- `get_pergame_cols()`
- `get_100_cols()`
- `get_shoot_cols()`
- `get_adv_cols()`
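
The ordering requirement can be illustrated without touching the site. This is a minimal sketch using placeholder values and invented column names. The real lists come from the `get_*` and `get_*_cols` functions, whose return values are assumed here to be flat lists.

```python
from itertools import chain

# Placeholder data: the values and column names are invented for illustration.
per_100_vals, per_100_cols = [1.0, 2.0], ["per100_a", "per100_b"]
shooting_vals, shooting_cols = [3.0], ["shoot_a"]
advanced_vals, advanced_cols = [4.0], ["adv_a"]

# Flatten values and column names in the SAME table order.
row = list(chain(per_100_vals, shooting_vals, advanced_vals))
cols = list(chain(per_100_cols, shooting_cols, advanced_cols))

# A length mismatch is a quick sign that a table was skipped or duplicated.
assert len(row) == len(cols)

labeled = dict(zip(cols, row))
print(labeled)
```

If the tables were flattened in a different order than the column lists, the lengths could still match while every value sits under the wrong name, so the order itself has to be kept consistent by hand.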

### Import Module

In [1]:
from lib.web_scrape import get_100, get_100_cols, get_shooting, get_shoot_cols, get_advanced, get_adv_cols

### Import Packages

In [2]:
import numpy as np
import pandas as pd

### Import Data

In [3]:
file_loc = "data/example_players.csv"

In [4]:
player_stats = pd.read_csv(file_loc)
player_stats.head()

| Unnamed: 0 | Season | Player | Pos | Player_ID | url |
|---|---|---|---|---|---|
| 0 | Inactive | A.J. Price | PG | priceaj01 | file:///Users/alexcheng/Downloads/us.sitesucke... |
| 1 | Active | Aaron Brooks | PG | brookaa01 | file:///Users/alexcheng/Downloads/us.sitesucke... |
| 2 | Active | Aaron Gordon | SF | gordoaa01 | file:///Users/alexcheng/Downloads/us.sitesucke... |
| 3 | Active | Aaron Harrison | SG | harriaa01 | file:///Users/alexcheng/Downloads/us.sitesucke... |
| 4 | Active | Adreian Payne | PF | paynead01 | file:///Users/alexcheng/Downloads/us.sitesucke... |


### Web Scrape

In [5]:
# Create a list of URLs to feed the web scraper
url_list = list(player_stats['url'])

In [6]:
# Temporary list used to save scraped data in memory
tmp_list = []

In [7]:
# Web Scraping
for url in url_list:
    # Select which tables to scrape from
    per_100 = get_100(url)
    shooting = get_shooting(url)
    advanced = get_advanced(url)
    
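    # Flatten the three tables into one row, in column order.
    # A separate name (row) avoids overwriting the player_stats
    # DataFrame, which is needed for the merge later on.
    tables = [per_100, shooting, advanced]
    row = []
    for sublist in tables:
        for val in sublist:
            row.append(val)
    
    # Save to temporary list
    tmp_list.append(row)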

AttributeError: 'NoneType' object has no attribute 'is_displayed'
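
The traceback shows the loop aborting on one page, so the remaining URLs are never scraped. One possible way to keep going is to catch the error per URL and record which pages failed. This is a minimal sketch: `scrape_all` and the stand-in `fake_scraper` are hypothetical names, and the exception type is simply the one shown in the traceback.

```python
def scrape_all(urls, scrapers):
    """Run every scraper on every URL, skipping URLs that raise.

    `scrapers` is a list of callables, e.g. [get_100, get_shooting, get_advanced].
    Returns (rows, failed); each row starts with the URL it came from.
    """
    rows, failed = [], []
    for url in urls:
        try:
            row = [url]
            for scrape in scrapers:
                row.extend(scrape(url))
        except AttributeError as exc:
            # Same exception type as in the traceback; widen if others appear.
            failed.append((url, str(exc)))
            continue
        rows.append(row)
    return rows, failed


# Stand-in scraper so the sketch runs without a browser.
def fake_scraper(url):
    if "bad" in url:
        raise AttributeError("'NoneType' object has no attribute 'is_displayed'")
    return [1.0, 2.0]


rows, failed = scrape_all(["page_a", "bad_page", "page_b"], [fake_scraper])
print(len(rows), len(failed))  # 2 1
```

Because each row begins with its URL, the column list would need a matching leading name such as `url`, which could then also serve as a key when joining back to the input table.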

In [None]:
# Retrieve columns
per_100_cols = get_100_cols()
shoot_cols = get_shoot_cols()
advanced_cols = get_adv_cols()

# Flatten in the same table order used in the scraping loop so names align with values
tmp_list1 = [per_100_cols, shoot_cols, advanced_cols]
cols = []
for sublist in tmp_list1:
    for val in sublist:
        cols.append(val)

In [None]:
# Build dataframe
cleaned_stats = pd.DataFrame(tmp_list, columns=cols)
cleaned_stats.head()

In [None]:
# Merge dataframes to attach player names (both frames need a 'Player_ID' column)
cumulative_player_stats = pd.merge(player_stats, cleaned_stats, on='Player_ID', how='right')

In [None]:
file1_loc = "data/example_player_stats.csv"

# Save to CSV; index=False avoids writing the row index as an extra 'Unnamed: 0' column
cumulative_player_stats.to_csv(file1_loc, index=False)