# Content of Notebook

In this notebook, we improve upon the code written in "WebScraping May 21". The goal is still to convert the Team Stats table for Game 3 of the 2018 Western Conference Finals between the Houston Rockets and the Golden State Warriors into a pandas DataFrame.


## Getting team stats table

We begin with a link to a page that contains the team stats table. We use the requests and BeautifulSoup libraries to extract this table.

In [1]:
#link to Team stats page for Game 3
url = "http://www.espn.com/nba/matchup?gameId=401032763"

import requests

page = requests.get(url) #Response object
html = page.content #html file
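
The request above assumes that the page loads successfully. As a minimal sketch of a more defensive fetch, the hypothetical helper `fetch_html` below (not part of the original notebook) adds a timeout and raises on HTTP error codes. The `get` parameter is an assumption introduced only so the helper can be tried without network access.

```python
import requests

def fetch_html(url, get=requests.get, timeout=10):
    """Return the raw HTML bytes for url, raising on HTTP errors.

    `get` and `timeout` are illustrative parameters, not from the notebook.
    """
    page = get(url, timeout=timeout)  # Response object
    page.raise_for_status()           # raises requests.HTTPError on 4xx/5xx
    return page.content               # HTML as bytes
```

Calling `fetch_html(url)` would then stand in for the `requests.get(url)` and `page.content` pair.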

In [2]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

tables = soup.find_all('table') 

In [3]:
tb0 = tables[0].tbody #team names, points per quarter, total points
tb1 = tables[1].tbody #team stats

#first row is the visiting team, second row is the home team
visitor_team_row, home_team_row = tb0.find_all('tr')

#team name followed by points per quarter and total points
visitor_team_name_points = [val.contents[0].strip() for val in visitor_team_row.find_all('td')]
home_team_name_points = [val.contents[0].strip() for val in home_team_row.find_all('td')]
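
To make the row extraction easier to follow without fetching the page, here is a self-contained sketch that runs the same idea on a hard-coded HTML fragment. The markup is an assumption: a simplified stand-in built from the values shown in this notebook, not the actual ESPN page. It uses Python's built-in `html.parser` so that lxml is not required, and the helper name `row_values` is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for the first table on the page.
sample_html = """
<table>
  <tbody>
    <tr><td> HOU </td><td>22</td><td>21</td><td>24</td><td>18</td><td>85</td></tr>
    <tr><td> GS </td><td>31</td><td>23</td><td>34</td><td>38</td><td>126</td></tr>
  </tbody>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_rows = sample_soup.find("table").tbody.find_all("tr")

def row_values(row):
    """Return the stripped text of every <td> cell in a table row."""
    return [cell.get_text(strip=True) for cell in row.find_all("td")]

# First row is the visitor, second row is the home team.
sample_visitor, sample_home = (row_values(row) for row in sample_rows)
print(sample_visitor)
print(sample_home)
```

`get_text(strip=True)` is used here instead of `contents[0].strip()` because it also copes with cells whose text is nested inside a child tag. Whether the real page needs that is not something this notebook establishes.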

## Lists of names of stats with corresponding stats

We go through the table to create lists of stat names, the visitor stats and the home stats. Then we append these stats to the team name and point totals found before. 

In [4]:
#Different names of stats with corresponding stats for visitors and home 
tb1_stat_names, tb1_visitor_stats, tb1_home_stats = [], [], []

#cycles over different stats
for row in tb1.find_all('tr'):
    
    tdx = row.find_all('td')
    
    #extract names, visitor stat, home stat for each stat and store in lists
    #split by '-' to accommodate stats with 'Made-Attempted' (split into two stats)
    tb1_stat_names += tdx[0].contents[0].strip().split('-')
    tb1_visitor_stats += tdx[1].contents[0].strip().split('-')
    tb1_home_stats += tdx[2].contents[0].strip().split('-')
    
#precede each name of 'Attempted' with type of shot attempted
tb1_stat_names[1] = 'FG Attempted'
tb1_stat_names[4] = '3PT Attempted'
tb1_stat_names[7] = 'FT Attempted'

tb0_stat_names = ['Team name', '1st Qtr Points', '2nd Qtr Points',
                  '3rd Qtr Points', '4th Qtr Points', 'Total Points']
stat_names = tb0_stat_names + tb1_stat_names
visitor_stats = visitor_team_name_points + tb1_visitor_stats
home_stats = home_team_name_points + tb1_home_stats

print(visitor_stats)

['HOU', '22', '21', '24', '18', '85', '32', '81', '39.5', '11', '34', '32.4', '10', '13', '76.9', '41', '10', '31', '19', '3', '5', '20', '28', '10', '40', '19', '2', '0']
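
The renaming step above relies on the hard-coded positions 1, 4 and 7, which depend on the order of rows in the table. As a sketch of an alternative, the hypothetical helper `split_stat_names` below derives the shot type from the "Made" half of each label. The sample labels are assumptions inferred from the column names in this notebook, not copied from the page.

```python
def split_stat_names(raw_names):
    """Split 'X Made-Attempted' labels into 'X Made' and 'X Attempted'."""
    names = []
    for raw in raw_names:
        parts = raw.strip().split('-')
        if len(parts) == 2 and parts[0].endswith(' Made'):
            prefix = parts[0][:-len(' Made')]  # e.g. 'FG'
            names += [parts[0], prefix + ' ' + parts[1]]
        else:
            # any other label is kept as split, matching the loop above
            names += parts
    return names

# Assumed sample labels, for illustration only.
sample_names = ['FG Made-Attempted', 'Field Goal %', '3PT Made-Attempted']
print(split_stat_names(sample_names))
# → ['FG Made', 'FG Attempted', 'Field Goal %', '3PT Made', '3PT Attempted']
```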


In [5]:
#all list items are currently string type
#the loop below would change % stat items to float type and all other stats to int type
#it is left disabled; the conversion is instead done on the DataFrame in In [8]

#for idx in range(len(stat_names)):
#    if '%' in stat_names[idx]:
#        visitor_stats[idx] = float(visitor_stats[idx])
#        home_stats[idx] = float(home_stats[idx])
#    elif stat_names[idx] == 'Team name':
#        pass
#    else:
#        visitor_stats[idx] = int(visitor_stats[idx])
#        home_stats[idx] = int(home_stats[idx])

## Store stats in DataFrame

We store the visiting team's stats and the home team's stats in a DataFrame, then append a column that indicates which team won.

Each entry of the DataFrame is currently a string. We convert each percentage stat to float and every other stat (besides Team name) to int.

In [6]:
import pandas as pd

stats_df = pd.DataFrame(columns=stat_names)
stats_df.loc[0] = visitor_stats
stats_df.loc[1] = home_stats
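
An equivalent way to build the same two-row frame is to pass both lists to the constructor in one call. The sketch below uses a shortened set of columns, with values taken from this notebook's output, purely for illustration. The `sample_` names are hypothetical.

```python
import pandas as pd

# Shortened sample of the notebook's columns and values.
sample_columns = ['Team name', 'Total Points', 'Field Goal %']
sample_visitor = ['HOU', '85', '39.5']
sample_home = ['GS', '126', '52.2']

# Each inner list becomes one row; the index defaults to 0, 1.
sample_df = pd.DataFrame([sample_visitor, sample_home], columns=sample_columns)
print(sample_df)
```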

In [7]:
#append a final column that states which team won
if int(stats_df.loc[0,'Total Points']) > int(stats_df.loc[1,'Total Points']):
    stats_df.loc[:,'Won?'] = pd.Series([1,0])
else:
    stats_df.loc[:,'Won?'] = pd.Series([0,1])
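
The winner flag can also be computed without an if/else by comparing each total against the maximum. This is a sketch on a small sample frame with hypothetical `sample_` names, not a replacement the original author used. Note that it would flag both rows if the totals were equal.

```python
import pandas as pd

sample_df = pd.DataFrame({'Team name': ['HOU', 'GS'],
                          'Total Points': ['85', '126']})

# Totals are still strings at this stage, so convert before comparing.
points = sample_df['Total Points'].astype(int)

# 1 for the row with the highest total, 0 otherwise.
sample_df['Won?'] = (points == points.max()).astype(int)
print(sample_df['Won?'].tolist())  # → [0, 1]
```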

In [8]:
#convert types of entries to correct type

column_names = list(stats_df.columns)

for column in column_names:
    if column == 'Team name':
        continue #team name stays a string
    elif '%' in column:
        stats_df[column] = stats_df[column].astype(float)
    else:
        stats_df[column] = stats_df[column].astype(int)
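
As a sketch of a different technique, `pd.to_numeric` can infer the numeric type of each column from its values instead of checking for '%' in the name. The sample frame and the `sample_`/`numeric_columns` names are hypothetical. One caveat under this approach is that a percentage column whose values all happen to be whole numbers written without a decimal point would be inferred as integer.

```python
import pandas as pd

sample_df = pd.DataFrame({'Team name': ['HOU', 'GS'],
                          'Total Points': ['85', '126'],
                          'Field Goal %': ['39.5', '52.2']})

# Convert every column except the team name; the type is inferred per column.
numeric_columns = [col for col in sample_df.columns if col != 'Team name']
sample_df[numeric_columns] = sample_df[numeric_columns].apply(pd.to_numeric)
print(sample_df.dtypes)
```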

In [9]:
print(stats_df)

  Team name  1st Qtr Points  2nd Qtr Points  3rd Qtr Points  4th Qtr Points  \
0       HOU              22              21              24              18   
1        GS              31              23              34              38   

   Total Points  FG Made  FG Attempted  Field Goal %  3PT Made  ...   Steals  \
0            85       32            81          39.5        11  ...        3   
1           126       48            92          52.2        13  ...       11   

   Blocks  Total Turnovers  Points Off Turnovers  Fast Break Points  \
0       5               20                    28                 10   
1       7                8                     8                 23   

   Points in Paint  Personal Fouls  Technical Fouls  Flagrant Fouls  Won?  
0               40              19                2               0     0  
1               56              16                1               0     1  

[2 rows x 29 columns]
