# Team Stats DataFrame Function

In this notebook, we define a function that stores the team stats of a basketball game in a DataFrame. The function takes as input a url to a page that contains a team stats table.
It outputs a DataFrame with 2 records (visitor team and home team) and columns for stats such as team name, points scored each quarter, and rebounds.

This will build upon the last two notebooks: "WebScraping May 21" and "WebScraping for Game 3 of 2018 WCF May22".

We begin by importing the libraries we will need for this notebook.

In [1]:
import requests #fetch the html of the page at a url
from bs4 import BeautifulSoup #for processing and reading html file
import pandas as pd

We now define the url to DataFrame function.

One case we haven't had to deal with yet is a game that goes into overtime. Without special handling, the function would give an error on such a game, because the points table gains one extra entry per overtime period. To rectify this, we store the number of overtimes in a separate column and store the points scored per overtime period in a list in another column.
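
The overtime handling can be sketched on a plain list before it is applied to scraped data. This is a minimal illustration, assuming each row reads team name, four quarter scores, one entry per overtime period, then the total; the helper name `split_overtime` is hypothetical and is not part of the function defined in this notebook.

```python
def split_overtime(row):
    """Split a points row into regulation entries and overtime points.

    Assumed layout: [team, Q1, Q2, Q3, Q4, OT1, ..., OTn, total].
    """
    num_overtimes = max(len(row) - 6, 0)  # 6 entries means no overtime
    overtime_points = [int(p) for p in row[5:5 + num_overtimes]]
    # keep the team name, the four quarters, and the total
    regulation = row[:5] + row[5 + num_overtimes:]
    return regulation, num_overtimes, overtime_points


# values match the one-overtime game tested at the end of this notebook
row = ['GS', '32', '25', '26', '19', '14', '116']
regulation, num_overtimes, overtime_points = split_overtime(row)
print(regulation)       # ['GS', '32', '25', '26', '19', '116']
print(num_overtimes)    # 1
print(overtime_points)  # [14]
```

Slicing builds the regulation list in one step, so the result does not depend on how many overtime periods were played.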

In [2]:
def teamstats(url):
    '''
    Extract a table of team stats from a webpage at a url and store the stats in a DataFrame.
    
    Input:
    url to page with team stats table
    
    Output:
    DataFrame (2 records x 29 columns)
        One record each for visitor team and home team
    '''
    
    page = requests.get(url)
    html = page.content
    
    soup = BeautifulSoup(html, 'lxml')
    tables = soup.find_all('table')
    
    tb0 = tables[0].tbody #table with team names, points scored each quarter, total points
    tb1 = tables[1].tbody #table with traditional team stats (assists, total rebounds, etc.)
    
    [visitor_team_row, home_team_row] = [row for row in tb0.find_all('tr')]
    #lists of team name, points scored each quarter, total points    
    visitor_team_name_points = [val.contents[0].strip() for val in visitor_team_row.find_all('td')]
    home_team_name_points = [val.contents[0].strip() for val in home_team_row.find_all('td')]
    
    #Handling the case that the game went to overtime
    if len(visitor_team_name_points) > 6: #it was an overtime game
        num_overtimes = len(visitor_team_name_points) - 6 #number of overtime periods
        #List of int's for visitor and home teams of points scored per overtime period
        visitor_overtime_points = list(map(int,visitor_team_name_points[5:5+num_overtimes]))
        home_overtime_points = list(map(int,home_team_name_points[5:5+num_overtimes]))
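        #then remove these items from lists of team name and point totals
        #(delete the whole slice at once; popping at 5 + period would remove
        #the wrong entries, since the list shrinks after each pop)
        del visitor_team_name_points[5:5+num_overtimes]
        del home_team_name_points[5:5+num_overtimes]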
    else: #no overtime
        num_overtimes = 0
        visitor_overtime_points = []
        home_overtime_points = []

    #create 3 lists for Team Stats table
    #List 1: names of stats in Team Stats table
    #List 2: corresponding visitor team's stat
    #List 3: corresponding home team's stat
    tb1_stat_names, tb1_visitor_stats, tb1_home_stats = [], [], []
    
    #cycle over different stats
    for row in tb1.find_all('tr'):
        
        tdx = row.find_all('td')
        
        #paired stats such as made-attempted are split into separate entries
        tb1_stat_names += tdx[0].contents[0].strip().split('-')
        tb1_visitor_stats += tdx[1].contents[0].strip().split('-')
        tb1_home_stats += tdx[2].contents[0].strip().split('-')
    
    
    #precede each stat 'Attempted' with type of shot attempted
    tb1_stat_names[1] = 'FG Attempted'
    tb1_stat_names[4] = '3PT Attempted'
    tb1_stat_names[7] = 'FT Attempted'
    
    tb0_stat_names = ['Team name', '1st Qtr Points', '2nd Qtr Points',
                      '3rd Qtr Points', '4th Qtr Points', 'Total Points']
    
    #names of all stats, including team name, rebounds, etc.
    stat_names = tb0_stat_names + tb1_stat_names + ['Number of OT Periods', 'OT Points']
    #corresponding stats for teams
    visitor_stats = visitor_team_name_points + tb1_visitor_stats + [num_overtimes, visitor_overtime_points]
    home_stats = home_team_name_points + tb1_home_stats + [num_overtimes, home_overtime_points]
    
    #create DataFrame of all stats (scraped entries are still strings here,
    #because they were read from the html text)
    stats_df = pd.DataFrame(columns=stat_names)
    stats_df.loc[0] = visitor_stats
    stats_df.loc[1] = home_stats
    
    #append column of which team won (1 if won, 0 if lost)
    if int(stats_df.loc[0,'Total Points']) > int(stats_df.loc[1,'Total Points']):
        stats_df.loc[:,'Won?'] = pd.Series([1,0])
    else:
        stats_df.loc[:,'Won?'] = pd.Series([0,1])
        
    
    #convert all entries from string type to (int or float) type
    #(except for Team name and OT Points)
    for stat in stat_names:
        if stat in ('Team name', 'OT Points'):
            continue
        elif '%' in stat: #convert percentage stats to float type
            stats_df[stat] = stats_df[stat].astype(float)
        else: #convert other stats to int type
            stats_df[stat] = stats_df[stat].astype(int)
            
    return stats_df    
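
To see why the loop over the second table splits on `'-'`, consider a row whose label and values each hold a pair. The sketch below uses plain strings in place of scraped cells. The label text `'FG Made-Attempted'` is an assumption about how the page words that row, inferred from the column names in the output, and `split_stat_row` is a hypothetical helper.

```python
def split_stat_row(label, visitor_value, home_value):
    """Split a combined stat row on '-' into one entry per stat."""
    names = label.strip().split('-')
    visitor = visitor_value.strip().split('-')
    home = home_value.strip().split('-')
    return names, visitor, home


names, visitor, home = split_stat_row('FG Made-Attempted', '32-81', '48-92')
print(names)    # ['FG Made', 'Attempted']
print(visitor)  # ['32', '81']
print(home)     # ['48', '92']

# a row holding a single stat passes through as one-element lists
print(split_stat_row('Field Goal %', '39.5', '52.2'))
# (['Field Goal %'], ['39.5'], ['52.2'])
```

The bare `'Attempted'` label left by the split is why the function renames entries 1, 4, and 7 of the stat names to say which kind of shot was attempted.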

## Testing function

We check that our function works for the original game we were interested in, Game 3 of the 2018 Western Conference Finals. We then check that the function works when applied to another game.

In [3]:
#link to page with team stats
url = "http://www.espn.com/nba/matchup?gameId=401032763"

#Test function on Game 3 of 2018 WCF
stats_df = teamstats(url)
print(stats_df)

  Team name  1st Qtr Points  2nd Qtr Points  3rd Qtr Points  4th Qtr Points  \
0       HOU              22              21              24              18   
1        GS              31              23              34              38   

   Total Points  FG Made  FG Attempted  Field Goal %  3PT Made  ...   \
0            85       32            81          39.5        11  ...    
1           126       48            92          52.2        13  ...    

   Total Turnovers  Points Off Turnovers  Fast Break Points  Points in Paint  \
0               20                    28                 10               40   
1                8                     8                 23               56   

   Personal Fouls  Technical Fouls  Flagrant Fouls  Number of OT Periods  \
0              19                2               0                     0   
1              16                1               0                     0   

   OT Points  Won?  
0         []     0  
1         []     1  

[2 rows x 3

In [4]:
#Test function on Game 4 of 2018 WCF
url_game4 = 'http://www.espn.com/nba/matchup?gameId=401032764'

stats_df_game4 = teamstats(url_game4)
print(stats_df_game4)

  Team name  1st Qtr Points  2nd Qtr Points  3rd Qtr Points  4th Qtr Points  \
0       HOU              19              34              17              25   
1        GS              28              18              34              12   

   Total Points  FG Made  FG Attempted  Field Goal %  3PT Made  ...   \
0            95       30            77          39.0        12  ...    
1            92       35            89          39.3         9  ...    

   Total Turnovers  Points Off Turnovers  Fast Break Points  Points in Paint  \
0               13                    18                 13               34   
1               16                    20                  8               36   

   Personal Fouls  Technical Fouls  Flagrant Fouls  Number of OT Periods  \
0              19                0               0                     0   
1              24                0               0                     0   

   OT Points  Won?  
0         []     1  
1         []     0  

[2 rows x 3

We now check the function on a game that went to overtime, specifically the December 18, 2017 game between the Los Angeles Lakers and Golden State Warriors. 

The final score was 116-114, and the game was decided in a single overtime period.


In [5]:
url_ot = 'http://www.espn.com/nba/matchup?gameId=400975201'

stats_df_ot = teamstats(url_ot)

print(stats_df_ot)

  Team name  1st Qtr Points  2nd Qtr Points  3rd Qtr Points  4th Qtr Points  \
0        GS              32              25              26              19   
1       LAL              24              29              29              20   

   Total Points  FG Made  FG Attempted  Field Goal %  3PT Made  ...   \
0           116       43           107          40.2        11  ...    
1           114       41            92          44.6        10  ...    

   Total Turnovers  Points Off Turnovers  Fast Break Points  Points in Paint  \
0               15                    12                 17               56   
1               12                    11                 17               56   

   Personal Fouls  Technical Fouls  Flagrant Fouls  Number of OT Periods  \
0              26                1               0                     1   
1              26                0               0                     1   

   OT Points  Won?  
0       [14]     1  
1       [12]     0  

[2 rows x 3
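
One way to confirm that the overtime points were separated correctly is to check that the quarter points plus the overtime points add up to the total. The sketch below is a hedged illustration: `points_add_up` is a hypothetical helper, and the two rows are plain dictionaries copied by hand from the printed output above rather than the DataFrame itself.

```python
def points_add_up(row):
    """Return True if quarter points plus overtime points equal the total."""
    quarters = ['1st Qtr Points', '2nd Qtr Points',
                '3rd Qtr Points', '4th Qtr Points']
    scored = sum(row[q] for q in quarters) + sum(row['OT Points'])
    return scored == row['Total Points']


# rows copied from the printed output of the overtime game
gs = {'1st Qtr Points': 32, '2nd Qtr Points': 25, '3rd Qtr Points': 26,
      '4th Qtr Points': 19, 'OT Points': [14], 'Total Points': 116}
lal = {'1st Qtr Points': 24, '2nd Qtr Points': 29, '3rd Qtr Points': 29,
       '4th Qtr Points': 20, 'OT Points': [12], 'Total Points': 114}

print(points_add_up(gs))   # True
print(points_add_up(lal))  # True
```

Because the helper only indexes by column name, it should also accept the rows yielded by `stats_df_ot.iterrows()`, for example `all(points_add_up(row) for _, row in stats_df_ot.iterrows())`.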