# NFL Team Scraper for Turnovers

Scrape the website for each team's turnover stats (defined below) and update the 'teams' table for each year. Season totals are converted to per-game averages:

        # Average per game
        TOnet = round(int(stats[1])/16,1)                # Turnovers Net (TOpos - TOneg)
        TOpos = round(int(stats[4])/16,1)                # Turnovers Positive (i.e., takeaways)
        TOneg = round(int(stats[7])/16,1)                # Turnovers Negative (i.e., giveaways)
        
        # Turnover Net: Takeaways minus giveaways (the turnover differential).
        # Takeaway:  The team takes the ball away from the opponent with a fumble recovery or an interception.
        # Giveaway:  The team gives the ball away to the opponent with a lost fumble or a thrown interception.

### Data sources:

Football Database: https://www.footballdb.com/stats/turnovers.html

Turnover Differential is calculated by subtracting the total number of giveaways (interceptions & fumbles lost) from the total number of takeaways (interceptions & opponent fumble recoveries).    
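The arithmetic described above can be sketched as a small helper. This is a minimal illustration, not part of the original notebook: the function names `games_in_season` and `per_game_turnovers` are hypothetical, the season totals are illustrative, and the season-length rule is a simplifying assumption (16 games through 2020, 17 from 2021, ignoring strike-shortened seasons).

```python
def games_in_season(yr):
    """Regular-season games per team.

    Assumption: 16 games through 2020 and 17 from 2021 on; strike-shortened
    seasons (1982, 1987) and pre-1978 schedules are not handled.
    """
    return 17 if yr >= 2021 else 16


def per_game_turnovers(takeaways, giveaways, yr):
    """Return (net, takeaways, giveaways) as per-game averages rounded to 0.1."""
    games = games_in_season(yr)
    net = takeaways - giveaways              # turnover differential
    return (round(net / games, 1),
            round(takeaways / games, 1),
            round(giveaways / games, 1))


# Illustrative season totals: 34 takeaways and 17 giveaways over 16 games
print(per_game_turnovers(34, 17, 2017))      # (1.1, 2.1, 1.1)
```

Passing the year rather than hard-coding 16 keeps the per-game averages comparable across seasons with different schedule lengths.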

In [1]:
# Imports, etc.
import re
import pandas as pd
from urllib.request import urlopen

def getTurnovers(yr):
    """
    Input:    yr, the season year (e.g., 2017)
    Function: Scrape the web page for the given year
              Extract each team's turnovers: net, takeaways (positive), and giveaways (negative)
              Convert season totals to per-game averages
    Notes:    32 teams in the NFL; 16 regular-season games per team
              (17 from the 2021 season on, so the divisor below needs adjusting for those years)
    Return:   List with one entry per team: [year, team name, net, takeaways, giveaways]
    """
    # URL of the page to scrape for the given year
    url = 'https://www.footballdb.com/stats/turnovers.html?yr=' + str(yr) + '&conf='
    html = urlopen(url).read()                               # open the page and read the raw bytes
    rawdata = str(html)                                      # bytes -> str (repr form; newlines stay as literal \n)
    rawdata = rawdata.replace('"', '')                       # strip double quotes (they complicate the patterns)
    # print(rawdata) -- explore: each team's data sits between 'hidden-xs' ... '</td></tr>'
    webdata = re.findall(r'hidden-(xs.+?</td>)</tr>', rawdata)  # one match per team
    results = []                                             # empty list for results
    for temp in webdata[:32]:                                # one entry per team (32 teams in NFL)
        temp = temp.replace("<", "BREAK1  ")                     # mark the start of each tag
        temp = temp.replace(">", "BREAK2")                       # mark the end of each tag
        name = re.findall(r'\s([A-Z]?[49]{0,2}[a-z]+)?BREAK1', temp)[0]     # capture team name (e.g., Ravens, 49ers)
        temp = temp.replace("BREAK2BREAK1", " ")                 # collapse adjacent tags so they yield no empty matches
        temp = temp.replace("/", "X")
        stats = re.findall(r'BREAK2(\D?\d{1,2})?BREAK1', temp)                        # capture team statistics (optional sign + 1-2 digits)
        # Average per game for selected statistics (16 games in regular season)
        TOnet = round(int(stats[1])/16,1)                # Turnovers Net (TOpos - TOneg)
        TOpos = round(int(stats[4])/16,1)                # Turnovers Positive (i.e., takeaways)
        TOneg = round(int(stats[7])/16,1)                # Turnovers Negative (i.e., giveaways)
        results.append([yr, name, TOnet, TOpos, TOneg])
    return results

###########    Execute

results_list = getTurnovers(2017)             # Get the turnover stats from web 
df = pd.DataFrame(results_list, columns=["Year", "Team", "Net", "Pos", "Neg"])      # put in a dataframe
df.head()                                            # sample of first five

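|   | Year | Team     | Net | Pos | Neg |
|---|------|----------|-----|-----|-----|
| 0 | 2017 | Ravens   | 1.1 | 2.1 | 1.1 |
| 1 | 2017 | Chiefs   | 0.9 | 1.6 | 0.7 |
| 2 | 2017 | Chargers | 0.8 | 1.7 | 0.9 |
| 3 | 2017 | Eagles   | 0.7 | 1.9 | 1.2 |
| 4 | 2017 | Lions    | 0.6 | 2.0 | 1.4 |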
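
The marker-replacement parsing inside `getTurnovers` can be hard to follow without seeing intermediate values, so here is a small offline walkthrough. The `row` string is a hypothetical, simplified fragment and not the site's real markup (the real page has more cells, which is why the function reads `stats[1]`, `stats[4]`, and `stats[7]`); only the replace and regex steps mirror the function.

```python
import re

# Hypothetical, simplified table row: NOT the site's real markup.
row = 'xs>Baltimore Ravens</span></td><td>16</td><td>+17</td><td>34</td>'

temp = row.replace("<", "BREAK1  ")       # mark the start of each tag
temp = temp.replace(">", "BREAK2")        # mark the end of each tag

# Team name: a word preceded by whitespace and directly followed by a tag start
name = re.findall(r'\s([A-Z]?[49]{0,2}[a-z]+)?BREAK1', temp)[0]

# Adjacent tags ("><") would otherwise match the stats pattern with an empty
# capture, because the capture group is optional; collapse them to a space.
temp = temp.replace("BREAK2BREAK1", " ")
temp = temp.replace("/", "X")             # mirrors the original step

# Statistics: an optional sign character followed by one or two digits
stats = re.findall(r'BREAK2(\D?\d{1,2})?BREAK1', temp)

print(name)                               # Ravens
print(stats)                              # ['16', '+17', '34']
print([int(s) for s in stats])            # [16, 17, 34]
```

Because `int()` accepts a leading sign, values such as `'+17'` convert directly without extra cleanup.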
