<div style="font-size:16px; border:1px solid black; padding:10px">
<center><h1><br><font color="blue">SportsDataIO: The Best Hub for Sports Data</font></h1></center>    
    <ul>
        <li>Access to sports data is limited, and in many cases requires manually web scraping information from major sports websites.</li><br>
        <li><a href="https://sportsdata.io/">SportsDataIO</a> provides a complete sports data solution through its API.</li><br>
        <li>The company offers limited free developer access to its API as well as a paid subscription for advanced, accurate real-time sports data.</li><br>
        <li>The goal of this post is to gather NFL player information for all teams in the 2020-2021 NFL season.</li><br>
        <li>The free API membership will be used in this post; its player endpoint requires each team's abbreviation as part of the URL.</li><br>
        <li>To get all the NFL team abbreviations, a wiki page that contains this data will first be web scraped.</li>
    </ul>
</div>

<hr style="border-top: 5px solid black;">

<div style="font-size:16px; border:1px solid black; padding:10px">
<center><h1><br><font color="blue">Post Goals</font></h1></center>    
    <ol>
        <li><a href="#part1">Demonstrate how to get data from a website using Python</a>.</li><br>
        <li><a href="#part2">Demonstrate how to obtain data from <em>SportsDataIO</em> using Python</a>.</li><br>
</ol>
</div>

<hr style="border-top: 5px solid black;">

<div id="part1" style="font-size:16px; border:1px solid black; padding:10px">
    <center><h1><strong>Part 1a: </strong>Import Dependencies</h1></center>
    <ul>
        <ul><code>requests</code>: 
        <li>Requests allows you to send HTTP/1.1 requests extremely easily. There’s no need to manually add query strings to your URLs or to form-encode your PUT and POST data; just use the <code>json</code> method.</li>
        <li><a href="https://pypi.org/project/requests/">Documentation 2.25.1</a> as of 2/15/21</li><br>
        </ul>
        <ul><code>lxml.html</code>: 
        <li>lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.</li>
        <li><a href="https://lxml.de/parsing.html">Documentation</a> as of 2/15/21</li><br>            
        </ul>
        <ul><code>pandas</code>: 
        <li>pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.</li>
        <li><a href="https://pandas.pydata.org/">Documentation 1.2.2</a> as of 2/15/21</li><br>            
        </ul>        
    </ul>
</div>
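
As a small illustration of the query-string point above, the sketch below lets `requests` encode the parameters of a request without sending anything over the network. The URL and parameter names are placeholders chosen for this example and are not part of the SportsDataIO API.

```python
import requests

# Hypothetical URL and parameters, used only to show the encoding.
base_url = "https://example.com/search"
params = {"league": "NFL", "season": "2020-2021"}

# Build and prepare the request without sending it.
prepared = requests.Request("GET", base_url, params=params).prepare()

# The prepared URL now carries the encoded query string.
print(prepared.url)
```

Passing the same `params` dictionary to `requests.get()` would produce the same URL and actually send the request.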

In [44]:
import requests
import lxml.html as lh
import pandas as pd

# Restore the API key saved in an earlier session with %store
%store -r SPORTIO_API
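
The `%store -r` magic only works inside IPython/Jupyter and assumes the key was stored in an earlier session. Outside a notebook, one possible alternative is to read the key from an environment variable. The variable name `SPORTSDATAIO_API_KEY` and the helper `load_api_key` are assumptions made for this sketch, not something SportsDataIO prescribes.

```python
import os


def load_api_key(var_name="SPORTSDATAIO_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is hypothetical; use whatever name you exported.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

With the variable exported in the shell, `SPORTIO_API = load_api_key()` would give the rest of the notebook the same variable it expects.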

<hr style="border-top: 5px solid black;">

<div style="font-size:16px; border:1px solid black; padding:10px">
    <center><h1><strong>Part 1b: </strong>Webscrape NFL Abbreviations from Wiki</h1></center>
    <ul>
       <li>To make the final API call on SportsDataIO and get NFL player information, the team abbreviation must be used in the API endpoint.</li><br>
        <li>All NFL team abbreviations are available on Wikipedia.</li><br>
        <li>The wiki page will be downloaded with requests, parsed with lxml, and loaded into pandas.</li><br>
        <li>The extracted data will also be cleaned up by removing any irrelevant substrings.</li>
    </ul>
</div>
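
Before running the scraper against the live page, the parsing idea can be tried on a tiny inline table. The sketch below is a simplified variant of the approach described above, not the exact code used in this post; the sample markup and the helper name `table_to_columns` are made up for illustration.

```python
import lxml.html as lh

# A tiny inline table standing in for the Wikipedia page (made-up markup).
SAMPLE_HTML = """
<table>
  <tr><th>Official Team Abbreviation Codes</th><th>Commonly Used Abbreviations</th><th>Franchise</th></tr>
  <tr><td>ARZ</td><td>ARI</td><td>Arizona Cardinals</td></tr>
  <tr><td>ATL</td><td>ATL</td><td>Atlanta Falcons</td></tr>
</table>
"""


def table_to_columns(html):
    """Parse the table rows into a {header: [values]} dictionary."""
    doc = lh.fromstring(html)
    rows = doc.xpath("//tr")
    # The first row holds the headers; strip stray whitespace and newlines.
    headers = [cell.text_content().strip() for cell in rows[0]]
    columns = {header: [] for header in headers}
    for row in rows[1:]:
        cells = [cell.text_content().strip() for cell in row]
        if len(cells) != len(headers):
            continue  # skip rows that do not match the header width
        for header, value in zip(headers, cells):
            columns[header].append(value)
    return columns


columns = table_to_columns(SAMPLE_HTML)
print(columns["Commonly Used Abbreviations"])
```

The resulting dictionary can be passed straight to `pd.DataFrame(columns)`. Stripping each cell while parsing is a design choice of this sketch; it removes the trailing newlines up front instead of in a separate cleaning step.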

<h2>Use Web Scraping to Get NFL Abbreviation Data from Wikipedia</h2>

In [23]:
# Store the page address in a url variable
url = "https://en.wikipedia.org/wiki/Wikipedia:WikiProject_National_Football_League/National_Football_League_team_abbreviations"
# Send a GET request to the url
req = requests.get(url)

# Parse the response body with the lxml.html parser (imported as lh)
doc = lh.fromstring(req.content)

# Select every table row element (<tr>...</tr>) in the page
tr_elements = doc.xpath('//tr')

# Confirm that the rows are tabular by checking that the first
# 33 rows all have the same number of cells.
# A bare expression is only displayed when it ends a cell,
# so keep the result in a variable for inspection.
row_lengths = [len(T) for T in tr_elements[:33]]

# Build a list of (header, values) pairs, one per column
table_data=[]
# The first row holds the column headers
for t in tr_elements[0]:
    # lxml's text_content() method returns the text inside the cell
    name=t.text_content()
    # Optional: print(name) here to check the header text

    # Pair each header with an empty list that will hold its column values
    table_data.append((name,[]))
    
# The header was read from the first row (tr_elements[0]),
# so the data starts at the second row.
# Use a for loop to iterate through the remaining tr elements.
# Stop as soon as a row does not have the expected number of columns,
# since it no longer belongs to the table.
# Otherwise store the row's values in table_data.
for j in range(1,len(tr_elements)):
    
    #T is our j'th row
    T=tr_elements[j]
    
    #Count the number of columns in the table and assign to variable
    #In this example, we have 3 columns    
    col_nbr = 3
    
    #If row is not of size 3 (# columns), 
    #the //tr data is not from our table 
    if len(T)!=col_nbr:
        break
    
    #i is the index of our column
    i=0
    
    #Iterate through each cell of the row
    #using lxml's .iterchildren() method
    for t in T.iterchildren():
        #Extract the cell text with lxml's .text_content() method
        data=t.text_content() 
        
        #Skip the first column; in the others,
        #convert purely numeric values to integers
        if i>0:
            try:
                data=int(data)
            except ValueError:
                pass
        #Append the data to the list of the i'th column
        table_data[i][1].append(data)
        #Increment i for the next column
        i+=1    

        
# Create the dataframe
Dict={title:column for (title,column) in table_data}
nfl_web_data=pd.DataFrame(Dict)

# Inspect dataframe
nfl_web_data.head()

Unnamed: 0,Official Team Abbreviation Codes\n,Commonly Used Abbreviations\n,Franchise\n
0,ARZ\n,ARI\n,Arizona Cardinals\n
1,ATL\n,ATL\n,Atlanta Falcons\n
2,BLT\n,BAL\n,Baltimore Ravens\n
3,BUF\n,BUF\n,Buffalo Bills\n
4,CAR\n,CAR\n,Carolina Panthers\n


<h2>Clean Data and remove irrelevant substrings</h2>

In [26]:
# Clean the column names by removing the newline character
nfl_web_data.columns = [col.replace('\n', '') for col in nfl_web_data.columns]

# Clean the data by removing the newline character from every column.
# regex=False treats '\n' as a literal character, which behaves the
# same way regardless of the pandas version's regex default.
for col in nfl_web_data:
    nfl_web_data[col] = nfl_web_data[col].str.replace('\n', '', regex=False)
    
nfl_web_data.head()    

Unnamed: 0,Official Team Abbreviation Codes,Commonly Used Abbreviations,Franchise
0,ARZ,ARI,Arizona Cardinals
1,ATL,ATL,Atlanta Falcons
2,BLT,BAL,Baltimore Ravens
3,BUF,BUF,Buffalo Bills
4,CAR,CAR,Carolina Panthers
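
When the unwanted characters only appear at the start or end of a value, as with the trailing newlines here, `str.strip()` is a possible alternative to `replace`. The sketch below applies it to a two-row frame built by hand in the same shape as the scraped table; it is an illustration, not the cleaning code used in this post.

```python
import pandas as pd

# Two hand-built rows in the same shape as the scraped table.
raw = pd.DataFrame({
    "Commonly Used Abbreviations\n": ["ARI\n", "ATL\n"],
    "Franchise\n": ["Arizona Cardinals\n", "Atlanta Falcons\n"],
})

# Strip surrounding whitespace (including newlines) from the headers...
cleaned = raw.rename(columns=str.strip)
# ...and from every string value in every column.
cleaned = cleaned.apply(lambda col: col.str.strip())

print(cleaned)
```

Note that `strip()` leaves newlines in the middle of a value untouched, so `replace` remains the right tool if those can occur.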


<h2>Save Abbreviations into an Array to Be Used in the Next Step</h2>

In [30]:
nfl_team_abbr = nfl_web_data['Commonly Used Abbreviations'].values
nfl_team_abbr

array(['ARI', 'ATL', 'BAL', 'BUF', 'CAR', 'CHI', 'CIN', 'CLE', 'DAL',
       'DEN', 'DET', 'GB', 'HOU', 'IND', 'JAX', 'KC', 'LV', 'LAC', 'LAR',
       'MIA', 'MIN', 'NE', 'NO', 'NYG', 'NYJ', 'PHI', 'PIT', 'SF', 'SEA',
       'TB', 'TEN', 'WAS'], dtype=object)
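
Since every abbreviation becomes part of an API URL in the next step, it can be worth checking the scraped values first. The helper below is a hypothetical sanity check written for this sketch: it expects 32 unique codes of two or three uppercase letters, which matches the array shown above.

```python
def check_abbreviations(abbrs, expected_count=32):
    """Return a list of problems found in a sequence of team codes."""
    problems = []
    if len(abbrs) != expected_count:
        problems.append(f"expected {expected_count} teams, got {len(abbrs)}")
    if len(set(abbrs)) != len(abbrs):
        problems.append("duplicate abbreviations present")
    for abbr in abbrs:
        # Codes in the scraped table are 2-3 uppercase letters.
        if not (abbr.isalpha() and abbr.isupper() and 2 <= len(abbr) <= 3):
            problems.append(f"unexpected code: {abbr!r}")
    return problems


# The values from the array above.
teams = ['ARI', 'ATL', 'BAL', 'BUF', 'CAR', 'CHI', 'CIN', 'CLE', 'DAL',
         'DEN', 'DET', 'GB', 'HOU', 'IND', 'JAX', 'KC', 'LV', 'LAC', 'LAR',
         'MIA', 'MIN', 'NE', 'NO', 'NYG', 'NYJ', 'PHI', 'PIT', 'SF', 'SEA',
         'TB', 'TEN', 'WAS']

# An empty list means no problems were found.
print(check_abbreviations(teams))
```

A value that still carried a trailing newline, such as `'ARI\n'`, would be reported as an unexpected code, so the check also catches incomplete cleaning.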

<hr style="border-top: 5px solid black;">

<div id ="part2" style="font-size:16px; border:1px solid black; padding:10px">
    <center><h1><strong>Part 2: </strong>Get All NFL Player Data by Team</h1></center>
    <ul>
       <li>You must first obtain an API key to make a call; see the <a href="https://sportsdata.io/developers/getting-started">getting started</a> page (as of 2-15-2021).</li><br>
        <li>The NFL free trial was used to gather the data in this project; see the <a href="https://sportsdata.io/cart/free-trial/nfl">documentation</a> (as of 2-15-2021).</li><br>
        <li>The endpoint URL has the following format: <code>https://api.sportsdata.io/v3/nfl/scores/json/Players/{team}?key={API_KEY}</code></li><br>
        <li>A for loop iterates through each NFL team abbreviation and uses the pandas <code>read_json()</code> function to download each team's player information into a dataframe.</li><br>
        <li>The per-team dataframes are then concatenated into a single dataframe with the pandas <code>concat()</code> function and saved as a CSV file with <code>to_csv()</code>.</li><br>
    </ul>
</div>
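
The endpoint format above can be wrapped in a small helper so the URL is built in one place. This is a minimal sketch: the function names `build_players_url` and `redact_key` are made up for this example, and `"YOUR_KEY"` is a placeholder rather than a real key.

```python
from urllib.parse import quote

BASE_URL = "https://api.sportsdata.io/v3/nfl/scores/json/Players"


def build_players_url(team, api_key):
    """Fill the {team} and {API_KEY} slots of the endpoint format."""
    return f"{BASE_URL}/{quote(team.strip().upper())}?key={quote(api_key)}"


def redact_key(url):
    """Hide the key so the URL is safe to print or log."""
    base, _, _ = url.partition("?key=")
    return f"{base}?key=***"


url = build_players_url("ari", "YOUR_KEY")  # "YOUR_KEY" is a placeholder
print(redact_key(url))
```

Because the key travels in the query string, printing the raw URL in a shared notebook would expose it; redacting before printing is one way to avoid that.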

In [33]:
# Carry out API calls by team and save each dataframe into a list of dataframes
team_df = []
for team in nfl_team_abbr:
    url = f"https://api.sportsdata.io/v3/nfl/scores/json/Players/{team}?key={SPORTIO_API}"
    df = pd.read_json(url)
    team_df.append(df)
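
If any single request raises an exception, the loop above stops and the remaining teams are never fetched. One possible way to make it more forgiving is sketched below. The function `fetch_teams` and the stand-in `fake_fetch` are hypothetical; in the notebook, the callable passed in could wrap `pd.read_json` on the team URL.

```python
import time


def fetch_teams(teams, fetch_one, pause_seconds=0.0):
    """Call fetch_one(team) for every team, collecting results and failures."""
    results, failures = {}, {}
    for team in teams:
        try:
            results[team] = fetch_one(team)
        except Exception as exc:  # keep going if one team fails
            failures[team] = str(exc)
        if pause_seconds:
            time.sleep(pause_seconds)  # optional pause between calls
    return results, failures


# Stand-in fetcher used only to demonstrate the control flow.
def fake_fetch(team):
    if team == "XXX":
        raise ValueError("unknown team")
    return [{"Team": team}]


results, failures = fetch_teams(["ARI", "XXX", "ATL"], fake_fetch)
print(sorted(results), failures)
```

Teams listed in `failures` can be retried later without repeating the calls that already succeeded.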

In [38]:
# Concatenate all dataframes
nfl_df = pd.concat(team_df)
nfl_df.head()

Unnamed: 0,PlayerID,Team,Number,FirstName,LastName,Position,Status,Height,Weight,BirthDate,...,GlobalTeamID,FantasyDraftPlayerID,FantasyDraftName,UsaTodayPlayerID,UsaTodayHeadshotUrl,UsaTodayHeadshotNoBackgroundUrl,UsaTodayHeadshotUpdated,UsaTodayHeadshotNoBackgroundUpdated,PlayerSeason,LatestNews
0,21208,ARI,,Ian,Bunting,TE,Practice Squad,"6'7""",255,1996-02-10T00:00:00,...,1,830692.0,Ian Bunting,8158691.0,http://cdn.usatsimg.com/api/download/?imageID=...,http://cdn.usatsimg.com/api/download/?imageID=...,2020-08-30T11:01:46,2020-08-30T11:01:53,,[]
1,21266,ARI,,Shaq,Calhoun,G,Practice Squad,"6'3""",310,1996-02-20T00:00:00,...,1,,,8158697.0,,,,,,[]
2,21822,ARI,,Cole,McDonald,QB,Practice Squad,"6'4""",220,1998-05-20T00:00:00,...,1,,,6307159.0,,,,,,[]
3,15854,ARI,,Brett,Maher,K,Practice Squad,"6'0""",190,1989-11-21T00:00:00,...,1,,,8158859.0,http://cdn.usatsimg.com/api/download/?imageID=...,http://cdn.usatsimg.com/api/download/?imageID=...,2018-11-20T19:12:51,2018-11-20T19:12:53,,[]
4,16911,ARI,,David,Parry,DT,Practice Squad,"6'1""",308,1992-03-07T00:00:00,...,1,,,8158901.0,http://cdn.usatsimg.com/api/download/?imageID=...,http://cdn.usatsimg.com/api/download/?imageID=...,2016-10-06T22:36:46,2018-05-14T08:59:53,,[]
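
Each per-team dataframe carries its own index starting at 0, so the concatenated frame repeats index values across teams. The sketch below uses two tiny hand-built frames (the `ATL` row is made up) to show the difference that the `ignore_index=True` option of `pd.concat` makes.

```python
import pandas as pd

# Two tiny hand-built frames standing in for per-team results.
ari = pd.DataFrame({"Team": ["ARI", "ARI"], "LastName": ["Bunting", "Calhoun"]})
atl = pd.DataFrame({"Team": ["ATL"], "LastName": ["Example"]})

# Default behaviour keeps each frame's own index.
stacked = pd.concat([ari, atl])

# ignore_index=True renumbers the rows from 0 to n-1 instead.
renumbered = pd.concat([ari, atl], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```

The repeated index does not affect a CSV written with `index=False`, but a unique index makes label-based lookups such as `.loc[0]` return a single row.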


In [43]:
nfl_df.to_csv('nfl_df.csv', index=False)