# // Getting Individual Player Data
___
The first step in getting this project underway is going to be getting **massive** amounts of NFL data from the web. I will be working in this notebook to "show my work" and so that others can learn how to do it if they're curious. Ultimately, I'll also turn it into a regular `.py` Python script that you can run if you're so inclined.

For that, we're going to rely on the `requests` and `BeautifulSoup` libraries to glean information from:

- [FantasyPros](https://www.fantasypros.com/)
- [Pro Football Reference](https://www.pro-football-reference.com/)
- [FFToday](http://www.fftoday.com/stats/)
- [The Football Database](https://www.footballdb.com/fantasy-football/index.html)

**Editor's Note:** After researching several different options for quite some time and attempting to use many of them, I will be going with **FantasyPros** for reasons outlined below.

In [40]:
# Importing our necessary libraries
import requests
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import re

from bs4 import BeautifulSoup
from selenium import webdriver
from time import sleep
%matplotlib inline

## FantasyPros
I'm choosing to go with FantasyPros for the following reasons:
- You don't have to register to access the data
- Separately, I am currently a user and like their site overall
- It offers 0.5 PPR scoring data (FanDuel uses 0.5 PPR, so there's DFS appeal, and it will be applicable to most leagues)
- The Football Database seemed to be missing quite a few players from each week
- FFToday scoring was inconsistent in many places for 0.5 PPR scoring

#### Following the FanDuel vein, since Daily Fantasy Sports (DFS) is where the biggest value is going to be coming from, I will be scraping data for the following positions and referring to them from here on out by the name in parentheses:
1. Quarterback (QB)
2. Running Back (RB)
3. Wide Receiver (WR)
4. Tight End (TE)
5. Defense / Special Teams (DST)


> - For a more detailed breakdown on positional scoring, please reference FanDuel's scoring and rules reference [here](https://www.fanduel.com/rules).
> - For a quick reference on what each column means, check out ESPN's stat reference [here](http://www.espn.com/nfl/news/story?id=2128923).

___
### Scraping for our QBs
Going to walk through the specific steps for this first position then blast through the others.

• Let's make some empty lists to throw all of our data into:

In [41]:
player = []
pass_comp = []
pass_att = []
pass_pct = []
pass_yds = []
yds_per_att = []
pass_TD = []
pass_INT = []
sacks_taken = []
rush_att = []
rush_yds = []
rush_TD = []
fumbles_lost = []
active = []
fpoints = []
fpoints_g = []
own_pct = []

In [42]:
# Make sure to incorporate this into the scrape!
week = []
year = []

• And a list of those lists to make `dataframe` creation nice and easy:

In [43]:
qb_stats_lists = [player, pass_comp, pass_att, pass_pct, pass_yds, 
                  yds_per_att, pass_TD, pass_INT, sacks_taken, 
                  rush_att, rush_yds, rush_TD, fumbles_lost, 
                  active, fpoints, fpoints_g, own_pct]



• Doing the actual scrape by going through each week of the season (via each URL iteration), grabbing the right `table`, iterating through each `row` and then `cell`, and putting that data into the appropriate `lists`.

In [44]:
# First let's add a quick little variable for our current week in the 2018 NFL season
# res = requests.get('https://www.fantasypros.com/nfl/')
# soup = BeautifulSoup(res.content, 'lxml')
# current_week = [int(s) for s in soup.find_all('span', {'class': 'item-title'})[8].text.split() if s.isdigit()][0]

# The above code worked during the season, but the site has since changed, so we hard-code the value.
# Note: range(1, current_week) in the scrape loops covers weeks 1 through 17.
current_week = 18

In [45]:
for week_number in range(1,current_week):

    res = requests.get('https://www.fantasypros.com/nfl/stats/qb.php?week={}&scoring=HALF&range=week'.format(week_number))
    soup = BeautifulSoup(res.content, 'lxml')

    for row in soup.find('div', {'class':'mobile-table'}).find('tbody').find_all('tr'):
        cells = row.find_all('td')
        for index, selection in enumerate(qb_stats_lists):
            selection.append(cells[index].text.strip())
        # I want "week" to become a unique identifier to bring in opponents
        week.append(week_number)
    # And adding a 1 second delay 
    sleep(1)
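
Since the same URL pattern comes back for every position, here is a minimal sketch of how the weekly URLs could be built in one place. The helper names `build_stats_url` and `weekly_urls` are hypothetical (they are not used elsewhere in this notebook); the URL template itself is copied from the loop above.

```python
# Hypothetical helpers; only the URL template comes from the scrape loop.
BASE_URL = ("https://www.fantasypros.com/nfl/stats/"
            "{position}.php?week={week}&scoring=HALF&range=week")


def build_stats_url(position, week):
    """Return the half-PPR weekly stats URL for one position and week."""
    return BASE_URL.format(position=position.lower(), week=week)


def weekly_urls(position, current_week):
    """Yield (week, url) pairs for weeks 1 through current_week - 1."""
    for week in range(1, current_week):
        yield week, build_stats_url(position, week)


urls = list(weekly_urls("qb", 18))
print(len(urls))    # one URL per regular-season week
print(urls[0][1])   # the week 1 QB page
```

Each `(week, url)` pair can then be passed to `requests.get` inside the loop, which keeps the week number next to the page it came from.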

• Let's make an empty `dataframe` and then fill it using our list of lists:

In [46]:
qbdf = pd.DataFrame(columns= ['player', 'pass_comp', 'pass_att', 'pass_pct',
                              'pass_yds', 'yds_per_att', 'pass_TD', 'pass_INT', 
                              'sacks_taken', 'rush_att', 'rush_yds', 'rush_TD', 
                              'fumbles_lost', 'active', 'fpoints', 'fpoints_g', 'own_pct'])
qbdf

Unnamed: 0,player,pass_comp,pass_att,pass_pct,pass_yds,yds_per_att,pass_TD,pass_INT,sacks_taken,rush_att,rush_yds,rush_TD,fumbles_lost,active,fpoints,fpoints_g,own_pct


In [47]:
for index, column in enumerate(qbdf.columns):
    qbdf[column] = qb_stats_lists[index]
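
As an alternative to filling an empty `dataframe` column by column, the names and lists can be zipped into a dictionary and handed to `pd.DataFrame` in one step. This is only a sketch: the three columns and their values below are made-up placeholders, not scraped data.

```python
import pandas as pd

# Placeholder values for illustration only (not real stats).
column_names = ["player", "pass_att", "fpoints"]
column_lists = [
    ["Player A (AAA)", "Player B (BBB)"],
    ["30", "25"],
    ["21.5", "14.0"],
]

# Every list must be the same length, otherwise rows would be misaligned.
lengths = {len(values) for values in column_lists}
if len(lengths) != 1:
    raise ValueError("column lists have different lengths: {}".format(lengths))

# Pair each name with its list and build the frame in one call.
df = pd.DataFrame(dict(zip(column_names, column_lists)))
print(df.shape)
```

The length check is there because a short or long list usually means a table row had an unexpected number of cells during the scrape.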

• Let's add on our `week` information (the `year` list stays empty for now) as well as a separate `team` column from our `player` column (to later be used to bring in opponent data).

In [48]:
# Week
qbdf['week'] = week
qbdf['week'] = qbdf.week.astype(str)

In [49]:
team = []
for individual in qbdf['player']:
    team.append(re.findall(r'\(([^\)]+)\)', individual)[0])
qbdf['team'] = team

• And let's just clean up that `player` name a bit too with another `regex`:

In [50]:
qbdf['player'] = [ind[0] for ind in [re.findall(r'^.*?(?=\s\()', quarterback) for quarterback in qbdf['player']]]
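
To make the two `regex` steps easier to follow, here is a small standalone sketch that splits a raw `player` string into name and team in one pass. The function name `split_player_team` is hypothetical, and returning `None` for a missing team is my assumption about how to handle a string without parentheses (the list comprehensions above would raise an `IndexError` in that case).

```python
import re

# Name, optional whitespace, then the team abbreviation in parentheses.
PLAYER_TEAM = re.compile(r"^(?P<name>.*?)\s*\((?P<team>[^)]+)\)\s*$")


def split_player_team(raw):
    """Return (name, team); team is None when no parentheses are present."""
    text = raw.strip()
    match = PLAYER_TEAM.match(text)
    if match is None:
        return text, None
    return match.group("name"), match.group("team")


print(split_player_team("Patrick Mahomes (KC)"))
print(split_player_team("Denver Broncos (DEN)"))
print(split_player_team("No Team Listed"))
```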

• While we're at the cleanup, we're going to need to do some with the `own_pct`:

In [51]:
qbdf['own_pct'] = qbdf.own_pct.apply(lambda x: x.strip('%'))
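# Assign the result back; replace(inplace=True) on a selected column may not update qbdf itself
qbdf['own_pct'] = qbdf['own_pct'].replace(to_replace='', value='0.0')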

• Since most if not all of these values were actually read in as `strings`, we'll need to convert them to numeric columns (with the exceptions of `player`, `team`, and `week`).

In [52]:
for x in [col for col in qbdf.columns if col not in ['player','week', 'team', 'year']]:
    qbdf[x] = qbdf[x].astype(float)
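
The string-to-number cleanup can also be pictured as one small function applied to each cell. This is a sketch rather than what the notebook runs: `to_float` is a hypothetical helper, and stripping thousands separators is an assumption on my part (none of the weekly values shown here are large enough to need it).

```python
def to_float(text, default=0.0):
    """Convert a scraped cell such as '75.9%' or '' into a float."""
    # Drop surrounding whitespace, a trailing percent sign, and
    # (assumption) any thousands separators.
    cleaned = text.strip().rstrip("%").replace(",", "")
    if cleaned == "":
        return default
    return float(cleaned)


print(to_float("75.9%"))
print(to_float(""))
print(to_float(" 354 "))
```

Empty cells fall back to `0.0`, which mirrors how the blank `own_pct` values are handled above.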

**Since we'll likely want to clean up our other dataframes in the same way, let's make a function to keep our future code tidy:**

In [53]:
def table_cleaner(df):
    # Adding 'week' as a string
    # (this reads the global `week` list filled by the most recent scrape loop)
    df['week'] = week
    df['week'] = df.week.astype(str)
    
    # Cleaning 'team': pull the abbreviation out of the parentheses
    team = []
    for individual in df['player']:
        team.append(re.findall(r'\(([^\)]+)\)', individual)[0])
    df['team'] = team
    
    # Removing team from 'player'
    df['player'] = [ind[0] for ind in [re.findall(r'^.*?(?=\s\()', person) for person in df['player']]]    
    
    # Converting 'own_pct' to float (assign back rather than replacing in place)
    df['own_pct'] = df.own_pct.apply(lambda x: x.strip('%'))
    df['own_pct'] = df['own_pct'].replace(to_replace='', value='0.0')
    
    # Converting everything else to float where appropriate
    for x in [col for col in df.columns if col not in ['player','week', 'team', 'year']]:
        df[x] = df[x].astype(float)
    
    return df
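
Because `table_cleaner` picks up `week` from the notebook's global state, it only works right after the matching scrape loop has run. A possible variant, sketched below, takes the week values as an argument instead. The name `clean_table`, its parameters, and the sample rows are all hypothetical, and it uses pandas' vectorized string methods in place of the list comprehensions.

```python
import pandas as pd


def clean_table(df, weeks, id_columns=("player", "week", "team", "year")):
    """Return a cleaned copy of df; `weeks` holds one week number per row."""
    out = df.copy()
    out["week"] = [str(w) for w in weeks]

    # Split "Name (TEAM)" into two named capture groups.
    parts = out["player"].str.extract(r"^(?P<name>.*?)\s*\((?P<team>[^)]+)\)")
    out["team"] = parts["team"]
    out["player"] = parts["name"].fillna(out["player"])

    # Strip the percent sign and treat blanks as zero ownership.
    out["own_pct"] = out["own_pct"].str.strip("%").replace("", "0.0")

    # Everything that is not an identifier column becomes a float.
    for col in out.columns:
        if col not in id_columns:
            out[col] = out[col].astype(float)
    return out


# Placeholder rows for illustration only (not real stats).
raw = pd.DataFrame({
    "player": ["Player A (AAA)", "Player B (BBB)"],
    "fpoints": ["21.5", "14.0"],
    "own_pct": ["75.9%", ""],
})
cleaned = clean_table(raw, [1, 1])
print(cleaned)
```

Working on a copy also leaves the original frame untouched, which makes it easier to rerun a cell without cleaning the same data twice.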

In [54]:
# Checking to see if we'll be able to manipulate it how we want
qbdf.sort_values('fpoints', ascending=False).head(10)

Unnamed: 0,player,pass_comp,pass_att,pass_pct,pass_yds,yds_per_att,pass_TD,pass_INT,sacks_taken,rush_att,rush_yds,rush_TD,fumbles_lost,active,fpoints,fpoints_g,own_pct,week,team
486,Mitch Trubisky,19.0,26.0,73.1,354.0,13.6,6.0,0.0,1.0,3.0,53.0,0.0,0.0,1.0,43.5,43.5,75.9,4,CHI
1939,Aaron Rodgers,37.0,55.0,67.3,442.0,8.0,2.0,0.0,4.0,5.0,32.0,2.0,0.0,1.0,42.9,42.9,99.1,16,GB
32,Ryan Fitzpatrick,21.0,28.0,75.0,417.0,14.9,4.0,0.0,0.0,12.0,36.0,1.0,0.0,1.0,42.3,42.3,8.5,1,TB
276,Drew Brees,39.0,49.0,79.6,396.0,8.1,3.0,0.0,1.0,3.0,7.0,2.0,0.0,1.0,40.5,40.5,97.1,3,NO
2184,Josh Allen,17.0,26.0,65.4,224.0,8.6,3.0,1.0,1.0,9.0,95.0,2.0,0.0,1.0,40.5,40.5,33.3,17,BUF
288,Matt Ryan,26.0,35.0,74.3,374.0,10.7,5.0,0.0,3.0,4.0,12.0,0.0,0.0,1.0,40.2,40.2,95.9,3,ATL
474,Jared Goff,26.0,33.0,78.8,465.0,14.1,5.0,0.0,1.0,2.0,7.0,0.0,0.0,1.0,39.3,39.3,95.6,4,LAR
132,Ben Roethlisberger,39.0,60.0,65.0,452.0,7.5,3.0,0.0,1.0,2.0,9.0,1.0,0.0,1.0,39.0,39.0,96.6,2,PIT
230,Patrick Mahomes,23.0,28.0,82.1,326.0,11.6,6.0,0.0,1.0,5.0,18.0,0.0,0.0,1.0,38.8,38.8,99.2,2,KC
2037,Deshaun Watson,29.0,40.0,72.5,339.0,8.5,2.0,0.0,4.0,8.0,49.0,2.0,1.0,1.0,36.5,36.5,96.8,16,HOU


In [55]:
qbdf.week.unique()

array(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12',
       '13', '14', '15', '16', '17'], dtype=object)

Through Week 17, no player has posted two of the season's Top 10 QB performances. Yay parity!

In [56]:
qbdf.groupby('player')['fpoints'].mean().sort_values(ascending=False)[0:10]

player
Patrick Mahomes       24.529412
Matt Ryan             20.852941
Ben Roethlisberger    20.088235
Deshaun Watson        19.523529
Andrew Luck           19.270588
Aaron Rodgers         18.382353
Jared Goff            18.252941
Drew Brees            17.900000
Russell Wilson        17.582353
Dak Prescott          16.805882
Name: fpoints, dtype: float64

Through Week 17, we see some familiar suspects in the Top 10 points per game, but also some young guns like Mahomes, Goff, and Watson, and a resurgence for Luck.
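
One caveat on these averages: if the weekly tables include rows for weeks in which a player did not play (an assumption on my part; the `active` column suggests this may be the case), a plain mean counts those weeks as zeros. The sketch below shows the difference on made-up records; the function name `mean_points` and the sample values are hypothetical.

```python
from collections import defaultdict

# Made-up weekly records: (player, active flag, fantasy points).
records = [
    ("Player A", 1.0, 30.0),
    ("Player A", 0.0, 0.0),   # e.g. a bye week
    ("Player A", 1.0, 20.0),
    ("Player B", 1.0, 18.0),
    ("Player B", 1.0, 22.0),
]


def mean_points(rows, active_only):
    """Average points per player, optionally skipping inactive weeks."""
    totals = defaultdict(float)
    games = defaultdict(int)
    for player, active, points in rows:
        if active_only and active != 1.0:
            continue
        totals[player] += points
        games[player] += 1
    return {player: totals[player] / games[player] for player in totals}


print(mean_points(records, active_only=False))
print(mean_points(records, active_only=True))
```

With the real data, the pandas equivalent would be to filter first, for example `qbdf[qbdf['active'] == 1].groupby('player')['fpoints'].mean()`.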

• Now let's dump everything we have so far into a `csv` file saved in the `data` folder of our current working directory:

In [57]:
qbdf.to_csv('./data/qb_stats_2018.csv')
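
Two details are worth keeping in mind when saving: `to_csv` does not create the `data` folder if it is missing, and by default it also writes the row index as an extra unnamed first column (which shows up as `Unnamed: 0` when the file is read back with `pd.read_csv`). A minimal sketch that handles both is below; the helper name `save_stats`, the file name, and the sample row are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_stats(df, path):
    """Write df to path as CSV, creating parent folders and skipping the index."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return path


# Demonstrated in a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    sample = pd.DataFrame({"player": ["Player A"], "fpoints": [21.5]})
    target = save_stats(sample, Path(tmp) / "data" / "example_stats.csv")
    header = target.read_text().splitlines()[0]

print(header)
```

Whether to keep the index is a judgment call; it is dropped here only because the row numbers carry no meaning beyond scrape order.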

___
### Running Backs
(Note that the tables are structured significantly differently for each position; otherwise, we would be able to use one nice, clean scrape for all of them.)
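
Even with different table layouts, the row-handling logic is the same for every position; only the column names change. One way to cut down on the repetition, sketched here under that assumption, is to keep the column names in a dictionary keyed by position. The names `POSITION_COLUMNS` and `rows_to_columns` are hypothetical, only the QB and DST layouts from this notebook are shown, and the sample row is made up.

```python
# Column layouts copied from this notebook's QB and DST lists.
POSITION_COLUMNS = {
    "qb": ["player", "pass_comp", "pass_att", "pass_pct", "pass_yds",
           "yds_per_att", "pass_TD", "pass_INT", "sacks_taken",
           "rush_att", "rush_yds", "rush_TD", "fumbles_lost",
           "active", "fpoints", "fpoints_g", "own_pct"],
    "dst": ["player", "sacks", "INTs", "fum_rec", "force_fum",
            "def_td", "safeties", "spc_TD",
            "active", "fpoints", "fpoints_g", "own_pct"],
}


def rows_to_columns(rows, column_names):
    """Turn rows of cell strings into a dict mapping column name to values."""
    columns = {name: [] for name in column_names}
    for row in rows:
        if len(row) < len(column_names):
            raise ValueError("row has {} cells, expected at least {}".format(
                len(row), len(column_names)))
        # zip stops at the shorter sequence, so extra trailing cells are ignored.
        for name, cell in zip(column_names, row):
            columns[name].append(cell.strip())
    return columns


# Made-up DST row for illustration only.
sample_rows = [["Team A (AAA)", "3", "1", "0", "1", "0", "0", "0",
                "1", "9.0", "9.0", ""]]
dst_columns = rows_to_columns(sample_rows, POSITION_COLUMNS["dst"])
print(dst_columns["player"], dst_columns["fpoints"])
```

In the scrape loops, each `row` would be the list of `td` texts from one `tr`, and the resulting dictionary can go straight into `pd.DataFrame`.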

In [58]:
player = []

rush_att = []
rush_yds = []
rush_ypc = []
long = []
over_20 = []
rush_TD = []

receptions = []
targets = []
rec_yds = []
rec_ypr = []
rec_TD = []

fumbles_lost = []
active = []
fpoints = []
fpoints_g = []
own_pct = []

week = []
year = []

In [59]:
rb_stats_lists = [player,rush_att,rush_yds,rush_ypc,long,over_20,rush_TD,
                  receptions,targets,rec_yds,rec_ypr,rec_TD,
                  fumbles_lost,active,fpoints,fpoints_g,own_pct]

for week_number in range(1,current_week):
    res = requests.get('https://www.fantasypros.com/nfl/stats/rb.php?week={}&scoring=HALF&range=week'.format(week_number))
    soup = BeautifulSoup(res.content, 'lxml')
    for row in soup.find('div', {'class':'mobile-table'}).find('tbody').find_all('tr'):
        cells = row.find_all('td')
        for index, selection in enumerate(rb_stats_lists):
            selection.append(cells[index].text.strip())
        # I want "week" to become a unique identifier to bring in opponents
        week.append(week_number)
    sleep(1)

rbdf = pd.DataFrame(columns= ['player','rush_att','rush_yds','rush_ypc','long','over_20','rush_TD',
                  'receptions','targets','rec_yds','rec_ypr','rec_TD',
                  'fumbles_lost','active','fpoints','fpoints_g','own_pct'])
for index, column in enumerate(rbdf.columns):
    rbdf[column] = rb_stats_lists[index]

rbdf = table_cleaner(rbdf)

In [60]:
rbdf.to_csv('./data/rb_stats_2018.csv')

In [61]:
# Little spot check to make sure we're Gucci
rbdf.loc[rbdf['fpoints'] == rbdf.fpoints.max(), :]

Unnamed: 0,player,rush_att,rush_yds,rush_ypc,long,over_20,rush_TD,receptions,targets,rec_yds,rec_ypr,rec_TD,fumbles_lost,active,fpoints,fpoints_g,own_pct,week,team
3741,Derrick Henry,17.0,238.0,14.0,99.0,2.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,47.8,47.8,95.4,14,TEN


___
### Wide Receivers

In [62]:
player = []

receptions = []
targets = []
rec_yds = []
rec_ypr = []
long = []
over_20 = []
rec_TD = []

rush_att = []
rush_yds = []
rush_TD = []

fumbles_lost = []
active = []
fpoints = []
fpoints_g = []
own_pct = []

week = []
year = []

In [63]:
wr_stats_lists = [player, receptions, targets, rec_yds, rec_ypr, long, over_20, rec_TD,
                  rush_att, rush_yds, rush_TD,
                  fumbles_lost, active, fpoints, fpoints_g, own_pct]

for week_number in range(1,current_week):
    res = requests.get('https://www.fantasypros.com/nfl/stats/wr.php?week={}&scoring=HALF&range=week'.format(week_number))
    soup = BeautifulSoup(res.content, 'lxml')
    for row in soup.find('div', {'class':'mobile-table'}).find('tbody').find_all('tr'):
        cells = row.find_all('td')
        for index, selection in enumerate(wr_stats_lists):
            selection.append(cells[index].text.strip())
        # I want "week" to become a unique identifier to bring in opponents
        week.append(week_number)
    sleep(1)

wrdf = pd.DataFrame(columns= ['player', 'receptions', 'targets', 'rec_yds', 'rec_ypr', 'long', 'over_20', 'rec_TD',
                  'rush_att', 'rush_yds', 'rush_TD',
                  'fumbles_lost', 'active', 'fpoints', 'fpoints_g', 'own_pct'])
for index, column in enumerate(wrdf.columns):
    wrdf[column] = wr_stats_lists[index]

wrdf = table_cleaner(wrdf)

In [64]:
wrdf.to_csv('./data/wr_stats_2018.csv')

In [65]:
# Andddd our usual sanity check
wrdf.loc[wrdf['fpoints'] > 30, :]

Unnamed: 0,player,receptions,targets,rec_yds,rec_ypr,long,over_20,rec_TD,rush_att,rush_yds,rush_TD,fumbles_lost,active,fpoints,fpoints_g,own_pct,week,team
220,Tyreek Hill,7.0,8.0,169.0,24.1,58.0,5.0,2.0,2.0,4.0,0.0,0.0,1.0,32.8,32.8,100.0,1,KC
568,Stefon Diggs,9.0,13.0,128.0,14.2,75.0,1.0,2.0,1.0,1.0,0.0,0.0,1.0,31.4,31.4,99.7,2,MIN
1102,Calvin Ridley,7.0,8.0,146.0,20.9,75.0,1.0,3.0,1.0,9.0,0.0,0.0,1.0,37.0,37.0,83.9,3,ATL
1427,Cooper Kupp,9.0,11.0,162.0,18.0,70.0,2.0,2.0,0.0,0.0,0.0,0.0,1.0,32.7,32.7,16.6,4,LAR
2093,Davante Adams,10.0,16.0,132.0,13.2,38.0,1.0,2.0,0.0,0.0,0.0,0.0,1.0,30.2,30.2,99.4,6,GB
2176,Tyreek Hill,7.0,12.0,142.0,20.3,75.0,2.0,3.0,1.0,0.0,0.0,0.0,1.0,35.7,35.7,100.0,6,KC
3318,Michael Thomas,12.0,15.0,211.0,17.6,72.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,33.1,33.1,99.7,9,NO
3998,T.Y. Hilton,9.0,9.0,155.0,17.2,68.0,1.0,2.0,0.0,0.0,0.0,0.0,1.0,32.0,32.0,99.0,11,IND
4113,Tyreek Hill,10.0,14.0,215.0,21.5,73.0,5.0,2.0,0.0,0.0,0.0,0.0,1.0,38.5,38.5,100.0,11,KC
4442,Amari Cooper,8.0,9.0,180.0,22.5,90.0,2.0,2.0,0.0,0.0,0.0,0.0,1.0,34.0,34.0,98.5,12,DAL


___
### Tight Ends

In [66]:
player = []

receptions = []
targets = []
rec_yds = []
rec_ypr = []
long = []
over_20 = []
rec_TD = []

rush_att = []
rush_yds = []
rush_TD = []

fumbles_lost = []
active = []
fpoints = []
fpoints_g = []
own_pct = []

week = []
year = []

In [67]:
te_stats_lists = [player, receptions, targets, rec_yds, rec_ypr, long, over_20, rec_TD,
                  rush_att, rush_yds, rush_TD,
                  fumbles_lost, active, fpoints, fpoints_g, own_pct]

for week_number in range(1,current_week):
    res = requests.get('https://www.fantasypros.com/nfl/stats/te.php?week={}&scoring=HALF&range=week'.format(week_number))
    soup = BeautifulSoup(res.content, 'lxml')
    for row in soup.find('div', {'class':'mobile-table'}).find('tbody').find_all('tr'):
        cells = row.find_all('td')
        for index, selection in enumerate(te_stats_lists):
            selection.append(cells[index].text.strip())
        # I want "week" to become a unique identifier to bring in opponents
        week.append(week_number)
    sleep(1)

tedf = pd.DataFrame(columns= ['player', 'receptions', 'targets', 'rec_yds', 'rec_ypr', 'long', 'over_20', 'rec_TD',
                  'rush_att', 'rush_yds', 'rush_TD',
                  'fumbles_lost', 'active', 'fpoints', 'fpoints_g', 'own_pct'])
for index, column in enumerate(tedf.columns):
    tedf[column] = te_stats_lists[index]

tedf = table_cleaner(tedf)

In [68]:
tedf.to_csv('./data/te_stats_2018.csv')

In [69]:
# Chiggity check it
tedf.loc[tedf['fpoints'] > 25, :]

Unnamed: 0,player,receptions,targets,rec_yds,rec_ypr,long,over_20,rec_TD,rush_att,rush_yds,rush_TD,fumbles_lost,active,fpoints,fpoints_g,own_pct,week,team
296,Travis Kelce,7.0,10.0,109.0,15.6,31.0,2.0,2.0,0.0,0.0,0.0,0.0,1.0,26.4,26.4,100.0,2,KC
697,Jared Cook,8.0,13.0,110.0,13.8,24.0,1.0,2.0,0.0,0.0,0.0,0.0,1.0,27.0,27.0,93.5,4,OAK
984,Eric Ebron,9.0,15.0,105.0,11.7,28.0,1.0,2.0,0.0,0.0,0.0,0.0,1.0,27.0,27.0,96.2,5,IND
1863,Travis Kelce,7.0,9.0,99.0,14.1,21.0,1.0,2.0,0.0,0.0,0.0,0.0,1.0,25.4,25.4,100.0,9,KC
2085,Zach Ertz,14.0,16.0,145.0,10.4,23.0,2.0,2.0,0.0,0.0,0.0,0.0,1.0,33.5,33.5,100.0,10,PHI
2097,Eric Ebron,3.0,3.0,69.0,23.0,53.0,1.0,2.0,1.0,2.0,1.0,0.0,1.0,26.6,26.6,96.2,10,IND
2748,Travis Kelce,12.0,13.0,168.0,14.0,28.0,4.0,2.0,0.0,0.0,0.0,1.0,1.0,32.8,32.8,100.0,13,KC
3062,George Kittle,7.0,9.0,210.0,30.0,85.0,3.0,1.0,0.0,0.0,0.0,0.0,1.0,30.5,30.5,98.1,14,SF
3396,Kyle Rudolph,9.0,9.0,122.0,13.6,44.0,1.0,2.0,0.0,0.0,0.0,0.0,1.0,28.7,28.7,80.3,16,MIN
3412,Zach Ertz,12.0,16.0,110.0,9.2,23.0,2.0,2.0,0.0,0.0,0.0,0.0,1.0,29.0,29.0,100.0,16,PHI


___
### Defense / Special Teams

In [70]:
player = []

sacks = []
INTs = []
fum_rec = []
force_fum = []
def_td = []
safeties = []
spc_TD = []

active = []
fpoints = []
fpoints_g = []
own_pct = []

week = []
year = []

In [71]:
dst_stats_lists = [player, sacks, INTs, fum_rec, force_fum, 
                   def_td, safeties, spc_TD, 
                   active, fpoints, fpoints_g, own_pct]

for week_number in range(1,current_week):
    res = requests.get('https://www.fantasypros.com/nfl/stats/dst.php?week={}&scoring=HALF&range=week'.format(week_number))
    soup = BeautifulSoup(res.content, 'lxml')
    for row in soup.find('div', {'class':'mobile-table'}).find('tbody').find_all('tr'):
        cells = row.find_all('td')
        for index, selection in enumerate(dst_stats_lists):
            selection.append(cells[index].text.strip())
        # I want "week" to become a unique identifier to bring in opponents
        week.append(week_number)
    sleep(1)

dstdf = pd.DataFrame(columns= ['player', 'sacks', 'INTs', 'fum_rec', 'force_fum', 
                   'def_td', 'safeties', 'spc_TD', 
                   'active', 'fpoints', 'fpoints_g', 'own_pct'])
for index, column in enumerate(dstdf.columns):
    dstdf[column] = dst_stats_lists[index]

dstdf = table_cleaner(dstdf)

In [72]:
dstdf.to_csv('./data/dst_stats_2018.csv')

In [73]:
dstdf.loc[dstdf.fpoints >= 20, :].sort_values('fpoints', ascending = False)

Unnamed: 0,player,sacks,INTs,fum_rec,force_fum,def_td,safeties,spc_TD,active,fpoints,fpoints_g,own_pct,week,team
201,Denver Broncos,6.0,3.0,2.0,3.0,2.0,0.0,0.0,1.0,32.0,32.0,0.0,7,DEN
261,Chicago Bears,4.0,3.0,1.0,1.0,2.0,0.0,0.0,1.0,28.0,28.0,0.0,9,CHI
272,Miami Dolphins,4.0,4.0,0.0,0.0,1.0,0.0,0.0,1.0,25.0,25.0,0.0,9,MIA
527,Kansas City Chiefs,3.0,2.0,2.0,3.0,1.0,0.0,0.0,1.0,24.0,24.0,0.0,17,KC
107,Green Bay Packers,7.0,2.0,1.0,1.0,0.0,0.0,0.0,1.0,23.0,23.0,0.0,4,GB
530,New England Patriots,4.0,0.0,3.0,3.0,1.0,0.0,0.0,1.0,23.0,23.0,0.0,17,NE
143,Kansas City Chiefs,5.0,4.0,1.0,1.0,1.0,0.0,0.0,1.0,22.0,22.0,0.0,5,KC
134,Cincinnati Bengals,3.0,2.0,1.0,1.0,2.0,0.0,0.0,1.0,22.0,22.0,0.0,5,CIN
273,Minnesota Vikings,10.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,22.0,22.0,0.0,9,MIN
162,Baltimore Ravens,11.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,21.0,21.0,0.0,6,BAL


___
# // Wrap-up
___
In this notebook, we brought in game-level stats for individual positions through the current week of the 2018 season and saved them to our local machine.

Overall, we need to bring in:
- ~~QBs~~
- ~~RBs~~
- ~~WRs~~
- ~~TEs~~
- ~~DSTs~~
- Opponents (including game location)
- Defensive stats
- Team play-calling
- Depth charts?
- Weather