# // Getting NFL Data
___
The first step in getting this project underway is going to be getting **massive** amounts of NFL data from the web. I will be working in this notebook to "show my work" and for others to learn how to if they're curious. Ultimately, I'll also turn it into a regular `.py` Python script that you can run if you're so inclined.

For that, we're going to rely on the `requests` and `BeautifulSoup` libraries to glean information from:

- [FantasyPros](https://www.fantasypros.com/)
- [Pro Football Reference](https://www.pro-football-reference.com/)
- [FFToday](http://www.fftoday.com/stats/)
- [The Football Database](https://www.footballdb.com/fantasy-football/index.html)

**Editor's Note:** After researching several for quite some time and attempting to use many of them, I will be going with **FantasyPros** for reasons outlined below.

In [11]:
# Importing our necessary libraries
import requests
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import requests
import re

from bs4 import BeautifulSoup
from selenium import webdriver
from time import sleep
%matplotlib inline

## FantasyPros
I'm choosing to go with this FantasyPros for the following reasons:
- You don't have to register to access the data
- Separately, I am currently a user and like their site overall
- Has available .5 PPR scoring data (FanDuel does .5 PPR so DFS appeal, and it will be most applicable for most leagues)
- The Football Database seemed like it was missing quite a few players from each week
- FFToday scoring was inconsistent in many places for .5 PPR scoring

#### Following the FanDuel vein, since Daily Fantasy Sports (DFS) is where the biggest value is going to be coming from, I will be scraping data for the following positions and referring to them from here on out by the name in parentheses:
1. Quarterback (QB)
2. Running Back (RB)
3. Wide Receiver (WR)
4. Tight End (TE)
5. Defense / Special Teams (DST)


> - For a more detailed breakdown on positional scoring, please reference FanDuel's scoring and rules reference [here](https://www.fanduel.com/rules).
- For a quick reference on what each column means, check out ESPN's stat reference [here](http://www.espn.com/nfl/news/story?id=2128923).

### Scraping for our QBs

1. Let's make some empty lists to throw all of our data into:

In [2]:
player = []
pass_comp = []
pass_att = []
pass_pct = []
pass_yds = []
yds_per_att = []
pass_TD = []
pass_INT = []
sacks_taken = []
rush_att = []
rush_yds = []
rush_TD = []
fumbles_lost = []
active = []
fpoints = []
own_pct = []

In [3]:
# Make sure to incoporate this into the scrape!
week = []

2. And a list of those lists to make `dataframe` creation nice and easy:

In [4]:
qb_stats_lists = [player, pass_comp, pass_att, pass_pct, pass_yds, yds_per_att, pass_TD, pass_INT, 
                    sacks_taken, rush_att, rush_yds, rush_TD, fumbles_lost, active, fpoints, own_pct]

3. Doing the actual scrape by going through each week of the season (via each URL iteration), grabbing the right `table`, iterating through each `row` and then `cell`, and putting that data into the appropriate `lists`.

In [5]:
for week_number in range(1,6):

    res = requests.get('https://www.fantasypros.com/nfl/stats/qb.php?week={}&scoring=HALF&range=week'.format(week_number))
    soup = BeautifulSoup(res.content, 'lxml')

    for row in soup.find('div', {'class':'mobile-table'}).find('tbody').find_all('tr'):
        cells = row.find_all('td')
        for index, selection in enumerate(qb_stats_lists):
            selection.append(cells[index].text.lstrip().strip())
            # I want "week" to become a unique identifier to bring in opponents
        week.append(week_number)

4. Let's make an empty `dataframe` and then fill it using our list of lists:

In [6]:
qbdf = pd.DataFrame(columns= ['player', 'pass_comp', 'pass_att', 'pass_pct',
                              'pass_yds', 'yds_per_att', 'pass_TD', 'pass_INT', 
                              'sacks_taken', 'rush_att', 'rush_yds', 'rush_TD', 
                              'fumbles_lost', 'active', 'fpoints', 'own_pct'])
qbdf

Unnamed: 0,player,pass_comp,pass_att,pass_pct,pass_yds,yds_per_att,pass_TD,pass_INT,sacks_taken,rush_att,rush_yds,rush_TD,fumbles_lost,active,fpoints,own_pct


In [7]:
for index, column in enumerate(qbdf.columns):
    qbdf[column] = qb_stats_lists[index]

5. Let's add on our `week` information and also separate a `team` column from our `player` column (to later be used to bring in opponent data).

In [9]:
qbdf['week'] = week

In [10]:
qbdf.head()

Unnamed: 0,player,pass_comp,pass_att,pass_pct,pass_yds,yds_per_att,pass_TD,pass_INT,sacks_taken,rush_att,rush_yds,rush_TD,fumbles_lost,active,fpoints,own_pct,week
0,Aaron Rodgers (GB),20,30,66.7,286,9.5,3,0,2,1,15,0,0,1,24.9,24.9,1
1,Alex Smith (WAS),21,30,70.0,255,8.5,2,0,3,8,14,0,0,1,19.6,19.6,1
2,Ben Roethlisberger (PIT),23,41,56.1,335,8.2,1,3,4,3,16,0,2,1,9.0,9.0,1
3,Brett Ratliff (TEN),0,0,0.0,0,0.0,0,0,0,0,0,0,0,0,0.0,0.0,1
4,Brian Brohm (BUF),0,0,0.0,0,0.0,0,0,0,0,0,0,0,0,0.0,0.0,1


In [20]:
team = []
for individual in qbdf['player']:
    team.append(re.findall('\(([^\)]+)\)', individual)[0])
qbdf['team'] = team

6. Since most if not all of these values were actually read in as `strings`, we'll need to convert them to numeric columns (with the exceptions of `player`, `team`, and `week`.

In [None]:
for x in [col for col in qbdf.columns if col not in ['player', 'team' 'week']]:
    qbdf[x] = qbdf[x].a

### Setting up the scraping for the QB iteration of our scrape:

1. Making an empty `dataframe` to throw our scrape into:

In [2]:
qbs_df = pd.DataFrame(columns= ['player', 'team', 'week', 'pass_comp',
                                'pass_att', 'pass_pct', 'pass_yds', 'yds_per_att', 'pass_TD', 'pass_INT', 'sacks_taken',
                                'rush_att', 'rush_yds', 'rush_TD', 'fumbles_lost', 'active', 'fpoints'])
qbs_df

Unnamed: 0,player,team,week,pass_comp,pass_att,pass_yds,pass_TD,pass_INT,rush_att,rush_yds,rush_TD,fpoints


2. Making some empty `lists` to fill with stats:

In [70]:
player = []
team = []
week = []
pass_comp = []
pass_att = []
pass_yds = []
pass_TD = []
pass_INT = []
rush_att = []
rush_yds = []
rush_TD = []
fpoints = []

In [71]:
qb_columns_lists = [player, team, week, pass_comp, pass_att, pass_yds, pass_TD, pass_INT, 
                    rush_att, rush_yds, rush_TD, fpoints]

3. Doing the actual scrape by going through each week of the season (via each URL iteration), grabbing the right `table`, iterating through each `row` and then `cell`, and putting that data into the appropriate `lists`.

In [72]:
res = requests.get('http://www.fftoday.com/stats/playerstats.php?Season=2018&GameWeek=1&PosID=10&LeagueID=193033')
soup = BeautifulSoup(res.content, 'lxml')


# for number in range(1,6):
res = requests.get('http://www.fftoday.com/stats/playerstats.php?Season=2018&GameWeek={}&PosID=10&LeagueID=193033'.format(number))
soup = BeautifulSoup(res.content, 'lxml')
for row in soup.find('table', {'cellpadding':2}).find_all('tr')[2:]:
    cells = row.find_all('td')
    for index, selection in enumerate(qb_columns_lists):
        selection.append(cells[index].text.strip())
        # The "week" column from our website is actually always 1, and I'm not sure what it does
        # I want "week" to become a unique identifier to bring in opponents
        week.pop()
        week.append(number)

In [73]:
fpoints

['31.5',
 '30.7',
 '30.6',
 '28.7',
 '26.6',
 '26.6',
 '23.6',
 '23.1',
 '23.0',
 '23.0',
 '21.4',
 '21.2',
 '20.4',
 '20.4',
 '20.0',
 '19.9',
 '19.8',
 '18.3',
 '17.7',
 '16.4',
 '16.1',
 '15.7',
 '15.2',
 '15.2',
 '14.9',
 '13.2',
 '13.1',
 '11.2',
 '10.5',
 '8.3',
 '6.2',
 '1.0',
 '0.8',
 '-0.3',
 '-0.3']

4. Cleaning up a little kerfuffle in the `player` column.

In [6]:
player = [individual[3:].lstrip() for individual in player]
qb_columns_lists[0] = player

5. Filling our `dataframe` with all of the appropriate `lists`.

In [7]:
for index, column in enumerate(qbs_df.columns):
    qbs_df[str(column)] = qb_columns_lists[index]

In [8]:
qbs_df.head()

Unnamed: 0,player,team,week,pass_comp,pass_att,pass_yds,pass_TD,pass_INT,rush_att,rush_yds,rush_TD,fpoints
0,Ryan Fitzpatrick,TB,1,21,28,417,4,0,12,36,1,42.3
1,Drew Brees,NO,1,37,45,439,3,0,0,0,0,29.6
2,Philip Rivers,LAC,1,34,51,424,3,1,0,0,0,29.0
3,Patrick Mahomes,KC,1,15,27,256,4,0,5,21,0,28.3
4,Tyrod Taylor,CLE,1,15,40,197,1,1,8,77,1,25.6


- Spot checking with a single player:

In [9]:
qbs_df[qbs_df['player'] == 'Tom Brady']

Unnamed: 0,player,team,week,pass_comp,pass_att,pass_yds,pass_TD,pass_INT,rush_att,rush_yds,rush_TD,fpoints
8,Tom Brady,NE,1,26,39,277,3,1,1,2,0,23.3
52,Tom Brady,NE,2,24,35,234,2,0,3,10,0,18.4
98,Tom Brady,NE,3,14,26,133,1,1,1,2,0,9.5
119,Tom Brady,NE,4,23,35,274,3,2,0,0,0,23.0
145,Tom Brady,NE,5,34,44,341,3,2,3,-1,1,31.5


6. Alright! Looking like what we've got so far is working! Except...

In [10]:
qbs_df.dtypes

player       object
team         object
week          int64
pass_comp    object
pass_att     object
pass_yds     object
pass_TD      object
pass_INT     object
rush_att     object
rush_yds     object
rush_TD      object
fpoints      object
dtype: object

In [11]:
for col in qbs_df.columns[2:]:
    qbs_df[col] = qbs_df[col].astype(float)

In [12]:
qbs_df.dtypes

player        object
team          object
week         float64
pass_comp    float64
pass_att     float64
pass_yds     float64
pass_TD      float64
pass_INT     float64
rush_att     float64
rush_yds     float64
rush_TD      float64
fpoints      float64
dtype: object

In [13]:
# Saving our dataframe to a 'csv' file in the current directory
qbs_df.to_csv("quarterback_stats.csv")

### Running Backs: