# // Getting Team-Level Statistics
___
In our [first notebook](https://github.com/dodgemcintosh/FFML/blob/master/1-Getting-NFL-Data.ipynb), we used several web scraping techniques to bring in individual player statistics from each game in the 2018 season.

In this notebook, we'll bring in team-level stats from both the defensive and offensive sides of the ball that should prove useful as features for our predictive modeling. Some of those features are:
- Defensive formations
- Offensive play distribution
- Weather conditions
- And more!

While we're going to keep using our good friends over at [FantasyPros](https://www.fantasypros.com/nfl/), we'll also be bringing these other friends to the party:
- [The Football Database](https://www.footballdb.com/)
- [Fantasy Sports Doctors](http://fantasysportdrs.com/)
- [IDPGuru](http://www.idpguru.com/)
- [FantasyGuru](https://www.fantasyguru.com/)

In [1]:
# Let's import everything we're going to need

import requests
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import re

from bs4 import BeautifulSoup
from selenium import webdriver
from time import sleep
%matplotlib inline

## Going to start with defensive schemes and tendencies
- After quite a bit of searching, the most robust option turned out to be entering some of the data by hand; the resulting `.csv` is explored briefly below:

In [6]:
df_schemes = pd.read_csv('./data/nfl_defensive_schemes_2018.csv')
df_schemes.head()

| | team | division | head_coach | def_coordinator | base | tendency |
|---|---|---|---|---|---|---|
| 0 | ARI | NFC West | Steve Wilks | Al Holcomb | 4-3 | Zone-heavy mix |
| 1 | ATL | NFC South | Dan Quinn | Marquand Manuel | 4-3 | Cover-3 (Zone) |
| 2 | BAL | AFC North | John Harbaugh | Don Martindale | 3-4 | Mix of Man/Zone |
| 3 | BUF | AFC East | Sean McDermott | Leslie Frazier | 4-3 | Zone |
| 4 | CAR | NFC South | Ron Rivera | Eric Washington | 4-3 | Cover-3 (Zone) |


In [8]:
# Doing one quick spot check to make sure that all divisions are represented equally
df_schemes.division.value_counts()

AFC South    4
AFC West     4
NFC North    4
NFC East     4
AFC East     4
AFC North    4
NFC West     4
NFC South    4
Name: division, dtype: int64
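
Since part of this table was typed in by hand, the spot check could also be wrapped in a small helper that reports any division that does not have exactly four teams. This is only a sketch: the helper name `unbalanced_divisions` is made up for illustration, and the five-row sample is just the `head()` output from above, so every division gets flagged here.

```python
import pandas as pd

# First five rows of the schemes table shown above, used as a small stand-in
sample = pd.DataFrame({
    "team": ["ARI", "ATL", "BAL", "BUF", "CAR"],
    "division": ["NFC West", "NFC South", "AFC North", "AFC East", "NFC South"],
})

def unbalanced_divisions(frame, expected=4):
    """Return {division: count} for divisions without exactly `expected` teams."""
    counts = frame["division"].value_counts()
    return counts[counts != expected].to_dict()

flagged = unbalanced_divisions(sample)
print(flagged)

# A hand-entered table is also worth checking for duplicated team codes
print(sample["team"].is_unique)
```

On the full 32-team file, an empty dictionary would mean every division is represented equally.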

## Coolio, now let's switch to some offensive play selection through the current point in the season
- This will be drawn from [here](https://www.footballdb.com/stats/play-selection.html).

In [22]:
res = requests.get('https://www.footballdb.com/stats/play-selection.html')
soup = BeautifulSoup(res.content, 'lxml')

In [23]:
# Take a look at what the server actually sent back
soup

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /stats/play-selection.html
on this server.</p>
<hr/>
<address>Apache/2.2.31 (Amazon) Server at www.footballdb.com Port 80</address>
</body></html>

**Looks like we're going to have to tweak our scrape just a tad...**

In [110]:
url = 'https://www.footballdb.com/stats/play-selection.html'
# A browser-like User-Agent header makes the request look like it comes from a regular browser
agent = {"User-Agent":'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36'}
res = requests.get(url, headers=agent)
soup = BeautifulSoup(res.content, 'lxml')
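
The 403 only showed up once we looked at the parsed page. One way to catch it earlier would be to keep the header on a `requests.Session` and raise on any error status before parsing. The sketch below is an assumption about how that could look, not what this notebook actually runs: `make_session` and `fetch_html` are hypothetical helper names, the User-Agent string is a placeholder, and nothing is requested until `fetch_html` is called.

```python
import requests

def make_session(user_agent):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session

def fetch_html(session, url, timeout=10):
    """Fetch a page and fail loudly on 4xx/5xx instead of parsing an error page."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for a 403, for example
    return response.text

# Placeholder agent string, for illustration only
session = make_session("Mozilla/5.0 (example)")
print(session.headers["User-Agent"])
```

Calling `fetch_html(session, url)` would then either return the page text or raise, so a blocked request cannot slip through to the parsing step unnoticed.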

In [112]:
team = []
games = []
plays = []
rushing = []
rush_pct = []
passing = []
pass_pct = []

In [113]:
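# One list per column, in the same order as the cells in each table row
team_offense_stats = [team, games, plays, rushing, rush_pct, passing, pass_pct]
clean_teams = []
for row in soup.find('table', {'class':'statistics scrollable'}).find('tbody').find_all('tr'):
    cells = row.find_all('td')
    # Collect the team name from the 'visible-xs' span rather than the raw text of the first cell
    for span in cells[0].find_all('span', {'class':'visible-xs'}):
        clean_teams.append(span.text)
    # Append each cell's text, trimmed of surrounding whitespace, to its column list
    for index, selection in enumerate(team_offense_stats):
        selection.append(cells[index].text.strip())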

In [116]:
off_play_sel = pd.DataFrame(columns=['team', 'games', 'plays', 'rushing', 'rush_pct', 'passing', 'pass_pct'])
for index, column in enumerate(off_play_sel.columns):
    off_play_sel[column] = team_offense_stats[index]
# Swap in the cleaned team names collected during the scrape
off_play_sel['team'] = clean_teams
off_play_sel.head()

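| | team | games | plays | rushing | rush_pct | passing | pass_pct |
|---|---|---|---|---|---|---|---|
| 0 | Baltimore | 6 | 455 | 174 | 38.2 | 281 | 61.8 |
| 1 | Cleveland | 6 | 441 | 178 | 40.4 | 263 | 59.6 |
| 2 | Indianapolis | 6 | 423 | 124 | 29.3 | 299 | 70.7 |
| 3 | Green Bay | 6 | 415 | 132 | 31.8 | 283 | 68.2 |
| 4 | Philadelphia | 6 | 415 | 156 | 37.6 | 259 | 62.4 |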
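
The scrape builds one list per column and then assembles the frame. An alternative is to build one record per table row and let pandas do the rest. The sketch below runs on a hand-written HTML snippet that only imitates the layout our loop expects, so the `hidden-xs` spans and the full team names in it are assumptions, as is the function name `parse_play_selection`. It also uses Python's built-in `html.parser` instead of `lxml` so that it runs without extra dependencies.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample imitating the page layout; the numbers come from the tables in this notebook
SAMPLE_HTML = """
<table class="statistics scrollable">
  <tbody>
    <tr>
      <td><span class="hidden-xs">Baltimore Ravens</span><span class="visible-xs">Baltimore</span></td>
      <td>6</td><td>455</td><td>174</td><td>38.2</td><td>281</td><td>61.8</td>
    </tr>
    <tr>
      <td><span class="hidden-xs">Seattle Seahawks</span><span class="visible-xs">Seattle</span></td>
      <td>6</td><td>364</td><td>180</td><td>49.5</td><td>184</td><td>50.5</td>
    </tr>
  </tbody>
</table>
"""

COLUMNS = ["team", "games", "plays", "rushing", "rush_pct", "passing", "pass_pct"]

def parse_play_selection(html):
    """Turn the play-selection table into a DataFrame with numeric stat columns."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="statistics")
    records = []
    for row in table.find("tbody").find_all("tr"):
        cells = row.find_all("td")
        values = [cell.get_text(strip=True) for cell in cells]
        # Prefer the short team name when the 'visible-xs' span is present
        short_name = cells[0].find("span", class_="visible-xs")
        if short_name is not None:
            values[0] = short_name.get_text(strip=True)
        records.append(dict(zip(COLUMNS, values)))
    frame = pd.DataFrame(records, columns=COLUMNS)
    numeric = [col for col in COLUMNS if col != "team"]
    # Scraped values arrive as strings; convert everything except the team name
    frame[numeric] = frame[numeric].apply(pd.to_numeric)
    return frame

sample_frame = parse_play_selection(SAMPLE_HTML)
print(sample_frame)
```

Building records row by row keeps each team's values together, which makes a row with missing cells easier to spot than seven parallel lists that silently drift out of step.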


In [122]:
# The scraped values are strings, so cast every column except team to float
for x in [col for col in off_play_sel.columns if col != 'team']:
    off_play_sel[x] = off_play_sel[x].astype(float)

Which teams are at the extremes of the league in `pass_pct`?

In [127]:
off_play_sel[(off_play_sel.pass_pct == off_play_sel.pass_pct.max()) | (off_play_sel.pass_pct == off_play_sel.pass_pct.min())]

| | team | games | plays | rushing | rush_pct | passing | pass_pct |
|---|---|---|---|---|---|---|---|
| 2 | Indianapolis | 6.0 | 423.0 | 124.0 | 29.3 | 299.0 | 70.7 |
| 20 | Seattle | 6.0 | 364.0 | 180.0 | 49.5 | 184.0 | 50.5 |


- Looks like the Colts pass the **most** while the Seahawks pass the **least**. Since `rush_pct` and `pass_pct` add up to 100 for each team, the Seahawks also run on the largest share of their plays.
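
As a sanity check on the scraped percentages, we can recompute them from the raw play counts and pick out the extremes with `idxmax` and `idxmin`. This is a minimal sketch on just the six teams whose rows appear in the outputs above, not the full league, so the result only confirms the ordering among those six.

```python
import pandas as pd

# Raw play counts for six teams, copied from the outputs above (not the full league)
counts = pd.DataFrame(
    {
        "team": ["Baltimore", "Cleveland", "Indianapolis", "Green Bay", "Philadelphia", "Seattle"],
        "plays": [455, 441, 423, 415, 415, 364],
        "rushing": [174, 178, 124, 132, 156, 180],
        "passing": [281, 263, 299, 283, 259, 184],
    }
)

# Rushing and passing plays should account for every play
assert (counts["rushing"] + counts["passing"] == counts["plays"]).all()

# Recompute the shares from the counts instead of trusting the scraped percentages
counts["pass_pct"] = (counts["passing"] / counts["plays"] * 100).round(1)
counts["rush_pct"] = (counts["rushing"] / counts["plays"] * 100).round(1)

most_pass = counts.loc[counts["pass_pct"].idxmax(), "team"]
least_pass = counts.loc[counts["pass_pct"].idxmin(), "team"]
most_rush = counts.loc[counts["rush_pct"].idxmax(), "team"]
print(most_pass, least_pass, most_rush)
```

Because every play is either a rush or a pass, the team with the lowest pass share is necessarily the one with the highest rush share.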