# Defense Wins Championships

It has been said that defense wins championships. More precisely, the saying goes, "Offense wins games, defense wins championships." I want to look at how offense and defense each affect a team's chances of winning a given game. I will use the past 10 seasons of NFL games as the basis for my study. The regular season games will be used to build two models, one offensive and one defensive. I will then test both models on the playoff data from the same 10 seasons to see which one is more accurate, which should tell us whether defense truly is what wins championships.

### Hypothesis

A good offense is effective in winning regular season games, but in the playoff games leading up to and including the championship game, defense is what makes the difference.

### Method

I will gather offensive and defensive stats from the past 10 NFL seasons and divide them into regular season and postseason (playoff) games. The regular season games will be the dataset on which I train the models, and the postseason games will be used to test them. Each NFL regular season has 256 games (32 teams x 16 games each, divided by 2 because every game involves two teams), for a total of 2,560 games over 10 seasons. For the purposes of this exercise, I will randomly pick 1,000 of those games, since gathering the data for each game is labor-intensive. In the postseason, 12 teams qualify and play a total of 11 games (4 wild card, 4 divisional, 2 conference, and the Super Bowl). That equates to 110 postseason games, or 11% of the size of the training dataset. I will then compare the prediction accuracy of the offensive model with that of the defensive model to test whether defense is what makes a championship-winning team.
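
The game counts and the random pick of 1,000 training games can be sketched in a few lines. This is only an illustration, assuming 32 teams that each play 16 regular season games. The `(season, game)` identifiers, the seed, and the `accuracy` helper are hypothetical stand-ins, not part of the actual data-gathering process.

```python
import random

TEAMS = 32
GAMES_PER_TEAM = 16                       # regular season games per team (assumption)
SEASONS = 10
PLAYOFF_GAMES_PER_SEASON = 4 + 4 + 2 + 1  # wild card, divisional, conference, Super Bowl

# Every game involves two teams, so divide the team-game total by 2.
games_per_season = TEAMS * GAMES_PER_TEAM // 2
regular_season_games = games_per_season * SEASONS
postseason_games = PLAYOFF_GAMES_PER_SEASON * SEASONS

# Hypothetical identifiers: (season index, game index within that season).
all_games = [(s, g) for s in range(SEASONS) for g in range(games_per_season)]

# Fixed seed so the same 1,000 games are picked on every run.
rng = random.Random(42)
training_games = rng.sample(all_games, 1000)  # sampling without replacement

def accuracy(predicted, actual):
    """Fraction of games whose predicted result (1 = win, 0 = loss) matches the actual result."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

print(games_per_season, regular_season_games, postseason_games, len(training_games))
```

Once both models have produced predictions for the 110 postseason games, calling `accuracy` once per model gives the two numbers to compare.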

### Data

I will gather the data by brute force, which is why the process is labor-intensive. For each game, I will copy the offense and defense stats from www.pro-football-reference.com into a spreadsheet and then export the spreadsheet as a .csv file.

In [2]:
# Let's load the necessary packages and the data
import numpy as np
import pandas as pd
import scipy
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import sklearn
from sklearn import ensemble
from sklearn import linear_model
from sklearn import svm
from sklearn import neighbors
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
import time

In [59]:
# import data
offense = pd.read_csv('../../../Data/offense.csv')
offense.head(5)

Unnamed: 0,L,Unnamed: 1,16,20,26,44,259,1,2,3,...,3.2,1.1,1.2,6,328,4,14,0.1,1.3,28:23
0,W,@,31,17,27,32,265,2,0,0,...,2,4,4,3,139,2,9,1,1,31:21
1,L,,21,34,27,46,226,2,5,2,...,0,3,3,4,210,4,10,1,2,27:02
2,W,@,41,21,22,33,256,5,1,1,...,2,5,5,4,166,8,14,0,0,33:16
3,W,,31,20,29,38,340,2,0,0,...,1,4,4,7,364,2,12,1,2,31:30
4,W,@,21,13,25,33,261,2,0,2,...,0,3,3,5,202,6,11,0,0,31:38


In [58]:
defense = pd.read_csv('../../../Data/defense.csv')
defense.head(5)

Unnamed: 0,L,Unnamed: 1,16,20,26,44,259,1,2,3,...,3.2,1.1,1.2,6,328,4,14,0.1,1.3,28:23
0,W,@,31,17,27,32,265,2,0,0,...,2,4,4,3,139,2,9,1,1,31:21
1,L,,21,34,27,46,226,2,5,2,...,0,3,3,4,210,4,10,1,2,27:02
2,W,@,41,21,22,33,256,5,1,1,...,2,5,5,4,166,8,14,0,0,33:16
3,W,,31,20,29,38,340,2,0,0,...,1,4,4,7,364,2,12,1,2,31:30
4,W,@,21,13,25,33,261,2,0,2,...,0,3,3,5,202,6,11,0,0,31:38
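
The column labels in the outputs above (`L`, `16`, `20`, ..., `28:23`) look like game values rather than names, which suggests that the files have no header row and that `pd.read_csv` consumed the first game as the header. A minimal sketch of one way to avoid that is shown below, assuming the files really are header-less. The inline text reuses only the first four columns of the rows displayed above, and the column names are the ones used in this notebook.

```python
import io
import pandas as pd

# First four columns of the first three games shown above (not the full file).
raw = "L,,16,20\nW,@,31,17\nL,,21,34\n"

# Default behaviour: the first line is treated as the header, so one game is lost.
default_df = pd.read_csv(io.StringIO(raw))

# header=None keeps every line as data; names= supplies the column labels.
named_df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=['win', 'venue', 'score', 'opp_score'],
)

print(default_df.shape)  # two rows: the first game became the header
print(named_df.shape)    # three rows: every game is kept
```

Blank venue fields are read as missing values here, which still works with the later `np.where(df['venue'] == '@', 'away', 'home')` step because anything other than `'@'` is mapped to `'home'`.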


In [6]:
# Let's gather some stats on the stats
print(df.shape)
print(df.columns)

(160, 29)
Index(['Unnamed: 0', 'Unnamed: 1', 'Tm', 'Opp', 'Cmp', 'Att', 'Yds', 'TD',
       'Int', 'Sk', 'Yds.1', 'Y/A', 'Cmp%', 'Rate', 'Att.1', 'Yds.2', 'Y/A.1',
       'TD.1', 'FGM', 'FGA', 'XPM', 'XPA', 'Pnt', 'Yds.3', '3DConv', '3DAtt',
       '4DConv', '4DAtt', 'ToP'],
      dtype='object')


In [44]:
# name unnamed columns
df.columns = ['win', 'venue', 'score', 'opp_score', 'completions', 'pass_att', 'pass_yds', 'pass_tds',
              'interceptions', 'sacks', 'yds_lost', 'yds_per_pass_att', 'comp_%', 'qb_rtg',
              'rush_att', 'rush_yds', 'yds_per_rush_att', 'rush_tds', 'fgs_made', 'fg_att',
              'x_pts_made', 'x_pt_att', 'punts', 'punt_yds', '3_down_conv', '3_down_att', 
              '4_down_conv', '4_down_att', 'time_of_poss']
df.head(5)

Unnamed: 0,win,venue,score,opp_score,completions,pass_att,pass_yds,pass_tds,interceptions,sacks,...,fg_att,x_pts_made,x_pt_att,punts,punt_yds,3_down_conv,3_down_att,4_down_conv,4_down_att,time_of_poss
0,W,,25,24,39,53,368,2,1,1,...,3,1,1,1,42,10,16,0,2,37:08
1,L,@,9,16,23,47,216,0,1,0,...,3,0,0,6,221,5,15,0,1,30:50
2,W,,26,10,25,42,277,1,0,0,...,4,2,2,2,87,8,18,3,3,39:49
3,W,,27,21,21,32,234,1,0,3,...,2,3,3,3,115,4,10,1,1,34:56
4,L,@,17,20,19,33,209,2,0,1,...,2,2,2,5,222,5,14,0,0,28:22


In [45]:
# encode win as 1 ('W') or 0 (anything else), and venue as 'away' ('@') or 'home' (blank)
df['win'] = np.where(df['win'] == 'W', 1, 0)
df['venue'] = np.where(df['venue'] == '@', 'away', 'home')
df.head(5)

Unnamed: 0,win,venue,score,opp_score,completions,pass_att,pass_yds,pass_tds,interceptions,sacks,...,fg_att,x_pts_made,x_pt_att,punts,punt_yds,3_down_conv,3_down_att,4_down_conv,4_down_att,time_of_poss
0,1,home,25,24,39,53,368,2,1,1,...,3,1,1,1,42,10,16,0,2,37:08
1,0,away,9,16,23,47,216,0,1,0,...,3,0,0,6,221,5,15,0,1,30:50
2,1,home,26,10,25,42,277,1,0,0,...,4,2,2,2,87,8,18,3,3,39:49
3,1,home,27,21,21,32,234,1,0,3,...,2,3,3,3,115,4,10,1,1,34:56
4,0,away,17,20,19,33,209,2,0,1,...,2,2,2,5,222,5,14,0,0,28:22


It's plottin' time!