# Defense Wins Championships

It has been said that defense wins championships. More precisely, the saying goes, "Offense wins games, defense wins championships." So I want to take a look at how offense and defense each affect a team's success. In football, the New England Patriots are considered one of the most successful franchises of the modern era (as a Denver Broncos fan, that's probably the highest praise you'll get out of me). The only reason I picked the Pats is their consistency in making the postseason, which means there will be a relatively sizeable dataset. I will look at the last 10 years' worth of game statistics to see whether a good offense, rather than a good defense, produces wins during the regular season, and vice versa in the playoffs.

### Data

I gathered the past 10 years of regular-season and postseason stats for the New England Patriots. Let's first look at the regular-season data.

In [2]:
# Let's load the necessary packages
import time

import numpy as np
import pandas as pd
import scipy
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
from sklearn import ensemble
from sklearn import linear_model
from sklearn import svm
from sklearn import neighbors
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

%matplotlib inline

In [43]:
# import data
df = pd.read_csv('../../../Data/NEPatsstats.csv')
df.head(5)

,Unnamed: 0,Unnamed: 1,Tm,Opp,Cmp,Att,Yds,TD,Int,Sk,...,FGA,XPM,XPA,Pnt,Yds.3,3DConv,3DAtt,4DConv,4DAtt,ToP
0,W,,25,24,39,53,368,2,1,1,...,3,1,1,1,42,10,16,0,2,37:08
1,L,@,9,16,23,47,216,0,1,0,...,3,0,0,6,221,5,15,0,1,30:50
2,W,,26,10,25,42,277,1,0,0,...,4,2,2,2,87,8,18,3,3,39:49
3,W,,27,21,21,32,234,1,0,3,...,2,3,3,3,115,4,10,1,1,34:56
4,L,@,17,20,19,33,209,2,0,1,...,2,2,2,5,222,5,14,0,0,28:22
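One thing the preview makes visible is that `ToP` (time of possession) holds `MM:SS` text such as `37:08`, so it cannot be used as a number as-is. A minimal sketch of one way to convert it to minutes, using a small made-up series with values copied from the preview; the names `sample` and `top_minutes` are illustrative and not part of the original notebook:

```python
import pandas as pd

# Illustrative values taken from the preview above; not the full dataset
sample = pd.Series(['37:08', '30:50', '39:49'])

# Split 'MM:SS' into two integer columns: 0 = minutes, 1 = seconds
parts = sample.str.split(':', expand=True).astype(int)

# Express time of possession as fractional minutes
top_minutes = parts[0] + parts[1] / 60
print(top_minutes)
```

The same two lines could be applied to the real column once the data is loaded, assuming every entry follows the `MM:SS` pattern.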


In [6]:
# Let's gather some stats on the stats
print(df.shape)
print(df.columns)

(160, 29)
Index(['Unnamed: 0', 'Unnamed: 1', 'Tm', 'Opp', 'Cmp', 'Att', 'Yds', 'TD',
       'Int', 'Sk', 'Yds.1', 'Y/A', 'Cmp%', 'Rate', 'Att.1', 'Yds.2', 'Y/A.1',
       'TD.1', 'FGM', 'FGA', 'XPM', 'XPA', 'Pnt', 'Yds.3', '3DConv', '3DAtt',
       '4DConv', '4DAtt', 'ToP'],
      dtype='object')


In [44]:
# give every column a descriptive name (the list must match the 29 columns in order)
df.columns = ['win', 'venue', 'score', 'opp_score', 'completions', 'pass_att', 'pass_yds', 'pass_tds',
              'interceptions', 'sacks', 'yds_lost', 'yds_per_pass_att', 'comp_%', 'qb_rtg',
              'rush_att', 'rush_yds', 'yds_per_rush_att', 'rush_tds', 'fgs_made', 'fg_att',
              'x_pts_made', 'x_pt_att', 'punts', 'punt_yds', '3_down_conv', '3_down_att', 
              '4_down_conv', '4_down_att', 'time_of_poss']
df.head(5)

,win,venue,score,opp_score,completions,pass_att,pass_yds,pass_tds,interceptions,sacks,...,fg_att,x_pts_made,x_pt_att,punts,punt_yds,3_down_conv,3_down_att,4_down_conv,4_down_att,time_of_poss
0,W,,25,24,39,53,368,2,1,1,...,3,1,1,1,42,10,16,0,2,37:08
1,L,@,9,16,23,47,216,0,1,0,...,3,0,0,6,221,5,15,0,1,30:50
2,W,,26,10,25,42,277,1,0,0,...,4,2,2,2,87,8,18,3,3,39:49
3,W,,27,21,21,32,234,1,0,3,...,2,3,3,3,115,4,10,1,1,34:56
4,L,@,17,20,19,33,209,2,0,1,...,2,2,2,5,222,5,14,0,0,28:22
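Assigning a list to `df.columns` renames by position, so it silently mislabels data if the column order in the CSV ever changes. A hedged alternative is to rename by name with `DataFrame.rename` and an explicit mapping. The sketch below uses a tiny made-up frame with only four of the columns; `raw` and `name_map` are illustrative names, not part of the original notebook:

```python
import pandas as pd

# Toy frame with a few of the raw column names (values copied from the preview)
raw = pd.DataFrame({
    'Unnamed: 0': ['W', 'L'],
    'Unnamed: 1': [None, '@'],
    'Tm': [25, 9],
    'Opp': [24, 16],
})

# Explicit old-name -> new-name mapping (only a subset shown here)
name_map = {
    'Unnamed: 0': 'win',
    'Unnamed: 1': 'venue',
    'Tm': 'score',
    'Opp': 'opp_score',
}

# Fail loudly if an expected column is absent, instead of mislabeling silently
missing = set(name_map) - set(raw.columns)
assert not missing, f"columns not found: {missing}"

renamed = raw.rename(columns=name_map)
print(renamed.columns.tolist())
```

For a one-off notebook the positional assignment is fine; the mapping mainly helps if the source file is regenerated later.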


In [45]:
# encode win as 1/0 and label venue as away ('@') or home (anything else)
df['win'] = np.where(df['win'] == 'W', 1, 0)
df['venue'] = np.where(df['venue'] == '@', 'away', 'home')
df.head(5)

,win,venue,score,opp_score,completions,pass_att,pass_yds,pass_tds,interceptions,sacks,...,fg_att,x_pts_made,x_pt_att,punts,punt_yds,3_down_conv,3_down_att,4_down_conv,4_down_att,time_of_poss
0,1,home,25,24,39,53,368,2,1,1,...,3,1,1,1,42,10,16,0,2,37:08
1,0,away,9,16,23,47,216,0,1,0,...,3,0,0,6,221,5,15,0,1,30:50
2,1,home,26,10,25,42,277,1,0,0,...,4,2,2,2,87,8,18,3,3,39:49
3,1,home,27,21,21,32,234,1,0,3,...,2,3,3,3,115,4,10,1,1,34:56
4,0,away,17,20,19,33,209,2,0,1,...,2,2,2,5,222,5,14,0,0,28:22
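It is worth being explicit about what the two `np.where` calls do with values other than `'W'` and `'@'`: anything that is not `'W'` becomes 0, and anything that is not `'@'` (including a missing value) becomes `'home'`. A small sketch on a made-up frame illustrates this; the `'T'` row is a hypothetical tie added purely to show the behavior and is not taken from the dataset:

```python
import numpy as np
import pandas as pd

# Made-up rows: 'T' (a tie) and the NaN venues are here only to show edge cases
toy = pd.DataFrame({
    'win': ['W', 'L', 'W', 'T'],
    'venue': [np.nan, '@', np.nan, '@'],
})

# Same encoding as in the cell above
toy['win_flag'] = np.where(toy['win'] == 'W', 1, 0)
toy['venue_label'] = np.where(toy['venue'] == '@', 'away', 'home')

print(toy)
```

If ties or neutral-site games mattered for the analysis, they would need their own category rather than falling into the default branch.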
