# 4) Predicting Winners with Logistic Regression

The linear model for predicting how many points a team will score depended on too many variables to pick the winner of a game accurately. Instead, we can use logistic regression to predict directly whether a team will win or lose.

In [1]:
#Imports
import pandas as pd
pd.set_option('display.max_columns', None)

import seaborn as sns
import numpy as np

from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.model_selection import train_test_split

In [2]:
#Read in the data
df = pd.read_csv(r'nfl_game_logs_df.csv', index_col=0)

The null model would select a team at random and predict them to win. This would mean the picks are accurate just under 50% of the time (assuming even odds a team will win or lose, and that ties are infrequent).
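
As a quick check on the "just under 50%" figure, the arithmetic can be sketched as below. It assumes the tie rate of roughly 0.4% that appears in this notebook's result tables and that a coin-flip pick is never counted as correct when the game ends in a tie; the variable names are illustrative only.

```python
# Expected accuracy of a coin-flip pick.
# Assumption: tie rate of about 0.4%, taken from the result tables in this notebook.
tie_rate = 0.004004

# A random pick is correct only when the game is not a tie AND
# the coin lands on the eventual winner.
null_accuracy = (1 - tie_rate) * 0.5

print(f"Null-model accuracy: {null_accuracy:.4f}")  # Null-model accuracy: 0.4980
```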

We can make other predictions with minimal effort that produce better results. For instance, always picking the home team is correct about 55% of the time.

Additionally, the Elo scoring system I implemented shows that the team with the higher score before the game wins about 63.5% of the time.

In [3]:
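#Result distribution for home teams (the comparison works whether home_team is stored as bool or 0/1)
df[df['home_team'] == 1]['result'].value_counts(normalize=True).to_frame()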

     result
W  0.553887
L  0.442109
T  0.004004

In [4]:
#Result distribution for teams that entered the game with the higher Elo score
df.query('elo_start > elo_start_opp')['result'].value_counts(normalize=True).to_frame()

     result
W  0.635969
L  0.360027
T  0.004004

For simplicity, we can treat this as a binary problem: a team either wins ('W') or does not win ('LT', covering both losses and ties).

In [5]:
df['result_letter'] = df['result_value'].map({1.0: 'W', 0.5: 'LT', 0.0: 'LT'})
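
One thing worth checking after a dictionary mapping like this is that every value was covered, since `Series.map` returns `NaN` for any key missing from the dictionary. A minimal sketch on toy values (not the game-log data; the `0.25` entry is deliberately invalid and purely hypothetical):

```python
import pandas as pd

# Toy values for illustration only; 0.25 is a deliberately invalid entry
toy = pd.Series([1.0, 0.5, 0.0, 0.25])
mapped = toy.map({1.0: 'W', 0.5: 'LT', 0.0: 'LT'})

# Series.map yields NaN for any value that is not a key in the dict,
# so counting NaNs reveals rows the mapping did not cover
unmapped = int(mapped.isna().sum())
print(unmapped)  # 1
```

In the notebook, the same check would be `df['result_letter'].isna().sum()`, which should be zero before fitting.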

In [6]:
#Features used to predict the result
feature_cols = ['elo_opp_diff_team',
                '8_game_avg_points_for',
                '8_game_avg_points_for_opp',
                '4_game_avg_exp_pts_off',
                'home_team'
               ]

X = df[feature_cols]
y = df['result_letter']

#Hold out a test set (train_test_split defaults to 25% of the rows)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)

#Fit the model on the training rows only
LR = LogisticRegression()
LR.fit(X_train, y_train)

#Predicted labels ('W' or 'LT') for the held-out rows
pred = LR.predict(X_test)
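
Besides hard labels, a fitted `LogisticRegression` also exposes win probabilities through `predict_proba`. The sketch below illustrates this on synthetic stand-in data, not the NFL game logs: the single feature loosely plays the role of an Elo difference, and names such as `X_demo` and `model` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (NOT the NFL game logs): one feature playing the
# role of a scaled Elo difference, with wins more likely when it is positive.
rng = np.random.default_rng(0)
elo_diff = rng.normal(0, 1, size=500)
win_prob = 1 / (1 + np.exp(-elo_diff))
labels = np.where(rng.random(500) < win_prob, 'W', 'LT')

X_demo = elo_diff.reshape(-1, 1)
model = LogisticRegression().fit(X_demo, labels)

# Columns of predict_proba follow the order of model.classes_,
# so look up the 'W' column instead of assuming its position
win_col = list(model.classes_).index('W')
proba = model.predict_proba(X_demo)[:, win_col]

# For two classes, predict() returns 'W' exactly when P(W) exceeds 0.5
hard_pred = model.predict(X_demo)
print(model.classes_, proba[:3].round(3), hard_pred[:3])
```

With the notebook's model, the equivalent call would be `LR.predict_proba(X_test)`, again reading the column order from `LR.classes_`.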

In [7]:
#Accuracy on the full dataset (training and test rows combined)
LR.score(X, y)

0.6451451451451451

In [8]:
#Accuracy on the training rows
LR.score(X_train, y_train)

0.6422691879866519

In [9]:
#Accuracy on the held-out test rows
LR.score(X_test, y_test)

0.6537691794529686
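
The test accuracy of about 65.4% sits only slightly above the roughly 63.6% obtained by simply picking the team with the higher Elo score, so it can help to look beyond a single accuracy number. A minimal sketch using the `metrics` module on tiny hand-made labels (illustrative values only, not results from the model above):

```python
from sklearn import metrics

# Tiny hand-made labels for illustration only (not output from the notebook's model)
y_true = ['W', 'W', 'LT', 'LT', 'W', 'LT']
y_pred = ['W', 'LT', 'LT', 'W', 'W', 'LT']

# Rows are true classes, columns are predicted classes, both in the order given by labels
cm = metrics.confusion_matrix(y_true, y_pred, labels=['W', 'LT'])
accuracy = metrics.accuracy_score(y_true, y_pred)

print(cm)
print(round(accuracy, 3))  # 0.667
```

In the notebook, `y_test` and `pred` from the fitting cell could be passed in place of the toy lists to see whether the errors are mostly missed wins or mostly false wins.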