# Fantasy Football

In this notebook, I attempt to predict per-game stats for NFL quarterback Tom Brady.
These stats could be used in a fantasy football league.

_Don't take this as an indication that Tom Brady is my favorite player.  I'm choosing TB12 because he is very good at what he does and because we both went to Michigan -- Go Blue!  Some day this will be extended to most (all?) players in FF leagues._

_NB, I honestly prefer college football over the NFL, but if I were to choose a favorite NFL player, I would say Russell Wilson (mostly because we were both at NC State at the same time. Go Wolfpack!)._


To access NFL data, I will use the NFLGame module (only using pre-2018 data as of this writing).

- Original repository (pre-2018): https://github.com/BurntSushi/nflgame (`nflgame`)
- New repository (2018+): https://github.com/derek-adair/nflgame (`nflgame-redux`)
- API: http://nflgame.derekadair.com/

My predictions will be compared with those from Yahoo! and ESPN, and ultimately graded on the actual outcome in each game.

## Getting Started

First, let's install the necessary module, `nflgame-redux`, and load the other modules.

This module uses Python 2, so please make sure the Google Colab runtime is set to Python 2.

In [1]:
!pip install nflgame-redux



In [0]:
import nflgame
import sqlite3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # needed for the performance plots in the Learning section

# Load the modules from scikit-learn
from sklearn import svm
from sklearn import neighbors
from sklearn.metrics import explained_variance_score  # quantifying accuracy of regression
from sklearn.model_selection import train_test_split

### Access Player and Stats
After a bit of playing around with the API, I think I know how to access the data for a specific player:

Cookbook: https://github.com/BurntSushi/nflgame/wiki/Cookbook#calculate-number-of-sacks-for-a-team  
Stat types: https://github.com/BurntSushi/nflgame/wiki/Stat-types 

In [0]:
tb12 = nflgame.find("Tom Brady")[0]

In [4]:
opts = dir(tb12)
for opt in opts:
    print " > {0} : {1}".format(opt,getattr(tb12,opt))

 > __class__ : <class 'nflgame.player.Player'>
 > __delattr__ : <method-wrapper '__delattr__' of Player object at 0x7fe0d283ecd0>
 > __dict__ : {'years_pro': 18, 'status': u'ACT', 'first_name': u'Tom', 'last_name': u'Brady', 'gsis_id': u'00-0019596', 'weight': 225, 'position': u'QB', 'playerid': u'00-0019596', 'profile_id': 2504211, 'number': 12, 'uniform_number': 12, 'height': 76, 'college': u'Michigan', 'birthdate': u'8/3/1977', 'full_name': u'Tom Brady', 'team': u'NE', 'player_id': u'00-0019596', 'gsis_name': u'T.Brady', 'profile_url': u'http://www.nfl.com/player/tombrady/2504211/profile', 'name': u'Tom Brady'}
 > __doc__ : 
    Player instances represent meta information about a single player.
    This information includes name, team, position, status, height,
    weight, college, jersey number, birth date, years, pro, etc.

    Player information is populated from NFL.com profile pages.
    
 > __format__ : <built-in method __format__ of Player object at 0x7fe0d283ecd0>
 > __getat

And his relevant stats can be obtained by doing:

In [5]:
stats = tb12.stats(2013,week=1)
stat_opts = dir(stats)
for stat_opt in stat_opts:
    if stat_opt.startswith("_"): continue
    print stat_opt,getattr(stats,stat_opt)

formatted_stats <bound method GamePlayerStats.formatted_stats of <nflgame.player.GamePlayerStats object at 0x7fe0cef08b10>>
fumbles_lost 1
fumbles_rcv 0
fumbles_tot 1
fumbles_trcv 0
fumbles_yds 0
games 1
guess_position QB
has_cat <bound method GamePlayerStats.has_cat of <nflgame.player.GamePlayerStats object at 0x7fe0cef08b10>>
home False
name T.Brady
passer_rating <bound method GamePlayerStats.passer_rating of <nflgame.player.GamePlayerStats object at 0x7fe0cef08b10>>
passing_att 52
passing_cmp 29
passing_ints 1
passing_tds 2
passing_twopta 0
passing_twoptm 0
passing_yds 288
player Tom Brady (QB, NE)
playerid 00-0019596
rushing_att 5
rushing_lng 0
rushing_lngtd 0
rushing_tds 0
rushing_twopta 0
rushing_twoptm 0
rushing_yds -4
stats OrderedDict([(u'passing_att', 52), (u'passing_twoptm', 0), (u'passing_twopta', 0), (u'passing_yds', 288), (u'passing_cmp', 29), (u'passing_ints', 1), (u'passing_tds', 2), (u'rushing_lngtd', 0), (u'rushing_tds', 0), (u'rushing_twopta', 0), (u'rushing_lng', 0)

Then I can write things like

In [6]:
print "TB12 went {0}/{1} for {2} yds, {3} TDs, and {4} INTs".format(stats.passing_cmp,stats.passing_att,stats.passing_yds,stats.passing_tds,stats.passing_ints)
print "TB12 had a total of {0} TDs ({1} passing and {2} rushing)".format(stats.tds,stats.passing_tds,stats.rushing_tds)
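# NB: passer_rating() returns the NFL passer rating, which is not the same stat as ESPN's Total QBR
print "TB12 QBR = {0}".format(stats.passer_rating())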

TB12 went 29/52 for 288 yds, 2 TDs, and 1 INTs
TB12 had a total of 2 TDs (2 passing and 0 rushing)
TB12 QBR = 76.4
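
For reference, the value printed above is consistent with the standard NFL passer rating formula, which is a different statistic from ESPN's Total QBR. Below is a minimal sketch of that formula. The function name `nfl_passer_rating` is a hypothetical helper, and I am only assuming (not asserting) that `nflgame` computes its value the same way internally.

```python
def nfl_passer_rating(comp, att, yds, tds, ints):
    """Standard NFL passer rating (hypothetical helper, not part of nflgame)."""
    if att == 0:
        return 0.0

    def clamp(x):
        # each of the four components is limited to the range [0, 2.375]
        return max(0.0, min(x, 2.375))

    att = float(att)
    a = clamp((comp / att - 0.3) * 5.0)     # completion percentage
    b = clamp((yds / att - 3.0) * 0.25)     # yards per attempt
    c = clamp((tds / att) * 20.0)           # touchdowns per attempt
    d = clamp(2.375 - (ints / att) * 25.0)  # interceptions per attempt
    return (a + b + c + d) / 6.0 * 100.0

# Week 1 of 2013, using the numbers printed above: 29/52, 288 yds, 2 TDs, 1 INT
print("{0:.1f}".format(nfl_passer_rating(29, 52, 288, 2, 1)))  # 76.4
```

Because the rating is a nonlinear function of per-attempt rates, a rating for several games or several passers should be recomputed from summed totals rather than averaged.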


Now that I can access data from the player I want to consider, I need to get the data from the defense he will be playing (I think this kind of information will be important for predicting TB12's performance).

--- 
Here are a couple of examples from the [cookbook](https://github.com/BurntSushi/nflgame/wiki/Cookbook)

In [0]:
# referencing the cookbook
def total_sacks_suffered(year=None, week=None, team=None):
    if year is None or team is None:
        return None

    games = nflgame.games_gen(year, week, team, team)
    plays = nflgame.combine_plays(games)

    sks = 0
    for p in plays.filter(team=team):
        if p.defense_sk > 0:
            sks += 1
    return sks

In [0]:

def total_sacks_earned(year=None, week=None, team=None):
    if year is None or team is None:
        return None

    games = nflgame.games_gen(year, week, team, team)
    plays = nflgame.combine_plays(games)

    sks = 0
    for p in plays.filter(team__ne=team):
        if p.defense_sk > 0:
            sks += 1
    return sks

In [9]:
print total_sacks_earned(year=2013, week=None, team="BAL")            #Get all sacks earned by Baltimore defense in all of 2013
print total_sacks_suffered(year=2013, week=None, team="BAL")          #Get all sacks given up by Baltimore offense in all of 2013

40
48


### Input Features 

_The features are season averages up to the week of interest, e.g., predictions for Week 5 use the average stats from Weeks 1-4.  Predictions for Week 1 would need the previous season's stats; I'll ignore that for the time being._


**QB**
* `fumbles_lost`
* `passer_rating`
* `passing_att`
* `passing_cmp`
* `passing_ints`
* `passing_tds`
* `passing_yds`
* `rushing_att`
* `rushing_tds`
* `rushing_yds`

**DEF**
* `passer_rating` (opponent)
* `passing_att`
* `passing_cmp`
* `passing_ints`
* `passing_tds`
* `passing_yds`
* `rushing_att`
* `rushing_tds`
* `rushing_yds`
* `defense_sk`
* `defense_sk_yds`
* `defense_int`
* `defense_qbhits`
* `defense_pass_def`


**PREDICTIONS** _for upcoming game_
* pass yards
* pass TDs
* rush yards
* rush TDs
* QB INTs
* QB fumbles
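
The "season average up to the week of interest" idea from the top of this section can be sketched in plain Python. This is only an illustration under my own assumptions: `averages_before_each_game` is a hypothetical helper, and the input is one stat listed in game order. The example values are the `passing_tds` of the first four 2010 games stored later in this notebook.

```python
def averages_before_each_game(values):
    """For game k, return the mean of games 0..k-1 (None for the first game)."""
    averages = []
    running_total = 0.0
    for k, value in enumerate(values):
        # only games played *before* game k contribute to its feature
        averages.append(running_total / k if k > 0 else None)
        running_total += value
    return averages

passing_tds = [3, 2, 3, 1]
prior_avgs = averages_before_each_game(passing_tds)
print(prior_avgs)
```

The first game has no earlier games in the same season, so it gets `None`; that is the Week 1 case that is being ignored for now.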

### Build the Database

Using this information, let's build our database using SQLite.  
I want to put the minimal amount of information into a single database, with more explicit structure, so I don't need to keep querying `nflgame`.

_I have only used sqlite once, a very long time ago for storing some time series data from our detector (PMT voltages and currents, to be more exact), so this may be very inefficient and ugly code..._

To help me with this, I'm referencing [this](https://sebastianraschka.com/Articles/2014_sqlite_in_python_tutorial.html#creating-a-new-sqlite-database) article.

The database structure:

- 1 table per opposing NFL team, holding that defense's stats for all 16 games
    - Each row will be each game
    - Each column will be the different stats
- 1 table for TB12 stats
    - Each row will be each game
    - Each column will be the different stats


From each table in the database, we will export the data needed by the learning methods.

In [0]:
# Starting with 2010
ffyear = 2010

In [13]:
# First, let's get a list of New England opponents for 2010
opponents = []
games = list(nflgame.games_gen(ffyear, None, "NE","NE"))

for game in games:
    if game.home=="NE":
        opponents.append(game.away)
    else:
        opponents.append(game.home)

# Opponents in the same division will appear twice, but that's okay since they
# play each other in different weeks and the teams (supposedly) learn
# from the previous meeting
unique_opps = list(set(opponents))  # only need data tables for the unique set
print unique_opps

[u'PIT', u'MIN', u'MIA', u'CLE', u'DET', u'CIN', u'NYJ', u'GB', u'CHI', u'IND', u'BAL', u'BUF', u'SD']


In [0]:
sqlite_file = 'nflgame{0}_db.sqlite'.format(ffyear) # sqlite database

# Connecting to the database file
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()

# Creating a new SQLite table with 1 column for TBrady
c.execute('CREATE TABLE tb12 (passing_tds INTEGER)')

# Creating the other tables with 1 column for DEFs
for opp in unique_opps:
    c.execute('CREATE TABLE {0} (passer_rating REAL)'.format(opp))

# Committing changes and closing the connection to the database file
conn.commit()
conn.close()

Now that we have initialized the database, let's add the remaining columns to each table.

In [0]:
# Connecting to the database file
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()

# remaining columns to add
tb12_columns = [
('fumbles_lost','INTEGER'),
('passer_rating','REAL'),
('passing_att','INTEGER'),
('passing_cmp','INTEGER'),
('passing_ints','INTEGER'),
('passing_yds','INTEGER'),
('rushing_att','INTEGER'),
('rushing_tds','INTEGER'),
('rushing_yds','INTEGER')]

# Altering TB12
for column in tb12_columns:
    c.execute("ALTER TABLE tb12 ADD COLUMN '{cn}' {ct}"\
              .format(cn=column[0], ct=column[1]))

# Committing changes and closing the connection to the database file
conn.commit()
conn.close()

In [0]:
# Connecting to the database file
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()

def_columns = [
('passing_att','INTEGER'),
('passing_cmp','INTEGER'),
('passing_ints','INTEGER'),
('passing_tds','INTEGER'),
('passing_yds','INTEGER'),
('rushing_att','INTEGER'),
('rushing_tds','INTEGER'),
('rushing_yds','INTEGER'),
('defense_sk','INTEGER'),
('defense_sk_yds','INTEGER'),
('defense_int','INTEGER'),
('defense_qbhits','INTEGER'),
('defense_pass_def','INTEGER')]

# Altering defenses
for opp in unique_opps:
    for column in def_columns:
        c.execute("ALTER TABLE {0} ADD COLUMN '{cn}' {ct}"\
                  .format(opp, cn=column[0], ct=column[1]))

# Committing changes and closing the connection to the database file
conn.commit()
conn.close()

In [0]:
# add the initial column choices to the list of columns
tb12_columns.append(('passing_tds','INTEGER'))
def_columns.append(('passer_rating','REAL'))
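
The create-then-alter steps above could probably be collapsed into a single `CREATE TABLE` statement per table. Here is a minimal sketch of that idea against an in-memory database. `create_stats_table` and the `demo_*` names are hypothetical, not part of `sqlite3` or `nflgame`. Table and column names still have to be formatted into the SQL string, because `?` placeholders only bind values, not identifiers.

```python
import sqlite3

def create_stats_table(cursor, table, columns):
    """Create `table` with every (name, type) pair in `columns` in one statement."""
    column_defs = ', '.join('"{0}" {1}'.format(name, ctype) for name, ctype in columns)
    cursor.execute('CREATE TABLE IF NOT EXISTS "{0}" ({1})'.format(table, column_defs))

# In-memory database, so this sketch does not touch the real file
demo_conn = sqlite3.connect(':memory:')
demo_cursor = demo_conn.cursor()

demo_columns = [('passing_tds', 'INTEGER'),
                ('passer_rating', 'REAL'),
                ('passing_yds', 'INTEGER')]
create_stats_table(demo_cursor, 'tb12', demo_columns)

# PRAGMA table_info returns one row per column; index 1 holds the column name
demo_cursor.execute('PRAGMA table_info(tb12)')
created = [row[1] for row in demo_cursor.fetchall()]
print(created)
demo_conn.close()
```

With `IF NOT EXISTS`, re-running the cell does not raise an error when the table is already there.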

Now that I have the database created, it's time to take data from `nflgame` and populate it.

Starting with Week 1:

In [21]:
# TB12 Week1 stats
stat_names = [i[0] for i in tb12_columns]

stat_values = {}
stats = tb12.stats(ffyear,week=1)

for stat_name in stat_names:
    value = getattr(stats,stat_name)
    # passer_rating is a method; the other stats are plain attributes
    stat_values[stat_name] = value() if callable(value) else value

print stat_values

{'passing_att': 35, 'rushing_yds': 0, 'fumbles_lost': 0, 'rushing_tds': 0, 'passing_yds': 258, 'passing_cmp': 25, 'passing_ints': 0, 'passing_tds': 3, 'rushing_att': 0, 'passer_rating': 120.9}


In [0]:
# Connecting to the database file
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()

values = [stat_values[i] for i in stat_names]
placeholders = ','.join('?' for _ in stat_names)
sql = "INSERT INTO tb12 ({0}) VALUES ({1})".format(','.join(stat_names),placeholders)

# Insert new data, letting sqlite3 bind the values through "?" placeholders
try:
    c.execute(sql, values)
except sqlite3.Error as err:
    print('ERROR: could not insert row: {0}'.format(err))

# Committing changes and closing the connection to the database file
conn.commit()
conn.close()

In [23]:
# Check the database
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()

# 1) Contents of all columns for every row in the table
c.execute('SELECT * FROM tb12')
all_rows = c.fetchall()
print('1):', all_rows)

conn.close()

('1):', [(3, 0, 120.9, 35, 25, 0, 258, 0, 0, 0)])


In [24]:
# Let's check the database using pandas
conn = sqlite3.connect(sqlite_file)

print pd.read_sql_query("SELECT * FROM tb12", conn)

conn.close()

   passing_tds  fumbles_lost  passer_rating  passing_att  passing_cmp  \
0            3             0          120.9           35           25   

   passing_ints  passing_yds  rushing_att  rushing_tds  rushing_yds  
0             0          258            0            0            0  


And it looks like we successfully populated the table with data!  We can even read it easily with `sqlite3` or `pandas`.  The `pandas` interface will be very nice for later, when I want to split the data into training/testing datasets.  
For now, let's add the rest of the data.

NB: There are 17 weeks in an NFL season, but only 16 games (one bye week).  Here I'll try to just skip the bye week and not add an entry to the database.

In [0]:
# First, put some of the code used above into a nice, neat function
def get_tb12_stats(stat_names,week):
    """Return a dict of TB12's stats for the given week of `ffyear`."""
    stat_values = {}
    stats = tb12.stats(ffyear,week=week)

    for stat_name in stat_names:
        value = getattr(stats,stat_name)
        # passer_rating is a method; the other stats are plain attributes
        stat_values[stat_name] = value() if callable(value) else value
    return stat_values

In [0]:
allstats = []
weeks    = range(2,18)
names    = [i[0] for i in tb12_columns]
for wk in weeks:
    wk_stats = get_tb12_stats(names,wk)
    if wk_stats['passing_att'] < 1: continue  # assume this is the bye week, or he just didn't play
    allstats.append(wk_stats)

In [0]:
# Connecting to the database file
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()

# Add all of the new values 
placeholders = ','.join('?' for _ in names)
sql = "INSERT INTO tb12 ({0}) VALUES ({1})".format(','.join(names),placeholders)
for allstat in allstats:
    values = [allstat[i] for i in names]

    # Insert new data, letting sqlite3 bind the values through "?" placeholders
    try:
        c.execute(sql, values)
    except sqlite3.Error as err:
        print('ERROR: could not insert row: {0}'.format(err))

# Committing changes and closing the connection to the database file
conn.commit()
conn.close()
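
The row-by-row inserts above could also be batched. Below is a minimal sketch using `executemany` with `?` placeholders against an in-memory database. The `demo_*` names are hypothetical, and the two rows reuse the `passing_tds` and `passer_rating` values of the first two 2010 games in this notebook.

```python
import sqlite3

demo_conn = sqlite3.connect(':memory:')
demo_cursor = demo_conn.cursor()
demo_cursor.execute('CREATE TABLE tb12 (passing_tds INTEGER, passer_rating REAL)')

demo_names = ['passing_tds', 'passer_rating']
demo_stats = [{'passing_tds': 3, 'passer_rating': 120.9},
              {'passing_tds': 2, 'passer_rating': 72.5}]

# One "?" per column; sqlite3 binds the values, so no manual string conversion
insert_sql = 'INSERT INTO tb12 ({0}) VALUES ({1})'.format(
    ', '.join(demo_names), ', '.join('?' for _ in demo_names))
rows = [tuple(game[name] for name in demo_names) for game in demo_stats]
demo_cursor.executemany(insert_sql, rows)
demo_conn.commit()

demo_cursor.execute('SELECT COUNT(*) FROM tb12')
n_rows = demo_cursor.fetchone()[0]
print(n_rows)
demo_conn.close()
```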

In [38]:
# Let's check the database using pandas
conn = sqlite3.connect(sqlite_file)
print pd.read_sql_query("SELECT * FROM tb12", conn)
conn.close()

    passing_tds  fumbles_lost  passer_rating  passing_att  passing_cmp  \
0             3             0          120.9           35           25   
1             2             1           72.5           36           20   
2             3             0          142.6           27           21   
3             1             0          107.1           24           19   
4             1             0           69.5           44           27   
5             1             0           82.7           32           19   
6             1             0          100.8           27           16   
7             2             0           90.5           36           19   
8             3             0          125.1           43           30   
9             2             0          123.1           25           19   
10            4             0          158.3           27           21   
11            4             0          148.9           29           21   
12            2             0         

It looks like we were able to add everything to the database without any issues for Brady's stats, so let's look at adding the stats for each defense now.

In [0]:
# First, put some of the code used above into a nice, neat function
# def get_def_stats(week):
#     stat_values = {}

#     stats = tb12.stats(ffyear,week=week)

#     for stat_name in def_columns:
#         try:
#             stat_values[stat_name] = getattr(stats,stat_name)()
#         except:
#             stat_values[stat_name] = getattr(stats,stat_name)
#     return stat_values
# 'passing_att'
# 'passing_cmp'
# 'passing_ints'
# 'passing_tds'
# 'passing_yds'
# 'rushing_att'
# 'rushing_tds'
# 'rushing_yds'
# 'defense_sk'
# 'defense_sk_yds'
# 'defense_int'
# 'defense_qbhits'
# 'defense_pass_def

In [0]:
def get_qb_stats(week=None,team=""):
    """Print the passing line of every QB who faced `team` in the given week."""
    games = nflgame.games_gen(ffyear, week=week, away=team)
    if games is None:
        games = nflgame.games_gen(ffyear, week=week, home=team)
    if games is None:
        print " Bye week, no QB stats "
        return

    players = nflgame.combine_game_stats(games)

    # require at least 2 pass attempts to protect against trick plays
    for p in players.filter(team__ne=team, passing_att__ge=2):
        # NB: passer_rating() is the NFL passer rating, labeled "QBR" in the output
        print '%s, %d / %d , %d yards and %f QBR.' \
              % (p.name, p.passing_cmp, p.passing_att, p.passing_yds,p.passer_rating())

In [73]:
for i in range(1,18):
    print " > ",i
    get_qb_stats(i,"NE")

 >  1
C.Palmer, 34 / 50 , 345 yards and 92.500000 QBR.
 >  2
M.Sanchez, 21 / 30 , 220 yards and 124.300000 QBR.
 >  3
R.Fitzpatrick, 20 / 28 , 247 yards and 92.400000 QBR.
 >  4
C.Henne, 29 / 39 , 302 yards and 81.400000 QBR.
T.Thigpen, 2 / 6 , 15 yards and 2.800000 QBR.
 >  5
 Bye week, no QB stats 
 >  6
J.Flacco, 27 / 35 , 285 yards and 119.300000 QBR.
 >  7
P.Rivers, 34 / 50 , 336 yards and 85.100000 QBR.
 >  8
T.Jackson, 4 / 6 , 36 yards and 122.200000 QBR.
B.Favre, 22 / 32 , 259 yards and 80.100000 QBR.
 >  9
C.McCoy, 14 / 19 , 174 yards and 119.200000 QBR.
 >  10
B.Roethlisberger, 30 / 49 , 387 yards and 97.900000 QBR.
 >  11
P.Manning, 38 / 52 , 396 yards and 96.300000 QBR.
 >  12
Sh.Hill, 27 / 46 , 285 yards and 65.900000 QBR.
 >  13
M.Sanchez, 17 / 33 , 164 yards and 27.800000 QBR.
 >  14
J.Cutler, 12 / 26 , 152 yards and 32.900000 QBR.
 >  15
M.Flynn, 24 / 37 , 251 yards and 100.200000 QBR.
 >  16
R.Fitzpatrick, 18 / 37 , 251 yards and 37.100000 QBR.
 >  17
C.Henne, 6 / 16 ,
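
Some weeks list more than one opposing passer (Week 4 and Week 8 above), while the planned tables hold one row per game. One way to handle this, sketched here under my own assumptions, is to sum the counting stats over all passers in a game. `combine_passers` is a hypothetical helper, and the example uses the two Week 4 lines printed above.

```python
def combine_passers(passers, keys=('passing_cmp', 'passing_att', 'passing_yds')):
    """Sum counting stats over every passer who appeared in one game."""
    totals = dict((key, 0) for key in keys)
    for passer in passers:
        for key in keys:
            totals[key] += passer[key]
    return totals

# Week 4 lines printed above (C.Henne and T.Thigpen)
week4 = [
    {'passing_cmp': 29, 'passing_att': 39, 'passing_yds': 302},
    {'passing_cmp': 2, 'passing_att': 6, 'passing_yds': 15},
]
week4_totals = combine_passers(week4)
print(week4_totals)
```

Rate stats such as the passer rating should then be recomputed from these totals rather than averaged across passers.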

## Learning

There are only 16 games per season, so the structure I have so far is probably not ideal for predicting outcomes.  
A better approach would likely use individual plays, categorized by {field position, score, quarter+time remaining, down+distance, OFF/DEF formation, home/away, and metrics for the game's progression}, for both the player and the defense of interest.  
If I have the time, I will investigate such an approach.  For now, I will keep it simple and just use aggregated game information.

#### Train/Test Split

With all of the data nicely organized into a SQLite database, let's use pandas to read it back and prepare the training and test sets.

In [0]:
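# Read TB12's per-game stats back out of the SQLite database
conn = sqlite3.connect(sqlite_file)
df = pd.read_sql_query("SELECT * FROM tb12", conn).fillna(-1)
conn.close()

# ASSUMPTION: passing yards is a placeholder target and the other per-game columns
# are placeholder features (the season averages described earlier are not built yet)
target   = 'passing_yds'
features = [col for col in df.columns if col != target]
tts = train_test_split(df[features].values, df[target].values,
                       test_size=0.25)  # shuffles the rows by default
X_train,X_test,Y_train,Y_test = tts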
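
Because the input features are meant to use only earlier weeks, a shuffled split can let later games inform predictions for earlier ones. One possible alternative is a chronological split, sketched below under the assumption that the rows are stored in schedule order. `chronological_split` is a hypothetical helper, and the integers merely stand in for 16 games.

```python
def chronological_split(rows, test_fraction=0.25):
    """Keep the earliest games for training and the latest ones for testing."""
    n_test = max(1, int(round(len(rows) * test_fraction)))
    return rows[:-n_test], rows[-n_test:]

games_in_order = list(range(1, 17))  # stand-in for 16 games in schedule order
train_games, test_games = chronological_split(games_in_order)
print(train_games)
print(test_games)
```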

#### Pre-process Data

In [0]:
# Develop the scaling on the training dataset, and then apply the same shift to the test
from sklearn.preprocessing import StandardScaler

# scale features
scaler = StandardScaler()
scaler.fit(X_train)

# scale target values
scaler_target = StandardScaler()
scaler_target.fit(Y_train.reshape(-1,1))

In [0]:
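# Scale values (targets are reshaped to a single column to match how scaler_target was fit)
X_test_scale  = scaler.transform(X_test)
Y_test_scale  = scaler_target.transform(Y_test.reshape(-1,1))
X_train_scale = scaler.transform(X_train)
Y_train_scale = scaler_target.transform(Y_train.reshape(-1,1))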
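
To make the scaling step concrete: standardization learns a mean and a standard deviation from the training data only, and then applies z = (x - mean) / std to both datasets. Here is a minimal plain-Python sketch for a single column with toy numbers. `fit_standardizer` and `standardize` are hypothetical helpers, not scikit-learn functions.

```python
import math

def fit_standardizer(train_values):
    """Return (mean, std) of the training values, using the population std."""
    n = float(len(train_values))
    mean = sum(train_values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in train_values) / n)
    # fall back to 1.0 so a constant column does not cause a division by zero
    return mean, (std if std > 0 else 1.0)

def standardize(values, mean, std):
    """Apply z = (x - mean) / std with parameters learned on the training set."""
    return [(v - mean) / std for v in values]

toy_train = [1.0, 2.0, 3.0, 4.0]
toy_test = [2.5, 5.0]

toy_mean, toy_std = fit_standardizer(toy_train)
toy_train_scaled = standardize(toy_train, toy_mean, toy_std)
toy_test_scaled = standardize(toy_test, toy_mean, toy_std)  # reuses the training parameters
print(toy_test_scaled)
```

The test values are shifted and scaled with the training mean and standard deviation, so no information from the test set leaks into the preprocessing.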

### K-Nearest Neighbors

In [0]:
# KNN
n_neighbors = 5
weights = 'uniform'

knn  = neighbors.KNeighborsRegressor(n_neighbors, weights=weights)
fknn = knn.fit(X_train, Y_train)
predictions = fknn.predict(X_test)
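
As a rough picture of what the regressor above does with `weights='uniform'`: it finds the `n_neighbors` training points closest to a new point and averages their targets. The plain-Python sketch below illustrates that idea with Euclidean distance and toy data. `knn_predict` is a hypothetical helper and not how scikit-learn implements it internally.

```python
import math

def knn_predict(X_train, y_train, x_new, k=5):
    """Predict by averaging the targets of the k nearest training points."""
    distances = []
    for features, target in zip(X_train, y_train):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(features, x_new)))
        distances.append((dist, target))
    distances.sort(key=lambda pair: pair[0])   # closest points first
    nearest = distances[:k]
    return sum(target for _, target in nearest) / float(len(nearest))

toy_X = [[0.0], [1.0], [2.0], [10.0]]
toy_y = [0.0, 10.0, 20.0, 100.0]
toy_prediction = knn_predict(toy_X, toy_y, [1.2], k=2)
print(toy_prediction)  # 15.0, the mean of the targets at x=1.0 and x=2.0
```

With a single season of games, only about a dozen rows end up in the training set, so `n_neighbors = 5` averages over a large share of them.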

### Support Vector Machine

In [0]:
# SVM
# with scikit-learn it is incredibly easy to get started
clf = svm.SVR()  # support vector regression
clf.fit(X_train,Y_train)

In [0]:
# Performance
predictions = clf.predict(X_test)
values = np.divide((np.asarray(predictions) - Y_test),Y_test)

fig,ax = plt.subplots(2,1,figsize=(8,8))

plt.subplot(2,1,1)
plt.hist(values,bins=20,density=True)  # `normed` was deprecated in favor of `density`
plt.xlabel("(Pred-Real)/Real",position=(1,0),ha='right')
plt.ylabel("AU",position=(0,1),ha='right')
plt.text(0.97,0.90,"SVM Non-scaled Values",ha='right',transform=ax[0].transAxes)

plt.subplot(2,1,2)
plt.scatter(predictions,Y_test,color='b',edgecolor='k',alpha=0.5,label="Test Dataset");
plt.plot(Y_test,Y_test,color='r',label="Perfect")
margin = 0.1*(max(Y_test)-min(Y_test))  # pad the axes by 10% of the target range
plt.xlim(min(predictions)-margin,max(predictions)+margin)
plt.ylim(min(Y_test)-margin,max(Y_test)+margin)
plt.xlabel("Predicted Value",position=(1,0),ha='right')
plt.ylabel("Real Value",position=(0,1),ha='right')
plt.legend()

evs = explained_variance_score(Y_test,predictions)

print(r"Distribution = {0:.3f} $\pm$ {1:.4f}".format(np.mean(values),np.std(values)))
print(r"EV Score     = {0:.3f}".format(evs))
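
For reference, the explained variance score is 1 - Var(y_true - y_pred) / Var(y_true). Below is a minimal plain-Python sketch with toy numbers. `explained_variance` is a hypothetical helper that mirrors that definition and is not scikit-learn's code.

```python
def explained_variance(y_true, y_pred):
    """1 - Var(y_true - y_pred) / Var(y_true), using population variances."""
    def variance(values):
        mean = sum(values) / float(len(values))
        return sum((v - mean) ** 2 for v in values) / float(len(values))

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - variance(residuals) / variance(y_true)

toy_true = [1.0, 2.0, 3.0, 4.0]
toy_pred = [2.0, 3.0, 4.0, 5.0]   # every prediction is off by exactly +1
toy_score = explained_variance(toy_true, toy_pred)
print(toy_score)  # 1.0
```

A constant offset leaves the residual variance at zero, so the score stays at 1.0 even though every prediction is wrong. That is one reason to also look at the mean of the (Pred-Real)/Real distribution printed above.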