Teams have 9 active players and 7 bench players, broken down as follows:


| Position  | Number Active  |  Max on Team |
|:----------:|:-------------:| :-------------:|
| Quarterback (QB) |  1 | 4 |
| Running Back (RB) | 2  | 8 |
| Wide Receiver (WR) | 2 | 8 |
| Tight End (TE) | 1| 3 |
| Flex (RB/WR/TE) | 1 | N/A |
| Team Defense (D/ST) | 1 | 3 |
| Place Kicker (K) | 1 | 3 |
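
To make the roster rules concrete, here is a minimal sketch that encodes the table as dictionaries and checks a roster against it. The names `ACTIVE_SLOTS`, `MAX_ON_TEAM`, `roster_is_legal`, and `can_fill_lineup` are hypothetical, and the sketch assumes the total roster size is simply 9 active plus 7 bench.

```python
from collections import Counter

# Starting slots and per-position caps, copied from the table above
ACTIVE_SLOTS = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'FLEX': 1, 'D/ST': 1, 'K': 1}
MAX_ON_TEAM = {'QB': 4, 'RB': 8, 'WR': 8, 'TE': 3, 'D/ST': 3, 'K': 3}
FLEX_ELIGIBLE = ('RB', 'WR', 'TE')
BENCH_SIZE = 7

def roster_is_legal(positions):
    '''True if the roster fits the size limit and every per-position cap.'''
    counts = Counter(positions)
    max_size = sum(ACTIVE_SLOTS.values()) + BENCH_SIZE  # 9 active + 7 bench
    if len(positions) > max_size:
        return False
    # Unknown position codes are rejected (an assumption of this sketch)
    if any(pos not in MAX_ON_TEAM for pos in counts):
        return False
    return all(counts[pos] <= cap for pos, cap in MAX_ON_TEAM.items())

def can_fill_lineup(positions):
    '''True if the roster can fill every starting slot, including the flex.'''
    counts = Counter(positions)
    for pos, need in ACTIVE_SLOTS.items():
        if pos != 'FLEX' and counts[pos] < need:
            return False
    # Players left at flex-eligible positions once the fixed slots are filled
    spare = sum(counts[pos] - ACTIVE_SLOTS[pos] for pos in FLEX_ELIGIBLE)
    return spare >= ACTIVE_SLOTS['FLEX']

# Made-up roster: the nine starters only, with a third RB in the flex
starters = ['QB', 'RB', 'RB', 'RB', 'WR', 'WR', 'TE', 'D/ST', 'K']
```

Calling `roster_is_legal(starters)` and `can_fill_lineup(starters)` should both return `True`; dropping the third RB leaves the flex slot empty.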


Here is how points are awarded to each player (kicker and defense scoring not included):

| Stat  | Points  | 
|:----------:|:-------------:|
| Passing Yards (PY) |	0.04 | 
| TD Pass (PTD)	     |  6    | 
| 2pt Passing Conversion (2PC) | 2 | 
| Interceptions Thrown (INT) |	-2	|
| Rushing Yards (RY)	| 0.1 | 
| TD Rush (RTD)	        | 6 | 
| 2pt Rushing Conversion (2PR) | 2	| 
| Receiving Yards (REY)	| 0.1 | 
| Each reception (REC)	| 1 | 
| TD Reception (RETD)	| 6	| 
| 2pt Receiving Conversion (2PRE) | 2 | 
| Kickoff Return TD (KRTD)	| 6 | 
| Punt Return TD (PRTD)	| 6 | 
| Fumble Recovered for TD (FTD)	| 6	|
| Total Fumbles Lost (FUML)	| -2 | 
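
As a rough sketch of how this table turns a stat line into fantasy points, we can key the point values by the nfldb stat names used later in this notebook and take a weighted sum. The names `SCORING`, `fantasy_points`, and `example_line` are hypothetical, and the stat line is made up for illustration.

```python
# Point values copied from the scoring table above, keyed by nfldb stat name
SCORING = {
    'passing_yds': 0.04,
    'passing_tds': 6,
    'passing_twoptm': 2,
    'passing_int': -2,
    'rushing_yds': 0.1,
    'rushing_tds': 6,
    'rushing_twoptm': 2,
    'receiving_yds': 0.1,
    'receiving_rec': 1,
    'receiving_tds': 6,
    'receiving_twoptm': 2,
    'kickret_tds': 6,
    'puntret_tds': 6,
    'fumbles_rec_tds': 6,
    'fumbles_lost': -2,
}

def fantasy_points(stats, scoring=SCORING):
    '''Weighted sum of a stat line; stats missing from the dict count as zero.'''
    return sum(weight * stats.get(name, 0) for name, weight in scoring.items())

# Made-up stat line for illustration
example_line = {'rushing_yds': 100, 'rushing_tds': 1, 'receiving_rec': 2,
                'receiving_yds': 15, 'fumbles_lost': 1}
points = fantasy_points(example_line)
```

Passing a different `scoring` dictionary is all it takes to score the same stat line under another league's rules.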


For a first attempt, let's get all of these stats for a particular player, `Adrian Peterson`, aggregated over the 2015 regular season. We'll follow http://blog.burntsushi.net/nfl-live-statistics-with-python/.

In [None]:
import nfldb
import pandas as pd
db = nfldb.connect()

In [None]:
#From https://github.com/BurntSushi/nfldb/wiki/Statistical-categories
q = nfldb.Query(db)
q.game(season_year=2015, season_type='Regular')
q.player(full_name='Adrian Peterson')
for stats in q.as_aggregate():
        print 'Player: ' , stats.player
        print ''
        print '---Scoring Information---'
        print 'Passing Yards: ' , stats.passing_yds
        print 'TD Passes: ' , stats.passing_tds
        print 'Passing Two-point Conversions: ' , stats.passing_twoptm
        print 'Interceptions Thrown: ' , stats.passing_int
        print 'Rushing Yards: ' , stats.rushing_yds
        print 'Rushing TDs: ' , stats.rushing_tds
        print 'Rushing Two-point Conversions: ' , stats.rushing_twoptm
        print 'Receiving yards: ' , stats.receiving_yds
        print 'Receptions: ' , stats.receiving_rec
        print 'TD Reception: ' , stats.receiving_tds
        print 'Receiving Two-point Conversions: ' , stats.receiving_twoptm
        print 'Kickoff return touchdowns: ' , stats.kickret_tds
        print 'Punt return touchdowns: ' , stats.puntret_tds
        print 'Fumble Return TD: ', stats.fumbles_rec_tds
        print 'Total Fumbles Lost: ' , stats.fumbles_lost
        print ''
        print '---Additional QB Information---'
        print 'Passing Attempts: ', stats.passing_att
        print 'Passing Completions: ', stats.passing_cmp
        print 'Passing Incompletions: ', stats.passing_incmp
        print 'Total yards of passing in air: ' , stats.passing_cmp_air_yds
        print 'Number of times sacked: ' , stats.passing_sk
        print 'Number of yards lost while sacked: ' , stats.passing_sk_yds
        print 'Two point conversion attempts: ' , stats.passing_twopta
        print ''
        print '---Additional WR Information---'
        print 'Number of targets: ' , stats.receiving_tar
        print 'Number of two point conversion attempts: ', stats.receiving_twopta
        print 'Additional yardage after catch: ' , stats.receiving_yac_yds
        print ''
        print '---Additional RB Information---'
        print 'Rushing attempts: ', stats.rushing_att
        print 'Number of rushing losses: ', stats.rushing_loss #NOT WORKING
        print 'Yards of rushing losses: ', stats.rushing_loss_yds #NOT WORKING
        print 'Rushing two point conversion attempts: ', stats.rushing_twopta
        print ''
        print ''

        
    
'''
Kicking stats:    
PAT attempts: kicking_xpa
PATs Made (PAT): kicking_xpmade
PATs Missed: kicking_xpmissed
FG attempts: kicking_fga    
Total FG Made: kicking_fgm
Total FG Missed: kicking_fgmissed
Each FG Yards: kicking_fgm_yds
Each Missed FG Yards: kicking_fgmissed_yds
All FG yards: kicking_all_yds
    
Defense/Special Teams stats:
Interception Return TD (INTTD): defense_int_tds
Blocked Punt or FG return for TD (BLKKRTD): defense_misc_tds
Each Sack (SK): defense_sk
Fumble Return TD (FRTD): defense_frec_tds
Blocked Punt, PAT or FG (BLKK): defense_puntblk, defense_fgblk, defense_xpblk
Each Interception (INT): defense_int
Each Fumble Recovered (FR): defense_frec
Each Safety (SF): defense_safe
yards allowed:
points allowed:
Kickoff Return TD: kickret_tds
Punt Return TD (PRTD): puntret_tds
    
number of fumbles forced: defense_ffum
Defense blocked a pass: defense_pass_def
yards gains after an interception: defense_int_yds
Number of tackles behind scrimmage line: defense_tkl_loss
Defense caused yards lost behind scrimmage line: defense_tkl_loss_yds
'''
    
    
#No stats on 2 point returns by defense, or 1 point safeties... doesn't matter since they're rare
#rushing_loss_yds and rushing_loss aren't working... bummer

We could try to predict fantasy points directly, or try to predict the underlying stats.  I think predicting stats has more value, since the result can be adapted to any fantasy scoring method people come up with.   It can also reveal when a particular defense is strong against the run, the pass, or both, rather than just strong in general.  The downside is that individual stats may be harder to predict than scores.

Let's give predicting stats a shot and see how it works out.  As a first attempt, we should just try to predict something like rushing yards.  First, we need to produce a list of samples, including features and their outcome.  The samples will be stats for an individual player for an individual game.  We'll scan all games played one at a time to collect this information.  What should the features we use for our prediction be?


To start, we can try these features:

* DONE: Average relevant stats for the player (From the last X games prior to a game)
* DONE: Player's team
* DONE: Opponent team
* DONE: Team at home or away
* DONE: Player's average stat against this opponent
* DONE: Team's average stat against this opponent (To help account for players switching teams)
* DONE: Player's position (QBs don't rush much, for instance.  This is covered by average stat to begin with)

Here are some more stats we might add in the future.  These may be harder to implement but are worth looking into:

* DONE: Player's injury status
* DONE: Player's bench status
* DONE: (no coach) Player's coach, other players on his team
* DONE: (no coach) Opponent's coach, and other players on opponent team

Let's loop through each game and produce our samples for the stats we mentioned in the first list. We'll need to write some functions to calculate the average of the player's stats.  **Keep in mind we need to account for unexpected injuries and bye weeks... probably just don't include them in the samples at first!**
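
Before touching the database, here is a minimal pure-Python sketch of the averaging idea: take the mean of one stat over the games a player actually played in the weeks before the sample, skipping bye and injury weeks. The function name `average_prior_games`, the `played` flag, and the game log are all made up for illustration; the real functions below work on the `gameStats` dataframe instead.

```python
def average_prior_games(game_log, current_week, num_weeks, stat):
    '''Mean of `stat` over games played in the num_weeks weeks before current_week.

    game_log: list of dicts with 'week', 'played', and stat keys.
    Returns None when there is no prior game to average over.
    '''
    start_week = current_week - num_weeks
    values = [g[stat] for g in game_log
              if start_week <= g['week'] < current_week and g['played']]
    if not values:
        return None
    return float(sum(values)) / len(values)

# Made-up log: week 3 is a bye or injury week, so it is flagged as not played
log = [
    {'week': 1, 'played': True,  'rushing_yds': 80},
    {'week': 2, 'played': True,  'rushing_yds': 100},
    {'week': 3, 'played': False, 'rushing_yds': 0},
    {'week': 4, 'played': True,  'rushing_yds': 60},
]
avg = average_prior_games(log, current_week=4, num_weeks=3, stat='rushing_yds')
```

Here `avg` only uses weeks 1 and 2, since week 3 is skipped and week 4 is the game we would be trying to predict.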

In [1]:
import numpy as np
import nfldb
import pandas as pd
import collections
db = nfldb.connect()


In [2]:
def zeroStats():

    #List of relevant stats
    this_game_stats = collections.OrderedDict()
    this_game_stats['passing_yds']= 0
    this_game_stats['passing_tds']= 0
    this_game_stats['passing_twoptm']= 0
    this_game_stats['passing_int']= 0
    this_game_stats['rushing_yds']= 0
    this_game_stats['rushing_tds']= 0
    this_game_stats['rushing_twoptm']= 0
    this_game_stats['receiving_yds']= 0
    this_game_stats['receiving_rec']= 0
    this_game_stats['receiving_tds']= 0
    this_game_stats['receiving_twoptm']= 0
    this_game_stats['kickret_tds']= 0
    this_game_stats['puntret_tds']= 0
    this_game_stats['fumbles_rec_tds']= 0
    this_game_stats['fumbles_lost']= 0
    this_game_stats['passing_att']= 0
    this_game_stats['passing_cmp']= 0
    this_game_stats['passing_incmp']=0
    this_game_stats['passing_cmp_air_yds']= 0
    this_game_stats['passing_sk']= 0
    this_game_stats['passing_sk_yds']= 0
    this_game_stats['passing_twopta']= 0
    this_game_stats['receiving_tar']= 0
    this_game_stats['receiving_twopta']= 0
    this_game_stats['receiving_yac_yds']= 0
    this_game_stats['rushing_att']= 0
    this_game_stats['rushing_twopta']= 0
    this_game_stats['kicking_xpa']= 0
    this_game_stats['kicking_xpmade']= 0
    this_game_stats['kicking_xpmissed']= 0
    this_game_stats['kicking_fga']= 0
    this_game_stats['kicking_fgm']= 0
    this_game_stats['kicking_fgmissed']= 0
    this_game_stats['kicking_fgm_yds']= 0
    this_game_stats['kicking_fgmissed_yds']= 0
    this_game_stats['kicking_all_yds']= 0
    this_game_stats['defense_int_tds']= 0
    this_game_stats['defense_misc_tds']= 0
    this_game_stats['defense_sk']= 0
    this_game_stats['defense_frec_tds']= 0
    this_game_stats['defense_puntblk']= 0
    this_game_stats['defense_fgblk']= 0
    this_game_stats['defense_xpblk']= 0
    this_game_stats['defense_int']= 0
    this_game_stats['defense_frec']= 0
    this_game_stats['defense_safe']= 0
    this_game_stats['defense_ffum']= 0
    this_game_stats['defense_pass_def']= 0
    this_game_stats['defense_int_yds']= 0
    this_game_stats['defense_tkl_loss']= 0
    this_game_stats['defense_tkl_loss_yds']= 0


    #These don't exist in nfldb: they stay zero for players and are derived for team defenses later
    this_game_stats['defense_kickret_tds']=0
    this_game_stats['defense_puntret_tds']=0
    this_game_stats['defense_rushing_yds_allowed']= 0
    this_game_stats['defense_passing_yds_allowed']=0 
    this_game_stats['defense_total_yds_allowed']=0
    this_game_stats['defense_rushing_tds_allowed']= 0
    this_game_stats['defense_passing_tds_allowed']=0       
    this_game_stats['defense_fga_allowed']=0   
    this_game_stats['defense_points_allowed']=0   

    
    return this_game_stats

In [3]:
'''This populates a dataframe with week-by-week stats on all players in all games'''

#The last nine columns (defense_kickret_tds onward) are not actual stats in nfldb... they are derived below
gameStats = pd.DataFrame(columns=('Player','Position','Week','Team','At Home','Opponent','Outcome',\
                                  'Team Players','Opponent Players','Player Benched',\
                                  'passing_yds','passing_tds','passing_twoptm',\
                                  'passing_int','rushing_yds','rushing_tds',\
                                  'rushing_twoptm','receiving_yds','receiving_rec',\
                                  'receiving_tds','receiving_twoptm','kickret_tds',\
                                  'puntret_tds','fumbles_rec_tds','fumbles_lost',\
                                  'passing_att','passing_cmp','passing_incmp','passing_cmp_air_yds',\
                                  'passing_sk','passing_sk_yds','passing_twopta',\
                                  'receiving_tar','receiving_twopta','receiving_yac_yds',\
                                  'rushing_att','rushing_twopta','kicking_xpa', \
                                  'kicking_xpmade','kicking_xpmissed','kicking_fga',\
                                  'kicking_fgm','kicking_fgmissed','kicking_fgm_yds',\
                                  'kicking_fgmissed_yds','kicking_all_yds',\
                                  'defense_int_tds','defense_misc_tds','defense_sk',\
                                  'defense_frec_tds','defense_puntblk','defense_fgblk',\
                                  'defense_xpblk','defense_int','defense_frec','defense_safe',\
                                  'defense_ffum','defense_pass_def','defense_int_yds',\
                                  'defense_tkl_loss','defense_tkl_loss_yds',\
                                  'defense_kickret_tds','defense_puntret_tds',\
                                  'defense_rushing_yds_allowed','defense_passing_yds_allowed',\
                                  'defense_total_yds_allowed','defense_rushing_tds_allowed',\
                                  'defense_passing_tds_allowed','defense_fga_allowed','defense_points_allowed'))


#Defensive stats summed over a team's players to build its defense row (each stat listed once)
aggregate_stats=['defense_int_tds','defense_misc_tds','defense_sk','defense_frec_tds',\
               'defense_puntblk','defense_fgblk',\
               'defense_xpblk','defense_int','defense_frec','defense_safe',\
               'defense_ffum','defense_pass_def','defense_int_yds',\
               'defense_tkl_loss','defense_tkl_loss_yds']

current_row = 0
current_week = 1

#Loop over every regular season week in the database (only goes back to 2009)
for year in range(2009,2016):
    for week_num in range (1,18):

        cur_query= nfldb.Query(db)
        cur_query.game(season_year=year, season_type='Regular', week=week_num)
        for info in cur_query.as_games():
            
            #Make a list of home and away players
            home_players=[]
            away_players=[]
            for player in range(len(info.players)):
                player_name = (str(info.players[player][1]).split(' (')[0])
                team = str(info.players[player][0])
                if (info.home_team == team):
                    home_players.append(player_name)
                else:
                    away_players.append(player_name)

            for player in range(len(info.players)):
                player_name = (str(info.players[player][1]).split(' (')[0])
                position = (str(info.players[player][1]).split(' (')[1].split(', ')[1].split(')')[0])
                team = str(info.players[player][0])
                team_is_home = (info.home_team == team)
                if team_is_home:
                    opponent = info.away_team
                    team_players = home_players
                    opponent_players = away_players
                else:
                    opponent = info.home_team
                    team_players = away_players
                    opponent_players = home_players

                if info.winner == team:
                    outcome = 'Win'
                else:
                    outcome = 'Loss'
                    
                #List of relevant stats
                this_game_stats = zeroStats()
                
                p_query= nfldb.Query(db)
                p_query.game(season_year=year, season_type='Regular', week=week_num)
                for p_info in p_query.player(full_name=player_name).as_aggregate():
                    #If there is no info, they didn't play
                    if p_info:
                        for stat in this_game_stats:
                            if stat in dir(p_info):
                                this_game_stats[stat] = getattr(p_info, stat)
                        
                        #If all stats are zero, claim the player was benched or injured
                        if max([this_game_stats[key] for key in gameStats.columns[10:]]) > 0:
                            suspected_bench = False
                        else:
                            suspected_bench = True
                    else:
                        #If no stats available, claim the player was benched (this also covers bye weeks)
                        for stat in this_game_stats:
                            this_game_stats[stat] = None 
                        suspected_bench = True

                output_list = [player_name, position, current_week, team, team_is_home, opponent, outcome, \
                               team_players, opponent_players, suspected_bench] + \
                              [this_game_stats[key] for key in gameStats.columns[10:]]         
                gameStats.loc[current_row] = output_list                
                current_row+=1 
                 
            #After looping through all the players calculate defense team stats
            
            home_stats = gameStats[(gameStats['Team'] == info.home_team) & (gameStats['Week'] == current_week)]
            away_stats = gameStats[(gameStats['Team'] == info.away_team) & (gameStats['Week'] == current_week)]
           
            #First for home team
            player_name = info.home_team + '_defense'
            position = 'DEF'
            team = info.home_team
            team_is_home = True
            opponent = info.away_team
            if info.winner == info.home_team:
                outcome="Win"
            else:
                outcome="Loss"
            team_players=home_players
            opponent_players=away_players
            suspected_bench=False
            
            this_game_stats = zeroStats()
            
            for key in aggregate_stats:
                this_game_stats[key]=sum(home_stats[key])
    
            this_game_stats['defense_rushing_yds_allowed']= sum(away_stats['rushing_yds'])
            this_game_stats['defense_passing_yds_allowed']=sum(away_stats['passing_yds'])
            this_game_stats['defense_total_yds_allowed']=sum(away_stats['passing_yds']) + sum(away_stats['rushing_yds'])         
            this_game_stats['defense_rushing_tds_allowed']= sum(away_stats['rushing_tds'])
            this_game_stats['defense_passing_tds_allowed']=sum(away_stats['passing_tds'])               
            this_game_stats['defense_points_allowed']=info.away_score
            this_game_stats['defense_kickret_tds']=sum(home_stats['kickret_tds'])
            this_game_stats['defense_puntret_tds']=sum(home_stats['puntret_tds'])
            this_game_stats['defense_fga_allowed']=sum(away_stats['kicking_fga'])
            
            output_list = [player_name, position, current_week, team, team_is_home, opponent, outcome, \
                               team_players, opponent_players, suspected_bench] + \
                              [this_game_stats[key] for key in gameStats.columns[10:]]         
            gameStats.loc[current_row] = output_list                
            current_row+=1         
            
            #Now for away team
            player_name = info.away_team + '_defense'
            position = 'DEF'
            team = info.away_team
            team_is_home = False
            opponent = info.home_team
            if info.winner == info.away_team:
                outcome="Win"
            else:
                outcome="Loss"
            team_players=away_players
            opponent_players=home_players
            suspected_bench=False
            
            this_game_stats = zeroStats()
            
            for key in aggregate_stats:
                this_game_stats[key]=sum(away_stats[key])
    
            this_game_stats['defense_rushing_yds_allowed']= sum(home_stats['rushing_yds'])
            this_game_stats['defense_passing_yds_allowed']=sum(home_stats['passing_yds'])
            this_game_stats['defense_total_yds_allowed']=sum(home_stats['passing_yds']) + sum(home_stats['rushing_yds'])         
            this_game_stats['defense_rushing_tds_allowed']= sum(home_stats['rushing_tds'])
            this_game_stats['defense_passing_tds_allowed']=sum(home_stats['passing_tds'])                        
            this_game_stats['defense_points_allowed']=info.home_score
            this_game_stats['defense_kickret_tds']=sum(away_stats['kickret_tds'])
            this_game_stats['defense_puntret_tds']=sum(away_stats['puntret_tds'])
            this_game_stats['defense_fga_allowed']=sum(home_stats['kicking_fga'])
            
            
            output_list = [player_name, position, current_week, team, team_is_home, opponent, outcome, \
                               team_players, opponent_players, suspected_bench] + \
                              [this_game_stats[key] for key in gameStats.columns[10:]]         
            gameStats.loc[current_row] = output_list                
            current_row+=1 
            
        current_week += 1
     
    #Save each time we finish a year
    gameStats.to_csv('gameStats.csv')


# COULD I SPEED THIS UP USING GROUPBY?
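
One possible answer to the groupby question, sketched on a toy dataframe rather than the real `gameStats`: instead of filtering the frame once per team and week, compute every (team, week) total in a single `groupby` pass and average those totals afterwards. The frame `toy` and its values are made up, and this is only a sketch of the idea, not a drop-in replacement for the loop above.

```python
import pandas as pd

# Toy stand-in for gameStats: one row per player per game (values are made up)
toy = pd.DataFrame({
    'Team':        ['CAR', 'CAR', 'ATL', 'ATL', 'CAR', 'ATL'],
    'Week':        [1, 1, 1, 1, 2, 2],
    'rushing_yds': [37, 35, 68, 0, 79, 50],
    'passing_yds': [0, 138, 229, 0, 0, 200],
})

# One pass over the data: total each stat per (team, week)
weekly_totals = toy.groupby(['Team', 'Week'])[['rushing_yds', 'passing_yds']].sum()

# Per-team average of the weekly totals
team_avg = weekly_totals.groupby(level='Team').mean()
```

`weekly_totals` is indexed by `(Team, Week)`, so a lookup such as `weekly_totals.loc[('CAR', 1)]` replaces a boolean filter over the whole frame.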

In [20]:
def FindPlayerAverage(num_weeks,player,current_week,opp=None):
    '''Averages a player's stats over the past num_weeks
    If we don't have num_weeks of data, it goes back as far as possible
    
    Inputs:
    num_weeks -> number of weeks to average over
    player    -> player whose stats we want to average
    current_week -> The corresponding week in gameStats['Week'] for the sample
    opp       -> optional.  Specify opponent to average over games versus that opponent
    '''
    
    #The window [start_week, current_week) covers exactly num_weeks weeks before the sample
    start_week = current_week - num_weeks
    playerStats = gameStats[(gameStats['Player']== player) & (gameStats['Week']>= start_week) & (gameStats['Week']< current_week)].reset_index(drop=True)

    #Find number of games benched
    num_benched=len(playerStats[playerStats['Player Benched']==True])
         
    #Remove games where the player didn't play
    playerStats=playerStats[playerStats['Player Benched']==False]
 
    #Average the remaining stats
    playerAverage=[np.mean(playerStats[key]) for key in playerStats.columns[10:]]

    if opp:
        playerStats_v_opp = playerStats[playerStats['Opponent']==opp]
        playerAverage_v_opp=[np.mean(playerStats_v_opp[key]) for key in playerStats_v_opp.columns[10:]]
        return (playerAverage,num_benched,playerAverage_v_opp)
        
    else:
        return (playerAverage,num_benched)

In [None]:
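def CountWL(dframe):
    '''Counts wins and losses, one game per week in dframe'''
    teamW=0
    teamL=0
    for week in set(dframe['Week']):
        #Every row for a given week shares the same outcome, so check the first one
        if dframe[dframe['Week']==week]['Outcome'].iloc[0]=='Win':
            teamW+=1
        else:
            teamL+=1  
            
    return (teamW,teamL)

def AvgFromWeekly(dframe):
    '''Sums the stat columns over all rows in each week, then averages across weeks'''
    weeklyStats=dframe.iloc[:,10:].copy()
    weeklyStats['Week']=dframe['Week']
    weeklyStats=weeklyStats.groupby(['Week']).sum()
    #Defensive stats are counted twice (individual players plus the team's defense row)
    for key in aggregate_stats:
        weeklyStats[key]=weeklyStats[key]/2    
    
    #Average the remaining stats
    WeeklyAvg=[np.mean(weeklyStats[key]) for key in weeklyStats.columns]
    return WeeklyAvg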
 

In [190]:
#I want to sum over all of the players before averaging...
#need to consider what to do in the case of defense score...

def FindTeamAverage(num_weeks,team,current_week,opp=None):
    '''Averages a team's stats over the past num_weeks
    If we don't have num_weeks of data, it goes back as far as possible
    
    Inputs:
    num_weeks -> number of weeks to average over
    team      -> team whose stats we want to average
    current_week -> The corresponding week in gameStats['Week'] for the sample    
    opp       -> optional.  Specify opponent to average over games versus that opponent
    '''
    #The window [start_week, current_week) covers exactly num_weeks weeks before the sample
    start_week = current_week - num_weeks
    teamStats = gameStats[(gameStats['Team']== team) & (gameStats['Week']>= start_week) & (gameStats['Week']< current_week)].reset_index(drop=True)


    weeklyStats=teamStats.ix[:,10:]
    weeklyStats['Week']=teamStats['Week']
    weeklyStats=weeklyStats.groupby(['Week']).sum()
    for key in aggregate_stats:
        weeklyStats[key]=weeklyStats[key]/2    
    
    #Average the remaining stats
    teamAverage=[np.mean(weeklyStats[key]) for key in weeklyStats.columns]

    
    #Counting wins and losses
    teamW=0
    teamL=0
    for week in set(teamStats['Week']):
        #Count wins and losses
        if teamStats[teamStats['Week']==week]['Outcome'].iloc[0]=='Win':
            teamW+=1
        else:
            teamL+=1                    

    teamWL=(teamW,teamL)    
    
    if opp:
        teamStats_v_opp = teamStats[teamStats['Opponent']==opp]
        
        weeklyStats_v_opp=teamStats_v_opp.ix[:,10:]
        weeklyStats_v_opp['Week']=teamStats_v_opp['Week']
        weeklyStats_v_opp=weeklyStats_v_opp.groupby(['Week']).sum()
        for key in aggregate_stats:
            weeklyStats_v_opp[key]=weeklyStats_v_opp[key]/2    

        #Average the remaining stats
        teamAverage_v_opp=[np.mean(weeklyStats_v_opp[key]) for key in weeklyStats_v_opp.columns]


        #Counting wins and losses
        teamW_v_opp=0
        teamL_v_opp=0
        for week in set(teamStats_v_opp['Week']):
            #Count wins and losses
            if teamStats_v_opp[teamStats_v_opp['Week']==week]['Outcome'].iloc[0]=='Win':
                teamW_v_opp+=1
            else:
                teamL_v_opp+=1               
                 
        teamWL_v_opp=(teamW_v_opp,teamL_v_opp)
              
        return (teamAverage,teamWL,teamAverage_v_opp,teamWL_v_opp)
    else:
        return (teamAverage,teamWL)

Now we need to find the team's average stats against the opposing team, and the player's average stats against the opposing team.  Then, we'll create a list of samples and features.

In [227]:
#Build the training set with these features

#Note: player and team stats are NaN if there is no history to average over
import time


trainingSamples=pd.DataFrame(columns=('Player','Position','Week','Games Benched',\
                                      'Is Benched','Team','TeamWL','TeamWL_v_opp',\
                                      'Teammates','Opponent','OpponentWL','Opp Avg Stats',\
                                      'Opp Avg Stats v Team','Opp Players','At Home',\
                                      'Player Avg Stats','Player Avg v Opp','Team Avg Stats',\
                                      'Team Avg v Opp','Stat Outcome'))

t0 = time.time()
num_samps=1000
current_row=0
for row in range(2000,2000+num_samps):
    position = gameStats['Position'].ix[row]

    #We only care about these positions
    if position in ['DEF','K','QB','RB','WR','TE']:
        player = gameStats['Player'].ix[row]
        team = gameStats['Team'].ix[row]
        opp = gameStats['Opponent'].ix[row]
        week = gameStats['Week'].ix[row]
        is_benched = gameStats['Player Benched'].ix[row]
        teammates = gameStats['Team Players'].ix[row]
        opp_players = gameStats['Opponent Players'].ix[row] 
        at_home = gameStats['At Home'].ix[row]
        outcome_stats = list(gameStats.ix[row,10:])
        
        num_weeks=17 #number of weeks to average over
        player_stats, num_benched, player_stats_v_opp = FindPlayerAverage(num_weeks,player,week,opp)
        team_stats, team_WL, team_stats_v_opp, team_WL_v_opp = FindTeamAverage(num_weeks,team,week,opp)
        opp_stats, opp_WL, opp_stats_v_team, opp_WL_v_team = FindTeamAverage(num_weeks,opp,week,team)

        trainingSamples.loc[current_row]=[player, position, week, num_benched, is_benched, team, team_WL, \
                        team_WL_v_opp, teammates, opp, opp_WL, opp_stats, opp_stats_v_team,\
                        opp_players, at_home, player_stats, player_stats_v_opp, team_stats,\
                        team_stats_v_opp, outcome_stats]
 
        current_row += 1

t1 = time.time()
total = t1-t0

print "Total Run Time for ", num_samps, " samples is " , total, " seconds."
print "Averaging ", total/num_samps, " seconds per sample.  Estimated full run time is ", len(gameStats)*total/num_samps/60/60, " hours for all of gameStats"

Total Run Time for  1000  samples is  13.0325031281  seconds.
Averaging  0.0130325031281  seconds per sample.  Estimated full run time is  0.438087592649  hours for all of gameStats


In [228]:
trainingSamples.head(n=5)

Unnamed: 0,Player,Position,Week,Games Benched,Is Benched,Team,TeamWL,TeamWL_v_opp,Teammates,Opponent,OpponentWL,Opp Avg Stats,Opp Avg Stats v Team,Opp Players,At Home,Player Avg Stats,Player Avg v Opp,Team Avg Stats,Team Avg v Opp,Stat Outcome
0,DeAngelo Williams,RB,2,0,False,CAR,"(0, 1)","(0, 0)","[Captain Munnerlyn, Charles Godfrey, Charles J...",ATL,"(1, 0)","[229.0, 2.0, 0.0, 0.0, 68.0, 0.0, 0.0, 229.0, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[Brent Grimes, Brian Finneran, Brian Williams,...",False,"[0.0, 0.0, 0.0, 0.0, 37.0, 1.0, 0.0, 42.0, 4.0...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[138.0, 0.0, 0.0, 5.0, 82.0, 1.0, 0.0, 197.0, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[0.0, 0.0, 0.0, 0.0, 79.0, 1.0, 0.0, 32.0, 3.0..."
1,Jonathan Stewart,RB,2,0,False,CAR,"(0, 1)","(0, 0)","[Captain Munnerlyn, Charles Godfrey, Charles J...",ATL,"(1, 0)","[229.0, 2.0, 0.0, 0.0, 68.0, 0.0, 0.0, 229.0, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[Brent Grimes, Brian Finneran, Brian Williams,...",False,"[0.0, 0.0, 0.0, 0.0, 35.0, 0.0, 0.0, 32.0, 2.0...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[138.0, 0.0, 0.0, 5.0, 82.0, 1.0, 0.0, 197.0, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[0.0, 0.0, 0.0, 0.0, 65.0, 0.0, 0.0, 14.0, 3.0..."
2,Steve Smith,WR,2,0,False,CAR,"(0, 1)","(0, 0)","[Captain Munnerlyn, Charles Godfrey, Charles J...",ATL,"(1, 0)","[229.0, 2.0, 0.0, 0.0, 68.0, 0.0, 0.0, 229.0, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[Brent Grimes, Brian Finneran, Brian Williams,...",False,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 80.0, 6.0,...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[138.0, 0.0, 0.0, 5.0, 82.0, 1.0, 0.0, 197.0, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 134.0, 10...."
3,ATL_defense,DEF,2,0,False,ATL,"(1, 0)","(0, 0)","[Brent Grimes, Brian Finneran, Brian Williams,...",CAR,"(0, 1)","[138.0, 0.0, 0.0, 5.0, 82.0, 1.0, 0.0, 197.0, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[Captain Munnerlyn, Charles Godfrey, Charles J...",True,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[229.0, 2.0, 0.0, 0.0, 68.0, 0.0, 0.0, 229.0, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
4,CAR_defense,DEF,2,0,False,CAR,"(0, 1)","(0, 0)","[Captain Munnerlyn, Charles Godfrey, Charles J...",ATL,"(1, 0)","[229.0, 2.0, 0.0, 0.0, 68.0, 0.0, 0.0, 229.0, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[Brent Grimes, Brian Finneran, Brian Williams,...",False,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[138.0, 0.0, 0.0, 5.0, 82.0, 1.0, 0.0, 197.0, ...","[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."


In [None]:
print 10*len(gameStats)/3827, 'minutes per year of samples'