Teams have 9 active players and 7 bench players. They are broken down as follows:


| Position  | Number Active  |  Max on Team |
|:----------:|:-------------:| :-------------:|
| Quarterback (QB) |  1 | 4 |
| Running Back (RB) | 2  | 8 |
| Wide Receiver (WR) | 2 | 8 |
| Tight End (TE) | 1| 3 |
| Flex (RB/WR/TE) | 1 | N/A |
| Team Defense (D/ST) | 1 | 3 |
| Place Kicker (K) | 1 | 3 |


Here is how points are awarded to each player (not including kicker and defense scoring):

| Stat  | Points  | 
|:----------:|:-------------:|
| Passing Yards (PY) |	0.04 | 
| TD Pass (PTD)	     |  6    | 
| 2pt Passing Conversion (2PC) | 2 | 
| Interceptions Thrown (INT) |	-2	|
| Rushing Yards (RY)	| 0.1 | 
| TD Rush (RTD)	        | 6 | 
| 2pt Rushing Conversion (2PR) | 2	| 
| Receiving Yards (REY)	| 0.1 | 
| Each reception (REC)	| 1 | 
| TD Reception (RETD)	| 6	| 
| 2pt Receiving Conversion (2PRE) | 2 | 
| Kickoff Return TD (KRTD)	| 6 | 
| Punt Return TD (PRTD)	| 6 | 
| Fumble Recovered for TD (FTD)	| 6	|
| Total Fumbles Lost (FUML)	| -2 | 
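To make the table concrete, here is a minimal sketch of turning one stat line into fantasy points. The weights are copied from the table and keyed by the nfldb stat names used later in this notebook; `fantasy_points` and the example stat line are made up for illustration and are not part of nfldb.

```python
# Point values copied from the scoring table above, keyed by nfldb stat name.
SCORING = {
    'passing_yds': 0.04,
    'passing_tds': 6,
    'passing_twoptm': 2,
    'passing_int': -2,
    'rushing_yds': 0.1,
    'rushing_tds': 6,
    'rushing_twoptm': 2,
    'receiving_yds': 0.1,
    'receiving_rec': 1,
    'receiving_tds': 6,
    'receiving_twoptm': 2,
    'kickret_tds': 6,
    'puntret_tds': 6,
    'fumbles_rec_tds': 6,
    'fumbles_lost': -2,
}

def fantasy_points(stats, scoring=SCORING):
    """Sum stat * weight; stats missing from the dict count as zero."""
    return sum(weight * stats.get(name, 0) for name, weight in scoring.items())

# Made-up stat line, for illustration only
example_line = {'passing_yds': 300, 'passing_tds': 2, 'passing_int': 1, 'rushing_yds': 10}
print(fantasy_points(example_line))
```

Kicker and defense scoring would need their own weight tables, since they are not covered above.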


For a first attempt, let's get all of these stats for a particular player (Adrian Peterson in the query below), aggregated over the 2015 regular season. We'll follow http://blog.burntsushi.net/nfl-live-statistics-with-python/.

In [None]:
import nfldb
import pandas as pd
db = nfldb.connect()

In [None]:
#From https://github.com/BurntSushi/nfldb/wiki/Statistical-categories
q = nfldb.Query(db)
q.game(season_year=2015, season_type='Regular')
q.player(full_name='Adrian Peterson')
for stats in q.as_aggregate():
        print 'Player: ' , stats.player
        print ''
        print '---Scoring Information---'
        print 'Passing Yards: ' , stats.passing_yds
        print 'TD Passes: ' , stats.passing_tds
        print 'Passing Two-point Conversions: ' , stats.passing_twoptm
        print 'Interceptions Thrown: ' , stats.passing_int
        print 'Rushing Yards: ' , stats.rushing_yds
        print 'Rushing TDs: ' , stats.rushing_tds
        print 'Rushing Two-point Conversions: ' , stats.rushing_twoptm
        print 'Receiving yards: ' , stats.receiving_yds
        print 'Receptions: ' , stats.receiving_rec
        print 'TD Reception: ' , stats.receiving_tds
        print 'Receiving Two-point Conversions: ' , stats.receiving_twoptm
        print 'Kickoff return touchdowns: ' , stats.kickret_tds
        print 'Punt return touchdowns: ' , stats.puntret_tds
        print 'Fumble Return TD: ', stats.fumbles_rec_tds
        print 'Total Fumbles Lost: ' , stats.fumbles_lost
        print ''
        print '---Additional QB Information---'
        print 'Passing Attempts: ', stats.passing_att
        print 'Passing Completions: ', stats.passing_cmp
        print 'Passing Incompletions: ', stats.passing_incmp
        print 'Total yards of passing in air: ' , stats.passing_cmp_air_yds
        print 'Number of times sacked: ' , stats.passing_sk
        print 'Number of yards lost while sacked: ' , stats.passing_sk_yds
        print 'Two point conversion attempts: ' , stats.passing_twopta
        print ''
        print '---Additional WR Information---'
        print 'Number of targets: ' , stats.receiving_tar
        print 'Number of two point conversion attempts: ', stats.receiving_twopta
        print 'Additional yardage after catch: ' , stats.receiving_yac_yds
        print ''
        print '---Additional RB Information---'
        print 'Rushing attempts: ', stats.rushing_att
        print 'Number of rushing losses: ', stats.rushing_loss #NOT WORKING
        print 'Yards of rushing losses: ', stats.rushing_loss_yds #NOT WORKING
        print 'Rushing two point conversion attempts: ', stats.rushing_twopta
        print ''
        print ''

        
    
'''
Kicking stats:    
PAT attempts: kicking_xpa
PATs Made (PAT): kicking_xpmade
PATs Missed: kicking_xpmissed
FG attempts: kicking_fga    
Total FG Made: kicking_fgm
Total FG Missed: kicking_fgmissed
Each FG Yards: kicking_fgm_yds
Each Missed FG Yards: kicking_fgmissed_yds
All FG yards: kicking_all_yds
    
Defense/Special Teams stats:
Interception Return TD (INTTD): defense_int_tds
Blocked Punt or FG return for TD (BLKKRTD): defense_misc_tds
Each Sack (SK): defense_sk
Fumble Return TD (FRTD): defense_frec_tds
Blocked Punt, PAT or FG (BLKK): defense_puntblk, defense_fgblk, defense_xpblk
Each Interception (INT): defense_int
Each Fumble Recovered (FR): defense_frec
Each Safety (SF): defense_safe
yards allowed:
points allowed:
Kickoff Return TD: kickret_tds
Punt Return TD (PRTD): puntret_tds
    
number of fumbles forced: defense_ffum
Defense blocked a pass: defense_pass_def
yards gains after an interception: defense_int_yds
Number of tackles behind scrimmage line: defense_tkl_loss
Defense caused yards lost behind scrimmage line: defense_tkl_loss_yds
'''
    
    
#No stats on 2 point returns by defense, or 1 point safeties... doesn't matter since they're rare
#rushing_loss_yds and rushing_loss aren't working... bummer

We could try to predict fantasy points directly, or try to predict the underlying stats.  I think predicting stats has more value, since the result can be adapted to any fantasy scoring method people come up with.  It also reveals when a particular defense is strong against the run, the pass, or both, rather than just strong in general.  The downside is that individual stats may be harder to predict than scores.
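As a rough illustration of why predicted stats are more reusable than predicted points, the sketch below scores one hypothetical predicted stat line under two rule sets. The `score` helper, the predicted values, and the second league's rules are all assumptions made up for this example.

```python
def score(stats, weights):
    """Weighted sum of a stat line under one league's rules."""
    return sum(weights.get(name, 0) * value for name, value in stats.items())

# Hypothetical predicted stat line for a receiver (values are made up)
predicted = {'receiving_yds': 80.0, 'receiving_rec': 6.0, 'receiving_tds': 0.5}

# Receiving weights from the table above (one point per reception)
league_rules = {'receiving_yds': 0.1, 'receiving_rec': 1, 'receiving_tds': 6}
# A hypothetical league that awards nothing per reception
no_reception_rules = dict(league_rules, receiving_rec=0)

print(score(predicted, league_rules))
print(score(predicted, no_reception_rules))
```

The same prediction serves both leagues; only the weights change.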

Let's give predicting stats a shot and see how it works out.  As a first attempt, we should just try to predict something like rushing yards.  First, we need to produce a list of samples, including features and their outcome.  The samples will be stats for an individual player for an individual game.  We'll scan all games played one at a time to collect this information.  What should the features we use for our prediction be?


To start, we can try these features:

* DONE: Average relevant stats for the player (From the last X games prior to a game)
* DONE: Player's team
* DONE: Opponent team
* DONE: Team at home or away
* DONE: Player's average stat against this opponent
* DONE: Team's average stat against this opponent (To help account for players switching teams)
* DONE: Player's position (QBs don't rush much, for instance.  This is covered by average stat to begin with)
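The first feature in the list can be sketched as follows, in plain Python rather than the pandas used later. The point to notice is that the window only covers weeks strictly before the game being predicted, so the outcome never leaks into its own features. `average_before` and the sample numbers are hypothetical.

```python
def average_before(history, current_week, num_weeks):
    """Mean of a stat over the num_weeks weeks before current_week.

    history maps week number -> stat value for weeks the player appeared in.
    Returns None when there is nothing to average over.
    """
    window = [value for week, value in history.items()
              if current_week - num_weeks <= week < current_week]
    if not window:
        return None
    return sum(window) / float(len(window))

# Made-up rushing yards by week
rushing_yds = {1: 50, 2: 70, 3: 90, 4: 110}
print(average_before(rushing_yds, 4, 2))   # averages weeks 2 and 3 only
```

If fewer than `num_weeks` weeks of history exist, the window simply shrinks to whatever is available.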

Here are some more stats we might add in the future.  These may be harder to implement but are worth looking into:

* DONE: Player's injury status
* DONE: Player's bench status
* DONE: (no coach) Player's coach, other players on his team
* DONE: (no coach) Opponent's coach, and other players on opponent team

Let's loop through each game and produce our samples for the stats we mentioned in the first list. We'll need to write some functions to calculate the average of the player's stats.  **Keep in mind we need to account for unexpected injuries and bye weeks... probably just don't include them in the samples at first!**
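One simple way to leave out bye weeks and injuries, sketched here under the assumption that an all-zero stat line means the player did not play. The `played` helper and the sample data are made up for illustration.

```python
def played(stat_line):
    """True if any recorded stat is positive; an empty or all-zero line counts as not played."""
    return any(value is not None and value > 0 for value in stat_line.values())

# Made-up samples: (player, week, stats)
samples = [
    ('Player A', 1, {'rushing_yds': 85, 'rushing_tds': 1}),
    ('Player A', 2, {'rushing_yds': 0, 'rushing_tds': 0}),   # bye week or injury
    ('Player A', 3, {'rushing_yds': 40, 'rushing_tds': 0}),
]
usable = [s for s in samples if played(s[2])]
print([week for _, week, _ in usable])
```

This heuristic would also drop a player who appeared but recorded nothing positive (for example, only negative rushing yards), which seems acceptable for a first pass.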

In [2]:
import numpy as np
import nfldb
import pandas as pd
import collections
db = nfldb.connect()



In [3]:
def zeroStats():

    #List of relevant stats
    this_game_stats = collections.OrderedDict()
    this_game_stats['passing_yds']= 0
    this_game_stats['passing_tds']= 0
    this_game_stats['passing_twoptm']= 0
    this_game_stats['passing_int']= 0
    this_game_stats['rushing_yds']= 0
    this_game_stats['rushing_tds']= 0
    this_game_stats['rushing_twoptm']= 0
    this_game_stats['receiving_yds']= 0
    this_game_stats['receiving_rec']= 0
    this_game_stats['receiving_tds']= 0
    this_game_stats['receiving_twoptm']= 0
    this_game_stats['kickret_tds']= 0
    this_game_stats['puntret_tds']= 0
    this_game_stats['fumbles_rec_tds']= 0
    this_game_stats['fumbles_lost']= 0
    this_game_stats['passing_att']= 0
    this_game_stats['passing_cmp']= 0
    this_game_stats['passing_incmp']=0
    this_game_stats['passing_cmp_air_yds']= 0
    this_game_stats['passing_sk']= 0
    this_game_stats['passing_sk_yds']= 0
    this_game_stats['passing_twopta']= 0
    this_game_stats['receiving_tar']= 0
    this_game_stats['receiving_twopta']= 0
    this_game_stats['receiving_yac_yds']= 0
    this_game_stats['rushing_att']= 0
    this_game_stats['rushing_twopta']= 0
    this_game_stats['kicking_xpa']= 0
    this_game_stats['kicking_xpmade']= 0
    this_game_stats['kicking_xpmissed']= 0
    this_game_stats['kicking_fga']= 0
    this_game_stats['kicking_fgm']= 0
    this_game_stats['kicking_fgmissed']= 0
    this_game_stats['kicking_fgm_yds']= 0
    this_game_stats['kicking_fgmissed_yds']= 0
    this_game_stats['kicking_all_yds']= 0
    this_game_stats['defense_int_tds']= 0
    this_game_stats['defense_misc_tds']= 0
    this_game_stats['defense_sk']= 0
    this_game_stats['defense_frec_tds']= 0
    this_game_stats['defense_puntblk']= 0
    this_game_stats['defense_fgblk']= 0
    this_game_stats['defense_xpblk']= 0
    this_game_stats['defense_int']= 0
    this_game_stats['defense_frec']= 0
    this_game_stats['defense_safe']= 0
    this_game_stats['defense_ffum']= 0
    this_game_stats['defense_pass_def']= 0
    this_game_stats['defense_int_yds']= 0
    this_game_stats['defense_tkl_loss']= 0
    this_game_stats['defense_tkl_loss_yds']= 0


    #These will always be zero since they don't exist in nfldb
    this_game_stats['defense_kickret_tds']=0
    this_game_stats['defense_puntret_tds']=0
    this_game_stats['defense_rushing_yds_allowed']= 0
    this_game_stats['defense_passing_yds_allowed']=0 
    this_game_stats['defense_total_yds_allowed']=0
    this_game_stats['defense_rushing_tds_allowed']= 0
    this_game_stats['defense_passing_tds_allowed']=0       
    this_game_stats['defense_fga_allowed']=0   
    this_game_stats['defense_points_allowed']=0   

    
    return this_game_stats

In [4]:
'''This populates a dataframe with week-by-week stats on all players in all games'''

#The trailing defense_* columns (defense_kickret_tds onward) are not actual stats in nfldb... need to derive them somehow
gameStats = pd.DataFrame(columns=('Player','PlayerID','Position','Inferred Position','Week','Team','At Home','Opponent','Outcome',\
                                  'Team Players','Opponent Players','Player Benched',\
                                  'passing_yds','passing_tds','passing_twoptm',\
                                  'passing_int','rushing_yds','rushing_tds',\
                                  'rushing_twoptm','receiving_yds','receiving_rec',\
                                  'receiving_tds','receiving_twoptm','kickret_tds',\
                                  'puntret_tds','fumbles_rec_tds','fumbles_lost',\
                                  'passing_att','passing_cmp','passing_incmp','passing_cmp_air_yds',\
                                  'passing_sk','passing_sk_yds','passing_twopta',\
                                  'receiving_tar','receiving_twopta','receiving_yac_yds',\
                                  'rushing_att','rushing_twopta','kicking_xpa', \
                                  'kicking_xpmade','kicking_xpmissed','kicking_fga',\
                                  'kicking_fgm','kicking_fgmissed','kicking_fgm_yds',\
                                  'kicking_fgmissed_yds','kicking_all_yds',\
                                  'defense_int_tds','defense_misc_tds','defense_sk',\
                                  'defense_frec_tds','defense_puntblk','defense_fgblk',\
                                  'defense_xpblk','defense_int','defense_frec','defense_safe',\
                                  'defense_ffum','defense_pass_def','defense_int_yds',\
                                  'defense_tkl_loss','defense_tkl_loss_yds',\
                                  'defense_kickret_tds','defense_puntret_tds',\
                                  'defense_rushing_yds_allowed','defense_passing_yds_allowed',\
                                  'defense_total_yds_allowed','defense_rushing_tds_allowed',\
                                  'defense_passing_tds_allowed','defense_fga_allowed','defense_points_allowed'))


aggregate_stats=['defense_int_tds','defense_misc_tds','defense_sk','defense_frec_tds',\
               'defense_puntblk','defense_fgblk',\
               'defense_xpblk','defense_int','defense_frec','defense_safe',\
               'defense_ffum','defense_pass_def','defense_int_yds',\
               'defense_tkl_loss','defense_tkl_loss_yds']

current_row = 0
current_week = 1

#Connect to the database (only goes back to 2009)
for year in range(2009,2016):
    for week_num in range (1,18):

        cur_query= nfldb.Query(db)
        cur_query.game(season_year=year, season_type='Regular', week=week_num)
        for info in cur_query.as_games():  
            
            #Get the id of the current game
            game_id = info.gsis_id
            
            #Make a list of home and away players
            home_players=[]
            away_players=[]
            for player in range(len(info.players)):
                player_name = (str(info.players[player][1]).split(' (')[0])
                team = str(info.players[player][0])
                if (info.home_team == team):
                    home_players.append(player_name)
                else:
                    away_players.append(player_name)

            #For each player in the game, get the stats
            for player in range(len(info.players)):
                player_name = (str(info.players[player][1]).split(' (')[0])
                player_id = info.players[player][1].player_id
                position = (str(info.players[player][1]).split(' (')[1].split(', ')[1].split(')')[0])
                                
                #Try to guess unknown positions
                if position == "UNK":
                    cur_query=nfldb.Query(db)
                    cur_query.game(season_year=range(year-1,year+2), season_type='Regular')
                    all_pos_guess=[]
                    for guess_pos_info in cur_query.player(full_name=player_name,player_id=player_id).as_play_players():
                        all_pos_guess.append(str(guess_pos_info.guess_position))
                    if all_pos_guess:
                        guess_position = max(set(all_pos_guess), key=all_pos_guess.count)
                    else:
                        guess_position = "UNK"
                else:
                    guess_position = position

                team = str(info.players[player][0])
                team_is_home = (info.home_team == team)
                if team_is_home:
                    opponent = info.away_team
                    team_players = home_players
                    opponent_players = away_players
                else:
                    opponent = info.home_team
                    team_players = away_players
                    opponent_players = home_players

                if info.winner == team:
                    outcome = 'Win'
                else:
                    outcome = 'Loss'

                #List of relevant stats
                this_game_stats = zeroStats()
                #Default to benched in case the query below returns no rows
                suspected_bench = True
                
                #Query DB for more info.
                #Have to provide game_id to deal with players with the same name
                #Also provide player_id incase same named players are in the same game
                p_query= nfldb.Query(db)
                p_query.game(season_year=year, season_type='Regular', week=week_num, gsis_id=game_id)
                for p_info in p_query.player(full_name=player_name,player_id=player_id).as_aggregate():
                            
                    #If there is no info, they didn't play
                    if p_info:
                        for stat in this_game_stats:
                            if stat in dir(p_info):
                                this_game_stats[stat] = getattr(p_info, stat)
                        
                        #If all stats are zero, claim the player was benched or injured
                        if max([this_game_stats[key] for key in gameStats.columns[12:]]) > 0:
                            suspected_bench = False
                        else:
                            suspected_bench = True
                            
                        player_id=p_info.player_id    
                    else:
                        #If no stats available, claim the player was benched (also includes bye weeks)
                        for stat in this_game_stats:
                            this_game_stats[stat] = None 
                        suspected_bench = True
                        player_id=None

                output_list = [player_name, player_id, position, guess_position, current_week, team, team_is_home, opponent, outcome, \
                               team_players, opponent_players, suspected_bench] + \
                              [this_game_stats[key] for key in gameStats.columns[12:]]         
                gameStats.loc[current_row] = output_list                
                current_row+=1 
                 
            #After looping through all the players calculate defense team stats            
            home_stats = gameStats[(gameStats['Team'] == info.home_team) & (gameStats['Week'] == current_week)]
            away_stats = gameStats[(gameStats['Team'] == info.away_team) & (gameStats['Week'] == current_week)]
           
            #First for home team
            player_name = info.home_team + '_defense'
            position = 'DEF'
            team = info.home_team
            team_is_home = True
            opponent = info.away_team
            if info.winner == info.home_team:
                outcome="Win"
            else:
                outcome="Loss"
            team_players=home_players
            opponent_players=away_players
            suspected_bench=False
            
            this_game_stats = zeroStats()
            
            for key in aggregate_stats:
                this_game_stats[key]=sum(home_stats[key])
    
            this_game_stats['defense_rushing_yds_allowed']= sum(away_stats['rushing_yds'])
            this_game_stats['defense_passing_yds_allowed']=sum(away_stats['passing_yds'])
            this_game_stats['defense_total_yds_allowed']=sum(away_stats['passing_yds']) + sum(away_stats['rushing_yds'])         
            this_game_stats['defense_rushing_tds_allowed']= sum(away_stats['rushing_tds'])
            this_game_stats['defense_passing_tds_allowed']=sum(away_stats['passing_tds'])               
            this_game_stats['defense_points_allowed']=info.away_score
            this_game_stats['defense_kickret_tds']=sum(home_stats['kickret_tds'])
            this_game_stats['defense_puntret_tds']=sum(home_stats['puntret_tds'])
            this_game_stats['defense_fga_allowed']=sum(away_stats['kicking_fga'])
            
            #DEF doesn't need a player id
            player_id='DEF'
            
            output_list = [player_name, player_id, position, position, current_week, team, team_is_home, opponent, outcome, \
                               team_players, opponent_players, suspected_bench] + \
                              [this_game_stats[key] for key in gameStats.columns[12:]]         
            gameStats.loc[current_row] = output_list                
            current_row+=1         
            
            #Now for away team
            player_name = info.away_team + '_defense'
            position = 'DEF'
            team = info.away_team
            team_is_home = False
            opponent = info.home_team
            if info.winner == info.away_team:
                outcome="Win"
            else:
                outcome="Loss"
            team_players=away_players
            opponent_players=home_players
            suspected_bench=False
            
            this_game_stats = zeroStats()
            
            for key in aggregate_stats:
                this_game_stats[key]=sum(away_stats[key])
    
            this_game_stats['defense_rushing_yds_allowed']= sum(home_stats['rushing_yds'])
            this_game_stats['defense_passing_yds_allowed']=sum(home_stats['passing_yds'])
            this_game_stats['defense_total_yds_allowed']=sum(home_stats['passing_yds']) + sum(home_stats['rushing_yds'])         
            this_game_stats['defense_rushing_tds_allowed']= sum(home_stats['rushing_tds'])
            this_game_stats['defense_passing_tds_allowed']=sum(home_stats['passing_tds'])                        
            this_game_stats['defense_points_allowed']=info.home_score
            this_game_stats['defense_kickret_tds']=sum(away_stats['kickret_tds'])
            this_game_stats['defense_puntret_tds']=sum(away_stats['puntret_tds'])
            this_game_stats['defense_fga_allowed']=sum(home_stats['kicking_fga'])
            
            #DEF doesn't need a player id
            player_id='DEF'
            
            output_list = [player_name, player_id, position, position, current_week, team, team_is_home, opponent, outcome, \
                               team_players, opponent_players, suspected_bench] + \
                              [this_game_stats[key] for key in gameStats.columns[12:]]         
            gameStats.loc[current_row] = output_list                
            current_row+=1 
            
        current_week += 1
     
    #Save each time we finish a year
    gameStats.to_csv('gameStats_v2.csv')


# Extracting Features

In [None]:
import numpy as np
import nfldb
import pandas as pd
import collections
db = nfldb.connect()
gameStats = pd.read_csv('gameStats.csv')
gameStats.drop('Unnamed: 0', axis=1, inplace=True)

aggregate_stats=['defense_int_tds','defense_misc_tds','defense_sk','defense_frec_tds',\
               'defense_puntblk','defense_fgblk',\
               'defense_xpblk','defense_int','defense_frec','defense_safe',\
               'defense_ffum','defense_pass_def','defense_int_yds',\
               'defense_tkl_loss','defense_tkl_loss_yds']

In [None]:
def FindPlayerAverage(num_weeks,player,current_week,num_team_games,opp=None):
    '''Averages a player's stats over the past num_weeks
    If we don't have num_weeks of data, it goes back as far as possible
    
    Inputs:
    num_weeks -> number of weeks to average over
    player    -> player whose stats we want to average
    current_week -> The corresponding week in gameStats['Week'] for the sample
    num_team_games -> Number of games the player's team played in past num_weeks (obtained from FindTeamAverage)
    opp       -> optional.  Specify opponent to average over games versus that opponent
    '''
    #Window is the num_weeks weeks before current_week (current_week itself is excluded below)
    start_week = current_week - num_weeks
    playerStats = gameStats[(gameStats['Player']== player) & (gameStats['Week']>= start_week) & (gameStats['Week']< current_week)].reset_index(drop=True)

    #Find number of games benched - IF A PLAYER IS TRADED, IT WILL ASSUME THEY WERE BENCHED
    #THIS IS TRICKY TO DEAL WITH... JUST DON'T USE IT AS A FEATURE FOR NOW
    num_benched=num_team_games - len(playerStats[playerStats['Player Benched']==False])

    #Remove games where the player didn't play
    playerStats=playerStats[playerStats['Player Benched']==False]

    #Only use games against the opponent
    if opp:
        playerStats = playerStats[playerStats['Opponent']==opp]

    #Average the remaining stats
    playerAverage=[np.mean(playerStats[key]) for key in playerStats.columns[10:]]
    playerStd=[np.std(playerStats[key]) for key in playerStats.columns[10:]]
    num_games_in_avg = len(playerStats)
    
    return (playerAverage,playerStd,num_games_in_avg, num_benched)


In [None]:
def CountWL(dframe):
    '''Given a dataframe of gameStats, counts number of wins and losses'''
    #Counting wins and losses
    teamW=0
    teamL=0
    for week in set(dframe['Week']):
        #Count wins and losses
        if dframe[dframe['Week']==week]['Outcome'].iloc[0]=='Win':
            teamW+=1
        else:
            teamL+=1  
            
    return (teamW,teamL)

def AvgFromWeekly(dframe):
    '''Given a dataframe of gameStats, calculates average stats
       Note: The team defense row already holds the sum of the players' defense
       stats, so we divide those stats by 2 to avoid double counting
    '''
    weeklyStats=dframe.ix[:,10:]
    weeklyStats['Week']=dframe['Week']
    weeklyStats=weeklyStats.groupby(['Week']).sum()
    for key in aggregate_stats:
        weeklyStats[key]=weeklyStats[key]/2    
    
    #Average the remaining stats
    WeeklyAvg=[np.mean(weeklyStats[key]) for key in weeklyStats.columns]
    WeeklyStd=[np.std(weeklyStats[key]) for key in weeklyStats.columns]
    num_games=len(weeklyStats) 
    
    return (WeeklyAvg,WeeklyStd,num_games)
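The divide-by-2 step can be seen in a small plain-Python sketch. It assumes, as in the table built earlier, that each team has a `<TEAM>_defense` row whose aggregate stats equal the sum of the individual players' rows. The helper name and the rows are made up.

```python
def weekly_team_total(rows, stat, is_aggregate):
    """Sum a stat over all of a team's rows for one week.

    If is_aggregate is True, the rows are assumed to include a team-defense row
    that already equals the sum of the player rows, so the total is halved.
    """
    total = sum(row[stat] for row in rows)
    return total / 2.0 if is_aggregate else total

# Made-up rows for one team in one week
rows = [
    {'Player': 'Player A', 'defense_sk': 1, 'rushing_yds': 0},
    {'Player': 'Player B', 'defense_sk': 2, 'rushing_yds': 0},
    {'Player': 'XYZ_defense', 'defense_sk': 3, 'rushing_yds': 0},  # sum of the two above
]
print(weekly_team_total(rows, 'defense_sk', is_aggregate=True))
```

This only works if each aggregate stat is halved exactly once, so the list of aggregate stat names should not contain duplicates.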

In [None]:
def FindTeamAverage(num_weeks,team,current_week,opp=None):
    '''Averages a team's stats over the past num_weeks
    If we don't have num_weeks of data, it goes back as far as possible
    
    Inputs:
    num_weeks -> number of weeks to average over
    team      -> team whose stats we want to average
    current_week -> The corresponding week in gameStats['Week'] for the sample    
    opp       -> optional.  Specify opponent to average over games versus that opponent
    '''
    
    #Window is the num_weeks weeks before current_week (current_week itself is excluded below)
    start_week = current_week - num_weeks
    
    teamStats = gameStats[(gameStats['Team']== team) & (gameStats['Week']>= start_week) & (gameStats['Week']< current_week)].reset_index(drop=True)
   
    #only use games against opp
    if opp:
        teamStats = teamStats[teamStats['Opponent']==opp]
    
    teamAverage, teamStd, num_games = AvgFromWeekly(teamStats)
    teamWL = CountWL(teamStats)

    return (num_games,teamAverage,teamStd, teamWL)

Now we need to find the team's average stats against the opposing team, and the player's average stats against the opposing team.  Then, we'll create a list of samples and features.
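The head-to-head average is just the usual average restricted to games against one opponent. Here is a minimal plain-Python sketch; `average_vs_opponent` and the game log are hypothetical. It returns the number of games alongside the mean, since an average over one or two meetings is much less reliable than one over many.

```python
def average_vs_opponent(games, opponent, stat):
    """Mean and count of a stat over games against one opponent.

    Returns (None, 0) when the two have not met in the supplied games.
    """
    values = [g[stat] for g in games if g['Opponent'] == opponent]
    if not values:
        return None, 0
    return sum(values) / float(len(values)), len(values)

# Made-up game log
games = [
    {'Week': 1, 'Opponent': 'GB', 'rushing_yds': 120},
    {'Week': 2, 'Opponent': 'CHI', 'rushing_yds': 60},
    {'Week': 9, 'Opponent': 'GB', 'rushing_yds': 80},
]
print(average_vs_opponent(games, 'GB', 'rushing_yds'))
```

The same function applies to a team's game log as well as a player's.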

In [None]:
#Making trainingset with these features

#Note: player and team stats are NaN if there is no history to average over


trainingSamples=pd.DataFrame(columns=('Player','Position','Inferred Position','Week','Games Benched',\
                                      'Is Benched','Team','TeamWL','TeamWL_v_opp',\
                                      'Teammates','Opponent','OpponentWL',\
                                      'Opp Avg Stats','Opp Stat Std','Num Opp Games',\
                                      'Opp Avg Stats v Team','Opp Stat Std v Team','Num Opp Games v Team',\
                                      'Opp Players','At Home',\
                                      'Player Avg Stats','Player Stat Std','Num Player Games',\
                                      'Player Avg Stats v Opp','Player Stat Std v Opp','Num Player Games v Opp',\
                                      'Team Avg Stats','Team Stat Std','Num Team Games',\
                                      'Team Avg v Opp','Team Stat Std v Opp','Num Team Games v Opp',\
                                      'Stat Outcome'))

num_samps=len(gameStats)
current_row=0
for row in range(0,num_samps):
    position = gameStats['Position'].ix[row]
    is_benched = gameStats['Player Benched'].ix[row]
    week = gameStats['Week'].ix[row]
    #Weeks are numbered consecutively across seasons, 17 per season starting in 2009
    year=2009+(int(week)-1)//17
    player = gameStats['Player'].ix[row]
    
    if position == "UNK":
        cur_query= nfldb.Query(db)
        cur_query.game(season_year=range(year-1,year+2), season_type='Regular')
        all_pos_guess=[]
        for info in cur_query.player(full_name=player).as_play_players():
            all_pos_guess.append(str(info.guess_position))
        
        if all_pos_guess:
            guess_position = max(set(all_pos_guess), key=all_pos_guess.count)
        else:
            guess_position = "UNK"
    else:
        guess_position = position
    
    #We only care about these positions
    if (guess_position in ['DEF','K','QB','RB','WR','TE']) and not is_benched:
        team = gameStats['Team'].ix[row]
        opp = gameStats['Opponent'].ix[row]
        teammates = gameStats['Team Players'].ix[row]
        opp_players = gameStats['Opponent Players'].ix[row] 
        at_home = gameStats['At Home'].ix[row]
        outcome_stats = list(gameStats.ix[row,10:])
        
        num_weeks=25 #Average over last 1.5 seasons
        
        #Find general average
        num_team_games, team_stats, team_stats_std, team_WL = FindTeamAverage(num_weeks,team,week)
        player_stats, player_stats_std, num_player_games, num_benched = FindPlayerAverage(num_weeks,player,week,num_team_games)
        num_opp_games, opp_stats, opp_stats_std, opp_WL = FindTeamAverage(num_weeks,opp,week)
        
        #Find average v team
        num_week_v_opp=51 #average over last 3 seasons
        num_team_games_v_opp, team_stats_v_opp, team_stats_std_v_opp, team_WL_v_opp = FindTeamAverage(num_week_v_opp,team,week,opp)
        player_stats_v_opp, player_stats_std_v_opp, num_player_games_v_opp, num_benched_v_opp = FindPlayerAverage(num_week_v_opp,player,week,num_team_games_v_opp,opp)
        num_opp_games_v_team, opp_stats_v_team, opp_stats_std_v_opp, opp_WL_v_team = FindTeamAverage(num_week_v_opp,opp,week,team)

        trainingSamples.loc[current_row]=[player, position, guess_position, week, num_benched,\
                                          is_benched, team, team_WL, team_WL_v_opp,\
                                          teammates, opp, opp_WL,\
                                          opp_stats, opp_stats_std, num_opp_games,\
                                          opp_stats_v_team,opp_stats_std_v_opp,num_opp_games_v_team,\
                                          opp_players, at_home,\
                                          player_stats,player_stats_std, num_player_games,\
                                          player_stats_v_opp, player_stats_std_v_opp, num_player_games_v_opp,\
                                          team_stats,team_stats_std, num_team_games,\
                                          team_stats_v_opp, team_stats_std_v_opp, num_team_games_v_opp,\
                                          outcome_stats]


        current_row += 1

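    #Checkpoint every 1000 samples
    if (current_row % 1000) == 0:
        trainingSamples.to_csv('TrainingSamples.csv')

#Save the complete set once the loop finishes
trainingSamples.to_csv('TrainingSamples.csv')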

In [None]:
#Need to save the meaning of each player_stat and team_stat entry
import pickle
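#Stat columns start right after 'Player Benched', the last metadata column
first_stat = gameStats.columns.get_loc('Player Benched') + 1
stat_order = list(gameStats.columns[first_stat:])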

with open("stat_order.pickle",'wb') as f:
    pickle.dump(stat_order,f)
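Later on, the saved order can be used to put names back on the averaged stat lists. The sketch below uses an in-memory buffer in place of `stat_order.pickle` so that it runs on its own; the three stat names and the values are stand-ins, not real output.

```python
import io
import pickle

# Stand-in for the list saved above; the real one comes from gameStats.columns
stat_order = ['passing_yds', 'passing_tds', 'rushing_yds']

# An in-memory buffer replaces the file on disk so the sketch is self-contained
buf = io.BytesIO()
pickle.dump(stat_order, buf)
buf.seek(0)
loaded_order = pickle.load(buf)

# Pair each name with the matching entry of a made-up averaged stat list
player_avg_stats = [250.0, 1.5, 12.0]
named = dict(zip(loaded_order, player_avg_stats))
print(named['rushing_yds'])
```

This relies on the stat lists having been built from the same column slice as `stat_order`, so the positions line up.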