# NFL Survivor Pool Model

**By: Calvin Walters**

## Rules

Each week you will pick an NFL team to win its game, and you cannot pick the same team twice during the season. Weeks 5, 7, 10, 12, and 15 require two picks each, so over the 18-week season you will pick 23 of the 32 NFL teams. If your team's game ends in a tie, that counts as a win. Each player gets three strikes; once you reach that limit you are eliminated.
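The 23-pick total comes from 18 regular-season weeks plus one extra pick in each of the five double weeks. The quick check below assumes the 18-week 2021 schedule. `picks_per_week` is a hypothetical helper and is not part of the notebook.

```python
# Hypothetical helper: number of Survivor picks required in each week of an
# 18-week season, with two picks in each listed double week.
DOUBLE_WEEKS = (5, 7, 10, 12, 15)

def picks_per_week(n_weeks=18, double_weeks=DOUBLE_WEEKS):
    """Map each week number to how many teams must be picked that week."""
    return {week: 2 if week in double_weeks else 1 for week in range(1, n_weeks + 1)}

slots = picks_per_week()
total_picks = sum(slots.values())
print(total_picks)
```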

## Purpose

A rudimentary strategy for NFL Survivor is to play week by week, selecting whichever remaining team has the highest win probability in the current week. This strategy is prone to failure, though, because it ignores the rest of the schedule. My model looks ahead through the NFL schedule and identifies the highest-quality picks for the entire season, not just the current week. Selecting 23 of the 32 NFL teams to win a game over the course of the competition is a challenging task, because there are not 23 reliable teams on a week-by-week basis. My model addresses this by optimizing a contestant's pick schedule so that the less reliable teams are picked when they face even weaker opponents.

## Model Methodology

In a perfect world, we could simply choose the team (or two teams) with the greatest win probability each week. This method is not feasible because of the no-duplicate-team constraint. Strong teams with a high probability to win one week are likely to have a high probability to win in other weeks too. As the season progresses and the pool of available teams dwindles, the chance that the team with the greatest win probability for a given week is still available decreases. This model aims to maximize the compound win probability of all future picks by looking ahead for favorable matchups and identifying comparative advantages.

In a given week, the model may pass over the team with the greatest win probability in favor of a team with a lower win probability. It does this when the two teams have a comparative advantage relating to a future week, meaning that trading their weeks raises the product of the two picks' win probabilities. The resulting combination of picks then has a greater compound win probability than always choosing the team with the greatest win probability in the current week.


**Example:**

Week 1...  
Team1 Win Probability = 90%  
Team2 Win Probability = 75%  

Week 2...  
Team1 Win Probability = 85%  
Team2 Win Probability = 55%  

Even though Team1 has the greater win probability in Week 1, it is more efficient to pick Team2 that week. Team2 loses far more win probability from Week 1 to Week 2 (75% to 55%) than Team1 does (90% to 85%), so Team2 has a comparative advantage in Week 1.  
Selecting Team1 followed by Team2 gives a compound probability of 0.90 × 0.55 = 49.5%. Selecting Team2 followed by Team1 gives 0.75 × 0.85 = 63.75%.
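You can check the example by brute force. Enumerate every way of assigning the two teams to the two weeks, then multiply the win probabilities for each assignment. This is only a sketch for tiny cases. The number of orderings grows factorially, so this is not how the model below works.

```python
from itertools import permutations

# Win probabilities from the example above: probs[team][week]
probs = {
    "Team1": {1: 0.90, 2: 0.85},
    "Team2": {1: 0.75, 2: 0.55},
}
weeks = [1, 2]

def compound_probability(order):
    """Probability that every pick wins when order[k] is picked in weeks[k]."""
    result = 1.0
    for team, week in zip(order, weeks):
        result *= probs[team][week]
    return result

for order in permutations(probs):
    print(order, round(compound_probability(order), 4))

best = max(permutations(probs), key=compound_probability)
print("best:", best)
```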



## Import Packages

In [1905]:
import pandas as pd
import numpy as np

## Import Data from FiveThirtyEight
The data comes from FiveThirtyEight's quarterback-adjusted Elo forecast. (https://projects.fivethirtyeight.com/2021-nfl-predictions/games/)

In [1906]:
raw = pd.read_csv('nfl_elo_latest.csv')

## Clean Data

In [1907]:
# keep relevant columns
full = raw[['date', 'team1', 'team2', 'qbelo_prob1', 'qbelo_prob2']].copy()

In [1908]:
# convert date column to datetime format
full['date'] = pd.to_datetime(full['date'])



In [1909]:
# create Week variable to track game week of NFL season
# each week opens on a Thursday: a game played on or after week_starts[k]
# belongs to Week k + 2, and any game before the first boundary is Week 1
# (the last boundary must fall before Week 18's Saturday games on 2022-01-08)
week_starts = pd.to_datetime([
    '2021-09-16', '2021-09-23', '2021-09-30', '2021-10-07', '2021-10-14',
    '2021-10-21', '2021-10-28', '2021-11-04', '2021-11-11', '2021-11-18',
    '2021-11-25', '2021-12-02', '2021-12-09', '2021-12-16', '2021-12-23',
    '2021-12-30', '2022-01-06',
])

# count the week boundaries on or before each game date
full['Week'] = week_starts.searchsorted(full['date'], side = 'right') + 1



In [1910]:
# drop date column in favor of Week column
full = full.drop(columns = ['date'])
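The Week column above is a lookup of each game date against a sorted list of week-start boundaries. The sketch below does the same lookup with the standard library, assuming each week opens on a Thursday. `WEEK_STARTS` and `nfl_week` are illustrative names and are not part of the notebook. Week 18 of the 2021 season opened with two Saturday games on January 8, 2022, so the final boundary has to fall before that date.

```python
from bisect import bisect_right
from datetime import date

# Assumed start dates of Weeks 2-18 (each a Thursday) for the 2021 season
WEEK_STARTS = [
    date(2021, 9, 16), date(2021, 9, 23), date(2021, 9, 30),
    date(2021, 10, 7), date(2021, 10, 14), date(2021, 10, 21), date(2021, 10, 28),
    date(2021, 11, 4), date(2021, 11, 11), date(2021, 11, 18), date(2021, 11, 25),
    date(2021, 12, 2), date(2021, 12, 9), date(2021, 12, 16), date(2021, 12, 23),
    date(2021, 12, 30), date(2022, 1, 6),
]

def nfl_week(game_date):
    """Return the regular-season week (1-18) that a game date falls in."""
    # bisect_right counts the boundaries on or before game_date
    return bisect_right(WEEK_STARTS, game_date) + 1

print(nfl_week(date(2021, 9, 9)))   # season opener
print(nfl_week(date(2021, 9, 16)))  # a Thursday boundary starts a new week
print(nfl_week(date(2022, 1, 8)))   # Saturday game at the start of Week 18
```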

In [1911]:
# create columns for projected Winner (favorite) and Loser (underdog) of each game;
# an exact 50/50 game goes to team1 so that every game still gets a Winner
team1_favored = full['qbelo_prob1'] >= full['qbelo_prob2']

full['Winner'] = np.where(team1_favored, full['team1'], full['team2'])
full['Loser'] = np.where(team1_favored, full['team2'], full['team1'])
full['winner_wprob'] = np.where(team1_favored, full['qbelo_prob1'], full['qbelo_prob2'])
full['loser_wprob'] = np.where(team1_favored, full['qbelo_prob2'], full['qbelo_prob1'])
full = full.drop(columns = ['team1', 'team2', 'qbelo_prob1', 'qbelo_prob2'])

# sort by win_prob
full = full.sort_values(by = ['winner_wprob'], ascending = False).reset_index(drop = True)

# rank win_prob values across the league (1 = highest winner_wprob)
full['League Rank'] = np.arange(1, len(full) + 1)

# create new df without projected losing team's win probability
df = full.drop(columns = ['loser_wprob'])



In [1912]:
# create df with row for each team's win probability for every game

winners_df = full.drop(columns = ['Loser', 'League Rank', 'loser_wprob'])
losers_df = full.drop(columns = ['Winner', 'League Rank', 'winner_wprob'])

win_probs = pd.concat([winners_df, losers_df.rename(columns={'Loser':'Winner', 'loser_wprob':'winner_wprob'})], ignore_index = True).sort_values(by = ['winner_wprob'], ascending = False).reset_index(drop = True)

## NFL Win Probability Schedule

- NaN values are BYE weeks

In [1913]:
teams = win_probs.pivot(index = 'Week', columns = 'Winner', values = 'winner_wprob')
teams

| Week | ARI | ATL | BAL | BUF | CAR | CHI | CIN | CLE | DAL | DEN | ... | NYG | NYJ | OAK | PHI | PIT | SEA | SF | TB | TEN | WSH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.342829 | 0.635255 | 0.601639 | 0.716585 | 0.649769 | 0.265996 | 0.467884 | 0.316648 | 0.26391 | 0.500131 | ... | 0.499869 | 0.350231 | 0.398361 | 0.364745 | 0.283415 | 0.477755 | 0.656573 | 0.73609 | 0.657171 | 0.579076 |
| 2 | 0.555008 | 0.194827 | 0.513059 | 0.558531 | 0.444393 | 0.614756 | 0.385244 | 0.894027 | 0.417974 | 0.592035 | ... | 0.324663 | 0.416195 | 0.338537 | 0.422411 | 0.661463 | 0.605687 | 0.577589 | 0.805173 | 0.394313 | 0.675337 |
| 3 | 0.568313 | 0.446112 | 0.724441 | 0.713431 | 0.674341 | 0.277051 | 0.313294 | 0.722949 | 0.716728 | 0.733134 | ... | 0.553888 | 0.266866 | 0.507327 | 0.283272 | 0.686706 | 0.473493 | 0.506591 | 0.527483 | 0.628506 | 0.286569 |
| 4 | 0.31136 | 0.508863 | 0.538837 | 0.918735 | 0.304245 | 0.696082 | 0.652552 | 0.497538 | 0.695755 | 0.461163 | ... | 0.292012 | 0.369723 | 0.331833 | 0.286361 | 0.299566 | 0.430041 | 0.569959 | 0.624768 | 0.630277 | 0.491137 |
| 5 | 0.503539 | 0.60334 | 0.695907 | 0.381268 | 0.603386 | 0.41094 | 0.341329 | 0.471356 | 0.695183 | 0.404777 | ... | 0.304817 | 0.39666 | 0.58906 | 0.396614 | 0.595223 | 0.559114 | 0.496461 | 0.752099 | 0.646938 | 0.551868 |
| 6 | 0.309278 | NaN | 0.683428 | 0.525902 | 0.483821 | 0.373374 | 0.509905 | 0.690722 | 0.441921 | 0.645887 | ... | 0.410743 | NaN | 0.354113 | 0.27333 | 0.53152 | 0.46848 | NaN | 0.72667 | 0.474098 | 0.401714 |
| 7 | 0.843417 | 0.379785 | 0.77851 | NaN | 0.415385 | 0.186333 | 0.22149 | 0.668551 | NaN | 0.331449 | ... | 0.584615 | 0.302469 | 0.635633 | 0.364367 | NaN | 0.577292 | 0.67377 | 0.813667 | 0.436564 | 0.30241 |
| 8 | 0.425914 | 0.611138 | NaN | 0.74384 | 0.388862 | 0.463377 | 0.475836 | 0.620933 | 0.43762 | 0.573096 | ... | 0.183981 | 0.524164 | NaN | 0.499611 | 0.379067 | 0.773848 | 0.536623 | 0.584316 | 0.470298 | 0.426904 |
| 9 | 0.343341 | 0.320598 | 0.721835 | 0.737241 | 0.490736 | 0.344505 | 0.3873 | 0.6127 | 0.611365 | 0.388635 | ... | 0.534906 | 0.269847 | 0.465094 | 0.451205 | 0.655495 | NaN | 0.656659 | NaN | 0.371404 | NaN |
| 10 | 0.653742 | 0.333572 | 0.525405 | 0.726936 | 0.346258 | NaN | NaN | 0.506339 | 0.666428 | 0.692752 | ... | NaN | 0.273064 | 0.327241 | 0.307248 | 0.731323 | 0.379327 | 0.543045 | 0.613821 | 0.586828 | 0.386179 |
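Stacking favorites and underdogs and then pivoting amounts to recording every team's probability for every game it plays. A week with no entry is a bye. The plain-Python sketch below shows the same idea using made-up teams and probabilities. `team_week_table` and `lookup` are illustrative names and are not part of the notebook.

```python
import math

# Made-up games: (week, team1, team2, prob1, prob2)
games = [
    (1, "A", "B", 0.70, 0.30),
    (1, "C", "D", 0.55, 0.45),
    (2, "A", "C", 0.60, 0.40),  # B and D are on a bye in Week 2
]

def team_week_table(games):
    """Build {team: {week: win_prob}}, one entry per team per game played."""
    table = {}
    for week, team1, team2, prob1, prob2 in games:
        # each game contributes an entry for both participants
        table.setdefault(team1, {})[week] = prob1
        table.setdefault(team2, {})[week] = prob2
    return table

def lookup(table, team, week):
    """Win probability for team in week, or NaN if the team has no game (bye)."""
    return table[team].get(week, math.nan)

table = team_week_table(games)
print(lookup(table, "A", 2))
print(lookup(table, "B", 2))
```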


## Win Probability Table for Each Game of NFL Season
- sorted by Win Probability

In [1914]:
# rank each game among the same team's favored games (1 = that team's best game)
df['Team Rank'] = df.groupby(by = 'Winner')['winner_wprob'].rank("dense", ascending = False).astype('int')

# rank each game among the games played in the same week (1 = that week's best game)
df['Week Rank'] = df.groupby(by = 'Week')['winner_wprob'].rank("dense", ascending = False).astype('int')

df = df[['Week', 'Winner', 'Loser', 'winner_wprob', 'Team Rank', 'Week Rank', 'League Rank']]
df

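| | Week | Winner | Loser | winner_wprob | Team Rank | Week Rank | League Rank |
|---|---|---|---|---|---|---|---|
| 0 | 4 | BUF | HOU | 0.918735 | 1 | 1 | 1 |
| 1 | 2 | CLE | HOU | 0.894027 | 1 | 1 | 2 |
| 2 | 17 | SF | HOU | 0.882383 | 1 | 1 | 3 |
| 3 | 9 | MIA | HOU | 0.862837 | 1 | 1 | 4 |
| 4 | 11 | TEN | HOU | 0.862051 | 1 | 1 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 267 | 4 | MIN | CLE | 0.502462 | 9 | 16 | 268 |
| 268 | 16 | MIN | LAR | 0.502009 | 10 | 16 | 269 |
| 269 | 8 | DET | PHI | 0.500389 | 1 | 15 | 270 |
| 270 | 17 | PIT | CLE | 0.500265 | 9 | 16 | 271 |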


## Highest Win Probability for Each Team

- Note: The Houston Texans are not favored in any game this season.  Their highest win probability for a single game is 40.74%.

In [1915]:
win_probs.groupby(by = 'Winner').max().sort_values(by = ['winner_wprob'], ascending = False).reset_index()[['Winner', 'winner_wprob']]

| | Winner | winner_wprob |
|---|---|---|
| 0 | BUF | 0.918735 |
| 1 | CLE | 0.894027 |
| 2 | SF | 0.882383 |
| 3 | MIA | 0.862837 |
| 4 | TEN | 0.862051 |
| 5 | IND | 0.855389 |
| 6 | ARI | 0.843417 |
| 7 | GB | 0.84139 |
| 8 | TB | 0.825118 |
| 9 | LAR | 0.821587 |


# Survivor Model

This model generates Survivor picks by identifying the team with the single greatest projected win probability for a future game and scheduling that pick. It then removes that team, and that week once its one or two slots are filled, from further consideration. The process repeats with the next-highest win probability among the remaining games until all Survivor slots are filled.

The model can be re-calibrated as the season progresses by filling in your previous picks and expired game weeks.

Week 4 Example:
- expired_weeks = [1, 2, 3]
- expired_teams = ['CAR', 'CLE', 'DEN']
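The greedy pass described above can be sketched on made-up data. The sketch assumes games are given as `(win_prob, week, team)` tuples and that `slots` maps each week to its number of remaining picks. `greedy_schedule` is a hypothetical name. Its `used_teams` argument, together with trimming `slots`, plays the role of `expired_teams` and `expired_weeks`. In this toy case the greedy pass locks team A into Week 1 and ends at about 0.32. Moving A to Week 2 and B to Week 1 would give 0.75 × 0.85 × 0.65 ≈ 0.41, which is the kind of gap the comparative-advantage step is meant to close.

```python
import math

def greedy_schedule(games, slots, used_teams=()):
    """
    Greedy Survivor scheduling sketch.

    games: (win_prob, week, team) for every game a team could be picked in
    slots: {week: number of picks still to make that week}
    used_teams: teams already picked (plays the role of expired_teams)
    """
    remaining = dict(slots)
    used = set(used_teams)
    chosen = []
    # visit games from the highest win probability downward
    for prob, week, team in sorted(games, key=lambda g: g[0], reverse=True):
        # skip teams already used and weeks whose slots are full
        if team in used or remaining.get(week, 0) == 0:
            continue
        chosen.append((week, team, prob))
        used.add(team)
        remaining[week] -= 1
    return sorted(chosen)

# Made-up example: Week 2 is a double week, so it has two slots
games = [
    (0.90, 1, "A"), (0.85, 2, "A"),
    (0.75, 1, "B"), (0.55, 2, "B"),
    (0.60, 1, "C"), (0.65, 2, "C"),
]
toy_picks = greedy_schedule(games, {1: 1, 2: 2})
print(toy_picks)
print(math.prod(p for _, _, p in toy_picks))
```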

**Input expired weeks and already chosen teams here:**

In [1916]:
expired_weeks = []
expired_teams = []

**Input double pick weeks here:**

In [1917]:
double_weeks = (5, 7, 10, 12, 15)

### Schedule Picks from Sorted Win Probabilities

In [1918]:
picks = []

# work on copies so re-running this cell does not alter the inputs above
taken_weeks = list(expired_weeks)
taken_teams = list(expired_teams)

for i, row in df.iterrows():

    # skip Week 1 Thursday night game (already expired)
    if (row['Week'] == 1) and (row['Winner'] == 'TB'):
        continue

    # skip teams that have already been picked
    if row['Winner'] in taken_teams:
        continue

    # double weeks (5, 7, 10, 12, 15) get two picks; all other weeks get one
    slots = 2 if row['Week'] in double_weeks else 1
    if taken_weeks.count(row['Week']) < slots:
        picks.append(row.to_dict())
        taken_weeks.append(row['Week'])
        taken_teams.append(row['Winner'])

    # end loop at 23 picks
    if len(picks) == 23:
        break

picks = pd.DataFrame(picks).sort_values(by = ['Week']).reset_index(drop = True)

### Identify Comparative Advantages

In [1919]:
for i, pick in picks.iterrows():
    
    # only revisit picks that are not the team's best game or not the week's best game
    if (picks['Team Rank'][i] > 1) or (picks['Week Rank'][i] > 1):
        
        curr_team = picks['Winner'][i]
        curr_week = picks['Week'][i]
        curr_wprob = picks['winner_wprob'][i]
        curr_wprob_lst = teams[[curr_team]].sort_values(by = curr_team, ascending = False).reset_index()
        
        best_product = 0
        # loop through current team's descending win probabilities down to currently selected win probability
        for index, row in curr_wprob_lst.iterrows():
            
            replacement_prob = row[curr_team]
            
            if replacement_prob <= curr_wprob:
                break
            
            # at each week's iteration, calculate product of current team's win probability
            # and win probability of team scheduled to be picked that week

            # find week of potential replacement (the week of the row being examined)
            replacement_week = int(row['Week'])
            
            # loop thru rows of scheduled pick(s) at potential replacement week
            for index2, game in picks[picks['Week'] == replacement_week].iterrows():
                
                # find currently scheduled product
                curr_product = curr_wprob * game['winner_wprob']
                
                # find potential replacement product
                replacement_product = replacement_prob * teams[game['Winner']][curr_week]
            
                if (replacement_product > curr_product) & (replacement_product > best_product):
                    best_product = replacement_product
                    switch_week = replacement_week
                    switch_team = game['Winner']
                    replacement_game = df[(df['Week'] == replacement_week) & (df['Winner'] == curr_team)].iloc[0]
                    filler_game = df[(df['Week'] == curr_week) & (df['Winner'] == switch_team)].iloc[0]
        
        # if greatest product is greater than currently scheduled product, 
        # switch the picks between the two teams on the schedule
        if best_product > 0:
            picks.iloc[i] = filler_game
            index = picks.loc[(picks['Week'] == switch_week) & (picks['Winner'] == switch_team)].index[0]
            picks.iloc[index] = replacement_game
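The swap test above compares two products. One is the pair as scheduled, p(team_i, week_i) · p(team_j, week_j). The other is the same pair with the weeks exchanged. Every other pick's probability stays the same, so a swap raises the compound probability exactly when the exchanged product is larger.

The minimal sketch below implements this test. It differs from the notebook cell in two ways. It scans every pair of picks instead of only picks that are not a team's or week's top game. It also keeps the swap with the largest ratio rather than the largest replacement product. `best_swap` is a hypothetical helper.

```python
def best_swap(schedule, probs):
    """
    Find the pairwise swap that most increases the compound win probability.

    schedule: list of (week, team) picks
    probs: {team: {week: win_prob}}; a missing week means a bye
    Returns (ratio, i, j), where ratio = swapped product / current product,
    or None if no swap improves the schedule.
    """
    best = None
    for i, (week_i, team_i) in enumerate(schedule):
        for j in range(i + 1, len(schedule)):
            week_j, team_j = schedule[j]
            if week_i == week_j:
                continue  # swapping two picks in the same week changes nothing
            current = probs[team_i][week_i] * probs[team_j][week_j]
            # a bye (missing week) makes the swap impossible, so treat it as 0
            swapped = probs[team_i].get(week_j, 0.0) * probs[team_j].get(week_i, 0.0)
            ratio = swapped / current
            if ratio > 1 and (best is None or ratio > best[0]):
                best = (ratio, i, j)
    return best

# Made-up numbers matching the two-week example, plus a third team
probs = {
    "A": {1: 0.90, 2: 0.85},
    "B": {1: 0.75, 2: 0.55},
    "C": {1: 0.60, 2: 0.65},
}
schedule = [(1, "A"), (2, "B"), (2, "C")]

swap = best_swap(schedule, probs)
if swap is not None:
    _, i, j = swap
    (week_i, team_i), (week_j, team_j) = schedule[i], schedule[j]
    schedule[i], schedule[j] = (week_i, team_j), (week_j, team_i)
print(schedule)
```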

# SURVIVOR PICK SCHEDULE

In [1920]:
picks

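| | Week | Winner | Loser | winner_wprob | Team Rank | Week Rank | League Rank |
|---|---|---|---|---|---|---|---|
| 0 | 1 | CAR | NYJ | 0.649769 | 2 | 7 | 116 |
| 1 | 2 | CLE | HOU | 0.894027 | 1 | 1 | 2 |
| 2 | 3 | DEN | NYJ | 0.733134 | 2 | 2 | 49 |
| 3 | 4 | BUF | HOU | 0.918735 | 1 | 1 | 1 |
| 4 | 5 | NE | HOU | 0.746957 | 1 | 3 | 39 |
| 5 | 5 | MIN | DET | 0.759504 | 1 | 1 | 33 |
| 6 | 6 | IND | HOU | 0.855389 | 1 | 1 | 6 |
| 7 | 7 | LAR | DET | 0.821587 | 1 | 2 | 11 |
| 8 | 7 | ARI | HOU | 0.843417 | 1 | 1 | 7 |
| 9 | 8 | KC | NYG | 0.816019 | 1 | 1 | 12 |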


### Mean Win Probability for All Picks

In [1921]:
np.mean(picks['winner_wprob'])

0.7619536631975512

### Compound Win Probability for All Picks
- probability of perfect record

In [1922]:
# multiply the win probabilities of all picks
com_prob = picks['winner_wprob'].prod()
com_prob

0.0016287760080637727
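The figure above is the probability that all 23 picks win. Under the rules, though, a player is only eliminated at the third strike, so the more relevant quantity may be the chance of losing at most two picks. The sketch below computes it with a small dynamic program over the number of losses (a Poisson-binomial distribution). It assumes the games are independent and takes the Elo probabilities at face value. `survival_probability` is a hypothetical helper.

```python
def survival_probability(win_probs, strikes=3):
    """
    Probability of finishing with fewer than `strikes` losses, assuming the
    picks' games are independent (Poisson-binomial distribution).
    """
    # dist[k] = probability of exactly k losses so far, for k < strikes
    dist = [1.0] + [0.0] * (strikes - 1)
    for p in win_probs:
        new = [0.0] * strikes
        for losses, prob in enumerate(dist):
            new[losses] += prob * p  # pick wins
            if losses + 1 < strikes:
                new[losses + 1] += prob * (1 - p)  # pick loses, player still alive
        dist = new
    return sum(dist)

print(survival_probability([0.9, 0.8, 0.7], strikes=1))
print(survival_probability([0.9, 0.8, 0.7], strikes=3))
```

In the notebook this would be called as `survival_probability(picks['winner_wprob'])`. With `strikes=1` it reduces to the perfect-record product computed above.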

### Future Adjustments

- adjust model to weigh sooner games more than later games to take the uncertainty of future win probability into account (injuries, trades, performance trends, etc.)
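One possible way to implement this adjustment is to shrink each future win probability toward a coin flip (0.5). The shrinkage factor compounds with the number of weeks until the game. This is a hypothetical sketch, not part of the current model. The `decay=0.95` default is an arbitrary illustrative value, not a fitted one.

```python
def shrink_toward_coin_flip(prob, weeks_ahead, decay=0.95):
    """
    Hypothetical adjustment: pull a win probability toward 0.5 by a factor
    of decay ** weeks_ahead, so distant games count as less certain.
    """
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    return 0.5 + (prob - 0.5) * decay ** weeks_ahead

# a 90% favorite this week versus the same probability ten weeks out
print(shrink_toward_coin_flip(0.90, 0))
print(shrink_toward_coin_flip(0.90, 10))
```

The greedy pass could rank candidate games by the adjusted probability, with `weeks_ahead` measured from the current week. That would tend to fill near-term weeks with the most confident picks first, while the unadjusted probabilities could still be used to report the compound win probability.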