# NFL Gambling Project

## Business problem

#### Stakeholder: John Daniels
- non-technical stakeholder
- local gambling addict

#### True business problem: 
- predict whether the favorite in an NFL game wins or not
- explore factors that impact the result of the game

#### Deliverables: Inference or Prediction?
- Prediction of whether the favorite will win

#### Context:

- **False negative**: the model predicts the favorite loses, but the favorite wins
    - **Outcome**: lose money by betting on the underdog
- **False positive**: the model predicts the favorite wins, but the favorite loses
    - **Outcome**: lose money by betting on the favorite
    
Would the stakeholder prefer reducing false positives over false negatives?
- Both errors lose the bet, so we would like to reduce false positives and false negatives equally.
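
Since both kinds of error end in a losing bet, it can help to tally them side by side. The snippet below is a minimal sketch on made-up labels (the lists `actual` and `predicted` are invented for illustration and are not taken from the NFL data); `True` means the favorite won.

```python
# Toy labels, invented for illustration: True means the favorite won.
actual    = [True, True, False, True, False, False, True, True]
predicted = [True, False, True, True, False, True, True, False]

pairs = list(zip(actual, predicted))

# Count each cell of the confusion matrix by hand.
true_pos  = sum(a and p for a, p in pairs)              # bet favorite, favorite won
false_pos = sum((not a) and p for a, p in pairs)        # bet favorite, favorite lost
false_neg = sum(a and (not p) for a, p in pairs)        # bet underdog, favorite won
true_neg  = sum((not a) and (not p) for a, p in pairs)  # bet underdog, favorite lost

# Either error type is a lost bet for the stakeholder.
losing_bets = false_pos + false_neg
print(f"TP={true_pos} FP={false_pos} FN={false_neg} TN={true_neg} losing bets={losing_bets}")
```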

### Evaluation Metric
Which metric would make sense to primarily use as we evaluate our models?

- Accuracy - weighs the two kinds of errors equally (but is misleading with imbalanced targets)
- Precision - helps reduce false positives
- Recall - helps reduce false negatives
- F1-Score - balances recall & precision (and is better than accuracy with imbalanced targets)
- ROC-AUC - evaluates the predicted probabilities rather than hard labels (checks how well they rank wins above losses)
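
As a rough illustration of how these metrics differ, the sketch below computes all five on a small invented example. The labels and probabilities are assumptions made up for this snippet, not model output from this project; note that ROC-AUC takes the predicted probabilities, while the other four take thresholded labels.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Invented toy data: 1 means the favorite won.
y_true = [1, 1, 0, 1, 0, 0, 1, 1]
# Invented predicted probabilities that the favorite wins.
y_prob = [0.9, 0.4, 0.6, 0.8, 0.2, 0.7, 0.75, 0.45]
# Hard predictions from a 0.5 threshold.
y_pred = [int(p >= 0.5) for p in y_prob]

scores = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "roc_auc": roc_auc_score(y_true, y_prob),  # uses probabilities, not labels
}

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```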

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
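# plot_confusion_matrix and plot_roc_curve were removed in scikit-learn 1.2;
# the Display classes are their replacements.
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import roc_auc_score, RocCurveDisplay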

In [2]:
!ls ../Tom

NFL Gambling Modeling.ipynb          nfl_teams.csv
NFL Gambling Pre-Modeling Work.ipynb pi.csv
Untitled.ipynb                       spreadspoke.R
nfl_stadiums.csv                     spreadspoke_scores.csv


In [3]:
df = pd.read_csv('cleaned_nfl_data.csv')
df = df.drop(columns = ['Unnamed: 0'])
df

Unnamed: 0,schedule_season,team_home,team_away,team_favorite_id,spread_favorite,over_under_line,team_home_id,team_away_id,favorite_win,favorite_home,fav_elev_change,fav_temp_dif,fav_humidity_diff,fav_wind_diff
0,1979,Tampa Bay Buccaneers,Detroit Lions,TB,-3.0,30.0,TB,DET,True,True,0.0,0.000000,0.000000,0.000000
1,1979,Washington Redskins,Houston Oilers,TEN,-4.0,33.0,WAS,TEN,True,False,-1.8,8.000000,6.000000,11.000000
2,1979,St. Louis Cardinals,Dallas Cowboys,DAL,-4.0,37.0,ARI,DAL,True,False,25.6,-1.470588,13.020408,-1.102941
3,1979,Seattle Seahawks,San Diego Chargers,SEA,-2.0,42.5,SEA,LAC,False,True,0.0,0.000000,0.000000,0.000000
4,1979,New York Jets,Cleveland Browns,NYJ,-2.0,41.0,NYJ,CLE,False,True,0.0,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10414,2021,Tennessee Titans,Cincinnati Bengals,TEN,-4.0,48.5,TEN,CIN,False,True,0.0,0.000000,0.000000,0.000000
10415,2021,Kansas City Chiefs,Buffalo Bills,KC,-2.5,54.0,KC,BUF,True,True,0.0,0.000000,0.000000,0.000000
10416,2021,Tampa Bay Buccaneers,Los Angeles Rams,TB,-3.0,48.0,TB,LAR,False,True,0.0,0.000000,0.000000,0.000000
10417,2021,Kansas City Chiefs,Cincinnati Bengals,KC,-7.0,54.5,KC,CIN,False,True,0.0,0.000000,0.000000,0.000000


In [4]:
df['team_away'].unique()

array(['Detroit Lions', 'Houston Oilers', 'Dallas Cowboys',
       'San Diego Chargers', 'Cleveland Browns', 'Atlanta Falcons',
       'New York Giants', 'Oakland Raiders', 'Baltimore Colts',
       'Cincinnati Bengals', 'Green Bay Packers', 'Miami Dolphins',
       'San Francisco 49ers', 'Pittsburgh Steelers', 'Los Angeles Rams',
       'St. Louis Cardinals', 'New York Jets', 'New Orleans Saints',
       'Minnesota Vikings', 'Seattle Seahawks', 'Buffalo Bills',
       'Philadelphia Eagles', 'Tampa Bay Buccaneers', 'Chicago Bears',
       'New England Patriots', 'Denver Broncos', 'Kansas City Chiefs',
       'Washington Redskins', 'Los Angeles Raiders', 'Indianapolis Colts',
       'Phoenix Cardinals', 'Arizona Cardinals', 'St. Louis Rams',
       'Carolina Panthers', 'Jacksonville Jaguars', 'Baltimore Ravens',
       'Tennessee Oilers', 'Tennessee Titans', 'Houston Texans',
       'Los Angeles Chargers', 'Las Vegas Raiders',
       'Washington Football Team'], dtype=object)

In [5]:
df_1 = pd.read_csv('../Tom/pi.csv')
df_1

Unnamed: 0.1,Unnamed: 0,Rank,Team,Rating,v 1-5,v 6-10,v 11-16,Hi,Low,Last,Year
0,0,1,LA Rams (16-5),38.7,4-2,0-2,4-0,1,19,1,2022
1,1,2,San Francisco (12-8),35.7,3-1,2-2,1-3,2,26,2,2022
2,2,3,Kansas City (14-6),33.9,0-2,3-2,4-0,1,19,4,2022
3,3,4,Cincinnati (13-8),32.8,2-2,1-1,5-0,3,28,3,2022
4,4,5,Tampa Bay (14-5),30.8,0-2,3-0,1-2,1,15,5,2022
...,...,...,...,...,...,...,...,...,...,...,...
603,603,28,Washington (5-11),16.9,1-3,0-2,2-2,8,31,28,2004
604,604,29,LA Chargers (4-12),16.1,0-0,0-4,0-4,25,32,29,2004
605,605,30,Arizona (4-12),15.1,0-1,1-2,0-4,22,32,30,2004
606,606,31,Las Vegas (4-12),14.3,0-1,0-3,1-2,2,32,31,2004


In [6]:
df_1 = df_1.drop(columns = ['Rating','v 1-5','v 6-10','v 11-16','Hi','Low','Last','Unnamed: 0'])

In [7]:
df['team_home'].str.split(" ").str[:-1].str.join(" ").unique()

array(['Tampa Bay', 'Washington', 'St. Louis', 'Seattle', 'New York',
       'New Orleans', 'Philadelphia', 'Los Angeles', 'Kansas City',
       'Denver', 'Chicago', 'Buffalo', 'Minnesota', 'New England',
       'San Francisco', 'Pittsburgh', 'San Diego', 'Green Bay', 'Miami',
       'Dallas', 'Cleveland', 'Cincinnati', 'Atlanta', 'Houston',
       'Detroit', 'Oakland', 'Baltimore', 'Indianapolis', 'Phoenix',
       'Arizona', 'Jacksonville', 'Carolina', 'Tennessee', 'Las Vegas',
       'Washington Football'], dtype=object)

In [8]:
df['team_home'].str.split(" ").str[-1].unique()

array(['Buccaneers', 'Redskins', 'Cardinals', 'Seahawks', 'Jets',
       'Saints', 'Eagles', 'Rams', 'Chiefs', 'Broncos', 'Bears', 'Bills',
       'Vikings', 'Patriots', 'Giants', '49ers', 'Steelers', 'Chargers',
       'Packers', 'Dolphins', 'Cowboys', 'Browns', 'Bengals', 'Falcons',
       'Oilers', 'Lions', 'Raiders', 'Colts', 'Jaguars', 'Panthers',
       'Ravens', 'Titans', 'Texans', 'Team'], dtype=object)

In [9]:
{" ".join(team.split(" ")[:-1]):team for team in df['team_home']}
#     team[:-1].join(" ") : team[-1]

{'Tampa Bay': 'Tampa Bay Buccaneers',
 'Washington': 'Washington Redskins',
 'St. Louis': 'St. Louis Rams',
 'Seattle': 'Seattle Seahawks',
 'New York': 'New York Giants',
 'New Orleans': 'New Orleans Saints',
 'Philadelphia': 'Philadelphia Eagles',
 'Los Angeles': 'Los Angeles Rams',
 'Kansas City': 'Kansas City Chiefs',
 'Denver': 'Denver Broncos',
 'Chicago': 'Chicago Bears',
 'Buffalo': 'Buffalo Bills',
 'Minnesota': 'Minnesota Vikings',
 'New England': 'New England Patriots',
 'San Francisco': 'San Francisco 49ers',
 'Pittsburgh': 'Pittsburgh Steelers',
 'San Diego': 'San Diego Chargers',
 'Green Bay': 'Green Bay Packers',
 'Miami': 'Miami Dolphins',
 'Dallas': 'Dallas Cowboys',
 'Cleveland': 'Cleveland Browns',
 'Cincinnati': 'Cincinnati Bengals',
 'Atlanta': 'Atlanta Falcons',
 'Houston': 'Houston Texans',
 'Detroit': 'Detroit Lions',
 'Oakland': 'Oakland Raiders',
 'Baltimore': 'Baltimore Ravens',
 'Indianapolis': 'Indianapolis Colts',
 'Phoenix': 'Phoenix Cardinals',
 'Arizona

In [10]:
df_1['Team'].unique()

array(['LA Rams (16-5)', 'San Francisco (12-8)', 'Kansas City (14-6)',
       'Cincinnati (13-8)', 'Tampa Bay (14-5)', 'Green Bay (13-5)',
       'Buffalo (12-7)', 'Tennessee (12-6)', 'Dallas (12-6)',
       'Miami (9-8)', 'Seattle (7-10)', 'Pittsburgh (9-8-1)',
       'Las Vegas (10-8)', 'Minnesota (8-9)', 'New Orleans (9-8)',
       'Indianapolis (9-8)', 'New England (10-8)', 'Arizona (11-7)',
       'Cleveland (8-9)', 'LA Chargers (9-8)', 'Philadelphia (9-9)',
       'Baltimore (8-9)', 'Washington (7-10)', 'Detroit (3-13-1)',
       'Chicago (6-11)', 'Denver (7-10)', 'Atlanta (7-10)',
       'Houston (4-13)', 'NY Jets (4-13)', 'Jacksonville (3-14)',
       'Carolina (5-12)', 'NY Giants (4-13)', 'Tampa Bay (15-5)',
       'Kansas City (16-3)', 'Buffalo (15-4)', 'Green Bay (14-4)',
       'New Orleans (13-5)', 'Miami (10-6)', 'Baltimore (12-6)',
       'LA Chargers (7-9)', 'Seattle (12-5)', 'LA Rams (11-7)',
       'Cleveland (12-6)', 'Indianapolis (11-6)', 'Las Vegas (8-8)',
       '

In [11]:
map_1 = {" ".join(team.split(" ")[:-1]):team for team in df['team_home']}
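
One caveat with the comprehension above: when two teams share a city prefix (for example the two New York teams), the later team silently overwrites the earlier one. A minimal sketch of how such collisions could be listed before building the map is shown here; the `teams` list is a small hand-picked sample standing in for `df['team_home']`, and `city_of` is a hypothetical helper name.

```python
from collections import defaultdict

# Hand-picked sample standing in for df['team_home'] (assumption for illustration).
teams = [
    'New York Jets', 'New York Giants',
    'Los Angeles Rams', 'Los Angeles Chargers',
    'Dallas Cowboys',
    'Washington Redskins', 'Washington Football Team',
]

def city_of(team):
    """Drop the last word of the team name, as the comprehension does."""
    return " ".join(team.split(" ")[:-1])

# Collect every full team name seen for each city prefix.
city_to_teams = defaultdict(set)
for team in teams:
    city_to_teams[city_of(team)].add(team)

# Prefixes that map to more than one team would be overwritten in a plain dict.
collisions = {city: sorted(names)
              for city, names in city_to_teams.items() if len(names) > 1}
print(collisions)
```

The same check also shows why 'Washington Football Team' needs manual handling: dropping the last word leaves 'Washington Football' rather than 'Washington'.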

In [12]:
map_1['Washington'] = map_1['Washington Football']
del map_1['Washington Football']

In [13]:
map_1

{'Tampa Bay': 'Tampa Bay Buccaneers',
 'Washington': 'Washington Football Team',
 'St. Louis': 'St. Louis Rams',
 'Seattle': 'Seattle Seahawks',
 'New York': 'New York Giants',
 'New Orleans': 'New Orleans Saints',
 'Philadelphia': 'Philadelphia Eagles',
 'Los Angeles': 'Los Angeles Rams',
 'Kansas City': 'Kansas City Chiefs',
 'Denver': 'Denver Broncos',
 'Chicago': 'Chicago Bears',
 'Buffalo': 'Buffalo Bills',
 'Minnesota': 'Minnesota Vikings',
 'New England': 'New England Patriots',
 'San Francisco': 'San Francisco 49ers',
 'Pittsburgh': 'Pittsburgh Steelers',
 'San Diego': 'San Diego Chargers',
 'Green Bay': 'Green Bay Packers',
 'Miami': 'Miami Dolphins',
 'Dallas': 'Dallas Cowboys',
 'Cleveland': 'Cleveland Browns',
 'Cincinnati': 'Cincinnati Bengals',
 'Atlanta': 'Atlanta Falcons',
 'Houston': 'Houston Texans',
 'Detroit': 'Detroit Lions',
 'Oakland': 'Oakland Raiders',
 'Baltimore': 'Baltimore Ravens',
 'Indianapolis': 'Indianapolis Colts',
 'Phoenix': 'Phoenix Cardinals',
 'Ar

In [14]:
df_1['Team'] = df_1['Team'].apply(lambda t: t.split('(')[0].strip())
df_1

Unnamed: 0,Rank,Team,Year
0,1,LA Rams,2022
1,2,San Francisco,2022
2,3,Kansas City,2022
3,4,Cincinnati,2022
4,5,Tampa Bay,2022
...,...,...,...
603,28,Washington,2004
604,29,LA Chargers,2004
605,30,Arizona,2004
606,31,Las Vegas,2004
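
The `split('(')` approach above discards the win-loss record. If the record were wanted later, one alternative is a regular expression that captures both parts. This is only a sketch on three sample strings copied from the `Team` column; the column names `team` and `record` are assumptions introduced here.

```python
import pandas as pd

# Sample values in the same "Name (W-L)" or "Name (W-L-T)" shape as df_1['Team'].
raw = pd.Series(['LA Rams (16-5)', 'Pittsburgh (9-8-1)', 'Detroit (3-13-1)'])

# Capture the team name and the parenthesised record in one pass.
pattern = r'^(?P<team>.+?)\s*\((?P<record>\d+(?:-\d+)+)\)$'
parsed = raw.str.extract(pattern)

print(parsed)
```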


In [15]:
list(df_1['Team'].sort_values().unique())

['Arizona',
 'Atlanta',
 'Baltimore',
 'Buffalo',
 'Carolina',
 'Chicago',
 'Cincinnati',
 'Cleveland',
 'Dallas',
 'Denver',
 'Detroit',
 'Green Bay',
 'Houston',
 'Indianapolis',
 'Jacksonville',
 'Kansas City',
 'LA Chargers',
 'LA Rams',
 'Las Vegas',
 'Miami',
 'Minnesota',
 'NY Giants',
 'NY Jets',
 'New England',
 'New Orleans',
 'Philadelphia',
 'Pittsburgh',
 'San Francisco',
 'Seattle',
 'Tampa Bay',
 'Tennessee',
 'Washington']

In [16]:
list(df.loc[df['schedule_season'] >=2003]['team_home'].sort_values().unique())

['Arizona Cardinals',
 'Atlanta Falcons',
 'Baltimore Ravens',
 'Buffalo Bills',
 'Carolina Panthers',
 'Chicago Bears',
 'Cincinnati Bengals',
 'Cleveland Browns',
 'Dallas Cowboys',
 'Denver Broncos',
 'Detroit Lions',
 'Green Bay Packers',
 'Houston Texans',
 'Indianapolis Colts',
 'Jacksonville Jaguars',
 'Kansas City Chiefs',
 'Las Vegas Raiders',
 'Los Angeles Chargers',
 'Los Angeles Rams',
 'Miami Dolphins',
 'Minnesota Vikings',
 'New England Patriots',
 'New Orleans Saints',
 'New York Giants',
 'New York Jets',
 'Oakland Raiders',
 'Philadelphia Eagles',
 'Pittsburgh Steelers',
 'San Diego Chargers',
 'San Francisco 49ers',
 'Seattle Seahawks',
 'St. Louis Rams',
 'Tampa Bay Buccaneers',
 'Tennessee Titans',
 'Washington Football Team',
 'Washington Redskins']

In [17]:
map_2 = ({'LA Rams':"Los Angeles Rams | St. Louis Rams",
          'LA Chargers':"Los Angeles Chargers | San Diego Chargers",
          'NY Giants':'New York Giants',
          'NY Jets':'New York Jets', 
          'Las Vegas': "Las Vegas Raiders | Oakland Raiders",
          'Washington': "Washington Football Team | Washington Redskins"})

In [18]:
map_1

{'Tampa Bay': 'Tampa Bay Buccaneers',
 'Washington': 'Washington Football Team',
 'St. Louis': 'St. Louis Rams',
 'Seattle': 'Seattle Seahawks',
 'New York': 'New York Giants',
 'New Orleans': 'New Orleans Saints',
 'Philadelphia': 'Philadelphia Eagles',
 'Los Angeles': 'Los Angeles Rams',
 'Kansas City': 'Kansas City Chiefs',
 'Denver': 'Denver Broncos',
 'Chicago': 'Chicago Bears',
 'Buffalo': 'Buffalo Bills',
 'Minnesota': 'Minnesota Vikings',
 'New England': 'New England Patriots',
 'San Francisco': 'San Francisco 49ers',
 'Pittsburgh': 'Pittsburgh Steelers',
 'San Diego': 'San Diego Chargers',
 'Green Bay': 'Green Bay Packers',
 'Miami': 'Miami Dolphins',
 'Dallas': 'Dallas Cowboys',
 'Cleveland': 'Cleveland Browns',
 'Cincinnati': 'Cincinnati Bengals',
 'Atlanta': 'Atlanta Falcons',
 'Houston': 'Houston Texans',
 'Detroit': 'Detroit Lions',
 'Oakland': 'Oakland Raiders',
 'Baltimore': 'Baltimore Ravens',
 'Indianapolis': 'Indianapolis Colts',
 'Phoenix': 'Phoenix Cardinals',
 'Ar

In [19]:
del map_1['Washington']
del map_1['St. Louis']
del map_1['New York']
del map_1['Los Angeles']
del map_1['San Diego']
del map_1['Oakland']
del map_1['Las Vegas']

In [20]:
map_1

{'Tampa Bay': 'Tampa Bay Buccaneers',
 'Seattle': 'Seattle Seahawks',
 'New Orleans': 'New Orleans Saints',
 'Philadelphia': 'Philadelphia Eagles',
 'Kansas City': 'Kansas City Chiefs',
 'Denver': 'Denver Broncos',
 'Chicago': 'Chicago Bears',
 'Buffalo': 'Buffalo Bills',
 'Minnesota': 'Minnesota Vikings',
 'New England': 'New England Patriots',
 'San Francisco': 'San Francisco 49ers',
 'Pittsburgh': 'Pittsburgh Steelers',
 'Green Bay': 'Green Bay Packers',
 'Miami': 'Miami Dolphins',
 'Dallas': 'Dallas Cowboys',
 'Cleveland': 'Cleveland Browns',
 'Cincinnati': 'Cincinnati Bengals',
 'Atlanta': 'Atlanta Falcons',
 'Houston': 'Houston Texans',
 'Detroit': 'Detroit Lions',
 'Baltimore': 'Baltimore Ravens',
 'Indianapolis': 'Indianapolis Colts',
 'Phoenix': 'Phoenix Cardinals',
 'Arizona': 'Arizona Cardinals',
 'Jacksonville': 'Jacksonville Jaguars',
 'Carolina': 'Carolina Panthers',
 'Tennessee': 'Tennessee Titans'}

In [21]:
df.loc[df['schedule_season'] >= 2004]['team_home'].unique()

array(['New England Patriots', 'Miami Dolphins', 'Washington Redskins',
       'St. Louis Rams', 'San Francisco 49ers', 'Pittsburgh Steelers',
       'Philadelphia Eagles', 'New Orleans Saints', 'New York Jets',
       'Houston Texans', 'Denver Broncos', 'Cleveland Browns',
       'Chicago Bears', 'Buffalo Bills', 'Minnesota Vikings',
       'Carolina Panthers', 'Tampa Bay Buccaneers', 'San Diego Chargers',
       'Oakland Raiders', 'New York Giants', 'Kansas City Chiefs',
       'Jacksonville Jaguars', 'Tennessee Titans', 'Detroit Lions',
       'Dallas Cowboys', 'Cincinnati Bengals', 'Baltimore Ravens',
       'Atlanta Falcons', 'Arizona Cardinals', 'Green Bay Packers',
       'Seattle Seahawks', 'Indianapolis Colts', 'Los Angeles Rams',
       'Los Angeles Chargers', 'Las Vegas Raiders',
       'Washington Football Team'], dtype=object)

In [22]:
map_2

{'LA Rams': 'Los Angeles Rams | St. Louis Rams',
 'LA Chargers': 'Los Angeles Chargers | San Diego Chargers',
 'NY Giants': 'New York Giants',
 'NY Jets': 'New York Jets',
 'Las Vegas': 'Las Vegas Raiders | Oakland Raiders',
 'Washington': 'Washington Football Team | Washington Redskins'}

In [23]:
df_1['Team'].map(map_1)

0                       NaN
1       San Francisco 49ers
2        Kansas City Chiefs
3        Cincinnati Bengals
4      Tampa Bay Buccaneers
               ...         
603                     NaN
604                     NaN
605       Arizona Cardinals
606                     NaN
607                     NaN
Name: Team, Length: 608, dtype: object

In [24]:
df_1['Team_Clean'] = df_1['Team'].replace(map_1)

In [25]:
df_1['Team_Clean'] = df_1['Team_Clean'].replace(map_2)

In [26]:
df_1['Team_Clean']

0                   Los Angeles Rams | St. Louis Rams
1                                 San Francisco 49ers
2                                  Kansas City Chiefs
3                                  Cincinnati Bengals
4                                Tampa Bay Buccaneers
                            ...                      
603    Washington Football Team | Washington Redskins
604         Los Angeles Chargers | San Diego Chargers
605                                 Arizona Cardinals
606               Las Vegas Raiders | Oakland Raiders
607                                   New York Giants
Name: Team_Clean, Length: 608, dtype: object
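
The cells above switch from `map` to `replace` because the two treat unmatched values differently: `map` returns `NaN` for any key missing from the dictionary, while `replace` leaves the original value in place so a second dictionary can handle it. A small sketch on invented sample values:

```python
import pandas as pd

teams = pd.Series(['LA Rams', 'San Francisco', 'NY Jets'])
lookup = {'San Francisco': 'San Francisco 49ers'}  # deliberately incomplete

mapped = teams.map(lookup)        # unmatched entries become NaN
replaced = teams.replace(lookup)  # unmatched entries are left unchanged

print(mapped.tolist())
print(replaced.tolist())
```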

In [27]:
df_1['Team_Clean'] = df_1['Team_Clean'].str.split('|')

In [28]:
df_1 = df_1.explode(column='Team_Clean')

In [29]:
df_1['Team_Clean'] = df_1['Team_Clean'].str.strip()
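
The split, explode, strip sequence above turns one ranking row into one row per historical team name. The sketch below replays the same three steps on a two-row toy frame (the `ranks` frame is an assumption for illustration) so the intermediate effect is easy to see, including the repeated index labels that `explode` leaves behind.

```python
import pandas as pd

# Toy frame in the same shape as df_1 after the two replace calls.
ranks = pd.DataFrame({
    'Rank': [1, 2],
    'Team_Clean': ['Los Angeles Rams | St. Louis Rams', 'San Francisco 49ers'],
})

ranks['Team_Clean'] = ranks['Team_Clean'].str.split('|')  # string -> list of names
ranks = ranks.explode('Team_Clean')                       # one row per list element
ranks['Team_Clean'] = ranks['Team_Clean'].str.strip()     # drop padding around '|'

print(ranks)
```

Because the exploded rows keep their original index label, calling `reset_index(drop=True)` afterwards may be worthwhile if later code relies on a unique index.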

In [30]:
df_1 = df_1.rename(columns = {'Team_Clean':'team_home','Year':'schedule_season','Rank':'rank_home'})
df_1

Unnamed: 0,rank_home,Team,schedule_season,team_home
0,1,LA Rams,2022,Los Angeles Rams
0,1,LA Rams,2022,St. Louis Rams
1,2,San Francisco,2022,San Francisco 49ers
2,3,Kansas City,2022,Kansas City Chiefs
3,4,Cincinnati,2022,Cincinnati Bengals
...,...,...,...,...
604,29,LA Chargers,2004,San Diego Chargers
605,30,Arizona,2004,Arizona Cardinals
606,31,Las Vegas,2004,Las Vegas Raiders
606,31,Las Vegas,2004,Oakland Raiders


In [31]:
df_1 = df_1.drop(columns = 'Team')

In [32]:
df_1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 684 entries, 0 to 607
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   rank_home        684 non-null    int64 
 1   schedule_season  684 non-null    int64 
 2   team_home        684 non-null    object
dtypes: int64(2), object(1)
memory usage: 21.4+ KB


In [33]:
df_g = pd.merge(df, df_1,  how='left', on = ['team_home','schedule_season'])
df_g

Unnamed: 0,schedule_season,team_home,team_away,team_favorite_id,spread_favorite,over_under_line,team_home_id,team_away_id,favorite_win,favorite_home,fav_elev_change,fav_temp_dif,fav_humidity_diff,fav_wind_diff,rank_home
0,1979,Tampa Bay Buccaneers,Detroit Lions,TB,-3.0,30.0,TB,DET,True,True,0.0,0.000000,0.000000,0.000000,
1,1979,Washington Redskins,Houston Oilers,TEN,-4.0,33.0,WAS,TEN,True,False,-1.8,8.000000,6.000000,11.000000,
2,1979,St. Louis Cardinals,Dallas Cowboys,DAL,-4.0,37.0,ARI,DAL,True,False,25.6,-1.470588,13.020408,-1.102941,
3,1979,Seattle Seahawks,San Diego Chargers,SEA,-2.0,42.5,SEA,LAC,False,True,0.0,0.000000,0.000000,0.000000,
4,1979,New York Jets,Cleveland Browns,NYJ,-2.0,41.0,NYJ,CLE,False,True,0.0,0.000000,0.000000,0.000000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10414,2021,Tennessee Titans,Cincinnati Bengals,TEN,-4.0,48.5,TEN,CIN,False,True,0.0,0.000000,0.000000,0.000000,15.0
10415,2021,Kansas City Chiefs,Buffalo Bills,KC,-2.5,54.0,KC,BUF,True,True,0.0,0.000000,0.000000,0.000000,2.0
10416,2021,Tampa Bay Buccaneers,Los Angeles Rams,TB,-3.0,48.0,TB,LAR,False,True,0.0,0.000000,0.000000,0.000000,1.0
10417,2021,Kansas City Chiefs,Cincinnati Bengals,KC,-7.0,54.5,KC,CIN,False,True,0.0,0.000000,0.000000,0.000000,2.0
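
Because the ranking table was exploded into several names per team, it may be worth confirming that the left merge neither duplicates games nor silently misses seasons. The sketch below uses two real `pd.merge` options, `validate` and `indicator`, on small toy frames; the frames and their values are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-ins for df (games) and df_1 (rankings).
games = pd.DataFrame({
    'team_home': ['Kansas City Chiefs', 'Tampa Bay Buccaneers', 'Tampa Bay Buccaneers'],
    'schedule_season': [2021, 2021, 1979],
})
ranks = pd.DataFrame({
    'team_home': ['Kansas City Chiefs', 'Tampa Bay Buccaneers'],
    'schedule_season': [2021, 2021],
    'rank_home': [2, 1],
})

merged = pd.merge(
    games, ranks,
    how='left',
    on=['team_home', 'schedule_season'],
    validate='many_to_one',  # raises if a (team, season) key repeats in ranks
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Games with no ranking available (here, the pre-2004 season).
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched)
```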


In [34]:
df_1 = df_1.rename(columns = {'team_home':'team_away','rank_home':'rank_away'})
df_1

Unnamed: 0,rank_away,schedule_season,team_away
0,1,2022,Los Angeles Rams
0,1,2022,St. Louis Rams
1,2,2022,San Francisco 49ers
2,3,2022,Kansas City Chiefs
3,4,2022,Cincinnati Bengals
...,...,...,...
604,29,2004,San Diego Chargers
605,30,2004,Arizona Cardinals
606,31,2004,Las Vegas Raiders
606,31,2004,Oakland Raiders


In [35]:
df_1.team_away = df_1.team_away.str.strip()

In [36]:
df_1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 684 entries, 0 to 607
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   rank_away        684 non-null    int64 
 1   schedule_season  684 non-null    int64 
 2   team_away        684 non-null    object
dtypes: int64(2), object(1)
memory usage: 21.4+ KB


In [37]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10419 entries, 0 to 10418
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   schedule_season    10419 non-null  int64  
 1   team_home          10419 non-null  object 
 2   team_away          10419 non-null  object 
 3   team_favorite_id   10419 non-null  object 
 4   spread_favorite    10419 non-null  float64
 5   over_under_line    10419 non-null  float64
 6   team_home_id       10419 non-null  object 
 7   team_away_id       10419 non-null  object 
 8   favorite_win       10419 non-null  bool   
 9   favorite_home      10419 non-null  bool   
 10  fav_elev_change    10419 non-null  float64
 11  fav_temp_dif       10419 non-null  float64
 12  fav_humidity_diff  10419 non-null  float64
 13  fav_wind_diff      10419 non-null  float64
dtypes: bool(2), float64(6), int64(1), object(5)
memory usage: 997.3+ KB


In [38]:
df_t = pd.merge(df_g, df_1,  how='left', on = ['team_away','schedule_season'])
df_t

Unnamed: 0,schedule_season,team_home,team_away,team_favorite_id,spread_favorite,over_under_line,team_home_id,team_away_id,favorite_win,favorite_home,fav_elev_change,fav_temp_dif,fav_humidity_diff,fav_wind_diff,rank_home,rank_away
0,1979,Tampa Bay Buccaneers,Detroit Lions,TB,-3.0,30.0,TB,DET,True,True,0.0,0.000000,0.000000,0.000000,,
1,1979,Washington Redskins,Houston Oilers,TEN,-4.0,33.0,WAS,TEN,True,False,-1.8,8.000000,6.000000,11.000000,,
2,1979,St. Louis Cardinals,Dallas Cowboys,DAL,-4.0,37.0,ARI,DAL,True,False,25.6,-1.470588,13.020408,-1.102941,,
3,1979,Seattle Seahawks,San Diego Chargers,SEA,-2.0,42.5,SEA,LAC,False,True,0.0,0.000000,0.000000,0.000000,,
4,1979,New York Jets,Cleveland Browns,NYJ,-2.0,41.0,NYJ,CLE,False,True,0.0,0.000000,0.000000,0.000000,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10414,2021,Tennessee Titans,Cincinnati Bengals,TEN,-4.0,48.5,TEN,CIN,False,True,0.0,0.000000,0.000000,0.000000,15.0,28.0
10415,2021,Kansas City Chiefs,Buffalo Bills,KC,-2.5,54.0,KC,BUF,True,True,0.0,0.000000,0.000000,0.000000,2.0,3.0
10416,2021,Tampa Bay Buccaneers,Los Angeles Rams,TB,-3.0,48.0,TB,LAR,False,True,0.0,0.000000,0.000000,0.000000,1.0,10.0
10417,2021,Kansas City Chiefs,Cincinnati Bengals,KC,-7.0,54.5,KC,CIN,False,True,0.0,0.000000,0.000000,0.000000,2.0,28.0


In [39]:
df_t = df_t.loc[df_t['schedule_season'] >= 2004]
df_t

Unnamed: 0,schedule_season,team_home,team_away,team_favorite_id,spread_favorite,over_under_line,team_home_id,team_away_id,favorite_win,favorite_home,fav_elev_change,fav_temp_dif,fav_humidity_diff,fav_wind_diff,rank_home,rank_away
5698,2004,New England Patriots,Indianapolis Colts,NE,-3.0,44.5,NE,IND,True,True,0.00,0.000000,0.00,0.000000,1.0,2.0
5699,2004,Miami Dolphins,Tennessee Titans,TEN,-3.0,38.0,MIA,TEN,True,False,-174.10,8.272727,14.84,8.060606,6.0,3.0
5700,2004,Washington Redskins,Tampa Bay Buccaneers,WAS,-2.5,38.5,WAS,TB,True,True,0.00,0.000000,0.00,0.000000,28.0,18.0
5701,2004,St. Louis Rams,Arizona Cardinals,LAR,-11.0,46.0,LAR,ARI,True,True,0.00,0.000000,0.00,0.000000,9.0,30.0
5702,2004,San Francisco 49ers,Atlanta Falcons,ATL,-3.0,44.5,SF,ATL,True,False,-317.64,-7.338028,11.75,18.366197,19.0,14.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10414,2021,Tennessee Titans,Cincinnati Bengals,TEN,-4.0,48.5,TEN,CIN,False,True,0.00,0.000000,0.00,0.000000,15.0,28.0
10415,2021,Kansas City Chiefs,Buffalo Bills,KC,-2.5,54.0,KC,BUF,True,True,0.00,0.000000,0.00,0.000000,2.0,3.0
10416,2021,Tampa Bay Buccaneers,Los Angeles Rams,TB,-3.0,48.0,TB,LAR,False,True,0.00,0.000000,0.00,0.000000,1.0,10.0
10417,2021,Kansas City Chiefs,Cincinnati Bengals,KC,-7.0,54.5,KC,CIN,False,True,0.00,0.000000,0.00,0.000000,2.0,28.0


In [40]:
home_team_id = list(df_t['team_home_id'].values)
fav_team_id = list(df_t['team_favorite_id'].values)
away_team_id = list(df_t['team_away_id'].values)

In [41]:
rank_home = list(df_t['rank_home'].values)
rank_away = list(df_t['rank_away'].values)

In [42]:
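# Underdog rank minus favorite rank: positive means the favorite is better ranked.
fav_rank_diff = []
for index in range(len(fav_team_id)):
    if home_team_id[index] == fav_team_id[index]:
        fav_rank_diff.append(rank_away[index] - rank_home[index])
    elif away_team_id[index] == fav_team_id[index]:
        fav_rank_diff.append(rank_home[index] - rank_away[index])
    else:
        # Neither side matches the favorite id; keep the list aligned with df_t.
        fav_rank_diff.append(np.nan)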
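
The same feature could also be computed without an explicit loop. The following is a sketch of a vectorized alternative using `np.select`, not the notebook's own method; the three sample rows are copied from the 2004 games shown in the output above, and the frame name `games` is an assumption.

```python
import numpy as np
import pandas as pd

# Three 2004 games copied from the merged table above.
games = pd.DataFrame({
    'team_home_id': ['NE', 'MIA', 'WAS'],
    'team_away_id': ['IND', 'TEN', 'TB'],
    'team_favorite_id': ['NE', 'TEN', 'WAS'],
    'rank_home': [1.0, 6.0, 28.0],
    'rank_away': [2.0, 3.0, 18.0],
})

home_fav = games['team_home_id'] == games['team_favorite_id']
away_fav = games['team_away_id'] == games['team_favorite_id']

# Underdog rank minus favorite rank; NaN when neither side is the favorite.
games['fav_rank_diff'] = np.select(
    [home_fav, away_fav],
    [games['rank_away'] - games['rank_home'],
     games['rank_home'] - games['rank_away']],
    default=np.nan,
)

print(games[['team_favorite_id', 'fav_rank_diff']])
```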

In [43]:
fav_rank_diff

[1.0,
 3.0,
 -10.0,
 21.0,
 5.0,
 5.0,
 27.0,
 -6.0,
 7.0,
 6.0,
 4.0,
 14.0,
 5.0,
 -6.0,
 -11.0,
 3.0,
 2.0,
 12.0,
 -10.0,
 4.0,
 9.0,
 -8.0,
 7.0,
 1.0,
 -2.0,
 16.0,
 -18.0,
 13.0,
 -5.0,
 29.0,
 13.0,
 17.0,
 12.0,
 1.0,
 3.0,
 -13.0,
 -5.0,
 -2.0,
 11.0,
 5.0,
 20.0,
 21.0,
 11.0,
 16.0,
 20.0,
 -17.0,
 10.0,
 10.0,
 26.0,
 -2.0,
 -11.0,
 13.0,
 -1.0,
 25.0,
 20.0,
 20.0,
 -8.0,
 15.0,
 10.0,
 -1.0,
 11.0,
 21.0,
 -4.0,
 1.0,
 29.0,
 5.0,
 8.0,
 4.0,
 1.0,
 14.0,
 11.0,
 -7.0,
 -4.0,
 15.0,
 20.0,
 -1.0,
 23.0,
 -12.0,
 15.0,
 2.0,
 -18.0,
 15.0,
 -3.0,
 8.0,
 3.0,
 -15.0,
 9.0,
 -3.0,
 2.0,
 -21.0,
 -7.0,
 16.0,
 -19.0,
 2.0,
 4.0,
 22.0,
 25.0,
 8.0,
 14.0,
 13.0,
 16.0,
 21.0,
 -12.0,
 2.0,
 25.0,
 8.0,
 10.0,
 21.0,
 -8.0,
 6.0,
 14.0,
 -1.0,
 9.0,
 10.0,
 -11.0,
 6.0,
 -8.0,
 3.0,
 -19.0,
 21.0,
 -12.0,
 24.0,
 15.0,
 27.0,
 4.0,
 14.0,
 3.0,
 20.0,
 4.0,
 -4.0,
 17.0,
 -7.0,
 -15.0,
 20.0,
 -2.0,
 10.0,
 21.0,
 15.0,
 1.0,
 4.0,
 2.0,
 6.0,
 23.0,
 2.0,
 18.0,
 2.0,
 3.0,


In [44]:
# assign returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
# that direct column assignment raises on the filtered slice.
df_t = df_t.assign(fav_rank_diff=fav_rank_diff)
df_t


Unnamed: 0,schedule_season,team_home,team_away,team_favorite_id,spread_favorite,over_under_line,team_home_id,team_away_id,favorite_win,favorite_home,fav_elev_change,fav_temp_dif,fav_humidity_diff,fav_wind_diff,rank_home,rank_away,fav_rank_diff
5698,2004,New England Patriots,Indianapolis Colts,NE,-3.0,44.5,NE,IND,True,True,0.00,0.000000,0.00,0.000000,1.0,2.0,1.0
5699,2004,Miami Dolphins,Tennessee Titans,TEN,-3.0,38.0,MIA,TEN,True,False,-174.10,8.272727,14.84,8.060606,6.0,3.0,3.0
5700,2004,Washington Redskins,Tampa Bay Buccaneers,WAS,-2.5,38.5,WAS,TB,True,True,0.00,0.000000,0.00,0.000000,28.0,18.0,-10.0
5701,2004,St. Louis Rams,Arizona Cardinals,LAR,-11.0,46.0,LAR,ARI,True,True,0.00,0.000000,0.00,0.000000,9.0,30.0,21.0
5702,2004,San Francisco 49ers,Atlanta Falcons,ATL,-3.0,44.5,SF,ATL,True,False,-317.64,-7.338028,11.75,18.366197,19.0,14.0,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10414,2021,Tennessee Titans,Cincinnati Bengals,TEN,-4.0,48.5,TEN,CIN,False,True,0.00,0.000000,0.00,0.000000,15.0,28.0,13.0
10415,2021,Kansas City Chiefs,Buffalo Bills,KC,-2.5,54.0,KC,BUF,True,True,0.00,0.000000,0.00,0.000000,2.0,3.0,1.0
10416,2021,Tampa Bay Buccaneers,Los Angeles Rams,TB,-3.0,48.0,TB,LAR,False,True,0.00,0.000000,0.00,0.000000,1.0,10.0,9.0
10417,2021,Kansas City Chiefs,Cincinnati Bengals,KC,-7.0,54.5,KC,CIN,False,True,0.00,0.000000,0.00,0.000000,2.0,28.0,26.0


In [45]:
df_t.columns

Index(['schedule_season', 'team_home', 'team_away', 'team_favorite_id',
       'spread_favorite', 'over_under_line', 'team_home_id', 'team_away_id',
       'favorite_win', 'favorite_home', 'fav_elev_change', 'fav_temp_dif',
       'fav_humidity_diff', 'fav_wind_diff', 'rank_home', 'rank_away',
       'fav_rank_diff'],
      dtype='object')

In [48]:
df_f = df_t.drop(columns = ['schedule_season','team_home','team_away',
                     'team_favorite_id','team_home_id','team_away_id',
                     'rank_home','rank_away'])
df_f

Unnamed: 0,spread_favorite,over_under_line,favorite_win,favorite_home,fav_elev_change,fav_temp_dif,fav_humidity_diff,fav_wind_diff,fav_rank_diff
5698,-3.0,44.5,True,True,0.00,0.000000,0.00,0.000000,1.0
5699,-3.0,38.0,True,False,-174.10,8.272727,14.84,8.060606,3.0
5700,-2.5,38.5,True,True,0.00,0.000000,0.00,0.000000,-10.0
5701,-11.0,46.0,True,True,0.00,0.000000,0.00,0.000000,21.0
5702,-3.0,44.5,True,False,-317.64,-7.338028,11.75,18.366197,5.0
...,...,...,...,...,...,...,...,...,...
10414,-4.0,48.5,False,True,0.00,0.000000,0.00,0.000000,13.0
10415,-2.5,54.0,True,True,0.00,0.000000,0.00,0.000000,1.0
10416,-3.0,48.0,False,True,0.00,0.000000,0.00,0.000000,9.0
10417,-7.0,54.5,False,True,0.00,0.000000,0.00,0.000000,26.0


In [49]:
#df_f.to_csv('final_nfl_data.csv')
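
When the final frame is saved, passing `index=False` to `to_csv` avoids the extra `Unnamed: 0` column that had to be dropped right after loading `cleaned_nfl_data.csv`. The sketch below demonstrates the difference with an in-memory buffer; the two-row `frame` is a toy stand-in for `df_f`.

```python
import io
import pandas as pd

# Toy stand-in for df_f.
frame = pd.DataFrame({'spread_favorite': [-3.0, -2.5],
                      'favorite_win': [True, False]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_clean.columns))
```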