# General Assembly DSI - Denver 2018
## Capstone Project - DFS Model
This is my capstone project for the fifth cohort of General Assembly's [Data Science Immersive](https://generalassemb.ly/education/data-science-immersive) in Denver (2018). I am developing a model to help optimize NFL lineups on the daily fantasy sports (DFS) platforms [DraftKings](https://www.draftkings.com/) and [FanDuel](https://www.fanduel.com/).

### Problem Statement

Can we build a model that predicts an NFL player’s fantasy performance well enough to estimate that player's value, and then pair the model with a daily fantasy strategy to turn a profit?

### Gathering & Cleaning Game Info
- [NFL Weather Data from 2011 to 2017](https://www.kaggle.com/tobycrabtree/nfl-scores-and-betting-data#spreadspoke_scores.csv) | Kaggle
- [NFL Betting Data from 2011 to 2017](https://www.kaggle.com/tobycrabtree/nfl-scores-and-betting-data#spreadspoke_scores.csv) | Kaggle

In [1]:
import pandas as pd

In [2]:
game_info = pd.read_csv('../data/spreadspoke_scores.csv') # Spreads and Weather

In [3]:
# Keep only the 2011 through 2017 seasons (drop everything before 2011 and after 2017)
game_info = game_info[game_info['schedule_season'].between(2011, 2017)]

In [4]:
game_info.shape

(1869, 17)

In [5]:
game_info.dtypes

schedule_date           object
schedule_season          int64
schedule_week           object
team_home               object
team_away               object
stadium                 object
team_favorite_id        object
spread_favorite        float64
over_under_line         object
weather_detail          object
weather_temperature    float64
weather_wind_mph       float64
weather_humidity        object
score_home             float64
score_away             float64
stadium_neutral           bool
schedule_playoff          bool
dtype: object
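
`over_under_line` is read as `object` even though it holds betting totals. Below is a minimal sketch of one way to coerce such a column to numbers. It assumes the cause is blank or non-numeric strings in the CSV, which is not verified against this file, and the sample values are made up.

```python
import pandas as pd

# Hypothetical sample values; the blank string stands in for a missing line
over_under = pd.Series(['38', '60', ' ', '48.5'])

# errors='coerce' turns anything unparseable into NaN instead of raising
over_under_numeric = pd.to_numeric(over_under, errors='coerce')

print(over_under_numeric.dtype)
print(over_under_numeric.isnull().sum())
```

Any rows that come back as NaN would then need the same kind of decision as the other missing values in this notebook.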

In [6]:
game_info.isnull().sum().sort_values(ascending = False)

weather_humidity       1358
weather_detail         1318
weather_wind_mph         31
weather_temperature      31
schedule_playoff          0
stadium                   0
schedule_season           0
schedule_week             0
team_home                 0
team_away                 0
over_under_line           0
team_favorite_id          0
spread_favorite           0
stadium_neutral           0
score_home                0
score_away                0
schedule_date             0
dtype: int64

In [7]:
game_info[game_info['weather_temperature'].isnull()].head()

Unnamed: 0,schedule_date,schedule_season,schedule_week,team_home,team_away,stadium,team_favorite_id,spread_favorite,over_under_line,weather_detail,weather_temperature,weather_wind_mph,weather_humidity,score_home,score_away,stadium_neutral,schedule_playoff
10531,01/07/2012,2011,Wildcard,Houston Texans,Cincinnati Bengals,Reliant Stadium,HOU,-4.0,38,,,,,31.0,10.0,False,True
10532,01/07/2012,2011,Wildcard,New Orleans Saints,Detroit Lions,Louisiana Superdome,NO,-10.5,60,,,,,45.0,28.0,False,True
10533,01/08/2012,2011,Wildcard,Denver Broncos,Pittsburgh Steelers,Sports Authority Field at Mile High,PIT,-7.5,34,,,,,29.0,23.0,False,True
10534,01/08/2012,2011,Wildcard,New York Giants,Atlanta Falcons,MetLife Stadium,NYG,-3.0,48,,,,,24.0,2.0,False,True
10535,01/14/2012,2011,Division,New England Patriots,Denver Broncos,Gillette Stadium,NE,-13.5,50,,,,,45.0,10.0,False,True
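
The five rows shown above are all playoff games. Here is a quick sketch of how one might check whether missing temperatures cluster in playoff rows, using the `schedule_playoff` flag. The frame below is made up for illustration and is not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows: two regular-season games and two playoff games
games = pd.DataFrame({
    'schedule_playoff': [False, False, True, True],
    'weather_temperature': [72.0, 55.0, np.nan, np.nan],
})

# Count missing temperatures separately for regular-season and playoff games
missing_by_playoff = (
    games['weather_temperature'].isnull()
    .groupby(games['schedule_playoff'])
    .sum()
)
print(missing_by_playoff)
```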


In [8]:
game_info['schedule_week'].value_counts()

3             112
2             112
14            112
17            112
16            112
15            112
13            111
1             111
12            109
4             105
7             100
6             100
10             99
11             99
5              98
8              95
9              93
Division       28
Wildcard       24
Conference     14
Superbowl       6
WildCard        4
SuperBowl       1
Name: schedule_week, dtype: int64

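> Playoff games need to be dropped; only regular-season weeks 1 through 17 are needed. Note that the playoff labels are inconsistently capitalized (`Wildcard`/`WildCard`, `Superbowl`/`SuperBowl`).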

In [26]:
# Drop every playoff round, including both capitalizations of each label
playoff_weeks = ['Division', 'Wildcard', 'WildCard', 'Conference', 'Superbowl', 'SuperBowl']
game_info = game_info[~game_info['schedule_week'].isin(playoff_weeks)]

game_info['schedule_week'].value_counts()

2     112
3     112
14    112
17    112
15    112
16    112
1     111
13    111
12    109
4     105
7     100
6     100
10     99
11     99
5      98
8      95
9      93
Name: schedule_week, dtype: int64
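
As an alternative sketch, the data already carries a `schedule_playoff` flag, so playoff rows could be dropped without listing every spelling of each round. This assumes the flag is accurate for every row, which is not checked here. The toy frame and the names `games` and `regular` are hypothetical.

```python
import pandas as pd

# Toy frame with mixed regular-season and playoff labels
games = pd.DataFrame({
    'schedule_week': ['1', '17', 'Wildcard', 'WildCard', 'SuperBowl'],
    'schedule_playoff': [False, False, True, True, True],
})

# Keep regular-season rows only, relying on the flag rather than the labels
regular = games[~games['schedule_playoff']].copy()

# With the playoff labels gone, the week column can be stored as integers
regular['schedule_week'] = regular['schedule_week'].astype(int)
print(regular)
```

Storing the week as an integer also makes later sorting and filtering by week behave numerically rather than alphabetically.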

In [36]:
game_info.drop('weather_humidity', axis = 1, inplace = True)

In [38]:
game_info.isnull().sum().sort_values(ascending = False).head()

weather_detail      1256
schedule_playoff       0
stadium_neutral        0
score_away             0
score_home             0
dtype: int64

In [41]:
game_info['weather_detail'].value_counts()

DOME                    415
Rain                     72
Fog                      19
Rain | Fog               14
Snow                     11
Snow | Fog                4
Snow | Freezing Rain      1
Name: weather_detail, dtype: int64

In [43]:
# Assume games with no weather detail were sunny;
# cloudy, partly cloudy, etc. are all treated as sunny
# Fill only weather_detail, the one column that still has missing values
game_info['weather_detail'] = game_info['weather_detail'].fillna('Sunny')
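
Below is a small sketch of scoping the fill to `weather_detail` alone, on a made-up two-row frame (the values are illustrative, not from the dataset). A frame-wide `fillna('Sunny')` would apply the same string to every column that still had gaps, so restricting it to one column is the safer habit.

```python
import numpy as np
import pandas as pd

# Made-up frame: one gap in a text column and one gap in a numeric column
toy = pd.DataFrame({
    'weather_detail': [None, 'Rain'],
    'weather_wind_mph': [np.nan, 5.0],
})

# Fill the text column only; the numeric gap is left untouched
filled = toy.copy()
filled['weather_detail'] = filled['weather_detail'].fillna('Sunny')

print(filled)
```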

In [45]:
game_info.to_csv('../data/game_information.csv', index = False)