# General Assembly DSI - Denver 2018
## Capstone Project - DFS Model
This is my capstone project for the fifth cohort of General Assembly's [Data Science Immersive](https://generalassemb.ly/education/data-science-immersive) in 2018. I am developing a model to help optimize NFL lineups on the daily fantasy sports platforms [DraftKings](https://www.draftkings.com/) and [FanDuel](https://www.fanduel.com/).

### Problem Statement

Can we build a model that predicts a football player’s fantasy performance well enough to estimate their value, and then pair that model with a daily fantasy strategy to be profitable?

### FanDuel Data Gathering and Cleaning

I am collecting the following information about NFL games:
- FanDuel (thanks to [this guy](https://github.com/rogerfitz/tutorials/tree/master/draft-kings-history-scrape))
    - [FanDuel Points & Pricing History from 2011 to 2017](http://rotoguru1.com/cgi-bin/fyday.pl?week=1&year=2011&game=fd) | Scrape

In [1]:
import pandas as pd
import numpy as np

# The following imports were used to scrape the FanDuel and DraftKings data.
# Running them creates a '__pycache__' folder in my repo, so I have commented
# them out now that the data has been gathered.
# import scraper
# import io

### Gathering

#### Scraping FanDuel

In [2]:
# Scraping FanDuel data
# Commented out so it doesn't run by accident

# url = 'http://rotoguru1.com/cgi-bin/fyday.pl?game=fd&scsv=1&week=WEEK&year=YEAR'
# weeks = list(map(str, range(1,18)))
# years = list(map(str, range(2011, 2018)))

# fanduel = pd.DataFrame()

# for yr in years:
#     for wk in weeks:
#         soup=scraper.soup(url.replace('WEEK',wk).replace('YEAR',yr))
#         fanduel=pd.concat([fanduel,pd.read_csv(io.StringIO(soup.find("pre").text),sep=";")])
        
# fanduel

# fanduel.to_csv('../data/fanduel.csv', index = False)
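
The commented-out loop above can be sketched without touching the network. This is only a rough illustration, not the scraper actually used: `build_urls` and `parse_scsv` are hypothetical helper names, and `sample_week` is a hand-built stand-in for the semicolon-delimited text inside the page's `<pre>` tag, using two rows from the data shown later in this notebook. It also collects the weekly frames in a list and concatenates once, rather than growing a DataFrame inside the loop.

```python
import io

import pandas as pd

# Same URL pattern as the commented-out cell, written as a format string.
URL_TEMPLATE = (
    "http://rotoguru1.com/cgi-bin/fyday.pl?game=fd&scsv=1&week={week}&year={year}"
)


def build_urls(years, weeks):
    """Hypothetical helper: one URL per (year, week) pair."""
    return [URL_TEMPLATE.format(week=w, year=y) for y in years for w in weeks]


def parse_scsv(text):
    """Hypothetical helper: parse one week of semicolon-delimited text."""
    return pd.read_csv(io.StringIO(text), sep=";")


# Stand-in for the <pre> text of a single week (assumed layout).
sample_week = (
    "Week;Year;GID;Name;Pos;Team;h/a;Oppt;FD points;FD salary\n"
    "1;2011;1131;Brady, Tom;QB;nwe;a;mia;35.98;9200\n"
    "1;2011;1309;Henne, Chad;QB;mia;h;nwe;35.54;6800\n"
)

urls = build_urls(range(2011, 2018), range(1, 18))  # 7 seasons x 17 weeks

# In a real run there would be one frame per URL; here there is only the sample.
frames = [parse_scsv(sample_week)]
fanduel_sample = pd.concat(frames, ignore_index=True)
```

Because the separator is `;`, the comma inside names such as `Brady, Tom` stays within a single column.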

### Cleaning

In [3]:
# Read in Data
fanduel = pd.read_csv('../data/fanduel.csv') # FanDuel | Player, Salary, Points, 2011 to 2017

In [4]:
fanduel.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50954 entries, 0 to 50953
Data columns (total 10 columns):
Week         50954 non-null int64
Year         50954 non-null int64
GID          50954 non-null object
Name         50951 non-null object
Pos          50954 non-null object
Team         50951 non-null object
h/a          50954 non-null object
Oppt         50951 non-null object
FD points    50954 non-null float64
FD salary    50878 non-null float64
dtypes: float64(2), int64(2), object(6)
memory usage: 3.9+ MB


In [5]:
print(fanduel.isnull().sum())

Week          0
Year          0
GID           0
Name          3
Pos           0
Team          3
h/a           0
Oppt          3
FD points     0
FD salary    76
dtype: int64


> Only 76 rows are missing a salary and 3 are missing a name, team, and opponent. I don't think these values can really be imputed, so I'm going to drop those rows.

In [6]:
fanduel.dropna(inplace = True)
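
Before dropping, it can be worth looking at which rows are affected and counting how many go away. The sketch below does that on a made-up three-row frame; `toy` and its values are illustrative only and are not taken from `fanduel.csv`.

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative values only) with gaps in the same kinds of columns.
toy = pd.DataFrame({
    "Name": ["Brady, Tom", None, "Newton, Cam"],
    "Team": ["nwe", None, "car"],
    "FD points": [35.98, 0.0, 31.68],
    "FD salary": [9200.0, 5000.0, np.nan],
})

# Rows that have at least one missing value, for a quick look before dropping.
rows_with_gaps = toy[toy.isnull().any(axis=1)]

# Drop them and record how many rows were removed.
cleaned = toy.dropna().reset_index(drop=True)
dropped = len(toy) - len(cleaned)
```

Assigning the result of `dropna()` to a new name, instead of using `inplace=True`, keeps the original frame available for this kind of before-and-after comparison.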

> Next, I want to format player names and teams to match the other dataframes so they can be joined.

In [7]:
fanduel.head()

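|   | Week | Year | GID | Name | Pos | Team | h/a | Oppt | FD points | FD salary |
|---|------|------|-----|------|-----|------|-----|------|-----------|-----------|
| 0 | 1 | 2011 | 1131 | Brady, Tom | QB | nwe | a | mia | 35.98 | 9200.0 |
| 1 | 1 | 2011 | 1309 | Henne, Chad | QB | mia | h | nwe | 35.54 | 6800.0 |
| 2 | 1 | 2011 | 1378 | Newton, Cam | QB | car | a | ari | 31.68 | 6700.0 |
| 3 | 1 | 2011 | 1151 | Brees, Drew | QB | nor | a | gnb | 29.06 | 8900.0 |
| 4 | 1 | 2011 | 1242 | Fitzpatrick, Ryan | QB | buf | a | kan | 24.62 | 7900.0 |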
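
The names in this output are stored as `Last, First`. A minimal sketch of one way to reformat them for joining, assuming the target format is `First Last` (the format of the other dataframes is not shown here, so that is a guess). `first_last` is a hypothetical helper, not something from the original notebook.

```python
import pandas as pd

# Names in the "Last, First" layout seen in the FanDuel data.
names = pd.Series(["Brady, Tom", "Henne, Chad", "Fitzpatrick, Ryan"])


def first_last(name):
    """Turn 'Last, First' into 'First Last'; leave other strings unchanged."""
    if ", " not in name:
        return name
    last, first = name.split(", ", 1)
    return f"{first} {last}"


formatted = names.map(first_last)
```

Splitting only on the first `", "` means any value without a comma passes through untouched instead of raising an error.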


In [8]:
len(fanduel['Team'].unique())

34

> The NFL has 32 teams, so 34 unique values suggests this data still carries the old codes for relocated franchises. That will differ from my defensive data, which already changes the SD Chargers to LAC and the STL Rams to LA.
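
One way to reconcile the relocated franchises is a small replacement map applied to the team column. This is a sketch under assumptions: the old codes `sdg` and `stl` and the new codes `lac` and `la` are guesses at how each dataset spells them, so check them against `fanduel['Team'].unique()` and the defensive data before relying on the mapping.

```python
import pandas as pd

# Hypothetical old -> new codes; verify against the real unique values first.
TEAM_FIXES = {"sdg": "lac", "stl": "la"}

# Small stand-in for fanduel['Team'].
teams = pd.Series(["nwe", "sdg", "stl", "lac", "mia"])

# Values not listed in the map are left as they are.
normalized = teams.replace(TEAM_FIXES)

unique_before = teams.nunique()
unique_after = normalized.nunique()
```

The same map would need to be applied to the `Oppt` column so both sides of each matchup use one set of codes.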

In [9]:
fanduel.isnull().sum()

Week         0
Year         0
GID          0
Name         0
Pos          0
Team         0
h/a          0
Oppt         0
FD points    0
FD salary    0
dtype: int64

In [10]:
fanduel.to_csv('../data/fanduel.csv', index = False)