# DraftKings Data Wrangling

## Import relevant packages


In [1]:
import pandas as pd
import datetime as dt
import numpy as np
import json
import pickle
import time

## Clean DraftKings data

After scraping all the DraftKings data into a dictionary of dataframes (see Data Acquisition), there is still a lot of work to be done to clean it up. Looking at just one of the RotoGuru pages (http://rotoguru1.com/cgi-bin/hyday.pl?game=dk&mon=2&day=10&year=2018) suggests several steps to plan for going into the cleaning process.

- Get rid of all the dates that returned errors in the scraping process

- Remove all of the dates on which no games were played and therefore no data is available

- RotoGuru splits the data tables by position group, with a row of unnecessary text between the groups

- Columns may be misaligned, as there is no column name for the names of the players

- The player names have '^' in them to represent starters

- Many columns containing values are strings instead of integers or floats

- Opponents column has 'v' or '@' in front of the team abbreviation

- Make sure team abbreviations match the NBA abbreviations

- RotoGuru has unlisted players that may have played; those players do not have Position or Salary values

- The All-Star games don't have actual teams, and I will not need All-Star data for this project

### Import Data

In [2]:
#Import the raw data from pickle file; the with-block closes the file handle afterwards
with open('rotoDKraw.p','rb') as f:
    dkdata_raw = pickle.load(f)

### Concatenate Dictionary of DataFrames to One Table

The data was scraped into a dictionary of dataframes with dates as the keys. The goal here is to concatenate all of the dataframe values into one table with the dates as the index. 

In [3]:
#First, identify the dates that returned 'error' (a string instead of a dataframe) during the scraping process
errors = [date for date in dkdata_raw if isinstance(dkdata_raw[date], str)]

#Remove the dates with errors by keeping only the dates without errors in a dictionary
dkdata_df = {date:dkdata_raw[date] for date in dkdata_raw if date not in errors}

#Concatenate all the dataframes into one table; pd.concat uses the dictionary keys (the dates) as the outer index level
#Set ignore_index to False to keep the dates as the index, drop the second level of the index to leave only the dates
dkdata_df = pd.concat(dkdata_df, ignore_index = False).reset_index(level=1,drop = True)
dkdata_df.info()
dkdata_df.head(10)

<class 'pandas.core.frame.DataFrame'>
Index: 148216 entries, 2014-1-1 to 2018-3-9
Data columns (total 9 columns):
0    145119 non-null object
1    145159 non-null object
2    145194 non-null object
3    142096 non-null object
4    145118 non-null object
5    145194 non-null object
6    145194 non-null object
7    126675 non-null object
8    114883 non-null object
dtypes: object(9)
memory usage: 11.3+ MB


Unnamed: 0,0,1,2,3,4,5,6,7,8
2014-1-1,RotoGuru is produced by Dave Hall (a.k.a. the ...,,,,,,,,
2014-1-10,RotoGuru is produced by Dave Hall (a.k.a. the ...,,,,,,,,
2014-1-11,RotoGuru is produced by Dave Hall (a.k.a. the ...,,,,,,,,
2014-1-12,RotoGuru is produced by Dave Hall (a.k.a. the ...,,,,,,,,
2014-1-13,RotoGuru is produced by Dave Hall (a.k.a. the ...,,,,,,,,
2014-1-14,RotoGuru is produced by Dave Hall (a.k.a. the ...,,,,,,,,
2014-1-15,RotoGuru is produced by Dave Hall (a.k.a. the ...,,,,,,,,
2014-1-16,RotoGuru is produced by Dave Hall (a.k.a. the ...,,,,,,,,
2014-1-17,RotoGuru is produced by Dave Hall (a.k.a. the ...,,,,,,,,
2014-1-18,RotoGuru is produced by Dave Hall (a.k.a. the ...,,,,,,,,
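
To make the concatenation step easier to follow, here is a minimal sketch on toy data. The dictionary `toy_raw` and its contents are made up for illustration and only mimic the shape described above: dates as keys, dataframes as values, and a plain string where scraping failed.

```python
import pandas as pd

# Hypothetical stand-in for the scraped dictionary: two good dates and one failed date.
toy_raw = {
    '2014-10-28': pd.DataFrame({0: ['SG', 'PG'], 1: ['Harden, James^', 'Parker, Tony^']}),
    '2014-10-29': pd.DataFrame({0: ['C'], 1: ['Anthony, Joel']}),
    '2014-10-30': 'error',
}

# Keep only the entries that really are dataframes.
toy_frames = {date: df for date, df in toy_raw.items() if isinstance(df, pd.DataFrame)}

# The dictionary keys become the outer index level; dropping the inner
# (per-table row number) level leaves only the date as the index.
toy_df = pd.concat(toy_frames).reset_index(level=1, drop=True)
print(toy_df)
```

Every row of the result is labeled with the date of the table it came from, which is what lets the later steps treat the index as the game date.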


### Remove Unnecessary Rows

There are clearly a lot of unnecessary rows in the raw data, as we can see from the first 10 rows. The summary above the table shows 9 columns, and 8 of them are NaN in every row printed. Real player rows fill most of the columns, so it's safe to assume we want rows that have at least 5 non-null values.

In [4]:
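#Keep only the rows with at least 5 non-null values (axis is passed by keyword, which newer pandas versions require)
dkdata_df = dkdata_df.dropna(axis=0,thresh = 5)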
dkdata_df.info()
dkdata_df.head(10)


<class 'pandas.core.frame.DataFrame'>
Index: 145160 entries, 2014-10-28 to 2018-3-9
Data columns (total 9 columns):
0    142097 non-null object
1    145159 non-null object
2    145160 non-null object
3    142096 non-null object
4    145086 non-null object
5    145160 non-null object
6    145160 non-null object
7    126675 non-null object
8    114883 non-null object
dtypes: object(9)
memory usage: 11.1+ MB


Unnamed: 0,0,1,2,3,4,5,6,7,8
2014-10-28,Guards,DK Points,Salary,Team,Opp.,Score,Min,Stats,
2014-10-28,SG,"Harden, James^",45.75,"$10,200",hou,@ lal,108-90,31,32pt 1rb 6as 1st 3trey 7-17fg 15-16ft
2014-10-28,SG,"Ellis, Monta^",41,"$7,800",dal,@ sas,100-101,37,26pt 4rb 6as 1st 3to 1trey 11-21fg 3-4ft
2014-10-28,SG,"Ginobili, Manu",35,"$4,800",sas,v dal,101-100,28,20pt 2rb 6as 2st 3to 2trey 6-13fg 6-7ft
2014-10-28,PG,"Parker, Tony^",32.75,"$6,400",sas,v dal,101-100,35,23pt 3rb 3as 1to 4trey 9-15fg 1-2ft
2014-10-28,SG,"Evans, Tyreke^",32.25,"$6,600",nor,v orl,101-84,35,12pt 9rb 6as 5-15fg 2-5ft
2014-10-28,PG,"Harris, Devin",29,"$4,200",dal,@ sas,100-101,29,17pt 5as 2st 1to 2trey 6-12fg 3-4ft
2014-10-28,SG,"Bryant, Kobe^",27.25,"$7,800",lal,v hou,90-108,29,19pt 3rb 2as 1st 1to 6-17fg 7-8ft
2014-10-28,SG,"Belinelli, Marco^",24.5,"$4,200",sas,v dal,101-100,31,15pt 2rb 3as 1st 2to 3trey 5-8fg 2-2ft
2014-10-28,PG,"Holiday, Jrue^",23.5,"$7,800",nor,v orl,101-84,27,8pt 2rb 4as 3st 1bl 2to 4-11fg 0-0ft
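
The `thresh` argument is easy to misread, so here is a small sketch of how it behaves. The three toy rows are invented: a footer-style row with a single value, a complete player row, and a player row missing Position and Salary.

```python
import numpy as np
import pandas as pd

# Toy rows (made up): footer text, a full player row, and an unlisted-style player row.
toy = pd.DataFrame([
    ['RotoGuru is produced by ...', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    ['SG', 'Harden, James^', 45.75, '$10,200', 'hou', '@ lal', '108-90'],
    [np.nan, 'Marble, Devyn', 5.25, np.nan, 'orl', '@ nor', '84-101'],
])

# thresh=5 keeps a row only if it has at least 5 non-null values.
kept = toy.dropna(axis=0, thresh=5)
print(kept.index.tolist())
```

The footer row is dropped, while the row with two missing values survives because it still has 5 non-null entries. That is why the null Position and Salary values are still around to be dealt with later.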


### Assign Proper Column Names

That's better! However, the column names are scattered throughout the table in separate rows. We can see this in the first row, which contains column names, and in the last column, which has far fewer non-null values than the other columns.

Also, remember the point made in the intro about the player names not having a column name? As a result, the column names have shifted one spot to the left. So the goal here is to remove all the rows with column names, shift the 7 column names one column to the right, and assign 'Position' and 'Player' as the first 2 columns.

In [5]:
#take the column names from the first header row, skipping its group label ('Guards') and its trailing null
colnames = list(dkdata_df.iloc[0][1:-1])

#add Position and Player to the front of the list of column names and assign it to the dataframe
dkdata_df.columns = ['Position','Player']+colnames
dkdata_df.head()

Unnamed: 0,Position,Player,DK Points,Salary,Team,Opp.,Score,Min,Stats
2014-10-28,Guards,DK Points,Salary,Team,Opp.,Score,Min,Stats,
2014-10-28,SG,"Harden, James^",45.75,"$10,200",hou,@ lal,108-90,31,32pt 1rb 6as 1st 3trey 7-17fg 15-16ft
2014-10-28,SG,"Ellis, Monta^",41,"$7,800",dal,@ sas,100-101,37,26pt 4rb 6as 1st 3to 1trey 11-21fg 3-4ft
2014-10-28,SG,"Ginobili, Manu",35,"$4,800",sas,v dal,101-100,28,20pt 2rb 6as 2st 3to 2trey 6-13fg 6-7ft
2014-10-28,PG,"Parker, Tony^",32.75,"$6,400",sas,v dal,101-100,35,23pt 3rb 3as 1to 4trey 9-15fg 1-2ft
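
The shift is easier to see on a cut-down example. This is only a sketch on a made-up 5-column table; the real data has 9 columns, but the idea is the same.

```python
import pandas as pd

# Toy table (made up): a header row that is shifted one spot left, then a player row.
toy = pd.DataFrame([
    ['Guards', 'DK Points', 'Salary', 'Team', None],
    ['SG', 'Harden, James^', '45.75', '$10,200', 'hou'],
])

# Skip the group label in the first cell and the empty last cell of the header row.
shifted = toy.iloc[0, 1:-1].tolist()

# Prepend the two names the site never provides.
toy.columns = ['Position', 'Player'] + shifted
print(toy.columns.tolist())
```

After the rename, each header sits above the values it actually describes. The header row itself is still in the table at this point and has to be filtered out separately.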


In [6]:
#See if we can filter out the header rows by length of string in Position column, ignore nulls for now
set(dkdata_df.Position.dropna())


{'C',
 'Centers',
 'Forwards',
 'Guards',
 'PF',
 'PF/C',
 'PG',
 'PG/SF',
 'PG/SG',
 'SF',
 'SF/PF',
 'SG',
 'SG/SF',
 'Unlisted'}

In [7]:
#The longest positional text has 5 characters, i.e. 'PG/SF','PG/SG'
#The shortest group label is 'Guards', which has 6 characters.
#So we want all the rows in the data where the Position column is 5 characters or less.
#KEEP NULL VALUES
dkdata_df = dkdata_df.loc[(dkdata_df.Position.str.len() < 6) | dkdata_df.Position.isnull()]
print(dkdata_df.info())
dkdata_df.head()

<class 'pandas.core.frame.DataFrame'>
Index: 142351 entries, 2014-10-28 to 2018-3-9
Data columns (total 9 columns):
Position     139288 non-null object
Player       142350 non-null object
DK Points    142351 non-null object
Salary       139287 non-null object
Team         142277 non-null object
Opp.         142351 non-null object
Score        142351 non-null object
Min          123866 non-null object
Stats        114883 non-null object
dtypes: object(9)
memory usage: 10.9+ MB
None


Unnamed: 0,Position,Player,DK Points,Salary,Team,Opp.,Score,Min,Stats
2014-10-28,SG,"Harden, James^",45.75,"$10,200",hou,@ lal,108-90,31,32pt 1rb 6as 1st 3trey 7-17fg 15-16ft
2014-10-28,SG,"Ellis, Monta^",41.0,"$7,800",dal,@ sas,100-101,37,26pt 4rb 6as 1st 3to 1trey 11-21fg 3-4ft
2014-10-28,SG,"Ginobili, Manu",35.0,"$4,800",sas,v dal,101-100,28,20pt 2rb 6as 2st 3to 2trey 6-13fg 6-7ft
2014-10-28,PG,"Parker, Tony^",32.75,"$6,400",sas,v dal,101-100,35,23pt 3rb 3as 1to 4trey 9-15fg 1-2ft
2014-10-28,SG,"Evans, Tyreke^",32.25,"$6,600",nor,v orl,101-84,35,12pt 9rb 6as 5-15fg 2-5ft
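
The explicit `isnull()` in the filter matters more than it looks. A quick sketch on a toy Series (values chosen to mirror the set printed above) shows what happens to null positions without it:

```python
import numpy as np
import pandas as pd

positions = pd.Series(['SG', 'PG/SF', 'Guards', 'Unlisted', np.nan])

# .str.len() returns NaN for a null entry, and NaN < 6 evaluates to False.
short_enough = positions.str.len() < 6

# OR-ing with isnull() puts the null rows back into the selection.
keep = short_enough | positions.isnull()
print(keep.tolist())
```

Without the second condition, every player with a null Position would have been silently dropped along with the header rows.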


### Clean up each Column

Now that our columns are all set up and we got rid of the unnecessary rows, let's clean up each column.

#### Index

In [8]:
#First, convert the index from date strings to datetime.date objects
dkdata_df.index = pd.to_datetime(dkdata_df.index).date

#check the null value situation of the data
print(dkdata_df.isnull().sum())

Position      3063
Player           1
DK Points        0
Salary        3064
Team            74
Opp.             0
Score            0
Min          18485
Stats        27468
dtype: int64
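
One reason to convert the index away from text is ordering. The scraped keys are not zero-padded (e.g. '2014-1-1'), so as strings they do not sort chronologically. A minimal sketch with made-up keys in the same style:

```python
from datetime import datetime

# Hypothetical keys in the same non-padded style as the scraped dates.
raw_keys = ['2014-10-28', '2014-2-3', '2014-1-1']

# Sorted as text: October lands before February.
as_text = sorted(raw_keys)

# Sorted as real dates: chronological order.
as_dates = sorted(datetime.strptime(k, '%Y-%m-%d').date() for k in raw_keys)

print(as_text)
print(as_dates)
```

Any later sorting, slicing by date range, or merging on dates behaves as expected only once the index holds real dates.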


#### Player column

In [9]:
#Let's start by getting rid of the null player value
#.copy() avoids pandas' SettingWithCopyWarning when columns are assigned below
dkdata_df = dkdata_df.loc[dkdata_df.Player.notnull()].copy()

#If a player started, they have a '^' by their name. This function helps us identify starters
def idStarter(name):
    if '^' in name:
        return 'starter'
    return 'bench'

#create a column to identify starters using idStarter function
dkdata_df['starter'] = [idStarter(name) for name in dkdata_df.Player]

#remove the '^' in the player name column that id's starters
#regex=False treats '^' as a literal character rather than a regex start-of-string anchor
dkdata_df.Player = dkdata_df.Player.str.replace('^','',regex=False)

#flip first and last name and remove comma in order to match BBall Ref and NBA API player names
dkdata_df.Player = [' '.join(s.split(',')[::-1]).strip() for s in dkdata_df.Player]
dkdata_df.head()


Unnamed: 0,Position,Player,DK Points,Salary,Team,Opp.,Score,Min,Stats,starter
2014-10-28,SG,James Harden,45.75,"$10,200",hou,@ lal,108-90,31,32pt 1rb 6as 1st 3trey 7-17fg 15-16ft,starter
2014-10-28,SG,Monta Ellis,41.0,"$7,800",dal,@ sas,100-101,37,26pt 4rb 6as 1st 3to 1trey 11-21fg 3-4ft,starter
2014-10-28,SG,Manu Ginobili,35.0,"$4,800",sas,v dal,101-100,28,20pt 2rb 6as 2st 3to 2trey 6-13fg 6-7ft,bench
2014-10-28,PG,Tony Parker,32.75,"$6,400",sas,v dal,101-100,35,23pt 3rb 3as 1to 4trey 9-15fg 1-2ft,starter
2014-10-28,SG,Tyreke Evans,32.25,"$6,600",nor,v orl,101-84,35,12pt 9rb 6as 5-15fg 2-5ft,starter
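
The two Player steps can also be folded into one small helper. This is just a sketch of the same idea in plain Python, not the code used above; `clean_player` is a hypothetical name, and the single-name fallback is my own assumption about how such entries should be handled.

```python
def clean_player(raw_name):
    """Return (display_name, role) for a 'Last, First^' style string."""
    # The '^' marker flags a starter; everyone else came off the bench.
    role = 'starter' if '^' in raw_name else 'bench'
    name = raw_name.replace('^', '')

    # Split on the first comma only, so a suffix in the last name stays attached to it.
    last, sep, first = name.partition(',')
    if not sep:
        # Assumption: a name without a comma is returned as-is.
        return name.strip(), role
    return f"{first.strip()} {last.strip()}", role


print(clean_player('Harden, James^'))
print(clean_player('Ginobili, Manu'))
```

Plain `str.replace` in Python always treats '^' literally, so there is no regex behavior to worry about in this version.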


#### Team and Opp. 
We want the team abbreviations to be capitalized, with 30 distinct teams and no All-Star teams represented. Plus, we don't want the extra 'v' or '@' in front of the values in the 'Opp.' column.

In [10]:
#Identify the unique teams
print(dkdata_df.Team.unique())

['hou' 'dal' 'sas' 'nor' 'lal' 'orl' 'gsw' 'okc' 'mil' 'phi' 'cha' 'tor'
 'bos' 'sac' 'ind' 'atl' 'was' 'bkn' 'pho' 'por' 'mia' 'uta' 'min' 'mem'
 'chi' 'det' 'den' 'nyk' 'lac' 'cle' 'WES' 'EAS' nan]


In [11]:
#create boolean masks that flag the rows with null team values, west allstars, and east allstars
noteam = (dkdata_df.Team.isnull())
westallstar = (dkdata_df.Team == 'WES')
eastallstar = (dkdata_df.Team == 'EAS')
allstarweek = dkdata_df[noteam | westallstar | eastallstar]
#DataFrame.append was removed in pandas 2.0, so stack the first and last rows with pd.concat
pd.concat([allstarweek.head(), allstarweek.tail()])


Unnamed: 0,Position,Player,DK Points,Salary,Team,Opp.,Score,Min,Stats,starter
2015-02-15,PG,Russell Westbrook,56.75,"$10,600",WES,@ EAS,163-158,26,41pt 5rb 1as 3st 1to 5trey 16-28fg 4-4ft,bench
2015-02-15,PG,Chris Paul,46.5,"$9,800",WES,@ EAS,163-158,27,12pt 6rb 15as 2st 2to 6-13fg 0-0ft,bench
2015-02-15,PF,Tim Duncan,16.25,"$7,400",WES,@ EAS,163-158,15,2pt 9rb 2as 1-2fg 0-0ft,bench
2015-02-15,PF,Dirk Nowitzki,15.75,"$6,700",WES,@ EAS,163-158,12,5pt 5rb 2st 1trey 2-5fg 0-0ft,bench
2015-02-15,SF,Kevin Durant,9.75,"$10,600",WES,@ EAS,163-158,10,3pt 3rb 1as 1st 2to 1trey 1-6fg 0-0ft,bench
2018-02-18,C,Andre Drummond,19.25,"$5,400",,@,148-,17:36,14pt 3rb 1bl 1to 7-7fg,bench
2018-02-18,PF/C,Draymond Green,18.25,"$5,800",,v,148-,18:15,3pt 5rb 2as 1st 2bl 3-4ft,bench
2018-02-18,PF/C,Al Horford,17.25,"$5,100",,v,148-,12:45,6pt 5rb 2as 1st 2-4fg 2-2ft,bench
2018-02-18,PF/C,Anthony Davis,15.5,"$10,000",,@,148-,16:40,12pt 2rb 1as 1to 6-9fg,starter
2018-02-18,PF/C,LaMarcus Aldridge,2.0,"$5,000",,@,148-,4:28,0pt 1bl 0-1fg,bench


In [12]:
#as seen from the above table, the null values also seem to be all star week, so we'll remove them all
dkdata_df = dkdata_df[~(noteam | westallstar | eastallstar)]   

#convert lowercase team abbreviations into uppercase
dkdata_df.Team = dkdata_df.Team.str.upper()

#The NBA abbreviations for the New Orleans Pelicans and Phoenix Suns are 'NOP' and 'PHX', respectively
dkdata_df.Team = dkdata_df.Team.str.replace('NOR','NOP') 
dkdata_df.Team = dkdata_df.Team.str.replace('PHO','PHX') 


#get rid of '@' and 'v' in the opponent column by keeping only the last 3 characters, then uppercase and change NOR and PHO
dkdata_df['Opp.'] = [team[-3:] for team in dkdata_df['Opp.']]
dkdata_df['Opp.'] = dkdata_df['Opp.'].str.upper()
dkdata_df['Opp.'] = dkdata_df['Opp.'].str.replace('NOR','NOP') 
dkdata_df['Opp.']  = dkdata_df['Opp.'].str.replace('PHO','PHX') 

print(dkdata_df.Team.unique())
print(dkdata_df['Opp.'].unique())

['HOU' 'DAL' 'SAS' 'NOP' 'LAL' 'ORL' 'GSW' 'OKC' 'MIL' 'PHI' 'CHA' 'TOR'
 'BOS' 'SAC' 'IND' 'ATL' 'WAS' 'BKN' 'PHX' 'POR' 'MIA' 'UTA' 'MIN' 'MEM'
 'CHI' 'DET' 'DEN' 'NYK' 'LAC' 'CLE']
['LAL' 'SAS' 'DAL' 'ORL' 'HOU' 'NOP' 'SAC' 'POR' 'CHA' 'IND' 'MIL' 'PHX'
 'ATL' 'BKN' 'UTA' 'GSW' 'PHI' 'TOR' 'MIA' 'BOS' 'OKC' 'WAS' 'MEM' 'MIN'
 'NYK' 'DEN' 'DET' 'CHI' 'CLE' 'LAC' 'V H' 'V A' 'V']


In [13]:
#Get rid of all the rows with the weird opponent names 'V H', 'V A', 'V'
dkdata_df = dkdata_df.loc[~dkdata_df['Opp.'].isin(['V A', 'V H', 'V'])]

print(dkdata_df['Opp.'].unique())

['LAL' 'SAS' 'DAL' 'ORL' 'HOU' 'NOP' 'SAC' 'POR' 'CHA' 'IND' 'MIL' 'PHX'
 'ATL' 'BKN' 'UTA' 'GSW' 'PHI' 'TOR' 'MIA' 'BOS' 'OKC' 'WAS' 'MEM' 'MIN'
 'NYK' 'DEN' 'DET' 'CHI' 'CLE' 'LAC']
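
Taking the last 3 characters works, but it is what let the odd 'V' values slip through in the first place. A stricter alternative is sketched below in plain Python. `parse_opponent` and `ABBREV_FIXES` are hypothetical names, and reading 'v' as a home game and '@' as an away game is my assumption based on how the sample rows pair up.

```python
# Hypothetical mapping from the site's spellings to the NBA abbreviations.
ABBREV_FIXES = {'NOR': 'NOP', 'PHO': 'PHX'}


def parse_opponent(raw):
    """Split '@ lal' or 'v dal' into (is_home, 'LAL'); return None if no valid team code is found."""
    parts = raw.split()
    # Expect exactly a marker and a 3-letter team code.
    if len(parts) != 2 or parts[0] not in ('@', 'v') or len(parts[1]) != 3:
        return None
    team = parts[1].upper()
    # Assumption: 'v' marks a home game, '@' an away game.
    return parts[0] == 'v', ABBREV_FIXES.get(team, team)


print(parse_opponent('@ lal'))
print(parse_opponent('v nor'))
print(parse_opponent('@'))
```

Rows that come back as `None` are the ones without a usable opponent, so they can be dropped in one step. The home/away flag is a bonus that the slicing approach throws away.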


#### Position

In [14]:
#analyze the rows with null values in the position column
print('Position Null =', dkdata_df.Position.isnull().sum())
dkdata_df.loc[dkdata_df.Position.isnull()].head()

Position Null = 2898


Unnamed: 0,Position,Player,DK Points,Salary,Team,Opp.,Score,Min,Stats,starter
2014-10-28,,Devyn Marble,5.25,,ORL,NOP,84-101,3,4pt 1rb 2-2fg 0-2ft,bench
2014-10-29,,Malcolm Thomas,11.25,,PHI,IND,91-103,16,4pt 5rb 1bl 2to 1-2fg 2-2ft,bench
2014-10-29,,Joel Anthony,9.25,,DET,DEN,79-89,21,2pt 5rb 1bl 2to 1-1fg 0-0ft,bench
2014-10-30,,Nene Hilario,30.5,,WAS,ORL,105-98,34,12pt 4rb 5as 3st 5-10fg 2-4ft,starter
2014-10-30,,Jose Barea,13.0,,DAL,UTA,120-102,19,4pt 4rb 3as 1to 2-5fg 0-0ft,bench


In [15]:
#They seem to have all the information besides position and salary, let's fill these null values with the value
#from the nearest time the same player appeared in the data

#Group the data by player and use groupby.transform to forward-fill (then back-fill) the position for the same player
dkdata_df['Position'] = dkdata_df.groupby('Player')['Position'].transform(lambda pos: pos.ffill().bfill())

print('Position Null =', dkdata_df.Position.isnull().sum())
dkdata_df.loc[dkdata_df.Position.isnull()].head()

Position Null = 135


Unnamed: 0,Position,Player,DK Points,Salary,Team,Opp.,Score,Min,Stats,starter
2014-11-19,,Kalin Lucas,2.0,,MEM,TOR,92-96,6,0pt 1st 0-1fg 0-0ft,bench
2017-10-01,,Andy Rautins,2.0,,TOR,LAC,121-113,4:50,0pt 1st 0-1fg,bench
2017-10-10,,Amida Brimah,11.25,,SAS,ORL,98-103,5:10,3pt 5rb 1bl 1-1fg 1-4ft,bench
2017-10-10,,Matt Costello,2.25,,SAS,ORL,98-103,10:03,0pt 1rb 1bl 2to 0-2fg,bench
2017-10-11,,Isaiah Briscoe,13.75,,POR,PHX,113-104,12:00,11pt 1rb 2as 3to 5-6fg 1-1ft,bench
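
Here is a small sketch of why some nulls survive the fill. The toy table is made up: one player who has a position listed on at least one date, and one player who never does.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Player': ['Nene Hilario', 'Nene Hilario', 'Kalin Lucas', 'Nene Hilario'],
    'Position': [np.nan, 'PF/C', np.nan, np.nan],
})

# Within each player's rows: carry the last known value forward,
# then fill any leading gaps from the next known value.
toy['Position'] = toy.groupby('Player')['Position'].transform(
    lambda pos: pos.ffill().bfill()
)
print(toy)
```

The first player gets a position on every row, including the row before the first known value thanks to `bfill()`. The second player stays null because there is nothing in their group to copy from, which matches the 135 leftover rows above.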


#### Salary
There are still 135 null values in the Position column. Let's work with the Salary column first and see whether the nulls in the two columns coincide.

In [16]:
#We will use the same estimation for Salary as we did for Position, as a player's fantasy salary generally
#doesn't vary too much from game to game
dkdata_df['Salary'] = dkdata_df.groupby('Player')['Salary'].transform(lambda sal: sal.ffill().bfill())

print('Salary Null =', dkdata_df.Salary.isnull().sum())
dkdata_df.loc[dkdata_df.Salary.isnull()].head(10)

Salary Null = 135


Unnamed: 0,Position,Player,DK Points,Salary,Team,Opp.,Score,Min,Stats,starter
2014-11-19,,Kalin Lucas,2.0,,MEM,TOR,92-96,6,0pt 1st 0-1fg 0-0ft,bench
2017-10-01,,Andy Rautins,2.0,,TOR,LAC,121-113,4:50,0pt 1st 0-1fg,bench
2017-10-10,,Amida Brimah,11.25,,SAS,ORL,98-103,5:10,3pt 5rb 1bl 1-1fg 1-4ft,bench
2017-10-10,,Matt Costello,2.25,,SAS,ORL,98-103,10:03,0pt 1rb 1bl 2to 0-2fg,bench
2017-10-11,,Isaiah Briscoe,13.75,,POR,PHX,113-104,12:00,11pt 1rb 2as 3to 5-6fg 1-1ft,bench
2017-10-11,,Terry Henderson,10.25,,CHA,BOS,100-108,15:05,5pt 1rb 1as 1bl 1trey 1-4fg 2-2ft,bench
2017-10-11,,Isaiah Taylor,5.0,,HOU,MEM,101-89,4:05,4pt 1as 1to 2-3fg,bench
2017-10-11,,Rade Zagorac,4.0,,MEM,HOU,89-101,7:16,2pt 2rb 1to 0-1fg 2-2ft,bench
2017-10-11,,T.J. Williams,0.25,,CHA,BOS,100-108,9:26,0pt 1rb 2to 0-1fg,bench
2017-10-12,,Maalik Wayns,25.0,,DAL,ATL,108-94,23:42,10pt 4rb 6as 1st 4to 2trey 3-10fg 2-2ft,bench


The number of missing positions and salaries is the same, and judging by the names, these appear to be two-way players. Two-way contracts are a new concept in the NBA as of the 2017-18 season, so DraftKings might not have assigned salaries or positions to these players since they are not consistent members of the roster. However, the minimum salary for players in DraftKings is 3000 dollars, so we will fill the remaining null salaries with that value.

In [17]:
#fill the remaining null salaries with the DraftKings minimum salary; no grouping is needed for a constant fill
dkdata_df['Salary'] = dkdata_df['Salary'].fillna('$3,000')
dkdata_df.isnull().sum()

Position       135
Player           0
DK Points        0
Salary           0
Team             0
Opp.             0
Score            0
Min          18483
Stats        27431
starter          0
dtype: int64

In [18]:
#Now that we filled all the salary values, let's convert them into a numeric type
#Take everything after the '$', remove the comma, convert to int
dkdata_df.Salary = [int(x[1:].replace(',','')) if str(x) != 'nan' else x for x in dkdata_df.Salary ]
dkdata_df.Salary.dtypes
dkdata_df.head()

Unnamed: 0,Position,Player,DK Points,Salary,Team,Opp.,Score,Min,Stats,starter
2014-10-28,SG,James Harden,45.75,10200,HOU,LAL,108-90,31,32pt 1rb 6as 1st 3trey 7-17fg 15-16ft,starter
2014-10-28,SG,Monta Ellis,41.0,7800,DAL,SAS,100-101,37,26pt 4rb 6as 1st 3to 1trey 11-21fg 3-4ft,starter
2014-10-28,SG,Manu Ginobili,35.0,4800,SAS,DAL,101-100,28,20pt 2rb 6as 2st 3to 2trey 6-13fg 6-7ft,bench
2014-10-28,PG,Tony Parker,32.75,6400,SAS,DAL,101-100,35,23pt 3rb 3as 1to 4trey 9-15fg 1-2ft,starter
2014-10-28,SG,Tyreke Evans,32.25,6600,NOP,ORL,101-84,35,12pt 9rb 6as 5-15fg 2-5ft,starter
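
The fill and the conversion could also be done in a single pass. The helper below is only a sketch; `parse_salary` is a hypothetical name, and falling back to the 3000 minimum for anything unparseable is the same assumption made above for the null salaries.

```python
def parse_salary(raw, default=3000):
    """Turn a '$10,200' style string into an int; fall back to `default` otherwise."""
    # Nulls arrive as float NaN rather than as strings.
    if not isinstance(raw, str):
        return default
    digits = raw.replace('$', '').replace(',', '').strip()
    # Assumption: anything that is not purely digits gets the minimum salary.
    return int(digits) if digits.isdigit() else default


print(parse_salary('$10,200'))
print(parse_salary(float('nan')))
```

Applied to the column with something like `dkdata_df.Salary.map(parse_salary)`, this would give integers directly without the intermediate '$3,000' strings.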


#### DK Points

In [19]:
#convert DK Points column to numeric
dkdata_df['DK Points'] = pd.to_numeric(dkdata_df['DK Points'])
dkdata_df['DK Points'].dtypes

dtype('float64')

In [20]:
dkdata_df.head()

Unnamed: 0,Position,Player,DK Points,Salary,Team,Opp.,Score,Min,Stats,starter
2014-10-28,SG,James Harden,45.75,10200,HOU,LAL,108-90,31,32pt 1rb 6as 1st 3trey 7-17fg 15-16ft,starter
2014-10-28,SG,Monta Ellis,41.0,7800,DAL,SAS,100-101,37,26pt 4rb 6as 1st 3to 1trey 11-21fg 3-4ft,starter
2014-10-28,SG,Manu Ginobili,35.0,4800,SAS,DAL,101-100,28,20pt 2rb 6as 2st 3to 2trey 6-13fg 6-7ft,bench
2014-10-28,PG,Tony Parker,32.75,6400,SAS,DAL,101-100,35,23pt 3rb 3as 1to 4trey 9-15fg 1-2ft,starter
2014-10-28,SG,Tyreke Evans,32.25,6600,NOP,ORL,101-84,35,12pt 9rb 6as 5-15fg 2-5ft,starter


##  Conclusion

The Min and Stats columns don't concern us too much since we will pull the stats from the NBA.com API, and we mentioned the null Position values. Everything else is good to go and we now have a timeseries of DraftKings score and salary data. 
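
Before moving on, a quick sanity check along these lines can confirm the table is in the shape described. This is a sketch only; `find_problems` is a hypothetical helper, the checks are my own choice, and the one-row toy table stands in for the cleaned `dkdata_df`.

```python
import pandas as pd


def find_problems(df):
    """Return a list of readable problems; an empty list means every check passed."""
    problems = []
    if df['Team'].nunique() > 30 or df['Opp.'].nunique() > 30:
        problems.append('more than 30 team abbreviations')
    if not pd.api.types.is_numeric_dtype(df['Salary']):
        problems.append('Salary is not numeric')
    if not pd.api.types.is_numeric_dtype(df['DK Points']):
        problems.append('DK Points is not numeric')
    if df[['Player', 'Team', 'Opp.', 'Salary']].isnull().any().any():
        problems.append('unexpected nulls')
    return problems


# Toy stand-in for the cleaned table.
toy = pd.DataFrame({
    'Player': ['James Harden'], 'Team': ['HOU'], 'Opp.': ['LAL'],
    'Salary': [10200], 'DK Points': [45.75],
})
print(find_problems(toy))
```

Position is deliberately left out of the null check, since the leftover null positions are already known and accepted.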

Next up, NBA. Check the NBA Data Wrangling Notebook to see how I went about cleaning it.
