# Getting DK Salary Data through MySportsFeeds API

The [mysportsfeeds.com API](https://www.mysportsfeeds.com/data-feeds/) offers non-commercial access to its data feeds to anyone who becomes a Patron at [Patreon](https://www.patreon.com/mysportsfeeds/overview). With Python, the API can be used in a couple of ways:

1. Make GET requests to the API URL, specifying the requisite parameters and API access keys. More info is in their [API documentation](https://www.mysportsfeeds.com/data-feeds/api-docs/).

2. Use their Python library `ohmysportsfeedspy`, available on [GitHub](https://github.com/MySportsFeeds/mysportsfeeds-python) or through the [Python Package Index](https://pypi.org/project/ohmysportsfeedspy/). This Python library provides wrapper functions around the MySportsFeeds API for easy use.

Here, I'll be going with option 1, making GET requests directly to MySportsFeeds' RESTful API. As noted in the documentation, some parameters (e.g. date) are required when making requests. Additionally, the API uses [HTTP Basic Authentication](https://en.wikipedia.org/wiki/Basic_access_authentication) to validate, via my MySportsFeeds API credentials, that the client (me) is allowed to access the resources it governs.

In [1]:
# Modules used in this notebook.
import base64
import requests
import json
import pandas as pd
import datetime as dt

## Write Functions for making API calls

*Note that I did not include my API key or password in this notebook. It is best practice not to share these in public repos.*
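
One common way to keep credentials out of a notebook is to read them from environment variables. A minimal sketch follows; the variable names `MSF_API_KEY` and `MSF_PASSWORD` and the helper `load_credentials` are hypothetical choices for illustration, not something MySportsFeeds prescribes.

```python
import os


def load_credentials(env=None):
    """Return (username, password) read from environment variables.

    MSF_API_KEY and MSF_PASSWORD are made-up names for this sketch;
    use whatever names suit your setup.
    """
    env = os.environ if env is None else env
    try:
        return env['MSF_API_KEY'], env['MSF_PASSWORD']
    except KeyError as missing:
        raise RuntimeError(
            'Set {} before running the notebook'.format(missing)) from None


# Example with a plain dict standing in for os.environ.
username, password = load_credentials({'MSF_API_KEY': 'XXXXXXX',
                                       'MSF_PASSWORD': 'YYYYYYY'})
```

The returned pair can then be passed to the function below in place of hard-coded strings.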

In [27]:
def get_basic_auth_header(username, password):
    """function to return dict of HTTP Basic access authentication
    credentials by generating the base64 encoded authorization header.
    These credentials can be used to make HTTP requests to a protected API.
    """
    # initialize HTTP authorization method dict.
    header = {'Authorization': 'Basic '} 
    user_cred = '{}:{}'.format(username, password)
    # add my encoded user login credentials
    header['Authorization'] += base64.b64encode(user_cred.encode('utf-8')).decode('ascii')
    return header


# see authentication section of API docs 2.0 for details on username, pword.
credentials = get_basic_auth_header(username='XXXXXXX',
                                    password='YYYYYYY')

print(credentials)


{'Authorization': 'Basic WFhYWFhYWDpZWVlZWVlZ'}


The function above builds a valid HTTP Basic authentication header by prepending the `Basic` authorization method to our Base64-encoded API key and password. The result has the form
`Authorization: Basic QWxhZGRpbjpPcGVuU2VzYW1l`, for example.

This piece of data will be passed as a header in our GET request made to the MySportsFeeds API. 
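
Base64 is an encoding, not encryption, so anyone who sees the header can recover the credentials; that is one reason to send these requests only over HTTPS. A small sketch that reverses a header of this form (the helper name `decode_basic_auth_header` is my own, for illustration):

```python
import base64


def decode_basic_auth_header(header):
    """Recover (username, password) from a Basic authorization header dict."""
    scheme, _, token = header['Authorization'].partition(' ')
    if scheme != 'Basic':
        raise ValueError('not a Basic authorization header')
    user_cred = base64.b64decode(token).decode('utf-8')
    # split on the first colon only: the password itself may contain colons
    username, _, password = user_cred.partition(':')
    return username, password


print(decode_basic_auth_header({'Authorization': 'Basic WFhYWFhYWDpZWVlZWVlZ'}))
# ('XXXXXXX', 'YYYYYYY')
```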

Below, we'll write a function to make the GET request to any properly formatted MySportsFeeds API URL. We'll return the response body as text (`r.text`), which for this feed is a JSON string. Other ways to read a `requests` response include `r.content` (raw bytes) and `r.json()` (parsed JSON).

In [3]:
def send_request(url, auth_dict):
    """Send an API GET request and return the response body as a string.

    url: a URL formatted in accordance with the API docs for this feed,
         with the date given as 'YYYYMMDD'.
    auth_dict: HTTP Basic authorization header as a dict.
            ex. {'Authorization': 'Basic NGVjNjhiOGQtYjVhZi00Zj='}

    The body is JSON text; parse it with json.loads() to get a dict.
    Returns None if the request raises an exception.
    """
    try:
        r = requests.get(url, headers=auth_dict)
        print('Response HTTP Status Code: {}'.format(r.status_code))
        return r.text

    except requests.exceptions.RequestException as e:
        # connection errors, timeouts, etc. are printed rather than raised
        print(e)
        return None
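
As an aside, `requests` can build the Basic header itself when given an `auth=(username, password)` tuple. The sketch below prepares a request without sending it, purely to compare the header `requests` generates with a hand-built one; `https://example.com` is a placeholder URL, not a MySportsFeeds endpoint.

```python
import base64

import requests

username, password = 'XXXXXXX', 'YYYYYYY'

# Header value built by hand from the Base64-encoded "username:password".
token = base64.b64encode('{}:{}'.format(username, password).encode('utf-8'))
manual_header = 'Basic ' + token.decode('ascii')

# Let requests build the header; prepare() does not touch the network.
prepared = requests.Request('GET', 'https://example.com',
                            auth=(username, password)).prepare()

print(prepared.headers['Authorization'] == manual_header)
```

If the two match, passing `auth=(username, password)` to `requests.get` should be interchangeable with passing the header dict by hand.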


## Make API Call with Python

Having written the `send_request()` function, let's work on identifying a properly formatted URL string to access Daily Fantasy Sports salary data from DraftKings. Refer again to the MySportsFeeds [API documentation](https://www.mysportsfeeds.com/data-feeds/api-docs/) for how to format the resource URLs.

The URL requires at least the following 3 parameters for the 2.0 API:

* season: set to 2018 regular season as 2018-regular
* date: set to a specific date as YYYYMMDD
* format: json or csv. We'll go with JSON for now to get familiar with it.

Based on reviewing prior 1.2 API documentation, it appears additional parameters can be passed using standard URL [query string syntax](https://en.wikipedia.org/wiki/Query_string) by appending *"?param1=value1&param2=value2"*. Additional parameters I included for now are:

* dfstype: set to draftkings
* sort: sort in descending order of salary

In [4]:
sample_dfs_url = 'https://api.mysportsfeeds.com/v2.0/pull/mlb/2018-regular/date/20180715/dfs.json?dfstype=draftkings&sort=dfs.salary.D'

A successful HTTP request to the API will yield a response code of 200, meaning it went through OK.

In [5]:
data = send_request(sample_dfs_url, credentials)

Response HTTP Status Code: 200


## Explore the response JSON

The `requests` response was captured as a JSON string, but its structure is a nested dict. 

In [6]:
type(data)

str

In [7]:
data[:2500] # print first 2500 characters of the JSON string

'{"lastUpdatedOn":"2018-07-16T00:47:17.329Z","dfsEntries":[{"dfsSource":"DraftKings","dfsRows":[{"player":{"id":10432,"firstName":"Chris","lastName":"Sale","position":"P","jerseyNumber":49},"team":{"id":113,"abbreviation":"BOS"},"game":{"id":44862,"startTime":"2018-07-15T17:05:00.000Z","awayTeamAbbreviation":"TOR","homeTeamAbbreviation":"BOS"},"dfsSourceId":392121,"salary":14700,"fantasyPoints":null},{"player":{"id":11042,"firstName":"Max","lastName":"Scherzer","position":"P","jerseyNumber":31},"team":{"id":126,"abbreviation":"WAS"},"game":{"id":44865,"startTime":"2018-07-15T17:10:00.000Z","awayTeamAbbreviation":"WAS","homeTeamAbbreviation":"NYM"},"dfsSourceId":326473,"salary":14100,"fantasyPoints":null},{"player":{"id":10462,"firstName":"Justin","lastName":"Verlander","position":"P","jerseyNumber":35},"team":{"id":122,"abbreviation":"HOU"},"game":{"id":44868,"startTime":"2018-07-15T18:10:00.000Z","awayTeamAbbreviation":"DET","homeTeamAbbreviation":"HOU"},"dfsSourceId":277705,"salary":

We can convert the JSON string to a Python dict with `json.loads()`:

In [8]:
data = json.loads(data)
type(data)

dict

In [9]:
data.keys()

dict_keys(['lastUpdatedOn', 'dfsEntries', 'references'])

The data are in a deeply nested structure: a list of dicts that themselves contain dicts. We need to explore it a little to figure out how the player salary data can be easily accessed.

In [10]:
type(data['dfsEntries'])

list

In [11]:
type(data['dfsEntries'][0]['dfsRows'])

list

In [12]:
len(data['dfsEntries'][0]['dfsRows'])

1524

The above JSON structure, where we accessed **'dfsEntries'** and **'dfsRows'**, returns us a list of 1,524 dicts, where each dict is a complex player dict. Now we're getting somewhere. Let's extract the first dict from this list.

In [13]:
data['dfsEntries'][0]['dfsRows'][0] # access the first player's dict of dicts from that day.

{'player': {'id': 10432,
  'firstName': 'Chris',
  'lastName': 'Sale',
  'position': 'P',
  'jerseyNumber': 49},
 'team': {'id': 113, 'abbreviation': 'BOS'},
 'game': {'id': 44862,
  'startTime': '2018-07-15T17:05:00.000Z',
  'awayTeamAbbreviation': 'TOR',
  'homeTeamAbbreviation': 'BOS'},
 'dfsSourceId': 392121,
 'salary': 14700,
 'fantasyPoints': None}
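
Instead of probing one level at a time, a short recursive helper can print the shape of the whole response. This is only a sketch: `summarize` is a name I made up, and `sample` is a trimmed-down stand-in for the real response.

```python
def summarize(obj, max_depth=3, indent=0):
    """Return indented lines describing the shape of nested JSON-like data."""
    pad = '  ' * indent
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append('{}{}: {}'.format(pad, key, type(value).__name__))
            if indent < max_depth and isinstance(value, (dict, list)):
                lines.extend(summarize(value, max_depth, indent + 1))
    elif isinstance(obj, list) and obj:
        # describe a list by its length and its first element only
        lines.append('{}[{} item(s), first is {}]'.format(
            pad, len(obj), type(obj[0]).__name__))
        if indent < max_depth and isinstance(obj[0], (dict, list)):
            lines.extend(summarize(obj[0], max_depth, indent + 1))
    return lines


sample = {
    'lastUpdatedOn': '2018-07-16T00:47:17.329Z',
    'dfsEntries': [{
        'dfsSource': 'DraftKings',
        'dfsRows': [{
            'player': {'id': 10432, 'firstName': 'Chris', 'lastName': 'Sale'},
            'salary': 14700,
        }],
    }],
}

lines = summarize(sample)
print('\n'.join(lines))
```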

## Parse the response JSON into a list of dictionaries

This structure is not very friendly to navigate. We'll parse each dict of player data to return a new, cleaner dict of 6 key:value items: *name* (*firstName* + *lastName*), *playerid*, *pos*, *team*, *game_datetime* (*startTime*) and *dk_salary*.

Each parsed dict will be added to a new list, called **output_list**, resulting in a list of 1,524 dicts. A list of dictionaries is a row-oriented layout that `pandas` can turn into a dataframe directly, with each dict becoming one row, as described [here](http://pbpython.com/pandas-list-dict.html).

The code below parses the desired elements from each complex player dict into a clean dict, then appends each player to the new **output_list**.

In [14]:
# subset the JSON to the player info section. a list of 1,000+ player dicts
raw_data = data['dfsEntries'][0]['dfsRows']

output_list = []

# loop variable is named `row` so it does not overwrite the `data` dict
for row in raw_data:
    player_dict = {}
    player_dict['name'] = "{0} {1}".format(row['player']['firstName'],
                                           row['player']['lastName'])
    player_dict['playerid'] = row['player']['id']
    player_dict['pos'] = row['player']['position']
    player_dict['team'] = row['team']['abbreviation']
    player_dict['game_datetime'] = row['game']['startTime']
    player_dict['dk_salary'] = row['salary']

    # add dict as new element in list, iterate.
    output_list.append(player_dict)

We can check if the parsing code above returned the same number of elements as the original section of the JSON output that had the player data.

In [15]:
len(output_list) #yes

1524

Additionally, let's look at the first element of the clean **output_list**. We see that Chris Sale's information is a much cleaner dict than the earlier rendering from the JSON.

In [16]:
output_list[0]

{'name': 'Chris Sale',
 'playerid': 10432,
 'pos': 'P',
 'team': 'BOS',
 'game_datetime': '2018-07-15T17:05:00.000Z',
 'dk_salary': 14700}
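
The same parsing step can be wrapped in a small function so it is easy to reuse and test on a single row. A sketch, where `parse_player_row` is a hypothetical name and `raw_row` is the Chris Sale entry copied from the response above:

```python
def parse_player_row(row):
    """Flatten one nested 'dfsRows' entry into a single-level dict."""
    player = row['player']
    return {
        'name': '{} {}'.format(player['firstName'], player['lastName']),
        'playerid': player['id'],
        'pos': player['position'],
        'team': row['team']['abbreviation'],
        'game_datetime': row['game']['startTime'],
        'dk_salary': row['salary'],
    }


raw_row = {
    'player': {'id': 10432, 'firstName': 'Chris', 'lastName': 'Sale',
               'position': 'P', 'jerseyNumber': 49},
    'team': {'id': 113, 'abbreviation': 'BOS'},
    'game': {'id': 44862, 'startTime': '2018-07-15T17:05:00.000Z',
             'awayTeamAbbreviation': 'TOR', 'homeTeamAbbreviation': 'BOS'},
    'dfsSourceId': 392121,
    'salary': 14700,
    'fantasyPoints': None,
}

# a list comprehension replaces the explicit loop-and-append
parsed_rows = [parse_player_row(row) for row in [raw_row]]
print(parsed_rows[0])
```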

## Convert to a Pandas DataFrame

Now it is time to convert this list of dictionaries to a `pandas` dataframe and preview its dimensions.

In [17]:
df = pd.DataFrame(output_list,
                 columns=['name', 'team', 'pos', 'dk_salary',
                         'game_datetime', 'playerid'])

df.shape

(1524, 6)
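
As an alternative to flattening by hand, `pandas.json_normalize` (a top-level function since pandas 1.0) can flatten nested dicts directly, naming columns with dotted paths such as `player.firstName`. A sketch on one row copied from the response above, not the approach used in the rest of this notebook:

```python
import pandas as pd

rows = [{
    'player': {'id': 10432, 'firstName': 'Chris', 'lastName': 'Sale',
               'position': 'P', 'jerseyNumber': 49},
    'team': {'id': 113, 'abbreviation': 'BOS'},
    'game': {'id': 44862, 'startTime': '2018-07-15T17:05:00.000Z',
             'awayTeamAbbreviation': 'TOR', 'homeTeamAbbreviation': 'BOS'},
    'dfsSourceId': 392121,
    'salary': 14700,
    'fantasyPoints': None,
}]

# nested keys become dotted column names, e.g. 'team.abbreviation'
flat = pd.json_normalize(rows)

# keep and rename only the columns used in this notebook
flat = flat[['player.firstName', 'player.lastName', 'team.abbreviation',
             'player.position', 'salary', 'game.startTime', 'player.id']]
flat = flat.rename(columns={'team.abbreviation': 'team',
                            'player.position': 'pos',
                            'salary': 'dk_salary',
                            'game.startTime': 'game_datetime',
                            'player.id': 'playerid'})
print(flat.columns.tolist())
```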

Let's look at each column's data type.

In [18]:
df.dtypes

name             object
team             object
pos              object
dk_salary         int64
game_datetime    object
playerid          int64
dtype: object

And preview the first 25 rows

In [19]:
df.head(25)

| | name | team | pos | dk_salary | game_datetime | playerid |
|---|---|---|---|---|---|---|
| 0 | Chris Sale | BOS | P | 14700 | 2018-07-15T17:05:00.000Z | 10432 |
| 1 | Max Scherzer | WAS | P | 14100 | 2018-07-15T17:10:00.000Z | 11042 |
| 2 | Justin Verlander | HOU | P | 13500 | 2018-07-15T18:10:00.000Z | 10462 |
| 3 | Gerrit Cole | HOU | P | 13300 | 2018-07-15T18:10:00.000Z | 10792 |
| 4 | Jacob deGrom | NYM | P | 13100 | 2018-07-15T17:10:00.000Z | 10676 |
| 5 | Clayton Kershaw | LAD | P | 12900 | 2018-07-15T20:10:00.000Z | 10573 |
| 6 | Aaron Nola | PHI | P | 12300 | 2018-07-15T17:10:00.000Z | 10779 |
| 7 | Trevor Bauer | CLE | P | 12000 | 2018-07-15T17:10:00.000Z | 10363 |
| 8 | Madison Bumgarner | SF | P | 11400 | 2018-07-15T20:05:00.000Z | 10883 |
| 9 | Patrick Corbin | ARI | P | 10900 | 2018-07-15T17:35:00.000Z | 10185 |


## Remove Potential Duplicate rows

One of the first cleaning steps is to make sure there are no duplicate rows. A player may legitimately appear more than once in a response if their team plays a doubleheader that day, but those rows differ in *game_datetime*. So we'll remove only rows that are duplicated across all columns, using the `pandas` dataframe `drop_duplicates()` method.


In [20]:
df = df.drop_duplicates()
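
To see why comparing all columns matters, here is a toy example with made-up data: one player in both games of a hypothetical doubleheader, plus an exact repeat of the first row.

```python
import pandas as pd

demo = pd.DataFrame([
    # the same (made-up) player in two games of a doubleheader
    {'name': 'Player A', 'game_datetime': '2018-07-15T17:05:00.000Z',
     'dk_salary': 4000},
    {'name': 'Player A', 'game_datetime': '2018-07-15T23:05:00.000Z',
     'dk_salary': 4000},
    # an exact repeat of the first row
    {'name': 'Player A', 'game_datetime': '2018-07-15T17:05:00.000Z',
     'dk_salary': 4000},
])

# default: a row is a duplicate only if every column matches
deduped = demo.drop_duplicates()

# restricting the comparison to 'name' would also drop the second game
by_name = demo.drop_duplicates(subset=['name'])

print(len(demo), len(deduped), len(by_name))
# 3 2 1
```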

## Convert Series string object to a Datetime64 object

Next, we'll parse the *game_datetime* column, which is currently stored as a string (`object`) column in the dataframe, into a proper datetime format.

### Approach 1

Refer to the `datetime` module [documentation](https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior) for how to parse datetime strings. We can iterate over every row in the column and parse the string into a datetime object, assigning it to a new column *game_datetime_cln*.

In [21]:
df['game_datetime_cln'] = [dt.datetime.strptime(row, '%Y-%m-%dT%H:%M:%S.000Z')
                           for row in df['game_datetime']]

In [22]:
df.dtypes

name                         object
team                         object
pos                          object
dk_salary                     int64
game_datetime                object
playerid                      int64
game_datetime_cln    datetime64[ns]
dtype: object

In [23]:
df.head(5)

| | name | team | pos | dk_salary | game_datetime | playerid | game_datetime_cln |
|---|---|---|---|---|---|---|---|
| 0 | Chris Sale | BOS | P | 14700 | 2018-07-15T17:05:00.000Z | 10432 | 2018-07-15 17:05:00 |
| 1 | Max Scherzer | WAS | P | 14100 | 2018-07-15T17:10:00.000Z | 11042 | 2018-07-15 17:10:00 |
| 2 | Justin Verlander | HOU | P | 13500 | 2018-07-15T18:10:00.000Z | 10462 | 2018-07-15 18:10:00 |
| 3 | Gerrit Cole | HOU | P | 13300 | 2018-07-15T18:10:00.000Z | 10792 | 2018-07-15 18:10:00 |
| 4 | Jacob deGrom | NYM | P | 13100 | 2018-07-15T17:10:00.000Z | 10676 | 2018-07-15 17:10:00 |
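
The trailing `Z` in these timestamps denotes UTC, but `strptime` with a literal `Z` in the format string yields a naive datetime with no timezone attached. A small sketch that keeps the UTC information and uses `%f` so that non-zero fractional seconds also parse; the helper name `parse_utc` is mine:

```python
import datetime as dt


def parse_utc(timestamp):
    """Parse an ISO-8601 'Z' timestamp into a timezone-aware UTC datetime."""
    naive = dt.datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%S.%fZ')
    # the literal Z was only matched as text, so attach UTC explicitly
    return naive.replace(tzinfo=dt.timezone.utc)


first_pitch = parse_utc('2018-07-15T17:05:00.000Z')
print(first_pitch.isoformat())
# 2018-07-15T17:05:00+00:00
```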


### Approach 2 (Simpler)

There's actually an easier way to do this using `pandas`' own `to_datetime()` function, which builds on the `datetime` module. This approach is shorter and more readable than the list comprehension in Approach 1.

In [24]:
df['game_datetime_pd_style'] = pd.to_datetime(df['game_datetime'])

In [25]:
df.dtypes

name                              object
team                              object
pos                               object
dk_salary                          int64
game_datetime                     object
playerid                           int64
game_datetime_cln         datetime64[ns]
game_datetime_pd_style    datetime64[ns]
dtype: object

In [26]:
df.head(5)

| | name | team | pos | dk_salary | game_datetime | playerid | game_datetime_cln | game_datetime_pd_style |
|---|---|---|---|---|---|---|---|---|
| 0 | Chris Sale | BOS | P | 14700 | 2018-07-15T17:05:00.000Z | 10432 | 2018-07-15 17:05:00 | 2018-07-15 17:05:00 |
| 1 | Max Scherzer | WAS | P | 14100 | 2018-07-15T17:10:00.000Z | 11042 | 2018-07-15 17:10:00 | 2018-07-15 17:10:00 |
| 2 | Justin Verlander | HOU | P | 13500 | 2018-07-15T18:10:00.000Z | 10462 | 2018-07-15 18:10:00 | 2018-07-15 18:10:00 |
| 3 | Gerrit Cole | HOU | P | 13300 | 2018-07-15T18:10:00.000Z | 10792 | 2018-07-15 18:10:00 | 2018-07-15 18:10:00 |
| 4 | Jacob deGrom | NYM | P | 13100 | 2018-07-15T17:10:00.000Z | 10676 | 2018-07-15 17:10:00 | 2018-07-15 17:10:00 |


Comparing the last 2 columns above, we can see that both approaches produce the same parsed datetimes.
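
Depending on the pandas version, `pd.to_datetime` may return timezone-naive values (as in the output above) or UTC-aware ones for strings ending in `Z`. Passing `utc=True` makes the result explicit either way. A short sketch on two of the timestamps from this feed:

```python
import pandas as pd

times = pd.Series(['2018-07-15T17:05:00.000Z', '2018-07-15T18:10:00.000Z'])

# utc=True returns a timezone-aware column regardless of the input format
parsed = pd.to_datetime(times, utc=True)

print(parsed.dt.tz)              # timezone attached to the column
print(parsed.dt.hour.tolist())   # hour of each game, in UTC
```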

## Final Thoughts

Having mocked up and explored the MySportsFeeds API, we can encapsulate much of this new knowledge in a Python module that pipelines all of the above work: make the GET request, parse the JSON response into a `pandas` dataframe, clean the dataframe, then do something with the data (ex. analysis, write to disk or database).
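
A rough sketch of how such a pipeline might be laid out follows. The function names (`salaries_to_frame`, `run_pipeline`, `fake_fetch`) are hypothetical, and `fetch` stands for any callable that takes a URL and returns the response text (for instance, a wrapper around `send_request()` with credentials filled in). A stub is used here so the sketch runs without network access or API keys.

```python
import json

import pandas as pd

COLUMNS = ['name', 'team', 'pos', 'dk_salary', 'game_datetime', 'playerid']


def salaries_to_frame(json_text):
    """Parse a DFS feed response (JSON text) into a cleaned DataFrame."""
    payload = json.loads(json_text)
    records = [{
        'name': '{} {}'.format(row['player']['firstName'],
                               row['player']['lastName']),
        'team': row['team']['abbreviation'],
        'pos': row['player']['position'],
        'dk_salary': row['salary'],
        'game_datetime': row['game']['startTime'],
        'playerid': row['player']['id'],
    } for row in payload['dfsEntries'][0]['dfsRows']]
    df = pd.DataFrame(records, columns=COLUMNS)
    df['game_datetime'] = pd.to_datetime(df['game_datetime'], utc=True)
    # drop exact duplicate rows, then renumber the index
    return df.drop_duplicates().reset_index(drop=True)


def run_pipeline(fetch, url):
    """fetch: any callable that takes a URL and returns the response text."""
    return salaries_to_frame(fetch(url))


def fake_fetch(url):
    """Stub standing in for a real request; returns one row twice."""
    row = {'player': {'id': 10432, 'firstName': 'Chris', 'lastName': 'Sale',
                      'position': 'P'},
           'team': {'abbreviation': 'BOS'},
           'game': {'startTime': '2018-07-15T17:05:00.000Z'},
           'salary': 14700}
    return json.dumps({'dfsEntries': [{'dfsRows': [row, row]}]})


df = run_pipeline(fake_fetch, 'https://example.com/dfs.json')
print(df)
```

Keeping the fetch step separate from the parsing step means the parsing and cleaning logic can be tested offline.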