## NHL Stats Challenge

> If you have not had a chance to check out the [Lazy Guide to Web Scraping](https://github.com/sportsdatasolutions/sport_x_code_eis/blob/master/3.projects/LazyGuides/web_scraping.md) please do so! You can find the code for the guide in this [Deepnote Project](https://deepnote.com/project/0d7f30b4-7eb4-4d7e-b601-28085e59e0d3#%2F1.OpenAPI.ipynb).

#### The challenge is to simply scrape the **Skaters Stats Table** with stats from the [**combined 18/19 and 19/20 season playoffs**](http://www.nhl.com/stats/skaters?reportType=season&seasonFrom=20182019&seasonTo=20192020&gameType=3&filter=gamesPlayed,gte,1&sort=points,goals,assists&page=0&pageSize=50).
1. ##### **Start** by ***using the Network Tool*** in your **browser** to identify any requests that fetch the data in the table.

2. ##### **Copy** and **Paste** the ***request url*** into a new tab on your **browser** to see if you can access the API without the actual website.

3. ##### **Make call(s)** for the relevant data via [`requests`](https://requests.readthedocs.io/en/master/).

4. ##### **Create** a pandas **DataFrame** containing the **data**.

5. ##### **Make sure you have ALL the data**. You'll notice that the table has a total of **583 records**. Does yours?

    **Hint:** The table is **paginated**, and **so is the API**. Pay attention to the **Request URL Parameters**. Have a play around with the **values** given to the **`page`** and **`pageSize`** parameters.


6. ##### **Save** the Dataframe to a new **CSV** file within the `data` folder.
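The hint in step 5 comes down to simple arithmetic. Below is a minimal sketch, assuming the API's paging parameter counts records rather than pages (worth confirming in the Network Tool). `page_offsets` is a hypothetical helper written for illustration, not part of any library.

```python
def page_offsets(total_records, page_size):
    """Return the record offset at which each page begins."""
    return list(range(0, total_records, page_size))


offsets = page_offsets(583, 50)
print(len(offsets))  # 12 requests are needed
print(offsets[-1])   # 550, the offset of the final, partial page
```

With 583 records and 50 records per page, the last request returns only 33 records.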


In [29]:
# import required libraries
import requests
import json
import pandas as pd

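#define the record offsets needed to see all 583 records
#the "start" parameter is a record offset (not a page number), so with limit=50 it must advance in steps of 50
start_values=list(range(0,600,50))

#define url strings pre (url1) and post (url2) the start offset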
url1 = "https://api.nhle.com/stats/rest/en/skater/summary?isAggregate=true&isGame=false&sort=%5B%7B%22property%22:%22points%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22goals%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22assists%22,%22direction%22:%22DESC%22%7D%5D&start="
url2 = "&limit=50&factCayenneExp=gamesPlayed%3E=1&cayenneExp=gameTypeId=3%20and%20seasonId%3C=20192020%20and%20seasonId%3E=20182019"
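The two hand-encoded strings above are hard to read and easy to break. As an alternative, here is a minimal sketch that builds the same query from readable pieces using only the standard library. `BASE_URL` and `build_url` are names introduced for illustration. The parameter names and values are taken from the request URL above, but the exact percent-encoding differs slightly, so treat it as a sketch to check against the Network Tool rather than a drop-in replacement.

```python
import json
from urllib.parse import urlencode, quote

# Endpoint taken from the request URL above
BASE_URL = "https://api.nhle.com/stats/rest/en/skater/summary"


def build_url(start, limit=50):
    """Build the skater summary URL for one page of results (hypothetical helper)."""
    sort = [
        {"property": "points", "direction": "DESC"},
        {"property": "goals", "direction": "DESC"},
        {"property": "assists", "direction": "DESC"},
    ]
    params = {
        "isAggregate": "true",
        "isGame": "false",
        "sort": json.dumps(sort, separators=(",", ":")),
        "start": start,
        "limit": limit,
        "factCayenneExp": "gamesPlayed>=1",
        "cayenneExp": "gameTypeId=3 and seasonId<=20192020 and seasonId>=20182019",
    }
    # quote_via=quote encodes spaces as %20 rather than "+"
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(build_url(0))
```

The resulting string can be passed to `requests.get` in the same way as `url1 + start_string + url2`.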




In [15]:
## checking nested structure of extracted data
# url = url1 + str(start_values[0]) + url2
# nhl_data_part = requests.get(url).json()
# print(nhl_data_part.keys())
# table_data1 = nhl_data_part.get("data")

dict_keys(['data', 'total'])


In [30]:
#loop through the different start values, collecting one dataframe per request
nhl_dataframe_parts = []
for start_value in start_values:
    #convert the start value to str and combine strings to form the url
    url = url1 + str(start_value) + url2
    #request data as json format
    nhl_data_part = requests.get(url).json()
    #take "data" key from the extracted dictionary
    table_data1 = nhl_data_part.get("data")
    #convert into a dataframe and keep it for the final concatenation
    nhl_dataframe_parts.append(pd.DataFrame.from_dict(table_data1))

#combine all parts into a single dataframe (DataFrame.append was removed in pandas 2.0)
nhl_dataframe = pd.concat(nhl_dataframe_parts, ignore_index=True)
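Rather than hard-coding how many requests to make, the loop could rely on the `total` key returned alongside `data`. The sketch below assumes `start` is a record offset and that every response has the `{'data': [...], 'total': ...}` shape printed earlier. `fetch_all_records` and `fake_fetch_page` are hypothetical names, and the fake fetcher only stands in for the real API so the example runs offline.

```python
def fetch_all_records(fetch_page, limit=50):
    """Collect every record by advancing the offset until `total` is reached."""
    records = []
    start = 0
    while True:
        # fetch_page is expected to return {"data": [...], "total": int}
        payload = fetch_page(start, limit)
        page = payload.get("data", [])
        records.extend(page)
        start += limit
        # stop on an empty page or once the offset passes the reported total
        if not page or start >= payload.get("total", 0):
            break
    return records


def fake_fetch_page(start, limit):
    """Offline stand-in that mimics a paginated response with 583 rows."""
    rows = [{"row": i} for i in range(583)]
    return {"data": rows[start:start + limit], "total": len(rows)}


all_records = fetch_all_records(fake_fetch_page)
print(len(all_records))  # 583
```

Against the real API, `fetch_page` could be a small function that calls `requests.get(url1 + str(start) + url2).json()`. Note that `url2` fixes `limit=50`, so the `limit` argument must stay at 50 in that case.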

In [39]:
#save without the dataframe index so the file contains only the table columns
nhl_dataframe.to_csv('data/nhl_data.csv', index=False)
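To answer the "Does yours?" question from step 5, it can help to count the rows in the saved file. Here is a minimal sketch using only the standard library. `count_csv_records` is a hypothetical helper, and the temporary demo file exists only so the example runs on its own.

```python
import csv
import os
import tempfile


def count_csv_records(path):
    """Count the data rows in a CSV file, excluding the header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


# Small demo file so the sketch runs without the real data
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w", newline="") as f:
        csv.writer(f).writerows([["name", "points"], ["a", 1], ["b", 2]])
    demo_count = count_csv_records(demo_path)

print(demo_count)  # 2
```

If every record was collected exactly once, `count_csv_records('data/nhl_data.csv')` should return 583.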
