# Getting the Data

## Finding the source

While checking different sites for basketball data, we came across one that had what we needed:

[stats.nba.com](https://stats.nba.com/teams/traditional/?sort=W_PCT&dir=-1)

![NBA Stats site](../Images/stats.nba.com.png)

The problem was that the site is not an API, and there was no way to export the data as a CSV. We also didn't want to copy and paste each page individually.

## Getting the URL

We noticed that when we changed the filter parameters on the page, only the table frame reloaded, not the whole page. We opened Chrome DevTools to the Network tab and watched what happened when we clicked "Run It".

![Chrome Exploration](../Images/dev_tools.png)

A close-up of the request URL:

![Chrome Exploration](../Images/devurl.png)

`https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=Home&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2018-19&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=`
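Only `Season` and `Location` need to change between requests, so the captured URL can serve as a template. Below is a minimal sketch using the standard library's `urllib.parse`. `CAPTURED_URL` and `build_url` are illustrative names and are not part of the original notebook. Given `2018-19` and `Home`, it should reproduce the captured URL exactly.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The URL captured in DevTools, used here as a template
CAPTURED_URL = "https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=Home&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2018-19&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision="


def build_url(season: str, location: str) -> str:
    """Return the captured URL with only Season and Location swapped out."""
    parts = urlsplit(CAPTURED_URL)
    # keep_blank_values=True preserves empty filters such as "Conference="
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["Season"] = season
    params["Location"] = location  # "" requests home and road combined
    # urlencode turns spaces back into "+", e.g. "Regular+Season"
    return urlunsplit(parts._replace(query=urlencode(params)))


print(build_url("2012-13", "Road"))
```

`requests.get` can also take a dict through its `params` argument and do this encoding itself.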

## Verify that it returns data in the right format

We pasted the URL into a browser to check whether the response was valid JSON. It was.

![Data Check](../Images/url_response.png)
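This check can also be scripted. The sketch below assumes the response layout that the scraping code later relies on: a top-level `resultSets` list whose entries have `headers` (column names) and `rowSet` (one list per team). The `name` key is also an assumption, so the code falls back to `"?"` if it is missing. The toy payload exists only to show the shape.

```python
import json


def describe_response(raw_text: str) -> list:
    """Parse a raw response body and summarize each result set.

    Raises json.JSONDecodeError if the body is not valid JSON.
    """
    payload = json.loads(raw_text)
    summary = []
    for result_set in payload["resultSets"]:
        summary.append(
            (result_set.get("name", "?"), len(result_set["headers"]), len(result_set["rowSet"]))
        )
    return summary


# Toy body shaped like the endpoint's response (values are made up)
sample = json.dumps({
    "parameters": {"Season": "2018-19", "Location": "Home"},
    "resultSets": [{"name": "Demo", "headers": ["TEAM_ID", "TEAM_NAME"],
                    "rowSet": [[1, "Team A"], [2, "Team B"]]}],
})
print(describe_response(sample))  # [('Demo', 2, 2)]
```

To check a real request, pass `requests.get(url, headers=...).text` in place of `sample`. For a full season, the row count should equal the number of teams.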

## Automate and combine the dataframe

From here, we moved to a Jupyter notebook to write the scraper. Applying what we had learned from working with Python APIs, we built our own DataFrames from the JSON responses.

```python
import pandas as pd
import requests as rq
import json
```

```python
# Seasons to request, and the three values for the Location parameter
seasons = ["2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", "2018-19"]
# An empty Location returns home and road games combined
venues = ["Home", "Road", ""]
# This is the user agent string of Microsoft Edge.
header_data = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14931"}

# Set up lists to hold the response info
year = []
location = []
team = []
win = []
loss = []
win_pct = []
ftm = []
fta = []
ft_pct = []
w_rnk = []
l_ank = []  # loss rank
w_pct_rnk = []
ftm_rnk = []
fta_rnk = []
ft_pct_rnk = []
total_points = []

# Loop through seasons and venues, using the URL described in the first section
for season in seasons:
    for venue in venues:
        urlliteral = f"https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location={venue}&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season={season}&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision="
        response = rq.get(urlliteral, headers=header_data, timeout=60)
        response.raise_for_status()  # fail loudly on an HTTP error instead of inside .json()
        new = response.json()
        # Each entry in rowSet is one team (30 per season); values are picked by column position
        for row in new["resultSets"][0]["rowSet"]:
            team.append(row[1])
            win.append(row[3])
            loss.append(row[4])
            win_pct.append(row[5])
            ftm.append(row[13])
            fta.append(row[14])
            ft_pct.append(row[15])
            w_rnk.append(row[29])
            l_ank.append(row[30])
            w_pct_rnk.append(row[31])
            ftm_rnk.append(row[39])
            fta_rnk.append(row[40])
            ft_pct_rnk.append(row[41])
            total_points.append(row[26])
            # The API echoes the request parameters back; an empty Location comes back as None
            year.append(new["parameters"]["Season"])
            location.append(new["parameters"]["Location"])
        print(f"Getting data for {new['parameters']['Season']} {new['parameters']['Location']}")

print("Data scrape is done")

# This makes it look a lot cleaner than it was. There was a lot of navigating the JSON and printing
# until we were able to find the right data locations.
# Additionally, we did a few trial loops and printed the responses to make sure the requests aligned
# with the responses.
```

```
Getting data for 2012-13 Home
Getting data for 2012-13 Road
Getting data for 2012-13 None
Getting data for 2013-14 Home
Getting data for 2013-14 Road
Getting data for 2013-14 None
Getting data for 2014-15 Home
Getting data for 2014-15 Road
Getting data for 2014-15 None
Getting data for 2015-16 Home
Getting data for 2015-16 Road
Getting data for 2015-16 None
Getting data for 2016-17 Home
Getting data for 2016-17 Road
Getting data for 2016-17 None
Getting data for 2017-18 Home
Getting data for 2017-18 Road
Getting data for 2017-18 None
Getting data for 2018-19 Home
Getting data for 2018-19 Road
Getting data for 2018-19 None
Data scrape is done
```
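Picking values by position (`row[13]`, `row[29]`, ...) is what made the JSON navigation tedious. It also breaks silently if the column order ever changes. A hedged alternative is to build a pandas DataFrame from each result set's `headers` and select columns by name. The names in `WANTED` are assumed from how the positions above line up with the API's column list (for example, 13 → `FTM` and 41 → `FT_PCT_RANK`). Print `new["resultSets"][0]["headers"]` to confirm them before relying on them. `payload_to_frame` is a hypothetical helper.

```python
import pandas as pd

# Assumed header names matching the positions used in the loop above
WANTED = ("TEAM_NAME", "W", "L", "W_PCT", "FTM", "FTA", "FT_PCT",
          "W_RANK", "L_RANK", "W_PCT_RANK", "FTM_RANK", "FTA_RANK", "FT_PCT_RANK", "PTS")


def payload_to_frame(payload: dict, wanted=WANTED) -> pd.DataFrame:
    """Turn one parsed response into a DataFrame, selecting columns by header name."""
    result_set = payload["resultSets"][0]
    frame = pd.DataFrame(result_set["rowSet"], columns=result_set["headers"])
    frame = frame[list(wanted)].copy()
    # Tag every row with the parameters the API echoed back
    frame["Season"] = payload["parameters"]["Season"]
    frame["Venue"] = payload["parameters"]["Location"]
    return frame


# Toy payload for illustration only
toy = {
    "parameters": {"Season": "2018-19", "Location": None},
    "resultSets": [{"headers": ["TEAM_ID", "TEAM_NAME", "W"],
                    "rowSet": [[1, "Team A", 10], [2, "Team B", 5]]}],
}
frame = payload_to_frame(toy, wanted=["TEAM_NAME", "W"])
print(frame)
```

Inside the venue loop, you could collect `payload_to_frame(new)` into a list and combine the results with `pd.concat(frames, ignore_index=True)`. That replaces the sixteen parallel lists.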


```python
# Add all the lists into a dictionary
nba_dict = {
    "Team": team,
    "Season": year,
    "Venue": location,
    "Wins": win,
    "Loss": loss,
    "Win%": win_pct,
    "FT Made": ftm,
    "FT Att": fta,
    "FT%": ft_pct,
    "Win Rnk": w_rnk,
    "Loss Rnk": l_ank,
    "Win% Rnk": w_pct_rnk,
    "FT Made Rnk": ftm_rnk,
    "FT ATT Rnk": fta_rnk,
    "FT% Rnk": ft_pct_rnk,
    "Total Points": total_points
}

# Create the DataFrame
nba_data = pd.DataFrame(nba_dict)
# Look at the last few rows
nba_data.tail()
```

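| | Team | Season | Venue | Wins | Loss | Win% | FT Made | FT Att | FT% | Win Rnk | Loss Rnk | Win% Rnk | FT Made Rnk | FT ATT Rnk | FT% Rnk | Total Points |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 625 | Sacramento Kings | 2018-19 | | 39 | 41 | 0.488 | 1328 | 1832 | 0.725 | 15 | 17 | 17 | 21 | 15 | 27 | 9103 |
| 626 | San Antonio Spurs | 2018-19 | | 46 | 34 | 0.575 | 1376 | 1680 | 0.819 | 12 | 13 | 13 | 19 | 24 | 1 | 8939 |
| 627 | Toronto Raptors | 2018-19 | | 56 | 24 | 0.7 | 1414 | 1759 | 0.804 | 2 | 2 | 2 | 15 | 21 | 4 | 9147 |
| 628 | Utah Jazz | 2018-19 | | 49 | 30 | 0.62 | 1465 | 1997 | 0.734 | 7 | 7 | 7 | 9 | 5 | 26 | 8797 |
| 629 | Washington Wizards | 2018-19 | | 32 | 48 | 0.4 | 1480 | 1927 | 0.768 | 22 | 23 | 23 | 8 | 9 | 17 | 9130 |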
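The last index, 629, matches the expected size: 7 seasons × 3 venues × 30 teams = 630 rows. A second cheap sanity check: for each team-season, the home and road totals should add up to the combined row. The sketch below assumes the column names from `nba_dict` above. It also assumes the combined rows have a missing `Venue`, because the API echoed the empty `Location` back as `None`. `check_home_road_totals` is a hypothetical helper, and the toy frame is made up.

```python
import pandas as pd


def check_home_road_totals(df: pd.DataFrame, stat: str = "Wins") -> pd.DataFrame:
    """Return team-seasons where Home + Road `stat` differs from the combined row."""
    keys = ["Season", "Team"]
    # Combined rows are the ones whose Venue is missing (None/NaN)
    combined = df[df["Venue"].isna()].set_index(keys)[stat]
    home = df[df["Venue"] == "Home"].set_index(keys)[stat]
    road = df[df["Venue"] == "Road"].set_index(keys)[stat]
    split_sum = home.add(road, fill_value=0)
    comparison = pd.DataFrame({"split_sum": split_sum, "combined": combined})
    return comparison[comparison["split_sum"] != comparison["combined"]]


# Toy frame: Team B's splits (10 + 12) deliberately disagree with its combined 23
toy = pd.DataFrame({
    "Team": ["A", "A", "A", "B", "B", "B"],
    "Season": ["2018-19"] * 6,
    "Venue": ["Home", "Road", None, "Home", "Road", None],
    "Wins": [20, 15, 35, 10, 12, 23],
})
print(check_home_road_totals(toy))
```

On the real data, `assert len(nba_data) == 7 * 3 * 30` and an empty result from `check_home_road_totals(nba_data)` (or with `stat="FT Made"`) would suggest that requests and responses lined up.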


```python
# Write to CSV
nba_data.to_csv("NBA Free Throw Data.csv", index=False)
```
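One thing to watch when reading the file back: the combined rows were written with an empty `Venue`, and `pd.read_csv` turns empty fields into `NaN` by default. The sketch below gives those rows an explicit label. The label `"All"` is a hypothetical choice, not something the API returns. `load_free_throw_csv` is likewise an illustrative name.

```python
import io

import pandas as pd


def load_free_throw_csv(source, combined_label: str = "All") -> pd.DataFrame:
    """Read the scraped CSV and label the home + road combined rows."""
    df = pd.read_csv(source)
    # Empty Venue cells come back as NaN; replace them with an explicit label
    df["Venue"] = df["Venue"].fillna(combined_label)
    return df


# Small in-memory stand-in for "NBA Free Throw Data.csv"
demo_csv = io.StringIO(
    "Team,Season,Venue,Wins\n"
    "Team A,2018-19,Home,20\n"
    "Team A,2018-19,Road,15\n"
    "Team A,2018-19,,35\n"
)
demo = load_free_throw_csv(demo_csv)
print(demo["Venue"].tolist())  # ['Home', 'Road', 'All']
```

Alternatively, `pd.read_csv(..., keep_default_na=False)` keeps the empty cells as empty strings instead of `NaN`.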