#### This notebook scrapes and retrieves data from the official NBA stats page.

All the data is obtained from this site: https://www.nba.com/stats


### 1.0 Data Retrieval through API requests

#### 1.1 Playtype Dataset


API endpoint URL: https://stats.nba.com/stats/synergyplaytypes?LeagueID=00&PerMode=PerGame&PlayType=Isolation&PlayerOrTeam=T&SeasonType=Regular%20Season&SeasonYear=2024-25&TypeGrouping=offensive

From the API URL above, we can observe several query parameters that together determine which data this endpoint returns.
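As a minimal sketch of that observation, the endpoint URL can be split into its base URL and query parameters with the standard library. The variable names (`endpoint`, `base_url`, `query_params`) are illustrative and not part of the notebook's own code.

```python
from urllib.parse import urlsplit, parse_qsl

# The Playtype endpoint URL quoted above, split over several lines for readability.
endpoint = (
    "https://stats.nba.com/stats/synergyplaytypes"
    "?LeagueID=00&PerMode=PerGame&PlayType=Isolation&PlayerOrTeam=T"
    "&SeasonType=Regular%20Season&SeasonYear=2024-25&TypeGrouping=offensive"
)

parts = urlsplit(endpoint)
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

# keep_blank_values=True keeps parameters that are present but empty,
# which matters for the longer endpoint URLs later in this notebook.
query_params = dict(parse_qsl(parts.query, keep_blank_values=True))

print(base_url)
for name, value in query_params.items():
    print(f"{name} = {value!r}")
```

Parsing also decodes percent-encoding, so `Regular%20Season` comes back as `Regular Season`, which is the form used in the parameter dictionaries below.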

In [1]:
import requests
import pandas as pd
import time


In [2]:


url = "https://stats.nba.com/stats/synergyplaytypes" #base api URL

#the variable parameters are PlayType and SeasonType

base_params = {"LeagueID":"00",
                "PerMode":"PerGame",
                "PlayerOrTeam":"T",
                "SeasonYear":"2024-25",
                "TypeGrouping":"offensive"}

playtype = ["Isolation","Transition","PRBallHandler","PRRollman","Postup","Spotup",
            "Handoff","Cut","OffScreen","OffRebound","Misc"]

seasontype = ["Playoffs","Regular Season"]


headers = {
    "User-Agent":"Mozilla/5.0",
    "Referer":"https://www.nba.com/"
}

playoffs_data = []
reg_szn_data = []
for season_type in seasontype:

    for play_type in playtype:

        params = base_params.copy()
        params['PlayType'] = play_type
        params['SeasonType'] = season_type

        response = requests.get(url,headers = headers, params = params) #Python’s requests library takes your params dictionary 
        #and appends it to the URL like a query string.



        if response.status_code == 200: #the HTTP request was successful 
            data = response.json()
            headers_ = data["resultSets"][0]['headers'] #data['resultSets'] is a list of tables (usually only 1 table)
            #[0] selects the first table in that list
            #'headers' holds that table's column names
            rows = data["resultSets"][0]['rowSet']

            df = pd.DataFrame(rows,columns=headers_)
            if season_type == "Playoffs":
                playoffs_data.append(df) # playoffs_data is a list of DataFrames: [Isolation ... , Transition..., etc]
            else:
                reg_szn_data.append(df)
        else:
            print(f"Request failed | {season_type=} | {play_type=} | Status code: {response.status_code}")

        
        time.sleep(1) #pauses for 1 second to prevent getting rate-limited or blocked by the nba site

playoffs_df = pd.concat(playoffs_data,ignore_index = True) #pd.concat stacks the DataFrames in playoffs_data row-wise 
reg_szn_df = pd.concat(reg_szn_data,ignore_index = True)

#ignore_index resets the row indices to run continuously from 0 

# playoffs_df.to_csv(r"C:\Users\calvin\Documents\python\NBA data analysis project\nba analysis csv files\nba_playoffs_playtype_2024-2025.csv",index = False)
# reg_szn_df.to_csv(r"C:\Users\calvin\Documents\python\NBA data analysis project\nba analysis csv files\nba_regular_season_playtype_2024-2025.csv",index = False)














ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
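
The `ConnectionError` above means the server closed the connection before sending a response. One common mitigation is to retry with an increasing delay. The sketch below is an assumption on my part, not something the notebook does: `with_retries` and `flaky_request` are hypothetical names, and the built-in `ConnectionError` stands in for the real failure. With `requests`, one would pass `retry_on=(requests.exceptions.ConnectionError,)`, since that class does not inherit from the built-in one.

```python
import time

def with_retries(func, attempts=3, base_delay=1.0,
                 retry_on=(ConnectionError,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and retry, doubling the delay each time."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))

# Stand-in for a flaky request: fails twice, then succeeds.
calls = {"count": 0}

def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("Remote end closed connection without response")
    return "ok"

# Record the delays instead of actually sleeping, so the demo runs instantly.
delays = []
result = with_retries(flaky_request, attempts=3, base_delay=1.0, sleep=delays.append)
print(result, delays)
```

In the scraping loop, the function passed in would wrap the `requests.get(...)` call. Whether retries are enough to avoid being blocked by the site is not something this sketch can guarantee.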

From observing the API endpoint URLs, we notice that only the Playtype stats are provided by Synergy.

The rest of the stats have a similar endpoint URL beginning with https://stats.nba.com/stats/
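
The code in this notebook reads every endpoint's response the same way: `resultSets` is a list of tables, and each table has `headers` and `rowSet`. As a small sketch of that shared shape, the helper below turns one table into a list of records. `result_set_to_records` is a hypothetical helper, and `sample_payload` contains made-up values rather than real API output.

```python
def result_set_to_records(payload, index=0):
    """Pair each row of one result set with that table's column headers."""
    table = payload["resultSets"][index]
    return [dict(zip(table["headers"], row)) for row in table["rowSet"]]

# Made-up payload that mimics the structure read in this notebook.
sample_payload = {
    "resultSets": [
        {
            "headers": ["TEAM_NAME", "GP"],
            "rowSet": [["Team A", 10], ["Team B", 12]],
        }
    ]
}

records = result_set_to_records(sample_payload)
for record in records:
    print(record)
```

The notebook itself builds a `pd.DataFrame(rows, columns=headers_)` from the same two fields, and the records form here is only a dependency-free way to inspect a response.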

API endpoint URL for Clutch Stats: https://stats.nba.com/stats/leaguedashteamclutch?AheadBehind=Ahead%20or%20Behind&ClutchTime=Last%205%20Minutes&College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&ISTRound=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&PointDiff=5&Rank=N&Season=2024-25&SeasonSegment=&SeasonType=Playoffs&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight=

URL for Tracking Drives: https://stats.nba.com/stats/leaguedashptstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&Height=&ISTRound=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=PerGame&PlayerExperience=&PlayerOrTeam=Team&PlayerPosition=&PtMeasureType=Drives&Season=2024-25&SeasonSegment=&SeasonType=Playoffs&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight=



We can collect these request URLs manually, or try selenium-wire to capture them automatically. **(Update: selenium-wire did not automate this in practice, so I will repeat what I did for Playtype for the other categories.)**




In [None]:
from seleniumwire import webdriver

#start browser
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=options)

#load webpage
driver.get("https://www.nba.com/stats/teams/clutch-traditional")


time.sleep(5) #wait a few seconds so the page can finish sending its API requests

for request in driver.requests:
    if request.response and "stats.nba.com" in request.url:
        print(request.url)
        



https://stats.nba.com/stats/leaguedashteamclutch?AheadBehind=Ahead%20or%20Behind&ClutchTime=Last%205%20Minutes&College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&ISTRound=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&PointDiff=5&Rank=N&Season=2024-25&SeasonSegment=&SeasonType=Playoffs&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight=


With the manual approach, we have to work out which base_params we are interested in by inspecting the URL. Some params are empty and may be unnecessary, so we might be able to ignore them.

#### 1.2 Clutch Stats Dataset


Let's try writing the Clutch Stats retrieval as a function.

API URL for Clutch Stats: https://stats.nba.com/stats/leaguedashteamclutch?AheadBehind=Ahead%20or%20Behind&ClutchTime=Last%205%20Minutes&College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&ISTRound=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&PointDiff=5&Rank=N&Season=2024-25&SeasonSegment=&SeasonType=Playoffs&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight=


When we actually try to work out the base params from the URL, manually analysing the link turns out to be a bit time consuming. An easier way is to press F12 on the webpage > Network > Fetch/XHR > click on the API endpoint URL > Payload.

Here, we can see the query string parameters and their values (we still have to work out ourselves which are base params and which are variable params). At first glance, it looks like we can ignore empty parameters and parameters set to "N" or "0".

For Clutch Stats, the variable params are MeasureType and SeasonType (and Season, if we want to get multiple seasons).

In [3]:
def clutch_stats():
    url = "https://stats.nba.com/stats/leaguedashteamclutch"
    base_param = {
        "AheadBehind": "Ahead or Behind",
        "ClutchTime": "Last 5 Minutes",
        "LastNGames": "0",
        "LeagueID": "00",
        "PerMode": "PerGame",
        "PointDiff": "5",
        "Season": "2024-25",
        "SeasonSegment": "",
        "DateFrom": "",
        "DateTo": "",
        "GameScope": "",
        "PlayerExperience": "",
        "PlayerPosition": "",
        "StarterBench": "",
        "Conference": "",
        "Division": "",
        "GameSegment": "",
        "Location": "",
        "MeasureType": "Base",
        "Month": "0",
        "OpponentTeamID": "0",
        "Outcome": "",
        "PaceAdjust": "N",
        "PlusMinus": "N",
        "Period": "0",
        "Rank": "N",
        "SeasonType": "Regular Season",
        "ShotClockRange": "",
        "TeamID": "0",
        "TwoWay": "0",
        "VsConference": "",
        "VsDivision": ""
    }
    measuretype = ['Base','Advanced','Four Factors','Misc','Scoring','Opponent'] #Note, Opponent is to see opponent's clutch stats against current team

    seasontype = ['Playoffs','Regular Season']

    headers = {
        "Host": "stats.nba.com",
        "Connection": "keep-alive",
        "Accept": "application/json, text/plain, */*",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/138.0.0.0 Safari/537.36",
        "Referer": "https://www.nba.com/stats/teams/clutch-traditional/",
        "Origin": "https://www.nba.com",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9"
    }



    playoffs_data = []
    reg_szn_data = []
    for season_type in seasontype:

        for measure_type in measuretype:

            param = base_param.copy()
            param['MeasureType'] = measure_type
            param['SeasonType'] = season_type

            response = requests.get(url,headers = headers, params = param) #Python’s requests library takes your params dictionary 
            #and appends it to the URL like a query string.



            if response.status_code == 200: #the HTTP request was successful 
                data = response.json()
                headers_ = data["resultSets"][0]['headers'] #data['resultSets'] is a list of tables (usually only 1 table)
                #[0] selects the first table in that list
                #'headers' holds that table's column names
                rows = data["resultSets"][0]['rowSet']

                df = pd.DataFrame(rows,columns=headers_)
                if season_type == "Playoffs":
                    playoffs_data.append(df) # playoffs_data is a list of DataFrames, one per measure type
                else:
                    reg_szn_data.append(df)
            else:
                print(f"❌ Request failed | {season_type=} | {measure_type=} | Status code: {response.status_code}")

            
            time.sleep(1) #pauses for 1 second to prevent getting rate-limited or blocked by the nba site

    #For Clutch Stats, we cannot stack the tables row-wise with pd.concat as we did for Playtype. The column headers for Playtype are the same
    #for all variable parameters, while they differ between measure types for Clutch Stats. 
    #Therefore, this first attempt concatenates column-wise (axis = 1), which places the tables side by side but leaves duplicated column names.

    playoffs_df = pd.concat(playoffs_data,axis = 1) 
    reg_szn_df = pd.concat(reg_szn_data,axis = 1)


    return playoffs_df,reg_szn_df




In [4]:
def clutch_stats():
    url = "https://stats.nba.com/stats/leaguedashteamclutch"
    base_param = {
        "AheadBehind": "Ahead or Behind",
        "ClutchTime": "Last 5 Minutes",
        "LastNGames": "0",
        "LeagueID": "00",
        "PerMode": "PerGame",
        "PointDiff": "5",
        "Season": "2024-25",
        "SeasonSegment": "",
        "DateFrom": "",
        "DateTo": "",
        "GameScope": "",
        "PlayerExperience": "",
        "PlayerPosition": "",
        "StarterBench": "",
        "Conference": "",
        "Division": "",
        "GameSegment": "",
        "Location": "",
        "MeasureType": "Base",
        "Month": "0",
        "OpponentTeamID": "0",
        "Outcome": "",
        "PaceAdjust": "N",
        "PlusMinus": "N",
        "Period": "0",
        "Rank": "N",
        "SeasonType": "Regular Season",
        "ShotClockRange": "",
        "TeamID": "0",
        "TwoWay": "0",
        "VsConference": "",
        "VsDivision": ""
    }
    measuretype = ['Base','Advanced','Four Factors','Misc','Scoring','Opponent'] #Note, Opponent is to see opponent's clutch stats against current team

    seasontype = ['Playoffs','Regular Season']

    headers = {
        "Host": "stats.nba.com",
        "Connection": "keep-alive",
        "Accept": "application/json, text/plain, */*",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/138.0.0.0 Safari/537.36",
        "Referer": "https://www.nba.com/stats/teams/clutch-traditional/",
        "Origin": "https://www.nba.com",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9"
    }



    playoffs_data = []
    reg_szn_data = []
    combined_df_playoffs = pd.DataFrame()
    combined_df_reg = pd.DataFrame()
    for season_type in seasontype:

        for measure_type in measuretype:

            param = base_param.copy()
            param['MeasureType'] = measure_type
            param['SeasonType'] = season_type

            response = requests.get(url,headers = headers, params = param) #Python’s requests library takes your params dictionary 
            #and appends it to the URL like a query string.



            if response.status_code == 200: #the HTTP request was successful 
                data = response.json()
                headers_ = data["resultSets"][0]['headers'] #data['resultSets'] is a list of tables (usually only 1 table)
                #[0] selects the first table in that list
                #'headers' holds that table's column names
                rows = data["resultSets"][0]['rowSet']

                df = pd.DataFrame(rows,columns=headers_)
                if season_type == "Playoffs":
                    # playoffs_data.append(df)
                    df = df.add_suffix(f'_{measure_type}') #we add a suffix because different measure types can share the same header
                    #the suffix indicates which measure type each column belongs to 
                    df = df.rename(columns = {f"TEAM_NAME_{measure_type}" : "TEAM_NAME"}) #TEAM_NAME keeps its original name, as it is the key we merge on
                    if combined_df_playoffs.empty:
                        combined_df_playoffs = df
                    
                    else:
                        df = df.drop(columns=[f'TEAM_ID_{measure_type}'])
                        combined_df_playoffs = pd.merge(combined_df_playoffs,df,on="TEAM_NAME",how="outer") 




                else:
                    # reg_szn_data.append(df)
                    df = df.add_suffix(f'_{measure_type}')
                    df = df.rename(columns = {f"TEAM_NAME_{measure_type}" : "TEAM_NAME"})
                    if combined_df_reg.empty:
                        combined_df_reg = df
                    else:
                        df = df.drop(columns=[f'TEAM_ID_{measure_type}'])

                        combined_df_reg = pd.merge(combined_df_reg,df,on="TEAM_NAME",how="outer")
            else:
                print(f"❌ Request failed | {season_type=} | {measure_type=} | Status code: {response.status_code}")




                
            time.sleep(1) #pauses for 1 second to prevent getting rate-limited or blocked by the nba site

    #For Clutch Stats, we cannot stack the tables row-wise with pd.concat as we did for Playtype. The column headers for Playtype are the same
    #for all variable parameters, while they differ between measure types for Clutch Stats. 
    #Therefore, we suffix each column with its measure type and merge the tables on TEAM_NAME instead.

    # playoffs_df = pd.concat(playoffs_data,ignore_index = True) 
    # reg_szn_df = pd.concat(reg_szn_data,ignore_index = True)


    return combined_df_playoffs,combined_df_reg
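
The suffix-and-merge step used in `clutch_stats()` can be seen in isolation on two tiny tables. This is only a sketch with made-up values: `tag_columns` is a hypothetical helper, and the columns `PTS` and `NET_RATING` are placeholders rather than the actual headers returned by the API.

```python
import pandas as pd

# Two made-up tables that share TEAM_ID and TEAM_NAME, like two measure types.
base = pd.DataFrame(
    {"TEAM_ID": [1, 2], "TEAM_NAME": ["Team A", "Team B"], "PTS": [10.0, 12.0]}
)
advanced = pd.DataFrame(
    {"TEAM_ID": [1, 2], "TEAM_NAME": ["Team A", "Team B"], "NET_RATING": [3.5, -1.0]}
)

def tag_columns(df, measure_type):
    """Suffix every column with the measure type, but keep TEAM_NAME as the merge key."""
    tagged = df.add_suffix(f"_{measure_type}")
    return tagged.rename(columns={f"TEAM_NAME_{measure_type}": "TEAM_NAME"})

combined = tag_columns(base, "Base")
# Drop the repeated identifier before merging so it appears only once.
extra = tag_columns(advanced, "Advanced").drop(columns=["TEAM_ID_Advanced"])
combined = combined.merge(extra, on="TEAM_NAME", how="outer")

print(list(combined.columns))
```

Merging on `TEAM_NAME` with `how="outer"` keeps a team even if it is missing from one of the tables, at the cost of `NaN` values in the columns that table would have supplied.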




In [None]:
clutch_playoffs,clutch_regszn = clutch_stats()

clutch_playoffs.to_csv(r"C:\Users\calvin\Documents\python\NBA data analysis project\nba analysis csv files\nba_playoffs_clutch_2024-2025.csv",index = False)
clutch_regszn.to_csv(r"C:\Users\calvin\Documents\python\NBA data analysis project\nba analysis csv files\nba_regular_season_clutch_2024-2025.csv",index = False)


First attempt of running the clutch_stats() function and we get this error:



❌ Request failed | season_type='Playoffs' | measure_type='Base' | Status code: 500
❌ Request failed | season_type='Playoffs' | measure_type='Advanced' | Status code: 500
❌ Request failed | season_type='Playoffs' | measure_type='Four Factors' | Status code: 500
❌ Request failed | season_type='Playoffs' | measure_type='Misc' | Status code: 500
❌ Request failed | season_type='Playoffs' | measure_type='Scoring' | Status code: 500
❌ Request failed | season_type='Playoffs' | measure_type='Opponent' | Status code: 500
❌ Request failed | season_type='Regular Season' | measure_type='Base' | Status code: 500
❌ Request failed | season_type='Regular Season' | measure_type='Advanced' | Status code: 500
❌ Request failed | season_type='Regular Season' | measure_type='Four Factors' | Status code: 500
❌ Request failed | season_type='Regular Season' | measure_type='Misc' | Status code: 500
❌ Request failed | season_type='Regular Season' | measure_type='Scoring' | Status code: 500
❌ Request failed | season_type='Regular Season' | measure_type='Opponent' | Status code: 500

#### Status code 500 (Internal Server Error) is read from response.status_code. It suggested that there might be an issue with our headers, such that the NBA site was blocking the request. 

#### **As it turns out, the headers were not the issue. Instead, the NBA Stats API required all params in the request URL to be defined, even those set to 0, N or empty.**
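
One way to catch this kind of omission before sending a request is to compare the parameter dictionary against the URL captured from the browser. The sketch below uses a hypothetical helper, `missing_params`, and a deliberately shortened version of the Clutch Stats URL, so the parameter list shown is illustrative rather than complete.

```python
from urllib.parse import urlsplit, parse_qsl

def missing_params(captured_url, params):
    """Return the parameter names present in the captured URL but absent from params."""
    captured = dict(parse_qsl(urlsplit(captured_url).query, keep_blank_values=True))
    return sorted(set(captured) - set(params))

# Shortened example URL (assumption: only a few of the real parameters are shown).
captured_url = (
    "https://stats.nba.com/stats/leaguedashteamclutch"
    "?ClutchTime=Last%205%20Minutes&College=&LeagueID=00"
    "&Month=0&PaceAdjust=N&SeasonType=Playoffs"
)

# A params dict that skipped the empty, "0" and "N" parameters.
params = {"ClutchTime": "Last 5 Minutes", "LeagueID": "00", "SeasonType": "Playoffs"}

print(missing_params(captured_url, params))
```

`keep_blank_values=True` is what makes the empty parameters visible here, because `parse_qsl` drops them by default.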

#### 1.3 Tracking Stats 

The "Tracking" category measures the attempts, efficiency and related stats for a particular action or outcome, such as "Drives", "Pullup Shooting", "Paint Touches", etc. 

sample API URL: https://stats.nba.com/stats/leaguedashptstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&Height=&ISTRound=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=PerGame&PlayerExperience=&PlayerOrTeam=Team&PlayerPosition=&PtMeasureType=Drives&Season=2024-25&SeasonSegment=&SeasonType=Regular%20Season&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight=



Variable parameters are SeasonType and PtMeasureType

In [5]:
def tracking_stats():
    url = "https://stats.nba.com/stats/leaguedashptstats"
    base_params = {
        "College": "",
        "Conference": "",
        "Country":"",
        "DateFrom":"",
        "DateTo":"",
        "Division":"",
        "DraftPick":"",
        "DraftYear":"",
        "GameScope":"",
        "Height":"",
        "ISTRound":"",
        "LastNGames":"0",
        "LeagueID":"00",
        "Location":"",
        "Month":"0",
        "OpponentTeamID":"0",
        "Outcome":"",
        "PORound":"0",
        "PerMode":"PerGame",
        "PlayerExperience":"",
        "PlayerOrTeam":"Team",
        "PlayerPosition":"",
        "Season":"2024-25",
        "TeamID":"0",
    }
    measuretype = ['Drives','Defense','CatchShoot','Passing','Possessions','PullUpShot','Rebounding',
                   'Efficiency','SpeedDistance','ElbowTouch','PostTouch','PaintTouch']
    seasontype = ['Playoffs','Regular Season'] 
    #Possessions = Touches ##Rebounding, Offensive Rebounding and Defensive Rebounding share the same API URL, unsure how to separate them. 

    headers = {
        "Host": "stats.nba.com",
        "Connection": "keep-alive",
        "Accept": "application/json, text/plain, */*",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/138.0.0.0 Safari/537.36",
        "Referer": "https://www.nba.com/stats/teams/drives",
        "Origin": "https://www.nba.com",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9"
    }

    playoff_tracking = []
    RS_tracking = []
    combined_playoff_tracking = pd.DataFrame() #instantiate an empty DataFrame so that .empty behaves as intended
    combined_RS_tracking = pd.DataFrame()
    for season_type in seasontype:
        for measure_type in measuretype:
            param = base_params.copy()
            param['PtMeasureType'] = measure_type #spelled exactly as in the sample API URL
            param['SeasonType'] = season_type


            response = requests.get(url,params=param,headers=headers)

            if response.status_code == 200:
                data = response.json()
                rows = data['resultSets'][0]['rowSet']
                headers_ = data['resultSets'][0]['headers']
                
                df = pd.DataFrame(rows,columns = headers_)


                df = df.add_suffix(f"_{measure_type}")
                df = df.rename(columns = {f"TEAM_NAME_{measure_type}":"TEAM_NAME"})

                if season_type == "Playoffs":
                    if combined_playoff_tracking.empty:
                        combined_playoff_tracking = df
                    else:
                        df = df.drop(columns=[f"TEAM_ID_{measure_type}",f"TEAM_ABBREVIATION_{measure_type}"])

                        combined_playoff_tracking = pd.merge(combined_playoff_tracking,df,on = "TEAM_NAME",how = "outer")
                else:
                    if combined_RS_tracking.empty:
                        combined_RS_tracking = df
                    else:
                        df = df.drop(columns=[f"TEAM_ID_{measure_type}",f"TEAM_ABBREVIATION_{measure_type}"])

                        combined_RS_tracking = pd.merge(combined_RS_tracking,df,on = "TEAM_NAME",how = "outer")


            else:
                print(f"Request failed | {season_type=} | {measure_type=} | Status code: {response.status_code}")

            time.sleep(1)

    return combined_playoff_tracking,combined_RS_tracking







In [None]:
tracking_playoffs,tracking_RS = tracking_stats()

tracking_playoffs.to_csv(r"C:\Users\calvin\Documents\python\NBA data analysis project\nba analysis csv files\nba_playoffs_tracking_2024-2025.csv",index = False)
tracking_RS.to_csv(r"C:\Users\calvin\Documents\python\NBA data analysis project\nba analysis csv files\nba_regular_season_tracking_2024-2025.csv",index = False)


#### 1.4 General Stats Version 1


https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&Height=&ISTRound=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2024-25&SeasonSegment=&SeasonType=Playoffs&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=

In [None]:
def seasons(start_year,current_year):
    #this function generates a list of seasons in chronological order, e.g. "2010-11","2011-12","2012-13", to pass into our main function
    #stats are tracked starting from 1996-97 season
    #current_year = "2025"
    base = int(start_year)
    num_years = int(current_year) - base 
    year_list = []
    for i in range(num_years):
        season = [f'{base+i}',f'{base+i+1}']
        sliced = season[0] + '-' + season[1][2:4]
        year_list.append(sliced)
    return year_list


def traditional_stats():
    url = "https://stats.nba.com/stats/leaguedashteamstats"
    base_params = {
        "Conference": "",
        "DateFrom":"",
        "DateTo":"",
        "Division":"",
        "GameScope":"",
        "GameSegment":"",
        "Height":"",
        "ISTRound":"",
        "LastNGames":"0",
        "LeagueID":"00",
        "Location":"",
        "Month":"0",
        "OpponentTeamID":"0",
        "Outcome":"",
        "PORound":"0",
        "PaceAdjust":"N",
        "PerMode":"PerGame",
        "Period":"0",
        "PlayerExperience":"",
        "PlayerPosition":"",
        "PlusMinus":"N",
        "Rank":"N",
        "Season":"2024-25",
        "SeasonSegment":"",
        "ShotClockRange":"",
        "StarterBench":"",
        "TeamID":"0",
        "TwoWay":"",
        "VsConference":"",
        "VsDivision":"",
    }
    measuretype = ['Base','Advanced','Four Factors','Misc','Scoring','Opponent','Defense',
                   'Violations']
    seasontype = ['Playoffs','Regular Season'] 
    cur_seasons = seasons('2010','2025')
    seasonsegment = ['Pre All-Star','Post All-Star'] #not used in this version

    headers = {
        "Host": "stats.nba.com",
        "Connection": "keep-alive",
        "Accept": "application/json, text/plain, */*",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/138.0.0.0 Safari/537.36",
        "Referer": "https://www.nba.com/stats/teams/traditional",
        "Origin": "https://www.nba.com",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9"
    }

    print(cur_seasons)
    print(len(cur_seasons))
    playoff_list = []
    rs_list = []
    for season_var in cur_seasons: #looping through all seasons from the seasons function. 
        combined_playoff_df = pd.DataFrame() #reinitialising empty df for each season
        combined_RS_df = pd.DataFrame()
        for season_type in seasontype:
            for measure_type in measuretype:
                param = base_params.copy()
                param['MeasureType'] = measure_type #the key MUST match the parameter name in the API endpoint URL
                param['SeasonType'] = season_type
                param['Season'] = season_var

                response = requests.get(url,params=param,headers=headers) #.get combines the url, params and headers into a single request

                if response.status_code == 200: #status code 200 means the request succeeded; any other code signals a problem
                    data = response.json() #to view this data, open the API endpoint URL > Response in the browser dev tools to see the JSON 
                    rows = data['resultSets'][0]['rowSet'] #from the JSON data, we extract the rows and the column headers 
                    headers_ = data['resultSets'][0]['headers']
                    
                    df = pd.DataFrame(rows,columns = headers_)
                    df = df.add_suffix(f"_{measure_type}") #adds a suffix to all headers in the dataframe > all headers will have _measuretype
                    df = df.rename(columns = {f"TEAM_NAME_{measure_type}":"TEAM_NAME"}) #maintain the teamname header as the common header, since we will merge on this header

                    if season_type == "Playoffs":
                        if combined_playoff_df.empty:
                            combined_playoff_df = df
                        else:
                            df = df.drop(columns=[f'TEAM_ID_{measure_type}',f'GP_{measure_type}',f'W_{measure_type}',f'L_{measure_type}',f'W_PCT_{measure_type}'])
                            #drop duplicate columns 
                            combined_playoff_df = pd.merge(combined_playoff_df,df,on = "TEAM_NAME",how = "outer")
                            
                    else:
                        if combined_RS_df.empty:
                            combined_RS_df = df
                        else:
                            df = df.drop(columns=[f'TEAM_ID_{measure_type}',f'GP_{measure_type}',f'W_{measure_type}',f'L_{measure_type}',f'W_PCT_{measure_type}'])
                            combined_RS_df = pd.merge(combined_RS_df,df,on = "TEAM_NAME",how = "outer")


                else:
                    print(f"Request failed | {season_var=} | {season_type=} | {measure_type=} | Status code: {response.status_code}")
            
                time.sleep(0.3)


            if 'Season' not in combined_playoff_df: #if Season is not already a column, insert it
                combined_playoff_df.insert(0,'Season',season_var)
            if 'Season' not in combined_RS_df:
                combined_RS_df.insert(0,'Season',season_var)

        playoff_list.append(combined_playoff_df) #we append the newly created dataframes into a list, so that we can concat row-wise for each season. 
        rs_list.append(combined_RS_df)

    reverse_playoff = reversed(playoff_list) #reversing the list, as the list starts with the oldest season, but we usually want to start with the latest season
    reverse_rs = reversed(rs_list)
    final_playoff_DF = pd.concat(reverse_playoff,axis = 0,ignore_index = True) #concat method will join "elements" of the list, in this case the dataframes, row-wise
    final_RS_DF = pd.concat(reverse_rs,axis=0,ignore_index = True)
    
    return final_playoff_DF,final_RS_DF






In [70]:
playoff_df,RS_df = traditional_stats()



playoff_df.to_csv(r"C:\Users\calvin\Documents\python\NBA data analysis project\nba analysis csv files\nba_playoffs_traditional_2024-2025.csv",index = False)
RS_df.to_csv(r"C:\Users\calvin\Documents\python\NBA data analysis project\nba analysis csv files\nba_regular_season_traditional_2024-2025.csv",index = False)


['2010-11', '2011-12', '2012-13', '2013-14', '2014-15', '2015-16', '2016-17', '2017-18', '2018-19', '2019-20', '2020-21', '2021-22', '2022-23', '2023-24', '2024-25']
15


#### 1.5 General Stats for x seasons, split into 3 segments  



In [7]:
def seasons(start_year,current_year):
    #this function generates a list of seasons in chronological order, e.g. "2010-11","2011-12","2012-13", to pass into our main function
    #stats are tracked starting from 1996-97 season
    #current_year = "2025"
    base = int(start_year)
    num_years = int(current_year) - base 
    year_list = []
    for i in range(num_years):
        season = [f'{base+i}',f'{base+i+1}']
        sliced = season[0] + '-' + season[1][2:4]
        year_list.append(sliced)
    return year_list


def traditional_stats():
    url = "https://stats.nba.com/stats/leaguedashteamstats"
    base_params = {
        "Conference": "",
        "DateFrom":"",
        "DateTo":"",
        "Division":"",
        "GameScope":"",
        "GameSegment":"",
        "Height":"",
        "ISTRound":"",
        "LastNGames":"0",
        "LeagueID":"00",
        "Location":"",
        "Month":"0",
        "OpponentTeamID":"0",
        "Outcome":"",
        "PORound":"0",
        "PaceAdjust":"N",
        "PerMode":"PerGame",
        "Period":"0",
        "PlayerExperience":"",
        "PlayerPosition":"",
        "PlusMinus":"N",
        "Rank":"N",
        "Season":"2024-25",
        "ShotClockRange":"",
        "StarterBench":"",
        "TeamID":"0",
        "TwoWay":"",
        "VsConference":"",
        "VsDivision":"",
    }
    measuretype = ['Base','Advanced','Four Factors','Misc','Scoring','Opponent','Defense',
                   'Violations']
    cur_seasons = seasons('2010','2025')
    season_segment = ['Pre All-Star','Post All-Star','Playoffs']

    headers = {
        "Host": "stats.nba.com",
        "Connection": "keep-alive",
        "Accept": "application/json, text/plain, */*",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/138.0.0.0 Safari/537.36",
        "Referer": "https://www.nba.com/stats/teams/traditional",
        "Origin": "https://www.nba.com",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9"
    }

    data_list = []

    for season_var in cur_seasons: #looping through all seasons from the seasons function. 
        for segment in season_segment:
            combined = pd.DataFrame()

            if segment == "Playoffs":
                season_type = "Playoffs"
                segment = ""
            else:
                season_type = "Regular Season"


            for measure_type in measuretype:
                param = base_params.copy()
                # param['MeasureType'] = measure_type #the column name MUST match the header in the API endpoint
                # param['SeasonType'] = season_type
                # param['Season'] = season_var
                # param['SeasonSegment'] = segment

                #another way to do this is to use .update
                param.update({
                    'MeasureType' : measure_type,
                    'SeasonType' : season_type,
                    'Season' : season_var,
                    'SeasonSegment' : segment
                })

                response = requests.get(url,params=param,headers=headers) #.get combines the url, params and headers into a single request

                if response.status_code == 200: #status code 200 means the request succeeded; any other code signals a problem
                    data = response.json() #to view this data, open the API endpoint URL > Response in the browser dev tools to see the JSON 
                    rows = data['resultSets'][0]['rowSet'] #from the JSON data, we extract the rows and the column headers 
                    headers_ = data['resultSets'][0]['headers']
                    
                    df = pd.DataFrame(rows,columns = headers_)
                    df = df.add_suffix(f"_{measure_type}") #adds a suffix to all headers in the dataframe > all headers will have _measuretype
                    df = df.rename(columns = {f"TEAM_NAME_{measure_type}":"TEAM_NAME"}) #maintain the teamname header as the common header, since we will merge on this header

                    if combined.empty:
                        combined = df
                    else:
                        df = df.drop(columns=[f'TEAM_ID_{measure_type}',f'GP_{measure_type}',f'W_{measure_type}',
                                              f'L_{measure_type}',f'W_PCT_{measure_type}',f'GP_RANK_{measure_type}',f'W_RANK_{measure_type}',
                                              f'L_RANK_{measure_type}',f'W_PCT_RANK_{measure_type}'])
                        #drop duplicate columns 
                        combined = pd.merge(combined,df,on = "TEAM_NAME",how = "outer")
                            


                else:
                    print(f"Request failed | {season_var=} | {season_type=} | {measure_type=} | Status code: {response.status_code}")
            
                time.sleep(0.3)


            if 'Season' not in combined: #if Season is not already a column, insert it
                combined.insert(0,'Season',season_var)
                #segment was reset to "" for the playoffs above, so those rows are labelled "Playoff"
                combined.insert(1,'Season Segment',segment if segment else "Playoff")


            data_list.append(combined) #we append the newly created dataframes into a list, so that we can concat row-wise for each season. 


    # reverse_playoff = reversed(playoff_list) #reversing the list, as the list starts with the oldest season, but we usually want to start with the latest season
    # reverse_rs = reversed(rs_list)
    reverse_list = reversed(data_list)
    final_DF = pd.concat(reverse_list,axis = 0,ignore_index = True) #concat method will join "elements" of the list, in this case the dataframes, row-wise
    
    return final_DF






In [8]:
final_df = traditional_stats()


final_df.to_csv(r"C:\Users\calvin\Documents\python\NBA data analysis project\nba analysis csv files\nba_general_stats_2024-2025.csv",index = False)


### 2.0 Simple Trends/Prediction

Now, we have a complete dataset for the "general" stats, which consists of common basketball analytical stats that can determine the performance of a team throughout a given period. 

First, we will define our target column and drop any columns that may "leak" the target into the features.
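
A minimal sketch of that step on a toy table is shown below. The choice of which columns count as leaky is my assumption: any column containing `RANK`, plus the win and loss counts that directly determine the win percentage. The values are made up, and `PTS_Base` is only a placeholder feature.

```python
import pandas as pd

# Toy stand-in for the general stats table (made-up values).
toy = pd.DataFrame(
    {
        "TEAM_NAME": ["Team A", "Team B"],
        "W_PCT_Base": [0.6, 0.4],
        "W_Base": [6, 4],
        "L_Base": [4, 6],
        "W_PCT_RANK_Base": [1, 2],
        "PTS_Base": [110.0, 105.0],
    }
)

target = "W_PCT_Base"

# Assumption: rank columns and raw win/loss counts reveal the target.
leaky = [col for col in toy.columns if "RANK" in col or col in {"W_Base", "L_Base"}]

y = toy[target]
X = toy.drop(columns=leaky + [target])

print(leaky)
print(list(X.columns))
```

On the real dataset the list of leaky columns would need to be reviewed by hand, since other columns may also be derived from wins and losses.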



In [None]:
# import re
# y_data = final_df['W_PCT_Base']

# for i in range(final_df.shape[1]):
#     if 'RANK' in 


NameError: name 'final_df' is not defined