# NBA Data :: Getting Datasets (Again...)

This notebook contains functions to get Team, Player and Game Data from the `nba_api` package.

# 1. Importing Packages

The packages below can be installed with `pip install nba_api numpy pandas polars seaborn matplotlib requests tqdm`; `time` ships with the Python standard library.

In [47]:
import nba_api
import numpy as np
import pandas as pd
import polars as pl
import seaborn as sns
import matplotlib.pyplot as plt
import requests as req
from tqdm import tqdm
import time

We will also need the following headers to connect to the NBA API.

In [29]:
headers = {
    "Host": "stats.nba.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0",
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "x-nba-stats-origin": "stats",
    "x-nba-stats-token": "true",
    "Connection": "keep-alive",
    "Referer": "https://stats.nba.com/",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
}
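As a rough sketch of how a headers dict like this might be used: `nba_api` endpoint classes accept `headers` and `timeout` keyword arguments, so the dict can be passed straight through. The helper name `fetch_team_stats` and the 60-second default are illustrative choices, not part of this notebook, and the function is only defined here, not called.

```python
def fetch_team_stats(season, headers, timeout=60):
    """Request one season of league-wide team stats using custom headers.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    # Imported lazily so the sketch can be defined without nba_api installed.
    from nba_api.stats.endpoints import leaguedashteamstats

    endpoint = leaguedashteamstats.LeagueDashTeamStats(
        season=season,
        headers=headers,   # custom request headers, such as the dict above
        timeout=timeout,   # seconds to wait before the request is abandoned
    )
    # This endpoint returns a single result set, so take the first frame.
    return endpoint.get_data_frames()[0]
```

A call would look like `fetch_team_stats('2023-24', headers)`; it needs network access to stats.nba.com.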


# 2. NBA API Data Outline

We need three datasets: games, teams, and players. All three will be collected with swar's [`nba_api` package](https://github.com/swar/nba_api).

### Team Data

Team data should be a DataFrame of each NBA team's season-average statistics for a given season, plus a column `made_playoffs` that is 1 if the team made the playoffs that season and 0 otherwise.
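One possible way to build the `made_playoffs` column is sketched below. The statistics and the set of playoff team IDs are placeholder values made up for illustration; in practice they would come from the API.

```python
import pandas as pd

# Hypothetical season-average rows; the numbers are placeholders, not real stats.
team_stats = pd.DataFrame({
    "TEAM_ID": [1610612737, 1610612738, 1610612739],
    "PTS": [110.0, 115.0, 108.0],
})

# Hypothetical set of TEAM_IDs that reached the playoffs in that season.
playoff_team_ids = {1610612738}

# 1 if the team's ID is in the playoff set, 0 otherwise.
team_stats["made_playoffs"] = (
    team_stats["TEAM_ID"].isin(playoff_team_ids).astype(int)
)
```

Keeping the playoff teams as a plain set of IDs means the flag can be recomputed per season without changing the statistics frame.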

For the Team Data, we must first collect the teams in the NBA.

In [5]:
from nba_api.stats.static import teams

t = teams.get_teams()
t[:1]

[{'id': 1610612737,
  'full_name': 'Atlanta Hawks',
  'abbreviation': 'ATL',
  'nickname': 'Hawks',
  'city': 'Atlanta',
  'state': 'Georgia',
  'year_founded': 1949}]

In [6]:
len(t)

30

In [24]:
# Collect the ID of every team
ids = [team['id'] for team in t]

ids

[1610612737,
 1610612738,
 1610612739,
 1610612740,
 1610612741,
 1610612742,
 1610612743,
 1610612744,
 1610612745,
 1610612746,
 1610612747,
 1610612748,
 1610612749,
 1610612750,
 1610612751,
 1610612752,
 1610612753,
 1610612754,
 1610612755,
 1610612756,
 1610612757,
 1610612758,
 1610612759,
 1610612760,
 1610612761,
 1610612762,
 1610612763,
 1610612764,
 1610612765,
 1610612766]
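Since later datasets are keyed by these numeric IDs, a small lookup from abbreviation or name to ID can be handy. The sketch below uses the single Hawks record shown in the output above as sample input; `build_team_lookup` is a hypothetical helper name.

```python
# One record in the shape returned by teams.get_teams(), copied from the output above.
sample_teams = [
    {"id": 1610612737, "full_name": "Atlanta Hawks", "abbreviation": "ATL",
     "nickname": "Hawks", "city": "Atlanta", "state": "Georgia",
     "year_founded": 1949},
]


def build_team_lookup(team_records, key="abbreviation"):
    """Map a chosen field (abbreviation by default) to each team's numeric ID."""
    return {team[key]: team["id"] for team in team_records}


abbr_to_id = build_team_lookup(sample_teams)
name_to_id = build_team_lookup(sample_teams, key="full_name")
```

Passing the full list `t` instead of `sample_teams` would give a lookup covering all 30 teams.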

Next, we pull league-wide team statistics from the API for each season and store the resulting DataFrames in a dictionary keyed by season.

In [48]:
from nba_api.stats.endpoints import leaguedashteamstats

all_team_data = {}

measure_type_detailed_defense = ['Base', 'Advanced']
per_mode = 'Totals'

for i in tqdm(range(20)):
    # Build the season string, e.g. '2004-05' (covers 2004-05 through 2023-24)
    start_year = 2004 + i
    season = f'{start_year}-{(start_year + 1) % 100:02d}'

    # Try to pull team stats; on failure, report it, pause, and move on
    try:
        team_data = leaguedashteamstats.LeagueDashTeamStats(season=season).get_data_frames()[0]
        all_team_data[season] = team_data
    except Exception as err:
        print(f'{season}: request failed ({err})')
        time.sleep(20)

all_team_data

The frame returned by this endpoint has 54 columns: `TEAM_ID`, `TEAM_NAME`, 26 statistics (`GP`, `W`, `L`, `W_PCT`, `MIN`, `FGM`, `FGA`, `FG_PCT`, `FG3M`, `FG3A`, `FG3_PCT`, `FTM`, `FTA`, `FT_PCT`, `OREB`, `DREB`, `REB`, `AST`, `TOV`, `STL`, `BLK`, `BLKA`, `PF`, `PFD`, `PTS`, `PLUS_MINUS`), and a matching `_RANK` column for each statistic.
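The two fiddly parts of a download loop like this, formatting the season string and retrying a failed request, can be pulled into small helpers. This is a minimal sketch; `season_label` and `fetch_with_retry` are hypothetical names, and the retry count and wait time are arbitrary defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def season_label(start_year: int) -> str:
    """Format a season as 'YYYY-YY', e.g. 2004 -> '2004-05', 1999 -> '1999-00'."""
    return f"{start_year}-{(start_year + 1) % 100:02d}"


def fetch_with_retry(fetch: Callable[[], T], retries: int = 3,
                     wait_seconds: float = 20.0) -> T:
    """Call `fetch` until it succeeds, sleeping between failed attempts."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as err:  # network errors, timeouts, bad responses
            last_error = err
            if attempt < retries - 1:
                time.sleep(wait_seconds)
    raise RuntimeError(f"gave up after {retries} attempts") from last_error


# Twenty seasons, starting with 2004-05.
seasons = [season_label(2004 + i) for i in range(20)]
```

Any zero-argument callable works as `fetch`, for example `lambda: leaguedashteamstats.LeagueDashTeamStats(season=s).get_data_frames()[0]`. Unlike a bare sleep in the `except` branch, a failed season is retried here instead of skipped.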

### Player Data

Player data should be a DataFrame of each NBA player's season-average statistics for a given season, with columns for position, All-Star selection, and team ID (to link the data with the teams dataset).
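The link between players and teams could be a left merge on the team ID, sketched below. All rows are made-up placeholders, and the column names (`PLAYER_ID`, `TEAM_ID`, `all_star`) are assumptions about how the final datasets will be laid out.

```python
import pandas as pd

# Hypothetical player rows; IDs and flags are placeholders.
players = pd.DataFrame({
    "PLAYER_ID": [1, 2, 3],
    "TEAM_ID": [1610612737, 1610612738, 1610612737],
    "all_star": [1, 0, 0],
})

# Hypothetical team rows carrying the playoff flag.
teams_df = pd.DataFrame({
    "TEAM_ID": [1610612737, 1610612738],
    "made_playoffs": [0, 1],
})

# A left merge keeps every player row; validate= raises an error
# if a TEAM_ID appears more than once in the team table.
linked = players.merge(teams_df, on="TEAM_ID", how="left",
                       validate="many_to_one")
```

Using `validate="many_to_one"` guards against silently duplicating player rows when the team table has repeated IDs.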

### Game Data

Game data should be a DataFrame of every NBA game played in a given season or set of seasons. Each game should include the teams that played, the home and away team statistics, the team IDs, and a flag for whether it was a playoff game.
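The home/away and playoff flags might be derived as sketched below. This assumes matchup strings of the form `'ATL vs. BOS'` (home game) and `'ATL @ BOS'` (away game), and a season-type label of `'Playoffs'` supplied by whoever requested the games. Both assumptions should be checked against the actual API response, and the function names are hypothetical.

```python
def parse_matchup(matchup: str) -> dict:
    """Split a matchup string into the row's team, its opponent, and a home flag."""
    if " vs. " in matchup:
        team, opponent = matchup.split(" vs. ")
        return {"team": team, "opponent": opponent, "is_home": True}
    if " @ " in matchup:
        team, opponent = matchup.split(" @ ")
        return {"team": team, "opponent": opponent, "is_home": False}
    raise ValueError(f"unrecognised matchup format: {matchup!r}")


def label_game(matchup: str, season_type: str) -> dict:
    """Add an is_playoff flag (1 or 0) to the parsed matchup."""
    row = parse_matchup(matchup)
    # Assumption: the caller knows which season type the game was requested under.
    row["is_playoff"] = int(season_type == "Playoffs")
    return row


home_game = label_game("ATL vs. BOS", "Regular Season")
away_playoff_game = label_game("ATL @ BOS", "Playoffs")
```

Raising on an unrecognised format, instead of guessing, makes it obvious if the response uses a different matchup convention.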