# NBA Shot Data 4 :: NBA-API connector

## Trevor Rowland :: 2/2/2025

This notebook takes the cleaned NBA shot data and builds a data source of team and player statistic aggregations. To pull those aggregations we will try the `nba_api` package from PyPI; note that it is a client for the NBA.com stats endpoints rather than for [basketball-reference.com](https://www.basketball-reference.com).

## 1. Importing Packages and Data

In [1]:
import pandas as pd
import polars as pl
import numpy as np

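# The pickle is assumed to hold a polars DataFrame (implied by the polars-style
# to_pandas call below); convert it to pandas backed by PyArrow extension arrays
df = pd.read_pickle('/Users/dB/Documents/repos/github/bint-capstone/data-sources/nba/all-shots.pkl')
df = df.to_pandas(use_pyarrow_extension_array=True)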

## 2. Attempting to Connect to the NBA Stats API

### 2.a. Importing Packages

In [1]:
import pandas as pd
import polars as pl
import numpy as np
from nba_api.stats.static import teams, players
from nba_api.stats.endpoints import (
    teamyearbyyearstats,
    leaguedashteamstats,
    leaguedashplayerstats,
    commonteamroster
)
from tqdm import tqdm
import time
import random

### 2.b. Team Data

`get_all_teams()`

Retrieve all NBA teams as a pandas DataFrame.
    
_Returns_:

pd.DataFrame: DataFrame of NBA teams with their details

In [4]:
def get_all_teams()->pd.DataFrame:
    teams_list = teams.get_teams()
    return pd.DataFrame(teams_list)

`get_team_ids()`

Extract active team IDs as a pandas Series.
    
_Returns_:

pd.Series: Series of active team IDs

In [5]:
def get_team_ids():
    return pd.Series([team['id'] for team in teams.get_teams()])
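
The IDs returned above are opaque integers, so a lookup keyed by team abbreviation can make later filtering easier to read. The sketch below assumes each team record is a dict with `id` and `abbreviation` keys; the helper name `build_team_lookup` and the two sample records are illustrative stand-ins for the output of `teams.get_teams()`, not part of the original notebook.

```python
def build_team_lookup(team_records):
    """Map team abbreviation -> team ID from a list of team dicts."""
    return {team["abbreviation"]: team["id"] for team in team_records}


# Illustrative sample shaped like teams.get_teams() output (keys are assumed).
sample_teams = [
    {"id": 1610612737, "abbreviation": "ATL", "full_name": "Atlanta Hawks"},
    {"id": 1610612738, "abbreviation": "BOS", "full_name": "Boston Celtics"},
]

team_lookup = build_team_lookup(sample_teams)
print(team_lookup["BOS"])
```

In the notebook itself, the same helper could be called as `build_team_lookup(teams.get_teams())`.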

`collect_team_stats(start_year:int, end_year:int)`

Collect yearly team statistics.
    
Args:

start_year (int): Starting year for data collection

end_year (int): Ending year for data collection
    
_Returns_:

pd.DataFrame: Comprehensive team statistics across seasons


In [6]:
def collect_team_stats(start_year=2004, end_year=2024):
    """
    Collect yearly team statistics with robust error handling.
    
    Args:
    start_year (int): Starting year for data collection
    end_year (int): Ending year for data collection
    
    Returns:
    pd.DataFrame: Comprehensive team statistics across seasons
    """
    team_stats_list = []
    team_ids = get_team_ids()
    
    for team_id in tqdm(team_ids, desc="Collecting Team Stats"):
        for season in range(start_year, end_year + 1):
            try:
                # Convert year to NBA season format (e.g., 2020-21)
                season_str = f"{season}-{str(season+1)[-2:]}"
                
                # Collect team stats
                team_stats = leaguedashteamstats.LeagueDashTeamStats(
                    season=season_str
                )
                
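                # Directly convert to DataFrame
                df = team_stats.get_data_frames()[0]
                
                # The endpoint returns every team for the season, so keep only
                # this team's row instead of overwriting TEAM_ID on all rows
                df = df[df['TEAM_ID'] == team_id].copy()
                df['SEASON'] = season_str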
                
                team_stats_list.append(df)
                
                # Randomized rate limiting to avoid predictable patterns
                time.sleep(random.uniform(1.5, 3.5))
            
            except Exception as e:
                print(f"Error collecting stats for team {team_id} in season {season_str}: {e}")
                # Wait longer on failure with some randomness
                time.sleep(random.uniform(4, 7))
                continue
    
    # Combine all team stats into a single DataFrame
    if team_stats_list:
        final_df = pd.concat(team_stats_list, ignore_index=True)
        
        # Clean column names
        final_df.columns = [col.lower().replace(' ', '_') for col in final_df.columns]
        
        return final_df
    else:
        print("No team stats collected. Check network or API issues.")
        return pd.DataFrame()
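
Both collection functions build the season string inline with the same f-string. As a minimal sketch, that conversion can be pulled into a small helper so it can be checked on its own; the names `season_label` and `season_labels` are hypothetical and do not appear in the original notebook.

```python
def season_label(start_year: int) -> str:
    """Return the NBA season string for the season starting in start_year."""
    # Same expression as in the notebook: four-digit start year, then the
    # last two digits of the following year.
    return f"{start_year}-{str(start_year + 1)[-2:]}"


def season_labels(start_year: int, end_year: int) -> list[str]:
    """Return season strings for every start year in the inclusive range."""
    return [season_label(year) for year in range(start_year, end_year + 1)]


labels = season_labels(2004, 2024)
print(labels[0], labels[-1], len(labels))
```

With the defaults used here (2004 to 2024 inclusive), this yields 21 season strings, which matches the 21 iterations reported by the player-stats progress bar.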

`get_team_roster(team_id:int, season:str)`

Retrieve team roster for a specific season.
    
Args:

team_id (int): NBA team ID

season (str): NBA season in format 'YYYY-YY'
    
_Returns_:

pd.DataFrame: Team roster details

In [7]:
def get_team_roster(team_id, season):
    try:
        roster = commonteamroster.CommonTeamRoster(team_id=team_id, season=season)
        
        # Get DataFrame directly and clean column names
        df = roster.get_data_frames()[0]
        df.columns = [col.lower().replace(' ', '_') for col in df.columns]
        
        # Add team_id and season columns
        df['team_id'] = team_id
        df['season'] = season
        
        return df
    except Exception as e:
        print(f"Error collecting roster for team {team_id} in season {season}: {e}")
        return pd.DataFrame()

### 2.c. Player Data

`collect_player_stats(start_year:int, end_year:int)`
    
Collect comprehensive player statistics.
    
Args:
    
start_year (int): Starting year for data collection
    
end_year (int): Ending year for data collection
    
    
_Returns_:
    
pd.DataFrame: Comprehensive player statistics across seasons


In [8]:
def collect_player_stats(start_year=2004, end_year=2024):
    """
    Collect comprehensive player statistics with robust error handling.
    
    Args:
    start_year (int): Starting year for data collection
    end_year (int): Ending year for data collection
    
    Returns:
    pd.DataFrame: Comprehensive player statistics across seasons
    """
    player_stats_list = []
    
    for season in tqdm(range(start_year, end_year + 1), desc="Collecting Player Stats"):
        try:
            # Convert year to NBA season format (e.g., 2020-21)
            season_str = f"{season}-{str(season+1)[-2:]}"
            
            # Collect player stats for the season
            player_stats = leaguedashplayerstats.LeagueDashPlayerStats(
                season=season_str
            )
            
            # Get DataFrame directly
            df = player_stats.get_data_frames()[0]
            
            # Add season column and clean column names
            df['SEASON'] = season_str
            df.columns = [col.lower().replace(' ', '_') for col in df.columns]
            
            player_stats_list.append(df)
            
            # Randomized rate limiting
            time.sleep(random.uniform(1.5, 3.5))
        
        except Exception as e:
            print(f"Error collecting player stats for season {season_str}: {e}")
            # Wait longer on failure with some randomness
            time.sleep(random.uniform(4, 7))
            continue
    
    # Combine all player stats into a single DataFrame
    if player_stats_list:
        final_df = pd.concat(player_stats_list, ignore_index=True)
        return final_df
    else:
        print("No player stats collected. Check network or API issues.")
        return pd.DataFrame()
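
The functions above skip a season when a request fails, after a longer randomized pause. A possible alternative, sketched below, is to retry the same request a few times with a growing, jittered delay before giving up. This is a different technique from the notebook's skip-on-failure behavior, and `call_with_retry`, its parameters, and the simulated `flaky_fetch` are all hypothetical names introduced for illustration.

```python
import random
import time


def call_with_retry(fetch, max_attempts=3, base_delay=4.0, jitter=3.0,
                    sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached.

    The wait before retry k is base_delay * k plus a random jitter, so
    repeated failures back off without a predictable pattern.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay * attempt + random.uniform(0, jitter))
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error


# Simulated request that fails twice and then succeeds (no network involved).
attempts = {"count": 0}


def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated timeout")
    return "payload"


# Record the delays instead of actually sleeping, to keep the demo fast.
recorded_delays = []
result = call_with_retry(flaky_fetch, sleep=recorded_delays.append)
print(result, len(recorded_delays))
```

In the notebook, the request could be wrapped as `call_with_retry(lambda: leaguedashplayerstats.LeagueDashPlayerStats(season=season_str).get_data_frames()[0])`, leaving the default `sleep=time.sleep` in place.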

### 2.d. Testing

In [9]:
# Collect team stats
data_dir = '/Users/dB/Documents/repos/github/bint-capstone/data-sources/nba'
print("Talking to the API...")
print("Collecting Team Statistics...")
team_stats = collect_team_stats()
    
# Collect player stats
print("Collecting Player Statistics...")
player_stats = collect_player_stats()

print("Data pulled from API.")

Talking to the API...
Collecting Team Statistics...


Collecting Team Stats: 100%|██████████| 30/30 [26:44<00:00, 53.49s/it]


Collecting Player Statistics...


Collecting Player Stats: 100%|██████████| 21/21 [01:19<00:00,  3.80s/it]

Data pulled from API.
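
Before saving, it may be worth checking that each team appears only once per season, since the league-wide team endpoint returns all teams on each call. The following is a minimal sketch, assuming the rows are available as dicts with `team_id` and `season` keys (for example via `team_stats.to_dict("records")`); `find_duplicate_keys` and the sample rows are hypothetical.

```python
from collections import Counter


def find_duplicate_keys(rows, key_fields=("team_id", "season")):
    """Return {key: count} for every key combination seen more than once."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Illustrative rows only; the second team is deliberately repeated.
sample_rows = [
    {"team_id": 1610612737, "season": "2004-05"},
    {"team_id": 1610612738, "season": "2004-05"},
    {"team_id": 1610612738, "season": "2004-05"},
]

duplicates = find_duplicate_keys(sample_rows)
print(duplicates)
```

An empty result means every (team, season) pair is unique; any entry in the result points to rows that should be inspected before the data is written out.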





## 3. Writing to CSV and PKL

The team and player statistics are saved as CSV and pickle files in `data_dir`; these files will be stored on OneDrive.

In [11]:
print('Writing Team Stats to Folder')
team_stats.to_csv(f'{data_dir}/nba_team_stats_2004_2024.csv')
team_stats.to_pickle(f'{data_dir}/nba_team_stats_2004_2024.pkl')

print('Writing Player Stats to Folder')
player_stats.to_csv(f'{data_dir}/nba_player_stats_2004_2024.csv')
player_stats.to_pickle(f'{data_dir}/nba_player_stats_2004_2024.pkl')

Writing Team Stats to Folder
Writing Player Stats to Folder
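
The file names above hard-code the year range, so they would go stale if `start_year` or `end_year` changed. As a rough sketch, the paths could be derived from the same values; `output_paths` is a hypothetical helper and the relative directory in the example is a placeholder, not the notebook's `data_dir`.

```python
from pathlib import Path


def output_paths(data_dir, prefix, start_year, end_year):
    """Build matching CSV and pickle paths for one dataset."""
    stem = f"{prefix}_{start_year}_{end_year}"
    base = Path(data_dir)
    return {"csv": base / f"{stem}.csv", "pkl": base / f"{stem}.pkl"}


# Placeholder directory; in the notebook this would be data_dir.
paths = output_paths("data-sources/nba", "nba_team_stats", 2004, 2024)
print(paths["csv"].name)
print(paths["pkl"].name)
```

The resulting paths can be passed straight to pandas, for example `team_stats.to_csv(paths["csv"])` and `team_stats.to_pickle(paths["pkl"])`.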


Now we have player and team stats for the 21 seasons from 2004-05 through 2024-25. The next notebook will perform exploratory data analysis (EDA) on the collected API data.