- [1. Importing Packages](#1)
- [2. Functions for Extracting Data](#2)
- [3. Assembling Dataset](#3)
    - [3.1 Merging Data](#3_1)
- [4. Pipeline Building](#4)
    - [4.1 Extracting Current Data](#4_1)
    - [4.2 Merging For All Seasons](#4_2)
    - [4.3 Combining Cumulative and Rolling](#4_3)
- [Extra. Getting Data for Today](#extra)

## 1. Importing Packages <a id='1'></a>

In [8]:
from nba_api.stats.endpoints import leaguegamefinder
from nba_api.stats.endpoints import playergamelog, leaguedashteamstats, teamyearbyyearstats
from nba_api.stats.library.parameters import SeasonAll
from nba_api.stats.static import players, teams
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import datetime
import time

## 2. Functions for Extracting Data <a id='2'></a>

#### Structures for Data:

Types of Data (Merged Together):
- **Base**: Basic game statistics such as points, rebounds, and assists.
- **Advanced**: Derived metrics such as efficiency ratings, true shooting percentage, and player impact estimate.
- **Misc**: Statistics that fit neither the traditional nor the advanced categories, for example points off turnovers, second-chance points, and bench points.
- **Four Factors**: A concept from Dean Oliver's book "Basketball on Paper". He identifies four statistical categories as key to basketball success: shooting (effective field goal percentage), turnovers (turnover rate), rebounding (rebound rate), and free throws (free throw rate).
- **Scoring**: Statistics describing how points are scored, possibly including shooting efficiency from different parts of the court.
- **Opponent**: Statistics on how opponents perform against each team, useful for assessing defensive effectiveness.
- **Usage**: Metrics that indicate how involved a player or team is in various aspects of the game, such as usage rate. Listed for completeness; the `getting_stats` functions below do not fetch it.
- **Defense**: Detailed defensive metrics, possibly including opponent shooting percentages at different distances, blocks, and steals.
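
To make the Four Factors entry concrete, here is a minimal sketch that computes the four quantities from box-score totals. The formulas are the commonly cited formulations (with the usual 0.44 free-throw weight); the columns returned by the NBA stats endpoint may be defined slightly differently, for example free throw rate as FTA/FGA. The function name `four_factors` and the input numbers are made up for illustration.

```python
def four_factors(fgm, fga, fg3m, fta, ftm, tov, oreb, opp_dreb):
    """Compute Dean Oliver's four factors from one team's box-score totals."""
    # Shooting: effective FG% gives made three-pointers 1.5x weight.
    efg_pct = (fgm + 0.5 * fg3m) / fga
    # Turnovers: turnovers per estimated possession-ending play.
    tov_pct = tov / (fga + 0.44 * fta + tov)
    # Rebounding: share of available offensive rebounds the team secured.
    oreb_pct = oreb / (oreb + opp_dreb)
    # Free throws: made free throws per field goal attempt.
    ft_rate = ftm / fga
    return {
        "EFG_PCT": efg_pct,
        "TOV_PCT": tov_pct,
        "OREB_PCT": oreb_pct,
        "FT_RATE": ft_rate,
    }


# Hypothetical single-game totals, chosen only to keep the arithmetic simple.
example = four_factors(fgm=40, fga=80, fg3m=10, fta=25, ftm=20,
                       tov=14, oreb=10, opp_dreb=30)
print(example)
```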

Ranges:
- For each game day, we fetch the cumulative season stats up to that day.
- A rolling variant restricts the window to the previous `p_n_days` days (23 by default).
- The goal is to merge these stats with the game-day data to predict the outcome of the next game.
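
The difference between the cumulative and rolling ranges comes down to which date bounds are sent with each request. The sketch below only builds those bounds and makes no API call; `window_for` and `API_DATE_FORMAT` are hypothetical names, and the `%m/%d/%Y` format is the one the functions in this notebook pass to the endpoint.

```python
import datetime

API_DATE_FORMAT = "%m/%d/%Y"  # format used for date_from_nullable / date_to_nullable


def window_for(day, rolling_days=None):
    """Return (date_from, date_to) strings for one request.

    rolling_days=None -> cumulative: no lower bound, season start through `day`.
    rolling_days=n    -> rolling: from n days before `day` through `day`.
    """
    date_to = day.strftime(API_DATE_FORMAT)
    if rolling_days is None:
        return None, date_to
    date_from = (day - datetime.timedelta(days=rolling_days)).strftime(API_DATE_FORMAT)
    return date_from, date_to


day = datetime.date(2018, 12, 22)
cumulative_window = window_for(day)
rolling_window = window_for(day, rolling_days=23)
print(cumulative_window, rolling_window)
```

In the cumulative case only the upper bound would be passed; in the rolling case both bounds would be.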

#### Functions

In [9]:
!pwd

/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api


In [10]:
date_str = datetime.datetime.strptime('2018-12-22', '%Y-%m-%d').strftime('%m/%d/%Y')
date_str_from = (datetime.datetime.strptime('2018-12-22', '%Y-%m-%d')-datetime.timedelta(days=23)).strftime('%m/%d/%Y')
daily_stats = leaguedashteamstats.LeagueDashTeamStats(
                measure_type_detailed_defense="Base",
                season="2018-19",
                season_type_all_star="Regular Season",
                date_from_nullable=date_str_from,
                date_to_nullable=date_str
            ).get_data_frames()[0]

In [11]:
def fetch_daily_data_cumulative(season='2020-21', season_type='Regular Season', data_type='Base', delay=5):
    """Naive function for getting a particular type of data"""
    # Define the season start and end dates. Adjust these based on the actual season dates.
    year_start = season[:4]
    year_end = str(int(year_start)+1)
    # season_start = datetime.datetime.strptime(f'{year_start}-10-19', '%Y-%m-%d')
    track_start = datetime.datetime.strptime(f'{year_start}-12-22', '%Y-%m-%d')
    season_end = datetime.datetime.strptime(f'{year_end}-05-16', '%Y-%m-%d')
    
    current_date = track_start
    all_data = []

    unsuccessful_dates = []
    
    while current_date <= season_end:
        date_str = current_date.strftime('%m/%d/%Y')
        
        try:
            daily_stats = leaguedashteamstats.LeagueDashTeamStats(
                measure_type_detailed_defense=data_type,
                season=season,
                season_type_all_star=season_type,
                date_to_nullable=date_str
            ).get_data_frames()[0]
            daily_stats['Date'] = date_str
            all_data.append(daily_stats)
            print(f"Data fetched for {date_str}")

        except Exception as e:
            unsuccessful_dates += [date_str]
            print(f"Error fetching data for {date_str}: {e}")
        
        # Delay before making the next request
        time.sleep(delay)
        
        # Move to the next day
        current_date += datetime.timedelta(days=1)
        
    full_season_data = pd.concat(all_data, ignore_index=True)
    return full_season_data, unsuccessful_dates
    
def fetch_daily_data_cumulative_with_dates(track_start_, season_end_, season_type='Regular Season', data_type='Base', delay=5):
    """Getting cumulative data for a range of days for one particular type of data"""
    track_start = datetime.datetime.strptime(track_start_, '%Y-%m-%d')
    season_end = datetime.datetime.strptime(season_end_, '%Y-%m-%d')
    
    year_start = track_start_[:4]
    year_end = season_end_[:4]
    season_ = year_start + "-" + year_end[2:]
    
    current_date = track_start
    all_data = []

    unsuccessful_dates = []
    
    
    while current_date <= season_end:
        date_str = current_date.strftime('%m/%d/%Y')
        
        try:
            daily_stats = leaguedashteamstats.LeagueDashTeamStats(
                measure_type_detailed_defense=data_type,
                season=season_,
                season_type_all_star=season_type,
                date_to_nullable=date_str
            ).get_data_frames()[0]
            daily_stats['Date'] = date_str
            all_data.append(daily_stats)
            print(f"Data fetched for {date_str}")

        except Exception as e:
            unsuccessful_dates += [date_str]
            print(f"Error fetching data for {date_str}: {e}")
        
        # Delay before making the next request
        time.sleep(delay)
        
        # Move to the next day
        current_date += datetime.timedelta(days=1)
        
    full_season_data = pd.concat(all_data, ignore_index=True)
    return full_season_data, unsuccessful_dates

def fetch_daily_data_cumulative_with_dates_rolling(track_start_, season_end_, season_type='Regular Season', data_type='Base', delay=5, p_n_days=23):
    """Getting rolling data for a range of days for one particular type of data"""
    current_date = datetime.datetime.strptime(track_start_, '%Y-%m-%d')+datetime.timedelta(days=p_n_days)
    season_end = datetime.datetime.strptime(season_end_, '%Y-%m-%d')
    
    year_start = track_start_[:4]
    year_end = season_end_[:4]
    season_ = year_start + "-" + year_end[2:]
    
    all_data = []

    unsuccessful_dates = []
    
    
    while current_date <= season_end:
        date_str = current_date.strftime('%m/%d/%Y')
        date_str_from = (current_date-datetime.timedelta(days=p_n_days)).strftime('%m/%d/%Y')
        try:
            daily_stats = leaguedashteamstats.LeagueDashTeamStats(
                measure_type_detailed_defense=data_type,
                season=season_,
                season_type_all_star=season_type,
                date_from_nullable=date_str_from,
                date_to_nullable=date_str
            ).get_data_frames()[0]
            daily_stats['Date'] = date_str
            all_data.append(daily_stats)
            print(f"Data fetched for {date_str}")

        except Exception as e:
            unsuccessful_dates += [date_str]
            print(f"Error fetching data for {date_str}: {e}")
        
        # Delay before making the next request
        time.sleep(delay)
        
        # Move to the next day
        current_date += datetime.timedelta(days=1)
        
    full_season_data = pd.concat(all_data, ignore_index=True)
    return full_season_data, unsuccessful_dates

def fetch_daily_data_cumulative_with_dates_rolling_covid(track_start_, season_end_, season_type='Regular Season', data_type='Base', delay=5, p_n_days=23):
    """Getting rolling data for a range of days for one particular type of data during the covid season"""
    current_start_date = datetime.datetime.strptime('2020-03-11', '%Y-%m-%d')-datetime.timedelta(days=p_n_days)
    current_date = datetime.datetime.strptime(track_start_, '%Y-%m-%d')
    season_end = datetime.datetime.strptime(season_end_, '%Y-%m-%d')
    
    year_start = '2019'
    year_end = season_end_[:4]
    season_ = year_start + "-" + year_end[2:]
    
    all_data = []
    unsuccessful_dates = []
    while current_date <= season_end:
        date_str = current_date.strftime('%m/%d/%Y')
        date_str_from = current_start_date.strftime('%m/%d/%Y')
        try:
            daily_stats = leaguedashteamstats.LeagueDashTeamStats(
                measure_type_detailed_defense=data_type,
                season=season_,
                season_type_all_star=season_type,
                date_from_nullable=date_str_from,
                date_to_nullable=date_str
            ).get_data_frames()[0]
            daily_stats['Date'] = date_str
            all_data.append(daily_stats)
            print(f"Data fetched for {date_str}")

        except Exception as e:
            unsuccessful_dates += [date_str]
            print(f"Error fetching data for {date_str}: {e}")
        
        # Delay before making the next request
        time.sleep(delay)
        
        # Move to the next day
        current_date += datetime.timedelta(days=1)
        current_start_date += datetime.timedelta(days=1)
        
    full_season_data = pd.concat(all_data, ignore_index=True)
    return full_season_data, unsuccessful_dates

def getting_stats(track_start, season_end, delay=5):
    """Getting cumulative season stats for each game"""
    
    base, unsuccessful_dates_base = fetch_daily_data_cumulative_with_dates(track_start, season_end, data_type='Base', delay=delay)
    advanced, unsuccessful_dates_advanced = fetch_daily_data_cumulative_with_dates(track_start, season_end, data_type='Advanced', delay=delay)
    misc, unsuccessful_dates_misc = fetch_daily_data_cumulative_with_dates(track_start, season_end, data_type='Misc', delay=delay)
    four_factors, unsuccessful_dates_four_factors = fetch_daily_data_cumulative_with_dates(track_start, season_end, data_type='Four Factors', delay=delay)
    scoring, unsuccessful_dates_scoring = fetch_daily_data_cumulative_with_dates(track_start, season_end, data_type='Scoring', delay=delay)
    opponent, unsuccessful_dates_opponent = fetch_daily_data_cumulative_with_dates(track_start, season_end, data_type='Opponent', delay=delay)
    defense, unsuccessful_dates_defense = fetch_daily_data_cumulative_with_dates(track_start, season_end, data_type='Defense', delay=delay)

    year_start = track_start[:4]
    year_end = season_end[:4]
    season_ = year_start + "_" + year_end[2:]
    season = year_start + "_" + year_end

    datas = [base, advanced, misc, four_factors, scoring, opponent, defense]
    
    unsuccessful_dates_lst = [unsuccessful_dates_base, unsuccessful_dates_advanced, unsuccessful_dates_misc, unsuccessful_dates_four_factors, unsuccessful_dates_scoring, unsuccessful_dates_opponent, unsuccessful_dates_defense]
    datas_names = ["base", "advanced", "misc", "four_factors",
               "scoring", "opponent", "defense"]
    paths_ = [f'/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/{i}_{season}.csv' for i in datas_names]
    for df, path in zip(datas, paths_):
        df.to_csv(path, index=False)
    return datas, unsuccessful_dates_lst

def getting_stats_rolling(track_start, season_end, delay=5, past_n_days=23):
    """ For rolling data collection"""
    base, unsuccessful_dates_base = fetch_daily_data_cumulative_with_dates_rolling(track_start, season_end, data_type='Base', delay=delay, p_n_days=past_n_days)
    advanced, unsuccessful_dates_advanced = fetch_daily_data_cumulative_with_dates_rolling(track_start, season_end, data_type='Advanced', delay=delay, p_n_days=past_n_days)
    misc, unsuccessful_dates_misc = fetch_daily_data_cumulative_with_dates_rolling(track_start, season_end, data_type='Misc', delay=delay, p_n_days=past_n_days)
    four_factors, unsuccessful_dates_four_factors = fetch_daily_data_cumulative_with_dates_rolling(track_start, season_end, data_type='Four Factors', delay=delay, p_n_days=past_n_days)
    scoring, unsuccessful_dates_scoring = fetch_daily_data_cumulative_with_dates_rolling(track_start, season_end, data_type='Scoring', delay=delay, p_n_days=past_n_days)
    opponent, unsuccessful_dates_opponent = fetch_daily_data_cumulative_with_dates_rolling(track_start, season_end, data_type='Opponent', delay=delay, p_n_days=past_n_days)
    defense, unsuccessful_dates_defense = fetch_daily_data_cumulative_with_dates_rolling(track_start, season_end, data_type='Defense', delay=delay, p_n_days=past_n_days)

    year_start = track_start[:4]
    year_end = season_end[:4]
    season_ = year_start + "_" + year_end[2:]
    season = year_start + "_" + year_end

    datas = [base, advanced, misc, four_factors, scoring, opponent, defense]
    
    unsuccessful_dates_lst = [unsuccessful_dates_base, unsuccessful_dates_advanced, unsuccessful_dates_misc, unsuccessful_dates_four_factors, unsuccessful_dates_scoring, unsuccessful_dates_opponent, unsuccessful_dates_defense]
    datas_names = ["base", "advanced", "misc", "four_factors",
               "scoring", "opponent", "defense"]
    paths_ = [f'/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/{i}_{season}_rolling_{past_n_days}.csv' for i in datas_names]
    for df, path in zip(datas, paths_):
        df.to_csv(path, index=False)
    return datas, unsuccessful_dates_lst

def getting_stats_rolling_covid(track_start, season_end, delay=5, past_n_days=23):
    """For covid season rolling data"""
    base, unsuccessful_dates_base = fetch_daily_data_cumulative_with_dates_rolling_covid(track_start, season_end, data_type='Base', delay=delay, p_n_days=past_n_days)
    advanced, unsuccessful_dates_advanced = fetch_daily_data_cumulative_with_dates_rolling_covid(track_start, season_end, data_type='Advanced', delay=delay, p_n_days=past_n_days)
    misc, unsuccessful_dates_misc = fetch_daily_data_cumulative_with_dates_rolling_covid(track_start, season_end, data_type='Misc', delay=delay, p_n_days=past_n_days)
    four_factors, unsuccessful_dates_four_factors = fetch_daily_data_cumulative_with_dates_rolling_covid(track_start, season_end, data_type='Four Factors', delay=delay, p_n_days=past_n_days)
    scoring, unsuccessful_dates_scoring = fetch_daily_data_cumulative_with_dates_rolling_covid(track_start, season_end, data_type='Scoring', delay=delay, p_n_days=past_n_days)
    opponent, unsuccessful_dates_opponent = fetch_daily_data_cumulative_with_dates_rolling_covid(track_start, season_end, data_type='Opponent', delay=delay, p_n_days=past_n_days)
    defense, unsuccessful_dates_defense = fetch_daily_data_cumulative_with_dates_rolling_covid(track_start, season_end, data_type='Defense', delay=delay, p_n_days=past_n_days)

    year_start = '2019'
    year_end = season_end[:4]
    season_ = year_start + "_" + year_end[2:]
    season = year_start + "_" + year_end

    datas = [base, advanced, misc, four_factors, scoring, opponent, defense]
    
    unsuccessful_dates_lst = [unsuccessful_dates_base, unsuccessful_dates_advanced, unsuccessful_dates_misc, unsuccessful_dates_four_factors, unsuccessful_dates_scoring, unsuccessful_dates_opponent, unsuccessful_dates_defense]
    datas_names = ["base", "advanced", "misc", "four_factors",
               "scoring", "opponent", "defense"]
    paths_ = [f'/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/{i}_{season}_rolling_{past_n_days}_covid.csv' for i in datas_names]
    for df, path in zip(datas, paths_):
        df.to_csv(path, index=False)
    return datas, unsuccessful_dates_lst

def merge_with_suffixes(dataframes, names, keys):
    """Merging different sets of data of the same season together"""
    suffixed_dfs = []
    for df, name in zip(dataframes, names):
        # Suffix non-key columns only
        suffixed_cols = {col: f"{col}_{name}" if col not in keys else col for col in df.columns}
        suffixed_dfs.append(df.rename(columns=suffixed_cols))

    merged_df = suffixed_dfs[0]
    for df in suffixed_dfs[1:]:
        merged_df = pd.merge(merged_df, df, on=keys, how='inner')
    return merged_df
    
def merge_for_season(season='2018_2019'):
    """Merge all types of data together for a single season"""
    base_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/base_{season}.csv"
    advanced_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/advanced_{season}.csv"
    misc_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/misc_{season}.csv"
    four_factors_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/four_factors_{season}.csv"
    scoring_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/scoring_{season}.csv"
    opponent_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/opponent_{season}.csv"
    defense_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/defense_{season}.csv"
    
    base = pd.read_csv(base_path, parse_dates=['Date'])
    advanced = pd.read_csv(advanced_path, parse_dates=['Date'])
    misc = pd.read_csv(misc_path, parse_dates=['Date'])
    four_factors = pd.read_csv(four_factors_path, parse_dates=['Date'])
    scoring = pd.read_csv(scoring_path, parse_dates=['Date'])
    opponent = pd.read_csv(opponent_path, parse_dates=['Date'])
    defense = pd.read_csv(defense_path, parse_dates=['Date'])
    
    datas = [base, advanced, misc, four_factors, scoring, opponent, defense]
    datas_names = ["base", "advanced", "misc", "four_factors",
                   "scoring", "opponent", "defense"]
    columns_to_exclude = ['TEAM_NAME', 'GP', 'W', 'L', 'W_PCT', 'MIN']
    others = [advanced, misc, four_factors, scoring, opponent, defense]
    others = [i[i.columns[~i.columns.isin(columns_to_exclude)]] for i in others]
    datas = [base] + others
    
    merge_keys = ['Date', 'TEAM_ID']
    merged_df = merge_with_suffixes(datas, datas_names, merge_keys)    
    merged_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/cumulative_season_stats_{season}.csv"
    merged_df.to_csv(merged_path, index=False)
    return merged_df

def merge_for_season_rolling(season='2018_2019', past_n_days=23):
    """Merge all types of rolling data together for a single season"""
    base_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/base_{season}_rolling_{past_n_days}.csv"
    advanced_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/advanced_{season}_rolling_{past_n_days}.csv"
    misc_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/misc_{season}_rolling_{past_n_days}.csv"
    four_factors_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/four_factors_{season}_rolling_{past_n_days}.csv"
    scoring_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/scoring_{season}_rolling_{past_n_days}.csv"
    opponent_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/opponent_{season}_rolling_{past_n_days}.csv"
    defense_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/defense_{season}_rolling_{past_n_days}.csv"
    
    base = pd.read_csv(base_path, parse_dates=['Date'])
    advanced = pd.read_csv(advanced_path, parse_dates=['Date'])
    misc = pd.read_csv(misc_path, parse_dates=['Date'])
    four_factors = pd.read_csv(four_factors_path, parse_dates=['Date'])
    scoring = pd.read_csv(scoring_path, parse_dates=['Date'])
    opponent = pd.read_csv(opponent_path, parse_dates=['Date'])
    defense = pd.read_csv(defense_path, parse_dates=['Date'])
    
    datas = [base, advanced, misc, four_factors, scoring, opponent, defense]
    datas_names = ["base", "advanced", "misc", "four_factors",
                   "scoring", "opponent", "defense"]
    columns_to_exclude = ['TEAM_NAME', 'GP', 'W', 'L', 'W_PCT', 'MIN']
    others = [advanced, misc, four_factors, scoring, opponent, defense]
    others = [i[i.columns[~i.columns.isin(columns_to_exclude)]] for i in others]
    datas = [base] + others
    
    merge_keys = ['Date', 'TEAM_ID']
    merged_df = merge_with_suffixes(datas, datas_names, merge_keys)    
    merged_path = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/rolling_{past_n_days}_season_stats_{season}.csv"
    merged_df.to_csv(merged_path, index=False)
    return merged_df
    
def merge_for_cum_and_rolling(season='2018_2019', past_n_days=23):
    """Combine cumulative seasonal data and rolling data together for a single season"""
    cum_path = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/cumulative_season_stats_{season}.csv"
    roll_path = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/rolling_{past_n_days}_season_stats_{season}.csv"

    cum = pd.read_csv(cum_path, parse_dates=['Date'])
    roll = pd.read_csv(roll_path, parse_dates=['Date'])
    
    
    merge_keys = ['Date', 'TEAM_ID']
    merged_df = pd.merge(cum, roll, on=merge_keys, how='left')
    merged_path = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/training_data/{season}/cum_and_rolling_{season}.csv"
    merged_df.to_csv(merged_path, index=False)
    return merged_df

def extracting_today_data(from_date=None, season='2023-24', season_type='Regular Season', data_type='Base', delay=5):
    """Getting cumulative data for one particular type of data, day by day from from_date (default: last date saved in the base CSV) through today"""
    year_start = season[:4]
    year_end = str(int(year_start)+1)
    season_ = year_start + "-" + year_end[2:]
    season = year_start + "_" + year_end
    if from_date is None:
        path_today = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/base_{season}.csv"
        from_date = pd.read_csv(path_today, parse_dates=['Date'])['Date'].max().date()

    season_end = datetime.date.today()
    
    current_date = from_date
    all_data = []

    unsuccessful_dates = []
    
    while current_date <= season_end:
        date_str = current_date.strftime('%m/%d/%Y')
        
        try:
            daily_stats = leaguedashteamstats.LeagueDashTeamStats(
                measure_type_detailed_defense=data_type,
                season=season_,
                season_type_all_star=season_type,
                date_to_nullable=date_str
            ).get_data_frames()[0]
            daily_stats['Date'] = date_str
            all_data.append(daily_stats)
            print(f"Data fetched for {date_str}")
        except Exception as e:
            unsuccessful_dates += [date_str]
            print(f"Error fetching data for {date_str}: {e}")
        time.sleep(delay)
        current_date += datetime.timedelta(days=1)
        
    full_season_data = pd.concat(all_data, ignore_index=True)
    return full_season_data, unsuccessful_dates

def getting_stats(delay=5):
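    """Getting cumulative season stats up to today and appending them to the saved file (note: this redefines getting_stats(track_start, season_end) above)"""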
    track_start='2023'
    season_end='2024'
    
    base, unsuccessful_dates_base = extracting_today_data(data_type='Base', delay=delay)
    advanced, unsuccessful_dates_advanced = extracting_today_data(data_type='Advanced', delay=delay)
    misc, unsuccessful_dates_misc = extracting_today_data(data_type='Misc', delay=delay)
    four_factors, unsuccessful_dates_four_factors = extracting_today_data(data_type='Four Factors', delay=delay)
    scoring, unsuccessful_dates_scoring = extracting_today_data(data_type='Scoring', delay=delay)
    opponent, unsuccessful_dates_opponent = extracting_today_data(data_type='Opponent', delay=delay)
    defense, unsuccessful_dates_defense = extracting_today_data(data_type='Defense', delay=delay)

    year_start = track_start[:4]
    year_end = season_end[:4]
    season_ = year_start + "_" + year_end[2:]
    season = year_start + "_" + year_end

    datas = [base, advanced, misc, four_factors, scoring, opponent, defense]
    unsuccessful_dates_lst = [unsuccessful_dates_base, unsuccessful_dates_advanced, unsuccessful_dates_misc, unsuccessful_dates_four_factors, unsuccessful_dates_scoring, unsuccessful_dates_opponent, unsuccessful_dates_defense]
    datas_names = ["base", "advanced", "misc", "four_factors",
               "scoring", "opponent", "defense"]
    columns_to_exclude = ['TEAM_NAME', 'GP', 'W', 'L', 'W_PCT', 'MIN']
    others = [advanced, misc, four_factors, scoring, opponent, defense]
    others = [i[i.columns[~i.columns.isin(columns_to_exclude)]] for i in others]
    datas = [base] + others
    
    merge_keys = ['Date', 'TEAM_ID']
    merged_df = merge_with_suffixes(datas, datas_names, merge_keys)    
    latest_path = f"/Users/benjamincheng/Documents/GitHub/Sports-Betting/nba_api/data/teams_stats/{season}/cumulative_season_stats_{season}.csv"
    prev = pd.read_csv(latest_path, parse_dates=['Date'])
    merged_df = pd.concat([prev, merged_df], ignore_index=True)
    merged_df.to_csv(latest_path, index=False)
    return merged_df
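
Each fetch function above returns the dates whose request failed, but nothing re-requests them. One way to use that list is sketched below: a small retry loop that takes any single-date fetch callable. `retry_failed_dates` and `flaky_fetch` are hypothetical names, and the fetcher here is a stand-in that simulates one timeout rather than calling the NBA API.

```python
import time


def retry_failed_dates(failed_dates, fetch_one, max_rounds=3, delay=0.0):
    """Re-request each failed date; return (results_by_date, still_failed)."""
    results = {}
    remaining = list(failed_dates)
    for _ in range(max_rounds):
        if not remaining:
            break
        next_remaining = []
        for date_str in remaining:
            try:
                results[date_str] = fetch_one(date_str)
            except Exception:
                # Keep the date for the next round instead of dropping it.
                next_remaining.append(date_str)
            time.sleep(delay)  # same courtesy delay as the fetch loops
        remaining = next_remaining
    return results, remaining


# Stand-in fetcher: the second date fails once, then succeeds.
attempts = {}


def flaky_fetch(date_str):
    attempts[date_str] = attempts.get(date_str, 0) + 1
    if date_str == "01/02/2021" and attempts[date_str] < 2:
        raise RuntimeError("simulated timeout")
    return f"frame for {date_str}"


recovered, still_failed = retry_failed_dates(["01/01/2021", "01/02/2021"], flaky_fetch)
print(sorted(recovered), still_failed)
```

With the real endpoint, `fetch_one` would wrap a single `LeagueDashTeamStats` request for one date, and `delay` would be set to a few seconds as in the loops above.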

In [19]:
data, unsuccessful_dates = fetch_daily_data_cumulative()

Data fetched for 04/07/2022
Data fetched for 04/08/2022
Data fetched for 04/09/2022
Data fetched for 04/10/2022


## 3. Assembling Dataset <a id='3'></a>

In [26]:
## Collecting For These Areas of Stats
measure_types = ["Base", "Advanced",
                 "Misc", "Four Factors",
                 "Scoring", "Opponent",
                 "Usage", "Defense"]

### Measure Types

#### Base

In [151]:
base, unsuccessful_dates_base = fetch_daily_data_cumulative(data_type='Base')

Data fetched for 12/22/2020
Data fetched for 12/23/2020
Data fetched for 12/24/2020
Data fetched for 12/25/2020
Data fetched for 12/26/2020
Data fetched for 12/27/2020
Data fetched for 12/28/2020
Data fetched for 12/29/2020
Data fetched for 12/30/2020
Data fetched for 12/31/2020
Data fetched for 01/01/2021
Data fetched for 01/02/2021
Data fetched for 01/03/2021
Data fetched for 01/04/2021
Data fetched for 01/05/2021
Data fetched for 01/06/2021
Data fetched for 01/07/2021
Data fetched for 01/08/2021
Data fetched for 01/09/2021
Data fetched for 01/10/2021
Data fetched for 01/11/2021
Data fetched for 01/12/2021
Data fetched for 01/13/2021
Data fetched for 01/14/2021
Data fetched for 01/15/2021
Data fetched for 01/16/2021
Data fetched for 01/17/2021
Data fetched for 01/18/2021
Data fetched for 01/19/2021
Data fetched for 01/20/2021
Data fetched for 01/21/2021
Data fetched for 01/22/2021
Data fetched for 01/23/2021
Data fetched for 01/24/2021
Data fetched for 01/25/2021
Data fetched for 01/

In [152]:
print("Total available days: ", datetime.datetime.strptime('2021-05-16', '%Y-%m-%d')-datetime.datetime.strptime('2020-12-22', '%Y-%m-%d'))
print("Total days in data: ", base.shape[0]/30)
print("Unavailable dates size: ", len(unsuccessful_dates_base))

Total available days:  145 days, 0:00:00
Total days in data:  144.93333333333334
Unavailable dates size:  0
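
The non-integer ratio above (144.93 rather than a whole number) suggests that some dates returned fewer than 30 team rows, so dividing the row count by 30 does not count days exactly. A plausible cause is that some teams had not yet played on the first days of the season, but that is an assumption. The sketch below counts teams per date directly; `dates_with_missing_teams` is a hypothetical helper, and the toy frame stands in for `base`.

```python
import pandas as pd


def dates_with_missing_teams(df, expected_teams=30, date_col="Date", team_col="TEAM_ID"):
    """Return a Series of team counts for dates with fewer than expected_teams teams."""
    counts = df.groupby(date_col)[team_col].nunique()
    return counts[counts < expected_teams]


# Toy data: the first date has 2 teams, the second has all 3 expected teams.
toy = pd.DataFrame({
    "Date": ["12/22/2020"] * 2 + ["12/23/2020"] * 3,
    "TEAM_ID": [1, 2, 1, 2, 3],
})
short_days = dates_with_missing_teams(toy, expected_teams=3)
print(short_days)
```

On the real data, `dates_with_missing_teams(base)` would list the dates to inspect, and `base['Date'].nunique()` would give the exact number of days fetched.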


In [165]:
base_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2020_2021/base_2020_2021.csv"
base.to_csv(base_path, index=False)

#### Advanced

In [166]:
advanced, unsuccessful_dates_advanced = fetch_daily_data_cumulative(data_type='Advanced')


Data fetched for 12/22/2020
Data fetched for 12/23/2020
Data fetched for 12/24/2020
Data fetched for 12/25/2020
Data fetched for 12/26/2020
Data fetched for 12/27/2020
Data fetched for 12/28/2020
Data fetched for 12/29/2020
Data fetched for 12/30/2020
Data fetched for 12/31/2020
Data fetched for 01/01/2021
Data fetched for 01/02/2021
Data fetched for 01/03/2021
Data fetched for 01/04/2021
Data fetched for 01/05/2021
Data fetched for 01/06/2021
Data fetched for 01/07/2021
Data fetched for 01/08/2021
Data fetched for 01/09/2021
Data fetched for 01/10/2021
Data fetched for 01/11/2021
Data fetched for 01/12/2021
Data fetched for 01/13/2021
Data fetched for 01/14/2021
Data fetched for 01/15/2021
Data fetched for 01/16/2021
Data fetched for 01/17/2021
Data fetched for 01/18/2021
Data fetched for 01/19/2021
Data fetched for 01/20/2021
Data fetched for 01/21/2021
Data fetched for 01/22/2021
Data fetched for 01/23/2021
Data fetched for 01/24/2021
Data fetched for 01/25/2021
Data fetched for 01/

In [167]:
print("Total available days: ", datetime.datetime.strptime('2021-05-16', '%Y-%m-%d')-datetime.datetime.strptime('2020-12-22', '%Y-%m-%d'))
print("Total days in data: ", advanced.shape[0]/30)
print("Unavailable dates size: ", len(unsuccessful_dates_advanced))

Total available days:  145 days, 0:00:00
Total days in data:  144.93333333333334
Unavailable dates size:  0


In [168]:
advanced_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2020_2021/advanced_2020_2021.csv"
advanced.to_csv(advanced_path, index=False)

#### Misc

In [55]:
misc, unsuccessful_dates_misc = fetch_daily_data_cumulative(data_type='Misc')

Data fetched for 10/19/2021
Data fetched for 10/20/2021
Data fetched for 10/21/2021
Data fetched for 10/22/2021
Data fetched for 10/23/2021
Data fetched for 10/24/2021
Data fetched for 10/25/2021
Data fetched for 10/26/2021
Data fetched for 10/27/2021
Data fetched for 10/28/2021
Data fetched for 10/29/2021
Data fetched for 10/30/2021
Data fetched for 10/31/2021
Data fetched for 11/01/2021
Data fetched for 11/02/2021
Data fetched for 11/03/2021
Data fetched for 11/04/2021
Data fetched for 11/05/2021
Data fetched for 11/06/2021
Data fetched for 11/07/2021
Data fetched for 11/08/2021
Data fetched for 11/09/2021
Data fetched for 11/10/2021
Data fetched for 11/11/2021
Data fetched for 11/12/2021
Data fetched for 11/13/2021
Data fetched for 11/14/2021
Data fetched for 11/15/2021
Data fetched for 11/16/2021
Data fetched for 11/17/2021
Data fetched for 11/18/2021
Data fetched for 11/19/2021
Data fetched for 11/20/2021
Data fetched for 11/21/2021
Data fetched for 11/22/2021
Data fetched for 11/

In [56]:
print("Total available days: ", datetime.datetime.strptime('2021-05-16', '%Y-%m-%d')-datetime.datetime.strptime('2020-12-22', '%Y-%m-%d'))

print("Total available days: ", d_range)
print("Total days in data: ", misc.shape[0]/30)
print("Unavailable dates size: ", len(unsuccessful_dates_misc))

Total available days:  173 days, 0:00:00
Total days in data:  173.0
Unavailable dates size:  0


In [57]:
misc_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2020_2021/misc_2020_2021.csv"
misc.to_csv(misc_path, index=False)



#### Four Factors

In [59]:
four_factors, unsuccessful_dates_four_factors = fetch_daily_data_cumulative(data_type='Four Factors')

Data fetched for 10/19/2021
Data fetched for 10/20/2021
Data fetched for 10/21/2021
Data fetched for 10/22/2021
Data fetched for 10/23/2021
Data fetched for 10/24/2021
Data fetched for 10/25/2021
Data fetched for 10/26/2021
Data fetched for 10/27/2021
Data fetched for 10/28/2021
Data fetched for 10/29/2021
Data fetched for 10/30/2021
Data fetched for 10/31/2021
Data fetched for 11/01/2021
Data fetched for 11/02/2021
Data fetched for 11/03/2021
Data fetched for 11/04/2021
Data fetched for 11/05/2021
Data fetched for 11/06/2021
Data fetched for 11/07/2021
Data fetched for 11/08/2021
Data fetched for 11/09/2021
Data fetched for 11/10/2021
Data fetched for 11/11/2021
Data fetched for 11/12/2021
Data fetched for 11/13/2021
Data fetched for 11/14/2021
Data fetched for 11/15/2021
Data fetched for 11/16/2021
Data fetched for 11/17/2021
Data fetched for 11/18/2021
Data fetched for 11/19/2021
Data fetched for 11/20/2021
Data fetched for 11/21/2021
Data fetched for 11/22/2021
Data fetched for 11/

In [60]:
print("Total available days: ", d_range)
print("Total days in data: ", four_factors.shape[0]/30)
print("Unavailable dates size: ", len(unsuccessful_dates_four_factors))

Total available days:  173 days, 0:00:00
Total days in data:  173.0
Unavailable dates size:  0


In [61]:
four_factors_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2020_2021/four_factors_2020_2021.csv"
four_factors.to_csv(four_factors_path, index=False)


#### Scoring

In [62]:
scoring, unsuccessful_dates_scoring = fetch_daily_data_cumulative(data_type='Scoring')

Data fetched for 10/19/2021
Data fetched for 10/20/2021
Data fetched for 10/21/2021
Data fetched for 10/22/2021
Data fetched for 10/23/2021
Data fetched for 10/24/2021
Data fetched for 10/25/2021
Data fetched for 10/26/2021
Data fetched for 10/27/2021
Data fetched for 10/28/2021
Data fetched for 10/29/2021
Data fetched for 10/30/2021
Data fetched for 10/31/2021
Data fetched for 11/01/2021
Data fetched for 11/02/2021
Data fetched for 11/03/2021
Data fetched for 11/04/2021
Data fetched for 11/05/2021
Data fetched for 11/06/2021
Data fetched for 11/07/2021
Data fetched for 11/08/2021
Data fetched for 11/09/2021
Data fetched for 11/10/2021
Data fetched for 11/11/2021
Data fetched for 11/12/2021
Data fetched for 11/13/2021
Data fetched for 11/14/2021
Data fetched for 11/15/2021
Data fetched for 11/16/2021
Data fetched for 11/17/2021
Data fetched for 11/18/2021
Data fetched for 11/19/2021
Data fetched for 11/20/2021
Data fetched for 11/21/2021
Data fetched for 11/22/2021
Data fetched for 11/

In [63]:
print("Total available days: ", d_range)
print("Total days in data: ", scoring.shape[0]/30)
print("Unavailable dates size: ", len(unsuccessful_dates_scoring))

Total available days:  173 days, 0:00:00
Total days in data:  173.0
Unavailable dates size:  0


In [64]:
scoring_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2020_2021/scoring_2020_2021.csv"
scoring.to_csv(scoring_path, index=False)

#### Opponent

In [65]:
opponent, unsuccessful_dates_opponent = fetch_daily_data_cumulative(data_type='Opponent')

Data fetched for 10/19/2021
Data fetched for 10/20/2021
Data fetched for 10/21/2021
Data fetched for 10/22/2021
Data fetched for 10/23/2021
Data fetched for 10/24/2021
Data fetched for 10/25/2021
Data fetched for 10/26/2021
Data fetched for 10/27/2021
Data fetched for 10/28/2021
Data fetched for 10/29/2021
Data fetched for 10/30/2021
Data fetched for 10/31/2021
Data fetched for 11/01/2021
Data fetched for 11/02/2021
Data fetched for 11/03/2021
Data fetched for 11/04/2021
Data fetched for 11/05/2021
Data fetched for 11/06/2021
Data fetched for 11/07/2021
Data fetched for 11/08/2021
Data fetched for 11/09/2021
Data fetched for 11/10/2021
Data fetched for 11/11/2021
Data fetched for 11/12/2021
Data fetched for 11/13/2021
Data fetched for 11/14/2021
Data fetched for 11/15/2021
Data fetched for 11/16/2021
Data fetched for 11/17/2021
Data fetched for 11/18/2021
Data fetched for 11/19/2021
Data fetched for 11/20/2021
Data fetched for 11/21/2021
Data fetched for 11/22/2021
Data fetched for 11/

In [66]:
print("Total available days: ", d_range)
print("Total days in data: ", opponent.shape[0]/30)
print("Unavailable dates size: ", len(unsuccessful_dates_opponent))

Total available days:  173 days, 0:00:00
Total days in data:  173.0
Unavailable dates size:  0


In [67]:
opponent_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2020_2021/opponent_2020_2021.csv"
opponent.to_csv(opponent_path, index=False)



#### Usage

In [69]:
usage, unsuccessful_dates_usage = fetch_daily_data_cumulative(data_type='Usage')

Error fetching data for 10/19/2021: Expecting value: line 1 column 1 (char 0)
Error fetching data for 10/20/2021: Expecting value: line 1 column 1 (char 0)
Error fetching data for 10/21/2021: Expecting value: line 1 column 1 (char 0)
Error fetching data for 10/22/2021: Expecting value: line 1 column 1 (char 0)
Error fetching data for 10/23/2021: Expecting value: line 1 column 1 (char 0)
Error fetching data for 10/24/2021: Expecting value: line 1 column 1 (char 0)
Error fetching data for 10/25/2021: Expecting value: line 1 column 1 (char 0)
Error fetching data for 10/26/2021: Expecting value: line 1 column 1 (char 0)
Error fetching data for 10/27/2021: Expecting value: line 1 column 1 (char 0)


KeyboardInterrupt: 
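The message `Expecting value: line 1 column 1 (char 0)` is what Python's JSON decoder raises when the response body is empty or not JSON, so every Usage request here most likely came back without usable data. One way to avoid interrupting the whole run by hand is to retry each date a few times and then record it as unsuccessful. The sketch below is an assumption about how that could look, not the notebook's actual fetch logic: `fetch_with_retry` and `flaky_fetch` are hypothetical names, and `fetch_fn` stands in for whatever function performs the request for a single date.

```python
import time


def fetch_with_retry(fetch_fn, date_str, retries=3, delay=1.0):
    """Call fetch_fn(date_str), retrying when it raises.

    Returns (result, None) on success, or (None, last_error) once
    all attempts have failed. Hypothetical helper for illustration.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return fetch_fn(date_str), None
        except Exception as exc:  # broad on purpose: network and JSON errors alike
            last_error = exc
            if attempt < retries - 1:
                time.sleep(delay * (attempt + 1))  # wait a little longer each time
    return None, last_error


# Stand-in for a real request: fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch(date_str):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ValueError("Expecting value: line 1 column 1 (char 0)")
    return {"date": date_str}


result, error = fetch_with_retry(flaky_fetch, "10/19/2021", retries=3, delay=0)
print(result, error)
```

`KeyboardInterrupt` does not inherit from `Exception`, so interrupting the cell manually still stops the loop.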

In [None]:
d_range = datetime.datetime.strptime('2022-04-10', '%Y-%m-%d')-datetime.datetime.strptime('2021-10-19', '%Y-%m-%d')
print("Total available days: ", d_range)
print("Total days in data: ", usage.shape[0]/30)
print("Unavailable dates size: ", len(unsuccessful_dates_usage))

In [None]:
usage_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2020_2021/usage_2020_2021.csv"
usage.to_csv(usage_path, index=False)

#### Defense

In [70]:
defense, unsuccessful_dates_defense = fetch_daily_data_cumulative(data_type='Defense')


Data fetched for 10/19/2021
Data fetched for 10/20/2021
Data fetched for 10/21/2021
Data fetched for 10/22/2021
Data fetched for 10/23/2021
Data fetched for 10/24/2021
Data fetched for 10/25/2021
Data fetched for 10/26/2021
Data fetched for 10/27/2021
Data fetched for 10/28/2021
Data fetched for 10/29/2021
Data fetched for 10/30/2021
Data fetched for 10/31/2021
Data fetched for 11/01/2021
Data fetched for 11/02/2021
Data fetched for 11/03/2021
Data fetched for 11/04/2021
Data fetched for 11/05/2021
Data fetched for 11/06/2021
Data fetched for 11/07/2021
Data fetched for 11/08/2021
Data fetched for 11/09/2021
Data fetched for 11/10/2021
Data fetched for 11/11/2021
Data fetched for 11/12/2021
Data fetched for 11/13/2021
Data fetched for 11/14/2021
Data fetched for 11/15/2021
Data fetched for 11/16/2021
Data fetched for 11/17/2021
Data fetched for 11/18/2021
Data fetched for 11/19/2021
Data fetched for 11/20/2021
Data fetched for 11/21/2021
Data fetched for 11/22/2021
Data fetched for 11/

In [71]:
print("Total available days: ", d_range)
print("Total days in data: ", defense.shape[0]/30)
print("Unavailable dates size: ", len(unsuccessful_dates_defense))

Total available days:  173 days, 0:00:00
Total days in data:  173.0
Unavailable dates size:  0


In [72]:
defense_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2020_2021/defense_2020_2021.csv"
defense.to_csv(defense_path, index=False)

In [None]:
base, unsuccessful_dates_base = fetch_daily_data_cumulative(data_type='Base')




### 3.1 Merging Data <a id='3_1'></a>

In [135]:
# Checking for duplicated column names
base_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2021_2022/base_2021_2022.csv"
advanced_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2021_2022/advanced_2021_2022.csv"
misc_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2021_2022/misc_2021_2022.csv"
four_factors_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2021_2022/four_factors_2021_2022.csv"
scoring_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2021_2022/scoring_2021_2022.csv"
opponent_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2021_2022/opponent_2021_2022.csv"
defense_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2021_2022/defense_2021_2022.csv"

base = pd.read_csv(base_path, parse_dates=['Date'])
advanced = pd.read_csv(advanced_path, parse_dates=['Date'])
misc = pd.read_csv(misc_path, parse_dates=['Date'])
four_factors = pd.read_csv(four_factors_path, parse_dates=['Date'])
scoring = pd.read_csv(scoring_path, parse_dates=['Date'])
opponent = pd.read_csv(opponent_path, parse_dates=['Date'])
defense = pd.read_csv(defense_path, parse_dates=['Date'])

datas = [base, advanced, misc, four_factors, scoring, opponent, defense]
col_names = [i.columns.to_list() for i in datas]

# Flatten into one list without mutating col_names[0] in place
combined_cols_lst = [name for cols in col_names for name in cols]

combined_set = set(combined_cols_lst)
has_duplicates = len(combined_cols_lst) > len(combined_set)

print(has_duplicates)
print("Size diff: ",(len(combined_cols_lst) - len(combined_set)))

True
Size diff:  104


In [138]:
datas_names = ["base", "advanced", "misc", "four_factors",
               "scoring", "opponent", "defense"]
columns_to_exclude = ['TEAM_NAME', 'GP', 'W', 'L', 'W_PCT', 'MIN']
others = [advanced, misc, four_factors, scoring, opponent, defense]
others = [i[i.columns[~i.columns.isin(columns_to_exclude)]] for i in others]
datas = [base] + others

col_names = [i.columns.to_list() for i in datas]
# Flatten into one list without mutating col_names[0] in place
combined_cols_lst = [name for cols in col_names for name in cols]

combined_set = set(combined_cols_lst)
has_duplicates = len(combined_cols_lst) > len(combined_set)

print(has_duplicates)
print("Size diff: ",(len(combined_cols_lst) - len(combined_set)))

True
Size diff:  68
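The size difference only says how many column names repeat, not which ones. A small sketch using `collections.Counter` can list the repeated names; the toy `col_names` below stands in for the real column lists and is purely illustrative.

```python
from collections import Counter

# Toy column lists standing in for the columns of the real DataFrames.
col_names = [
    ["TEAM_ID", "Date", "GP", "PTS"],
    ["TEAM_ID", "Date", "GP", "OFF_RATING"],
    ["TEAM_ID", "Date", "PTS_PAINT"],
]

# Count how often each column name appears across all tables.
counts = Counter(name for cols in col_names for name in cols)

# Keep only names that occur more than once.
duplicated = {name: n for name, n in counts.items() if n > 1}

# Same quantity as len(combined_cols_lst) - len(combined_set) in the cell above.
size_diff = sum(n - 1 for n in duplicated.values())

print(duplicated)
print("Size diff:", size_diff)
```

Names that remain duplicated after excluding the shared columns are the ones that need a suffix when merging.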


In [140]:
merge_keys = ['Date', 'TEAM_ID']
merged_df = merge_with_suffixes(datas, datas_names, merge_keys)
merged_df


Unnamed: 0,TEAM_ID,TEAM_NAME_base,GP_base,W_base,L_base,W_PCT_base,MIN_base,FGM_base,FGA_base,FG_PCT_base,...,MIN_RANK_defense,DEF_RATING_RANK_defense,DREB_RANK_defense,DREB_PCT_RANK_defense,STL_RANK_defense,BLK_RANK_defense,OPP_PTS_OFF_TOV_RANK_defense,OPP_PTS_2ND_CHANCE_RANK_defense,OPP_PTS_FB_RANK_defense,OPP_PTS_PAINT_RANK_defense
0,1610612751,Brooklyn Nets,1,0,1,0.000,48.0,37,84,0.440,...,1,4,4,4,4,1,4,4,4,3
1,1610612744,Golden State Warriors,1,1,0,1.000,48.0,41,93,0.441,...,1,1,1,2,1,4,2,1,3,1
2,1610612747,Los Angeles Lakers,1,0,1,0.000,48.0,45,95,0.474,...,1,3,3,3,3,3,3,3,1,4
3,1610612749,Milwaukee Bucks,1,1,0,1.000,48.0,48,105,0.457,...,1,2,1,1,2,1,1,1,2,1
4,1610612738,Boston Celtics,1,0,1,0.000,58.0,48,117,0.410,...,1,15,4,16,3,4,10,17,11,25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5185,1610612758,Sacramento Kings,82,30,52,0.366,3961.0,3321,7223,0.460,...,8,27,23,22,20,19,29,27,23,27
5186,1610612759,San Antonio Spurs,82,34,48,0.415,3961.0,3546,7601,0.467,...,8,16,13,24,11,10,9,28,10,26
5187,1610612761,Toronto Raptors,82,48,34,0.585,3971.0,3332,7489,0.445,...,5,9,30,23,2,17,1,23,9,7
5188,1610612762,Utah Jazz,82,49,33,0.598,3946.0,3327,7067,0.471,...,24,10,3,5,20,11,17,13,11,10


In [141]:
5190/173

30.0
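Dividing the row count by the number of days gives an average of 30 rows per date, but an average can hide dates with too few or too many teams. The sketch below checks every date individually; the toy frame and the `expected` value of 3 are assumptions for illustration (the real data would use 30).

```python
import pandas as pd

# Toy data: three teams on the first date, only two on the second.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2021-10-19"] * 3 + ["2021-10-20"] * 2),
    "TEAM_ID": [1, 2, 3, 1, 2],
})

# Number of distinct teams recorded on each date.
teams_per_date = toy.groupby("Date")["TEAM_ID"].nunique()

# Dates whose team count differs from the expected number.
expected = 3
incomplete = teams_per_date[teams_per_date != expected]

print(incomplete)
```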

In [144]:
merged_path = "/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/2021_2022/cumulative_season_stats_2021_2022.csv"
merged_df.to_csv(merged_path, index=False)


## 4. Pipeline Building <a id='4'></a>

See Section 2 for the function definitions used here.

### 4.1 Extracting Current Data <a id='4_1'></a>

In [365]:
# start_dates = [
#     "2018-10-16",  # 2018-2019 Season Start
#     "2019-10-22",  # 2019-2020 Season Start
#     "2020-12-22",  # 2020-2021 Season Start
#     "2022-10-18"   # 2022-2023 Season Start
# ]

# end_dates = [
#     "2019-04-10",  # 2018-2019 Regular Season End
#     "2020-08-14",  # 2019-2020 Regular Season End (Adjusted for COVID-19 Bubble)
#     "2021-05-16",  # 2020-2021 Regular Season End
#     "2023-04-09"   # 2022-2023 Regular Season End
# ]
start_dates = [
    "2023-10-24"  # 2023-2024 Season Start (current season)
]

end_dates = [
    "2024-03-20"
]

nba_data = []
unsuccessful_dates_lst = []
for i in range(len(start_dates)):
    datas, unsuccessful_dates_ = getting_stats(start_dates[i], end_dates[i])
    nba_data += [datas]
    unsuccessful_dates_lst += [unsuccessful_dates_]

Data fetched for 10/24/2023
Data fetched for 10/25/2023
Data fetched for 10/26/2023
Data fetched for 10/27/2023
Data fetched for 10/28/2023
Data fetched for 10/29/2023
Data fetched for 10/30/2023
Data fetched for 10/31/2023
Data fetched for 11/01/2023
Data fetched for 11/02/2023
Data fetched for 11/03/2023
Data fetched for 11/04/2023
Data fetched for 11/05/2023
Data fetched for 11/06/2023
Data fetched for 11/07/2023
Data fetched for 11/08/2023
Data fetched for 11/09/2023
Data fetched for 11/10/2023
Data fetched for 11/11/2023
Data fetched for 11/12/2023
Data fetched for 11/13/2023
Data fetched for 11/14/2023
Data fetched for 11/15/2023
Data fetched for 11/16/2023
Data fetched for 11/17/2023
Data fetched for 11/18/2023
Data fetched for 11/19/2023
Data fetched for 11/20/2023
Data fetched for 11/21/2023
Data fetched for 11/22/2023
Data fetched for 11/23/2023
Data fetched for 11/24/2023
Data fetched for 11/25/2023
Data fetched for 11/26/2023
Data fetched for 11/27/2023
Data fetched for 11/

#### For Rolling Current Data

In [366]:
# start_dates = [
#     "2018-10-16",  # 2018-2019 Season Start
#     # "2019-10-22",  # 2019-2020 Season Start
#     "2020-12-22",  # 2020-2021 Season Start
#     '2021-10-19',
#     "2022-10-18"   # 2022-2023 Season Start
# ]

# end_dates = [
#     "2019-04-10",  # 2018-2019 Regular Season End
#     # "2020-08-14",  # 2019-2020 Regular Season End (Adjusted for COVID-19 Bubble)
#     "2021-05-16",  # 2020-2021 Regular Season End
#     '2022-04-10',
#     "2023-04-09"   # 2022-2023 Regular Season End
# ]

start_dates = [
    "2023-10-24"  # 2023-2024 Season Start (current season)
]

end_dates = [
    "2024-03-20"
]
nba_data = []
unsuccessful_dates_lst = []
for i in range(len(start_dates)):
    datas, unsuccessful_dates_ = getting_stats_rolling(start_dates[i], end_dates[i])
    nba_data += [datas]
    unsuccessful_dates_lst += [unsuccessful_dates_]

Data fetched for 11/16/2023
Data fetched for 11/17/2023
Data fetched for 11/18/2023
Data fetched for 11/19/2023
Data fetched for 11/20/2023
Data fetched for 11/21/2023
Data fetched for 11/22/2023
Data fetched for 11/23/2023
Data fetched for 11/24/2023
Data fetched for 11/25/2023
Data fetched for 11/26/2023
Data fetched for 11/27/2023
Data fetched for 11/28/2023
Data fetched for 11/29/2023
Data fetched for 11/30/2023
Data fetched for 12/01/2023
Data fetched for 12/02/2023
Data fetched for 12/03/2023
Data fetched for 12/04/2023
Data fetched for 12/05/2023
Error fetching data for 12/06/2023: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Data fetched for 12/07/2023
Data fetched for 12/08/2023
Data fetched for 12/09/2023
Data fetched for 12/10/2023
Data fetched for 12/11/2023
Data fetched for 12/12/2023
Data fetched for 12/13/2023
Data fetched for 12/14/2023
Data fetched for 12/15/2023
Data fetched for 12/16/2023
Data fetched for 12/17/2023
Dat

#### For Covid Season

In [354]:
### For covid season

start_dates = [
    "2019-10-22" # 2019-2020 Season Start
#    "2020-07-30"
]

end_dates = [
    "2020-03-11"  # 2019-2020 season suspended (COVID-19)
#    "2020-08-14"
]

nba_data = []
unsuccessful_dates_lst = []
for i in range(len(start_dates)):
    if i == 0:
        datas, unsuccessful_dates_ = getting_stats_rolling(start_dates[i], end_dates[i], past_n_days=23)
        nba_data += [datas]
        unsuccessful_dates_lst += [unsuccessful_dates_]
    # else:
    #     datas, unsuccessful_dates_ = getting_stats_rolling_covid(start_dates[i], end_dates[i], past_n_days=23)
    #     nba_data += [datas]
    #     unsuccessful_dates_lst += [unsuccessful_dates_]

Data fetched for 11/14/2019
Data fetched for 11/15/2019
Data fetched for 11/16/2019
Data fetched for 11/17/2019
Data fetched for 11/18/2019
Data fetched for 11/19/2019
Data fetched for 11/20/2019
Data fetched for 11/21/2019
Data fetched for 11/22/2019
Data fetched for 11/23/2019
Data fetched for 11/24/2019
Data fetched for 11/25/2019
Data fetched for 11/26/2019
Data fetched for 11/27/2019
Data fetched for 11/28/2019
Data fetched for 11/29/2019
Data fetched for 11/30/2019
Data fetched for 12/01/2019
Data fetched for 12/02/2019
Data fetched for 12/03/2019
Data fetched for 12/04/2019
Data fetched for 12/05/2019
Data fetched for 12/06/2019
Data fetched for 12/07/2019
Data fetched for 12/08/2019
Data fetched for 12/09/2019
Data fetched for 12/10/2019
Data fetched for 12/11/2019
Data fetched for 12/12/2019
Data fetched for 12/13/2019
Data fetched for 12/14/2019
Data fetched for 12/15/2019
Data fetched for 12/16/2019
Data fetched for 12/17/2019
Data fetched for 12/18/2019
Data fetched for 12/

In [358]:
four_factors.to_csv(paths_, index=False)

#### Combining Covid Rolling Data with Rolling Data

In [364]:
season='2019_2020'
past_n_days = 23

base_path = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/base_{season}_rolling_{past_n_days}.csv"
advanced_path = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/advanced_{season}_rolling_{past_n_days}.csv"
misc_path = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/misc_{season}_rolling_{past_n_days}.csv"
four_factors_path = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/four_factors_{season}_rolling_{past_n_days}.csv"
scoring_path = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/scoring_{season}_rolling_{past_n_days}.csv"
opponent_path = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/opponent_{season}_rolling_{past_n_days}.csv"
defense_path = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/defense_{season}_rolling_{past_n_days}.csv"

base = pd.read_csv(base_path, parse_dates=['Date'])
advanced = pd.read_csv(advanced_path, parse_dates=['Date'])
misc = pd.read_csv(misc_path, parse_dates=['Date'])
four_factors = pd.read_csv(four_factors_path, parse_dates=['Date'])
scoring = pd.read_csv(scoring_path, parse_dates=['Date'])
opponent = pd.read_csv(opponent_path, parse_dates=['Date'])
defense = pd.read_csv(defense_path, parse_dates=['Date'])

base_path_covid = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/base_{season}_rolling_{past_n_days}_covid.csv"
advanced_path_covid = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/advanced_{season}_rolling_{past_n_days}_covid.csv"
misc_path_covid = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/misc_{season}_rolling_{past_n_days}_covid.csv"
four_factors_path_covid = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/four_factors_{season}_rolling_{past_n_days}_covid.csv"
scoring_path_covid = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/scoring_{season}_rolling_{past_n_days}_covid.csv"
opponent_path_covid = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/opponent_{season}_rolling_{past_n_days}_covid.csv"
defense_path_covid = f"/Users/liqingyang/Documents/GitHub/sports_trading/sports_betting/nba_api/data/teams_stats/{season}/defense_{season}_rolling_{past_n_days}_covid.csv"

base_covid = pd.read_csv(base_path_covid, parse_dates=['Date'])
advanced_covid = pd.read_csv(advanced_path_covid, parse_dates=['Date'])
misc_covid = pd.read_csv(misc_path_covid, parse_dates=['Date'])
four_factors_covid = pd.read_csv(four_factors_path_covid, parse_dates=['Date'])
scoring_covid = pd.read_csv(scoring_path_covid, parse_dates=['Date'])
opponent_covid = pd.read_csv(opponent_path_covid, parse_dates=['Date'])
defense_covid = pd.read_csv(defense_path_covid, parse_dates=['Date'])

pre_covid = [base, advanced, misc, four_factors, scoring, opponent, defense]
post_covid = [base_covid, advanced_covid, misc_covid, four_factors_covid, scoring_covid, opponent_covid, defense_covid]
paths = [base_path, advanced_path, misc_path, four_factors_path, scoring_path, opponent_path, defense_path]

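# Append the post-restart (covid) rows to the pre-suspension rows for each data type
combined = [pd.concat([pre, post], axis=0, ignore_index=True) for pre, post in zip(pre_covid, post_covid)]
# Write each combined table back to its original rolling path
for df, path in zip(combined, paths):
    df.to_csv(path, index=False)
combined[0]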


Unnamed: 0,TEAM_ID,TEAM_NAME,GP,W,L,W_PCT,MIN,FGM,FGA,FG_PCT,...,AST_RANK,TOV_RANK,STL_RANK,BLK_RANK,BLKA_RANK,PF_RANK,PFD_RANK,PTS_RANK,PLUS_MINUS_RANK,Date
0,1610612737,Atlanta Hawks,11,4,7,0.364,533.0,439,954,0.460,...,19,24,7,16,29,18,20,21,26,2019-11-14
1,1610612738,Boston Celtics,10,9,1,0.900,480.0,417,912,0.457,...,21,1,22,6,15,10,20,23,2,2019-11-14
2,1610612751,Brooklyn Nets,11,4,7,0.364,538.0,466,1018,0.458,...,20,21,27,17,19,16,16,11,20,2019-11-14
3,1610612766,Charlotte Hornets,11,4,7,0.364,533.0,427,948,0.450,...,13,23,23,28,23,4,14,22,27,2019-11-14
4,1610612741,Chicago Bulls,12,4,8,0.333,576.0,468,1082,0.433,...,10,19,1,24,29,25,5,9,21,2019-11-14
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4045,1610612758,Sacramento Kings,12,5,7,0.417,581.0,535,1085,0.493,...,6,18,7,21,10,26,21,9,25,2020-08-14
4046,1610612759,San Antonio Spurs,12,7,5,0.583,581.0,516,1071,0.482,...,9,12,5,2,12,10,14,7,8,2020-08-14
4047,1610612761,Toronto Raptors,12,11,1,0.917,576.0,456,1001,0.456,...,16,29,3,12,30,19,9,15,3,2020-08-14
4048,1610612762,Utah Jazz,12,6,6,0.500,586.0,467,1038,0.450,...,16,23,12,11,18,16,20,18,20,2020-08-14
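Because the combined tables are written back to the same paths the pre-covid data was read from, running the cell a second time would append the covid rows again. A possible guard, sketched here on toy frames with made-up values, is to drop repeated `(Date, TEAM_ID)` pairs after concatenating. This is a suggestion under that assumption, not something the notebook does.

```python
import pandas as pd

# Toy frames: the 2020-03-11 row appears in both, as it would after a re-run.
pre = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-10", "2020-03-11"]),
    "TEAM_ID": [1, 1],
    "PTS": [100, 110],
})
post = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-11", "2020-08-14"]),
    "TEAM_ID": [1, 1],
    "PTS": [110, 120],
})

combined = (
    pd.concat([pre, post], axis=0, ignore_index=True)
    # keep one row per team per date
    .drop_duplicates(subset=["Date", "TEAM_ID"], keep="first")
    .sort_values(["Date", "TEAM_ID"])
    .reset_index(drop=True)
)

print(combined)
```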


### 4.2 Merging For All Seasons <a id='4_2'></a>

In [300]:
seasons = ['2018_2019', '2019_2020', '2020_2021', '2022_2023']
merged_dfs = [merge_for_season(season) for season in seasons]

In [302]:
merged_dfs[3]

Unnamed: 0,TEAM_ID,TEAM_NAME_base,GP_base,W_base,L_base,W_PCT_base,MIN_base,FGM_base,FGA_base,FG_PCT_base,...,MIN_RANK_defense,DEF_RATING_RANK_defense,DREB_RANK_defense,DREB_PCT_RANK_defense,STL_RANK_defense,BLK_RANK_defense,OPP_PTS_OFF_TOV_RANK_defense,OPP_PTS_2ND_CHANCE_RANK_defense,OPP_PTS_FB_RANK_defense,OPP_PTS_PAINT_RANK_defense
0,1610612738,Boston Celtics,1,1,0,1.000,48.0,46,82,0.561,...,1,3,3,1,3,3,1,3,1,1
1,1610612744,Golden State Warriors,1,1,0,1.000,48.0,45,99,0.455,...,1,1,2,2,2,1,2,1,4,4
2,1610612747,Los Angeles Lakers,1,0,1,0.000,48.0,40,94,0.426,...,1,2,1,4,1,1,4,4,2,1
3,1610612755,Philadelphia 76ers,1,0,1,0.000,48.0,40,80,0.500,...,1,4,4,3,3,3,3,1,3,3
4,1610612737,Atlanta Hawks,1,1,0,1.000,48.0,45,90,0.500,...,3,5,15,19,2,8,4,13,7,24
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5127,1610612758,Sacramento Kings,82,48,34,0.585,3966.0,3573,7232,0.494,...,12,24,15,6,20,29,12,6,1,26
5128,1610612759,San Antonio Spurs,82,22,60,0.268,3971.0,3533,7593,0.465,...,8,30,23,24,22,26,28,26,24,30
5129,1610612761,Toronto Raptors,82,41,41,0.500,3961.0,3434,7489,0.459,...,18,11,30,15,1,9,1,1,23,11
5130,1610612762,Utah Jazz,82,37,45,0.451,3961.0,3485,7365,0.473,...,18,23,6,23,30,7,27,28,29,24


In [368]:
seasons = ['2023_2024']
merged_dfs = [merge_for_season(season) for season in seasons]

In [369]:
# For rolling
seasons = ['2018_2019','2019_2020', '2020_2021','2021_2022', '2022_2023', '2023_2024']
merged_dfs = [merge_for_season_rolling(season) for season in seasons]

In [374]:
merged_dfs[5]['Date']

0      2023-11-16
1      2023-11-16
2      2023-11-16
3      2023-11-16
4      2023-11-16
          ...    
3715   2024-03-20
3716   2024-03-20
3717   2024-03-20
3718   2024-03-20
3719   2024-03-20
Name: Date, Length: 3720, dtype: datetime64[ns]
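The rolling data for 2023-2024 begins on 2023-11-16 rather than on the season start of 2023-10-24. That gap matches the `past_n_days=23` value used in the rolling cells. The sketch below reproduces the offset; `first_rolling_date` is a hypothetical helper, and the assumption that the rolling fetch simply skips the first `past_n_days` days is inferred from the logs rather than from the function's source.

```python
from datetime import date, timedelta


def first_rolling_date(season_start, past_n_days=23):
    """First date with a full rolling window, assuming the fetch
    starts past_n_days after the season start (inferred, not confirmed)."""
    return season_start + timedelta(days=past_n_days)


current = first_rolling_date(date(2023, 10, 24))
covid = first_rolling_date(date(2019, 10, 22))

print(current)  # 2023-11-16
print(covid)    # 2019-11-14
```

Both results agree with the first "Data fetched for" lines in the corresponding rolling logs.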

### 4.3 Combining Cumulative and Rolling <a id='4_3'></a>

In [375]:
# Combine cumulative and rolling data for each season
seasons = ['2018_2019','2019_2020', '2020_2021','2021_2022', '2022_2023', '2023_2024']
final_dfs = [merge_for_cum_and_rolling(season) for season in seasons]

In [378]:
final_dfs[0].dropna()

Unnamed: 0,TEAM_ID,TEAM_NAME_base_x,GP_base_x,W_base_x,L_base_x,W_PCT_base_x,MIN_base_x,FGM_base_x,FGA_base_x,FG_PCT_base_x,...,MIN_RANK_defense_y,DEF_RATING_RANK_defense_y,DREB_RANK_defense_y,DREB_PCT_RANK_defense_y,STL_RANK_defense_y,BLK_RANK_defense_y,OPP_PTS_OFF_TOV_RANK_defense_y,OPP_PTS_2ND_CHANCE_RANK_defense_y,OPP_PTS_FB_RANK_defense_y,OPP_PTS_PAINT_RANK_defense_y
660,1610612737,Atlanta Hawks,11,3,8,0.273,528.0,439,974,0.451,...,15.0,22.0,17.0,25.0,10.0,4.0,30.0,29.0,28.0,12.0
661,1610612738,Boston Celtics,11,7,4,0.636,533.0,423,989,0.428,...,9.0,1.0,6.0,7.0,24.0,19.0,3.0,14.0,5.0,9.0
662,1610612751,Brooklyn Nets,11,5,6,0.455,533.0,451,982,0.459,...,9.0,18.0,24.0,28.0,20.0,24.0,26.0,28.0,8.0,6.0
663,1610612766,Charlotte Hornets,11,6,5,0.545,528.0,472,1004,0.470,...,15.0,12.0,15.0,10.0,26.0,3.0,9.0,8.0,9.0,18.0
664,1610612741,Chicago Bulls,12,3,9,0.250,591.0,475,1050,0.452,...,1.0,25.0,7.0,27.0,16.0,11.0,24.0,21.0,12.0,20.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5275,1610612758,Sacramento Kings,82,39,43,0.476,3946.0,3541,7637,0.464,...,8.0,28.0,12.0,16.0,7.0,26.0,21.0,25.0,14.0,24.0
5276,1610612759,San Antonio Spurs,82,48,34,0.585,3961.0,3468,7248,0.478,...,12.0,10.0,12.0,6.0,27.0,13.0,8.0,8.0,11.0,7.0
5277,1610612761,Toronto Raptors,82,58,24,0.707,3976.0,3460,7305,0.474,...,11.0,1.0,7.0,9.0,10.0,4.0,21.0,7.0,11.0,9.0
5278,1610612762,Utah Jazz,82,50,32,0.610,3951.0,3314,7082,0.468,...,5.0,3.0,4.0,7.0,5.0,3.0,20.0,11.0,7.0,26.0
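The `_x` and `_y` endings in the output above are the default suffixes pandas adds when two merged frames share column names, and `dropna()` removes the early-season rows where no rolling values exist yet. The sketch below shows the same pattern on toy data with explicit suffixes, which may be easier to read later on. It is an illustration only: `merge_for_cum_and_rolling` is defined elsewhere, and the `_cum` and `_roll` suffixes and the toy values are assumptions.

```python
import pandas as pd

# Cumulative stats exist from opening night; rolling stats start later.
cum = pd.DataFrame({
    "Date": pd.to_datetime(["2018-10-16", "2018-11-08"]),
    "TEAM_ID": [1, 1],
    "PTS_base": [100, 2300],
})
roll = pd.DataFrame({
    "Date": pd.to_datetime(["2018-11-08"]),
    "TEAM_ID": [1],
    "PTS_base": [1150],
})

# Left merge keeps every cumulative row; missing rolling values become NaN.
merged = cum.merge(
    roll,
    on=["Date", "TEAM_ID"],
    how="left",
    suffixes=("_cum", "_roll"),
)

# Rows without rolling data are dropped, as final_dfs[0].dropna() does above.
complete = merged.dropna()

print(merged.columns.tolist())
print(len(complete))
```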



## Extra. Getting Data for Today <a id='extra'></a>

In [5]:
today = getting_stats()

Data fetched for 03/20/2024
Data fetched for 03/21/2024
Data fetched for 03/22/2024
Data fetched for 03/23/2024
Data fetched for 03/24/2024
Data fetched for 03/25/2024
Data fetched for 03/26/2024
Data fetched for 03/27/2024
Data fetched for 03/28/2024
Data fetched for 03/29/2024
Data fetched for 03/30/2024
Data fetched for 03/31/2024
Data fetched for 04/01/2024
Data fetched for 04/02/2024
Data fetched for 03/20/2024
Data fetched for 03/21/2024
Data fetched for 03/22/2024
Data fetched for 03/23/2024
Data fetched for 03/24/2024
Data fetched for 03/25/2024
Data fetched for 03/26/2024
Data fetched for 03/27/2024
Data fetched for 03/28/2024
Data fetched for 03/29/2024
Data fetched for 03/30/2024
Data fetched for 03/31/2024
Data fetched for 04/01/2024
Data fetched for 04/02/2024
Data fetched for 03/20/2024
Data fetched for 03/21/2024
Data fetched for 03/22/2024
Data fetched for 03/23/2024
Data fetched for 03/24/2024
Data fetched for 03/25/2024
Data fetched for 03/26/2024
Data fetched for 03/

In [12]:
today = getting_stats()

Data fetched for 03/20/2024
Data fetched for 03/21/2024
Data fetched for 03/22/2024
Data fetched for 03/23/2024
Data fetched for 03/24/2024
Data fetched for 03/25/2024
Data fetched for 03/26/2024
Data fetched for 03/27/2024
Data fetched for 03/28/2024
Data fetched for 03/29/2024
Data fetched for 03/30/2024
Data fetched for 03/31/2024
Data fetched for 04/01/2024
Data fetched for 04/02/2024
Data fetched for 04/03/2024
Data fetched for 04/04/2024
Data fetched for 04/05/2024
Data fetched for 04/06/2024
Data fetched for 03/20/2024
Data fetched for 03/21/2024
Data fetched for 03/22/2024
Data fetched for 03/23/2024
Data fetched for 03/24/2024
Data fetched for 03/25/2024
Data fetched for 03/26/2024
Data fetched for 03/27/2024
Data fetched for 03/28/2024
Data fetched for 03/29/2024
Data fetched for 03/30/2024
Data fetched for 03/31/2024
Data fetched for 04/01/2024
Data fetched for 04/02/2024
Data fetched for 04/03/2024
Data fetched for 04/04/2024
Data fetched for 04/05/2024
Data fetched for 04/