### Production Features Pipeline - CSV Version

This notebook is run daily from a GitHub Action.

1. It scrapes the results of the previous day's games, performs feature engineering, and saves the results back to a CSV file. This is an alternative version of the pipeline that does NOT use the Hopsworks.ai Feature Store and is less dependent on other platforms.

2. It scrapes today's upcoming games and saves the blank records back into the CSV file so that the model can access them for prediction.

**Note:**
There are two options for web scraping in this notebook.
Set the 'WEBSCRAPER' variable to either 'SCRAPINGANT' or 'SELENIUM' to choose which version to run.

1. SCRAPINGANT: Uses ScrapingAnt, a web-scraping service with a Python API that handles all the proxy server issues but requires an account. The free account allows 1000 page requests, which is more than enough for this project. Proxies are required when running this notebook from a GitHub Action; otherwise key data will fail to be scraped from NBA.com.

2. SELENIUM: This option does not currently integrate proxy servers into the web-scraping process, which can cause issues when scraping from certain locations, in particular GitHub Actions. For occasional use from a local machine this option may work fine, but you may need to set up a proxy server.

In [1]:
# select web scraper; 'SCRAPINGANT' or 'SELENIUM'
# SCRAPINGANT requires an account (a free tier is available) and includes a proxy server

WEBSCRAPER = 'SCRAPINGANT'
#WEBSCRAPER = 'SELENIUM'
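
A misspelled `WEBSCRAPER` value would match neither branch of the later `if`/`elif`, leaving `driver` and `SCRAPINGANT_API_KEY` undefined until the first scraping call fails. A minimal sketch of an early check follows; `validate_webscraper` and `VALID_WEBSCRAPERS` are hypothetical names, not part of the project.

```python
# Hypothetical helper: fail fast on an unknown scraper name.
VALID_WEBSCRAPERS = ('SCRAPINGANT', 'SELENIUM')


def validate_webscraper(name: str) -> str:
    """Return the normalized scraper name or raise ValueError."""
    normalized = name.strip().upper()
    if normalized not in VALID_WEBSCRAPERS:
        raise ValueError(
            f"WEBSCRAPER must be one of {VALID_WEBSCRAPERS}, got {name!r}"
        )
    return normalized
```

It could be called once, right after the variable is set, for example `WEBSCRAPER = validate_webscraper(WEBSCRAPER)`.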

In [2]:
import os

import pandas as pd
import numpy as np

from datetime import datetime, timedelta
from pytz import timezone

import json

import time

from pathlib import Path  #for Windows/Linux compatibility

# change working directory to project root when running from notebooks folder to make it easier to import modules
# and to access sibling folders
os.chdir('..') 

 
from src.webscraping import (
    get_new_games,
    activate_web_driver,
    get_todays_matchups,
)

from src.data_processing import (
    process_games,
    add_TARGET,
)

from src.feature_engineering import (
    process_features,
)

from src.dashboard_processing import (
    NBADataProcessor,
)

from src.google_drive_utils import (
    upload_to_drive,
)

DATAPATH = Path(r'data')
GOOGLE_FOLDER_ID = "1y5AfF3KZ8FGzxr2pyuncXJpKWEa5j-CL"

**Load API keys**

In [3]:
from dotenv import load_dotenv

load_dotenv()

#try:
#    HOPSWORKS_API_KEY = os.environ['HOPSWORKS_API_KEY']
#except:
#    raise Exception('Set environment variable HOPSWORKS_API_KEY')


# if scrapingant is chosen then set the api key, otherwise load the selenium webdriver
if WEBSCRAPER == 'SCRAPINGANT':
    try:
        SCRAPINGANT_API_KEY = os.environ['SCRAPINGANT_API_KEY']
    except KeyError:
        # only a missing variable is expected here; other errors should surface unchanged
        raise Exception('Set environment variable SCRAPINGANT_API_KEY')
    driver = None
    
elif WEBSCRAPER == 'SELENIUM':
    driver = activate_web_driver('chromium')
    SCRAPINGANT_API_KEY = ""
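
The key lookup above only detects a missing variable; a variable that is set but empty (easy to produce with an unset GitHub Actions secret) would pass through. A small sketch of a stricter lookup, assuming a hypothetical helper named `require_env`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, rejecting missing or empty values."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"Set environment variable {name}")
    return value
```

Usage would look like `SCRAPINGANT_API_KEY = require_env('SCRAPINGANT_API_KEY')` inside the `SCRAPINGANT` branch.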
    



**Scrape New Completed Games and Format Them**

In [4]:


df_new = get_new_games(SCRAPINGANT_API_KEY, driver)

if df_new.empty:
    print('No new games to process')

    # determine what season we are in currently
    today = datetime.now(timezone('EST')) #nba.com uses US Eastern Standard Time
    if today.month >= 10:
        SEASON = today.year
    else:
        SEASON = today.year - 1
else:

    # get the SEASON of the most recent game that was just scraped
    # this will be used when constructing rows for prediction
    SEASON = df_new['SEASON'].max()

    df_new




Current month is 11
Scraping https://www.nba.com/stats/teams/boxscores?SeasonType=Regular+Season&DateFrom=10/26/2025&DateTo=11/02/2025
0     1610612750
1     1610612766
2     1610612749
3     1610612758
4     1610612753
5     1610612764
6     1610612754
7     1610612744
8     1610612745
9     1610612738
10    1610612765
11    1610612742
12    1610612737
13    1610612738
14    1610612755
15    1610612741
16    1610612754
17    1610612752
18    1610612761
19    1610612739
20    1610612740
21    1610612756
22    1610612746
23    1610612743
24    1610612757
25    1610612747
26    1610612763
27    1610612762
28    1610612753
29    1610612766
30    1610612764
31    1610612760
32    1610612748
33    1610612759
34    1610612749
35    1610612744
36    1610612745
37    1610612765
38    1610612753
39    1610612740
40    1610612754
41    1610612743
42    1610612757
43    1610612762
44    1610612751
45    1610612737
46    1610612742
47    1610612758
48    1610612739
49    1610612738
dtype: object
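
The season fallback used in the cell above (October or later counts as the start of a new season) can be isolated into a small pure function, which makes the rule easy to test. This is a sketch; `nba_season` is a hypothetical name and the October cutoff is simply the rule the notebook already uses.

```python
from datetime import date


def nba_season(day: date) -> int:
    """Season start year: games from October onward belong to that calendar year's season."""
    return day.year if day.month >= 10 else day.year - 1


# A game in November 2025 and one in March 2026 both belong to the 2025 season.
print(nba_season(date(2025, 11, 2)), nba_season(date(2026, 3, 1)))
```

Note that pytz's `'EST'` zone is a fixed UTC-5 offset without daylight saving time, whereas `'US/Eastern'` follows daylight saving; near midnight in summer months the two can disagree on the calendar date.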

**Retrieve Today's Games**

In [5]:
#retrieve list of teams playing today

# get today's games on NBA schedule
matchups, game_ids = get_todays_matchups(SCRAPINGANT_API_KEY, driver)

if matchups is None:
    print('No games today')
else:
    print(matchups)
    print(game_ids)


Sat
Sun
[['1610612740', '1610612760'], ['1610612755', '1610612751'], ['1610612762', '1610612766'], ['1610612737', '1610612739'], ['1610612763', '1610612761'], ['1610612741', '1610612752'], ['1610612759', '1610612756'], ['1610612748', '1610612747']]
['22500148', '22500149', '22500150', '22500151', '22500152', '22500153', '22500154', '22500155']


**Close Webdriver**

In [6]:
if WEBSCRAPER == 'SELENIUM':
    driver.close() 

**Check if anything is going on in the season**

In [7]:
if (df_new.empty) and (matchups is None):
    print('No new games to process')
    #exit()
    

**Create Rows for Today's Games with Empty Stats**

In [8]:
# reformat today's matchups to the new games dataframe

if matchups is None:
    print('No games going on. Nothing to do.')
    #exit()    

else:

    df_today = df_new.drop(df_new.index) #empty copy of df_new with same columns
    for i, matchup in enumerate(matchups):
        game_details = {'HOME_TEAM_ID': matchup[1], 
                        'VISITOR_TEAM_ID': matchup[0], 
                        'GAME_DATE_EST': datetime.now(timezone('EST')).strftime("%Y-%m-%d"), 
                        'GAME_ID': int(game_ids[i]),                       
                        'SEASON': SEASON,
                        } 
        game_details_df = pd.DataFrame(game_details, index=[i])
        # append to new games dataframe
        df_today = pd.concat([df_today, game_details_df], ignore_index = True)

    #blank rows will be filled with 0 to prevent issues with feature engineering
    df_today = df_today.fillna(0) 

    df_today
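
The loop above concatenates one single-row dataframe per matchup. An alternative sketch builds all rows first and constructs the dataframe once. `build_today_rows` is a hypothetical helper; it assumes, as in the cell above, that each matchup is ordered `[visitor, home]`.

```python
import pandas as pd


def build_today_rows(matchups, game_ids, season, game_date):
    """Build one blank row per scheduled game (hypothetical helper)."""
    rows = [
        {
            'HOME_TEAM_ID': home,
            'VISITOR_TEAM_ID': visitor,
            'GAME_DATE_EST': game_date,
            'GAME_ID': int(game_id),
            'SEASON': season,
        }
        # each matchup is [visitor, home], paired with its game id
        for (visitor, home), game_id in zip(matchups, game_ids)
    ]
    return pd.DataFrame(rows)


example = build_today_rows(
    matchups=[['1610612740', '1610612760'], ['1610612755', '1610612751']],
    game_ids=['22500148', '22500149'],
    season=2025,
    game_date='2025-11-02',
)
print(example)
```

To match the notebook, the result would still need the stat columns of `df_new`, for example via `reindex(columns=df_new.columns)` followed by `fillna(0)`.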



**Query Old Data Needed for Feature Engineering of New Data**

To generate features such as rolling averages for the new games, data from previous games is needed, because some of the rolling windows extend back 15 to 20 games.
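
Why the history matters can be seen in a toy rolling average. The sketch below is not the project's `process_features` implementation (which is not shown here); the column names `TEAM_ID`, `GAME_DATE`, and `PTS`, the window of 3, and the use of `shift(1)` to exclude the current game are all assumptions made for illustration.

```python
import pandas as pd

games = pd.DataFrame({
    'TEAM_ID': [1, 1, 1, 1, 2, 2],
    'GAME_DATE': pd.to_datetime([
        '2025-10-22', '2025-10-24', '2025-10-26', '2025-10-28',
        '2025-10-22', '2025-10-25',
    ]),
    'PTS': [100, 110, 120, 130, 90, 95],
})

games = games.sort_values(['TEAM_ID', 'GAME_DATE'])

# shift(1) drops the current game so each row only sees earlier results;
# min_periods=3 leaves NaN until a team has three prior games
games['PTS_AVG_LAST_3'] = (
    games.groupby('TEAM_ID')['PTS']
    .transform(lambda s: s.shift(1).rolling(window=3, min_periods=3).mean())
)
print(games)
```

Only the fourth game of team 1 has enough history to produce a value; every other row stays empty. With a 15-game window, a new game likewise needs at least 15 earlier rows for the same team.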

In [9]:


df_old = pd.read_csv(DATAPATH / 'games.csv')

df_old


Unnamed: 0,GAME_DATE_EST,GAME_ID,GAME_STATUS_TEXT,HOME_TEAM_ID,VISITOR_TEAM_ID,SEASON,TEAM_ID_home,PTS_home,FG_PCT_home,FT_PCT_home,...,AST_home,REB_home,TEAM_ID_away,PTS_away,FG_PCT_away,FT_PCT_away,FG3_PCT_away,AST_away,REB_away,HOME_TEAM_WINS
0,2022-03-12,22101005.0,Final,1.610613e+09,1.610613e+09,2021.0,1.610613e+09,104.0,0.398,0.760,...,23.0,53.0,1.610613e+09,113.0,0.422,0.875,0.357,21.0,46.0,0
1,2022-03-12,22101006.0,Final,1.610613e+09,1.610613e+09,2021.0,1.610613e+09,101.0,0.443,0.933,...,20.0,46.0,1.610613e+09,91.0,0.419,0.824,0.208,19.0,40.0,1
2,2022-03-12,22101007.0,Final,1.610613e+09,1.610613e+09,2021.0,1.610613e+09,108.0,0.412,0.813,...,28.0,52.0,1.610613e+09,119.0,0.489,1.000,0.389,23.0,47.0,0
3,2022-03-12,22101008.0,Final,1.610613e+09,1.610613e+09,2021.0,1.610613e+09,122.0,0.484,0.933,...,33.0,55.0,1.610613e+09,109.0,0.413,0.696,0.386,27.0,39.0,1
4,2022-03-12,22101009.0,Final,1.610613e+09,1.610613e+09,2021.0,1.610613e+09,115.0,0.551,0.750,...,32.0,39.0,1.610613e+09,127.0,0.471,0.760,0.387,28.0,50.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30136,2025-10-29 00:00:00,22500134.0,,1.610613e+09,1.610613e+09,2025.0,,122.0,53.300,85.700,...,40.0,56.0,,88.0,37.600,58.600,20.600,17.0,36.0,1
30137,2025-10-29 00:00:00,22500135.0,,1.610613e+09,1.610613e+09,2025.0,,134.0,48.800,83.700,...,30.0,46.0,,136.0,47.900,85.200,38.200,34.0,39.0,0
30138,2025-10-29 00:00:00,22500130.0,,1.610613e+09,1.610613e+09,2025.0,,112.0,45.500,77.800,...,23.0,47.0,,117.0,47.300,72.700,39.400,33.0,44.0,0
30139,2025-10-29 00:00:00,22500133.0,,1.610613e+09,1.610613e+09,2025.0,,107.0,45.600,81.100,...,21.0,47.0,,105.0,34.900,64.300,28.900,20.0,52.0,1


**Update Yesterday's Matchup Predictions with New Final Results**

In [12]:
# filter out games that are pending final results
# (these were the rows used for prediction yesterday)
# and then update these with the new results

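def fix_dupes(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per GAME_ID, preferring completed games."""

    # Ensure consistent types; rows without a numeric GAME_ID are dropped.
    # Work on copies so the caller's dataframe is not modified and pandas
    # does not raise SettingWithCopyWarning.
    df = df.copy()
    df['GAME_ID'] = pd.to_numeric(df['GAME_ID'], errors='coerce')
    df = df.dropna(subset=['GAME_ID']).copy()
    df['GAME_ID'] = df['GAME_ID'].astype('int64')

    # Mark completeness (prefer completed games). Fall back to a Series when
    # a column is missing; a scalar default has no .astype or .fillna.
    status = df.get('GAME_STATUS_TEXT', pd.Series('', index=df.index))
    pts_home = df.get('PTS_home', pd.Series(0, index=df.index))
    pts_away = df.get('PTS_away', pd.Series(0, index=df.index))
    is_final = status.astype(str).str.contains('Final', case=False, na=False)
    has_pts = (pts_home.fillna(0) > 0) | (pts_away.fillna(0) > 0)
    df['__row_quality'] = is_final.astype(int) + has_pts.astype(int)

    # Keep the best version per GAME_ID: the highest-quality row sorts last
    df = df.sort_values(['__row_quality', 'GAME_DATE_EST']).drop_duplicates(subset=['GAME_ID'], keep='last')
    df = df.drop(columns=['__row_quality'], errors='ignore')

    return df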


# one approach is to simply drop the rows that were used for prediction yesterday
# which are games that have 0 points for home team
# and then append the new rows to the dataframe
df_old = df_old[df_old['PTS_home'] != 0]
df_old = pd.concat([df_old, df_new], ignore_index = True)

df_old = fix_dupes(df_old)



# save the new games to the database
df_old.to_csv(DATAPATH / 'games.csv', index=False)

df_old

Unnamed: 0,GAME_DATE_EST,GAME_ID,GAME_STATUS_TEXT,HOME_TEAM_ID,VISITOR_TEAM_ID,SEASON,TEAM_ID_home,PTS_home,FG_PCT_home,FT_PCT_home,...,AST_home,REB_home,TEAM_ID_away,PTS_away,FG_PCT_away,FT_PCT_away,FG3_PCT_away,AST_away,REB_away,HOME_TEAM_WINS
0,2003-10-07,10300006,Final,1.610613e+09,1.610613e+09,2003.0,1.610613e+09,,,,...,,,1.610613e+09,,,,,,,0
1,2003-10-08,10300013,Final,1.610613e+09,1.610613e+09,2003.0,1.610613e+09,,,,...,,,1.610613e+09,,,,,,,0
2,2003-10-08,10300015,Final,1.610613e+09,1.610613e+09,2003.0,1.610613e+09,,,,...,,,1.610613e+09,,,,,,,0
3,2003-10-09,10300020,Final,1.610613e+09,1.610613e+09,2003.0,1.610613e+09,,,,...,,,1.610613e+09,,,,,,,0
4,2003-10-09,10300021,Final,1.610613e+09,1.610613e+09,2003.0,1.610613e+09,,,,...,,,1.610613e+09,,,,,,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30113,2022-03-12,22101007,Final,1.610613e+09,1.610613e+09,2021.0,1.610613e+09,108.0,0.412,0.813,...,28.0,52.0,1.610613e+09,119.0,0.489,1.000,0.389,23.0,47.0,0
30114,2022-03-12,22101008,Final,1.610613e+09,1.610613e+09,2021.0,1.610613e+09,122.0,0.484,0.933,...,33.0,55.0,1.610613e+09,109.0,0.413,0.696,0.386,27.0,39.0,1
30115,2022-03-12,22101009,Final,1.610613e+09,1.610613e+09,2021.0,1.610613e+09,115.0,0.551,0.750,...,32.0,39.0,1.610613e+09,127.0,0.471,0.760,0.387,28.0,50.0,0
30116,2022-03-12,22101010,Final,1.610613e+09,1.610613e+09,2021.0,1.610613e+09,134.0,0.558,0.710,...,21.0,44.0,1.610613e+09,125.0,0.500,0.857,0.394,27.0,33.0,1
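
The de-duplication idea in `fix_dupes` (score each row, sort so the best row comes last, then keep the last row per `GAME_ID`) can be checked on a tiny frame. The rows below are illustrative: game 22500130 appears once as a blank prediction row and once with its final score, while 22500131 is a made-up pending game.

```python
import pandas as pd

rows = pd.DataFrame({
    'GAME_ID': [22500130, 22500130, 22500131],
    'GAME_DATE_EST': ['2025-10-29', '2025-10-29', '2025-10-29'],
    'PTS_home': [0, 112, 0],
    'PTS_away': [0, 117, 0],
})

# rows with points rank higher, so they sort last and survive keep='last'
rows['quality'] = ((rows['PTS_home'] > 0) | (rows['PTS_away'] > 0)).astype(int)
deduped = (
    rows.sort_values(['quality', 'GAME_DATE_EST'])
    .drop_duplicates(subset=['GAME_ID'], keep='last')
    .drop(columns=['quality'])
    .sort_values('GAME_ID')
    .reset_index(drop=True)
)
print(deduped)
```

The blank copy of game 22500130 is discarded in favor of the scored one, and the pending game keeps its single blank row.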


**Add Today's Matchups for Feature Engineering**

In [13]:
if matchups is None:
    print('No games today')
    df_combined = df_old
else:
    df_combined = pd.concat([df_old, df_today], ignore_index = True)
    df_combined

**Data Processing**

In [14]:
df_combined = process_games(df_combined) 
df_combined = add_TARGET(df_combined)
df_combined

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Unnamed: 0,GAME_DATE_EST,GAME_ID,HOME_TEAM_ID,VISITOR_TEAM_ID,SEASON,PTS_home,FG_PCT_home,FT_PCT_home,FG3_PCT_home,AST_home,REB_home,PTS_away,FG_PCT_away,FT_PCT_away,FG3_PCT_away,AST_away,REB_away,HOME_TEAM_WINS,PLAYOFF,TARGET
99,2022-03-13,22101018.0,1610612760.0,1610612763.0,2021.0,118.0,42.3,82.1,31.7,25.0,44.0,125.0,47.1,90.0,37.9,30.0,57.0,0.0,0,0.0
100,2022-03-13,22101012.0,1610612751.0,1610612752.0,2021.0,110.0,49.4,72.7,20.7,30.0,39.0,107.0,47.0,85.7,37.9,27.0,40.0,1.0,0,1.0
101,2022-03-13,22101013.0,1610612765.0,1610612746.0,2021.0,102.0,45.0,90.0,41.4,23.0,44.0,106.0,44.9,80.0,33.3,25.0,46.0,0.0,0,0.0
102,2022-03-13,22101014.0,1610612738.0,1610612742.0,2021.0,92.0,37.5,89.5,24.3,16.0,45.0,95.0,44.6,73.7,40.5,20.0,42.0,0.0,0,0.0
103,2022-03-13,22101019.0,1610612756.0,1610612747.0,2021.0,140.0,56.9,85.7,36.4,36.0,51.0,111.0,44.4,74.2,40.0,24.0,37.0,1.0,0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30121,2025-11-02,22500151.0,1610612739,1610612737,2025.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0
30122,2025-11-02,22500152.0,1610612761,1610612763,2025.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0
30123,2025-11-02,22500153.0,1610612752,1610612741,2025.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0
30124,2025-11-02,22500154.0,1610612756,1610612759,2025.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0


**Feature Engineering**

In [15]:
# Feature engineering to add: 
    # rolling averages of key stats, 
    # win/lose streaks, 
    # home/away streaks, 
    # specific matchup (team X vs team Y) rolling averages and streaks

# check that there are no NaN values
if df_combined.isnull().values.any():
    print('Warning: NaN values found in dataframe before feature engineering')
    print(df_combined[df_combined.isnull().any(axis=1)])
    df_combined = df_combined.fillna(0)

df_combined = process_features(df_combined)

#fix type conversion issues with hopsworks
df_combined['TARGET'] = df_combined['TARGET'].astype('int16')
df_combined['HOME_TEAM_WINS'] = df_combined['HOME_TEAM_WINS'].astype('int16')

# save file
df_combined.to_csv(DATAPATH / 'games_engineered.csv', index=False)


df_combined


Converting field PTS_home to int16 float64 0.0 175.0
Converting field AST_home to int16 float64 0.0 50.0
Converting field REB_home to int16 float64 0.0 74.0
Converting field PTS_away to int16 float64 0.0 176.0
Converting field AST_away to int16 float64 0.0 48.0
Converting field REB_away to int16 float64 0.0 81.0




Unnamed: 0,GAME_DATE_EST,GAME_ID,HOME_TEAM_ID,VISITOR_TEAM_ID,SEASON,PTS_home,FG_PCT_home,FT_PCT_home,FG3_PCT_home,AST_home,...,FG3_PCT_AVG_LAST_10_ALL_x_minus_y,FG3_PCT_AVG_LAST_15_ALL_x_minus_y,AST_AVG_LAST_3_ALL_x_minus_y,AST_AVG_LAST_7_ALL_x_minus_y,AST_AVG_LAST_10_ALL_x_minus_y,AST_AVG_LAST_15_ALL_x_minus_y,REB_AVG_LAST_3_ALL_x_minus_y,REB_AVG_LAST_7_ALL_x_minus_y,REB_AVG_LAST_10_ALL_x_minus_y,REB_AVG_LAST_15_ALL_x_minus_y
0,2003-10-28 00:00:00+00:00,20300003,1610612747,1610612742,2003,109,0.505859,0.600098,0.350098,32,...,,,,,,,,,,
1,2003-10-28 00:00:00+00:00,20300002,1610612759,1610612756,2003,83,0.425049,0.769043,0.099976,20,...,,,,,,,,,,
2,2003-10-28 00:00:00+00:00,20300001,1610612755,1610612748,2003,89,0.439941,0.533203,0.350098,25,...,,,,,,,,,,
3,2003-10-29 00:00:00+00:00,20300006,1610612740,1610612737,2003,88,0.323975,0.700195,0.160034,24,...,,,,,,,,,,
4,2003-10-29 00:00:00+00:00,20300008,1610612765,1610612754,2003,87,0.392090,0.742188,0.333008,15,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28412,2025-11-02 00:00:00+00:00,22500151,1610612739,1610612737,2025,0,0.000000,0.000000,0.000000,0,...,-0.548437,-0.381250,-7.000000,-5.857143,-5.5,-5.600000,-1.000000,-1.285714,-0.8,-0.466667
28413,2025-11-02 00:00:00+00:00,22500150,1610612766,1610612762,2025,0,0.000000,0.000000,0.000000,0,...,0.942188,0.446875,-2.333333,-0.142857,-0.8,-1.133333,-10.333333,-3.714286,-3.9,0.266667
28414,2025-11-02 00:00:00+00:00,22500154,1610612756,1610612759,2025,0,0.000000,0.000000,0.000000,0,...,-3.407812,-3.598958,5.666667,2.285714,1.8,0.533333,-1.666667,-5.857143,-3.2,-3.600000
28415,2025-11-02 00:00:00+00:00,22500153,1610612752,1610612741,2025,0,0.000000,0.000000,0.000000,0,...,-2.400000,-3.806250,-8.000000,-4.714286,-7.6,-7.733333,-0.666667,-1.142857,-3.4,-3.800000
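
The win/lose streak features mentioned in the cell above are computed inside `process_features`, which is not shown here. As a rough illustration only, one way to derive a signed streak entering each game for a single team is sketched below; the function name `streak_before_game` and the sign convention (+n after n straight wins, -n after n straight losses) are assumptions.

```python
import pandas as pd


def streak_before_game(wins: pd.Series) -> pd.Series:
    """Signed streak entering each game, for one team's games in date order."""
    prev = wins.shift(1)  # result of the previous game; NaN for the first game
    # a new run starts whenever the previous result differs from the one before it
    run_id = (prev != prev.shift(1)).cumsum()
    length = prev.groupby(run_id).cumcount() + 1
    signed = length.where(prev == 1, -length)
    # the first game has no history, so its streak is 0
    return signed.where(prev.notna(), 0).astype(int)


wins = pd.Series([1, 1, 0, 0, 0, 1])
print(streak_before_game(wins).tolist())
```

For several teams, the same function could be applied per team with `groupby(...).transform(streak_before_game)` after sorting by date.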


**Process Data for Convenient Dashboarding**

In [16]:
processor = NBADataProcessor()
exported_files = processor.export_data_for_dashboard()

print("\nData Processing Complete!")
print("Files exported for Dashboards:")
for key, value in exported_files.items():
    if value:
        print(f"- {key}: {value}")

2025-11-02 09:50:51,236 INFO: Initialized NBADataProcessor with data_path=data, model_path=models
2025-11-02 09:50:51,237 INFO: Exporting data for dashboards to data
2025-11-02 09:50:51,238 INFO: Preparing today's games data
2025-11-02 09:50:51,239 INFO: Loading data from data\games_engineered.csv
2025-11-02 09:50:51,905 INFO: Successfully loaded data: 28417 rows, 245 columns
2025-11-02 09:50:51,906 INFO: Season selection -> upcoming: 2025, completed: 2025
2025-11-02 09:50:51,907 INFO: Filtering for season: 2025
2025-11-02 09:50:51,908 INFO: Processing data for prediction
Converting field PTS_home to int16 int64 0 0
Converting field AST_home to int16 int64 0 0
Converting field REB_home to int16 int64 0 0
Converting field PTS_away to int16 int64 0 0
Converting field AST_away to int16 int64 0 0
Converting field REB_away to int16 int64 0 0
2025-11-02 09:50:51,945 INFO: Making predictions
2025-11-02 09:50:51,947 INFO: Loading model from models\model.pkl
2025-11-02 09:50:52,065 INFO: Succes

2025-11-02 09:50:52,263 INFO: Successfully made predictions for 8 games
2025-11-02 09:50:52,265 INFO: Prepared predictions for 8 games today
2025-11-02 09:50:52,267 INFO: Preparing processed games data
2025-11-02 09:50:52,268 INFO: Loading data from data\games_engineered.csv


2025-11-02 09:50:52,943 INFO: Successfully loaded data: 28417 rows, 245 columns
2025-11-02 09:50:52,946 INFO: Season selection -> upcoming: 2025, completed: 2025
2025-11-02 09:50:52,946 INFO: Filtering for season: 2025
2025-11-02 09:50:52,947 INFO: Processing data for prediction
Converting field PTS_home to int16 int64 90 144
Converting field AST_home to int16 int64 12 40
Converting field REB_home to int16 int64 22 64
Converting field PTS_away to int16 int64 79 146
Converting field AST_away to int16 int64 10 37
Converting field REB_away to int16 int64 20 60
2025-11-02 09:50:52,988 INFO: Making predictions
2025-11-02 09:50:52,988 INFO: Removing unused features
2025-11-02 09:50:53,021 INFO: Successfully made predictions for 85 games
2025-11-02 09:50:53,027 INFO: Calculating daily running accuracy metrics
2025-11-02 09:50:53,034 INFO: Calculating team-specific daily running accuracy
2025-11-02 09:50:53,089 INFO: Calculating home/away daily running accuracy
2025-11-02 09:50:53,098 INFO: Ca

2025-11-02 09:50:53,151 INFO: Processed 85 completed games with 50 correct predictions
2025-11-02 09:50:53,153 INFO: Loading data from data\games_engineered.csv
2025-11-02 09:50:53,799 INFO: Successfully loaded data: 28417 rows, 245 columns
2025-11-02 09:50:53,801 INFO: Season selection -> upcoming: 2025, completed: 2025
2025-11-02 09:50:53,812 INFO: Filtering columns for dashboard
2025-11-02 09:50:53,815 INFO: Filtered dataframe from 257 to 22 columns
2025-11-02 09:50:53,817 INFO: Exported filtered games data to data\games_dashboard.csv
2025-11-02 09:50:53,817 INFO: Exporting running accuracy metrics
2025-11-02 09:50:53,828 INFO: Calculating rolling average accuracy metrics with progression-based window
2025-11-02 09:50:53,834 INFO: Calculated 12 weekly average periods ending on game dates
2025-11-02 09:50:53,837 INFO: Added 12 weekly average metrics
2025-11-02 09:50:53,839 INFO: Exported running accuracy metrics to data\running_accuracy_metrics.csv
2025-11-02 09:50:53,843 INFO: Expor
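
The log mentions daily running accuracy metrics. The calculation inside `NBADataProcessor` is not shown, so the following is only a sketch of one plausible definition: cumulative correct predictions divided by cumulative games, evaluated at the end of each game day. The column names `GAME_DATE`, `PREDICTION`, and `TARGET` and the sample values are assumptions.

```python
import pandas as pd

results = pd.DataFrame({
    'GAME_DATE': pd.to_datetime([
        '2025-10-21', '2025-10-21',
        '2025-10-22', '2025-10-22', '2025-10-22',
    ]),
    'PREDICTION': [1, 0, 1, 1, 0],
    'TARGET': [1, 1, 1, 0, 0],
})
results['CORRECT'] = (results['PREDICTION'] == results['TARGET']).astype(int)

# per-day totals, then cumulative sums give the season-to-date accuracy
daily = results.groupby('GAME_DATE')['CORRECT'].agg(correct='sum', games='count')
daily['running_accuracy'] = daily['correct'].cumsum() / daily['games'].cumsum()
print(daily)
```

Under this definition, the figures reported in the log above (50 correct out of 85 completed games) correspond to a season-to-date accuracy of about 58.8%.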

**Upload to Google Drive**

In [17]:

files_to_upload = [
    DATAPATH / 'games_dashboard.csv',
    DATAPATH / 'season_summary_stats.csv',
    DATAPATH / 'running_accuracy_metrics.csv'
]
upload_to_drive(files_to_upload, GOOGLE_FOLDER_ID)

2025-11-02 09:50:53,863 INFO: file_cache is only supported with oauth2client<4.0.0
Found 3 existing files in the folder
File 'games_dashboard.csv' updated successfully. File ID: 1g8Mc2SQafXApx8pwPBPhol2bDXiNOT46
File 'season_summary_stats.csv' updated successfully. File ID: 1zeqYZGhbfFj5kmj5ZRI1qXL2HKnU3B8S
File 'running_accuracy_metrics.csv' updated successfully. File ID: 18Hb24olk1Y4iroj55CEvDSxKXEGof4Ti


['1g8Mc2SQafXApx8pwPBPhol2bDXiNOT46',
 '1zeqYZGhbfFj5kmj5ZRI1qXL2HKnU3B8S',
 '18Hb24olk1Y4iroj55CEvDSxKXEGof4Ti']