# NBA Player Statistics Prediction - Inference

This notebook loads all trained models and uses the latest-season feature data to predict player statistics for the 2025-26 season.

## Models Used:
1. **Ridge Regression** (Linear model with regularization)
2. **XGBoost** (Gradient boosting)
3. **LightGBM** (Gradient boosting)
4. **Bayesian Multi-Output Regression** (PyMC with MatrixNormal)
5. **LSTM** (Deep learning with PyTorch)
6. **Transformer** (Deep learning with PyTorch)
7. **Ensemble Methods** (Simple averaging, Weighted averaging, Stacking)

## Input Data:
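- `latest_season_features_for_inference.csv`: Each player's features from the latest available season (the notebook labels this the 2024-25 season, although the sample rows printed below show `SEASON_ID` 2023-24)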

## Output:
- Individual model predictions saved as CSV files in the `Output` folder, with player mapping
- Ensemble predictions saved as CSV files in the `Output` folder, with player mapping
- A summary report with per-model descriptive statistics, saved as a joblib file
- Predictions for the 2025-26 season

In [1]:
import pandas as pd
import numpy as np
import joblib
import os
import warnings
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)
RANDOM_SEED = 42

In [2]:
def find_backend_dir(start_path=None):
    """
    Walk up directories from start_path (or cwd) until a folder named 'backend' is found.
    Returns the absolute path to the 'backend' folder.
    """
    if start_path is None:
        start_path = os.getcwd()
    curr_path = os.path.abspath(start_path)
    while True:
        # Check if 'backend' exists in this directory
        candidate = os.path.join(curr_path, "backend")
        if os.path.isdir(candidate):
            return candidate
        # If at filesystem root, stop
        parent = os.path.dirname(curr_path)
        if curr_path == parent:
            break
        curr_path = parent
    raise FileNotFoundError(f"No 'backend' directory found upward from {start_path}")

# Find the backend directory and CSV folder
backend_dir = find_backend_dir()
csv_dir = os.path.join(backend_dir, "CSVs")
models_dir = os.path.join(backend_dir, "Models")
output_dir = os.path.join(backend_dir, "Output")

# Create output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

print(f"Backend directory: {backend_dir}")
print(f"CSV directory: {csv_dir}")
print(f"Models directory: {models_dir}")
print(f"Output directory: {output_dir}")

Backend directory: /Users/jeevanparmar/Uni/MSE 436/Project-Mono-Repo/backend
CSV directory: /Users/jeevanparmar/Uni/MSE 436/Project-Mono-Repo/backend/CSVs
Models directory: /Users/jeevanparmar/Uni/MSE 436/Project-Mono-Repo/backend/Models
Output directory: /Users/jeevanparmar/Uni/MSE 436/Project-Mono-Repo/backend/Output
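
The same upward search can also be written with `pathlib`. The following is a minimal sketch rather than the notebook's own code; the function name `find_dir_upward` and its `name` parameter are assumptions introduced for illustration.

```python
from pathlib import Path


def find_dir_upward(name, start=None):
    """Return the first directory called `name` found in `start` or any of its parents."""
    start_path = Path(start) if start is not None else Path.cwd()
    start_path = start_path.resolve()
    # Path.parents yields every ancestor up to the filesystem root.
    for base in (start_path, *start_path.parents):
        candidate = base / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"No '{name}' directory found upward from {start_path}")
```

Calling `find_dir_upward("backend")` from a notebook folder nested anywhere below the repository root should return the same path as `find_backend_dir()`.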


In [3]:
# Load the inference data
print("Loading inference data...")
inference_data = pd.read_csv(os.path.join(csv_dir, "latest_season_features_for_inference.csv"))

print(f"Inference data shape: {inference_data.shape}")
print(f"Columns: {len(inference_data.columns)}")
print(f"\nFirst few columns: {list(inference_data.columns[:10])}")
print(f"\nSample data:")
print(inference_data.head())

Loading inference data...
Inference data shape: (422, 58)
Columns: 58

First few columns: ['PERSON_ID', 'SEASON_ID', 'Points', 'Minutes', 'FGM', 'FGA', 'FG%', '3PM', '3PA', '3P%']

Sample data:
   PERSON_ID SEASON_ID     Points    Minutes        FGM        FGA        FG%  \
0     2544.0   2023-24  25.661972  35.323944   9.647887  17.873239  54.223944   
1   101108.0   2023-24   9.189655  26.431034   3.551724   8.051724  42.506897   
2   200768.0   2023-24   8.116667  28.183333   2.716667   6.283333  40.971667   
3   201142.0   2023-24  27.093333  37.200000  10.013333  19.146667  53.098667   
4   201143.0   2023-24   8.646154  26.800000   3.292308   6.446154  51.089231   

        3PM       3PA        3P%  ...  SEASON_Spring        AGE  \
0  2.098592  5.112676  40.459155  ...           True  40.514716   
1  1.344828  3.620690  33.778947  ...           True  40.167009   
2  1.633333  4.166667  38.740678  ...           True  39.282683   
3  2.240000  5.426667  42.233333  ...           Tru

In [4]:
# Load column information from different model files
print("Loading column information...")

# Try to load column info from different model files
column_info_sources = [
    'ridge_columns.joblib',
    'tree_models_columns.joblib',
    'bayesian_multioutput_columns.joblib',
    'lstm_columns.joblib',
    'transformer_columns.joblib'
]

feature_cols = None
target_cols = None

for source in column_info_sources:
    try:
        columns_info = joblib.load(os.path.join(models_dir, source))
        feature_cols = columns_info['feature_cols']
        target_cols = columns_info['target_cols']
        print(f"Loaded column info from {source}")
        break
    except Exception as e:
        print(f"Could not load from {source}: {e}")
        continue

if feature_cols is None or target_cols is None:
    # Fallback: infer columns from inference data
    feature_cols = [col for col in inference_data.columns if not col.startswith('next_') and col not in ['PERSON_ID', 'SEASON_ID']]
    target_cols = [col for col in inference_data.columns if col.startswith('next_')]
    print("Using fallback column inference")

print(f"\nFeature columns: {len(feature_cols)}")
print(f"Target columns: {len(target_cols)}")
print(f"\nTarget variables: {target_cols}")

Loading column information...
Loaded column info from ridge_columns.joblib

Feature columns: 56
Target columns: 21

Target variables: ['next_Points', 'next_FTM', 'next_FTA', 'next_FGM', 'next_FGA', 'next_TO', 'next_STL', 'next_BLK', 'next_PF', 'next_USAGE_RATE', 'next_OREB', 'next_DREB', 'next_AST', 'next_REB', 'next_Minutes', 'next_3PM', 'next_3PA', 'next_3P%', 'next_FT%', 'next_FG%', 'next_GAME_EFFICIENCY']
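
The fallback branch in the cell above relies on a naming convention: columns prefixed with `next_` are targets, and everything else except the two identifier columns is a feature. The sketch below isolates that rule; `split_columns` and the example column list are hypothetical and only illustrate the convention.

```python
ID_COLS = {"PERSON_ID", "SEASON_ID"}


def split_columns(columns, target_prefix="next_"):
    """Split column names into (features, targets) using the `next_` prefix convention."""
    targets = [c for c in columns if c.startswith(target_prefix)]
    features = [
        c for c in columns
        if not c.startswith(target_prefix) and c not in ID_COLS
    ]
    return features, targets


# Hypothetical column list, for illustration only.
example_columns = ["PERSON_ID", "SEASON_ID", "Points", "Minutes", "next_Points"]
features, targets = split_columns(example_columns)
print(features)  # ['Points', 'Minutes']
print(targets)   # ['next_Points']
```

One caveat: the inference file has 58 columns, which matches 56 features plus the two identifiers, so it appears to contain no `next_` columns. If every `*_columns.joblib` file failed to load, the fallback would therefore likely yield an empty `target_cols` list.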


In [5]:
# Prepare inference data
print("Preparing inference data...")

# Check for required columns before selecting them; indexing with a missing
# column name would otherwise raise a KeyError
missing_cols = [col for col in feature_cols if col not in inference_data.columns]
if missing_cols:
    print(f"Warning: Missing columns: {missing_cols}")

# Select features in the training column order; reindex creates any missing
# columns as NaN, which are then filled with zeros as a placeholder
X_inference = inference_data.reindex(columns=feature_cols)
if missing_cols:
    X_inference[missing_cols] = 0

# Handle infinite values
X_inference = X_inference.replace([np.inf, -np.inf], np.nan)

print(f"Inference features shape: {X_inference.shape}")
print(f"Missing values: {X_inference.isnull().sum().sum()}")
print(f"Final inference features shape: {X_inference.shape}")

Preparing inference data...
Inference features shape: (422, 56)
Missing values: 108
Final inference features shape: (422, 56)
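
The preparation step aligns the input to the training column order and converts infinities to NaN. A small self-contained sketch of that idea on toy data follows; `align_features` and the toy frame are assumptions for illustration, not part of the project code.

```python
import numpy as np
import pandas as pd


def align_features(df, feature_cols, fill_value=0.0):
    """Return df restricted to feature_cols (in order), plus the list of absent columns."""
    missing = [c for c in feature_cols if c not in df.columns]
    # reindex keeps the requested order and creates absent columns as NaN.
    aligned = df.reindex(columns=feature_cols)
    if missing:
        aligned[missing] = fill_value
    # Treat +/-inf as missing so that downstream imputers can handle them.
    aligned = aligned.replace([np.inf, -np.inf], np.nan)
    return aligned, missing


toy = pd.DataFrame(
    {"Points": [25.7, np.inf], "Minutes": [35.3, 26.4], "PERSON_ID": [1, 2]}
)
aligned, missing = align_features(toy, ["Minutes", "Points", "AGE"])
print(list(aligned.columns))  # ['Minutes', 'Points', 'AGE']
print(missing)                # ['AGE']
print(aligned.isna().sum().to_dict())
```

Filling an absent feature with zeros keeps the pipeline running, but a zero is rarely a neutral value for a basketball statistic, so a non-empty `missing` list is worth investigating rather than ignoring.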


## Load and Run Individual Models

We'll load each trained model and make predictions using their respective prediction functions.

In [6]:
# Function to load model predictions using direct model loading
def load_model_predictions(model_name, X_inference, models_dir):
    """
    Load pre-trained model and get predictions by loading models directly
    """
    try:
        if model_name == 'ridge':
            # Load Ridge model directly
            model = joblib.load(os.path.join(models_dir, 'ridge_regression_season_model.joblib'))
            
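            # Handle missing values for Ridge. Note: the imputer is fitted on the
            # inference data itself because no training-time imputer is loaded here,
            # so the medians may differ from those seen during training
            from sklearn.impute import SimpleImputer
            imputer = SimpleImputer(strategy='median')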
            X_imputed = pd.DataFrame(
                imputer.fit_transform(X_inference),
                columns=X_inference.columns,
                index=X_inference.index
            )
            
            predictions = model.predict(X_imputed)
            
            # Convert to DataFrame
            predictions_df = pd.DataFrame(
                predictions,
                columns=target_cols,
                index=X_inference.index
            )
            
        elif model_name == 'xgboost':
            # Load XGBoost model directly
            model = joblib.load(os.path.join(models_dir, 'xgboost_multioutput_tuned_model.joblib'))
            predictions = model.predict(X_inference)
            
            # Convert to DataFrame
            predictions_df = pd.DataFrame(
                predictions,
                columns=target_cols,
                index=X_inference.index
            )
            
        elif model_name == 'lightgbm':
            # Load LightGBM model directly
            model = joblib.load(os.path.join(models_dir, 'lightgbm_multioutput_tuned_model.joblib'))
            predictions = model.predict(X_inference)
            
            # Convert to DataFrame
            predictions_df = pd.DataFrame(
                predictions,
                columns=target_cols,
                index=X_inference.index
            )
            
        elif model_name == 'bayesian':
            # Load Bayesian model
            import arviz as az
            trace = az.from_netcdf(os.path.join(models_dir, 'bayesian_multioutput_trace.nc'))
            scaler = joblib.load(os.path.join(models_dir, 'bayesian_multioutput_scaler.joblib'))
            imputer_X = joblib.load(os.path.join(models_dir, 'bayesian_multioutput_imputer_X.joblib'))
            
            # Preprocess data
            X_processed = pd.DataFrame(
                imputer_X.transform(X_inference),
                columns=X_inference.columns,
                index=X_inference.index
            )
            
            # Scale features
            X_scaled = scaler.transform(X_processed)
            
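            # Posterior samples; assumed shapes are (chain, draw, n_features, n_targets)
            # for beta and (chain, draw, n_targets) for intercept
            beta_samples = trace.posterior['beta'].values
            intercept_samples = trace.posterior['intercept'].values
            
            # The model is linear, so the posterior-mean prediction equals the
            # prediction made with the posterior-mean coefficients
            beta_mean = beta_samples.mean(axis=(0, 1))
            intercept_mean = intercept_samples.mean(axis=(0, 1))
            pred = X_scaled @ beta_mean + intercept_mean
            
            # Guard against a silent shape mismatch instead of tiling one averaged row
            if pred.shape != (len(X_inference), len(target_cols)):
                raise ValueError(f"Unexpected Bayesian prediction shape: {pred.shape}")
            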
            
            # Convert to DataFrame
            predictions_df = pd.DataFrame(
                pred,
                columns=target_cols,
                index=X_inference.index
            )
            
        elif model_name == 'lstm':
            # Load LSTM model using the saved prediction function
            import torch
            
            # Load the saved prediction function
            predict_with_lstm_model = joblib.load(os.path.join(models_dir, 'lstm_prediction_function.joblib'))
            
            # Load model components
            model_info = joblib.load(os.path.join(models_dir, 'lstm_model_info.joblib'))
            scaler_X = joblib.load(os.path.join(models_dir, 'lstm_scaler_X.joblib'))
            scaler_y = joblib.load(os.path.join(models_dir, 'lstm_scaler_y.joblib'))
            imputer_X = joblib.load(os.path.join(models_dir, 'lstm_imputer_X.joblib'))
            imputer_y = joblib.load(os.path.join(models_dir, 'lstm_imputer_y.joblib'))
            
            # Load the actual trained model
            device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
            
            # Define LSTM model class (same as training)
            class LSTMModel(torch.nn.Module):
                def __init__(self, input_size, hidden_size, num_layers, output_size, dropout=0.2):
                    super(LSTMModel, self).__init__()
                    self.hidden_size = hidden_size
                    self.num_layers = num_layers
                    self.lstm = torch.nn.LSTM(input_size, hidden_size, num_layers, 
                                             batch_first=True, dropout=dropout if num_layers > 1 else 0)
                    self.fc1 = torch.nn.Linear(hidden_size, hidden_size // 2)
                    self.dropout = torch.nn.Dropout(dropout)
                    self.fc2 = torch.nn.Linear(hidden_size // 2, output_size)
                    self.relu = torch.nn.ReLU()
                
                def forward(self, x):
                    h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
                    c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
                    lstm_out, _ = self.lstm(x, (h0, c0))
                    lstm_out = lstm_out[:, -1, :]
                    out = self.relu(self.fc1(lstm_out))
                    out = self.dropout(out)
                    out = self.fc2(out)
                    return out
            
            # Create and load the model
            model = LSTMModel(
                input_size=model_info['input_size'],
                hidden_size=model_info['hidden_size'],
                num_layers=model_info['num_layers'],
                output_size=model_info['output_size'],
                dropout=model_info['dropout']
            )
            
            model.load_state_dict(torch.load(os.path.join(models_dir, 'lstm_best_model.pth'), map_location=device))
            model = model.to(device)
            
            # Use the saved prediction function
            predictions_df = predict_with_lstm_model(
                X_inference, model, scaler_X, scaler_y, imputer_X, imputer_y,
                feature_cols, target_cols, model_info['sequence_length']
            )
        
        elif model_name == 'transformer':
            # Load Transformer model
            import torch
            
            # Load model components
            scaler = joblib.load(os.path.join(models_dir, 'transformer_scaler.joblib'))
            imputer_X = joblib.load(os.path.join(models_dir, 'transformer_imputer_X.joblib'))
            
            # Preprocess data
            X_processed = pd.DataFrame(
                imputer_X.transform(X_inference),
                columns=X_inference.columns,
                index=X_inference.index
            )
            
            # Scale features
            X_scaled = scaler.transform(X_processed)
            
            # Load and use transformer prediction function
            try:
                predict_with_transformer_model = joblib.load(os.path.join(models_dir, 'transformer_prediction_function.joblib'))
                predictions = predict_with_transformer_model(X_scaled, models_dir, sequence_length=10)
                
                # Convert to DataFrame
                predictions_df = pd.DataFrame(
                    predictions,
                    columns=target_cols,
                    index=X_inference.index
                )
            except Exception as e:
                print(f"Transformer prediction function failed: {e}")
                # Fallback: all-zero placeholder predictions. These are not real
                # forecasts and should be excluded from any downstream comparison
                predictions_df = pd.DataFrame(
                    np.zeros((len(X_inference), len(target_cols))),
                    columns=target_cols,
                    index=X_inference.index
                )
        
        else:
            raise ValueError(f"Unknown model: {model_name}")
        
        return predictions_df
        
    except Exception as e:
        print(f"Error with {model_name} model: {e}")
        return None
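
The Bayesian branch turns posterior samples into point predictions. The sketch below uses synthetic samples to show which axes to average over; the shapes `(chain, draw, n_features, n_targets)` for `beta` and `(chain, draw, n_targets)` for `intercept` are assumptions about how the trace is stored, not something confirmed by the saved file.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_features, n_targets = 5, 4, 3
n_chains, n_draws = 2, 50

X = rng.normal(size=(n_rows, n_features))
beta = rng.normal(size=(n_chains, n_draws, n_features, n_targets))
intercept = rng.normal(size=(n_chains, n_draws, n_targets))

# Option 1: average the parameters, then do a single matrix product.
pred_from_mean = X @ beta.mean(axis=(0, 1)) + intercept.mean(axis=(0, 1))

# Option 2: predict per draw, then average over the chain and draw axes.
# einsum keeps the row axis first: shape (rows, chains, draws, targets).
per_draw = np.einsum("nf,cdft->ncdt", X, beta) + intercept
pred_from_draws = per_draw.mean(axis=(1, 2))

print(pred_from_mean.shape)                          # (5, 3)
print(np.allclose(pred_from_mean, pred_from_draws))  # True
```

Both options give one row per player because only the chain and draw axes are averaged. Averaging over the row axis instead collapses all players into a single prediction, which is the situation a tiling workaround would mask. Keeping `per_draw` around also allows per-player uncertainty intervals, for example with `np.percentile(per_draw, [5, 95], axis=(1, 2))`.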

In [7]:
# Load and run all individual models
print("Loading and running all individual models...")

models = ['ridge', 'xgboost', 'lightgbm', 'bayesian', 'lstm', 'transformer']
individual_predictions = {}

for model_name in models:
    print(f"\nRunning {model_name.upper()} model...")
    pred_df = load_model_predictions(model_name, X_inference, models_dir)
    
    if pred_df is not None:
        individual_predictions[model_name] = pred_df
        print(f"  {model_name.upper()} predictions shape: {pred_df.shape}")
        print(f"  Sample predictions:")
        print(pred_df.head(3))
    else:
        print(f"  {model_name.upper()} failed to run")

print(f"\nSuccessfully ran {len(individual_predictions)} individual models")

Loading and running all individual models...

Running RIDGE model...
  RIDGE predictions shape: (422, 21)
  Sample predictions:
   next_Points  next_FTM  next_FTA  next_FGM   next_FGA   next_TO  next_STL  \
0    25.344830  4.655561  5.839144  9.355871  18.144210  1.270610  0.687263   
1    10.113258  0.968518  1.161410  3.807077   8.584739  1.272870  0.118369   
2     8.707215  1.168798  1.410836  2.942211   7.210588  1.065596  0.261160   

   next_BLK   next_PF  next_USAGE_RATE  ...  next_DREB  next_AST  next_REB  \
0  3.301547  1.773465        60.665309  ...   6.612565  7.763742  7.776291   
1  1.574110  1.693445        36.941078  ...   3.145372  6.380347  3.532516   
2  1.438889  2.134729        31.657852  ...   2.560087  4.298455  3.054039   

   next_Minutes  next_3PM  next_3PA   next_3P%   next_FT%   next_FG%  \
0     36.414858  1.977527  5.203516  34.387224  80.338979  54.053778   
1     27.473128  1.530586  3.954138  34.846396  79.513747  42.803483   
2     28.296150  1.653995 

## Create Ensemble Predictions

We'll create ensemble predictions using different methods.
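
As a point of reference, simple and weighted averaging differ only in the weights applied along the model axis. The sketch below uses made-up predictions and illustrative weights (both are assumptions, not values from this project) to show the relationship.

```python
import numpy as np

# Hypothetical predictions from three models for 2 players and 2 targets.
preds = np.array([
    [[10.0, 4.0], [20.0, 6.0]],   # model A
    [[12.0, 5.0], [22.0, 7.0]],   # model B
    [[14.0, 6.0], [24.0, 8.0]],   # model C
])

simple_avg = preds.mean(axis=0)

# Equal weights reproduce the simple average.
equal_weights = np.full(len(preds), 1.0 / len(preds))
equal_weighted = np.average(preds, axis=0, weights=equal_weights)
print(np.allclose(equal_weighted, simple_avg))  # True

# Illustrative unequal weights, e.g. derived from validation error.
# np.average normalises by the weight total, so they need not sum to 1.
weights = np.array([0.5, 0.3, 0.2])
weighted_avg = np.average(preds, axis=0, weights=weights)
print(weighted_avg)
```

A weighted ensemble only adds information over the simple average when the weights reflect some measured difference between the models, such as validation error per target.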

In [8]:
# Create ensemble predictions
print("\nCreating ensemble predictions...")

ensemble_predictions = {}

if len(individual_predictions) > 1:
    # Simple averaging (excluding transformer)
    simple_models = [m for m in individual_predictions.keys() if m != 'transformer']
    if len(simple_models) > 1:
        simple_pred_arrays = [individual_predictions[model].values for model in simple_models]
        simple_avg_pred = np.mean(simple_pred_arrays, axis=0)
        
        simple_avg_df = pd.DataFrame(
            simple_avg_pred,
            columns=target_cols,
            index=X_inference.index
        )
        
        ensemble_predictions['ensemble_simple'] = simple_avg_df
        print(f"  Simple average shape: {simple_avg_df.shape}")
    
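    # Weighted averaging (excluding transformer)
    # Note: the weights are uniform, so this currently reproduces the simple average;
    # substitute validation-based weights to make the two methods differ
    if len(simple_models) > 1:
        weights = np.ones(len(simple_models)) / len(simple_models)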
        weighted_avg_pred = np.sum([weights[i] * simple_pred_arrays[i] for i in range(len(simple_pred_arrays))], axis=0)
        
        weighted_avg_df = pd.DataFrame(
            weighted_avg_pred,
            columns=target_cols,
            index=X_inference.index
        )
        
        ensemble_predictions['ensemble_weighted'] = weighted_avg_df
        print(f"  Weighted average shape: {weighted_avg_df.shape}")
    
    # Stacking ensemble (including transformer)
    try:
        # Load ensemble results for stacking
        ensemble_results = joblib.load(os.path.join(models_dir, 'ensemble_results.joblib'))
        
        # Use the stacking meta-learner
        meta_learner = ensemble_results['stacking']['meta_learner']
        
        # Prepare meta-features from all models
        all_pred_arrays = [individual_predictions[model].values for model in individual_predictions.keys()]
        meta_features = np.stack(all_pred_arrays, axis=0).transpose(1, 0, 2).reshape(len(X_inference), -1)
        
        # Make stacking predictions
        stacking_pred_flat = meta_learner.predict(meta_features)
        stacking_pred = stacking_pred_flat.reshape(len(X_inference), len(target_cols))
        
        stacking_df = pd.DataFrame(
            stacking_pred,
            columns=target_cols,
            index=X_inference.index
        )
        
        ensemble_predictions['ensemble_stacking'] = stacking_df
        print(f"  Stacking ensemble shape: {stacking_df.shape}")
        
    except Exception as e:
        print(f"  Stacking ensemble failed: {e}")
    
    print(f"Created {len(ensemble_predictions)} ensemble methods")
else:
    print("Not enough models for ensemble")

# Combine all predictions
all_predictions = {**individual_predictions, **ensemble_predictions}
print(f"\nTotal prediction methods: {len(all_predictions)}")


Creating ensemble predictions...
  Simple average shape: (422, 21)
  Weighted average shape: (422, 21)
  Stacking ensemble failed: X has 105 features, but Ridge is expecting 84 features as input.
Created 2 ensemble methods

Total prediction methods: 7
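
The stacking error above is a width mismatch: 105 = 5 × 21 columns were supplied, while the meta-learner expects 84 = 4 × 21. That is consistent with a meta-learner fitted on four base models while five produced predictions here, although the output does not say which four. The sketch below checks the width before predicting; `check_meta_width` is a hypothetical helper, and a `SimpleNamespace` stands in for the fitted estimator so the example runs without the saved model. It relies only on the `n_features_in_` attribute that fitted scikit-learn estimators expose.

```python
from types import SimpleNamespace

import numpy as np


def build_meta_features(pred_arrays):
    """Place per-model (rows, targets) arrays side by side: (rows, models * targets)."""
    return np.concatenate(pred_arrays, axis=1)


def check_meta_width(meta_learner, meta_features):
    """Raise a descriptive error if the width differs from what the learner was fitted on."""
    expected = getattr(meta_learner, "n_features_in_", None)
    actual = meta_features.shape[1]
    if expected is not None and expected != actual:
        raise ValueError(
            f"Meta-learner was fitted on {expected} features but received {actual}; "
            "check that the same base models are stacked in the same order as in training."
        )


n_rows, n_targets = 3, 21
five_models = [np.zeros((n_rows, n_targets)) for _ in range(5)]
meta_features = build_meta_features(five_models)
print(meta_features.shape)  # (3, 105)

# Stand-in for a fitted scikit-learn estimator that saw 4 * 21 = 84 columns.
fake_meta_learner = SimpleNamespace(n_features_in_=84)
try:
    check_meta_width(fake_meta_learner, meta_features)
except ValueError as err:
    print(err)
```

A more durable fix is to store the ordered list of base-model names alongside the meta-learner at training time and to build the meta-features from exactly that list at inference time.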


## Save Individual Model Predictions as CSV Files with Player Mapping

In [9]:
# Save predictions as individual CSV files with player mapping
print("\nSaving predictions as individual CSV files with player mapping...")

# Create timestamp for file naming
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

def save_predictions_with_players(name, pred_df):
    """Attach player and season identifiers to pred_df and write it to the output folder."""
    pred_df_with_players = pred_df.copy()
    pred_df_with_players['PERSON_ID'] = inference_data['PERSON_ID']
    pred_df_with_players['SEASON_ID'] = '2025-26'  # Predictions for next season
    pred_df_with_players['INPUT_SEASON_ID'] = inference_data['SEASON_ID']  # Season used for prediction
    
    # Reorder columns to put player info first
    player_cols = ['PERSON_ID', 'SEASON_ID', 'INPUT_SEASON_ID']
    other_cols = [col for col in pred_df_with_players.columns if col not in player_cols]
    pred_df_with_players = pred_df_with_players[player_cols + other_cols]
    
    filename = f"{name}_predictions_{timestamp}.csv"
    filepath = os.path.join(output_dir, filename)
    pred_df_with_players.to_csv(filepath, index=False)
    print(f"  Saved {name} predictions to: {filepath}")
    print(f"  Shape: {pred_df_with_players.shape}")
    print(f"  Sample data:")
    print(pred_df_with_players.head(3))

# Save individual model predictions first, then ensemble predictions
for name, pred_df in all_predictions.items():
    save_predictions_with_players(name, pred_df)

print("\nAll predictions saved successfully as CSV files with player mapping!")


Saving predictions as individual CSV files with player mapping...
  Saved ridge predictions to: /Users/jeevanparmar/Uni/MSE 436/Project-Mono-Repo/backend/Output/ridge_predictions_20250707_124240.csv
  Shape: (422, 24)
  Sample data:
   PERSON_ID SEASON_ID INPUT_SEASON_ID  next_Points  next_FTM  next_FTA  \
0     2544.0   2025-26         2023-24    25.344830  4.655561  5.839144   
1   101108.0   2025-26         2023-24    10.113258  0.968518  1.161410   
2   200768.0   2025-26         2023-24     8.707215  1.168798  1.410836   

   next_FGM   next_FGA   next_TO  next_STL  ...  next_DREB  next_AST  \
0  9.355871  18.144210  1.270610  0.687263  ...   6.612565  7.763742   
1  3.807077   8.584739  1.272870  0.118369  ...   3.145372  6.380347   
2  2.942211   7.210588  1.065596  0.261160  ...   2.560087  4.298455   

   next_REB  next_Minutes  next_3PM  next_3PA   next_3P%   next_FT%  \
0  7.776291     36.414858  1.977527  5.203516  34.387224  80.338979   
1  3.532516     27.473128  1.53058

In [10]:
# Create summary report
print("\nCreating summary report...")

report = {
    'timestamp': timestamp,
    'input_data_shape': inference_data.shape,
    'feature_columns': len(feature_cols),
    'target_columns': len(target_cols),
    'models_used': list(all_predictions.keys()),
    'individual_models': list(individual_predictions.keys()),
    'ensemble_methods': list(ensemble_predictions.keys()),
    'predictions_summary': {},
    'prediction_season': '2025-26',
    'input_season': '2024-25'
}

# Add summary statistics for each model
for model_name, pred_df in all_predictions.items():
    report['predictions_summary'][model_name] = {
        'shape': pred_df.shape,
        'mean_values': pred_df.mean().to_dict(),
        'std_values': pred_df.std().to_dict(),
        'min_values': pred_df.min().to_dict(),
        'max_values': pred_df.max().to_dict()
    }

# Save report
report_filename = f"inference_report_{timestamp}.joblib"
report_filepath = os.path.join(output_dir, report_filename)
joblib.dump(report, report_filepath)
print(f"  Saved inference report to: {report_filepath}")

# Print summary
print("\n" + "="*60)
print("INFERENCE SUMMARY REPORT")
print("="*60)
print(f"Timestamp: {timestamp}")
print(f"Input data shape: {inference_data.shape}")
print(f"Feature columns: {len(feature_cols)}")
print(f"Target columns: {len(target_cols)}")
print(f"Individual models: {len(individual_predictions)}")
print(f"Ensemble methods: {len(ensemble_predictions)}")
print(f"Total prediction methods: {len(all_predictions)}")
print(f"Prediction season: 2025-26")
print(f"Input season: 2024-25")
print(f"\nIndividual Models:")
for model_name in individual_predictions.keys():
    print(f"  - {model_name.upper()}")
print(f"\nEnsemble Methods:")
for ensemble_name in ensemble_predictions.keys():
    print(f"  - {ensemble_name.upper()}")
print(f"\nOutput files saved to: {output_dir}")
print("="*60)


Creating summary report...
  Saved inference report to: /Users/jeevanparmar/Uni/MSE 436/Project-Mono-Repo/backend/Output/inference_report_20250707_124240.joblib

INFERENCE SUMMARY REPORT
Timestamp: 20250707_124240
Input data shape: (422, 58)
Feature columns: 56
Target columns: 21
Individual models: 5
Ensemble methods: 2
Total prediction methods: 7
Prediction season: 2025-26
Input season: 2024-25

Individual Models:
  - RIDGE
  - XGBOOST
  - LIGHTGBM
  - BAYESIAN
  - TRANSFORMER

Ensemble Methods:
  - ENSEMBLE_SIMPLE
  - ENSEMBLE_WEIGHTED

Output files saved to: /Users/jeevanparmar/Uni/MSE 436/Project-Mono-Repo/backend/Output


## Inference Summary

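The inference run completed with partial success. In the recorded output, five of the six individual models (all except LSTM) and two of the three ensemble methods produced predictions; the stacking ensemble failed because the meta-learner expected 84 input features but received 105. Here's what was accomplished: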

### Models Used:
- **Ridge Regression** (using saved model)
- **XGBoost** (using saved model)
- **LightGBM** (using saved model)
- **Bayesian Multi-Output Regression** (using saved model)
- **LSTM** (using saved .pth file; absent from the recorded run's results)
- **Transformer** (using saved .pth file)
- **Ensemble Methods** (Simple averaging and Weighted averaging; Stacking failed in the recorded run)

### Key Features:
- **Multi-model predictions**: Each model provides its own predictions
- **Ensemble combinations**: Simple, weighted, and stacking ensemble methods
- **Individual CSV outputs**: Each model's predictions saved as separate CSV files
- **Player mapping**: All predictions include PERSON_ID for player identification
- **Season tracking**: Each file records the input season (`INPUT_SEASON_ID`, copied from the input data) and the prediction season (2025-26)
- **Frontend ready**: CSV files can be directly used in the frontend
- **Error handling**: A model that fails to load or predict is reported and skipped, so the remaining models still run

### Output Files Created:
- `ridge_predictions_[timestamp].csv`
- `xgboost_predictions_[timestamp].csv`
- `lightgbm_predictions_[timestamp].csv`
- `bayesian_predictions_[timestamp].csv`
- `lstm_predictions_[timestamp].csv` (only if the LSTM model runs; not produced in the recorded run)
- `transformer_predictions_[timestamp].csv`
- `ensemble_simple_predictions_[timestamp].csv`
- `ensemble_weighted_predictions_[timestamp].csv`
- `ensemble_stacking_predictions_[timestamp].csv` (only if stacking succeeds; not produced in the recorded run)
- `inference_report_[timestamp].joblib`
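
Because the timestamp uses the fixed-width `%Y%m%d_%H%M%S` format, file names for the same model sort chronologically as plain strings. A consumer that wants the most recent file per model could use something like the sketch below; `latest_prediction_files` is a hypothetical helper, and all but the first of the example file names are invented for illustration.

```python
import re
from pathlib import Path

PATTERN = re.compile(r"^(?P<name>.+)_predictions_(?P<ts>\d{8}_\d{6})\.csv$")


def latest_prediction_files(filenames):
    """Map each model name to its most recent predictions file name."""
    latest = {}
    for fname in filenames:
        match = PATTERN.match(Path(fname).name)
        if match is None:
            continue  # skips other files, e.g. inference_report_*.joblib
        name, ts = match["name"], match["ts"]
        # Fixed-width YYYYMMDD_HHMMSS strings compare in chronological order.
        if name not in latest or ts > latest[name][0]:
            latest[name] = (ts, fname)
    return {name: fname for name, (ts, fname) in latest.items()}


example_files = [
    "ridge_predictions_20250707_124240.csv",
    "ridge_predictions_20250701_090000.csv",            # hypothetical older run
    "ensemble_simple_predictions_20250707_124240.csv",
    "inference_report_20250707_124240.joblib",
]
print(latest_prediction_files(example_files))
```

In practice the input list would come from a directory listing of the `Output` folder, for example via `os.listdir(output_dir)`.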

### CSV File Structure:
Each CSV file contains:
- `PERSON_ID`: Player identification number
- `SEASON_ID`: '2025-26' (prediction season)
- `INPUT_SEASON_ID`: the season used as input, copied from the input data's `SEASON_ID` column (the sample output above shows '2023-24', not '2024-25')
- All 21 predicted statistics (next_Points, next_FTM, etc.)
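
A consumer of these files may want to validate the structure before use. The following sketch reads an abbreviated two-target excerpt modelled on the Ridge sample output; `load_predictions` and its `expected_targets` parameter are assumptions for illustration. It also casts `PERSON_ID` to an integer, since the sample output shows the IDs stored as floats such as `2544.0`.

```python
import io

import pandas as pd

ID_COLS = ["PERSON_ID", "SEASON_ID", "INPUT_SEASON_ID"]


def load_predictions(source, expected_targets=None):
    """Read a predictions CSV and return (DataFrame, list of target column names)."""
    df = pd.read_csv(source)
    missing = [c for c in ID_COLS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing identifier columns: {missing}")
    # This cast raises if any PERSON_ID is missing, which is the desired behaviour here.
    df["PERSON_ID"] = df["PERSON_ID"].astype("int64")
    target_cols = [c for c in df.columns if c.startswith("next_")]
    if expected_targets is not None and len(target_cols) != expected_targets:
        raise ValueError(
            f"Expected {expected_targets} target columns, found {len(target_cols)}"
        )
    return df, target_cols


# Abbreviated excerpt with two of the 21 targets, for illustration only.
csv_text = """PERSON_ID,SEASON_ID,INPUT_SEASON_ID,next_Points,next_FTM
2544.0,2025-26,2023-24,25.34483,4.655561
101108.0,2025-26,2023-24,10.113258,0.968518
"""
df, target_cols = load_predictions(io.StringIO(csv_text), expected_targets=2)
print(df.dtypes["PERSON_ID"])
print(target_cols)
```

For the real files, `expected_targets=21` would match the number of target columns reported by the notebook.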

### Usage:
These CSV files can be directly loaded into the frontend for visualization and comparison of different model predictions for the 2025-26 NBA season.