# Prediction Task: Teams Ranking

In this notebook, we present an approach for predicting each team's rank within its conference for a given season.

## Load Dependencies

In [1]:
import pandas as pd
import numpy as np

from sklearn.metrics import (
    accuracy_score, 
    mean_absolute_error, 
    roc_auc_score,
)
from sklearn.preprocessing import label_binarize

# Used for Regressors
from sklearn.ensemble import (
    RandomForestRegressor, 
    GradientBoostingRegressor, 
    ExtraTreesRegressor
)
from sklearn.linear_model import Ridge

import warnings
warnings.filterwarnings("ignore")
import os

## Add Weight to Features

Based on the Spearman correlation analysis conducted in the data preparation notebook, we assign an importance weight to each feature. We also apply temporal weighting over the four preceding seasons: 70% for the previous year, 15% for two years back, 10% for three years back, and 5% for four years back. A team needs all four preceding seasons for a weighted feature to be defined; otherwise the value is missing (NaN).
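
As a minimal sketch of the temporal weighting, the toy example below applies the 70/15/10/5 lag weights to a single feature for one made-up team (`AAA`, with invented values). The importance of 0.8 matches `prev_win_pct` in the function that follows; the data and the `lag_weights` name are illustrative assumptions.

```python
import pandas as pd

# Toy data (hypothetical): one team, one feature, six consecutive seasons.
toy = pd.DataFrame({
    "tmID": ["AAA"] * 6,
    "year": [1, 2, 3, 4, 5, 6],
    "prev_win_pct": [0.40, 0.50, 0.60, 0.70, 0.80, 0.90],
})

importance = 0.8                       # feature importance for prev_win_pct
lag_weights = [0.70, 0.15, 0.10, 0.05]  # lags of 1, 2, 3 and 4 seasons

grouped = toy.groupby("tmID")["prev_win_pct"]
toy["prev_win_pct_weighted"] = importance * sum(
    w * grouped.shift(lag) for lag, w in enumerate(lag_weights, start=1)
)

# Seasons 1-4 are NaN because fewer than four earlier seasons exist.
print(toy)
```

For season 5 the weighted value is 0.8 × (0.7·0.70 + 0.15·0.60 + 0.10·0.50 + 0.05·0.40) = 0.52. Since the first four seasons of every team come out as NaN, those rows are later dropped from training or mean-filled in the test set.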


In [2]:
def add_weighted_history(df, test_year=None):
    df = df.sort_values(["tmID", "year"]).copy()
    
    feature_weights = {
        "made_playoffs":0.33,
        "prev_win_pct": 0.8,
        "prev_coach_win_pct": 0.5,
        "Performance_weighted_2yr": 0.85,
        "OffPerformance_weighted_2yr": 0.6,
        "DefPerformance_weighted_2yr": 1.0,
        "Performance_weighted_3yr": 0.35,
        "OffPerformance_weighted_3yr": 0.25,
        "DefPerformance_weighted_3yr": 0.45,
        "Performance_weighted_4yr": 0.3,
        "OffPerformance_weighted_4yr": 0.2,
        "DefPerformance_weighted_4yr": 0.35
    }

    if test_year is not None:
        df = df[df["year"] <= test_year].copy()

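    for feat, importance in feature_weights.items():
        # shift(k) looks back k rows within each team, i.e. k seasons when a team
        # has one row per consecutive year; the result is NaN if any lag is missing.
        df[f"{feat}_weighted"] = (
            importance * 0.7 * df.groupby("tmID")[feat].shift(1) +
            importance * 0.15 * df.groupby("tmID")[feat].shift(2) +
            importance * 0.1 * df.groupby("tmID")[feat].shift(3) +
            importance * 0.05 * df.groupby("tmID")[feat].shift(4)
        )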
    return df


## Predict Team Conference Rank

This code defines the `predict_team_conference_rank` function, which predicts team rankings within each conference for a given year.

#### Steps in the Code:

1. **Load and Prepare Data**  
   - Reads the `teams.csv` dataset.
   - Applies historical weighting to features using `add_weighted_history`.
   - Selects relevant features, including historical performance, coaching stats, and multi-year offensive/defensive metrics.

2. **Split Data into Train and Test Sets**  
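   - Training data consists of rows from the four seasons preceding `test_year` (`test_year - 4` through `test_year - 1`); rows with a missing feature or target are dropped.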
   - Test data consists of rows from the `test_year`.
   - Missing feature values in the test set are filled with the corresponding training set mean.

3. **Model Training and Prediction**  
   Four models are trained and used to predict team scores, which are then converted to ranks within each conference:

   - **Extra Trees Regressor**  
     ```python
     ExtraTreesRegressor(n_estimators=200, max_depth=4, min_samples_split=2, random_state=42)
     ```
     - **n_estimators=200**: Uses 200 trees to reduce variance while keeping training manageable.  
     - **max_depth=4**: Shallow trees prevent overfitting on small datasets.  
     - **min_samples_split=2**: Standard minimum to allow splits.  
     - **Reason**: Ensemble method that reduces variance and performs well on small datasets.

   - **Random Forest Regressor**  
     ```python
     RandomForestRegressor(n_estimators=200, max_depth=4, min_samples_split=2, max_features='sqrt', random_state=42)
     ```
     - **max_features='sqrt'**: Randomly selects a subset of features at each split to reduce correlation between trees.  
     - Other parameters similar to Extra Trees.  
     - **Reason**: Handles non-linear relationships well and generalizes on small datasets.

   - **Gradient Boosting Regressor**  
     ```python
     GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=3, subsample=0.8, random_state=42)
     ```
     - **learning_rate=0.05**: Small steps reduce overfitting risk.  
     - **max_depth=3**: Shallow trees prevent overfitting.  
     - **subsample=0.8**: Uses 80% of data per tree to add randomness and reduce variance.  
     - **Reason**: Sequentially corrects previous errors while controlling overfitting.

   - **Ridge Regression**  
     ```python
     Ridge(alpha=1.0, random_state=42)
     ```
     - **alpha=1.0**: L2 regularization to reduce overfitting.  
     - **Reason**: Simple linear model with regularization; serves as a robust baseline.

4. **Evaluation Metrics**  
   - **MAE (Mean Absolute Error)** – measures average ranking error.  
   - **Accuracy** – proportion of exact rank matches.  
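   - **AUC** – computed one-vs-rest on binarized actual and predicted rank labels and macro-averaged over ranks. Because the inputs are hard rank labels rather than continuous scores, this value moves with exact-match accuracy.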
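
To illustrate what the binarized AUC measures, the sketch below reproduces the calculation for one conference, using the actual and RandomForest-predicted ranks for Conference 0 in the Year 10 results later in this notebook. The closed-form expression `auc_from_acc` is a derivation added here for illustration, not part of the original code: it assumes each rank appears exactly once in both the actual and predicted labels.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.preprocessing import label_binarize

# Actual and RandomForest-predicted ranks, Conference 0, Year 10
actual = np.array([1, 2, 3, 4, 5, 6, 7])
predicted = np.array([1, 3, 2, 6, 4, 5, 7])

classes = np.unique(np.concatenate([actual, predicted]))
actual_bin = label_binarize(actual, classes=classes)       # one-hot rows, shape (7, 7)
predicted_bin = label_binarize(predicted, classes=classes)

auc = roc_auc_score(actual_bin, predicted_bin, average="macro")
acc = accuracy_score(actual, predicted)

# With one team per rank, a hit gives a per-rank AUC of 1 and a miss gives
# 0.5 - 1 / (2 * (n - 1)), so the macro AUC follows directly from accuracy.
n = len(actual)
auc_from_acc = acc + (1 - acc) * (0.5 - 1 / (2 * (n - 1)))

print(f"accuracy={acc:.4f}  auc={auc:.4f}  auc_from_acc={auc_from_acc:.4f}")
```

Under that assumption the two AUC values coincide, which suggests reading the AUC column as a rescaled accuracy rather than as an independent measure of ranking quality.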


**Reason for Parameter Choices**:  
- Dataset is small (142 rows), so shallow trees, regularization, and small learning rates prevent overfitting.  
- Ensemble models provide robustness and can handle non-linear relationships, while Ridge regression offers a stable linear baseline.  
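
Step 3 converts each model's predicted scores into ranks within a conference. A minimal sketch of that conversion on toy data (team IDs and scores are invented) shows how `method="first"` breaks ties by row order so that every team receives a distinct rank:

```python
import pandas as pd

# Toy predictions (hypothetical): lower score means a better predicted rank.
toy = pd.DataFrame({
    "year": [10] * 6,
    "confID": [0, 0, 0, 1, 1, 1],
    "tmID": ["A", "B", "C", "D", "E", "F"],
    "score": [2.1, 1.4, 2.1, 3.0, 1.2, 2.5],
})

# Rank within each (year, conference) group; ties go to the earlier row.
toy["pred_rank"] = (
    toy.groupby(["year", "confID"])["score"]
    .rank(method="first", ascending=True)
    .astype(int)
)

print(toy)
```

Teams `A` and `C` share a score of 2.1, so `A` (the earlier row) is ranked 2 and `C` is ranked 3. The tie-break therefore depends on row order, which is worth keeping in mind when two teams receive near-identical predictions.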


In [3]:
def predict_team_conference_rank(test_year):

    df = pd.read_csv("../predict_datasets/teams.csv")
    df = add_weighted_history(df, test_year=test_year)

    feature_cols = [
        "made_playoffs_weighted",
        "prev_win_pct_weighted",                # historical win %
        "prev_coach_win_pct_weighted",          # historical coach performance

        "Performance_weighted_2yr_weighted",    # overall 2-year
        "OffPerformance_weighted_2yr_weighted", # offense 2-year
        "DefPerformance_weighted_2yr_weighted", # defense 2-year

        "Performance_weighted_3yr_weighted",    # overall 3-year
        "OffPerformance_weighted_3yr_weighted", # offense 3-year
        "DefPerformance_weighted_3yr_weighted", # defense 3-year

        "Performance_weighted_4yr_weighted",    # overall 4-year
        "OffPerformance_weighted_4yr_weighted", # offense 4-year
        "DefPerformance_weighted_4yr_weighted"  # defense 4-year
    ]

    target = "rank"

    train_df_base = df[df["year"].isin(list(range(test_year-4, test_year)))]
    test_df_base = df[df["year"] == test_year]

    # Prepare training data
    train_clean_full = train_df_base.dropna(subset=feature_cols + [target])
    train_clean_full = train_clean_full.sort_values(["year", "confID"]).copy()
    
    X_train_full = train_clean_full[feature_cols].values
    y_train_full = train_clean_full[target].values
    
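    # Group id per (year, conference); kept for reference, not used by the regressors below
    train_clean_full['group_id'] = train_clean_full.groupby(["year", "confID"]).ngroup()

    # Prepare test data: fill missing features with the training-set mean
    test_df_full = test_df_base.copy()
    for col in feature_cols:
        train_mean = train_clean_full[col].mean()
        # Assign back rather than using inplace fillna on a column selection,
        # which is unreliable in recent pandas versions
        test_df_full[col] = test_df_full[col].fillna(train_mean)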
    
    X_test_full = test_df_full[feature_cols].values
    all_predictions = test_df_full[["year", "confID", "tmID", target]].copy()
    metrics_list = []

    models = {
        "ExtraTrees": ExtraTreesRegressor(n_estimators=200, max_depth=4, min_samples_split=2, random_state=42),
        "RandomForest": RandomForestRegressor(n_estimators=200, max_depth=4, min_samples_split=2, max_features='sqrt', random_state=42),
        "GradientBoosting": GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=3, subsample=0.8, random_state=42),
        "Ridge": Ridge(alpha=1.0, random_state=42)
    }

    for name, model in models.items():
        model.fit(X_train_full, y_train_full)

        scores = model.predict(X_test_full)
        test_scores_df = test_df_full.copy()
        test_scores_df['score'] = scores
        
        # A lower predicted value means a better rank, so rank ascending within each conference
        predicted_ranks = test_scores_df.groupby(["year","confID"])["score"].rank(
            method="first", ascending=True
        ).astype(int).values

        all_predictions[f"{name}_Rank"] = predicted_ranks

    # Calculate metrics by conference
    for conf_id in all_predictions["confID"].unique():
        conf_results = all_predictions[all_predictions["confID"] == conf_id].copy()
        if conf_results.empty: continue
        actual = conf_results[target].values
        
        for name in models.keys():
            predicted_ranks = conf_results[f"{name}_Rank"].values
            
            mae = mean_absolute_error(actual, predicted_ranks)
            acc = accuracy_score(actual, predicted_ranks)
            
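            # One-vs-rest AUC on binarized hard rank labels, macro-averaged over ranks
            classes = np.unique(np.concatenate([actual, predicted_ranks]))
            try:
                actual_bin = label_binarize(actual, classes=classes)
                predicted_bin = label_binarize(predicted_ranks, classes=classes)
                if actual_bin.shape[1] == 1:
                    auc = 0.5  # at most two classes: binarization yields a single column
                else:
                    auc = roc_auc_score(actual_bin, predicted_bin, average='macro', multi_class='ovr')
            except ValueError:
                # roc_auc_score cannot score a column that contains only one label value
                auc = 0.0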

            metrics_list.append({
                "year": test_year,
                "confID": conf_id,
                "model": name,
                "MAE": mae,
                "Accuracy": acc,
                "AUC": auc
            })

    return all_predictions, pd.DataFrame(metrics_list)


## Evaluate Predictions

The `evaluate_predictions` function summarizes and visualizes model performance:

- **Metrics Table:**  
  Aggregates `all_metrics` by `year` and `model`, computing the average MAE, Accuracy, and AUC across conferences. Sorts the models by average accuracy, adds a `Rank` column, and displays a styled table with color gradients for easy comparison.

- **Conference Predictions:**  
  For each conference, shows actual and predicted ranks from all models in a styled table, highlighting differences with color gradients.

This allows quick assessment of **which models perform best** overall and how well they predict **team rankings within each conference**.


In [4]:
from IPython.display import display
import pandas as pd

def evaluate_predictions(all_results, all_metrics):

    avg = (
        all_metrics
        .groupby(["year", "model"])
        .agg(
            Avg_MAE=("MAE", "mean"),
            Avg_Accuracy=("Accuracy", "mean"),
            Avg_AUC=("AUC", "mean"),
        )
        .reset_index()
        .sort_values("Avg_Accuracy", ascending=False)
    )

    df_display = avg.copy()
    df_display.insert(0, "Rank", range(1, len(df_display)+1))

    styled = (
        df_display.style
        .set_table_styles([
            {"selector": "th", "props": [
                ("background-color", "#f5f5dc"), 
                ("color", "black"),
                ("font-weight", "bold"),
                ("border", "1px solid #bdb76b")
            ]},
            {"selector": "td", "props": [
                ("border", "1px solid #dcdcdc")
            ]},
        ])
        .background_gradient(cmap="Blues", subset=["Avg_Accuracy", "Avg_AUC"])
        .background_gradient(cmap="Oranges", subset=["Avg_MAE"])
        .format({"Avg_Accuracy": "{:.2%}", "Avg_MAE": "{:.3f}", "Avg_AUC": "{:.3f}"})
        .set_properties(**{
            "text-align": "center",
            "padding": "6px"
        })
        .hide(axis="index")
    )

    display(styled)

    pred_cols = [c for c in all_results.columns if "_Rank" in c]

    for conf_id in sorted(all_results["confID"].unique()):
        print(f"\nConference {conf_id} Predictions:")

        cdf = (
            all_results[all_results["confID"] == conf_id]
            .sort_values("rank")
            .reset_index(drop=True)
        )

        df_conf = cdf[["year", "tmID", "rank"] + pred_cols]

        styled_conf = (
            df_conf.style
            .set_table_styles([
                {"selector": "th", "props": [
                    ("background-color", "#f5f5dc"),
                    ("color", "black"),
                    ("font-weight", "bold"),
                    ("border", "1px solid #bdb76b")
                ]},
                {"selector": "td", "props": [
                    ("border", "1px solid #d3d3d3")
                ]},
            ])
            .background_gradient(cmap="Blues", subset=pred_cols)
            .set_properties(**{
                "text-align": "center",
                "padding": "6px"
            })
            .hide(axis="index")
        )

        display(styled_conf)



### Testing Year 10

In [5]:
df10, metrics10 = predict_team_conference_rank(10)
evaluate_predictions(df10, metrics10)

Rank,year,model,Avg_MAE,Avg_Accuracy,Avg_AUC
1,10,RandomForest,1.262,39.29%,0.642
2,10,ExtraTrees,1.262,30.95%,0.592
3,10,GradientBoosting,1.571,29.76%,0.583
4,10,Ridge,1.405,23.81%,0.55



Conference 0 Predictions:


year,tmID,rank,ExtraTrees_Rank,RandomForest_Rank,GradientBoosting_Rank,Ridge_Rank
10,IND,1.0,1,1,1,1
10,ATL,2.0,3,3,2,3
10,DET,3.0,2,2,5,2
10,WAS,4.0,6,6,6,6
10,CHI,5.0,4,4,3,4
10,CON,6.0,5,5,4,7
10,NYL,7.0,7,7,7,5



Conference 1 Predictions:


year,tmID,rank,ExtraTrees_Rank,RandomForest_Rank,GradientBoosting_Rank,Ridge_Rank
10,PHO,1.0,5,5,5,5
10,SEA,2.0,2,2,3,2
10,LAS,3.0,1,3,2,1
10,SAS,4.0,4,4,4,4
10,MIN,5.0,6,6,6,6
10,SAC,6.0,3,1,1,3


### Testing Year 9

In [6]:
df9, metrics9 = predict_team_conference_rank(9)
evaluate_predictions(df9, metrics9)

Rank,year,model,Avg_MAE,Avg_Accuracy,Avg_AUC
1,9,ExtraTrees,2.0,28.57%,0.583
2,9,GradientBoosting,2.143,21.43%,0.542
3,9,RandomForest,2.0,14.29%,0.5
4,9,Ridge,2.143,14.29%,0.5



Conference 0 Predictions:


year,tmID,rank,ExtraTrees_Rank,RandomForest_Rank,GradientBoosting_Rank,Ridge_Rank
9,DET,1.0,1,2,2,2
9,CON,2.0,6,4,7,4
9,NYL,3.0,7,7,6,7
9,IND,4.0,2,1,1,1
9,CHI,5.0,5,6,5,6
9,WAS,6.0,3,3,3,3
9,ATL,7.0,4,5,4,5



Conference 1 Predictions:


year,tmID,rank,ExtraTrees_Rank,RandomForest_Rank,GradientBoosting_Rank,Ridge_Rank
9,SAS,1.0,6,5,3,6
9,SEA,2.0,2,4,4,3
9,LAS,3.0,4,2,5,4
9,SAC,4.0,1,1,1,1
9,HOU,5.0,3,3,2,5
9,PHO,6.0,5,6,6,2
9,MIN,7.0,7,7,7,7


### Testing Year 8

In [7]:
df8, metrics8 = predict_team_conference_rank(8)
evaluate_predictions(df8, metrics8)

Rank,year,model,Avg_MAE,Avg_Accuracy,Avg_AUC
1,8,ExtraTrees,1.619,39.29%,0.642
2,8,Ridge,1.762,25.00%,0.558
3,8,RandomForest,1.786,15.48%,0.5
4,8,GradientBoosting,2.405,8.33%,0.458



Conference 0 Predictions:


year,tmID,rank,ExtraTrees_Rank,RandomForest_Rank,GradientBoosting_Rank,Ridge_Rank
8,DET,1.0,1,2,4,1
8,IND,2.0,2,1,1,2
8,CON,3.0,3,3,3,3
8,NYL,4.0,6,6,6,6
8,WAS,5.0,4,4,2,4
8,CHI,6.0,5,5,5,5



Conference 1 Predictions:


year,tmID,rank,ExtraTrees_Rank,RandomForest_Rank,GradientBoosting_Rank,Ridge_Rank
8,PHO,1.0,5,6,6,6
8,SAS,2.0,7,5,7,7
8,SAC,3.0,3,3,4,1
8,SEA,4.0,2,2,1,3
8,HOU,5.0,1,1,2,2
8,MIN,6.0,6,7,5,5
8,LAS,7.0,4,4,3,4


### Testing Year 7

In [8]:
df7, metrics7 = predict_team_conference_rank(7)

evaluate_predictions(df7, metrics7)

Rank,year,model,Avg_MAE,Avg_Accuracy,Avg_AUC
1,7,GradientBoosting,1.714,28.57%,0.583
2,7,RandomForest,1.714,28.57%,0.583
3,7,ExtraTrees,1.857,21.43%,0.542
4,7,Ridge,1.857,14.29%,0.5



Conference 0 Predictions:


year,tmID,rank,ExtraTrees_Rank,RandomForest_Rank,GradientBoosting_Rank,Ridge_Rank
7,CON,1.0,5,5,6,5
7,DET,2.0,1,1,2,2
7,IND,3.0,3,3,3,1
7,WAS,4.0,6,6,4,6
7,NYL,5.0,7,7,7,7
7,CHA,6.0,2,2,1,3
7,CHI,7.0,4,4,5,4



Conference 1 Predictions:


year,tmID,rank,ExtraTrees_Rank,RandomForest_Rank,GradientBoosting_Rank,Ridge_Rank
7,LAS,1.0,2,1,2,2
7,SAC,2.0,5,5,6,4
7,HOU,3.0,1,2,1,1
7,SEA,4.0,4,4,3,5
7,PHO,5.0,6,6,4,6
7,SAS,6.0,3,3,5,3
7,MIN,7.0,7,7,7,7


## Overall Performance Assessment

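The team ranking prediction models show moderate but inconsistent performance across test years. Average AUC ranges from 0.458 to 0.642: the top model in each year scores between 0.583 and 0.642, while several models sit at or below 0.5 (RandomForest and Ridge in Year 9, Ridge in Year 7, and RandomForest and GradientBoosting in Year 8). The models therefore capture some signal from team performance history, but not reliably. Because the AUC is computed on binarized hard rank labels, it largely mirrors exact-match accuracy and should not be read as an independent measure of how well stronger and weaker teams are separated.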

Performance varies by year and model. RandomForest was the most accurate in Year 10 (39.29%), ExtraTrees in Years 9 and 8 (28.57% and 39.29%), and GradientBoosting tied with RandomForest in Year 7 (28.57%). No single model is consistently best, and the best choice may depend on the characteristics of a given year's data. The MAE of the most accurate model ranges from 1.26 (Year 10) to 2.00 (Year 9), so its predictions are typically off by roughly 1.3 to 2 ranks.
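
One way to compare the models across all four test years is to pool the per-year `Avg_Accuracy` values from the result tables above. The sketch below does this with the reported percentages copied in by hand; the `summary` layout and column names are illustrative choices rather than part of the original pipeline.

```python
import pandas as pd

# Avg_Accuracy (%) per model and test year, copied from the result tables above
accuracy = pd.DataFrame(
    {
        "RandomForest": [39.29, 14.29, 15.48, 28.57],
        "ExtraTrees": [30.95, 28.57, 39.29, 21.43],
        "GradientBoosting": [29.76, 21.43, 8.33, 28.57],
        "Ridge": [23.81, 14.29, 25.00, 14.29],
    },
    index=pd.Index([10, 9, 8, 7], name="year"),
)

summary = pd.DataFrame({
    "mean_accuracy": accuracy.mean(),
    "std_accuracy": accuracy.std(),
    # Number of years in which the model had the (possibly tied) top accuracy
    "years_best": accuracy.eq(accuracy.max(axis=1), axis=0).sum(),
}).sort_values("mean_accuracy", ascending=False)

print(summary.round(2))
```

On these numbers ExtraTrees has the highest mean accuracy (about 30%), but with only four test years the differences between models are weak evidence at best.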

However, exact-rank accuracy remains low, between 8.33% and 39.29%. Even the best model in a given year gets only about 3 to 4 of every 10 team ranks exactly right, and the weakest fewer than 1 in 10. Ridge regression has the lowest accuracy in three of the four years (tied with RandomForest in Year 9) and is second in Year 8, which suggests that a linear model captures less of the ranking dynamics than the tree ensembles.

Despite these limitations, the most accurate model in each year outperforms a random baseline (about 14–17% exact matches for a 6–7 team conference), although several individual results are at or below it (for example, RandomForest and Ridge at 14.29% in Year 9 and GradientBoosting at 8.33% in Year 8). Even when exact ranks are wrong, predictions close to the true rank can help identify teams likely to improve or decline.
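
The random baseline can be checked by brute force. The sketch below enumerates every possible ranking of a 6- or 7-team conference and averages the exact-match accuracy and MAE against the true order; `random_baseline` is a hypothetical helper written for this illustration.

```python
from itertools import permutations

def random_baseline(n_teams):
    """Expected exact-match accuracy and MAE of a uniformly random ranking."""
    actual = range(1, n_teams + 1)
    total_hits = 0
    total_abs_err = 0
    n_perms = 0
    for guess in permutations(actual):
        n_perms += 1
        for a, g in zip(actual, guess):
            total_hits += a == g
            total_abs_err += abs(a - g)
    denom = n_perms * n_teams
    return total_hits / denom, total_abs_err / denom

for n in (6, 7):
    acc, mae = random_baseline(n)
    print(f"{n} teams: accuracy={acc:.3f}, MAE={mae:.3f}")
# 6 teams: accuracy=0.167, MAE=1.944
# 7 teams: accuracy=0.143, MAE=2.286
```

A random guess therefore has an expected MAE of roughly 1.9 to 2.3 ranks, which gives context for the MAE values of 1.26 to 2.41 reported in the result tables.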

In conclusion, while the models show some promise in identifying general ranking patterns, variability across years highlights the difficulty of predicting exact team ranks.
