# Advanced Machine Learning Techniques

Objective: To implement and benchmark non-tree-based models (SVR, Neural Networks) and advanced ensemble strategies (Voting, Stacking) on the ACME Reimbursement dataset. We will assess if these complex architectures can capture the "fuzzy" logic better than Gradient Boosting.

### Setup and Imports

SVR and Neural Networks require specific preprocessing (scaling) that decision trees do not. We import StandardScaler here.

In [1]:
# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# ML Models
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import VotingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression

# Tree models for Ensembles
from xgboost import XGBRegressor
import lightgbm as lgbm

# Data Preprocessing & Evaluation
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Set styles
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)
sns.set_theme(style="whitegrid")
pd.set_option('display.max_columns', 50)

print("Libraries imported successfully.")

Libraries imported successfully.


### Loading Data

Load the public_cases_derived_features.csv file containing the 27 engineered features.

In [2]:
# Load dataset
try:
    df = pd.read_csv("public_cases_derived_features.csv")
    print(f"Loaded {len(df)} records.")
except FileNotFoundError:
    print("Error: 'public_cases_derived_features.csv' not found.")

FEATURES = [col for col in df.columns if col != 'expected_output']
X = df[FEATURES]
y = df['expected_output']

# --- CRITICAL STEP: Feature Scaling ---
# SVR and Neural Networks perform poorly if data isn't scaled (e.g., miles vs. days).
# We create a SCALED dataset for them.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split Scaled Data (for SVR/MLP)
X_train_sc, X_val_sc, y_train, y_val = train_test_split(
    X_scaled, y, test_size=0.25, random_state=RANDOM_STATE
)

# Split Unscaled Data (for Tree Ensembles)
X_train, X_val, _, _ = train_test_split(
    X, y, test_size=0.25, random_state=RANDOM_STATE
)

print("Data prepared: Scaled (for SVR/MLP) and Unscaled (for Trees).")

Loaded 1000 records.
Data prepared: Scaled (for SVR/MLP) and Unscaled (for Trees).


### Evaluation Helper

The standard evaluation function for the challenge metrics.

In [3]:
def evaluate_model(y_true, y_pred, model_name):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    diffs = np.abs(y_pred - y_true)
    
    exact = int(np.sum(diffs < 0.01))
    close = int(np.sum(diffs < 1.00))
    mae = np.mean(diffs)
    score = (mae * 100) + (len(y_true) - exact) * 0.1
    
    print(f"--- {model_name} ---")
    print(f"[{model_name}] Total={len(y_true)} | Exact={exact} | Close(< $1.00)={close} | MAE=${mae:.2f} | Score={score:.2f}\n")
    
    return {"Model": model_name, "MAE": mae, "Score": score, "Exact": exact}

## Support Vector Regression (SVR)

SVR tries to find a "hyperplane" in high-dimensional space that fits the data. It is powerful but computationally expensive and highly sensitive to outliers.

In [4]:
# 1. Initialize SVR
# Kernel='rbf' allows it to learn non-linear curves (like the mileage tiering)
# C=100 gives the model more flexibility to fit the complex data
svr_model = SVR(kernel='rbf', C=100, epsilon=0.1)

# 2. Fit on SCALED data
print("Training SVR...")
svr_model.fit(X_train_sc, y_train)

# 3. Predict
y_pred_svr = svr_model.predict(X_val_sc)

# 4. Evaluate
stats_svr = evaluate_model(y_val, y_pred_svr, "Support Vector Regression")

Training SVR...
--- Support Vector Regression ---
[Support Vector Regression] Total=250 | Exact=0 | Close(< $1.00)=1 | MAE=$82.36 | Score=8261.15



## Neural Networks (MLP)

We used a Multi-Layer Perceptron. Given the dataset size (1,000 rows), a massive deep network would overfit. We used a modest architecture (2 hidden layers).

In [5]:
# 1. Initialize MLP
mlp_model = MLPRegressor(
    hidden_layer_sizes=(128, 64),  # Two layers
    activation='relu',
    solver='adam',
    max_iter=2000,     # High iteration count to ensure convergence
    learning_rate_init=0.001,
    early_stopping=True, # Stop if validation score stops improving
    random_state=RANDOM_STATE
)

# 2. Fit on SCALED data
print("Training Neural Network...")
mlp_model.fit(X_train_sc, y_train)

# 3. Predict
y_pred_mlp = mlp_model.predict(X_val_sc)

# 4. Evaluate
stats_mlp = evaluate_model(y_val, y_pred_mlp, "Neural Network (MLP)")

Training Neural Network...
--- Neural Network (MLP) ---
[Neural Network (MLP)] Total=250 | Exact=0 | Close(< $1.00)=4 | MAE=$82.86 | Score=8310.70



## Ensemble Methods (Stacking & Voting)

Here we combine the strengths of the best models.

In [6]:
# Define fresh models for the ensemble. 
# We use the optimal n_estimators found in previous notebooks to avoid needing a validation set here.

# XGBoost Base (approx optimal iterations)
xgb_base = XGBRegressor(
    n_estimators=800,  
    learning_rate=0.01,
    max_depth=4,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=RANDOM_STATE
)

# LightGBM Base (approx optimal iterations)
lgbm_base = lgbm.LGBMRegressor(
    n_estimators=500,
    learning_rate=0.01,
    max_depth=4,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=RANDOM_STATE,
    n_jobs=-1,
    verbose=-1
)

print("Base estimators defined for ensembles.")

Base estimators defined for ensembles.


## Voting Regressor

A Voting Regressor averages the predictions of its base models. It often smooths out the "jitter" of individual tree models.

In [7]:
# 1. Create Voting Regressor
voting_model = VotingRegressor(
    estimators=[
        ('xgb', xgb_base), 
        ('lgbm', lgbm_base)
    ],
    n_jobs=-1
)

# 2. Fit on UNSCALED data (Trees handle unscaled data better)
print("Training Voting Regressor...")
voting_model.fit(X_train, y_train)

# 3. Predict
y_pred_vote = voting_model.predict(X_val)

# 4. Evaluate
stats_vote = evaluate_model(y_val, y_pred_vote, "Ensemble (Voting)")

Training Voting Regressor...
--- Ensemble (Voting) ---
[Ensemble (Voting)] Total=250 | Exact=0 | Close(< $1.00)=3 | MAE=$63.45 | Score=6370.21



## Stacking Regressor

Stacking is more sophisticated. It trains a "meta-model" (Linear Regression) to learn the best combination of the base models.

In [8]:
# 1. Create Stacking Regressor
# The 'final_estimator' learns how to weight the XGB and LGBM predictions
stacking_model = StackingRegressor(
    estimators=[
        ('xgb', xgb_base), 
        ('lgbm', lgbm_base)
    ],
    final_estimator=LinearRegression(),
    passthrough=False, # The meta-model only sees the predictions, not original features
    n_jobs=-1
)

# 2. Fit on UNSCALED data
print("Training Stacking Regressor...")
stacking_model.fit(X_train, y_train)

# 3. Predict
y_pred_stack = stacking_model.predict(X_val)

# 4. Evaluate
stats_stack = evaluate_model(y_val, y_pred_stack, "Ensemble (Stacking)")

Training Stacking Regressor...
--- Ensemble (Stacking) ---
[Ensemble (Stacking)] Total=250 | Exact=0 | Close(< $1.00)=3 | MAE=$58.90 | Score=5915.02



## Final Comparison

Compare the advanced techniques against each other to see which architecture is the strongest candidate for the final Hybrid model.

In [9]:
# Aggregate results
results = [stats_svr, stats_mlp, stats_vote, stats_stack]
df_results = pd.DataFrame(results).set_index("Model").sort_values("Score")

print("--- Advanced Technique Comparison ---")
display(df_results.style.format({'MAE': '${:.2f}', 'Score': '{:.2f}'}))

--- Advanced Technique Comparison ---


Unnamed: 0_level_0,MAE,Score,Exact
Model,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Ensemble (Stacking),$58.90,5915.02,0
Ensemble (Voting),$63.45,6370.21,0
Support Vector Regression,$82.36,8261.15,0
Neural Network (MLP),$82.86,8310.7,0
