# 5.6 Refinement Hyperparameter Tuning (Focused Search)

Goal: Improve model behavior based on explainability and evaluation results, especially the model's weakness on the minority class (`no_step`).


In [0]:
import os
import joblib
import numpy as np
from scipy.sparse import issparse


In [0]:
# Notebook is inside .../notebooks
# Artifacts are inside sibling folder .../etl_pipeline
BASE_PATH = os.path.abspath(os.path.join(os.getcwd(), "..", "etl_pipeline"))

print("CWD:", os.getcwd())
print("BASE_PATH:", BASE_PATH)
print("BASE_PATH exists:", os.path.exists(BASE_PATH))
print("Files in BASE_PATH:")
print(os.listdir(BASE_PATH))


## Step 1 — Load Model, Pipeline, and Data (Repo Path Only)


In [0]:
pipeline = joblib.load(os.path.join(BASE_PATH, "stedi_feature_pipeline.pkl"))
old_model = joblib.load(os.path.join(BASE_PATH, "best_model_final.pkl"))

X_train_transformed = joblib.load(os.path.join(BASE_PATH, "X_train_transformed.pkl"))
X_test_transformed  = joblib.load(os.path.join(BASE_PATH, "X_test_transformed.pkl"))
y_train = joblib.load(os.path.join(BASE_PATH, "y_train.pkl"))
y_test  = joblib.load(os.path.join(BASE_PATH, "y_test.pkl"))


In [0]:
def to_float_matrix(arr):
    if issparse(arr):
        return arr.toarray().astype(float)
    return np.array(arr, dtype=float)

X_train = to_float_matrix(X_train_transformed)
X_test  = to_float_matrix(X_test_transformed)
y_train = np.ravel(y_train)
y_test  = np.ravel(y_test)

print("X_train:", X_train.shape)
print("X_test:", X_test.shape)
print("y_train:", y_train.shape)
print("y_test:", y_test.shape)
print("Old model type:", type(old_model))


## Step 2 — SHAP-Based Mini-Reflection 

SHAP and feature importance showed the model relies mainly on motion-related sensor features, which is logical for step detection.  
However, evaluation revealed a major weakness: the model predicts nearly everything as `step`, resulting in **0 recall and 0 precision for `no_step`**.  
This suggests class imbalance is driving biased behavior, so the refinement tuning will focus on improving minority-class performance.
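
The gap between accuracy and minority-class recall described above can be reproduced with a small, dependency-free sketch. The 90/10 label split below is a hypothetical illustration, not the STEDI data, and the helper names `per_class_recall` and `balanced_accuracy` are invented for this example. Balanced accuracy is computed here as the unweighted mean of per-class recall, which is the definition scikit-learn's `balanced_accuracy_score` uses by default.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that appears in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / support[label] for label in support}

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

# Hypothetical 90/10 split (assumption for illustration only)
y_true = ["step"] * 90 + ["no_step"] * 10
# A classifier that always predicts the majority class
y_pred = ["step"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recalls = per_class_recall(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)

print("accuracy:", accuracy)
print("recall per class:", recalls)
print("balanced accuracy:", bal_acc)
```

In this toy case accuracy looks high while `no_step` recall is zero and balanced accuracy sits at chance level, which is why the refinement below scores candidates with a balanced metric.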


## Step 3 — Focused Hyperparameter Grid (Purposeful)

Because the model struggles to detect the minority class (`no_step`), this refinement grid focuses on:

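- `class_weight`: `"balanced"` weights each class inversely to its frequency, so errors on `no_step` cost more during training
- `C`: the inverse of regularization strength (smaller values regularize more strongly), which can affect generalization and how strongly the fit leans toward the majority class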

This grid is intentionally small to keep runtime reasonable while targeting the identified weakness.
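
To make the effect of `class_weight="balanced"` concrete, here is a minimal sketch of the weighting formula scikit-learn documents for that setting, `n_samples / (n_classes * class_count)`. The helper name `balanced_class_weights` and the 90/10 label split are assumptions for illustration; the real class ratio in `y_train` may differ.

```python
from collections import Counter

def balanced_class_weights(y):
    """Per-class weights of the form n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

# Hypothetical 90/10 split (assumption for illustration only)
y = ["step"] * 90 + ["no_step"] * 10
weights = balanced_class_weights(y)

for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

With this split, each `no_step` sample carries nine times the weight of a `step` sample, so the total weight of the two classes is equal during training.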


In [0]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Focused grid to address class imbalance + regularization
params_refine = {
    "C": [0.01, 0.1, 1.0, 10.0],
    "class_weight": [None, "balanced"]
}

refine_grid = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=500),
    param_grid=params_refine,
    # Use a metric that cares about imbalance:
    scoring="balanced_accuracy",
    cv=3,
    n_jobs=-1,
    verbose=1
)

refine_grid.fit(X_train, y_train)

print("New best params:", refine_grid.best_params_)
print("New best CV score (balanced_accuracy):", refine_grid.best_score_)


## Step 5 — Compare Old vs New (Test Metrics)


In [0]:
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, balanced_accuracy_score

new_model = refine_grid.best_estimator_

# Predictions
y_pred_old = old_model.predict(X_test)
y_pred_new = new_model.predict(X_test)

# Scores
old_acc = accuracy_score(y_test, y_pred_old)
new_acc = accuracy_score(y_test, y_pred_new)

old_bal = balanced_accuracy_score(y_test, y_pred_old)
new_bal = balanced_accuracy_score(y_test, y_pred_new)

print("OLD accuracy:", old_acc)
print("NEW accuracy:", new_acc)
print("\nOLD balanced_accuracy:", old_bal)
print("NEW balanced_accuracy:", new_bal)

print("\nOLD confusion matrix:\n", confusion_matrix(y_test, y_pred_old))
print("\nNEW confusion matrix:\n", confusion_matrix(y_test, y_pred_new))

print("\nOLD classification report:\n", classification_report(y_test, y_pred_old, zero_division=0))
print("\nNEW classification report:\n", classification_report(y_test, y_pred_new, zero_division=0))


## Step 6 Decision — Did Refinement Improve the Model?

Old model score (balanced_accuracy): **0.5**  
New model score (balanced_accuracy): **0.6**  

Decision rule: If the refined model improves balanced accuracy and improves `no_step` recall without severely harming overall performance, it will replace the old model.
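
One way to express the full decision rule in code is sketched below. The function name `should_replace`, the metric dictionary keys, and the `max_accuracy_drop` tolerance are all hypothetical choices made for this illustration, and the example numbers are placeholders rather than results from this notebook.

```python
def should_replace(old, new, max_accuracy_drop=0.05):
    """Apply the decision rule to two metric dictionaries.

    Each dictionary is assumed to hold 'balanced_accuracy',
    'no_step_recall', and 'accuracy' (hypothetical key names).
    """
    improves_balance = new["balanced_accuracy"] > old["balanced_accuracy"]
    improves_recall = new["no_step_recall"] > old["no_step_recall"]
    # "Without severely harming overall performance" is interpreted here
    # as an accuracy drop no larger than max_accuracy_drop (an assumption).
    acceptable_accuracy = (old["accuracy"] - new["accuracy"]) <= max_accuracy_drop
    return improves_balance and improves_recall and acceptable_accuracy

# Placeholder metrics for illustration only, not measured results
old_metrics = {"balanced_accuracy": 0.50, "no_step_recall": 0.00, "accuracy": 0.90}
candidate_a = {"balanced_accuracy": 0.60, "no_step_recall": 0.30, "accuracy": 0.87}
candidate_b = {"balanced_accuracy": 0.60, "no_step_recall": 0.30, "accuracy": 0.60}

decision_a = should_replace(old_metrics, candidate_a)
decision_b = should_replace(old_metrics, candidate_b)
print("Replace with candidate A:", decision_a)
print("Replace with candidate B:", decision_b)
```

The tolerance makes the "severely harming" clause explicit, so a candidate that gains minority recall at a large cost in overall accuracy is rejected rather than accepted on balanced accuracy alone.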


In [0]:
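# joblib and os were imported in the first cell.
# Decide using balanced accuracy; no_step recall is checked by reading
# the classification reports printed in the comparison cell above.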
if new_bal > old_bal:
    best_model = new_model
    chosen = "NEW refined model"
else:
    best_model = old_model
    chosen = "OLD model (kept)"

print("Chosen:", chosen)

save_path = os.path.join(BASE_PATH, "best_model_refined.pkl")

# joblib.dump overwrites an existing file, so no manual delete is needed
joblib.dump(best_model, save_path)

print("Saved chosen model to:", save_path)
print("Files now in BASE_PATH:")
print(os.listdir(BASE_PATH))



## Step 7 — Model Refinement Summary

I ran a second, focused tuning process on Logistic Regression to address the model’s weakness in detecting the `no_step` class. The refinement grid adjusted `class_weight` to handle class imbalance and tuned the regularization parameter `C` to improve generalization.

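The refined model improved balanced accuracy from **0.50 to 0.60**, meaning average per-class recall rose rather than the model favoring the majority class. Because the old model had zero recall on `no_step`, a balanced accuracy above 0.50 is only possible if `no_step` recall rose above zero, so the refined model now detects at least some `no_step` events. How much `step` performance changed in exchange should be read from the classification reports in Step 5.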

I updated the final model to the refined version because it provides a more equitable balance between classes. This decision is responsible because it prioritizes fairness and real-world usefulness instead of relying solely on overall accuracy, which can be misleading in imbalanced datasets.



## Step 8 — Ethics Reflection (4–6 sentences)

Careless hyperparameter tuning can create unfair or unsafe models because a model may appear “accurate” while failing important minority cases, as seen when `no_step` recall was zero.  
Examining model behavior carefully (using balanced metrics and explainability) helps prevent harm, because it exposes weaknesses that overall accuracy can hide.  
Explainability supports responsible AI by showing whether a model relies on meaningful signals or biased shortcuts.  
Gospel principles such as integrity, stewardship, and accountability guide me to report results honestly and to choose a model that performs responsibly across different situations.  
“Line upon line” refinement encourages small, intentional improvements based on evidence rather than assumptions.
