# 5.6 Hyperparameter tuning/choose model
### Geovanny Peña Rueda

### Step 1 – Load Model, Pipeline, and Data

In [0]:
import joblib
import numpy as np
from scipy.sparse import issparse

# Load pipeline and transformed data
pipeline = joblib.load("./etl_pipeline/stedi_feature_pipeline.pkl")
X_train_transformed = joblib.load("./etl_pipeline/X_train_transformed.pkl")
X_test_transformed = joblib.load("./etl_pipeline/X_test_transformed.pkl")
y_train = joblib.load("./etl_pipeline/y_train.pkl")
y_test = joblib.load("./etl_pipeline/y_test.pkl")

# Helper function to make sure the matrices are 2D and of float type
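def to_float_matrix(arr: np.ndarray) -> np.ndarray:
    if arr.ndim == 0:
        # 0-d array: unwrap the single object it holds, then densify if needed
        arr = arr.item()
        if issparse(arr):
            arr = arr.toarray()
        arr = np.array(arr, dtype=float)
    elif arr.dtype == object:
        # Object array of rows: convert each row to dense floats, then stack
        arr = np.array([x.toarray() if issparse(x) else np.array(x, dtype=float) for x in arr])
        arr = np.vstack(arr)
    elif issparse(arr):
        # SciPy sparse matrix: convert to a dense array
        arr = arr.toarray()
    else:
        # Already a regular array: only enforce the float dtype
        arr = np.array(arr, dtype=float)
    return arr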

X_train = to_float_matrix(X_train_transformed)
X_test = to_float_matrix(X_test_transformed)
y_train = np.ravel(y_train)
y_test = np.ravel(y_test)

# Confirm shapes
X_train.shape, X_test.shape, y_train.shape, y_test.shape

### Step 2 – SHAP Insights Review

**Which features mattered most:**  
The most important feature in the model was `num__distance_cm`. Feature importance and SHAP plots both showed that distance measurements had the strongest influence on predicting whether a step occurred. Sensor type and device ID features had much smaller effects.
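
As a rough illustration of how a feature ranking like this can be produced, the sketch below ranks scikit-learn's impurity-based importances (not SHAP values) on synthetic data. The data, and the names `cat__sensor_type` and `cat__device_id`, are hypothetical stand-ins rather than the actual STEDI columns.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data: only the first column drives the label.
X_demo = rng.normal(size=(n, 3))
y_demo = np.where(X_demo[:, 0] > 0, "step", "no_step")

# Only the first name comes from the notebook; the other two are hypothetical.
feature_names = ["num__distance_cm", "cat__sensor_type", "cat__device_id"]

model = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42)
model.fit(X_demo, y_demo)

# Sort features from most to least important.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances always sum to 1, so they show relative influence only; SHAP adds the direction of each feature's effect per prediction.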

**Unexpected or concerning behavior:**  
An unexpected result was that the model almost always predicted the class **"step"** and rarely predicted **"no_step"**. This was confirmed by the confusion matrix, where all "no_step" cases were misclassified.

**Observed weaknesses:**  
A key weakness of the model is class imbalance handling. While overall accuracy is high, the model struggles to correctly identify "no_step" events. This suggests the model may be biased toward detecting movement and may miss periods of no movement.
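
The gap between overall accuracy and per-class behavior can be made concrete with a small example. The labels below are hypothetical (a 90/10 split chosen for illustration, not the STEDI class ratio), and the "model" simply predicts the majority class every time.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical ground truth: 90 "step" samples and 10 "no_step" samples.
y_true = np.array(["step"] * 90 + ["no_step"] * 10)
# A degenerate classifier that always predicts the majority class.
y_pred = np.array(["step"] * 100)

labels = ["no_step", "step"]
cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows = true, columns = predicted
acc = accuracy_score(y_true, y_pred)
per_class_recall = recall_score(y_true, y_pred, labels=labels, average=None)

print(cm)
print("accuracy:", acc)
print(dict(zip(labels, per_class_recall)))
```

Here accuracy is 0.9 even though recall for "no_step" is 0, which is the same pattern the confusion matrix revealed for the Random Forest.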

### Step 3 – Focused Hyperparameter Design

Based on SHAP analysis and performance metrics, the Random Forest model showed a strong bias toward predicting the "step" class and failed to correctly classify "no_step" events. This suggests the model may be oversimplifying decision boundaries and over-relying on dominant features like distance.

To address this, the refinement tuning focuses on:
- Increasing `min_samples_leaf` to reduce overconfident splits
- Adjusting `max_depth` to control model complexity
- Increasing `n_estimators` to improve stability and generalization

This grid is intentionally small and targeted to improve balance and fairness without excessive computation.
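
One possible variation, sketched here under assumptions, is to make the search target class balance directly by scoring on balanced accuracy and letting `class_weight` vary. Neither choice is part of the grid used in this notebook, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced data (roughly 90% / 10%), a stand-in for the STEDI features.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=6, weights=[0.9, 0.1], random_state=0
)

param_grid = {
    "min_samples_leaf": [1, 4],
    "class_weight": [None, "balanced"],  # assumption: not in the original grid
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42),
    param_grid,
    scoring="balanced_accuracy",  # mean of per-class recall instead of raw accuracy
    cv=3,
)
search.fit(X_demo, y_demo)

print(search.best_params_, round(search.best_score_, 3))
```

With `scoring="balanced_accuracy"`, a model that ignores the minority class scores about 0.5 rather than close to 1, so the search can no longer reward majority-only predictions.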

In [0]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Focused hyperparameter grid
params = {
    "n_estimators": [100, 200],
    "max_depth": [5, 10, None],
    "min_samples_leaf": [1, 2, 4]
}

rf = RandomForestClassifier(
    random_state=42,
    n_jobs=-1
)

grid = GridSearchCV(
    rf,
    params,
    scoring="accuracy",
    cv=3,
    n_jobs=-1
)

grid.fit(X_train, y_train)

grid.best_params_, grid.best_score_
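
Beyond the single best result, it can help to see how close the other candidates were. The sketch below lists every combination from `cv_results_`, ordered by rank; it uses synthetic data and a smaller, hypothetical grid so that it runs quickly on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data for a quick, self-contained run.
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

demo_grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    {"n_estimators": [20, 40], "max_depth": [5, None]},  # hypothetical small grid
    scoring="accuracy",
    cv=3,
)
demo_grid.fit(X_demo, y_demo)

results = demo_grid.cv_results_
order = np.argsort(results["rank_test_score"])  # best-ranked candidates first

for i in order:
    print(
        results["params"][i],
        f"mean={results['mean_test_score'][i]:.4f}",
        f"std={results['std_test_score'][i]:.4f}",
    )
```

If the spread between folds (`std_test_score`) is larger than the gap between candidates, the ranking among them is not very meaningful.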

### Step 5 – Old vs New Model Comparison

The original Random Forest model achieved approximately 95% accuracy using 50 estimators and a maximum depth of 5.

After a targeted refinement search, the best tuned model achieved a mean cross-validated accuracy of 95.11% (3-fold) using 100 estimators, a maximum depth of 5, and a minimum of 1 sample per leaf.

The performance improvement was minimal. While increasing the number of estimators slightly improved stability, it did not significantly change predictive performance.

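Because the improvement was very small, both models perform similarly on accuracy. The refinement search suggests that the original hyperparameter choices were already close to the best available within this grid, although accuracy alone does not show whether the "no_step" misclassifications noted in Step 2 improved.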
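
A side-by-side comparison on the same held-out split makes this kind of check explicit. The sketch below is a minimal version on synthetic imbalanced data: the two configurations mirror the settings described above, but the data and the resulting scores are not the STEDI results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the transformed STEDI features.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=6, weights=[0.9, 0.1], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

candidates = {
    "original": RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42),
    "refined": RandomForestClassifier(
        n_estimators=100, max_depth=5, min_samples_leaf=1, random_state=42
    ),
}

scores = {}
for name, model in candidates.items():
    preds = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = {
        "accuracy": accuracy_score(y_te, preds),
        "balanced_accuracy": balanced_accuracy_score(y_te, preds),
    }
    print(name, scores[name])
```

Reporting balanced accuracy next to accuracy shows whether a small accuracy gain came with any change in minority-class behavior.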

### Step 6 – Save the Updated Model (If It Improves)

In [0]:
import joblib

joblib.dump(
    grid.best_estimator_,
    "./etl_pipeline/stedi_best_model_refined.pkl"
)
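
The cell above saves the model unconditionally. A minimal sketch of saving only when the score improves could look like the following; `save_if_improved`, `min_gain`, and the demo file names are hypothetical helpers for illustration, and the scores simply reuse the approximate values reported in this notebook.

```python
import os
import tempfile

import joblib


def save_if_improved(model, new_score, baseline_score, path, min_gain=0.0):
    """Save `model` to `path` only if it beats the baseline by more than `min_gain`."""
    if new_score - baseline_score > min_gain:
        joblib.dump(model, path)
        return True
    return False


# Demonstration with a placeholder object and a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    improved_path = os.path.join(tmp_dir, "demo_model.pkl")
    unchanged_path = os.path.join(tmp_dir, "unchanged_model.pkl")

    saved = save_if_improved({"demo": True}, 0.9511, 0.95, improved_path)
    skipped = save_if_improved({"demo": True}, 0.95, 0.95, unchanged_path)

    file_written = os.path.exists(improved_path)
    file_skipped = not os.path.exists(unchanged_path)

print(saved, skipped, file_written, file_skipped)
```

In this notebook the call would take `grid.best_estimator_` and `grid.best_score_`, with the baseline coming from the original model's score measured the same way. Setting `min_gain` above zero avoids replacing a model over a difference that is within noise.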

### Step 7 – Model Refinement Summary

A second hyperparameter tuning search was performed to improve model fairness and generalization. The tuning focused on adjusting the number of trees, maximum tree depth, and minimum samples per leaf.

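The refined model slightly improved accuracy from approximately 95% to 95.11%. Although the improvement was small, the refinement indicated that the chosen parameters were reasonable, since no combination in the grid scored meaningfully higher. Accuracy alone does not rule out overfitting or class bias, so this result should be read alongside the per-class findings from Step 2.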

The refined model was selected as the final model because it was validated through a structured tuning process. This decision supports responsible model development by verifying model stability and performance.

### Step 8 – Ethics Reflection

Careless hyperparameter tuning can create models that appear accurate but behave unfairly or unpredictably. For example, a model may ignore minority classes, which can lead to harmful or biased predictions.

Examining model behavior carefully helps ensure that predictions are trustworthy and responsible. Explainability tools such as SHAP allow data scientists to detect hidden biases and unexpected decision patterns.

Gospel principles such as honesty and accountability guide responsible model development. Being truthful about model strengths and weaknesses reflects integrity. Just as disciples are judged by their fruits, machine learning models must also be evaluated by their real-world outcomes.