# Feature Importance and SHAP Analysis


In [0]:
import os
import joblib
import numpy as np
from scipy.sparse import issparse

BASE_PATH = os.path.abspath(os.path.join(os.getcwd(), "..", "etl_pipeline"))

pipeline = joblib.load(os.path.join(BASE_PATH, "stedi_feature_pipeline.pkl"))
model = joblib.load(os.path.join(BASE_PATH, "best_model_final.pkl"))

X_train_transformed = joblib.load(os.path.join(BASE_PATH, "X_train_transformed.pkl"))
X_test_transformed  = joblib.load(os.path.join(BASE_PATH, "X_test_transformed.pkl"))
y_train = joblib.load(os.path.join(BASE_PATH, "y_train.pkl"))
y_test  = joblib.load(os.path.join(BASE_PATH, "y_test.pkl"))


In [0]:
def to_float_matrix(arr):
    if issparse(arr):
        return arr.toarray().astype(float)
    return np.array(arr, dtype=float)

X_train = to_float_matrix(X_train_transformed)
X_test  = to_float_matrix(X_test_transformed)

y_train = np.ravel(y_train)
y_test  = np.ravel(y_test)

X_train.shape, X_test.shape


## Global Feature Importance


In [0]:
import numpy as np

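# Logistic regression exposes coefficients (coef_), not feature_importances_.
# |coef| is only a fair ranking if the transformed features share a common scale.
importances = np.abs(model.coef_[0])
importance_order = np.argsort(importances)[::-1]

# Fall back to generic names if the preprocessing step cannot report them.
try:
    feature_names = pipeline.named_steps["preprocess"].get_feature_names_out()
except Exception:
    feature_names = [f"feature_{i}" for i in range(X_train.shape[1])]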

for idx in importance_order[:10]:
    print(feature_names[idx], ":", importances[idx])


In [0]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10,5))
plt.barh([feature_names[i] for i in importance_order[:10]],
         importances[importance_order[:10]])
plt.xlabel("Absolute coefficient")
plt.title("Top 10 Features by Absolute Coefficient")
plt.gca().invert_yaxis()
plt.show()


## SHAP Analysis


In [0]:
%pip install shap

In [0]:
import shap
shap.initjs()

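# Logistic regression is a linear model, so LinearExplainer applies.
# X_train serves as the background data that defines the baseline (expected value).
explainer = shap.LinearExplainer(model, X_train)
shap_values = explainer.shap_values(X_test)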



In [0]:
shap.summary_plot(shap_values, X_test, feature_names=feature_names)


In [0]:
i = 0  # index of the X_test row to explain; any valid row works

shap.force_plot(
    explainer.expected_value,
    shap_values[i],
    X_test[i],
    feature_names=feature_names,
    matplotlib=True
)


## Global Insight

The most important features overall appear to be those derived from the accelerometer and motion sensor measurements, such as device orientation, movement intensity, and step-related patterns.

These features make sense because step detection relies heavily on changes in motion and acceleration. When a person takes a step, the device experiences predictable spikes and patterns that the model can learn.

The global feature importance plot (absolute coefficients) and the SHAP summary plot (per-observation contributions) both rank motion-related features as the strongest influences on predictions, which aligns with expectations for a model that detects physical movement.
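The agreement between the two views can be checked numerically rather than by eye. The sketch below is a minimal illustration, not the notebook's own computation: it assumes independent features, in which case the SHAP value of feature j for a linear model reduces to `coef[j] * (x[j] - mean(x[j]))`. The data and coefficients are synthetic stand-ins, and `linear_shap_values` and `rank_by` are hypothetical helper names.

```python
import numpy as np

def linear_shap_values(coef, X_background, X):
    """SHAP values for a linear model, assuming independent features:
    phi[i, j] = coef[j] * (X[i, j] - mean of feature j over the background)."""
    coef = np.asarray(coef, dtype=float)
    mu = np.asarray(X_background, dtype=float).mean(axis=0)
    return (np.asarray(X, dtype=float) - mu) * coef

def rank_by(scores):
    """Feature indices ordered from highest to lowest score."""
    return np.argsort(scores)[::-1]

# Synthetic stand-in data (NOT the STEDI data): feature 1 has a much larger scale.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3)) * np.array([1.0, 50.0, 1.0])
coef_demo = np.array([2.0, 0.1, 0.5])  # hypothetical coefficients

abs_coef = np.abs(coef_demo)
mean_abs_shap = np.abs(linear_shap_values(coef_demo, X_demo, X_demo)).mean(axis=0)

print("order by |coef|:     ", rank_by(abs_coef))
print("order by mean |SHAP|:", rank_by(mean_abs_shap))
```

In this synthetic case the two orderings differ because one feature is on a much larger scale, which is why absolute coefficients are only a fair ranking when the transformed features are comparably scaled. With the notebook's own objects, `np.abs(shap_values).mean(axis=0)` could be compared against `importances` in the same way.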


## Local Insight

The SHAP force plot for a single prediction shows which features pushed the model's output above or below its baseline (the expected value), and therefore toward classifying the observation as either "step" or "no_step."

For the selected example, features associated with stronger motion signals pushed the prediction toward "step," while features indicating low or stable movement pushed it toward "no_step."

This local explanation helps clarify why the model made its decision for this specific instance. It suggests that the model responds to meaningful physical signals rather than arbitrary patterns, although a single row cannot establish this for the model as a whole.
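A force plot for a linear model is additive: the baseline plus the per-feature contributions equals the model's raw output for that row, which for logistic regression is the log-odds rather than the probability. The sketch below illustrates that identity under the same feature-independence assumption; the coefficients, intercept, and data are made up for the example, and `explain_row` is a hypothetical helper.

```python
import numpy as np

def explain_row(coef, intercept, X_background, x):
    """Return (base_value, contributions) for one row of a linear model.

    base_value is the model's average log-odds over the background data;
    contributions[j] is how far feature j moves this row away from it.
    """
    coef = np.asarray(coef, dtype=float)
    mu = np.asarray(X_background, dtype=float).mean(axis=0)
    base_value = float(coef @ mu + intercept)
    contributions = coef * (np.asarray(x, dtype=float) - mu)
    return base_value, contributions

# Synthetic stand-in values (assumptions, not the trained STEDI model).
rng = np.random.default_rng(1)
X_bg = rng.normal(size=(200, 4))
coef_demo = np.array([1.5, -0.7, 0.0, 0.3])
intercept_demo = -0.2
x_row = X_bg[0]

base_value, contributions = explain_row(coef_demo, intercept_demo, X_bg, x_row)
log_odds = float(coef_demo @ x_row + intercept_demo)
probability = 1.0 / (1.0 + np.exp(-log_odds))

# Additivity: baseline + sum of contributions reproduces the log-odds.
print(np.isclose(base_value + contributions.sum(), log_odds))
print("predicted probability:", probability)
```

Against the notebook's objects, the analogous check would compare `explainer.expected_value + shap_values[i].sum()` with `model.decision_function(X_test[i:i+1])[0]`; a feature with a zero coefficient contributes nothing, as the third feature does here.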


## Intuition Check

The model’s logic generally matches human intuition. A human would also expect that stronger movement and acceleration patterns correspond to steps, while minimal movement corresponds to no_step.

Because the model relies primarily on motion-related features, its behavior appears logical and grounded in real-world physical phenomena. No irrelevant feature showed a major unexpected influence in the plots examined, which increases trust in the model’s predictions.
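The claim that irrelevant features have little influence can be made concrete by measuring what share of the total attribution comes from features that are expected to matter. The following is a rough sketch only: the feature names, attribution values, and keywords are hypothetical placeholders, and matching names by substring is an assumption about how the transformed features are labelled.

```python
import numpy as np

def attribution_share(feature_names, mean_abs_attr, keywords):
    """Fraction of total mean |attribution| carried by features whose
    (lower-cased) name contains any of the given keywords."""
    names = [str(n).lower() for n in feature_names]
    attr = np.asarray(mean_abs_attr, dtype=float)
    mask = np.array([any(k in n for k in keywords) for n in names])
    total = attr.sum()
    return float(attr[mask].sum() / total) if total > 0 else 0.0

# Hypothetical names and attribution values, for illustration only.
demo_names = ["accel_x", "accel_y", "gyro_z", "device_id_hash"]
demo_attr = [0.4, 0.3, 0.2, 0.1]

share = attribution_share(demo_names, demo_attr, keywords=("accel", "gyro"))
print(f"motion-related share of attribution: {share:.2f}")
```

In the notebook, `feature_names` and `np.abs(shap_values).mean(axis=0)` could be passed in place of the demo values; a low share would be a signal to inspect which other features are driving predictions.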


## Dashboard Preparation

The following visualizations will be included in the dashboard:

• Global Feature Importance Chart: shows which features the model relies on most overall  
• SHAP Summary Plot: illustrates how features influence predictions across all observations  
• SHAP Force Plot: explains an individual prediction in detail  

These visuals provide both a high-level understanding of the model’s behavior and a detailed explanation of specific decisions, making the model more transparent and interpretable for stakeholders.
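If the dashboard renders its own charts instead of embedding static images, the data behind the global importance chart can be handed over as JSON. This is one possible approach, not something the notebook prescribes: `top_features_payload` is a hypothetical helper, and the record layout (`feature`, `importance`) is an assumption about what the dashboard would consume.

```python
import json
import numpy as np

def top_features_payload(feature_names, importances, k=10):
    """Return the k highest-importance features as JSON-serializable records,
    ordered from most to least important."""
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1][:k]
    return [
        {"feature": str(feature_names[i]), "importance": float(importances[i])}
        for i in order
    ]

# Placeholder names and values, for illustration only.
demo_payload = top_features_payload(["f_a", "f_b", "f_c"], [0.2, 0.9, 0.5], k=2)
payload_json = json.dumps(demo_payload, indent=2)
print(payload_json)
```

With the notebook's variables, `top_features_payload(feature_names, importances)` would produce the same ten features shown in the bar chart.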
