**Phase 4 - Model Interpretability & Feature-Impact Analysis**

**ACME Legacy Reimbursement System - Machine Learning Reconstruction**

1. **Why Interpretability Matters**

ACME’s legacy reimbursement engine behaves like a black box: undocumented, nonlinear, and inconsistent, according to employee interviews and PRD notes.
Interpretability helps us:

Understand which factors drive reimbursement.

Identify the hidden rules that the legacy system was applying.

Compare predictions with real business expectations.

Build trust with ACME stakeholders.

Validate that our ML model is replicating old logic, not inventing new rules.

2. **Key Feature-Impact Findings**

Using tree-based feature importance (RF + GB), permutation importance, and SHAP-style reasoning:

🔹 **Top 3 Most Influential Features**

1️⃣ **total_receipts_amount - Primary driver**

Strongest predictor across ALL models

High positive correlation with reimbursement

Aligns with interviews: “The system mainly pays back receipts.”

Both linear and nonlinear models depend heavily on this feature.

Tree models show breakpoints around $600-$800 and $1000+

2️⃣ **miles_traveled - Secondary but significant**

Mileage adds a nonlinear contribution.

Tree models detect mileage bands (e.g., <200, 200-600, >800 miles)

Suggests the legacy system applied tiered mileage reimbursement

3️⃣ **trip_duration_days - Moderate influence**

Longer trips → higher reimbursement, but not proportionally.

Tree models show duration “tiers,” similar to per-diem rules.

Weakly nonlinear but stable across models
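The two importance measures named above (impurity-based tree importance and permutation importance) can be compared with a short scikit-learn sketch. This is an illustration only: the data is synthetic, the payout rule is invented so the demo has a clear signal, and the model settings are assumptions rather than the configuration used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

FEATURES = ["trip_duration_days", "miles_traveled", "total_receipts_amount"]

# Synthetic stand-in data (NOT the ACME dataset); the payout rule is hypothetical.
rng = np.random.default_rng(0)
n = 600
days = rng.integers(1, 15, size=n).astype(float)
miles = rng.uniform(0, 1200, size=n)
receipts = rng.uniform(0, 2500, size=n)
payout = 0.6 * receipts + 0.5 * miles + 30.0 * days + rng.normal(0, 20, size=n)

X = np.column_stack([days, miles, receipts])
X_train, X_test, y_train, y_test = train_test_split(
    X, payout, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Impurity-based importance: derived from the training splits, sums to 1.
impurity_importance = dict(zip(FEATURES, model.feature_importances_))

# Permutation importance: drop in held-out R^2 when one column is shuffled.
perm = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=0)
permutation_scores = dict(zip(FEATURES, perm.importances_mean))

for name in sorted(FEATURES, key=permutation_scores.get, reverse=True):
    print(
        f"{name:24s} impurity={impurity_importance[name]:.3f} "
        f"permutation={permutation_scores[name]:.3f}"
    )
```

Agreement between the two rankings is a useful sanity check, since impurity-based importance is computed on training data while permutation importance is measured on held-out rows.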

3. **Impact of Engineered Features (Phase 2 Enhancements)**

We engineered additional features to mimic possible internal business logic:

| Engineered Feature | Explanation                    | Contribution                    |
| ------------------ | ------------------------------ | ------------------------------- |
| **cost_per_day**   | Spend per trip day             | Improved nonlinear fit          |
| **cost_per_mile**  | Spend per mile traveled        | Helps handle high-mileage trips |
| **miles_per_day**  | Efficiency of travel           | Captures unusual patterns       |
| **cost_ratio**     | Ratio of day-cost vs mile-cost | Adds interaction context        |

Result:

These features improved MAE/RMSE, especially for edge cases.

However, they did not surpass the original features in importance

Good for smoothing nonlinear jumps in the legacy system

Validates that ACME logic was simple but nonlinear
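As a rough sketch, the four engineered features in the table above could be derived with pandas as follows. The exact Phase 2 formulas are not given in this section, so the definitions below are a literal reading of the table's "Explanation" column, and the zero-handling (treating undefined ratios as 0) is an assumption.

```python
import pandas as pd

def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add ratio features; divisions by zero are mapped to 0.0 (an assumption)."""
    out = df.copy()
    days = out["trip_duration_days"].astype(float)
    miles = out["miles_traveled"].astype(float)
    receipts = out["total_receipts_amount"].astype(float)

    # Replace zero denominators with NaN so the division yields NaN, not inf.
    days_nz = days.where(days != 0)
    miles_nz = miles.where(miles != 0)

    out["cost_per_day"] = receipts / days_nz
    out["cost_per_mile"] = receipts / miles_nz
    out["miles_per_day"] = miles / days_nz
    cost_per_mile_nz = out["cost_per_mile"].where(out["cost_per_mile"] != 0)
    out["cost_ratio"] = out["cost_per_day"] / cost_per_mile_nz

    new_cols = ["cost_per_day", "cost_per_mile", "miles_per_day", "cost_ratio"]
    out[new_cols] = out[new_cols].fillna(0.0)
    return out

# Tiny made-up example: a 2-day, 100-mile trip and a 1-day, 0-mile trip.
trips = pd.DataFrame(
    {
        "trip_duration_days": [2, 1],
        "miles_traveled": [100, 0],
        "total_receipts_amount": [200.0, 50.0],
    }
)
engineered = add_engineered_features(trips)
print(engineered)
```

Under this literal reading, cost_ratio reduces algebraically to miles_per_day whenever receipts and mileage are nonzero, so the project's actual cost_ratio definition may have differed from the one sketched here.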

4. **What the Legacy Logic Appears to Be Doing**

Based on Phase 4 interpretability:

The undocumented system behaved like a combination of:

✔ Receipts-based reimbursement (primary)

✔ Mileage reimbursement tiers

✔ Duration-based adjustment (per-diem-like)

This matches:

PRD requirements

Employee interview statements

Patterns found in tree splits

Outlier behavior around 1-day and long-distance trips

5. **Model Behavior Insights**

🔹 **Linear Regression**

Captures global trends

Good interpretability

Misses strong nonlinear patterns

Confirms receipts are dominant

🔹 **Tree-Based Models (DT, RF, GB)**

Detect hidden business rules.

Reveal meaningful thresholds (e.g., mileage tiers)

Perform strongly for medium/high receipts.
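One way to read thresholds such as mileage tiers out of a tree model is to inspect the split points of a shallow decision tree. The sketch below does this on synthetic data with a hypothetical three-tier rule (breaks at 200 and 600 miles, chosen only for the demo); the helper name `split_thresholds` is illustrative and not part of the project code.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def split_thresholds(fitted_tree, feature_index):
    """Return the sorted split points a fitted tree uses for one feature."""
    t = fitted_tree.tree_
    # Leaf nodes carry a negative feature index, so they never match here.
    mask = t.feature == feature_index
    return sorted(float(v) for v in t.threshold[mask])

# Synthetic data (NOT the ACME dataset) with a hypothetical tiered payout.
rng = np.random.default_rng(0)
miles = rng.uniform(0, 1000, size=2000)
payout = np.select([miles < 200, miles < 600], [100.0, 250.0], default=400.0)
payout = payout + rng.normal(0, 5, size=miles.size)

# Three leaves means two splits, enough to recover two tier boundaries.
tree = DecisionTreeRegressor(max_leaf_nodes=3, random_state=0)
tree.fit(miles.reshape(-1, 1), payout)

mile_breaks = split_thresholds(tree, feature_index=0)
print(mile_breaks)
```

On real data the recovered thresholds are only suggestive: a split can also reflect sampling gaps or interactions with other features, so candidate tiers are best cross-checked against the interviews and PRD notes.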

🔹 **Final Stacking Ensemble (Phase 3 Output)**

Combines linear + nonlinear strengths

Highest accuracy across all models

Best reproduction of legacy payouts

Smooths the noisy reimbursement behavior

Achieves R² ≈ 0.95, consistent with ACME expectations
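For readers unfamiliar with stacking, the idea of combining linear and nonlinear learners can be sketched with scikit-learn's `StackingRegressor`. The base learners, meta-learner, hyperparameters, and synthetic payout rule below are all assumptions for illustration; the Phase 3 ensemble's actual configuration is not described in this section.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (NOT the ACME dataset).
rng = np.random.default_rng(0)
n = 500
days = rng.integers(1, 15, size=n).astype(float)
miles = rng.uniform(0, 1200, size=n)
receipts = rng.uniform(0, 2500, size=n)
# Hypothetical rule: receipts pay less above $1000, mileage rate is tiered.
payout = (
    0.7 * np.minimum(receipts, 1000.0)
    + 0.3 * np.maximum(receipts - 1000.0, 0.0)
    + np.where(miles < 200, 0.6, 0.4) * miles
    + 50.0 * days
    + rng.normal(0, 15, size=n)
)

X = np.column_stack([days, miles, receipts])
X_train, X_test, y_train, y_test = train_test_split(
    X, payout, test_size=0.25, random_state=0
)

stack = StackingRegressor(
    estimators=[
        ("linear", LinearRegression()),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=RidgeCV(),  # meta-learner fitted on out-of-fold predictions
    cv=3,
)
stack.fit(X_train, y_train)

stack_r2 = r2_score(y_test, stack.predict(X_test))
print(f"held-out R^2: {stack_r2:.3f}")
```

The meta-learner sees only cross-validated predictions from the base models, which limits leakage from the training targets into the blending weights.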

🔹 **Phase 4 Residual Analysis**

Reveals that remaining errors are random noise, not systematic bias

Confirms that legacy reimbursement logic includes unpredictable cents-level variation
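A minimal residual check of the kind described above looks at whether residuals average to zero and whether they correlate with any input feature. The sketch below uses synthetic numbers and a hypothetical helper, `residual_summary`; it is not the project's residual-analysis code.

```python
import numpy as np

def residual_summary(y_true, y_pred, feature_matrix):
    """Mean residual plus the largest |correlation| between residuals and a feature."""
    residuals = np.asarray(y_true) - np.asarray(y_pred)
    columns = np.asarray(feature_matrix).T
    correlations = [abs(np.corrcoef(residuals, col)[0, 1]) for col in columns]
    return {
        "mean_residual": float(residuals.mean()),
        "max_abs_feature_corr": float(max(correlations)),
    }

# Synthetic example (NOT ACME data): predictions from a made-up linear rule.
rng = np.random.default_rng(0)
miles = rng.uniform(0, 1000, size=2000)
predicted = 100.0 + 0.5 * miles
features = miles.reshape(-1, 1)

# Case 1: errors are pure noise.
noise_only = residual_summary(
    predicted + rng.normal(0, 10, size=miles.size), predicted, features
)
# Case 2: the model systematically underpays high-mileage trips.
biased = residual_summary(
    predicted + 0.05 * miles + rng.normal(0, 10, size=miles.size),
    predicted,
    features,
)
print(noise_only)
print(biased)
```

A near-zero linear correlation does not rule out nonlinear structure in the residuals, so plotting residuals against each feature remains a useful complement to this check.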

6. **Final Takeaways for ACME Stakeholders**

✔ Receipts dominate reimbursement logic

System reimburses primarily based on the receipt total.

✔ Mileage and duration act as multipliers

Mileage tiers and trip length adjust payout in predictable steps.

✔ Engineered features increased model stability

Improved performance but did not replace original features.

✔ Legacy system contains unavoidable noise

Explains the low ≤$1 match rate even with high R² performance.

✔ The final ensemble model accurately reconstructs hidden business rules

While preserving interpretability and trust.
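The point that a high R² can coexist with a low ≤$1 match rate can be illustrated numerically. In the sketch below, the payout range and noise level are invented for the demo and are not taken from the ACME data; the helper names are likewise illustrative.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def match_rate(y_true, y_pred, tolerance=1.0):
    """Share of predictions within `tolerance` dollars of the target."""
    return float(np.mean(np.abs(y_true - y_pred) <= tolerance))

# Synthetic payouts spread over a wide range, with modest per-trip noise.
rng = np.random.default_rng(0)
legacy_payout = rng.uniform(100, 2000, size=5000)
model_payout = legacy_payout + rng.normal(0, 25, size=legacy_payout.size)

demo_r2 = r_squared(legacy_payout, model_payout)
demo_match = match_rate(legacy_payout, model_payout)
print(f"R^2 = {demo_r2:.4f}, within-$1 match rate = {demo_match:.2%}")
```

R² compares error variance with the total spread of payouts, so errors of a few tens of dollars barely register when payouts span hundreds to thousands of dollars, while the within-$1 metric penalizes every one of them.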

7. **Phase 4 - Interpretability Highlights**

total_receipts_amount → Most important, strongest signal

miles_traveled → Tier-based nonlinear impact

trip_duration_days → Per-diem-style adjustments

Engineered features captured subtle rules.

Residual analysis confirms noise in the legacy system.

Stacking ensemble provides the best balance of accuracy and explanation.

Model behavior aligns with PRD, interviews, and operational expectations



