# Note

- The results were computed in the deep-learning notebook, because loading its pickled (`.pkl`) outputs in this notebook did not work.
- Weather features and Illinois Hub-only models were also tested. However, these variants did not improve on the engineered Ridge baseline (details are available in the deep-learning notebook), so this notebook focuses on interpreting the best-performing model.


# Feature Importance for Illinois LMP Forecasting

This notebook summarizes the **feature importance analysis** for Illinois Locational Marginal Price (LMP) forecasting.

It is based on the models and experiments developed in:

> `Deep_Learning+Feature Importance.ipynb – Milestone 3: Deep Learning Model Development and Hyperparameter Tuning`

The main deep-learning notebook showed that:

- The **Ridge Regression** baseline with engineered features achieves:
  - MAE ≈ **\$4.19**
  - RMSE ≈ **\$9.19**
  - R² ≈ **0.8977**

- The baseline **Dense Neural Network** (feed-forward) achieves:
  - MAE ≈ **\$10.04**
  - RMSE ≈ **\$12.46**
  - R² ≈ **0.8119**

- Multiple **LSTM** and **GRU** configurations (with expanding windows, weather features, and PCA) struggled to exceed **R² ≈ 0.2–0.25**, even after careful data cleaning and outlier removal.

These results suggested that:
- The relationship between features and LMP is **mostly linear**.
- The forecast is dominated by **current-hour market and hub price information**, rather than long historical sequences.

This notebook focuses only on interpreting the Ridge model fitted in the deep-learning notebook, using feature importance.
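
For reference, the three metrics quoted above can be computed as in the following minimal sketch. The function name `regression_metrics` and the toy prices are illustrative assumptions, not code or data from the deep-learning notebook.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R² for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    # Variance of the targets around their own mean (the R² baseline)
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    return {"MAE": mae, "RMSE": math.sqrt(mse), "R2": 1.0 - mse / var_true}

# Toy prices in $/MWh (illustrative only, not project data)
y_true = [20.0, 30.0, 40.0, 50.0]
y_pred = [22.0, 28.0, 43.0, 49.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Because R² = 1 − MSE / Var(y) on a fixed test set, the reported numbers can be cross-checked: 9.19² / (1 − 0.8977) ≈ 826 and 12.46² / (1 − 0.8119) ≈ 825, so both models imply roughly the same test-set variance, as expected if they were scored on the same split.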


## 2. Coefficient-Based Feature Importance (Ridge)

The table below shows the **top 15 features** ranked by the absolute value of their standardized Ridge coefficients, as computed in the deep-learning notebook.

| feature           |   coef |   abs_coef |
|:------------------|-------:|-----------:|
| da_ILLINOIS.HUB   | 12.847 |     12.847 |
| da_INDIANA.HUB    | -7.395 |      7.395 |
| act_MICHIGAN.HUB  |  6.866 |      6.866 |
| act_INDIANA.HUB   |  3.816 |      3.816 |
| act_MINN.HUB      |  2.813 |      2.813 |
| ClearedLoad       | -2.324 |      2.324 |
| da_ARKANSAS.HUB   | -2.300 |      2.300 |
| da_MINN.HUB       | -2.257 |      2.257 |
| ActualLoad        |  2.103 |      2.103 |
| lmp_act_lag_1h    |  1.986 |      1.986 |
| da_MICHIGAN.HUB   | -1.225 |      1.225 |
| ForecastedLoad    | -0.940 |      0.940 |
| da_MS.HUB         |  0.858 |      0.858 |
| act_LOUISIANA.HUB | -0.722 |      0.722 |
| lmp_act_roll24    | -0.684 |      0.684 |
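
A ranking of this kind could be produced roughly as follows. This is a sketch on synthetic data: the feature values, the target formula, and `alpha=1.0` are assumptions for illustration, and only the column names are borrowed from the table. The actual preprocessing and regularization strength are defined in the deep-learning notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in features (random values, real column names)
X = pd.DataFrame({
    "da_ILLINOIS.HUB": rng.normal(30, 10, n),
    "act_MICHIGAN.HUB": rng.normal(32, 12, n),
    "ClearedLoad": rng.normal(70000, 8000, n),
})
# Synthetic target with known linear structure plus noise
y = (0.8 * X["da_ILLINOIS.HUB"]
     + 0.3 * X["act_MICHIGAN.HUB"]
     - 0.0001 * X["ClearedLoad"]
     + rng.normal(0, 1, n))

# Standardize so coefficients are comparable across features with different units
X_scaled = StandardScaler().fit_transform(X)
model = Ridge(alpha=1.0).fit(X_scaled, y)  # alpha is an assumed value

coef_table = (
    pd.DataFrame({"feature": X.columns, "coef": model.coef_})
    .assign(abs_coef=lambda d: d["coef"].abs())
    .sort_values("abs_coef", ascending=False)
    .reset_index(drop=True)
)
print(coef_table)
```

Standardizing first matters here: without it, a feature measured in MW (such as load) and a feature measured in \$/MWh would have coefficients on incomparable scales.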

**Interpretation:**

- The largest positive coefficient belongs to **`da_ILLINOIS.HUB`**, indicating that the **day-ahead Illinois hub price** is the most influential feature for predicting the actual LMP.
- Other highly important price-related features include:
  - **`da_INDIANA.HUB`**
  - **`act_MICHIGAN.HUB`**
  - **`act_INDIANA.HUB`**
  - **`act_MINN.HUB`**
- Load-related features also play a role:
  - **`ClearedLoad`**, **`ActualLoad`**, and **`ForecastedLoad`** appear among the top coefficients.
- Simple time-based history features such as:
  - **`lmp_act_lag_1h`** (1-hour lag) and **`lmp_act_roll24`** (24-hour rolling statistic)
  have moderate but non-negligible importance.

Overall, this suggests that the model is primarily driven by **current and nearby hub prices**, with **load** and a small amount of short-term history providing additional adjustment.
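
The history features named above could be built along these lines. This is a hedged sketch: the source column name `lmp_act` is hypothetical, the rolling statistic is assumed to be a mean, and the `shift(1)` before rolling (so the window excludes the current hour) is an assumption about how leakage was avoided, not a description of the deep-learning notebook's code.

```python
import numpy as np
import pandas as pd

# Toy hourly series: prices 0, 1, 2, ... so the results are easy to verify by hand
idx = pd.date_range("2024-01-01", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"lmp_act": np.arange(48, dtype=float)}, index=idx)

# Price observed one hour earlier
df["lmp_act_lag_1h"] = df["lmp_act"].shift(1)

# Mean of the previous 24 hours, excluding the current hour
df["lmp_act_roll24"] = df["lmp_act"].shift(1).rolling(window=24).mean()

# Rows without a full 24-hour history are dropped
features = df.dropna()
print(features.head())
```

With this construction the first usable row is hour 24, whose lag is 23.0 and whose rolling mean is 11.5 (the mean of 0 through 23).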


## 3. Permutation Importance (Ridge)

Permutation importance gives a complementary, model-agnostic view of which features matter by measuring how much the **test R² decreases** when a feature's values are randomly shuffled.

The top 15 features by permutation importance (drop in R²) from the deep-learning notebook are:

| feature          |   importance_mean |   importance_std |
|:-----------------|------------------:|-----------------:|
| act_MICHIGAN.HUB |             0.442 |            0.004 |
| da_ILLINOIS.HUB  |             0.376 |            0.004 |
| da_INDIANA.HUB   |             0.139 |            0.001 |
| act_INDIANA.HUB  |             0.124 |            0.001 |
| act_MINN.HUB     |             0.115 |            0.001 |
| lmp_act_lag_1h   |             0.023 |            0.001 |
| da_MINN.HUB      |             0.021 |            0     |
| da_ARKANSAS.HUB  |             0.012 |            0     |
| ActualLoad       |             0.010 |            0     |
| ClearedLoad      |             0.009 |            0     |
| da_MICHIGAN.HUB  |             0.005 |            0     |
| act_TEXAS.HUB    |             0.002 |            0     |
| act_ARKANSAS.HUB |             0.002 |            0     |
| ForecastedLoad   |             0.001 |            0     |
| act_MS.HUB       |             0.001 |            0     |

**Interpretation:**

- The largest drop in R² occurs when **`act_MICHIGAN.HUB`** is permuted, followed by **`da_ILLINOIS.HUB`** and **`da_INDIANA.HUB`**.
- This means the Ridge model’s predictive performance relies most heavily on a **small set of hub prices**:
  - Actual Michigan hub price,
  - Day-ahead Illinois and Indiana hub prices,
  - Other nearby actual hub prices (Indiana, Minnesota).
- The **1-hour lag feature** `lmp_act_lag_1h` provides useful short-term historical context but is secondary compared to the main hub price features.
- Load-related variables (`ActualLoad`, `ClearedLoad`, and `ForecastedLoad`) still contribute but **do not dominate** performance.
- The overall picture from permutation importance is broadly consistent with the coefficient-based ranking, although the order at the top differs (`act_MICHIGAN.HUB` ranks first here but third by coefficient):  
  **hub prices + load + short history** are the key drivers.
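
The procedure described in this section can be sketched with scikit-learn's `permutation_importance`. The data below is synthetic, and the 80/20 chronological split, `alpha=1.0`, and `n_repeats=10` are assumptions for illustration. `noise_feature` is a made-up column added to show what an uninformative feature looks like.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in data (random values, two real column names plus a dummy)
X = pd.DataFrame({
    "act_MICHIGAN.HUB": rng.normal(32, 12, n),
    "da_ILLINOIS.HUB": rng.normal(30, 10, n),
    "noise_feature": rng.normal(0, 1, n),
})
y = 0.6 * X["act_MICHIGAN.HUB"] + 0.5 * X["da_ILLINOIS.HUB"] + rng.normal(0, 2, n)

# Chronological split: earlier rows for training, later rows for testing
split = int(0.8 * n)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X_train, y_train)

# Shuffle each column of the test set in turn and record the drop in R²
result = permutation_importance(
    model, X_test, y_test, scoring="r2", n_repeats=10, random_state=0
)

perm_table = (
    pd.DataFrame({
        "feature": X.columns,
        "importance_mean": result.importances_mean,
        "importance_std": result.importances_std,
    })
    .sort_values("importance_mean", ascending=False)
    .reset_index(drop=True)
)
print(perm_table)
```

One caveat when reading such a table: if the features are strongly correlated with each other, as neighboring hub prices may well be, shuffling one of them creates unrealistic input combinations and the importance can be spread across or concentrated in individual features in ways that are hard to interpret. The drops in R² also need not sum to the model's R².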


## 4. Conclusions (based on the results)

From the deep-learning notebook:

- **Ridge Regression** achieved:
  - MAE ≈ **\$4.19**, RMSE ≈ **\$9.19**, R² ≈ **0.8977**.
- A baseline **Dense Neural Network** achieved:
  - MAE ≈ **\$10.04**, RMSE ≈ **\$12.46**, R² ≈ **0.8119**.
- Multiple **LSTM** and **GRU** experiments with expanding windows, PCA, and outlier controls reached **R² in the 0.2–0.25 range** on the test set.

The feature importance analysis makes these patterns easier to understand:

1. **Importance of current hub prices**

   Both coefficient and permutation importance show that a **small set of hub prices** (especially `act_MICHIGAN.HUB`, `da_ILLINOIS.HUB`, `da_INDIANA.HUB`, and other nearby hubs) explain most of the variance in Illinois LMP.

   This means the problem is largely about learning a **cross-sectional mapping** from current hub and load conditions to the Illinois LMP, rather than about remembering long sequences of past values.

2. **Short-term history and load help, but are secondary**

   Features like `lmp_act_lag_1h`, `lmp_act_roll24`, and the load variables (`ActualLoad`, `ClearedLoad`, `ForecastedLoad`) improve the predictions slightly but do not materially change model performance.

   This explains why:
   - Ridge, which is linear and uses these features directly, already achieves **R² ≈ 0.90**.
   - Adding more deep-learning capacity does not automatically give better results.

3. **Why LSTM/GRU struggled**

   LSTM and GRU are designed to exploit *long temporal dependencies* in sequences.  
   However, the feature importance analysis suggests that **long histories are not where the predictive signal lies**:

   - The most important information is **already contained** in the **current-hour hub prices and load**.
   - When the model tries to learn from long sequences, it tends to:
     - Overfit noise,
     - Struggle with high volatility and outliers,
     - Fail to significantly improve over the simple cross-sectional mapping learned by Ridge.

   This is consistent with the low R² values observed for LSTM/GRU in the deep-learning notebook.
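
To make the contrast between a cross-sectional input and a sequence input concrete, the sketch below builds both from the same feature matrix. The sizes (100 hours, 5 features, 24-hour window) and the use of fixed sliding windows are assumptions for illustration. The windowing actually used for the LSTM/GRU experiments is defined in the deep-learning notebook and may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n_hours, n_features, window = 100, 5, 24  # assumed sizes
rng = np.random.default_rng(0)
features = rng.normal(size=(n_hours, n_features))

# Cross-sectional input (Ridge-style): one row of current-hour features per sample
X_tabular = features

# Sequence input (LSTM/GRU-style): each sample holds the last `window` hours
windows = sliding_window_view(features, window_shape=window, axis=0)
X_seq = windows.transpose(0, 2, 1)  # -> (samples, timesteps, features)

print(X_tabular.shape)  # one vector of 5 values per sample
print(X_seq.shape)      # 24 x 5 = 120 values per sample
```

Each sequence sample carries 24 times as many inputs as the tabular one. If most of the signal sits in the final timestep, as the importance tables suggest, the additional inputs mainly give the model more room to fit noise.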

---

## 5. Key Takeaways

- The Illinois LMP forecasting task in this project is largely driven by a **small number of current hub price features**, supported by load and short-term history.
- A simple, well-regularized **Ridge Regression** model is sufficient to capture most of the useful signal in the engineered feature set.
- More complex sequence models (LSTM/GRU) are unlikely to yield better performance unless the problem is re-framed to genuinely depend on long-term temporal structure.

In other words, the **feature importance results support the modeling conclusions** from the deep-learning notebook:
> Invest in good **current-hour features** and simple models, rather than in very complex sequence architectures for this particular LMP dataset.
