<center>
    <h1 id='permutation-importance' style='color:#7159c1'>🔎 Permutation Importance 🔎</h1>
    <i>What features in the data did the model think are most important?</i>
</center>

---

Feature Importance is the process of finding out which features have the biggest impact on predictions. There are several techniques to calculate Feature Importance, such as `Permutation Importance`, which has the following advantages compared to other techniques:

- fast to calculate;
- widely used and understood;
- consistent with properties we would want a feature importance measure to have.

To use this technique, the model MUST BE FITTED first. The technique works by shuffling a single column of the dataset and checking how that shuffle affects the accuracy of the predictions. Think about Permutation Importance like this:

> *"If I randomly shuffle a single column of the VALIDATION data, leaving the target and all other columns in place, how would that affect the accuracy of predictions in that now-shuffled data?"*

The bigger the difference between the performance of the original predictions and the performance of the new predictions (after shuffling the column), the more important the feature is!
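The idea above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not eli5's implementation: `permutation_importance_sketch`, `r2` and `predict` are hypothetical helpers, the toy data is synthetic, and `predict` stands in for an already fitted model. Each column is shuffled several times, and the mean and standard deviation of the score drop are reported, mirroring the weight ± spread format that eli5 prints.

```python
import numpy as np


def permutation_importance_sketch(predict, X, y, metric, n_repeats=5, seed=0):
    """Return mean and std of the score drop when each column of X is shuffled.

    `predict` is a fitted model's prediction function and `metric` is a
    higher-is-better score such as R^2. Illustrative helper, not eli5's code.
    """
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))  # score on the untouched validation data
    means, stds = [], []
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Shuffle one column only; the target and all other columns stay in place
            X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
            drops.append(baseline - metric(y, predict(X_shuffled)))
        means.append(np.mean(drops))
        stds.append(np.std(drops))
    return np.array(means), np.array(stds)


def r2(y_true, y_pred):
    """Coefficient of determination (higher is better)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Toy data: column 0 matters a lot, column 1 a little, column 2 not at all
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]


def predict(data):
    # Stand-in for a fitted model that learned the true relationship
    return 3.0 * data[:, 0] + 0.5 * data[:, 1]


means, stds = permutation_importance_sketch(predict, X, y, r2)
for col, (m, s) in enumerate(zip(means, stds)):
    print(f"column {col}: {m:.4f} ± {s:.4f}")
```

Column 2 is never used by `predict`, so shuffling it leaves the score unchanged and its importance is exactly 0, while column 0 shows by far the largest drop.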

```python
# ---- Importations ----
import pandas as pd # pip install pandas
from sklearn.model_selection import train_test_split # pip install scikit-learn
from sklearn.ensemble import RandomForestRegressor
import eli5 # pip install eli5
from eli5.sklearn import PermutationImportance

# ---- Preparing Dataset ----
autos_df = pd.read_csv('./datasets/autos.csv')
autos_df = autos_df.select_dtypes(exclude='object') # drop text (object) columns

X = autos_df.copy()
y = X.pop('price') # 'price' is the target; X keeps the remaining columns

# 70% of the rows for training, 30% for validation
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y
    , train_size=0.70
    , test_size=0.30
    , random_state=20242301
)

# ---- Preparing the Model ----
random_forest_model = RandomForestRegressor(
    n_estimators=100
    , random_state=20242301
)

# Permutation Importance needs an already fitted model
random_forest_model.fit(X_train, y_train)
```

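```python
# ---- Calculating Permutation Importance ----
# No `scoring` argument: eli5 uses the estimator's own `score` method (R² for a regressor)
permutation_importance = PermutationImportance(
    random_forest_model
    , random_state=20242301
)

# Shuffle each validation column and measure the drop in score
permutation_importance.fit(X_valid, y_valid)

eli5.show_weights(
    permutation_importance
    , feature_names=X_valid.columns.tolist()
)
```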

| Weight | Feature |
| --- | --- |
| 0.7247 ± 0.0799 | engine_size |
| 0.1215 ± 0.0223 | curb_weight |
| 0.0256 ± 0.0164 | highway_mpg |
| 0.0225 ± 0.0038 | horsepower |
| 0.0139 ± 0.0056 | length |
| 0.0083 ± 0.0046 | width |
| 0.0079 ± 0.0045 | city_mpg |
| 0.0059 ± 0.0056 | peak_rpm |
| 0.0040 ± 0.0029 | num_of_cylinders |
| 0.0032 ± 0.0028 | stroke |


---

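The first number in each row shows how much model performance decreased after randomly shuffling that column. Since no `scoring` argument was passed, the performance metric here is the model's default `score`, which for a `RandomForestRegressor` is R² (accuracy is a classification metric). The number after the ± measures how performance varied from one reshuffling to the next (it works like an error margin).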

You'll occasionally see negative values for permutation importances. In those cases, the predictions on the shuffled (or noisy) data happened to be more accurate than the predictions on the original data. This happens when the feature didn't matter (it should have had an importance close to 0), but random chance made the predictions on the shuffled data more accurate. This is more common with small datasets, like the one in this example, because there is more room for luck.
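scikit-learn also ships its own implementation, `sklearn.inspection.permutation_importance`, which returns every individual shuffle result, so you can see where the mean and the ± spread come from. The sketch below is an assumption-laden stand-in rather than a rerun of this notebook: it uses synthetic data from `make_regression` instead of the autos dataset and passes `scoring="r2"` explicitly, so the numbers it prints are not meant to match the table above. Its three non-informative columns are where small or even negative values are most likely to show up.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the autos data: 5 features, only 2 of them informative
X, y = make_regression(n_samples=400, n_features=5, n_informative=2, noise=5.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each validation column n_repeats times and record the drop in R^2
result = permutation_importance(model, X_valid, y_valid, scoring="r2", n_repeats=5, random_state=0)

# result.importances holds one row per feature and one column per shuffle
for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature_{idx}: {result.importances_mean[idx]:.4f} ± {result.importances_std[idx]:.4f}")
```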

Permutation Importance is also a good way to check whether the features you created during the Feature Engineering step are actually important to the model.
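A minimal sketch of that check, assuming a hypothetical target driven by the interaction of two synthetic columns, `x0` and `x1`: the engineered column `x0_times_x1` is added during feature engineering, and permutation importance (here scikit-learn's `permutation_importance`, swapped in for eli5) shows whether the model actually relies on it.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
df = pd.DataFrame({"x0": rng.uniform(-1, 1, 600), "x1": rng.uniform(-1, 1, 600)})
target = df["x0"] * df["x1"]  # hypothetical target driven by an interaction

# Feature-engineering step: add the interaction as its own column
df["x0_times_x1"] = df["x0"] * df["x1"]

X_train, X_valid, y_train, y_valid = train_test_split(df, target, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# Default scoring is the model's own score method (R^2 for a regressor)
result = permutation_importance(model, X_valid, y_valid, n_repeats=5, random_state=0)
importances = pd.Series(result.importances_mean, index=df.columns).sort_values(ascending=False)
print(importances)
```

If an engineered feature lands near the bottom of such a ranking, the model is probably not using it, and it may be a candidate for removal.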

---

<h1 id='reach-me' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>📫 | Reach Me</h1>

> **Email** - [csfelix08@gmail.com](mailto:csfelix08@gmail.com?)

> **Linkedin** - [linkedin.com/in/csfelix/](https://www.linkedin.com/in/csfelix/)

> **GitHub** - [CSFelix](https://github.com/CSFelix)

> **Kaggle** - [DSFelix](https://www.kaggle.com/dsfelix)

> **Portfolio** - [CSFelix.io](https://csfelix.github.io/)