# DECIMA2 FEATURE IMPORTANCES TAKE UNDER 1/10 THE TIME OF SHAP

In this notebook we train a Random Forest Regressor on the California housing dataset provided by the SHAP library. We then generate Decima2 and SHAP feature importance explanations for this model and show that the Decima2 explanations take less than a tenth of the time of the SHAP explanations.

We first import the relevant libraries.

In [1]:
import time

import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

from decima2 import model_feature_importance
from decima2.utils.utils import feature_names

We then download the California dataset, train a Random Forest Regressor on it, and score the model on the held-out test split.

In [2]:
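# Sample 5,000 rows of the California housing data bundled with SHAP
X, y = shap.datasets.california(n_points=5000)
# Hold out 20% of the rows; the explanations below are computed on this split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
model = RandomForestRegressor(max_depth=100, random_state=42)
model.fit(X_train, y_train)
# R^2 of the fitted model on the held-out split
model.score(X_test, y_test)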

0.7939460881429082

We then generate our Decima2 feature importances on the test split and print out the time taken.

In [3]:
# perf_counter is a monotonic clock intended for measuring elapsed time
st = time.perf_counter()
explanation_app = model_feature_importance(X_test, y_test, model, output='text')
et = time.perf_counter()
print(f"Decima2 explanations took {et - st} seconds to run")
explanation_app

Decima2 explanations took 1.913661003112793 seconds to run
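
A single wall-clock measurement like the one above can shift from run to run with system load and caching. As a rough sketch only, a timing could be repeated and summarised as below. `time_call` is a hypothetical helper introduced here for illustration (it is not part of Decima2 or SHAP), and the workload is a stand-in for the explanation call.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Run fn() `repeats` times; return (last result, durations in seconds).

    Hypothetical helper for illustration only.
    """
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = fn()
        durations.append(time.perf_counter() - start)
    return result, durations


# Stand-in workload. In the notebook this would be the explanation call, e.g.
# lambda: model_feature_importance(X_test, y_test, model, output='text')
result, durations = time_call(lambda: sum(i * i for i in range(100_000)), repeats=5)
print(f"median {statistics.median(durations):.4f}s, min {min(durations):.4f}s")
```

Reporting the median or minimum over several runs makes a comparison between two methods less sensitive to one unusually slow or fast run.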


Index,Feature,Importance
0,MedInc,1.61204
5,AveOccup,1.10874
1,HouseAge,1.05748
6,Latitude,0.94145
7,Longitude,0.82292
4,Population,0.76992
2,AveRooms,0.75381
3,AveBedrms,0.7475


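We then generate our SHAP explanations on the same test split and print out the time taken. The timing includes building the explainer as well as computing the SHAP values.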

In [4]:
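# perf_counter is a monotonic clock intended for measuring elapsed time
st = time.perf_counter()
# X_test is passed both as the background data and as the rows to explain
explainer = shap.Explainer(model, X_test)
shap_values = explainer(X_test, check_additivity=False)
et = time.perf_counter()
print(f"SHAP explanations took {et - st} seconds to run")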



SHAP explanations took 27.751878023147583 seconds to run
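
To make the comparison concrete, the two reported timings can be turned into a speedup factor. The sketch below simply copies the numbers printed by the two cells above; they come from a single run each, so the ratio is indicative rather than exact.

```python
# Timings copied from the two cell outputs above (seconds, one run each).
decima2_seconds = 1.913661003112793
shap_seconds = 27.751878023147583

speedup = shap_seconds / decima2_seconds   # how many times faster Decima2 was
fraction = decima2_seconds / shap_seconds  # Decima2 time as a share of SHAP time

print(f"Decima2 was {speedup:.1f}x faster, taking {fraction:.1%} of the SHAP time")
```

For these two measurements the ratio works out to roughly 14.5, so Decima2 took a little under 7% of the SHAP time.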


In [5]:
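# Average the signed SHAP values over the test rows: one score per feature
attributions = shap_values.values.mean(axis=0)
attributions = attributions.reshape(X_test.shape[1])
# Pair each score with its column name and sort by importance
feature_names(X_test, attributions)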

Index,Feature,Importance
0,MedInc,0.12332
6,Latitude,0.07649
7,Longitude,0.0277
2,AveRooms,0.01672
1,HouseAge,0.01238
3,AveBedrms,0.00348
5,AveOccup,0.00317
4,Population,0.00146
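
The cell above averages the signed SHAP values. A common alternative for a global importance score is the mean absolute SHAP value, because positive and negative contributions can cancel in a signed mean. The sketch below illustrates the difference on made-up numbers (they are not taken from this notebook's SHAP output, and the feature names are hypothetical).

```python
# Toy per-sample attributions for two hypothetical features.
# These numbers are invented for illustration only.
toy_attributions = {
    "feature_a": [0.5, -0.5, 0.4, -0.4],  # strong effect, alternating sign
    "feature_b": [0.1, 0.1, 0.1, 0.1],    # weak effect, always positive
}


def signed_mean(values):
    """Plain average: opposite-signed contributions cancel out."""
    return sum(values) / len(values)


def mean_abs(values):
    """Average magnitude: measures how much a feature moves predictions."""
    return sum(abs(v) for v in values) / len(values)


signed = {name: signed_mean(v) for name, v in toy_attributions.items()}
absolute = {name: mean_abs(v) for name, v in toy_attributions.items()}

print("signed mean:  ", signed)
print("mean absolute:", absolute)
```

Here the signed mean ranks `feature_b` first, while the mean absolute value ranks `feature_a` first. In the notebook, the analogous change would be to take `abs` of `shap_values.values` before averaging over `axis=0`, which would likely change the SHAP table above.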


From this example we can see that both explanation methods agree on the most important feature (MedInc) for this dataset and model, although they order the remaining features differently. In this run, Decima2 took about 1.9 seconds against about 27.8 seconds for SHAP, which is less than 1/10 of the time!
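
How far the two rankings agree can be checked directly from the tables above. The sketch below copies the two feature orderings and computes whether the top feature matches, plus a Spearman rank correlation over all eight features. `spearman_rho` is a small helper written here for illustration and assumes both lists rank the same items with no ties.

```python
# Feature orderings copied from the two tables above, most important first.
decima2_order = ["MedInc", "AveOccup", "HouseAge", "Latitude",
                 "Longitude", "Population", "AveRooms", "AveBedrms"]
shap_order = ["MedInc", "Latitude", "Longitude", "AveRooms",
              "HouseAge", "AveBedrms", "AveOccup", "Population"]


def spearman_rho(order_a, order_b):
    """Spearman rank correlation for two rankings of the same items (no ties)."""
    rank_b = {name: i for i, name in enumerate(order_b)}
    n = len(order_a)
    # Sum of squared rank differences across all items
    d_squared = sum((i - rank_b[name]) ** 2 for i, name in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n**2 - 1))


top_feature_matches = decima2_order[0] == shap_order[0]
rho = spearman_rho(decima2_order, shap_order)
print(f"same top feature: {top_feature_matches}, Spearman rho: {rho:.2f}")
```

On these two tables the top feature matches, while the rank correlation over all eight features is only about 0.36, so the agreement shown here concerns the most important feature rather than the full ranking.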