<a href="https://colab.research.google.com/github/aeshwin10/XAI/blob/main/Dalex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Libraries and data**

In [None]:
!pip install dalex



In [None]:
import dalex as dx
import xgboost as xgb
import pandas as pd

In [None]:
#Load the Titanic dataset that ships with the dalex library.
#The goal is to find which passenger characteristics are associated with survival.
data = dx.datasets.load_titanic()
data.head()

Unnamed: 0,gender,age,class,embarked,fare,sibsp,parch,survived
0,male,42.0,3rd,Southampton,7.11,0,0,0
1,male,13.0,3rd,Southampton,20.05,0,2,0
2,male,16.0,3rd,Southampton,20.05,1,1,0
3,female,39.0,3rd,Southampton,20.05,1,1,1
4,female,16.0,3rd,Southampton,7.13,0,0,1


**Data Preparation**

One-hot encoding (creating dummy variables) converts categorical variables into numerical features that machine learning algorithms can process. With `drop_first = True`, the first level of each categorical column is dropped and acts as the reference category.
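As a small illustration of the encoding, the sketch below applies `pd.get_dummies` to a toy three-row frame (the frame is an assumption made up for this example, not part of the notebook's data). With `drop_first=True`, the `female` level is dropped and only `gender_male` remains.

```python
import pandas as pd

# Toy frame, made up only to illustrate the encoding
toy = pd.DataFrame({"age": [42.0, 39.0, 13.0],
                    "gender": ["male", "female", "male"]})

# Categorical columns become indicator columns; numeric columns pass through unchanged
encoded = pd.get_dummies(toy, drop_first=True)
print(encoded.columns.tolist())
```

A row with `gender_male` equal to 0 (or `False`, depending on the pandas version) is therefore a female passenger.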



In [None]:
#transform into dummies.
data = pd.get_dummies(data, drop_first = True)

In [None]:
#isolate X and Y
y = data.survived
X = data.drop(columns = 'survived')

In [None]:
#Create an XGBoost DMatrix
train = xgb.DMatrix(X, label = y)

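**XGBoost**

XGBoost's default objective is regression (`reg:squarederror`), so the parameters below set `binary:logistic` to predict the probability of survival instead.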
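The parameters in this section set the `binary:logistic` objective. Under that objective, the boosted trees produce a raw score on the log-odds scale, which is passed through the logistic function to give a probability. A minimal sketch of that transformation, independent of XGBoost (the function name `logistic` is illustrative):

```python
import math

def logistic(margin):
    """Map a raw score (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-margin))

# A raw score of 0 corresponds to even odds
print(logistic(0.0))
# Negative scores map below 0.5, positive scores above
print(logistic(-2.0), logistic(2.0))
```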

In [None]:
#parameters
params = {"objective": "binary:logistic",
          "eval_metric": "auc"}

In [None]:
#XGBoost model
model = xgb.train(params, train)
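The `auc` metric named in the parameters is the area under the ROC curve. It equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition on made-up labels and scores; it illustrates the metric and is not XGBoost's implementation.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: three of the four positive/negative pairs are ordered correctly
value = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(value)
```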

**Dalex Local Interpretability**

In [None]:
#explainer
explainer = dx.Explainer(model, X, y,
                         predict_function = lambda m, d: m.predict(xgb.DMatrix(d)))

Preparation of a new explainer is initiated

  -> data              : 2207 rows 14 cols
  -> target variable   : Parameter 'y' was a pandas.Series. Converted to a numpy.ndarray.
  -> target variable   : 2207 values
  -> model_class       : xgboost.core.Booster (default)
  -> label             : Not specified, model's class short name will be used. (default)
  -> predict function  : <function <lambda> at 0x7a823dec9000> will be used
  -> predict function  : Accepts only pandas.DataFrame, numpy.ndarray causes problems.
  -> predicted values  : min = 0.0476, mean = 0.321, max = 0.961
  -> model type        : 'model_type' not provided and cannot be extracted.
  -> model type        : Some functionalities won't be available.
  -> residual function : difference between y and yhat (default)
  -> residuals         : min = -0.827, mean = 0.00102, max = 0.94
  -> model_info        : package xgboost

A new explainer has been created!


In [None]:
X.head()

Unnamed: 0,age,fare,sibsp,parch,gender_male,class_2nd,class_3rd,class_deck crew,class_engineering crew,class_restaurant staff,class_victualling crew,embarked_Cherbourg,embarked_Queenstown,embarked_Southampton
0,42.0,7.11,0,0,1,0,1,0,0,0,0,0,0,1
1,13.0,20.05,0,2,1,0,1,0,0,0,0,0,0,1
2,16.0,20.05,1,1,1,0,1,0,0,0,0,0,0,1
3,39.0,20.05,1,1,0,0,1,0,0,0,0,0,0,1
4,16.0,7.13,0,0,0,0,1,0,0,0,0,0,0,1


In [None]:
#local interpretability: attribute the prediction for the passenger in row 4 to individual variables
explainer.predict_parts(X.iloc[4,:]).plot()
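`predict_parts` attributes a single prediction to individual variables. One way to build such an attribution is the break-down idea: start from the mean prediction over the data, then fix one variable at a time to the observation's value and record how the mean prediction shifts. The sketch below implements that idea in plain Python. The scoring function `toy_model`, the toy rows, and the variable order are all assumptions for illustration, and this is a simplified version, not dalex's implementation.

```python
from statistics import mean

def toy_model(row):
    # Hypothetical scoring function over (age, fare, is_male); not a fitted model
    age, fare, is_male = row
    return 0.6 - 0.004 * age + 0.002 * fare - 0.3 * is_male

def break_down(predict, rows, observation, order):
    """Fix variables to the observation's values one at a time, in `order`,
    and record the change in the mean prediction after each step."""
    current = [list(r) for r in rows]              # working copy of the data
    baseline = mean(predict(r) for r in current)   # mean prediction before fixing anything
    previous = baseline
    contributions = []
    for j in order:
        for r in current:
            r[j] = observation[j]                  # fix variable j in every row
        now = mean(predict(r) for r in current)
        contributions.append((j, now - previous))
        previous = now
    return baseline, contributions

# Toy rows of (age, fare, is_male), for illustration only
rows = [(42.0, 7.11, 1), (13.0, 20.05, 1), (39.0, 20.05, 0), (16.0, 7.13, 0)]
observation = (16.0, 7.13, 0)
baseline, contributions = break_down(toy_model, rows, observation, order=[2, 0, 1])

print("baseline:", baseline)
for index, shift in contributions:
    print("variable", index, "shift", shift)
```

Once every variable is fixed, all rows equal the observation, so the baseline plus the recorded shifts adds up to the model's prediction for that observation. The size of each shift can depend on the order in which variables are fixed.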