# Generate Model Interpreter Report with House Price dataset using XAI

This notebook demonstrates how to generate an explanation report using the compiler implemented in the XAI library.


## Motivation
Once the PoC is done (and you know where your data comes from, what it looks like, and what it can predict), the ideal next step is to put your model into production and make it useful for the rest of the business.

Does this sound familiar? Do you also need to answer the questions below before promoting your model to production?
1. _How sure are you that your model is ready for production?_
2. _How can you explain the model's performance in a business context that non-technical management can understand?_
3. _How can you compare newly trained models with existing models when the comparison is done manually at every iteration?_

In the XAI project, our vision is simply to:
1. __Speed up data validation__
2. __Simplify model engineering__
3. __Build trust__  
  
For more details, please refer to our [whitepaper](https://sap.sharepoint.com/sites/100454/ML_Apps/Shared%20Documents/Reusable%20Components/Explainability/XAI_Whitepaper.pdf?csf=1&e=phIUNN&cid=771297d7-d488-441a-8a65-dab0305c3f04)


## Steps
1. Create a model to predict house prices, using the data provided in the [house prices dataset](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)
2. Evaluate the model performance with XAI report

## Credits
1. Pramodh, Manduri <manduri.pramodh@sap.com>

### 1. Perform Model Training

In [1]:
import warnings

from pprint import pprint
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from xgboost import XGBRegressor

#### 1.1. Loading Data and XGB-Model

In [2]:
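# Load the Kaggle training file and drop rows without a target value
data = pd.read_csv('train.csv')
data.dropna(axis=0, subset=['SalePrice'], inplace=True)
y = data.SalePrice

# Keep numeric predictors only; 'Id' is a row identifier, not a feature
X = data.drop(['SalePrice', 'Id'], axis=1).select_dtypes(exclude=['object'])
train_X, test_X, train_y, test_y = train_test_split(X.values, y.values, test_size=0.25)

# Fill missing values with column means learned from the training split only
my_imputer = SimpleImputer()
train_X = my_imputer.fit_transform(train_X)
test_X = my_imputer.transform(test_X)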

my_model = XGBRegressor(n_estimators=1000, 
                        max_depth=5, 
                        learning_rate=0.1, 
                        subsample=0.7, 
                        colsample_bytree=0.8, 
                        colsample_bylevel=0.8, 
                        base_score=train_y.mean(), 
                        random_state=42, seed=42)
hist = my_model.fit(train_X, train_y, 
                    early_stopping_rounds=5, 
                    eval_set=[(test_X, test_y)], eval_metric='rmse', 
                    verbose=100)

[0]	validation_0-rmse:66964.2
Will train until validation_0-rmse hasn't improved in 5 rounds.
Stopping. Best iteration:
[54]	validation_0-rmse:21674.5
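
The `validation_0-rmse` values logged above are root-mean-squared errors on the held-out split. As a minimal sketch of what that metric measures, the snippet below computes it with NumPy on made-up numbers (not values from this dataset); the helper name `rmse` is an assumption for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between two equally sized arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

# Toy prices, for illustration only
toy_true = np.array([200000.0, 150000.0, 320000.0])
toy_pred = np.array([210000.0, 140000.0, 300000.0])
toy_rmse = rmse(toy_true, toy_pred)
print(round(toy_rmse, 2))
```

In the notebook, `rmse(test_y, my_model.predict(test_X))` would be expected to land near the logged value, although whether `predict` uses the best iteration after early stopping depends on the XGBoost version.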



#### 1.2. Review Best and Worst Predictions

In [3]:
test_pred = my_model.predict(test_X)
errors = test_pred - test_y
sorted_errors = np.argsort(abs(errors))
worse_5 = sorted_errors[-5:]
best_5 = sorted_errors[:5]

print(pd.DataFrame({'worse':errors[worse_5]}))
print()
print(pd.DataFrame({'best':errors[best_5]}))

         worse
0  -78879.0625
1  -79771.8125
2  -95349.3125
3   97974.9375
4 -128526.9375

         best
0  -60.687500
1  -95.265625
2   97.875000
3 -135.640625
4 -139.718750
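
The selection above relies on `np.argsort` over absolute errors. A small sketch on toy values (not taken from the dataset) shows how the slicing picks out the extremes; the names `toy_errors`, `best_2`, and `worst_2` are illustrative only.

```python
import numpy as np

# Toy signed errors (prediction minus truth), for illustration only
toy_errors = np.array([-500.0, 20.0, 3000.0, -10.0, 150.0])

# argsort on the absolute values orders indices from smallest to largest error
order = np.argsort(np.abs(toy_errors))

best_2 = order[:2]          # indices of the two smallest absolute errors
worst_2 = order[-2:][::-1]  # two largest, reversed so the worst comes first

print(toy_errors[best_2])
print(toy_errors[worst_2])
```

Note that the notebook's `sorted_errors[-5:]` is not reversed, so in its printed table the largest absolute error appears in the last row rather than the first.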


#### 1.3. Perform LIME (Local Interpretable Model-Agnostic Explanations)

In [4]:
import lime
import lime.lime_tabular
explainer = lime.lime_tabular.LimeTabularExplainer(train_X, feature_names=X.columns, class_names=['SalePrice'], verbose=True, mode='regression')
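
For intuition, LIME explains a single prediction by perturbing the instance, querying the model on the perturbed points, and fitting a proximity-weighted linear surrogate. The sketch below illustrates that idea with NumPy only. It is a simplification under stated assumptions, not the `lime` library's implementation (which, among other things, can discretize features and select a subset of them). The names `local_linear_explanation` and `toy_model`, the Gaussian perturbation, and the kernel width are all assumptions made for this example.

```python
import numpy as np

def local_linear_explanation(predict_fn, x, scale, n_samples=2000,
                             kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around x (LIME-style sketch)."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise scaled per feature
    noise = rng.normal(size=(n_samples, x.shape[0]))
    samples = x + noise * scale
    preds = predict_fn(samples)
    # Weight samples by closeness to x, measured in scaled units
    dist = np.linalg.norm(noise, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted least squares: scale rows by sqrt(weight), then solve
    design = np.hstack([np.ones((n_samples, 1)), samples - x])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(design * sw, preds * sw[:, 0], rcond=None)
    return coef[0], coef[1:]  # local intercept, per-feature slopes

def toy_model(X):
    # Hypothetical black box: linear in feature 0, quadratic in feature 1
    return 3.0 * X[:, 0] + X[:, 1] ** 2

x0 = np.array([1.0, 2.0])
intercept, slopes = local_linear_explanation(toy_model, x0,
                                             scale=np.array([0.1, 0.1]))
print(intercept, slopes)
```

The fitted slopes approximate the model's local sensitivity to each feature around `x0`, which is the kind of per-feature contribution a LIME explanation reports.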

##### Explaining a few of the worst predictions:

In [5]:
# Wrap the imputed training matrix in a DataFrame so the column names stay attached
df1 = pd.DataFrame(data=train_X, columns=X.columns.tolist())

In [6]:
X_train = df1
clf = my_model
clf_fn = my_model.predict
y_train = []
feature_names=X.columns.tolist()
target_names_list =['SalePrice']
pprint(target_names_list)

['SalePrice']


### 2. Invoke the XAI compiler

In [7]:
import os
import json
import sys
sys.path.append('../../../')
from xai.compiler.base import Configuration, Controller

The sklearn.metrics.classification module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.metrics. Anything that cannot be imported from sklearn.metrics is now part of the private API.


#### 2.1 Specify config file

In [8]:
json_config = 'regressor-model-interpreter.json'

#### 2.2 Load and Check config file (before rendering)

In [9]:
with open(json_config) as file:
    config = json.load(file)
pprint(config)

{'content_table': True,
 'contents': [{'desc': 'This section provides the Interpretation of model',
               'sections': [{'component': {'_comment': 'refer to document '
                                                       'section xxxx',
                                           'attr': {'domain': 'tabular',
                                                    'error_analysis_k_value': 5,
                                                    'error_analysis_stats_type': 'average_score',
                                                    'error_analysis_top_value': 15,
                                                    'feature_names': 'var:feature_names',
                                                    'method': 'lime',
                                                    'mode': 'regression',
                                                    'model_interpret_k_value': 5,
                                                    'model_interpret_stats_type': 'top_k',
          

#### 2.3 Initialize compiler controller with config - with locals()

In [10]:
controller = Controller(config=Configuration(config, locals()))
pprint(controller.config)

{'content_table': True,
 'contents': [{'desc': 'This section provides the Interpretation of model',
               'sections': [{'component': {'_comment': 'refer to document '
                                                       'section xxxx',
                                           'attr': {'domain': 'tabular',
                                                    'error_analysis_k_value': 5,
                                                    'error_analysis_stats_type': 'average_score',
                                                    'error_analysis_top_value': 15,
                                                    'feature_names': ['MSSubClass',
                                                                      'LotFrontage',
                                                                      'LotArea',
                                                                      'OverallQual',
                                                                      'OverallCond
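
Comparing the two printouts, the `'var:feature_names'` placeholder in the raw config appears as the actual list of column names once the config has passed through `Configuration(config, locals())`. The sketch below shows one way such a substitution could work. It is hypothetical and not XAI's actual implementation; the function `resolve_vars`, its `prefix` parameter, and the toy config are assumptions made for illustration.

```python
def resolve_vars(node, namespace, prefix="var:"):
    """Recursively replace 'var:<name>' strings with namespace[<name>]."""
    if isinstance(node, dict):
        return {key: resolve_vars(value, namespace, prefix)
                for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_vars(item, namespace, prefix) for item in node]
    if isinstance(node, str) and node.startswith(prefix):
        name = node[len(prefix):]
        if name not in namespace:
            raise KeyError(f"config refers to undefined variable {name!r}")
        return namespace[name]
    return node  # numbers, booleans, and plain strings pass through unchanged

# Toy config and variable, for illustration only
toy_config = {"attr": {"mode": "regression", "feature_names": "var:toy_features"}}
toy_features = ["LotArea", "OverallQual"]
resolved = resolve_vars(toy_config, {"toy_features": toy_features})
print(resolved)
```

Under this reading, variables such as `feature_names` need to exist in the notebook namespace before the controller is created, which is what the cell `In [6]` takes care of.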

#### 2.4 Render report

In [11]:
controller.render()

Interpret 100/1095 samples
Interpret 200/1095 samples
Interpret 300/1095 samples
Interpret 400/1095 samples
Interpret 500/1095 samples
Interpret 600/1095 samples
Interpret 700/1095 samples
Interpret 800/1095 samples
Interpret 900/1095 samples
Interpret 1000/1095 samples
Setting feature_perturbation = "tree_path_dependent" because no background data was given.


### Results

In [12]:
pprint("report generated : %s/housingpricing-regression-model-interpreter-report.pdf" % os.getcwd())
('report generated : '
 '/Users/i062308/Development/Explainable_AI/tutorials/compiler/housingpricing/housingpricing-regression-model-interpreter-report.pdf')
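
The cell above only formats the expected path; it does not check that the file was written. A minimal sketch of such a check follows, where the helper name `report_path` is an assumption and the file name is the one printed above.

```python
import os

def report_path(filename, directory=None):
    """Return the absolute path of a report file and whether it exists."""
    directory = os.getcwd() if directory is None else directory
    path = os.path.join(directory, filename)
    return path, os.path.isfile(path)

path, exists = report_path("housingpricing-regression-model-interpreter-report.pdf")
print(path, "(found)" if exists else "(not found)")
```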
