# Illustrating the `teller` (v0.2.0)

This notebook illustrates the use of the [`teller`](https://github.com/thierrymoudiki/teller), a model-agnostic tool for Machine Learning explainability. Version `0.2.0` improves the interface and introduces significance tests for marginal effects.

Currently, the `teller` can be installed from GitHub as:

In [0]:
pip install git+https://github.com/thierrymoudiki/teller.git

The demo uses the Boston Housing dataset. The response is MEDV, the median value of owner-occupied homes in $1000’s. The columns used are:



- CRIM per capita crime rate by town
- ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS proportion of non-retail business acres per town
- CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX nitric oxides concentration (parts per 10 million)
- RM average number of rooms per dwelling
- AGE proportion of owner-occupied units built prior to 1940
- DIS weighted distances to five Boston employment centres
- RAD index of accessibility to radial highways
- TAX full-value property-tax rate per $10,000
- PTRATIO pupil-teacher ratio by town
- LSTAT % lower status of the population
- MEDV Median value of owner-occupied homes in $1000’s (the __response__)


In [0]:
import teller as tr
import pandas as pd
import numpy as np

from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


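# import data (note: load_boston was removed in scikit-learn 1.2,
# so this cell needs an older scikit-learn release)
boston = datasets.load_boston()
# drop column 11 (feature 'B') from the covariates
X = np.delete(boston.data, 11, 1)
y = boston.target
# feature names without 'B', with the response name 'MEDV' appended
col_names = np.append(np.delete(boston.feature_names, 11), 'MEDV')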
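
To see which column the two `np.delete` calls remove, here is a minimal sketch. It assumes a hard-coded copy of the 13 Boston feature names (typed out here rather than loaded from scikit-learn) and a small placeholder array, `toy_data`, standing in for `boston.data`; both names are illustrative only.

```python
import numpy as np

# Assumed copy of boston.feature_names, hard-coded for illustration
feature_names = np.array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                          'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'])

# Placeholder data: 3 rows, one column per feature (values are arbitrary)
toy_data = np.arange(3 * 13, dtype=float).reshape(3, 13)

dropped = feature_names[11]          # name of the column being removed
X_toy = np.delete(toy_data, 11, 1)   # remove column 11 along axis 1
col_names_toy = np.append(np.delete(feature_names, 11), 'MEDV')

print(dropped)              # B
print(X_toy.shape)          # (3, 12)
print(col_names_toy[:-1])   # the 12 covariate names, as later passed to X_names
```

This is why the list of columns above has 12 covariates plus MEDV, and why `col_names[:-1]` is used whenever only covariate names are needed.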


Split data into a training and a testing set:

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, 
                                                    random_state=123)
print(X_train.shape)
print(X_test.shape)


(404, 12)
(102, 12)
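
The two shapes follow from how `train_test_split` sizes the sets: with a fractional `test_size`, scikit-learn rounds the test count up and assigns the remaining rows to training. A quick arithmetic check, with the 506 rows of the Boston data hard-coded as an assumption:

```python
import math

n_samples = 506   # rows in the Boston Housing data
test_size = 0.2

n_test = math.ceil(test_size * n_samples)   # 101.2 rounds up to 102
n_train = n_samples - n_test                # the remaining 404 rows

print(n_train, n_test)   # 404 102
```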


In [0]:
# fit a random forest model 
regr2 = RandomForestRegressor(n_estimators=1000, random_state=123)
regr2.fit(X_train, y_train)


# creating the explainer
expr = tr.Explainer(obj=regr2)


# fitting the explainer (for heterogeneity of effects only)
expr.fit(X_test, y_test, X_names=col_names[:-1], method="avg")


# heterogeneity of effects
print(expr.summary())
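
To give an idea of what an average marginal effect is, here is a minimal sketch. It is not the `teller` implementation: it assumes effects are approximated by central finite differences of the model's predictions, and the names `marginal_effects`, `toy_predict` and the step size `h` are hypothetical.

```python
import numpy as np

def marginal_effects(predict, X, h=1e-4):
    """Per-observation central finite differences of predict w.r.t. each column."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    effects = np.empty((n, p))
    for j in range(p):
        step = np.zeros(p)
        step[j] = h   # perturb only feature j
        effects[:, j] = (predict(X + step) - predict(X - step)) / (2 * h)
    return effects

# Toy model with known derivatives: f(x) = 2*x0 - 3*x1 + x0*x1
def toy_predict(X):
    return 2 * X[:, 0] - 3 * X[:, 1] + X[:, 0] * X[:, 1]

rng = np.random.default_rng(123)
X_toy = rng.normal(size=(50, 2))

effects = marginal_effects(toy_predict, X_toy)   # shape (50, 2)
avg_effects = effects.mean(axis=0)               # one average effect per feature
print(avg_effects)
```

For this toy function the effect of `x0` is `2 + x1` and the effect of `x1` is `-3 + x0`, so effects vary across observations; that variation is what a heterogeneity summary describes. For piecewise-constant models such as random forests, the result of a scheme like this depends on the step size.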

In [12]:
# confidence int. and tests on effects (Jackknife)
expr.fit(X_test, y_test, X_names=col_names[:-1], method="ci")

print(expr.summary())
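
As a rough illustration of a jackknife standard error, the sketch below applies leave-one-out resampling to the mean of some simulated per-observation effects. It is a generic textbook jackknife under stated assumptions, not the `teller` code: the function name `jackknife_mean_se` and the simulated `effects_toy` are hypothetical, and the interval uses a normal 1.96 quantile as a simplification (a Student t quantile would be slightly wider).

```python
import numpy as np

def jackknife_mean_se(values):
    """Jackknife estimate and standard error of the mean of `values`."""
    values = np.asarray(values, dtype=float)
    n = values.size
    # leave-one-out means: the mean of the sample with observation i removed
    loo_means = (values.sum() - values) / (n - 1)
    estimate = loo_means.mean()
    se = np.sqrt((n - 1) / n * np.sum((loo_means - estimate) ** 2))
    return estimate, se

rng = np.random.default_rng(123)
# hypothetical per-observation effects for one feature (102 test rows)
effects_toy = rng.normal(loc=1.5, scale=2.0, size=102)

est, se = jackknife_mean_se(effects_toy)
t_stat = est / se                                  # statistic for H0: effect = 0
lower, upper = est - 1.96 * se, est + 1.96 * se    # approximate 95% interval
print(est, se, t_stat, lower, upper)
```

For the mean, the jackknife standard error coincides with the usual `std / sqrt(n)` formula, which makes this case easy to check by hand.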




Residuals: 
     Min      1Q  Median      3Q     Max
-20.7672 -1.1802 -0.2857  1.0939  8.8958


Tests on marginal effects (Jackknife): 
          Estimate   Std. Error   95% lbound   95% ubound     Pr(>|t|)     
LSTAT     -11.6629      4.21614     -20.0266     -3.29925   0.00674382   **
PTRATIO   -5.83981      1.01549     -7.85428     -3.82534  9.51649e-08  ***
INDUS     -3.45108     0.865402     -5.16781     -1.73436  0.000126284  ***
TAX     -0.0527896   9.7629e-16   -0.0527896   -0.0527896            0  ***
CHAS             0  2.22045e-16 -4.40477e-16  4.40477e-16            1     
AGE       0.983501    0.0202223     0.943385      1.02362  6.71929e-72  ***
ZN         1.05412   0.00505945      1.04409      1.06416  7.3924e-135  ***
NOX        1.42195      11.3752     -21.1434      23.9873     0.900769     
DIS        2.03424  1.78522e-14      2.03424      2.03424            0  ***
RAD          18.61     0.929718      16.7657      20.4543   6.2148e-37  ***
RM         28.8366      1.