DALEXtra


Overview

The DALEXtra package is an extension pack for the DALEX package. It provides easy-to-use connectors for models created with scikit-learn, keras, H2O, mljar and mlr.

Installation

# Install the development version from GitHub:
# install.packages("devtools")

# It is recommended to install the latest version of DALEX from GitHub first
devtools::install_github("ModelOriented/DALEX")

devtools::install_github("ModelOriented/DALEXtra")

The reticulate package will be installed along with DALEXtra, but if you want its latest version it can be installed from GitHub:

devtools::install_github("rstudio/reticulate")

Other packages useful for creating explanations:

devtools::install_github("ModelOriented/ingredients")
devtools::install_github("ModelOriented/iBreakDown")
devtools::install_github("ModelOriented/shapper")
devtools::install_github("ModelOriented/auditor")

The packages above can be used together with the explainer object to create explanations (ingredients, iBreakDown, shapper) or to audit the model (auditor).

How to setup Anaconda

In order to use some features associated with DALEXtra, Anaconda is needed. The easiest way to get it is to visit the Anaconda website and choose the installer for your operating system. There is no big difference between Python versions when downloading Anaconda; you can always create a virtual environment with any version of Python, no matter which version was downloaded first.

Windows

On Windows the crucial thing is to add conda to the PATH environment variable. You can do it during installation by ticking the corresponding checkbox, or, if conda is already installed, by adding the conda directories to PATH manually afterwards.

Unix

On Unix-like systems, adding conda to PATH is not required.
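
Regardless of the operating system, it is worth checking from R that conda can actually be found. The following is a minimal sketch using the reticulate package (installed together with DALEXtra); the environment name "myenv" and the package list in the commented call are purely illustrative.

library(reticulate)

# Locate the conda binary reticulate will use; this errors if conda cannot be found
conda_binary()

# List the conda environments that already exist
conda_list()

# Optionally create a fresh environment (illustrative name and packages)
# conda_create("myenv", packages = c("python=3.7", "scikit-learn", "pandas", "numpy"))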

Demo

Here we present a short use case for the package and its compatibility with Python.

Loading data

First we need to provide the data; an explainer is useless without it. The thing is that a Python object does not store its training data, so a dataset always has to be provided. Feel free to use the datasets attached to the DALEX package or those stored in the DALEXtra files.

titanic_test <- read.csv(system.file("extdata", "titanic_test.csv", package = "DALEXtra"))

Keep in mind that the data frame includes the target variable (the 18th column), which scikit-learn models cannot work with, so it has to be excluded from the data passed for prediction.
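
For clarity, the predictors and the target can be split up front. This is a minimal sketch; the helper objects titanic_X and titanic_y are only illustrative, and the calls below keep using titanic_test directly.

# Columns 1-17 are predictors, column 18 (survived) is the target
titanic_X <- titanic_test[, 1:17]
titanic_y <- titanic_test$survived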

Creating explainer

Creating an explainer from a scikit-learn Python model is very simple thanks to DALEXtra. The only thing you need to provide is the path to the pickle file and, if necessary, something that identifies the Python environment. It may be a .yml file with a package specification, the name of an existing conda environment, or the path to a Python virtual environment. Calling explain_scikitlearn with only the .pkl file and the data will use the default Python installation.

library(DALEXtra)
explainer <- explain_scikitlearn(system.file("extdata", "scikitlearn.pkl", package = "DALEXtra"),
                                 yml = system.file("extdata", "testing_environment.yml", package = "DALEXtra"),
                                 data = titanic_test[, 1:17], y = titanic_test$survived)
## Preparation of a new explainer is initiated
##   -> model label       :  scikitlearn_model  ( default )
##   -> data              :  524  rows  17  cols 
##   -> target variable   :  524  values 
##   -> predict function  :  yhat.scikitlearn_model  will be used ( default )
##   -> predicted values  :  numerical, min =  0.02086126 , mean =  0.288584 , max =  0.9119996  
##   -> residual function :  difference between y and yhat ( default )
##   -> residuals         :  numerical, min =  -0.8669431 , mean =  0.02248468 , max =  0.9791387  
##   -> model_info        :  model_info not specified, assuming type: regression ( default ) 
##   -> model_info        :  package: Model of class: sklearn.ensemble.gradient_boosting.GradientBoostingClassifier package unrecognized Unknown ( default ) 
##  A new explainer has been created!
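
If a Python environment already exists on the machine, the explainer can point at it directly instead of recreating one from a .yml file. The sketch below is only illustrative: the argument names condaenv and env are assumptions based on the options listed above, and "myenv" as well as the virtualenv path are placeholders, so check ?explain_scikitlearn for the exact interface.

# Assumed interface: point at an existing conda environment by name
# explainer2 <- explain_scikitlearn(system.file("extdata", "scikitlearn.pkl", package = "DALEXtra"),
#                                   condaenv = "myenv",
#                                   data = titanic_test[, 1:17], y = titanic_test$survived)

# Assumed interface: point at a Python virtual environment by path
# explainer3 <- explain_scikitlearn(system.file("extdata", "scikitlearn.pkl", package = "DALEXtra"),
#                                   env = "~/python_venvs/sklearn_venv",
#                                   data = titanic_test[, 1:17], y = titanic_test$survived)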

Creating explanations

Now, with the explainer ready, we can use any of the DrWhy.ai universe tools to create explanations. Here is a small demo.

library(DALEX)
plot(model_performance(explainer))

library(ingredients)
plot(feature_importance(explainer))

describe(feature_importance(explainer))
## The number of important variables for scikitlearn_model's prediction is 2 out of 17. 
##  Variables gender.female, gender.male have the highest importantance.
library(iBreakDown)
plot(break_down(explainer, titanic_test[2,1:17]))

describe(break_down(explainer, titanic_test[2,1:17]))
## Scikitlearn_model predicts, that the prediction for the selected instance is 0.132 which is lower than the average model prediction.
##  
## The most important variables that decrease the prediction are class.3rd, gender.female. 
## The most important variable that increase the prediction is age.
##  
## Other variables are with less importance. The contribution of all other variables is -0.108 .
library(shapper)
plot(shap(explainer, titanic_test[2,1:17]))

library(auditor)
eval <- model_evaluation(explainer)
plot_roc(eval)

# Predictions with newdata
predict(explainer, titanic_test[1:10, 1:17])
##  [1] 0.3565896 0.1321947 0.7638813 0.1037486 0.1265221 0.2949228 0.1421281
##  [8] 0.1421281 0.4154695 0.1321947
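
To eyeball these scores against the observed labels, the predictions can be bound with the target column; a minimal sketch:

# Compare predicted survival scores with the observed target for the first 10 rows
data.frame(predicted = predict(explainer, titanic_test[1:10, 1:17]),
           observed  = titanic_test$survived[1:10])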

Acknowledgments

Work on this package was financially supported by the ‘NCN Opus grant 2016/21/B/ST6/02176’.
