Factor Variables Converted To Numeric Makes Results Less User Friendly #96

Closed
alexanderjwhite opened this issue Sep 12, 2021 · 3 comments · Fixed by #97
Labels
invalid ❕ This doesn't seem right

Comments

@alexanderjwhite

Factors are converted to numerics, so variables in the plots are labeled with their integer code rather than their factor label. I've identified where this occurs; the stack trace is shown below along with example images. nice_pair calls nice_format (shown below), and the as.character call inside nice_format turns a factor into its numeric code.

Result of reprex:
[2 images, not preserved]

Issue isolation
[3 images, not preserved]

reprex

library(tidymodels)
library(modeldata) 
library(DALEXtra)
data(ames)

rf_model <- 
  rand_forest(trees = 1000) %>% 
  set_engine("ranger") %>% 
  set_mode("regression")

rf_wflow <- 
  workflow() %>% 
  add_formula(
    Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type + 
      Latitude + Longitude) %>% 
  add_model(rf_model) 

rf_fit <- rf_wflow %>% fit(data = ames)

exp_train <- ames %>% 
  select(-Sale_Price)

exp_rf <- 
  explain_tidymodels(
    rf_fit, 
    data = exp_train, 
    y = ames$Sale_Price,
    verbose = TRUE
  )

first_obs <- exp_train %>% 
  slice(1)

breakdown <- predict_parts(explainer = exp_rf, new_observation = first_obs, type = "break_down")
breakdown[1:7,]
first_obs$Bldg_Type
first_obs$Neighborhood

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] DALEXtra_2.1.1 DALEX_2.3.0 yardstick_0.0.8 workflowsets_0.1.0
[5] workflows_0.2.3 tune_0.1.5 tidyr_1.1.3 tibble_3.1.1
[9] rsample_0.1.0 recipes_0.1.16 purrr_0.3.4 parsnip_0.1.5
[13] modeldata_0.1.0 infer_0.5.4 ggplot2_3.3.3 dplyr_1.0.6
[17] dials_0.0.9 scales_1.1.1 broom_0.7.6 tidymodels_0.1.3

loaded via a namespace (and not attached):
[1] jsonlite_1.7.2 splines_4.0.2 foreach_1.5.1 prodlim_2019.11.13
[5] vip_0.3.2 assertthat_0.2.1 GPfit_1.0-8 globals_0.14.0
[9] ipred_0.9-11 pillar_1.6.1 backports_1.2.1 lattice_0.20-44
[13] glue_1.4.2 reticulate_1.20 visdat_0.5.3 pROC_1.17.0.1
[17] digest_0.6.27 hardhat_0.1.6 colorspace_2.0-1 Matrix_1.2-18
[21] plyr_1.8.6 timeDate_3043.102 pkgconfig_2.0.3 lhs_1.1.1
[25] DiceDesign_1.9 listenv_0.8.0 ranger_0.12.1 gower_0.2.2
[29] lava_1.6.9 generics_0.1.0 ellipsis_0.3.2 withr_2.4.2
[33] furrr_0.2.3 nnet_7.3-14 cli_3.0.0 survival_3.1-12
[37] magrittr_2.0.1 crayon_1.4.1 future_1.21.0 fansi_0.4.2
[41] parallelly_1.26.1 MASS_7.3-51.6 class_7.3-17 tools_4.0.2
[45] lifecycle_1.0.0 munsell_0.5.0 compiler_4.0.2 rlang_0.4.11
[49] grid_4.0.2 iterators_1.0.13 rstudioapi_0.13 rappdirs_0.3.3
[53] gtable_0.3.0 codetools_0.2-16 DBI_1.1.1 R6_2.5.0
[57] gridExtra_2.3 lubridate_1.7.10 utf8_1.2.1 iBreakDown_2.0.1
[61] parallel_4.0.2 Rcpp_1.0.6 vctrs_0.3.8 rpart_4.1-15
[65] png_0.1-7 tidyselect_1.1.1

@hbaniecki hbaniecki added the invalid ❕ This doesn't seem right label Sep 13, 2021
@hbaniecki
Member

Indeed, this is due to new_observation being a tibble; related:

as.character(factor("a"))
library(tidyr)
as.character(tibble(factor("a")))
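The same behaviour can be reproduced in base R without tibble. The sketch below is only an illustration of the mechanism described above, not code from DALEX or iBreakDown; the variable names are made up for the example. It assumes the relevant difference is that a one-row data frame kept as a data frame (as tibble subsetting always does) is a list of columns, whereas base data.frame subsetting with the default drop = TRUE returns the bare factor.

```r
# A factor with two levels; "b" is stored internally as integer code 2.
f <- factor("b", levels = c("a", "b"))

# On the factor itself, as.character() returns the label.
label_direct <- as.character(f)

# On a list, as.character() converts each length-one element from its
# underlying storage, so the integer code leaks out instead of the label.
label_from_list <- as.character(list(f))

# A one-row, one-column data frame kept as a data frame is also a list.
df <- data.frame(x = f)
label_from_df <- as.character(df[1, "x", drop = FALSE])

# With the default drop = TRUE, a base data.frame returns the bare factor,
# so the label survives.
label_dropped <- as.character(df[1, "x"])

print(c(label_direct, label_from_list, label_from_df, label_dropped))
```

The second and third results are the integer code, which matches the numeric labels seen in the break-down output.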

@alexanderjwhite
Author

alexanderjwhite commented Sep 13, 2021

I see. It works as a data.frame. Is this the intended functionality? This is called by DALEX/DALEXtra which is built to provide tidyverse extensions. tibbles are a central component of tidyverse functionality. A new observation (which frequently will be a multivariable slice of a tibble) will likely be used often here. Wouldn't it be beneficial to provide this robustness?

@hbaniecki
Member

Yes, we shall patch this (probably by updating nice_format).
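One possible shape for such a fix, as a rough sketch only: convert each column to character separately, so that factor labels are preserved whether the input is a data.frame or a tibble. The helper name values_as_character and the example values are hypothetical; this is not the patch that was committed, and the real nice_format may be structured differently.

```r
# Hypothetical helper, not the actual iBreakDown code: convert a one-row
# data frame (or tibble) to character one column at a time.
values_as_character <- function(x) {
  if (is.data.frame(x)) {
    # as.character() applied to each column keeps factor labels.
    vapply(x, function(col) as.character(col)[1], character(1))
  } else {
    as.character(x)
  }
}

# Illustrative one-row observation with a factor and an integer column.
obs <- data.frame(
  Bldg_Type = factor("OneFam", levels = c("Duplex", "OneFam")),
  Gr_Liv_Area = 1656L
)

print(values_as_character(obs))
```

Until a fix is released, passing new_observation as a plain data.frame (for example via as.data.frame()) is the workaround reported earlier in this thread.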

hbaniecki added a commit that referenced this issue Sep 21, 2021
pbiecek pushed a commit that referenced this issue Sep 21, 2021