ALE doesn't preserve factor level order in the output #82

Closed
hbaniecki opened this issue Dec 9, 2019 · 5 comments
Labels
before release 📌 TODO before release

Comments

@hbaniecki
Member

hbaniecki commented Dec 9, 2019

Consider the following code:

library("DALEX")
library("randomForest")

d2 <- titanic_imputed
levels(d2$gender) <- c("male" ,"female")
model_titanic_rf <- randomForest(survived ~ .,  data = d2)

explain_titanic_rf <- explain(model_titanic_rf,
                              data = d2[,-8],
                              y = d2[,8])

selected_passengers <- select_sample(d2, n = 100)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passengers)

ale <- aggregate_profiles(cp_rf, variables = "gender", type = "accumulated", variable_type = "categorical")

ale$`_x_`
levels(d2$gender)
plot(ale)

I deliberately changed the factor levels to showcase the problem. The _x_ column should probably be ordered according to levels(d2$gender), since the level order matters in the computation. The same may apply to all aggregate_profiles outputs.

@hbaniecki
Member Author

hbaniecki commented Dec 9, 2019

Ceteris Paribus does preserve the level order:

library("DALEX")
# smaller data, quicker example
titanic_small <- select_sample(titanic_imputed, n = 500, seed = 1313)
levels(titanic_small$gender) <- c("male", "female")
# build a model
model_titanic_glm <- glm(survived ~ gender + age + fare,
                         data = titanic_small,
                         family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_small[,-8],
                               y = titanic_small[,8],
                               verbose = FALSE)

cp_rf <- ceteris_paribus(explain_titanic_glm, titanic_small[1,])
levels(cp_rf$gender)


@pbiecek
Member

pbiecek commented Dec 9, 2019

It's not that easy, as _x_ holds the levels of all factor variables.
These levels are stored as characters, not as factors, and characters have no order.

Maybe the easiest solution would be to add some postprocessing that converts the characters into factors with an order derived from the input.
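
A minimal sketch of the postprocessing idea above, in base R only. The helper name order_by_input_levels and the toy data are hypothetical and not part of DALEX or ingredients; the sketch assumes the values arrive as a character vector (like _x_) and that the original data frame with its factor columns is still available.

```r
# Hypothetical helper (an illustration, not the package's implementation):
# turn the character values in `x` into a factor whose levels follow the
# level order of the factor columns in `data`.
order_by_input_levels <- function(x, data) {
  # collect the levels of every factor column, in column order
  is_fct <- vapply(data, is.factor, logical(1))
  input_levels <- unique(unlist(lapply(data[is_fct], levels), use.names = FALSE))
  # values not found in any factor column go last, in order of appearance
  extra <- setdiff(unique(x), input_levels)
  factor(x, levels = c(input_levels[input_levels %in% x], extra))
}

# toy data with non-alphabetical level orders (assumed for illustration)
d <- data.frame(
  gender   = factor(c("male", "female", "male"), levels = c("male", "female")),
  embarked = factor(c("S", "C", "Q"), levels = c("S", "Q", "C"))
)
x <- c("C", "female", "male", "Q", "S")  # alphabetical, as a character column would sort
ordered_x <- order_by_input_levels(x, d)
levels(ordered_x)
```

One caveat of this sketch: if two variables share a level name, the shared name appears only once in the combined levels, so a per-variable treatment may be needed in that case.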

@hbaniecki
Member Author

Yes, this should be the best solution.

@pbiecek
Member

pbiecek commented Dec 10, 2019

CP is working fine:

> levels(cp_rf$gender)
[1] "male"   "female"

The problem is in aggregate_profiles.

@hbaniecki
Member Author

hbaniecki commented Dec 10, 2019

Sure, I meant that CP is working fine. We could just sort the _x_ column within the corresponding groups, but keep in mind that not all of the columns have to be factors (non-factors could use unique(x) order, as that is probably what the computation uses). Never mind, maybe they do have to be factors.

library("DALEX")
library("randomForest")

d2 <- titanic_imputed
levels(d2$gender) <- c("male" ,"female")
model_titanic_rf <- randomForest(survived ~ .,  data = d2)

explain_titanic_rf <- explain(model_titanic_rf,
                              data = d2[,-8],
                              y = d2[,8])

selected_passengers <- select_sample(d2, n = 100)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passengers)

ale <- aggregate_profiles(cp_rf, variables = c("gender","embarked"), type = "accumulated", variable_type = "categorical")

ale$`_x_`
c(levels(d2$gender),levels(d2$embarked))

levels(c("b", "a")) #NULL
unique(c("b", "a"))
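
The last two lines can be expanded into a small base R sketch of why character columns lose the order: a character vector has no levels, factor() sorts the levels by default, and passing unique(x) as the levels keeps the order of first appearance. The vector x is a made-up example.

```r
x <- c("b", "a", "b")

no_levels      <- levels(x)                              # NULL: characters carry no levels
default_levels <- levels(factor(x))                      # factor() sorts the unique values
seen_levels    <- levels(factor(x, levels = unique(x)))  # order of first appearance

default_levels
seen_levels
```

So for non-factor columns, unique(x) could serve as the reference order, assuming the computation also processes values in that order.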

@hbaniecki hbaniecki added the before release 📌 TODO before release label Dec 11, 2019
pbiecek added a commit that referenced this issue Dec 12, 2019
pbiecek added a commit that referenced this issue Dec 12, 2019
pbiecek added a commit that referenced this issue Dec 16, 2019