calculate_variable_profile coerces integers to numerics #145
Comments
Hi Simon, thanks for this report. A quick workaround is to state the splits explicitly:
# ok
ingredients::ceteris_paribus(
explainer_rf,
explainer_rf$data,
variable_splits = list(Year_Built=unique(vip_train$Year_Built))
)
An error occurs due to the default:
# error
ingredients::ceteris_paribus(
explainer_rf,
explainer_rf$data,
variable_splits = ingredients:::calculate_variable_split.default(explainer_rf$data, variables=c("Year_Built"))
)
# float, not an integer
ingredients:::calculate_variable_split.default(explainer_rf$data, variables=c("Year_Built"))
Fixing this issue requires adding
which would lead to treating integer features like categorical features with unique().
# ok
ingredients::ceteris_paribus(
explainer_rf,
explainer_rf$data,
variable_splits = list(Year_Built=unique(vip_train$Year_Built))
)
@pbiecek what do you think?
Thanks for tracking down this tricky error! Treating an integer as a categorical variable is a good idea, as long as it doesn't have too many different levels (e.g. if someone has a column of IDs with 10000 different values in it, that would kill our profile calculation).
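To make the idea above concrete, here is a minimal base-R sketch of a split-point helper that keeps integer columns as integers and caps the number of levels. It is not the code in ingredients: the function name `split_points_for()`, its arguments, the `max_levels` default, and the rounded-quantile fallback are all assumptions made for illustration.

```r
# Hypothetical helper, not part of ingredients. The name, arguments and
# the max_levels default are assumptions for illustration only.
split_points_for <- function(x, grid_points = 101, max_levels = 200) {
  probs <- seq(0, 1, length.out = grid_points)

  if (is.integer(x)) {
    values <- sort(unique(x))
    if (length(values) <= max_levels) {
      # Few enough distinct values: treat like a categorical variable.
      # sort(unique()) preserves the integer type.
      return(values)
    }
    # Too many distinct values (e.g. an ID column): fall back to
    # quantiles, rounded and cast back to integer (an assumed policy).
    q <- quantile(x, probs = probs, names = FALSE, na.rm = TRUE)
    return(unique(as.integer(round(q))))
  }

  if (is.numeric(x)) {
    # Doubles: quantile-based grid, stays double.
    return(unique(quantile(x, probs = probs, names = FALSE, na.rm = TRUE)))
  }

  # Factors and characters: every observed level.
  sort(unique(x))
}

years <- c(2001L, 1999L, 2001L, 2010L)
year_splits <- split_points_for(years)   # integer vector of observed years
id_splits <- split_points_for(1:10000)   # capped, still integer
```

With a cap like this, a column such as Year_Built keeps its observed integer values, while a column with thousands of distinct values is reduced to at most `grid_points` integer split points instead of one profile point per value.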
I implemented the fix, and it actually still fails ungracefully in the above scenario, because there are 113 unique year values. This got me thinking that with categorical variables, we don't have a threshold on how many unique values there should be. We can either:
great idea,
The tidymodels team recently introduced support for finer-grained numeric classes in recipes. A user recently pointed out on our community forums that this introduced issues with model_profile() in some cases. Here's a reprex:
Created on 2022-12-05 with reprex v2.0.2
The issue arises here, where the numeric split_points are dropped into the (possibly) integer variable:
ingredients/R/calculate_variable_profile.R
Line 39 in a44ad39
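The coercion described above can be reproduced in base R without any modelling packages. The sketch below is not the package's code: the toy data frame and the variable names are assumptions, and it only shows how quantile-based split points, which are doubles, change the type of an integer column once they are assigned into it.

```r
# Toy data with an integer column (hypothetical values, not from the issue).
obs <- data.frame(Year_Built = c(1990L, 2005L), Lot_Area = c(8000, 9500))

# Quantile-based split points are doubles even when the input is integer.
split_points <- quantile(obs$Year_Built, probs = c(0, 0.5, 1), names = FALSE)

# Replicate the first observation once per split point, then overwrite
# the profiled variable with the split points.
profile <- obs[rep(1, length(split_points)), ]
profile$Year_Built <- split_points

type_before <- typeof(obs$Year_Built)      # column type in the original data
type_after  <- typeof(profile$Year_Built)  # column type in the profile data
```

A model or recipe that checks for an integer column will then see a double in the profile data, which is the kind of mismatch the reprex runs into.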