pred_grid() breaks when using output from caret::train() unless all explanatory vars are cast as factors #86
Comments
@alexkrolak Thank you for reporting the issue! It is difficult to say what the specific cause is without a reproducible example; would you mind posting one? In any case, it looks like you're using the
df <- data.frame(x1 = 1:3, x2 = c("a", "b", "a"), x3 = 5:7, stringsAsFactors = FALSE)
# This fails because "x2" is a character column, not a factor
pdp:::pred_grid(df, pred.var = "x2")
# Error in seq.default(from = min(y, na.rm = TRUE), to = max(y, na.rm = TRUE), :
# 'from' must be a finite number
# In addition: Warning message:
# In seq.default(from = min(y, na.rm = TRUE), to = max(y, na.rm = TRUE), :
# This works
pdp:::pred_grid(df, pred.var = "x2", cats = "x2")
# x2
# 1 a
# 2 b
So I suspect the following should work for you:
partial(wc_med_fit_new_test, pred.var = "age_cut", cats = "age_cut")
If this is the case, what is the class of column |
Also, it looks like the |
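To answer the question about column classes, one could inspect the training data directly. The following is a minimal base-R sketch using the toy data frame from the example above; the variable names `col_classes` and `char_cols` are illustrative and not part of pdp. It lists each column's class and converts any character columns to factors.

```r
# Toy data from the example above; x2 is a character column.
df <- data.frame(x1 = 1:3, x2 = c("a", "b", "a"), x3 = 5:7,
                 stringsAsFactors = FALSE)

# Report the first class of every column to spot non-factor categoricals.
col_classes <- vapply(df, function(col) class(col)[1], character(1))
print(col_classes)

# Convert character columns to factors, leaving the other columns untouched.
char_cols <- names(df)[vapply(df, is.character, logical(1))]
df[char_cols] <- lapply(df[char_cols], factor)

# x2 should now be reported as a factor.
print(vapply(df, function(col) class(col)[1], character(1)))
```

With real training data, the same conversion would presumably be applied before fitting the model, which is the workaround described in the original report.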
Thanks! The dev version of pdp, which passes cats into pred_grid(), has helped. Changing my "cats" argument to the actual column names in the function call also helped. I'm not sure, but it's possible my factor variables are being recoded somewhere within the partial() call: I'm getting really extreme partial effects for some of my variables, some much larger than I'd expect based on the logistic model's coefficients and their odds ratios. |
partial() doesn’t recode anything, but it’s difficult to say for certain without an example. If you can post one in a new issue, please do and I’ll be sure to look into it ASAP. |
After fighting with partial() for the greater part of today, I've come to realize that pred_grid(), which is called by partial(), expects all of the explanatory variables to be cast as factors.
I created a binary classifier with caret's train() function and tested partial() on all of the example cases, which worked fine. However, the only way I could get it to work with my actual data was to cast all explanatory variables to factors before running caret::train() and then pass the result to partial(). Even when I tried the "cats" argument, I kept getting the same error messages (below). It seems that the "cats" argument is ignored for a data.frame or data.table, and I don't know if this is intentional. Perhaps it ought to be usable for these classes as well? I'm also not sure whether you expect all qualitative predictor variables to be factors already. Ideally that would not be required, and the cats argument could be used in this situation.
From partial's documentation:
Character string indicating which columns of train should be treated as categorical variables. Only used when train inherits from class "matrix" or "dgCMatrix".
partial(wc_med_fit_new_test, pred.var = "age_cut", cats="factor")
Error in seq.default(from = min(y, na.rm = TRUE), to = max(y, na.rm = TRUE), :
'from' must be a finite number
partial(wc_med_fit_new_test, pred.var = "age_cut", cats="character")
Error in seq.default(from = min(y, na.rm = TRUE), to = max(y, na.rm = TRUE), :
'from' must be a finite number
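The error message itself can be reproduced in base R without pdp. The sketch below assumes, based only on the call shown in the error, that a numeric grid is built by passing the column's minimum and maximum to seq(); the variable names are illustrative. For a character column, min() returns a string, which seq() cannot coerce to a finite number.

```r
# A character column like the one in the toy example.
x2 <- c("a", "b", "a")

# min() and max() work on character vectors, but seq() cannot build a
# numeric sequence from them; capture the error message instead of stopping.
result <- tryCatch(
  suppressWarnings(seq(from = min(x2, na.rm = TRUE),
                       to = max(x2, na.rm = TRUE),
                       length.out = 3)),
  error = function(e) conditionMessage(e)
)
print(result)

# For a factor, the natural grid is the set of levels rather than a sequence.
x2_factor <- factor(x2)
grid <- levels(x2_factor)
print(grid)
```

This is consistent with the report that the call succeeds once the variable is a factor, or is named explicitly in cats, since no numeric range is needed in that case.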