pred_grid() breaks when using output from caret::train() unless all explanatory vars are cast as factors #86

Closed
alexkrolak opened this issue Feb 6, 2019 · 4 comments

@alexkrolak

After fighting with partial() for the greater part of today, I've come to realize that pred_grid(), which is called by partial(), expects all of the explanatory variables to be cast as factors.

I created a binary classifier with caret's train() function and tested partial() on all of the example cases, which worked fine. However, the only way I could get it to work with my actual data was to cast all explanatory variables to factors before running caret::train() and then pass the result to partial(). Even when I tried the "cats" argument, I kept getting the same error messages (below). It seems that the "cats" argument is ignored for a data.frame/data.table, and I don't know whether this is intentional. Perhaps it ought to be usable for these classes as well? I'm also not sure whether you expect all qualitative predictor variables to be factors already. Ideally that would not be required, and the cats argument could be used in this sort of situation.

From partial's documentation:
Character string indicating which columns of train should be treated as categorical variables. Only used when train inherits from class "matrix" or "dgCMatrix".

wc_med_fit_new_test$trainingData %>% class
[1] "data.table" "data.frame"

wc_med_fit_new_test %>% class
[1] "train" "train.formula"

partial(wc_med_fit_new_test, pred.var = "age_cut", cats="factor")
Error in seq.default(from = min(y, na.rm = TRUE), to = max(y, na.rm = TRUE), :
'from' must be a finite number

partial(wc_med_fit_new_test, pred.var = "age_cut", cats="character")
Error in seq.default(from = min(y, na.rm = TRUE), to = max(y, na.rm = TRUE), :
'from' must be a finite number

@bgreenwell
Owner

@alexkrolak Thank you for reporting the issue! It is difficult to say what the specific cause is without a reproducible example; would you mind posting one? In any case, it looks like you're using the cats argument incorrectly (perhaps the documentation could be improved here). The cats argument specifies which of the column names listed in pred.var should be treated as categorical when they are not factors. For instance:

df <- data.frame(x1 = 1:3, x2 = c("a", "b", "a"), x3 = 5:7, stringsAsFactors = FALSE)

# This fails because "x2" is character, not a factor
pdp:::pred_grid(df, pred.var = "x2")
# Error in seq.default(from = min(y, na.rm = TRUE), to = max(y, na.rm = TRUE),  : 
#   'from' must be a finite number
# In addition: Warning message:
# In seq.default(from = min(y, na.rm = TRUE), to = max(y, na.rm = TRUE),  :

# This works
pdp:::pred_grid(df, pred.var = "x2", cats = "x2")
#   x2
# 1  a
# 2  b

So I suspect the following should work for you:

partial(wc_med_fit_new_test, pred.var = "age_cut", cats="age_cut")

If this is the case, what is the class of column "age_cut"? If it is character, this should be an easy fix to avoid having to specify the cats argument.
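As an alternative to passing cats, the pre-casting workaround described in the original report can be sketched in base R. This is only a minimal illustration, assuming the categorical predictors are stored as character columns; it reuses the toy df from the example above, and the helper name chr_cols is hypothetical. It is not part of pdp.

```r
# Toy data from the example above; "x2" is a character column.
df <- data.frame(x1 = 1:3, x2 = c("a", "b", "a"), x3 = 5:7,
                 stringsAsFactors = FALSE)

# Flag the character columns (chr_cols is a hypothetical helper name).
chr_cols <- vapply(df, is.character, logical(1))

# Convert only those columns to factors; numeric columns are left untouched.
df[chr_cols] <- lapply(df[chr_cols], factor)

str(df)
```

Doing the conversion before calling caret::train() means the model and the grid see the same factor columns, so the cats argument should not be needed for those variables.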

@bgreenwell
Owner

Also, it looks like the cats argument in partial() never got passed to pred_grid(). I just pushed a fix to the dev version on GitHub. Let me know if it still does not work for you.

@alexkrolak
Author

Thanks! The dev version of pdp that passes cats into pred_grid() has helped.

Also, changing my "cats" argument to the actual column names in the function call helped.

I'm not sure, but it's possible that my factor variables are being recoded somewhere within the partial() call: I'm getting really extreme partial effects for some of my variables, some much larger than I'd expect based on the logistic regression's coefficients and odds ratios.

@bgreenwell
Owner

bgreenwell commented Feb 9, 2019

partial() doesn’t recode anything, but it’s difficult to say for certain without an example. If you can post one in a new issue, please do and I’ll be sure to look into it ASAP.
