🐛 in autoplot with factor followed by numeric #79

Closed
bgreenwell opened this issue Aug 27, 2018 · 5 comments

bgreenwell (Owner) commented Aug 27, 2018

# Load required packages
library(ggplot2)
library(pdp)
library(randomForest)

# Load Boston housing data
data(boston)

# Fit a random forest model
set.seed(101)
boston.rf <- randomForest(cmedv ~ ., data = boston)

# Two predictor PDP (factor/numeric)
boston.rf %>%
  partial(pred.var = c("chas", "lstat"), chull = TRUE) %>%
  autoplot(contour = TRUE, main = "factor/numeric")

[image: autoplot output for the factor/numeric PDP]

bgreenwell (Owner, Author) commented

FYI numeric followed by factor seems to work fine!

bgreenwell changed the title 🐛 in autoplot with factor followed by numeric 🐛 in autoplot with factor followed by numeric Aug 27, 2018

bfgray3 commented Aug 30, 2018

library(ggplot2)
library(pdp)
suppressPackageStartupMessages(library(randomForest))

data(boston)

set.seed(101)
boston.rf <- randomForest(cmedv ~ ., data = boston)

p1 <- partial(boston.rf, pred.var = c("chas", "lstat"), chull = TRUE)
p2 <- partial(boston.rf, pred.var = c("lstat", "chas"), chull = TRUE)

dplyr::all_equal(p1, p2)
#> [1] TRUE

# autoplot.R -> autoplot.partial -> ggplot_two_predictor_pdp
# line 373 if block broken, line 402 if block good

ggplot(p1, aes(x = p1[[2L]], y = p1[["yhat"]])) + geom_line() + facet_wrap(~ p1[[1L]])

ggplot(p2, aes(x = p2[[1L]], y = p2[["yhat"]])) + geom_line() + facet_wrap(~ p2[[2L]])

ggplot(p1, aes(x = lstat, y = yhat)) + geom_line() + facet_wrap(~ chas)

ggplot(p2, aes(x = lstat, y = yhat)) + geom_line() + facet_wrap(~ chas)

ggplot(p1, aes_string(x = "lstat", y = "yhat")) + geom_line() + facet_wrap("chas")

ggplot(p2, aes_string(x = "lstat", y = "yhat")) + geom_line() + facet_wrap("chas")

Created on 2018-08-30 by the reprex package (v0.2.0).

Somehow ggplot is getting confused by the subsetting with [[. I believe this should be fixable with aes_string after picking out the column names instead of using integer indices (I know you are wary of ggplot2 3.0.0 and tidyeval for use in packages). I could make a PR in the next week or so if it would be helpful.
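
A minimal sketch of that idea, assuming the output has exactly one factor predictor, one numeric predictor, and a yhat column. The helper name two_predictor_lines() and the toy data frame pd are hypothetical stand-ins, not pdp code; the point is only that the column names are resolved once and passed as strings, so column order stops mattering.

```r
library(ggplot2)

# Toy stand-in for partial() output: factor column first, then numeric, then yhat.
pd <- expand.grid(chas = factor(c(0, 1)), lstat = seq(2, 30, length.out = 25))
pd$yhat <- 30 - 0.5 * pd$lstat + ifelse(pd$chas == "1", 1, 0)

# Hypothetical helper: pick the predictor columns by name and type,
# then hand the names (not integer indices) to aes_string()/facet_wrap().
two_predictor_lines <- function(x) {
  pred_names <- setdiff(names(x), "yhat")
  is_fac <- vapply(x[pred_names], is.factor, logical(1))
  stopifnot(length(pred_names) == 2L, sum(is_fac) == 1L)
  fac_name <- pred_names[is_fac]
  num_name <- pred_names[!is_fac]
  ggplot(x, aes_string(x = num_name, y = "yhat")) +
    geom_line() +
    facet_wrap(fac_name)
}

# Same plot regardless of whether the factor or the numeric column comes first.
two_predictor_lines(pd)
two_predictor_lines(pd[c("lstat", "chas", "yhat")])
```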

bgreenwell (Owner, Author) commented Aug 31, 2018

@bfgray3 thanks for the thorough reprex!! I've been toying with ggplot2 using [[ outside of pdp but cannot reproduce the error, so I'm not sure where the bug truly lies, but I suspect ggplot2 (since plotPartial()/lattice works just fine in this example). If that's the case, I'd submit an issue there. For now, maybe we can use aes_string() only for the factor/numeric case? I'm also not opposed to using tidyeval, just haven't had the time to learn it 😔. Happy to take any PR with a fix, even if temporary!
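
For reference, the tidyeval route could look roughly like the sketch below, which uses the .data pronoun and vars() available in ggplot2 3.0.0 and later. The toy data frame pd_toy and the variables fac_name/num_name are hypothetical; this is an illustration of the approach, not the fix that went into pdp.

```r
library(ggplot2)

# Toy stand-in for partial() output with the factor column first (hypothetical data).
pd_toy <- expand.grid(chas = factor(c(0, 1)), lstat = seq(2, 30, length.out = 25))
pd_toy$yhat <- 30 - 0.5 * pd_toy$lstat + ifelse(pd_toy$chas == "1", 1, 0)

# Column names held as strings, e.g. taken from names() of the partial() output.
fac_name <- "chas"
num_name <- "lstat"

# .data[[...]] looks each column up by name inside the plot data,
# instead of capturing an external vector such as p1[[2L]].
p <- ggplot(pd_toy, aes(x = .data[[num_name]], y = .data[["yhat"]])) +
  geom_line() +
  facet_wrap(vars(.data[[fac_name]]))
p
```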

bfgray3 commented Aug 31, 2018

I agree the issue is likely with ggplot.

library(ggplot2)
library(pdp)
suppressPackageStartupMessages(library(randomForest))

data(boston)

set.seed(101)
boston.rf <- randomForest(cmedv ~ ., data = boston)

p1 <- partial(boston.rf, pred.var = c("chas", "lstat"), chull = TRUE)
class(p1)
#> [1] "data.frame" "partial"
class(p1) <- "data.frame"
str(p1)
#> 'data.frame':    102 obs. of  3 variables:
#>  $ chas : Factor w/ 2 levels "0","1": 1 2 1 2 1 2 1 2 1 2 ...
#>  $ lstat: num  1.73 1.73 2.45 2.45 3.18 ...
#>  $ yhat : num  31.1 31.5 31.1 31.5 31.1 ...
ggplot(p1, aes(x = p1[[2L]], y = p1[["yhat"]])) + geom_line() + facet_wrap(~ p1[[1L]])

Created on 2018-08-31 by the reprex package (v0.2.0).

In the meantime I'll see what I can do with aes_string.

bfgray3 commented Aug 31, 2018

These plots look good though. This is a real noodle scratcher.

library(ggplot2)

data(iris)

dat1 <- iris[c("Species", "Sepal.Length", "Sepal.Width")]
dat2 <- iris[c("Sepal.Length", "Species", "Sepal.Width")]

ggplot(dat1, aes(x = dat1[[2L]], y = dat1[["Sepal.Width"]])) + geom_line() + facet_wrap(~ dat1[[1L]])

ggplot(dat2, aes(x = dat2[[1L]], y = dat2[["Sepal.Width"]])) + geom_line() + facet_wrap(~ dat2[[2L]])

Created on 2018-08-31 by the reprex package (v0.2.0).
