make pdp and cp work with NA in data #120

hbaniecki · 2020-07-11T23:03:19Z

pbiecek · 2020-07-28T18:45:22Z

I could not find a reproducible example.
@hbaniecki would you check if this is solved?

I've checked this with

library("DALEX")
library("ingredients")
library("randomForest")

model_titanic_glm <- randomForest(survived ~ gender + age + fare,
                        data = na.omit(titanic_imputed))
titanic_imputed[2:1000,2] = NA
explain_titanic_glm <- explain(model_titanic_glm,
                              data = titanic_imputed[,-8],
                              y = titanic_imputed[,8],
                              verbose = FALSE)
pdp_glm <- partial_dependence(explain_titanic_glm,
                             N = 25, variables = c("age", "fare","sibsp"),
                             variable_splits = list(age = seq(0,100,0.1), fare = c(0:100), sibsp=0:10))
plot(pdp_glm)

hbaniecki · 2020-07-28T18:53:43Z

I guess that after the fix it works

library("DALEX")
library("ingredients")
library("randomForest")

model_titanic_glm <- randomForest(survived ~ gender + age + fare,
                                  data = na.omit(titanic_imputed))
titanic_imputed[2:1000,2] = NA
explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed[,-8],
                               y = titanic_imputed[,8],
                               verbose = FALSE)
pdp_glm <- partial_dependence(explain_titanic_glm,
                              N = 25, variables = c("age", "fare","sibsp"))
#, variable_splits = list(age = seq(0,100,0.1), fare = c(0:100), sibsp=0:10))
plot(pdp_glm)

pbiecek · 2020-07-28T19:21:48Z

thanks

p-schaefer · 2023-03-09T19:42:26Z

Hi there,

I'm wondering if there is some way of making conditional and accumulated dependence plots work with NAs, e.g.:

library("DALEX")
library("ingredients")
library("randomForest")

model_titanic_glm <- randomForest(survived ~ gender + age + fare,
                                  data = na.omit(titanic_imputed))
titanic_imputed[2:1000,2] = NA
explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed[,-8],
                               y = titanic_imputed[,8],
                               verbose = FALSE)
pdp_glm <- conditional_dependence(explain_titanic_glm,
                              N = 25, variables = c("age", "fare","sibsp"))
#, variable_splits = list(age = seq(0,100,0.1), fare = c(0:100), sibsp=0:10))
plot(pdp_glm)

Thanks

hbaniecki · 2023-03-09T20:48:45Z

Hi, what is your goal? PD/ALE rely on estimating expected predictions with respect to data distribution.

Did you consider removing observations without age (with NAs) from data to estimate the explanation of age?
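
A minimal sketch of that suggestion, assuming the goal is to explain age only: drop the rows where age is missing before building the explainer, so the profile is estimated on observed values. The model, the 200 artificially missing ages, and the names titanic_na, observed_age, and cd_age are illustrative and not taken from this thread.

```r
library("DALEX")
library("ingredients")
library("randomForest")

set.seed(1)

# Fit on the complete data, as in the examples above.
model <- randomForest(survived ~ gender + age + fare,
                      data = titanic_imputed)

# Illustrative assumption: 200 randomly chosen passengers have no recorded age.
titanic_na <- titanic_imputed
titanic_na[sample(nrow(titanic_na), 200), "age"] <- NA

# Keep only rows where age is observed; column 8 is the target (survived).
observed_age <- titanic_na[!is.na(titanic_na$age), ]

explainer <- explain(model,
                     data = observed_age[, -8],
                     y = observed_age[, 8],
                     verbose = FALSE)

# The profile for age is now aggregated over rows with a known age only.
cd_age <- conditional_dependence(explainer, variables = "age")
plot(cd_age)
```

The same filtering would have to be repeated per variable if several columns contain NAs, since each profile would then be estimated on a different subset of rows.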

p-schaefer · 2023-03-14T15:49:27Z

Sorry, this was a bad example. I was piggybacking on the example from this thread. In doing more testing with reasonable numbers of NAs, I see that conditional_dependence() does work with NAs:

library("DALEX")
library("ingredients")
library("randomForest")

model_titanic_glm <- randomForest(survived ~ gender + age + fare,
                                  data = na.omit(titanic_imputed))

toNA<-sample(1:1000,10)

titanic_imputed[toNA,] = NA
explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed[,-8],
                               y = titanic_imputed[,8],
                               verbose = FALSE)
pdp_glm <- conditional_dependence(explain_titanic_glm,
                                  N = 25, variables = c("age", "fare","sibsp"))
#, variable_splits = list(age = seq(0,100,0.1), fare = c(0:100), sibsp=0:10))
plot(pdp_glm)

Unfortunately, in my significantly larger and more complicated models, I'm running into issues related to missing values where the aggregated profiles aren't being calculated. When I impute the missing values, there are no issues, but I can't seem to recreate the problem with a simpler dataset/model. Do you know of any situations where aggregating profiles fails because of NAs? There are no cases where an entire column is NA, unlike in my previous example.

hbaniecki added the invalid ❕ label Jul 11, 2020

pbiecek self-assigned this Jul 22, 2020

pbiecek added a commit that referenced this issue Jul 28, 2020

candidate fix for #120

4b71dc2

pbiecek closed this as completed Jul 28, 2020
