D1 and D2 incorrectly remove additional interaction terms if main effects are in different order #420
Thanks for your question. This is expected behaviour and is related to the internal model formula handling in
The likelihood ratio test
@stefvanbuuren I propose that we add some lines to help of
All the best, Gerko

library(mice)
gen_dat <- function(n) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  x3 <- rnorm(n)
  y <- rnorm(n)
  x1[rbinom(n, 1, 0.1) == 1] <- NA
  x2[rbinom(n, 1, 0.1) == 1] <- NA
  x3[rbinom(n, 1, 0.1) == 1] <- NA
  data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
}
set.seed(1)
dat <- gen_dat(100)
# check default model comparison consistency
fit <- lm(y ~ x1*x2*x3, data = dat)
fit0 <- lm(y~x1 + x2 + x3 + (x1 + x2 + x3)^2, data = dat)
fit2 <- lm(y~x1 + x3 + x2 + (x1 + x2 + x3)^2, data = dat)
anova(fit, fit0, test = "LRT")
#> Analysis of Variance Table
#>
#> Model 1: y ~ x1 * x2 * x3
#> Model 2: y ~ x1 + x2 + x3 + (x1 + x2 + x3)^2
#> Res.Df RSS Df Sum of Sq Pr(>Chi)
#> 1 61 46.723
#> 2 62 47.907 -1 -1.1841 0.2137
anova(fit, fit2, test = "LRT")
#> Analysis of Variance Table
#>
#> Model 1: y ~ x1 * x2 * x3
#> Model 2: y ~ x1 + x3 + x2 + (x1 + x2 + x3)^2
#> Res.Df RSS Df Sum of Sq Pr(>Chi)
#> 1 61 46.723
#> 2 62 47.907 -1 -1.1841 0.2137
# Name differences
names(coef(fit0)) %in% names(coef(fit))
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
names(coef(fit2)) %in% names(coef(fit))
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE
coef(fit)
#> (Intercept) x1 x2 x3 x1:x2 x1:x3
#> -0.07553160 -0.15939596 -0.09645861 -0.07801288 -0.42659727 -0.21350269
#> x2:x3 x1:x2:x3
#> 0.13094054 -0.17232897
coef(fit2)
#> (Intercept) x1 x3 x2 x1:x2 x1:x3
#> -0.05249388 -0.15748075 -0.04227937 -0.07545730 -0.46607063 -0.16750329
#> x3:x2
#> 0.15378880
# Use D3 - Likelihood Ratio Test
imp <- mice(dat, printFlag = FALSE)
fit <- with(imp, lm(y ~ x1*x2*x3))
fit0 <- with(imp, lm(y~x1 + x2 + x3 + (x1 + x2 + x3)^2))
fit2 <- with(imp, lm(y~x1 + x3 + x2 + (x1 + x2 + x3)^2))
D3(fit, fit0)
#> test statistic df1 df2 dfcom p.value riv
#> 1 ~~ 2 0.6261685 1 18.43791 92 0.4388258 0.8718629
D3(fit, fit2)
#> test statistic df1 df2 dfcom p.value riv
#> 1 ~~ 2 0.6261685 1 18.43791 92 0.4388258 0.8718629

Created on 2021-07-16 by the reprex package (v2.0.0)
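The name mismatch in the reprex above (`x3:x2` in the reordered null model versus `x2:x3` in the full model) can be illustrated without mice. The following is a minimal sketch, not part of mice: `canonical_terms()` is a hypothetical helper that sorts the variables inside each interaction label so that the two spellings compare as equal. The coefficient names are copied from the `coef()` output above. It assumes that variable names do not themselves contain `:`.

```r
# Hypothetical helper (not a mice function): sort the components of each
# interaction label so that "x3:x2" and "x2:x3" map to the same string.
canonical_terms <- function(term_names) {
  parts <- strsplit(term_names, ":", fixed = TRUE)
  vapply(parts, function(p) paste(sort(p), collapse = ":"), character(1))
}

# Coefficient names copied from coef(fit) and coef(fit2) in the reprex.
full_names <- c("(Intercept)", "x1", "x2", "x3",
                "x1:x2", "x1:x3", "x2:x3", "x1:x2:x3")
null_names <- c("(Intercept)", "x1", "x3", "x2",
                "x1:x2", "x1:x3", "x3:x2")

# Raw comparison: the last term is not matched by name.
null_names %in% full_names

# Comparison after canonicalising: every null term is found in the full model.
canonical_terms(null_names) %in% canonical_terms(full_names)
```

A comparison on raw names treats `x3:x2` as absent from the full model, which is consistent with the `FALSE` in the reprex output; after sorting within each label the null model's terms are all matched.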
Thank you for the explanation. That makes sense, but I'm surprised this non-nesting isn't detected and doesn't give an error or warning. Something like,
But, since the other code to test if models are nested seems to come from

Thanks for noting this issue. Now added a warning.
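The warning added to mice is not shown in this thread. As a rough illustration of the kind of pre-check a user could run before comparing models, here is a sketch under stated assumptions: `check_coef_names_nested()` is a hypothetical helper, not the mice implementation, and it is applied to ordinary `lm()` fits on complete simulated data rather than to multiply imputed fits.

```r
# Hypothetical pre-check (not the warning implemented in mice): warn when a
# coefficient of the null model cannot be found by name in the full model.
check_coef_names_nested <- function(full, null) {
  missing_terms <- setdiff(names(coef(null)), names(coef(full)))
  if (length(missing_terms) > 0) {
    warning("Null model terms not found by name in the full model: ",
            paste(missing_terms, collapse = ", "), call. = FALSE)
  }
  invisible(missing_terms)
}

# Complete simulated data, so the example runs with base R only.
set.seed(1)
n <- 100
dat <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

full <- lm(y ~ x1 * x2 * x3, data = dat)
null_same_order <- lm(y ~ x1 + x2 + x3 + (x1 + x2 + x3)^2, data = dat)
null_reordered <- lm(y ~ x1 + x3 + x2 + (x1 + x2 + x3)^2, data = dat)

check_coef_names_nested(full, null_same_order)  # silent
check_coef_names_nested(full, null_reordered)   # warns about x3:x2
```

A simple workaround that follows from this is to write the main effects of the null model in the same order as in the full model, so that the interaction labels agree.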
If we use the D1 or D2 functions to test a null model including interaction terms against a full model, but the null model has the main effects entered in a different order, then the D1 or D2 functions will give incorrect results. The results given are instead tests of the full model vs the null model with additional interaction terms removed.