This issue was moved to a discussion.

Some sample code from JSS paper 2011 can't be run #235

Closed
Bigsealion opened this issue May 23, 2020 · 8 comments

Comments

@Bigsealion

Bigsealion commented May 23, 2020

I read the paper 'mice: Multivariate Imputation by Chained Equations in R' and tried to run the sample code.
My mice package version is: mice 3.9.0 2020-05-14
However, some of the code gives unexpected results, so I would like to know how to run it correctly.

# 1st -----------------------------------------------------------------------
nhanes2.ext <- cbind(nhanes2, bmi.chl = NA)
ini <- mice(nhanes2.ext, max = 0, print = FALSE)
meth <- ini$meth
meth["bmi.chl"] <- "~I((bmi-25)*(chl-200))"
pred <- ini$pred
pred[c("bmi", "chl"), "bmi.chl"] <- 0
imp <- mice(nhanes2.ext, meth = meth, pred = pred, seed = 51600)
head(ini$pad$data, 3)  # -------> It returns NULL for me, unlike in the paper
# 2nd ----------------------------------------------------------------------
ini <- mice(cbind(boys, mat = NA), max = 0, print = FALSE)

meth <- ini$meth
meth["mat"] <- "~I(as.integer(gen) + as.integer(phb) + as.integer(cut(tv,breaks=c(0,3,6,10,15,20,25))))"
meth["bmi"] <- "~I(wgt/(hgt/100)^2)"

pred <- ini$pred
pred[c("bmi", "gen", "phb", "tv"), "mat"] <- 0
pred[c("hgt", "wgt", "hc", "reg"), "mat"] <- 1
pred[c("hgt", "wgt", "hc", "reg"), c("gen", "phb", "tv")] <- 0
pred[c("wgt", "hgt", "hc", "reg"), "bmi"] <- 0
pred[c("gen", "phb", "tv"), c("hgt", "wgt", "hc")] <- 0
pred

post <- ini$post
post["gen"] <- "imp[[j]][p$data$age[!r[,j]]<5,i] <- levels(boys$gen)[1]"
post["phb"] <- "imp[[j]][p$data$age[!r[,j]]<5,i] <- levels(boys$phb)[1]"
post["tv"] <- "imp[[j]][p$data$age[!r[,j]]<5,i] <- 1"
imp <- mice(cbind(boys, mat = NA), pred = pred, meth = meth, post = post, maxit = 10, print = FALSE)

# It returns an error:
# Error in `[<-.data.frame`(`*tmp*`, p$data$age[!r[, j]] < 5, i, value = "G1") : 
# object 'p' not found

In addition, I would like to know what p and r mean in p$data$age[!r[,j]]<5. Thanks!

@stefvanbuuren stefvanbuuren changed the title Some sample code can't be run Some sample code from JSS paper 2011 can't be run May 23, 2020
@stefvanbuuren
Member

Thanks for your interest.

The code you're citing comes from our JSS 2011 paper, which describes mice 2.09. Both errors have the same cause: They rely on the internal pad component to be present in the mids-object, and I admit that this is a hacky way to specify the model.

The pad component was superfluous and complicated development, and was removed in mice 3.0. The price is that the few code blocks that access the pad component directly (like the above code) won't run anymore in mice 3.0 and beyond.
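For readers on mice 3.x, here is a minimal sketch of the first example without the pad component. It assumes the pattern used in Flexible Imputation of Missing Data, where the derived column is computed from the observed data first instead of being created as all NA. The seed is carried over from the question. Note that complete() shows the completed data, not the padded design with dummy columns that ini$pad$data used to show.

```r
library(mice)

# Compute the interaction from the observed values; it is NA wherever
# bmi or chl is missing.
nhanes2.ext <- nhanes2
nhanes2.ext$bmi.chl <- with(nhanes2, (bmi - 25) * (chl - 200))

# Passive imputation: bmi.chl is recomputed from the current imputations.
meth <- make.method(nhanes2.ext)
meth["bmi.chl"] <- "~I((bmi - 25) * (chl - 200))"

# Keep the product term out of the imputation models of its own components.
pred <- make.predictorMatrix(nhanes2.ext)
pred[c("bmi", "chl"), "bmi.chl"] <- 0

imp <- mice(nhanes2.ext, method = meth, predictorMatrix = pred,
            seed = 51600, printFlag = FALSE)

# The mids object has no pad component in mice 3.x.
is.null(imp$pad)

# Inspect the first completed data set instead.
head(complete(imp, 1), 3)
```

Because bmi.chl comes last in the default visit sequence, it is updated after bmi and chl in every iteration.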

Variable p was the internal representation of the pad component, and is not used anymore. r is the internal representation of the response indicator.
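As a hedged sketch of how the post-processing part of the second example might look in mice 3.x: the examples in Flexible Imputation of Missing Data refer to the data frame as data inside the post string, so the only change assumed here is replacing p$data with data. The passive mat and bmi terms from the question are left out to keep the sketch short, and m, maxit and seed are arbitrary choices.

```r
library(mice)

# Dry run to obtain the default (empty) post-processing vector.
ini <- mice(boys, maxit = 0, printFlag = FALSE)
post <- ini$post

# Inside the sampler, r[, j] is TRUE where variable j is observed, so
# !r[, j] picks the rows that receive imputations. `data` replaces the
# old `p$data` (assumption based on the examples in the book).
post["gen"] <- "imp[[j]][data$age[!r[, j]] < 5, i] <- levels(boys$gen)[1]"
post["phb"] <- "imp[[j]][data$age[!r[, j]] < 5, i] <- levels(boys$phb)[1]"
post["tv"]  <- "imp[[j]][data$age[!r[, j]] < 5, i] <- 1"

imp <- mice(boys, m = 1, post = post, maxit = 2, seed = 123,
            printFlag = FALSE)

# Imputed gen values for boys under 5 should all be the lowest level.
young <- boys$age < 5 & is.na(boys$gen)
table(as.character(complete(imp, 1)$gen[young]))
```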

There is plenty of example code for mice 3.0 here that you can study. The code is documented extensively in Flexible Imputation of Missing Data.

Hope this helps.

@stefvanbuuren
Member

Related #43 and #152

@Bigsealion
Author

Bigsealion commented May 24, 2020

Thank you for your reply! It's helpful for me!
In addition, I'd like to ask one last question about the predictorMatrix.
I viewed the log and got the following output:

# df_na is my dataset with 7000+ rows and 68 columns
> imp <- mice(df_na)
> imp$loggedEvents
   it im dep      meth out
1   0  0      constant V47
2   0  0     collinear V35
3   0  0     collinear V34
4   0  0     collinear V33
5   0  0     collinear V32
6   0  0     collinear V36
7   0  0     collinear V31
8   0  0     collinear V37
9   0  0     collinear V38
10  0  0     collinear V39
11  3  4 V51       pmm V50
12  3  4 V52       pmm V50
13  3  5 V45       pmm V50
14  3  5 V46       pmm V50
15  3  5 V51       pmm V50
16  3  5 V52       pmm V50
17  4  3 V45       pmm V50
18  4  4  V5       pmm V50
19  4  4 V13       pmm V50
# In fact, there are dozens more records about V50, which I have omitted.

I know what 'constant' and 'collinear' mean, but what do the log entries for V50 mean?
Does it mean that when I impute V50, my method is pmm and it relies only on V50/V52/V45...? But the predictorMatrix is full for V50.
What does this warning suggest about my data? Thank you!

@stefvanbuuren
Member

Collinearity introduces feedback loops, and may cause pathological convergence, so mice() tries to avoid that.

Record 11 means: V50 was detected to be multicollinear with other variables when imputing V51. Since V50 is a frequent offender, it's wise to remove it if it's not critical to the main analysis. mice() will be faster and more stable.
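A small base R sketch of how the log could be summarised to spot frequent offenders. The data frame below retypes records 11 to 19 from the log above by hand. The four-variable predictor matrix is purely hypothetical and only illustrates the convention that rows are targets and columns are predictors. With real data one would start from imp$loggedEvents and from the predictor matrix mice() generates.

```r
# Records 11-19 of the log above, retyped for illustration.
log_tail <- data.frame(
  it   = c(3, 3, 3, 3, 3, 3, 4, 4, 4),
  im   = c(4, 4, 5, 5, 5, 5, 3, 4, 4),
  dep  = c("V51", "V52", "V45", "V46", "V51", "V52", "V45", "V5", "V13"),
  meth = "pmm",
  out  = "V50",
  stringsAsFactors = FALSE
)

# How often was each variable removed as a predictor?
offenders <- sort(table(log_tail$out), decreasing = TRUE)
offenders

# For which target variables was V50 dropped?
targets <- unique(log_tail$dep[log_tail$out == "V50"])
targets

# Hypothetical predictor matrix: rows are targets, columns are predictors.
vars <- c("V45", "V50", "V51", "V52")
pred <- matrix(1, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))
diag(pred) <- 0

# Zeroing the V50 column stops V50 from predicting any other variable.
pred[, "V50"] <- 0
pred
```

Zeroing the column keeps V50 in the data, so it is still imputed itself. Dropping the column from the data frame before calling mice() is the stronger alternative suggested above.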

@Bigsealion
Author

Thank you for your answer! It's helpful for me!

@ehayeslarson

Hi,

Thank you so much for developing and maintaining this package! A quick question about alternatives to the pad object. I am interested in including interaction terms in my imputation model between factor variables (some ordered, some unordered). With the pad object eliminated, where can I see the names of indicator variables and polynomials created in the imputation model, so that I can create and passively impute the interaction terms? I only saw information about creating interactions for continuous variables in your linked code and updated book.

Relatedly, are interaction terms for which one of the variables is determined to be collinear automatically removed from the imputation model when the main effect is removed?

Thanks!

@stefvanbuuren
Member

stefvanbuuren commented May 29, 2020

The formulas argument is an alternative to predictorMatrix specification, and now the preferred way to specify interaction terms (just use a list of formulas).
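A minimal sketch of what a list of formulas might look like, using the nhanes2 data shipped with mice. The age-by-hyp interaction for chl and the seed are hypothetical choices for illustration, not a recommended model.

```r
library(mice)

# Default formulas, one per variable, as a named list.
form <- make.formulas(nhanes2)

# Hypothetical imputation model for chl: age, hyp, their interaction, and bmi.
form$chl <- chl ~ age * hyp + bmi

imp <- mice(nhanes2, formulas = form, seed = 1, printFlag = FALSE)

# Inspect the first completed data set.
head(complete(imp, 1))
```

Both age and hyp are factors in nhanes2, so this is an interaction between factor variables written without naming any dummy columns.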

Collinear variables are removed after the formula for the target variable/block is expanded.
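To preview which columns a formula expands into, base R's model.matrix() can be applied to the same formula. This is only a sketch under the assumption that mice's internal expansion follows the standard R contrasts (treatment coding for unordered factors, polynomial coding for ordered ones). The toy data frame is made up for illustration and borrows the variable names reg and gen from the boys data.

```r
# Toy data: one unordered and one ordered factor (values are made up).
toy <- data.frame(
  reg = factor(c("north", "east", "west", "east")),
  gen = factor(c("G1", "G2", "G3", "G2"),
               levels = c("G1", "G2", "G3"), ordered = TRUE)
)

# Expand main effects and interaction into design-matrix columns.
mm <- model.matrix(~ reg * gen, data = toy)

# Unordered factors give indicator columns such as "regnorth"; ordered
# factors give polynomial columns such as "gen.L" and "gen.Q".
colnames(mm)
```

The interaction columns are products of the main-effect columns, with names such as regnorth:gen.L.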

@ehayeslarson

Very helpful--thank you!

@amices amices locked and limited conversation to collaborators Apr 1, 2021
