Include fully-observed variables as a response or a covariate in MVNI? #2

Open
willizhang opened this issue Aug 18, 2023 · 5 comments

Comments

@willizhang

Hello! I'm unsure if I'm in the right place to ask a question. :) I've been utilizing the JOMO package and have found immense value in reading several of your research papers. They've provided me with a wealth of information and guidance. I currently have a question about the best approach for including fully-observed variables within MVNI models: should they be considered responses or covariates? I would greatly appreciate your insights on this matter.

Suppose I have a substantive analysis model, with missing values in X1 ONLY:
Y = b0 + b1 * X1 + b2 * X2 + b3 * X3 + b4 * X2 * X3

I would like to use MVNI to impute missing values in JOMO.

My questions related to MVNI are:

  1. Considering congeniality, how can one decide whether to include the fully-observed variables (X2, X3, Y) in the MVNI model as response variables or as covariates?
  2. Can the fully-observed interaction term X2 * X3 be included as a covariate?
  3. In this case, since only one variable has missing values, it seems that MVNI is not applicable if X2, X3 and Y are all included as covariates for imputing X1. Is that correct?

References for questions 1 and 2:
Question 1:
In the book "Carpenter JR, Kenward MG. Multiple Imputation and Its Application (First Edition). John Wiley & Sons, Ltd. 2013." (p. 129), it says that "For fully observed continuous and binary variables, the conditional distribution for imputing the partially observed variables will be practically equivalent, whether they are included as a response or as covariates."

Question 2:
a. In the same book (p. 131): "Summarising, when a quadratic, and in general nonlinear, relationship involving a fully observed variable is important in the substantive model, this nonlinear relationship must be included in the linear predictor for each partially observed variable in the imputation model, whether a joint or FCS approach is adopted."

b. In your paper "jomo: A Flexible Package for Two-level Joint Modelling Multiple Imputation", on p. 21, it says "When interactions or non-linear terms are present in the model of interest, ignoring them in the imputation model may lead to bias; instead, they should be included as covariates (Carpenter and Kenward, 2013, p. 130)."
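To make the two specifications in question 1 concrete, here is a minimal R sketch. The simulated data and the column names (`const`, `X2_X3`) are made up for illustration, and the sketch does not recommend either option. It only shows how the fully-observed interaction can be computed once and then passed either alongside Y as a covariate (option A) or with Y moved into the response (option B). The `jomo` call is skipped if the package is not installed.

```r
# Hypothetical toy data: X1 is partially observed; X2, X3 and Y are complete.
set.seed(1)
n  <- 200
X2 <- rnorm(n)
X3 <- rnorm(n)
X1 <- 0.5 * X2 - 0.3 * X3 + rnorm(n)
Y  <- 1 + X1 + X2 + X3 + 0.5 * X2 * X3 + rnorm(n)
X1[sample(n, 40)] <- NA            # missing values in X1 only
dat <- data.frame(Y, X1, X2, X3)

# X2 and X3 are fully observed, so their product can be computed up front.
dat$X2_X3 <- dat$X2 * dat$X3
dat$const <- 1                     # explicit intercept column for the X matrix

# Option A: X1 is the only response; every fully-observed variable is a covariate.
resp_A <- dat[, "X1", drop = FALSE]
cov_A  <- dat[, c("const", "X2", "X3", "X2_X3", "Y")]

# Option B: the fully-observed Y joins the response; the rest stay covariates.
resp_B <- dat[, c("X1", "Y")]
cov_B  <- dat[, c("const", "X2", "X3", "X2_X3")]

# Run option B only if jomo is available (short chains, for illustration only).
if (requireNamespace("jomo", quietly = TRUE)) {
  imp <- jomo::jomo(Y = resp_B, X = cov_B, nburn = 200, nbetween = 200, nimp = 3)
  # imp stacks the datasets; the Imputation column indexes them (0 = original data)
}
```

In both options the interaction enters the linear predictor for X1, which is what the passages quoted above ask for.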

Thank you so much again for these very helpful papers!

@Matteo21Q
Owner

Matteo21Q commented Aug 18, 2023 via email

@willizhang
Author

Hi Matteo,

Thank you so much for sharing your insights and expertise about this. It is very helpful! :D

Best regards,
Willi

@willizhang
Author

Hi Matteo,

I am currently utilizing a weighted multinomial logistic regression model for my substantive analysis:

Y = b0 + b1*X1 + b2*X2 + b3*X1*X2 + b4*X3 + b5*X4

Here, both Y (unordered categorical variable) and X3 (ordinal variable) have missing values.

Inspired by your recent publications (paper 1; paper 2), which I found really helpful :), I intend to implement a two-level MVNI model using the latent normal variable approach. Here is a simplified R script for my proposed model using the jomo package:

library( jomo )

jomo( Y = data[ , c( "Y", "X3" ) ], # suppose data includes all the variables
      X = data[ , c( "constant", "X1", "X2", "X1_X2_interaction", "X4" ) ], # constant = 1
      clus = data$weight_strata, # level-two strata defined using weight
      meth = "random" )

Congeniality concerns
If I understand correctly, the MVNI model needs to reflect the interaction between X1 and X2, as well as the interactions between the survey weights and all covariates (X1 through X4), as they appear in the substantive model.

If this is true, my related questions are:
Q1. In light of our earlier discussion, the MVNI model should include the fully-observed variables X1, X2, and X1*X2 as covariates. However, what is the proper way to include the fully-observed X4 in the two-level MVNI: as a covariate (as in the R script) or as a response variable? (See the reference below.)

Q2. If I restrict the response variables to only the partly-observed Y and X3, it seems that the model would not adequately capture the interactions between the weight variable and X1, X2, and X4. Would this be an important issue? :)

Q3. How can auxiliary variables be incorporated into a (multi-level) MVNI model? Would it be reasonable to consider the following strategy regardless of single-level or multilevel: include fully-observed auxiliary variables as covariates and partly-observed ones as dependent variables?
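The strategy proposed in Q3 can be written down as a small base-R helper. This is only a sketch of the rule as stated in the question (partly-observed variables go to the response, fully-observed ones to the covariates), not a confirmed recommendation; the helper name `split_by_missingness` and the auxiliary variables `A1` and `A2` are hypothetical.

```r
# Hypothetical helper: assign each variable a role based on whether it has any NA.
split_by_missingness <- function(data, vars) {
  has_na <- vapply(data[vars], anyNA, logical(1))
  list(
    response   = names(has_na)[has_na],    # partly observed: candidate for jomo's Y
    covariates = names(has_na)[!has_na]    # fully observed: candidate for jomo's X
  )
}

# Toy data with made-up auxiliary variables A1 (complete) and A2 (incomplete).
toy <- data.frame(
  Y  = factor(c("a", NA, "b", "c", "a", "b")),
  X3 = c(1, 2, NA, 3, 2, 1),
  X4 = c(0.3, 1.2, -0.4, 0.8, 0.1, -1.0),
  A1 = c(5, 3, 4, 6, 2, 7),
  A2 = c(NA, 1, 0, 1, NA, 0)
)

roles <- split_by_missingness(toy, c("Y", "X3", "X4", "A1", "A2"))
roles$response    # "Y"  "X3" "A2"
roles$covariates  # "X4" "A1"
```

The resulting name vectors could then be used to subset the data for the `Y` and `X` arguments, with a constant column added to the covariates.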

References:
In your paper “jomo: A Flexible Package for Two-level Joint Modelling Multiple Imputation”, p. 4,

In joint modelling imputation, partially observed variables are dependent variables. However, as hinted above, with fully observed variables we can choose to either condition on them as predictors or include them in the (multivariate) response. The software is equally comfortable with both options, and it makes little difference in practice for single-level data. However, the choice has a bigger impact for clustered data, as we will see in the multilevel imputation section.

Another question related to the paper:
p. 8:

Fully observed binary covariates can be included in the X matrix of the imputation model as type numeric, exactly as with sex in this example. To include fully observed categorical covariates with three or more categories, appropriate dummy variables have to be created. For this purpose, we might use the R package dummies (Brown, 2012) or the function contrasts in base R.

Q4. Suppose X1 is a categorical variable with 5 categories, can it be included as it is in the script, or should it be coded in a specific way?
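For Q4, the dummy coding described in the quoted passage can be done in base R with `model.matrix()`. The sketch below uses simulated data and assumes treatment (reference-level) coding, which is R's default for unordered factors. It also shows that the X1 by X2 interaction then becomes one column per non-reference category.

```r
# Hypothetical data: X1 is a fully-observed factor with 5 levels, X2 is numeric.
set.seed(2)
dat <- data.frame(
  X1 = factor(sample(c("a", "b", "c", "d", "e"), 50, replace = TRUE),
              levels = c("a", "b", "c", "d", "e")),
  X2 = rnorm(50)
)

# model.matrix() expands X1 into 4 dummies (reference level "a") and builds
# the corresponding X1:X2 interaction columns in the same step.
X <- model.matrix(~ X1 * X2, data = dat)

# The intercept column can serve as the constant in the covariate matrix.
colnames(X)[1] <- "constant"
colnames(X)
X_df <- as.data.frame(X)   # all columns numeric, usable as a covariate data frame
```

With 5 categories this gives 10 columns: the constant, 4 dummies, X2, and 4 interaction terms.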

Your insights would be highly appreciated. Thank you so much for your time and expertise!

Best regards,
Willi

@Matteo21Q
Owner

Matteo21Q commented Aug 29, 2023 via email

@willizhang
Author

willizhang commented Aug 29, 2023

Hi Matteo,

Thank you so much again for sharing your invaluable expertise and kindly helping me with my questions!


UPDATE: Regarding whether jomoImpute treats categorical variables that are included as predictors in the imputation model as numeric or as factors: I checked the output from jomoImpute and found that categorical variables with multiple categories remain factors in the imputed datasets. :) So it seems that there is no need to create dummy variables for categorical variables that are included as predictors in the imputation model.

Warm wishes,
Willi
