Imputation using MLE #100

Closed
AndrMenezes opened this issue Oct 18, 2022 · 7 comments

Comments

@AndrMenezes

Dear all,

Thanks for the package.

I have two comments concerning the imputation MLE method:

  1. Shouldn't the input matrix passed to impute_mle() be transposed? The rows of x should correspond to the observational units (samples) and the columns to the variables (proteins/peptides).
  2. The norm package is an old R package with some limitations; e.g., it does not work reliably when the number of variables exceeds 30. Please consider taking a look at the norm2 package.

@jorainer
Member

@lgatto or @cvanderaa , can you please have a look?

@lgatto
Member

lgatto commented Oct 21, 2022

Thank you @AndrMenezes - I had a quick look and

  • Yes, will switch to norm2, but I'll need to make sure I can simply replace norm::em.norm() and norm::imp.norm() by norm2::emNorm() and norm2::impNorm().
  • Re the transposition, I will also have to check because we want to impute features/rows, not samples/cols (there is a bit of a debate about this going on, in SCP), but both are conceptually valid.

@AndrMenezes
Author

@lgatto , thank you for the reply,

  1. I understand the goal of imputation. However, the norm and norm2 packages assume that the observations (samples) come from a multivariate normal distribution, where the variables (features) have incomplete values. I am not sure whether both orientations are conceptually valid when using the multivariate normal distribution.
  2. We can replace norm::em.norm() by norm2::emNorm() and use the component y.mean.imp as the imputed matrix. Please, see the following:
# Generate data
set.seed(1)
x <- matrix(rnorm(10 * 1000), ncol = 1000)
i <- sample(1:10, size = 3, replace = FALSE)
j <- sample(1:1000, size = 10, replace = FALSE)
x_miss <- x
x_miss[i, j] <- NA_real_

# Norm
s <- norm::prelim.norm(t(x_miss))
th <- norm::em.norm(s)
norm::rngseed(1)
x_imp_norm <- norm::imp.norm(s, th, t(x_miss))

# Norm2
res <- norm2::emNorm(obj = t(x_miss), prior = "uniform")
x_imp_norm2 <- res$y.mean.imp

# Comparison
x_imp_norm2[j, i]
x_imp_norm[j, i]
t(x[i, j])

Note that if you use x_miss instead of t(x_miss) in norm2::emNorm(), the algorithm will not converge.
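
One way to see why the orientation matters for the multivariate normal model: the EM fit estimates a mean and a covariance matrix for the columns, so the number of free parameters grows quadratically with the number of columns. The following is a minimal base R sketch using the dimensions from the example above. The helper n_mvn_params() is hypothetical (it is not part of norm or norm2), and the parameter count assumes an unstructured covariance matrix. The imbalance it shows is consistent with the non-convergence reported above, but it is not a diagnosis of it.

```r
# Number of free parameters in an unstructured multivariate normal model
# with p variables: p means plus p * (p + 1) / 2 variances and covariances.
# NOTE: n_mvn_params() is a hypothetical helper, for illustration only.
n_mvn_params <- function(p) p + p * (p + 1) / 2

set.seed(1)
x <- matrix(rnorm(10 * 1000), ncol = 1000)

# t(x): 1000 rows (observations) and 10 columns (variables)
dim(t(x))
n_mvn_params(ncol(t(x)))  # 65 parameters, estimated from 1000 rows

# x: 10 rows (observations) and 1000 columns (variables)
dim(x)
n_mvn_params(ncol(x))     # 501500 parameters, estimated from only 10 rows
```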

@lgatto
Member

lgatto commented Oct 24, 2022

Thank you very much! Your issue raises a really good point that actually applies to many other imputation methods (and has only been discussed in part). In addition to your fixes, I will expand on this in the manual page.

@AndrMenezes
Author

@lgatto Thanks. Note that this assumption (columns: variables; rows: realizations of the variables) was briefly mentioned in Section 3 of Hastie et al. (2001), where the authors emphasized this data structure for imputation using regression.
Besides that, there are other methods, such as SVD and KNN, for which the usual data representation (features x samples) can be used.

@lgatto
Member

lgatto commented Oct 26, 2022

Indeed, and for KNN, for example, using features x samples or samples x features makes strong assumptions about the downstream analyses. It will take some time until I close the issue, but it will be done for sure, including a section in the documentation.
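
To illustrate how the orientation changes a nearest-neighbour imputation, here is a toy base R sketch with k = 1. The helper impute_nn_rows() is hypothetical and written only for this example; it is not the KNN implementation used by any package, and it assumes that at least one complete row is available as a donor.

```r
# Toy nearest-neighbour imputation (k = 1) along the rows of a matrix.
# NOTE: impute_nn_rows() is a hypothetical helper, for illustration only.
impute_nn_rows <- function(m) {
  out <- m
  # Candidate donors are the rows without any missing value
  donors <- which(rowSums(is.na(m)) == 0)
  for (r in which(rowSums(is.na(m)) > 0)) {
    miss <- is.na(m[r, ])
    # Euclidean distance to each donor, on the columns observed in row r
    d <- sapply(donors, function(s) sqrt(sum((m[r, !miss] - m[s, !miss])^2)))
    # Copy the missing entries from the closest donor row
    out[r, miss] <- m[donors[which.min(d)], miss]
  }
  out
}

# Features (rows) x samples (columns), with one missing value in f2
x <- rbind(f1 = c(1, 2, 3),
           f2 = c(1, 2, NA),
           f3 = c(5, 9, 7))

# Neighbours are features: f2 borrows from the most similar feature (f1)
impute_nn_rows(x)["f2", 3]        # 3

# Neighbours are samples: sample 3 borrows from the most similar sample
t(impute_nn_rows(t(x)))["f2", 3]  # 2
```

The same missing cell is filled with 3 when rows (features) are the neighbours and with 2 when columns (samples) are, which is the kind of difference that carries through to downstream analyses.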

@lgatto
Member

lgatto commented Nov 8, 2022

Hi @AndrMenezes - thanks again for your issue and useful discussions. There's now a new impute_mle2() function that uses norm2, as per your suggestion. I also confirm that the default behaviour is to impute along the columns (i.e. the transposed features by samples matrix).
