Imputation using MLE #100

AndrMenezes · 2022-10-18T12:10:53Z

Dear all,

Thanks for the package.

I have two comments about the MLE imputation method:

Shouldn't the input matrix of the impute_mle function be transposed? The rows of x should correspond to the observational units (samples) and the columns to the variables (proteins/peptides).
The norm package is an old R package with some limitations; e.g., it does not work reliably when the number of variables exceeds 30. Please consider taking a look at the norm2 package.
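The orientation question can be seen with base R's cov(), which, like norm's prelim.norm() as described in this thread, treats rows as observations and columns as variables. A minimal sketch on simulated data (the 5 proteins x 20 samples matrix is hypothetical):

```r
# Simulated proteomics-style matrix: 5 proteins (rows) x 20 samples (columns)
set.seed(1)
m <- matrix(rnorm(5 * 20), nrow = 5,
            dimnames = list(paste0("prot", 1:5), paste0("s", 1:20)))

# cov() treats rows as observations and columns as variables
dim(cov(m))     # 20 x 20: covariances between samples
dim(cov(t(m)))  # 5 x 5: covariances between proteins
```

Fitting a multivariate normal model to the untransposed matrix would therefore model covariances between samples rather than between proteins.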


jorainer · 2022-10-21T08:37:30Z

@lgatto or @cvanderaa, can you please have a look?

lgatto · 2022-10-21T08:55:04Z

Thank you @AndrMenezes, I had a quick look:

Yes, I will switch to norm2, but I need to make sure I can simply replace norm::em.norm() and norm::imp.norm() with norm2::emNorm() and norm2::impNorm().
Regarding the transposition, I will also have to check, because we want to impute features (rows), not samples (columns); there is an ongoing debate about this in SCP. Both are conceptually valid, though.
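To illustrate that the direction of imputation matters, here is a toy example using simple mean imputation rather than MLE; the helper impute_row_mean() and the data are hypothetical and not part of the package. Imputing along rows borrows from the same feature, while imputing along columns (transpose, impute, transpose back) borrows from the same sample:

```r
# Toy features x samples matrix with two missing values (hypothetical data)
m <- rbind(f1 = c(1, 2, NA, 4),
           f2 = c(10, 20, 30, 40),
           f3 = c(100, NA, 300, 400))

# Hypothetical helper: replace each NA by the mean of the observed values in its row
impute_row_mean <- function(x) {
  mu <- rowMeans(x, na.rm = TRUE)
  idx <- which(is.na(x), arr.ind = TRUE)  # (row, col) positions of the NAs
  x[idx] <- mu[idx[, "row"]]
  x
}

by_row <- impute_row_mean(m)         # uses the same feature across samples
by_col <- t(impute_row_mean(t(m)))   # uses the same sample across features

by_row["f1", 3]  # 7/3, the mean of f1's observed values
by_col["f1", 3]  # 165, the mean of sample 3's observed values (30 and 300)
```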

AndrMenezes · 2022-10-23T10:34:06Z

@lgatto, thank you for the reply.

I understand the goal of imputation. However, the norm and norm2 packages assume that the observations (samples) come from a multivariate normal distribution in which the variables (features) have incomplete values. I am not sure both orientations are conceptually valid under the multivariate normal model.
We can replace norm::em.norm() with norm2::emNorm() and use its y.mean.imp component as the imputed matrix. Please see the following:

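```r
# Generate data: 10 features (rows) x 1000 samples (columns)
set.seed(1)
x <- matrix(rnorm(10 * 1000), ncol = 1000)
i <- sample(1:10, size = 3, replace = FALSE)     # features with missing values
j <- sample(1:1000, size = 10, replace = FALSE)  # samples with missing values
x_miss <- x
x_miss[i, j] <- NA_real_

# norm: t(x_miss) has samples as rows (observations), features as columns (variables)
s <- norm::prelim.norm(t(x_miss))
th <- norm::em.norm(s)
norm::rngseed(1)
# imp.norm() draws the missing values at random from their predictive distribution
x_imp_norm <- norm::imp.norm(s, th, t(x_miss))

# norm2: y.mean.imp holds the data with missing values replaced by their predicted means
res <- norm2::emNorm(obj = t(x_miss), prior = "uniform")
x_imp_norm2 <- res$y.mean.imp

# Comparison on the missing cells (in the transposed orientation)
x_imp_norm2[j, i]
x_imp_norm[j, i]
t(x[i, j])
```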

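Note that if you pass x_miss instead of t(x_miss) to norm2::emNorm(), the algorithm does not converge: with 10 rows and 1000 columns, emNorm() would have to estimate a 1000 x 1000 covariance matrix from only 10 observations.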
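For intuition about what y.mean.imp contains: under a multivariate normal model with rows as observations, the predicted mean of the missing entries of a row is the conditional expectation E[x_mis | x_obs] = mu_mis + Sigma_mis,obs Sigma_obs,obs^(-1) (x_obs - mu_obs). A minimal base-R sketch of that step, assuming mu and Sigma are already known (in norm2 they would come from the EM fit); impute_cond_mean() is a hypothetical helper, not part of norm, norm2 or the package:

```r
# Hypothetical helper (not part of norm, norm2 or the package): fill the NAs
# of each row (one observation) with their conditional mean under N(mu, Sigma).
# mu and Sigma are assumed known here, e.g. taken from an EM fit.
impute_cond_mean <- function(x, mu, Sigma) {
  for (r in seq_len(nrow(x))) {
    mis <- is.na(x[r, ])
    if (!any(mis)) next            # fully observed row: nothing to do
    if (all(mis)) {                # nothing observed: use the mean vector
      x[r, ] <- mu
      next
    }
    obs <- !mis
    # mu_mis + Sigma_mis,obs %*% solve(Sigma_obs,obs, x_obs - mu_obs)
    x[r, mis] <- mu[mis] + drop(Sigma[mis, obs, drop = FALSE] %*%
      solve(Sigma[obs, obs, drop = FALSE], x[r, obs] - mu[obs]))
  }
  x
}

# Usage: two variables with correlation 0.8, rows are observations
mu <- c(0, 0)
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
y <- rbind(c(1, NA), c(NA, -2), c(0.5, 0.5))
impute_cond_mean(y, mu, Sigma)
# row 1: 0.8 * 1 = 0.8; row 2: 0.8 * -2 = -1.6; row 3 unchanged
```

Unlike norm::imp.norm(), this is deterministic: it returns the expected values rather than random draws.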

lgatto · 2022-10-24T06:06:53Z

Thank you very much! Your issue raises a really good point that applies to many other imputation methods and has so far only been partially discussed. In addition to applying your fixes, I will expand on this in the manual page.

AndrMenezes · 2022-10-26T10:46:19Z

@lgatto Thanks. Note that this assumption (columns: variables; rows: realizations of the variables) is briefly mentioned in Section 3 of Hastie et al. (2001), where the authors emphasize this data structure for regression-based imputation.
Besides that, there are other methods, such as SVD and KNN, for which the usual data representation (features x samples) can be used.

lgatto · 2022-10-26T10:52:06Z

Indeed, and for KNN, for example, choosing features x samples or samples x features implies strong assumptions about the downstream analyses. It will take some time before I close the issue, but it will definitely be done, including a section in the documentation.
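A minimal KNN sketch showing how the orientation changes what a "neighbour" is. The helper knn_impute_rows() is hypothetical and uses root-mean-squared distance over jointly observed entries; it is not the algorithm of any particular package. Applied to a features x samples matrix, neighbours are similar features; applied to its transpose, neighbours are similar samples:

```r
# Hypothetical helper (not the algorithm of any specific package): impute each
# NA from the k nearest *rows*, where distance is the root-mean-squared
# difference over the columns both rows have observed.
knn_impute_rows <- function(x, k = 2) {
  out <- x
  for (r in which(rowSums(is.na(x)) > 0)) {
    d <- apply(x, 1, function(other) {
      ok <- !is.na(x[r, ]) & !is.na(other)
      if (!any(ok)) return(Inf)
      sqrt(mean((x[r, ok] - other[ok])^2))
    })
    d[r] <- Inf                                     # a row is not its own neighbour
    for (j in which(is.na(x[r, ]))) {
      cand <- which(!is.na(x[, j]) & is.finite(d))  # rows observed in column j
      nn <- cand[order(d[cand])][seq_len(min(k, length(cand)))]
      if (length(nn) > 0) out[r, j] <- mean(x[nn, j])
    }
  }
  out
}

# Toy features x samples matrix (hypothetical data)
m <- rbind(P1 = c(1, 2, 3, NA),
           P2 = c(1.1, 2.1, 3.1, 4.1),
           P3 = c(1.2, 2.2, 3.2, 4.2),
           P4 = c(10, 20, 30, 40))

knn_rows <- knn_impute_rows(m)          # neighbours are features (P2 and P3)
knn_cols <- t(knn_impute_rows(t(m)))    # neighbours are samples (3 and 2)
c(knn_rows["P1", 4], knn_cols["P1", 4]) # 4.15 vs 2.5
```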

lgatto · 2022-11-08T07:50:42Z

Hi @AndrMenezes, thanks again for the issue and the useful discussion. There is now a new impute_mle2() function that uses norm2, as you suggested. I can also confirm that the default behaviour is to impute along the columns (i.e. on the transposed features-by-samples matrix).

cvanderaa mentioned this issue Nov 7, 2022: Impute by margin #101 (Merged)

lgatto closed this as completed Nov 8, 2022

lgatto mentioned this issue Jan 7, 2023: MLE for proteomics data imputation #109 (Open)

lgatto mentioned this issue Nov 3, 2023: norm2 has been removed from CRAN #117 (Closed)
