An R package for denoising censored, Gaussian means with empirical Bayes
The data is represented with matrices:
The bounds
We use a Tobit likelihood for each measurement:
where the standard Gaussian likelihood is used when there is a direct Gaussian measurement (ie
This package provides an object ebTobit
(Empirical Bayes model with Tobit likelihood) that estimates the prior, gr
and then computes the posterior mean or gr
is set using the exemplar method so the grid is the maximum likelihood estimate for each
Suppose
library(ebTobit)
# create noisy measurements
n <- 100
t <- sample(c(0, 5), size = n, replace = TRUE, prob = c(0.8, 0.2))
x <- t + stats::rnorm(n)
# fit g-model with default prior grid
res1 <- ebTobit(x)
# measure performance of estimated posterior mean
mean((t - fitted(res1))^2)
Next we can look at a more complicated example with
library(ebTobit)
# create noisy measurements (low rank structure)
n <- 1000; p <- 10
t <- matrix(stats::rgamma(n*p, shape = 5, rate = 1), n, p)
x <- t + matrix(stats::rnorm(n*p), n, p)
# assume we can't accurately measure x < 1 but we know theta > 0
L <- ifelse(x < 1, 0, x)
R <- ifelse(x < 1, 1, x)
# fit g-model with default prior grid
res2 <- ebTobit(x)
res3 <- ebTobit(L, R)
# oberve that the censoring affects the fitted range
range(fitted(res2))
range(fitted(res3))
# fit censored data with a different grid (large and random not MLE)
res4 <- ebTobit(
L = L,
R = R,
gr = sapply(1:p, function(j) stats::runif(1e+4, min = min(L[,j]), max = max(R[,j]))),
algorithm = "EM"
)
# compute posterior mean and L1mediod given new data
# we can also predict based on partially interval-censored observations
y <- matrix(stats::rexp(5*p, rate = 0.5), 5, p)
predict(res4, y) # posterior mean
predict(res4, y, method = "L1mediod") # posterior L1-mediod
This package is available on CRAN. It can also be installed directly from GitHub:
remotes::install_github("barbehenna/ebTobit")
This R package also includes a real bile acid data.frame
taken directly from Lei et al. (2018) (https://doi.org/10.1096/fj.201700055R) via https://github.com/WandeRum/GSimp (https://doi.org/10.1371/journal.pcbi.1005973). The bile acid data contains measurements of 34 bile acids for 198 patients; no missing values are present in the data. In our modeling, we assume the bile acid values are independent log-normal measurements.
data(BileAcid, package = "ebTobit") # attach the bile acid data
Alton Barbehenn and Sihai Dave Zhao
GPL (>= 3)