daisy provides estimation and bootstrap tools for Data-Adaptive Integration with Summary Data using generalized entropy balancing (GEB).
Main functions
- `EB_est()`: estimator (supports `auto = TRUE` model selection)
- `EB_bootstrap_var()`: bootstrap variance (no auto inside bootstrap; specify the model explicitly)
# using pak (recommended)
# install.packages("pak")
pak::pak("KMorikawaISU/daisy")
# or using remotes
# install.packages("remotes")
remotes::install_github("KMorikawaISU/daisy")

Simplest setting: [Internal] x1 ~ N(0,1); [External] x1 ~ N(0.5,1); in both sources x2 ~ Bern(0.5) and **y | x ~ N(0.5*x1 − x2, 1)**.
Sample sizes: [internal] n = 200, [external] n_ext = 2000.
This chunk is **not executed** to keep the README light (eval = FALSE).
library(daisy)
## 1) Generate toy data
set.seed(1)
n <- 200
n_ext <- 2000
x1_int <- rnorm(n, 0, 1)
x2_int <- rbinom(n, 1, 0.5)
y_int <- rnorm(n, 0.5 * x1_int - x2_int, 1)
x1_ext <- rnorm(n_ext, 0.5, 1)
x2_ext <- rbinom(n_ext, 1, 0.5)
y_ext <- rnorm(n_ext, 0.5 * x1_ext - x2_ext, 1)
## 2) Build inputs
dat_int <- data.frame(
  x1_int = x1_int,
  x2_int = x2_int,
  y_int = y_int
)
# IMPORTANT: Do NOT include an intercept in MU_int / MU_ext.
MU_int <- cbind(x1_int, x1_int^2, x2_int) # (x1, x1^2, x2)
MU_ext <- c(mean(x1_ext), mean(x1_ext^2), mean(x2_ext)) # external means for the same features
eta <- mean(y_ext) # step-2 target (mean of outcome)
## 3-1) Estimation with auto model search
fit <- EB_est(
  dat_int, MU_int, MU_ext, eta,
  auto = TRUE,
  r_set = c(0.01, 0.1, 0.5, 1),
  link = "identity"
)
## Inspect results
fit$best_model
fit$result["est"]
fit$Entropy2
## Human-readable summary (point estimate + selected model + per-model Entropy2/D1/D2)
#summary(fit)
## 3-2) Estimation with specified model
#fit <- EB_est(
# dat_int, MU_int, MU_ext, eta,
# auto = FALSE,
# divergence = "LW",
# r = 0.1,
# link = "identity"
#)
## Human-readable summary (point estimate + selected model + per-model Entropy2/D1/D2)
summary(fit)
## 4) Bootstrap variance (no auto; fix the selected model)
# B1: number of bootstrap repetitions for SigmaW_hat
# B2: number of bootstrap repetitions for theta_boot
bt <- EB_bootstrap_var(
  dat_int, MU_int, MU_ext, eta,
  n_ext = n_ext, B1 = 200, B2 = 500,
  w_type = TRUE,
  link = "identity",
  divergence = fit$best_model$divergence,
  r = fit$best_model$r,
  external.boot = TRUE
)
bt$bootstrap_se
bt$ci_percentile_95

- First-step moments: `MU_int` / `MU_ext` must be provided without an intercept.
- Second step uses KL (fixed). The second-step weights are computed from the step-2 linear predictor (LMD2) obtained by KL calibration.
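
To make the idea of KL calibration concrete, here is a minimal base-R sketch of generic exponential-tilting calibration: weights proportional to the exponential of a linear predictor, chosen so that the weighted feature means match given targets. It is an illustration only and is not daisy's internal code; `kl_calibrate()` is a hypothetical helper, and the toy features and targets are assumptions that mirror the simulation setting above.

```r
# Illustrative KL (exponential-tilting) calibration in base R.
# NOT daisy's implementation: kl_calibrate() is a hypothetical helper.
kl_calibrate <- function(X, target) {
  X <- as.matrix(X)
  Xc <- sweep(X, 2, target)            # center each feature at its target mean

  # Dual objective: log-sum-exp of the centered linear predictor.
  dual <- function(lambda) {
    lp <- as.vector(Xc %*% lambda)
    m <- max(lp)
    m + log(sum(exp(lp - m)))          # numerically stable log-sum-exp
  }

  # Gradient: weighted mean of the centered features under the tilted weights.
  grad <- function(lambda) {
    lp <- as.vector(Xc %*% lambda)
    w <- exp(lp - max(lp))
    w <- w / sum(w)
    as.vector(crossprod(Xc, w))
  }

  opt <- optim(rep(0, ncol(X)), dual, grad, method = "BFGS",
               control = list(reltol = 1e-12, maxit = 500))

  lp <- as.vector(Xc %*% opt$par)      # fitted linear predictor
  w <- exp(lp - max(lp))
  list(lambda = opt$par, weights = w / sum(w), convergence = opt$convergence)
}

set.seed(1)
x1 <- rnorm(200)
x2 <- rbinom(200, 1, 0.5)
X <- cbind(x1, x1^2, x2)               # features without an intercept column
target <- c(0.5, 1.25, 0.5)            # means under x1 ~ N(0.5, 1), x2 ~ Bern(0.5)

cal <- kl_calibrate(X, target)
sum(cal$weights)                       # weights are normalized to sum to 1
colSums(cal$weights * X) - target      # calibration residuals, should be near 0
```

Because the weights in this sketch are normalized to sum to 1, a constant column would add nothing and would leave its coefficient unidentified. That is a common reason for excluding an intercept in entropy balancing, and it may be the motivation for the same requirement on `MU_int` / `MU_ext` here.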