# blockjack

Fast block jackknife standard errors for linear regression.
`blockjack` computes linear-model coefficients and block jackknife standard errors efficiently by:

- Splitting the rows into blocks.
- Precomputing the block-wise crossproducts `X'X` and `X'y`.
- Reusing those crossproducts for all leave-one-block-out fits.

This avoids refitting the model from scratch for each block and is much faster than a naive jackknife loop when `n_blocks` is large.
The computational trick follows the LD Score regression jackknife approach:
Bulik-Sullivan, B., Loh, P.-R., Finucane, H. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. *Nat Genet* 47, 291–295 (2015). https://doi.org/10.1038/ng.3211
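Concretely: given the full crossproducts `X'X` and `X'y` plus each block's contributions `Xb'Xb` and `Xb'yb`, the leave-one-block-out estimate is `solve(X'X - Xb'Xb, X'y - Xb'yb)`, so each of the `n_blocks` refits costs one small solve instead of a full pass over the data. A minimal base-R sketch of the idea (illustrative only, not blockjack's internal code):

```r
# Sketch of the leave-one-block-out trick: precompute per-block
# crossproducts, then form each delete-one-block OLS fit by subtracting
# that block's contribution from the full crossproducts.
set.seed(1)
n <- 1000; n_blocks <- 10
X <- cbind(1, rnorm(n))
y <- X %*% c(0.5, 0.7) + rnorm(n)
blk <- rep(seq_len(n_blocks), length.out = n)

xtx <- crossprod(X)      # full X'X
xty <- crossprod(X, y)   # full X'y
xtx_b <- lapply(seq_len(n_blocks), function(b)
  crossprod(X[blk == b, , drop = FALSE]))
xty_b <- lapply(seq_len(n_blocks), function(b)
  crossprod(X[blk == b, , drop = FALSE], y[blk == b]))

# leave-one-block-out coefficients: one cheap p x p solve per block
beta_jack <- sapply(seq_len(n_blocks), function(b)
  solve(xtx - xtx_b[[b]], xty - xty_b[[b]]))

# delete-one-block jackknife SE of each coefficient
beta_bar <- rowMeans(beta_jack)
se <- sqrt((n_blocks - 1) / n_blocks *
           rowSums((beta_jack - beta_bar)^2))

# matches brute-force refitting on each reduced sample
beta_check <- sapply(seq_len(n_blocks), function(b)
  coef(lm.fit(X[blk != b, , drop = FALSE], y[blk != b])))
stopifnot(all.equal(unname(beta_jack), unname(beta_check)))
```

The check at the end confirms that the subtraction trick reproduces the brute-force leave-one-block-out fits up to floating-point error.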
## Installation

```r
# install from GitHub
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("MichelNivard/blockjack")

# or install from a local checkout (from the package root)
install.packages(".", repos = NULL, type = "source")
```

## OLS example

```r
library(blockjack)

set.seed(1)
n <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 0.5 + 0.7 * x1 - 0.2 * x2 + rnorm(n)
d <- data.frame(y = y, x1 = x1, x2 = x2)

fit <- bjlm(y ~ x1 + x2, data = d, n_blocks = 200)
summary(fit)
```

## WLS example
```r
set.seed(2)
n <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
v <- exp(0.7 * x1)
w <- 1 / v
y <- 0.5 + 0.7 * x1 - 0.2 * x2 + rnorm(n, sd = sqrt(v))
d <- data.frame(y = y, x1 = x1, x2 = x2, w = w)

fit_wls <- bjlm(y ~ x1 + x2, data = d, weights = w, n_blocks = 200)
summary(fit_wls)
```

## Cluster-robust block jackknife example
```r
set.seed(3)
n <- 20000
cluster_size <- 10
n_clusters <- n / cluster_size
cl <- rep(seq_len(n_clusters), each = cluster_size)
g1 <- rep(rnorm(n_clusters), each = cluster_size)
g2 <- rep(rnorm(n_clusters), each = cluster_size)
x1 <- sqrt(0.8) * g1 + sqrt(0.2) * rnorm(n)
x2 <- sqrt(0.8) * g2 + sqrt(0.2) * rnorm(n)
u <- rep(rnorm(n_clusters, sd = 2), each = cluster_size)
y <- 0.5 + 0.7 * x1 - 0.2 * x2 + u + rnorm(n)
d <- data.frame(y = y, x1 = x1, x2 = x2, cl = cl)

# clusters are kept intact when forming jackknife blocks
fit_cl <- bjlm(y ~ x1 + x2, data = d, cluster = "cl", n_blocks = 200)
summary(fit_cl)
```

## Cluster notes
- When `cluster` is supplied, `bjlm()` assigns whole clusters to jackknife blocks (no cluster is split).
- It warns if `n_clusters < n_blocks` and automatically reduces the number of blocks to `n_clusters`.
- It warns if the largest cluster exceeds `n / n_blocks` rows.
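To illustrate the "no cluster is split" rule, here is one way such an assignment could work. This is a hypothetical sketch (`assign_blocks` is not a blockjack function, and this is not necessarily `bjlm()`'s internal rule): blocks are assigned to cluster IDs rather than to rows, so every row of a cluster lands in the same block.

```r
# Hypothetical sketch: map whole clusters to jackknife blocks so that
# no cluster is split across blocks (not bjlm()'s actual internals).
assign_blocks <- function(cluster, n_blocks) {
  ids <- unique(cluster)
  # cannot have more blocks than clusters
  n_blocks <- min(n_blocks, length(ids))
  # round-robin over cluster IDs; rows inherit their cluster's block
  block_of_id <- setNames(rep_len(seq_len(n_blocks), length(ids)), ids)
  unname(block_of_id[as.character(cluster)])
}

cl <- rep(1:50, each = 10)
blk <- assign_blocks(cl, n_blocks = 20)
# each cluster lands in exactly one block
stopifnot(all(tapply(blk, cl, function(b) length(unique(b))) == 1))
```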
## Validation summary

Across 100 simulations (N = 20000, cluster size 10, strong within-cluster correlation):

- Quantiles (5%, 50%, 95%) of the ratio `JK SE / vcovCL SE`:
  - `(Intercept)`: 0.9398, 0.9955, 1.0763
  - `x1`: 0.9191, 1.0007, 1.0657
  - `x2`: 0.9174, 0.9966, 1.0825
- In the same setup, the `vcovCL` SEs were much larger than the OLS SEs (median `vcovCL / OLS` around 2.59 for the slopes), so this is a non-trivial clustered setting.
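To see why clustering is non-trivial in a setup like this, a hand-rolled CR0 sandwich estimator shows the inflation of cluster-robust SEs over plain OLS SEs. This is a base-R sketch under a similar (but simplified) data-generating process; `sandwich::vcovCL` applies small-sample corrections, so its numbers differ slightly.

```r
# Sketch: hand-rolled CR0 cluster-robust SEs vs. plain OLS SEs,
# to show the SE inflation that clustering produces here.
set.seed(3)
n <- 20000; cluster_size <- 10
n_clusters <- n / cluster_size
cl <- rep(seq_len(n_clusters), each = cluster_size)
x1 <- rep(rnorm(n_clusters), each = cluster_size) + rnorm(n, sd = 0.5)
u  <- rep(rnorm(n_clusters, sd = 2), each = cluster_size)
y  <- 0.5 + 0.7 * x1 + u + rnorm(n)

X <- cbind(1, x1)
fit <- lm.fit(X, y)
bread <- solve(crossprod(X))
# meat: sum over clusters g of (X_g' e_g)(X_g' e_g)'
scores <- rowsum(X * fit$residuals, cl)   # one row of X_g' e_g per cluster
vc <- bread %*% crossprod(scores) %*% bread
se_cl  <- sqrt(diag(vc))
se_ols <- sqrt(diag(bread) * sum(fit$residuals^2) / (n - ncol(X)))
se_cl / se_ols   # ratios well above 1: clustering matters
```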
## Example output (OLS)

```
Call:
bjlm(formula = y ~ x1 + x2, data = d, n_blocks = 200)

Coefficients:
             Estimate Jackknife SE z value Pr(>|z|)
(Intercept)  0.493175     0.006641   74.26   <2e-16 ***
x1           0.686438     0.007263   94.51   <2e-16 ***
x2          -0.196584     0.007320  -26.86   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.0071 on 19997 degrees of freedom
Jackknife blocks: 200
```