# blockjack

Fast block jackknife standard errors for linear regression.

## What It Does

blockjack computes linear-model coefficients and block jackknife standard errors efficiently by:

  1. Splitting rows into blocks.
  2. Precomputing block-wise X'X and X'y.
  3. Reusing those crossproducts for all leave-one-block-out fits.

This avoids refitting the model from scratch for each block and is much faster than naive jackknife loops when n_blocks is large.
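The steps above can be sketched in base R. This is a minimal illustration of the crossproduct trick with made-up data, not the package's internal code: because X'X and X'y are additive over rows, subtracting one block's contribution and solving a small p × p system reproduces the exact leave-one-block-out OLS fit.

```r
# Illustration of the block jackknife crossproduct trick (not blockjack internals).
set.seed(42)
n <- 1000
n_blocks <- 10
X <- cbind(1, rnorm(n))                 # design matrix with intercept column
y <- 2 + 3 * X[, 2] + rnorm(n)
block <- rep(seq_len(n_blocks), length.out = n)

# Step 2: precompute block-wise X'X and X'y
XtX_b <- lapply(seq_len(n_blocks), function(b)
  crossprod(X[block == b, , drop = FALSE]))
Xty_b <- lapply(seq_len(n_blocks), function(b)
  crossprod(X[block == b, , drop = FALSE], y[block == b]))
XtX <- Reduce(`+`, XtX_b)
Xty <- Reduce(`+`, Xty_b)

# Step 3: each leave-one-block-out fit is a cheap subtraction plus a p x p solve
theta_loo <- sapply(seq_len(n_blocks), function(b)
  solve(XtX - XtX_b[[b]], Xty - Xty_b[[b]]))

# Block jackknife SE from the leave-one-block-out estimates
theta_bar <- rowMeans(theta_loo)
jk_se <- sqrt((n_blocks - 1) / n_blocks *
              rowSums((theta_loo - theta_bar)^2))
```

With this structure, all B leave-one-out fits cost B solves of a p × p system after one pass over the data, rather than B full refits on n rows.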

## Method Source

The computational trick follows the LD Score regression jackknife approach:

Bulik-Sullivan, B. K., Loh, P.-R., Finucane, H. K., et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291–295 (2015). https://doi.org/10.1038/ng.3211

## Install locally

```r
# install from GitHub
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("MichelNivard/blockjack")

# or install from a local checkout (from the package root)
install.packages(".", repos = NULL, type = "source")
```

## Usage

OLS example:

```r
library(blockjack)

set.seed(1)
n <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 0.5 + 0.7 * x1 - 0.2 * x2 + rnorm(n)
d <- data.frame(y = y, x1 = x1, x2 = x2)

fit <- bjlm(y ~ x1 + x2, data = d, n_blocks = 200)
summary(fit)
```

WLS example:

```r
set.seed(2)
n <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
v <- exp(0.7 * x1)  # heteroskedastic error variance
w <- 1 / v          # inverse-variance weights
y <- 0.5 + 0.7 * x1 - 0.2 * x2 + rnorm(n, sd = sqrt(v))
d <- data.frame(y = y, x1 = x1, x2 = x2, w = w)

fit_wls <- bjlm(y ~ x1 + x2, data = d, weights = w, n_blocks = 200)
summary(fit_wls)
```

Cluster-robust block jackknife example:

```r
set.seed(3)
n <- 20000
cluster_size <- 10
n_clusters <- n / cluster_size
cl <- rep(seq_len(n_clusters), each = cluster_size)

g1 <- rep(rnorm(n_clusters), each = cluster_size)
g2 <- rep(rnorm(n_clusters), each = cluster_size)
x1 <- sqrt(0.8) * g1 + sqrt(0.2) * rnorm(n)
x2 <- sqrt(0.8) * g2 + sqrt(0.2) * rnorm(n)
u <- rep(rnorm(n_clusters, sd = 2), each = cluster_size)
y <- 0.5 + 0.7 * x1 - 0.2 * x2 + u + rnorm(n)
d <- data.frame(y = y, x1 = x1, x2 = x2, cl = cl)

# clusters are kept intact when forming jackknife blocks
fit_cl <- bjlm(y ~ x1 + x2, data = d, cluster = "cl", n_blocks = 200)
summary(fit_cl)
```

Cluster notes:

  1. When cluster is supplied, bjlm() assigns whole clusters to jackknife blocks (no cluster is split).
  2. It warns if n_blocks > n_clusters and automatically reduces n_blocks to n_clusters.
  3. It warns if the largest cluster exceeds n / n_blocks.
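The cluster handling described in these notes amounts to mapping cluster IDs to block IDs before splitting rows, so no cluster straddles two blocks. A minimal sketch of that idea (`assign_blocks` is a hypothetical helper written for illustration, not part of the package API):

```r
# Hypothetical sketch: assign whole clusters to jackknife blocks.
assign_blocks <- function(cl, n_blocks) {
  clusters <- unique(cl)
  if (length(clusters) < n_blocks) {
    warning("fewer clusters than blocks; reducing n_blocks")
    n_blocks <- length(clusters)
  }
  # round-robin map from cluster ID to block ID, so no cluster is split
  cluster_block <- setNames(rep(seq_len(n_blocks), length.out = length(clusters)),
                            clusters)
  unname(cluster_block[as.character(cl)])
}

cl <- rep(1:25, each = 4)      # 25 clusters of 4 rows each
block <- assign_blocks(cl, 10) # 10 blocks; every cluster lands in exactly one
```

Because each cluster maps to a single block, removing a block removes whole clusters at once, which is what makes the jackknife SEs cluster-robust.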

Validation summary (100 simulations, N = 20000, cluster size 10, strong within-cluster correlation):

  1. Quantiles of the ratio jackknife SE / vcovCL SE across simulations:

     | Coefficient | 5%     | 50%    | 95%    |
     | ----------- | ------ | ------ | ------ |
     | (Intercept) | 0.9398 | 0.9955 | 1.0763 |
     | x1          | 0.9191 | 1.0007 | 1.0657 |
     | x2          | 0.9174 | 0.9966 | 1.0825 |

  2. In the same setup, vcovCL SEs were much larger than OLS SEs (median vcovCL / OLS around 2.59 for the slopes), so this is a non-trivial clustered setting.

Example output (OLS):

```
Call:
bjlm(formula = y ~ x1 + x2, data = d, n_blocks = 200)

Coefficients:
             Estimate Jackknife SE z value Pr(>|z|)    
(Intercept)  0.493175     0.006641   74.26   <2e-16 ***
x1           0.686438     0.007263   94.51   <2e-16 ***
x2          -0.196584     0.007320  -26.86   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.0071 on 19997 degrees of freedom
Jackknife blocks: 200
```
