MLCausal

Causal Inference Methods for Multilevel and Clustered Data

MLCausal provides a structured workflow for estimating causal effects in clustered observational data, such as students within schools, patients within hospitals, or employees within firms.

The package integrates the full causal design pipeline for multilevel settings: cluster-aware propensity score estimation, inverse-probability weighting, within-cluster matching, covariate balance diagnostics at both the individual and cluster levels, overlap assessment, outcome modelling with cluster-robust standard errors, and sensitivity analysis for unmeasured confounding.

Installation

# From CRAN
install.packages("MLCausal")

# Development version from GitHub
# install.packages("remotes")
remotes::install_github("causalfragility-lab/MLCausal")

Required dependencies (installed automatically): sandwich, lmtest, ggplot2, rlang.

Workflow

simulate_ml_data()
        |
     ml_ps()                  cluster-aware propensity score estimation
      /     \
ml_weight() ml_match()        inverse-probability weighting  OR  matching
      \     /
   balance_ml()               balance diagnostics (individual + cluster levels)
   plot_overlap_ml()          overlap / positivity assessment
        |
estimate_att_ml()             outcome model with cluster-robust SEs
        |
     sens_ml()                tipping-point sensitivity analysis

Functions

| Function | Description |
| --- | --- |
| `simulate_ml_data()` | Generate clustered observational data with a known data-generating process |
| `ml_ps()` | Cluster-aware propensity score estimation: Mundlak (default), fixed-effects, or single-level |
| `ml_weight()` | ATT / ATE inverse-probability weights, stabilised and with optional trimming |
| `ml_match()` | Within-cluster nearest-neighbour matching with a dual-balance composite distance |
| `balance_ml()` | Standardised mean differences at both the individual and cluster-mean level |
| `plot_overlap_ml()` | Propensity score overlap plot, overall or faceted by cluster |
| `estimate_att_ml()` | Weighted linear outcome model with cluster-robust (HC1) standard errors |
| `sens_ml()` | Tipping-point sensitivity analysis for omitted cluster-level confounding |
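
To give a feel for what a Mundlak-style propensity score and ATT weights involve, here is a minimal base-R sketch. It is not MLCausal's internal code: the toy data, the column names (`x1_cmean`, `ps`), and the weight cap of 10 are assumptions made for illustration, and capping is only one common reading of "trimming".

```r
# Toy clustered data (illustrative only, not produced by MLCausal)
set.seed(1)
n_clusters   <- 30
cluster_size <- 20
school_id <- rep(seq_len(n_clusters), each = cluster_size)
u  <- rnorm(n_clusters)[school_id]            # shared cluster-level effect
x1 <- rnorm(length(school_id)) + 0.5 * u      # individual-level covariate
z  <- rbinom(length(school_id), 1, plogis(-0.3 + 0.6 * x1 + 0.5 * u))
toy <- data.frame(school_id, x1, z)

# Mundlak device: add the cluster mean of the covariate as an extra predictor
toy$x1_cmean <- ave(toy$x1, toy$school_id)    # ave() uses mean by default
ps_model <- glm(z ~ x1 + x1_cmean, family = binomial(), data = toy)
toy$ps <- fitted(ps_model)                    # estimated P(z = 1 | x)

# ATT weights: treated units get 1, controls get the odds ps / (1 - ps)
w <- ifelse(toy$z == 1, 1, toy$ps / (1 - toy$ps))

# Assumed trimming rule: cap extreme weights at a fixed value
cap <- 10
w <- pmin(w, cap)
summary(w)
```

The cluster-mean term lets the treatment model absorb between-cluster differences in the covariate without estimating one intercept per cluster, which is the usual motivation for the Mundlak specification over fixed effects.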

Quick start

library(MLCausal)

# -- 1. Simulate clustered data ----------------------------------------------
dat <- simulate_ml_data(
  n_clusters   = 30,
  cluster_size = 20,
  seed         = 42
)

# -- 2. Propensity score estimation (Mundlak method) -------------------------
ps <- ml_ps(
  data       = dat,
  treatment  = "z",
  covariates = c("x1", "x2", "x3"),
  cluster    = "school_id",
  method     = "mundlak"
)

# -- 3. Overlap (positivity) check -------------------------------------------
plot_overlap_ml(ps)

# -- 4. Inverse-probability weighting ----------------------------------------
dat_w <- ml_weight(
  ps_fit    = ps,
  estimand  = "ATT",
  stabilize = TRUE,
  trim      = 10
)

# -- 5. Balance diagnostics --------------------------------------------------
bal <- balance_ml(
  data       = dat_w,
  treatment  = "z",
  covariates = c("x1", "x2", "x3"),
  cluster    = "school_id",
  weights    = "weights"
)
print(bal)

# -- 6. Treatment effect estimation ------------------------------------------
est <- estimate_att_ml(
  data       = dat_w,
  outcome    = "y",
  treatment  = "z",
  cluster    = "school_id",
  covariates = c("x1", "x2", "x3"),
  weights    = "weights"
)
print(est)

# -- 7. Sensitivity analysis -------------------------------------------------
sens <- sens_ml(
  estimate = est$estimate,
  se       = est$se
)

# First row in which the confounder is strong enough to nullify significance
sens[sens$crosses_null, ][1, ]
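
The idea behind a tipping-point analysis can be sketched in a few lines of base R. This is a generic illustration, not the algorithm used by `sens_ml()`: the helper `tipping_point()`, the bias grid, and every column name other than `crosses_null` are hypothetical. It shifts the estimate toward zero by a growing amount of assumed bias and records when the confidence interval first contains zero.

```r
# Hypothetical helper, not part of MLCausal
tipping_point <- function(estimate, se,
                          bias_grid = seq(0, 2, by = 0.05),
                          level = 0.95) {
  crit <- qnorm(1 - (1 - level) / 2)                  # two-sided critical value
  adjusted <- estimate - sign(estimate) * bias_grid   # shift toward zero
  lower <- adjusted - crit * se
  upper <- adjusted + crit * se
  data.frame(
    bias         = bias_grid,
    adjusted     = adjusted,
    lower        = lower,
    upper        = upper,
    crosses_null = lower <= 0 & upper >= 0            # interval contains zero
  )
}

# Made-up inputs, used only to exercise the helper
res <- tipping_point(estimate = 0.50, se = 0.10)
first_cross <- res[res$crosses_null, ][1, ]
print(first_cross)
```

A result that only tips under a large assumed bias, relative to the size of the estimate, is more robust to omitted confounding than one that tips under a small bias.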

Matching path (alternative to weighting)

# Within-cluster matching with dual-balance penalty (lambda = 1)
matched <- ml_match(
  ps_fit  = ps,
  ratio   = 1,
  caliper = 0.5,
  lambda  = 1
)

# Balance on the matched sample
bal_m <- balance_ml(
  data       = matched$data_matched,
  treatment  = "z",
  covariates = c("x1", "x2", "x3"),
  cluster    = "school_id",
  weights    = "match_weight"
)
print(bal_m)

# Effect estimate on the matched sample
est_m <- estimate_att_ml(
  data      = matched$data_matched,
  outcome   = "y",
  treatment = "z",
  cluster   = "school_id",
  weights   = "match_weight"
)
print(est_m)

The `lambda` argument controls the dual-balance penalty. Setting `lambda = 0` recovers standard within-cluster propensity score matching; larger values increasingly penalise matches that worsen cluster-mean covariate balance.
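
One way to see the effect of the penalty is to rerun the matching over a small grid of `lambda` values and compare the balance tables. The sketch below assumes the function signatures shown in this README; the grid `c(0, 1, 5)` is an arbitrary choice for illustration, and the whole block is skipped when MLCausal is not installed.

```r
# Illustrative grid; 5 is an arbitrary "large" value, not a recommendation
lambdas <- c(0, 1, 5)

# Guarded so the script does nothing when MLCausal is not installed
if (requireNamespace("MLCausal", quietly = TRUE)) {
  dat <- MLCausal::simulate_ml_data(
    n_clusters = 30, cluster_size = 20, seed = 42
  )
  ps <- MLCausal::ml_ps(
    data = dat, treatment = "z",
    covariates = c("x1", "x2", "x3"),
    cluster = "school_id", method = "mundlak"
  )

  for (lam in lambdas) {
    # Same matching call as above, varying only the penalty
    m <- MLCausal::ml_match(ps_fit = ps, ratio = 1, caliper = 0.5, lambda = lam)

    cat("\nlambda =", lam, "\n")
    print(MLCausal::balance_ml(
      data       = m$data_matched,
      treatment  = "z",
      covariates = c("x1", "x2", "x3"),
      cluster    = "school_id",
      weights    = "match_weight"
    ))
  }
}
```

Comparing the printed tables shows whether any gain in cluster-mean balance at larger `lambda` comes at the cost of individual-level balance.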

Citation

If you use MLCausal in published research, please cite:

Hait, S. (2026). MLCausal: Causal inference methods for multilevel and clustered data. R package.
