A repository to collect our work on treatment effect estimation.

Pacmed/causal_inference

Estimating Treatment Effects from EHR Data

The repository contains code used by Pacmed Labs to estimate treatment effects from electronic health record data. It allows for easy replication of the experiment reported in:

  • "A pragmatic approach to estimating average treatment effects from EHR data: the effect of prone positioning on mechanically ventilated COVID-19 patients" (Izdebski et al., 2021) (in preparation)

Getting Started

To install requirements:

conda env create -f environment.yml

📋 To set up the environment, create a conda environment using the environment.yml file.

To use the BART model, create a separate R environment:

conda env create -f environment_R.yml

📋 To set up the environment for the BART model, create a separate conda environment.

To use the CfR and TARNET models, install the official implementation and apply the configuration specified for these models.

📋 To set up the environment for the CfR and TARNET models, create a separate conda environment using environment_cfrnet.yml.

Experiments

To replicate the experiments for the Outcome Regression, IPW and Blocking models, run:

from causal_inference.model.ols import OLS
from causal_inference.model.weighting import IPW
from causal_inference.model.blocking import Blocking
from causal_inference.experiments.run import Experiment

N_OF_ITERATIONS = 100

batch_of_models = [OLS(), IPW(), Blocking()]

# The train/test arrays (y_*, t_*, X_*) must be loaded before running this loop.
for model in batch_of_models:
    experiment = Experiment(causal_model=model,
                            n_of_iterations=N_OF_ITERATIONS)
    experiment.run(y_train=y_train, t_train=t_train,
                   X_train=X_train, y_test=y_test,
                   t_test=t_test, X_test=X_test)

📋 To execute all the steps used in the experiments, run the code in the available notebooks.

To replicate the BART experiment, run:

Rscript BART.R

To replicate the CfR and TARNET experiments, install the official implementation and run:

mkdir results
mkdir results/<config_file_name>

python cfr_param_search.py ../causal_inference/model/<config_file> 20

python evaluate.py ../causal_inference/model/<config_file> 1

Note that:

❗ To replicate the experiments, you need to request access to the data. For a detailed description, see below.

✨ To compare any scikit-learn models, simply add them to `batch_of_models`:

from sklearn.ensemble import RandomForestRegressor

batch_of_models = [OLS(), IPW(), Blocking(), RandomForestRegressor()]

Running the experiments on your data

The repository can be customized to be used on any dataset. The script causal_inference/make_data/data.py describes the shape required for a Dataloader class to feed data to the experiments.
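As a rough illustration of preparing your own data, the sketch below splits a list of records into the six keyword arguments that `experiment.run` takes in the example above. It is not the repository's Dataloader: the function name `split_for_experiment`, the `(outcome, treatment, covariates)` record layout, and the use of plain Python lists are all assumptions made for illustration. The authoritative shape is the one described in causal_inference/make_data/data.py.

```python
import random


def split_for_experiment(rows, test_fraction=0.25, seed=0):
    """Shuffle (y, t, x) records and split them into train/test parts.

    Hypothetical helper: the returned keys mirror the keyword arguments
    of `experiment.run`, but the real Dataloader may differ.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded for a reproducible split
    n_test = max(1, int(round(len(rows) * test_fraction)))
    test, train = rows[:n_test], rows[n_test:]

    def unpack(part, suffix):
        # Separate outcomes, treatment indicators and covariate vectors.
        return {
            "y_" + suffix: [y for y, _, _ in part],
            "t_" + suffix: [t for _, t, _ in part],
            "X_" + suffix: [list(x) for _, _, x in part],
        }

    return {**unpack(train, "train"), **unpack(test, "test")}


# Made-up toy records: (outcome, treatment indicator, covariate vector).
toy_rows = [(float(i % 3), i % 2, [i, i * 0.5]) for i in range(8)]
data = split_for_experiment(toy_rows)
print({key: len(value) for key, value in data.items()})
```

If the experiments expect NumPy arrays or pandas objects rather than lists, the values would need converting before being passed on, for example as `experiment.run(**data)`.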

Meta

The project is part of the Pacmed Labs research agenda and was funded by SIDNFonds.

Contributing

  1. Fork it (https://github.com/Pacmed/causal_inference)
  2. Create your feature branch (git checkout -b feature/fooBar)
  3. Commit your changes (git commit -am 'Add some fooBar')
  4. Push to the branch (git push origin feature/fooBar)
  5. Create a new Pull Request

📇 Further Details

Motivation

Estimating treatment effects from observational data remains a central problem in the field of causal inference. In recent years, significant advances were made using modern machine learning and deep learning approaches.

However, it is yet to be established how to use those advances on real-world medical data to provide relevant clinical insights. In particular, it is not clear to what extent new models will outperform traditional methods based on propensity score models and allow for individual treatment effect estimation.

The methods in this repository were employed to estimate the average treatment effect of prone positioning on mechanically ventilated COVID-19 patients. Prone positioning is a commonly used technique for treating severely hypoxemic, mechanically ventilated patients with acute respiratory distress syndrome, but its effectiveness in COVID-19 patients had not yet been clinically confirmed. This provides a direct clinical use case on which various models can be compared.

Original Data

For the observational study, we used data collected in the Dutch Data Warehouse (DDW). The DDW is the result of an intensive care unit data-sharing collaboration in the Netherlands that was initiated during the COVID-19 pandemic. The DDW includes data on demographics, comorbidities, monitoring and life support devices, laboratory results, clinical observations, medications, fluid balance, and outcomes. Access to the data can be requested through https://icudata.nl/index-en.html.

Used Models

  • Outcome Regression
  • IPW
  • Blocking
  • BART
  • TARNET
  • CfR
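To give a feel for one of the listed approaches, the following sketch estimates an average treatment effect by inverse probability weighting (IPW) on a tiny, made-up dataset. It is not the repository's `IPW` class: the function name `ipw_ate`, the single binary covariate, and the propensity score taken as the treated fraction within each covariate stratum are assumptions chosen to keep the example dependency-free.

```python
from collections import defaultdict


def ipw_ate(y, t, x):
    """Horvitz-Thompson IPW estimate of the ATE for one discrete covariate.

    Illustrative only: the propensity score is the treated fraction in
    each stratum of x, standing in for a fitted propensity model.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_units, n_treated]
    for ti, xi in zip(t, x):
        counts[xi][0] += 1
        counts[xi][1] += ti
    propensity = {k: treated / total for k, (total, treated) in counts.items()}

    total_effect = 0.0
    for yi, ti, xi in zip(y, t, x):
        e = propensity[xi]
        if not 0.0 < e < 1.0:
            raise ValueError("positivity violated in stratum %r" % (xi,))
        # Treated units are weighted by 1/e, controls by 1/(1 - e).
        total_effect += ti * yi / e - (1 - ti) * yi / (1 - e)
    return total_effect / len(y)


# Made-up data: outcome = x + 2 * t, and x = 1 units are treated more often.
x = [0, 0, 0, 0, 1, 1, 1, 1]
t = [1, 0, 0, 0, 1, 1, 1, 0]
y = [xi + 2 * ti for xi, ti in zip(x, t)]

treated = [yi for yi, ti in zip(y, t) if ti == 1]
control = [yi for yi, ti in zip(y, t) if ti == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

print("naive difference in means:", naive)  # 2.5, biased by confounding
print("IPW estimate:", ipw_ate(y, t, x))    # 2.0, the effect built into the data
```

Because the covariate affects both treatment assignment and outcome, the naive comparison overstates the effect, while reweighting by the propensity score recovers the value used to generate the toy outcomes.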
