Skip to content

Valentyn1997/OR-learners

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Orthogonal representation learning for CATE estimation

A novel class of Neyman-orthogonal learners for causal quantities defined at the representation level

image

The project is built with the following Python libraries:

  1. PyTorch
  2. Hydra - simplified command line arguments management
  3. MlFlow - experiments tracking
  4. normflows - a PyTorch package for normalizing flows

Setup

Installations

First one needs to make the virtual environment and install all the requirements:

pip3 install virtualenv
python3 -m virtualenv -p python3 --always-copy venv
source venv/bin/activate
pip3 install -r requirements.txt

MlFlow Setup / Connection

To start an experiments server, run:

mlflow server --port=5000 --gunicorn-opts "--timeout 280"

To access the MlFLow web UI with all the experiments, connect via ssh:

ssh -N -f -L localhost:5000:localhost:5000 <username>@<server-link>

Then, one can go to the local browser http://localhost:5000.

Semi-synthetic datasets setup

Before running semi-synthetic experiments, place datasets in the corresponding folders:

 ── data/acic_2016
    ├── synth_outcomes
    |   ├── zymu_<id0>.csv   
    |   ├── ... 
    │   └── zymu_<id14>.csv 
    ├── ids.csv
    └── x.csv 

Experiments

The main training script is universal for different methods and datasets. For details on mandatory arguments - see the main configuration file config/config.yaml and other files in config/ folder.

Generic script with logging and fixed random seed is the following:

PYTHONPATH=.  python3 runnables/train.py +dataset=<dataset> +repr_net=<model> exp.seed=10

Datasets

One needs to specify a dataset / dataset generator (and some additional parameters, e.g. train size for the synthetic data dataset.n_samples_train=1000, or a subset index for ACIC 2016 data dataset.dataset_ix=0):

Models

Stage 0.

One needs to choose a model and then fill in the specific hyperparameters (they are left blank in the configs):

  • TARNet/TARFlow: +repr_net=tarnet +repr_net_type=dense / +repr_net=tarnet +repr_net_type=res_flow
  • BNN/BNNFlow: +repr_net=bnnet +repr_net_type=dense / +repr_net=bnnet +repr_net_type=res_flow
  • CFR/CFRFlow: +repr_net=cfrnet +repr_net_type=dense / +repr_net=cfrnet +repr_net_type=res_flow
  • RCFR/RCFRFlow: +repr_net=rcfrnet +repr_net_type=dense / +repr_net=rcfrnet +repr_net_type=res_flow
  • CFR-ISW/CFRNet-ISW: +repr_net=cfrisw +repr_net_type=dense / +repr_net=cfrisw +repr_net_type=res_flow
  • BWCFR/BWCFRFlow: +repr_net=bwcfr +repr_net_type=dense / +repr_net=bwcfr +repr_net_type=res_flow

Models already have the best hyperparameters saved, for each model - dataset and different sizes of the representation. One can access them via: +repr_net/<dataset>_hparams/<model>=<n_samples_train> or +model/<dataset>_hparams/<model>/<n_samples_train>=<ipm_params> etc. To perform manual hyperparameter tuning use the flags repr_net.tune_hparams=True, and then see repr_net.hparams_grid.

Stage 1.

Stage 1 models are propensity networks (src/models/prop_nets.py) and outcome networks (src/models/mu_nets.py). The hyperparameters were tuned together with the stage 0 models and are stored in the same YAML files. To perform manual hyperparameter tuning use the flags prop_net_cov.tune_hparams=True and mu_net_cov.tune_hparams=True.

Stage 2.

Stage 2 models are defined in config/config.yaml and src/models/target_net.py. One needs to specify the list of second-stage models to fit for exp.targets:

  • CAPOs estimation with $\text{DR}^{\text{K}}_a$-learner: exp.targets="['mu0', 'mu1']"
  • CAPOs estimation with $\text{DR}^{\text{FS}}_a$-learner: exp.targets="['y0', 'y1']"
  • CATE estimation with $\text{DR}^{\text{K}}$: exp.targets="['cate']"
  • CATE estimation with $\text{R}$: exp.targets="['rcate']"
  • CATE estimation with $\text{IVW}$: exp.targets="['ivw_pi_cate']"

Examples

Example of running TARFlow (CFR w/ $\alpha = 0$) without tuning based on synthetic data with n_train = 500 (stage 0):

CUDA_VISIBLE_DEVICES=<devices> PYTHONPATH=. python3 runnables/train.py -m +dataset=synthetic +repr_net=tarnet +repr_net/synthetic_hparams/tarnet_res_flow=\'500\' exp.logging=True exp.device=cuda exp.seed=10 exp.targets="[]"

Example of running BWCFRFlow without tuning based on synthetic data with n_train = 500, $\alpha \in \{0.01, 0.02, 0.05\}$, and IPM = MMD (stages 0-2):

CUDA_VISIBLE_DEVICES=<devices> PYTHONPATH=. python3 runnables/train.py -m +dataset=synthetic +repr_net=bwcfr +repr_net/synthetic_hparams/bwcfr_res_flow=\'500\' exp.logging=True repr_net.ipm=mmd repr_net.alpha=0.01,0.02,0.05 exp.device=cuda exp.seed=10 exp.targets="['mu0', 'mu1', 'y0', 'y1', 'cate', 'rcate']"

Example of all-stages tuning of CFR based on the 0-th subset of IHDP100 dataset with IPM = WM and $\alpha = 0.1$ (stages 0-2):

CUDA_VISIBLE_DEVICES=<devices> PYTHONPATH=. -m +dataset=ihdp100 +repr_net=cfrnet exp.logging=True exp.device=cuda dataset.dataset_ix=0 repr_net.ipm=wass repr_net.alpha=0.1 repr_net.tune_hparams=True prop_net_cov.tune_hparams=True mu_net_cov.tune_hparams=True exp.targets="['mu0', 'mu1', 'y0', 'y1', 'cate', 'rcate']"

Project based on the cookiecutter data science project template. #cookiecutterdatascience

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages