This repository is the official Python implementation of the paper "Deeply-Debiased Off-Policy Interval Estimation" (ICML 2021).
Off-policy evaluation estimates the value of a target policy from a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications benefit significantly from a confidence interval (CI) that quantifies the uncertainty of that estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments.
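To make the setting concrete, here is a minimal sketch of off-policy evaluation with a CI on a toy two-action problem. It uses plain importance sampling with a normal-approximation interval, which is a generic baseline and not the procedure proposed in the paper. The helper `is_value_and_ci`, the policies, and the reward model are all hypothetical and are not part of this repository.

```python
import numpy as np


def is_value_and_ci(actions, rewards, pi_target, pi_behavior, z=1.96):
    """Importance-sampling value estimate with a normal-approximation CI.

    Hypothetical helper for illustration only; not the paper's method.
    """
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)
    # Reweight each logged reward by how much more (or less) likely the
    # target policy is to take the logged action than the behavior policy.
    weights = pi_target[actions] / pi_behavior[actions]
    terms = weights * rewards
    estimate = terms.mean()
    half_width = z * terms.std(ddof=1) / np.sqrt(len(terms))
    return estimate, (estimate - half_width, estimate + half_width)


# Toy data (assumed): two actions, logged under a uniform behavior policy.
rng = np.random.default_rng(0)
pi_behavior = np.array([0.5, 0.5])
pi_target = np.array([0.2, 0.8])
mean_reward = np.array([0.0, 1.0])
n = 5000

actions = rng.choice(2, size=n, p=pi_behavior)
rewards = mean_reward[actions] + rng.normal(scale=0.5, size=n)

estimate, (lower, upper) = is_value_and_ci(actions, rewards, pi_target, pi_behavior)
true_value = float(pi_target @ mean_reward)  # value of the target policy in this toy model
print(f"estimate={estimate:.3f}, CI=({lower:.3f}, {upper:.3f}), truth={true_value:.3f}")
```

The interval here relies only on the central limit theorem for the reweighted rewards; it says nothing about the efficiency or robustness properties that the paper's method targets.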
- Code files in the main folder:
  - Methods:
    - `_TRIPLE.py`: main function for the proposed method
    - `_IS.py`: code to implement the two IS-based competing methods
  - Environment:
    - `_Ohio_Simulator.py`: simulate for the `Diabates` environment
    - `_cartpole.py`: simulate for the `Cartpole` environment, forked from OpenAI Gym, with slight modifications.
  - `_util.py`: helper functions
  - `_analyze.py`: post-process simulation results
- Methods:
  - `/density`: functions for estimating the two density ratio functions
  - `/coinDice`: code for the competing method "coinDice". Forked from https://github.com/google-research/dice_rl
- `/target_policies`: checkpoints for the learned target policies
- `/RL`: some useful RL functions
  - `DQN.py` and `FQI.py`: implementation of the target/behaviour policies
  - `FQE.py`: function for estimating the initial Q function
  - `my_gym.py`: helper functions for training
  - `sampler.py`: samplers and replay buffers
- `/TOY`: code to generate the two plots for toy examples
  - `TOY_coverage.ipynb`: for the plot showing the CI coverage
  - `TOY_TRIPLY.ipynb`: for the plot showing the triply robust property
  - `_plot.py`: helper functions for plotting
  - `_discrete.py`: TR method for discrete state space
- `/script`: scripts to run the experiments.
To reproduce our simulation results, follow these steps:
- Install the required packages.
- Change the working directory to the main folder.
- Open the Jupyter notebook and modify the hyper-parameters.
- Run the notebook and analyze the output.
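Before running anything, it can help to confirm that the working directory really is the main folder. The sketch below checks for the files and folders listed in the overview above. The helper `missing_entries` is hypothetical and not part of this repository, and it assumes that every listed folder sits directly under the main folder.

```python
from pathlib import Path

# Names taken from the file overview in this README.
EXPECTED_FILES = [
    "_TRIPLE.py",
    "_IS.py",
    "_Ohio_Simulator.py",
    "_cartpole.py",
    "_util.py",
    "_analyze.py",
]
# Assumption: these folders are direct children of the main folder.
EXPECTED_DIRS = ["density", "coinDice", "target_policies", "RL", "TOY", "script"]


def missing_entries(repo_root):
    """Return the expected files and folders that are absent under repo_root."""
    root = Path(repo_root)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


# Check the current working directory.
missing = missing_entries(Path.cwd())
if missing:
    print("Not in the main folder? Missing:", ", ".join(missing))
else:
    print("Working directory looks like the main folder.")
```

If the check reports missing entries, change to the main folder (for example with `os.chdir`) before opening the notebook.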