Deeply-Debiased Off-Policy Interval Estimation (D2OPE)

This repository contains the official Python implementation of the paper "Deeply-Debiased Off-Policy Interval Estimation" (ICML 2021).

Summary of the paper

Off-policy evaluation (OPE) estimates a target policy's value from a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications benefit significantly from a confidence interval (CI) that quantifies the uncertainty of that estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments.
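To make the problem setting concrete, here is a minimal sketch of off-policy interval estimation on a toy two-action bandit. It uses plain importance sampling with a normal-approximation CI, which is a simple baseline and not the procedure proposed in the paper. The function name `is_value_ci`, the two policies, and the reward model are all assumptions made up for this illustration.

```python
import numpy as np


def is_value_ci(actions, rewards, pi_target, pi_behavior, z=1.96):
    """Importance-sampling value estimate with a normal-approximation CI.

    Illustrative baseline only; this is not the D2OPE procedure.
    """
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)
    # Reweight each observed reward by the target/behavior probability ratio.
    weights = pi_target[actions] / pi_behavior[actions]
    terms = weights * rewards
    estimate = terms.mean()
    # Half-width of the CI from the sample standard error.
    half_width = z * terms.std(ddof=1) / np.sqrt(len(terms))
    return estimate, (estimate - half_width, estimate + half_width)


# Hypothetical toy problem: two actions, data collected by a uniform behavior policy.
rng = np.random.default_rng(0)
pi_behavior = np.array([0.5, 0.5])
pi_target = np.array([0.9, 0.1])
mean_reward = np.array([1.0, 0.0])  # so the true target value is 0.9

n = 5000
actions = rng.choice(2, size=n, p=pi_behavior)
rewards = mean_reward[actions] + rng.normal(scale=0.1, size=n)

estimate, (lower, upper) = is_value_ci(actions, rewards, pi_target, pi_behavior)
print(f"estimate = {estimate:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The point estimate alone says nothing about how far it may be from the true value; the interval is what quantifies that uncertainty.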

Figures: Method, Results

File overview

Code files in the main folder:
1. Methods:
  1. _TRIPLE.py: main function for the proposed method
  2. _IS.py: implementation of the two IS-based competing methods
2. Environments:
  1. _Ohio_Simulator.py: simulator for the Diabetes environment
  2. _cartpole.py: simulator for the CartPole environment, forked from OpenAI Gym with slight modifications
3. _util.py: helper functions
4. _analyze.py: post-processing of simulation results

/_density: functions for estimating the two density ratio functions

/coinDice: code for the competing method CoinDICE, forked from https://github.com/google-research/dice_rl

/target_policies: checkpoints for the learned target policies

/_RL: some useful RL functions
1. DQN.py and FQI.py: implementation of the target/behaviour policies
2. FQE.py: function for estimating the initial Q function
3. my_gym.py: helper functions for training
4. sampler.py: samplers and replay buffers

/TOY: code to generate the two plots for the toy examples
1. TOY_coverage.ipynb: plot showing the CI coverage
2. TOY_TRIPLY.ipynb: plot showing the triply robust property
3. _plot.py: helper functions for plotting
4. _discrete.py: triply robust (TR) method for a discrete state space

/script: scripts to run the experiments

Reproduce simulation results

To reproduce our simulation results, follow these steps:

1. Install the required packages.
2. Change the working directory to the main folder.
3. Open the Jupyter notebook and modify the hyper-parameters.
4. Run the notebook and analyze the output results.
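Before opening the notebook, it can help to confirm that the working directory really is the main folder. The sketch below is a hypothetical helper, not part of this repository: it only checks that the top-level code files named in the file overview are present. The name `missing_files` and the choice of which files to check are assumptions.

```python
from pathlib import Path

# Top-level files named in the file overview; the selection is an assumption.
EXPECTED_FILES = [
    "_TRIPLE.py",
    "_IS.py",
    "_Ohio_Simulator.py",
    "_cartpole.py",
    "_util.py",
    "_analyze.py",
]


def missing_files(root="."):
    """Return the expected files that are not found directly under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Check the current working directory and report anything that is missing.
missing = missing_files(Path.cwd())
if missing:
    print("Not in the main folder? Missing:", ", ".join(missing))
else:
    print("Working directory looks like the main folder.")
```

If files are reported missing, change to the repository's main folder and rerun the check before starting the experiments.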
