
Home

Welcome to the dice_rl_TU_Vienna wiki! 🥳 Here we provide documentation for our policy evaluation API.


⚠️ A Technical Remark

Note that the policy value is defined as

$$ \rho^\pi \doteq (1 - \gamma) \mathcal E_\pi \left [ \sum_{t=0}^\infty \gamma^t r_t \right ] \quad \text{for } 0 < \gamma < 1 \quad \text{and} \quad \rho^\pi \doteq \lim_{H \to \infty} \mathcal E_\pi \left [ \frac{1}{H+1} \sum_{t=0}^H r_t \right ] \quad \text{for } \gamma = 1. $$

Here, $\gamma$ is the discount factor, $\pi$ is the evaluation policy, and $r_t$ is the reward at time $t$.
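To make the two cases concrete, here is a minimal Monte Carlo sketch of both definitions. It assumes a generic `env`/`policy` interface (`reset`, `step`, and a callable policy) that is *not* part of this library; it only illustrates how the discounted and average-reward policy values are computed from on-policy rollouts.

```python
import numpy as np

def estimate_policy_value(env, policy, gamma, num_episodes=100, horizon=1000):
    """Monte Carlo estimate of the policy value rho^pi from on-policy rollouts.

    For 0 < gamma < 1 the discounted return is scaled by (1 - gamma);
    for gamma == 1 the average reward over the (truncated) horizon is used.
    """
    values = []
    for _ in range(num_episodes):
        state = env.reset()
        rewards = []
        for t in range(horizon):
            action = policy(state)
            state, reward, done = env.step(action)
            rewards.append(reward)
            if done:
                break
        rewards = np.asarray(rewards, dtype=np.float64)
        if gamma < 1.0:
            # (1 - gamma)-normalized discounted return.
            discounts = gamma ** np.arange(len(rewards))
            values.append((1.0 - gamma) * np.sum(discounts * rewards))
        else:
            # Average-reward case (gamma == 1), truncated at the horizon.
            values.append(np.mean(rewards))
    return float(np.mean(values))
```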

DICE-based methods are designed for infinite-horizon settings. If your environment terminates after a finite horizon, consider looping it or modeling termination with absorbing states to better reflect infinite-horizon assumptions.
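One common way to model termination with an absorbing state is to redirect every terminal transition to a dedicated state that self-loops with zero reward. The wrapper below is a hedged sketch of that idea under the same generic `env` interface as above; it is not an API provided by this library.

```python
class AbsorbingWrapper:
    """Turns a finite-horizon environment into an infinite-horizon one by
    sending terminal transitions to an absorbing state that self-loops
    with zero reward, so episodes never end."""

    def __init__(self, env, absorbing_state):
        self.env = env
        self.absorbing_state = absorbing_state
        self._absorbed = False

    def reset(self):
        self._absorbed = False
        return self.env.reset()

    def step(self, action):
        if self._absorbed:
            # Stay in the absorbing state forever with zero reward.
            return self.absorbing_state, 0.0, False
        state, reward, done = self.env.step(action)
        if done:
            # Redirect the terminal transition to the absorbing state.
            self._absorbed = True
            return self.absorbing_state, reward, False
        return state, reward, False
```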

Before using the library in depth, we strongly recommend reading the documentation carefully — especially the Background section — to understand key assumptions and concepts. You may also benefit from reviewing the example project linked below for a concrete application.


📚 Documentation Overview

Jump directly to:

  • Background — Key assumptions, estimators, and Bellman equations.
  • Dataset and Policies — Required dataset structure and policy representation.
  • Hyperparameters — Configuration details for DICE estimators.
  • Algorithms — List of implemented algorithms and their expected input formats.

🔬 Application Example

For a practical application of these estimators in the healthcare domain, see our related repository:

👉 dice_rl_sepsis — Code and experiments for the publication Evaluating Reinforcement-Learning-based Sepsis Treatments via Tabular and Continuous Stationary Distribution Correction Estimation.
