Contains code simulations for the ICML 2023 paper "Leveraging Factored Action Spaces for Off-Policy Evaluation" by Aaman Rebello, Shengpu Tang, Sonali Parbhoo and Jenna Wiens.

ai4ai-lab/Factored-Action-Spaces-for-OPE

Factored-Action-Spaces-for-OPE

Contains code simulations for the ICML 2023 paper "Leveraging Factored Action Spaces for Off-Policy Evaluation" by Aaman Rebello, Shengpu Tang, Jenna Wiens and Sonali Parbhoo.

This repo consists of four families of files:

  1. Jupyter Notebooks: 1_Step_MDP.ipynb and 4_State_MDP.ipynb. These Jupyter notebooks contain the code that generates off-policy datasets from the toy MDP problems, computes OPE estimates on them using the decomposed and non-decomposed estimators, and then generates graphs. The data used to generate each graph is saved and displayed within cells of the notebook as a Python dictionary. All of the graphs from the paper are also displayed.

  2. Configs: The configs folder contains two sub-folders, 1-step-MDP and 4-state-MDP, one for each factorisable toy MDP problem. Each sub-folder holds the configuration for its problem, e.g. the transition probability matrix and reward matrix of the overall MDP, together with factored versions of these for each factored action space, which are held within a sub-sub-folder called "factorisation". Each sub-folder also contains the mappings from actions to factored actions and from states to state abstractions under the factored action space. It also defines a behavior policy and an evaluation policy, with factored versions of these within the "factorisations" sub-folder. In both problems, Theorem 1 from Tang et al. holds for the MDP and for both policies; the configurations must be rewritten for Theorem 1 to be violated.

  3. Python Files: These contain functions required to load, process and analyse the MDP and policy configs (load_discrete_MDP.py, discrete_MDP_helper_functions.py), generate (generate_dataset.py) and load (load_datasets.py) off-policy data from the MDPs and policies, and perform OPE estimates on off-policy data (policy_estimators.py). These scripts are all called from within the Jupyter notebooks.

  4. Dataset Placeholder: The datasets folder is designed to hold .npy files consisting of factored and non-factored off-policy data generated by the MDP problems. In this way, data generated as NumPy arrays in RAM can be stored in a more persistent form. The file load_datasets.py loads from the datasets folder.
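
To illustrate the difference between the non-decomposed and decomposed estimators mentioned above, here is a minimal sketch of importance sampling (IS) on a one-step problem. It is not the code in policy_estimators.py: the single-state problem, the two binary sub-actions, the policies PI_B and PI_E, the reward table R and all function names are hypothetical choices made for illustration. It assumes the policies factorise across sub-actions and the reward is a sum of per-factor rewards.

```python
import numpy as np

# Hypothetical single-state problem with two binary sub-actions (D = 2).
# Row d holds the probabilities of sub-action a^d = 0 and a^d = 1.
PI_B = np.array([[0.5, 0.5], [0.5, 0.5]])  # behaviour policy (assumed)
PI_E = np.array([[0.9, 0.1], [0.2, 0.8]])  # evaluation policy (assumed)
# R[d, a] is the reward contributed by sub-action a in factor d (assumed).
R = np.array([[1.0, 0.0], [0.0, 2.0]])


def generate_data(n, rng):
    """Sample n one-step episodes from the behaviour policy."""
    actions = np.stack(
        [rng.choice(2, size=n, p=PI_B[d]) for d in range(2)], axis=1
    )
    rewards = np.stack([R[d, actions[:, d]] for d in range(2)], axis=1)
    return actions, rewards  # both of shape (n, 2)


def factor_ratios(actions):
    """Per-factor importance ratios pi_e^d(a^d) / pi_b^d(a^d), shape (n, 2)."""
    return np.stack(
        [PI_E[d, actions[:, d]] / PI_B[d, actions[:, d]] for d in range(2)],
        axis=1,
    )


def is_estimate(actions, rewards):
    """Non-decomposed IS: product of ratios times the total reward."""
    rho = factor_ratios(actions).prod(axis=1)
    return float(np.mean(rho * rewards.sum(axis=1)))


def decomposed_is_estimate(actions, rewards):
    """Decomposed IS: each factor's reward weighted by its own ratio."""
    return float(np.mean((factor_ratios(actions) * rewards).sum(axis=1)))


rng = np.random.default_rng(0)
actions, rewards = generate_data(100_000, rng)

# Exact value of the evaluation policy: 0.9 * 1 + 0.8 * 2 = 2.5.
true_value = float((PI_E * R).sum())
v_is = is_estimate(actions, rewards)
v_dis = decomposed_is_estimate(actions, rewards)
print(f"true={true_value:.3f}  IS={v_is:.3f}  decomposed IS={v_dis:.3f}")
```

Both estimates should land close to the true value in this toy setting. The decomposed estimator weights each factor's reward by that factor's ratio alone, which is what reduces its variance here.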
