Factored-Action-Spaces-for-OPE

Contains code simulations for the ICML 2023 paper "Leveraging Factored Action Spaces for Off-Policy Evaluation" by Aaman Rebello, Shengpu Tang, Jenna Wiens and Sonali Parbhoo.

This repo consists of four families of files:

Jupyter Notebooks: 1_Step_MDP.ipynb and 4_State_MDP.ipynb. These notebooks contain the code that generates off-policy datasets from the toy MDP problems, computes OPE estimates on them using the decomposed and non-decomposed estimators, and then generates the graphs. The data used to generate each graph is saved and displayed within cells of the notebook as a Python dictionary. All of the graphs from the paper are also displayed.
Configs: The configs folder contains two sub-folders, 1-step-MDP and 4-state-MDP, one for each factorisable toy MDP problem. Each sub-folder holds the configuration for its respective problem, e.g. the transition probability matrix and reward matrix of the overall MDP, together with factored versions of these for each factored action space, which are held within a sub-sub-folder called "factorisation". Each sub-folder also contains information mapping actions to factored actions, and states to state abstractions, based on the factored action space. Finally, each sub-folder defines a behavior policy and an evaluation policy, as well as factored versions of these within the "factorisations" sub-folder. In both problems, Theorem 1 from Tang et al. holds for the MDP and both policies; the configurations would have to be modified for Theorem 1 to be violated.
Python Files: These contain the functions required to load, process and analyse the MDP and policy configs (load_discrete_MDP.py, discrete_MDP_helper_functions.py), to generate (generate_dataset.py) and load (load_datasets.py) off-policy data from the MDPs and policies, and to compute OPE estimates on off-policy data (policy_estimators.py). All of these scripts are called from within the Jupyter notebooks.
Dataset Placeholder: The datasets folder is designed to hold .npy files containing factored and non-factored off-policy data generated from the MDP problems, so that data generated as NumPy arrays in RAM can be stored in a more persistent form. The file load_datasets.py loads data from this folder.
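The Configs description mentions factored transition matrices, together with maps from actions to factored actions and from states to state abstractions. The sketch below shows one way such pieces could fit together. It assumes, hypothetically, two sub-spaces whose abstract states evolve independently, so the joint transition matrix is a Kronecker product of the factored ones. The array layout `P[a, s, s']`, the numbers, and the helper names are illustrative assumptions, not the format of the files in configs.

```python
import numpy as np

# Hypothetical factored components. For sub-space d, P_d[a_d, z, z'] is the
# probability of moving from abstract state z to z' under sub-action a_d.
P1 = np.array([[[0.9, 0.1], [0.2, 0.8]],
               [[0.3, 0.7], [0.6, 0.4]]])
P2 = np.array([[[0.5, 0.5], [0.1, 0.9]],
               [[0.8, 0.2], [0.4, 0.6]]])
N_Z = (P1.shape[1], P2.shape[1])  # abstract states per sub-space
N_A = (P1.shape[0], P2.shape[0])  # sub-actions per sub-space


def state_to_abstractions(s):
    """Map a joint state index to one abstract state per sub-space."""
    return np.unravel_index(s, N_Z)


def action_to_factored(a):
    """Map a joint action index to one sub-action per sub-space."""
    return np.unravel_index(a, N_A)


def joint_transition_matrix(P1, P2):
    """Build P[a, s, s'] assuming the two sub-spaces evolve independently."""
    n_actions = N_A[0] * N_A[1]
    n_states = N_Z[0] * N_Z[1]
    P = np.empty((n_actions, n_states, n_states))
    for a in range(n_actions):
        a1, a2 = action_to_factored(a)
        # Row-major joint indices (s = z1 * |Z2| + z2) match np.kron's layout.
        P[a] = np.kron(P1[a1], P2[a2])
    return P


P = joint_transition_matrix(P1, P2)
```

Under this assumption every row of each `P[a]` sums to one, and `P[a, s, s']` equals `P1[a1, z1, z1'] * P2[a2, z2, z2']`, where `(z1, z2)` and `(a1, a2)` come from the two mapping helpers.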
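As a minimal, hypothetical version of the data-generation step, the helper below rolls out a behavior policy in a tabular MDP and records states, actions, rewards and next states. `generate_trajectories`, its argument layout and the toy MDP are assumptions for illustration; generate_dataset.py may structure this differently.

```python
import numpy as np


def generate_trajectories(P, R, pi_b, s0_dist, horizon, n_traj, rng):
    """Sample off-policy trajectories from a tabular MDP (hypothetical helper).

    P:       (A, S, S) transition probabilities, P[a, s, s'].
    R:       (S, A) deterministic rewards.
    pi_b:    (S, A) behavior policy, each row a distribution over actions.
    s0_dist: (S,) initial-state distribution.
    """
    n_actions, n_states = P.shape[0], P.shape[1]
    states = np.zeros((n_traj, horizon), dtype=int)
    actions = np.zeros((n_traj, horizon), dtype=int)
    rewards = np.zeros((n_traj, horizon))
    next_states = np.zeros((n_traj, horizon), dtype=int)
    for i in range(n_traj):
        s = rng.choice(n_states, p=s0_dist)
        for t in range(horizon):
            a = rng.choice(n_actions, p=pi_b[s])        # behavior action
            s_next = rng.choice(n_states, p=P[a, s])    # environment step
            states[i, t], actions[i, t] = s, a
            rewards[i, t] = R[s, a]
            next_states[i, t] = s_next
            s = s_next
    return states, actions, rewards, next_states


# Toy 2-state MDP: action 0 stays put, action 1 flips the state.
P = np.array([np.eye(2), np.eye(2)[::-1]])
R = np.array([[0.0, 1.0], [2.0, 3.0]])
pi_b = np.full((2, 2), 0.5)
rng = np.random.default_rng(0)
states, actions, rewards, next_states = generate_trajectories(
    P, R, pi_b, s0_dist=np.array([1.0, 0.0]), horizon=5, n_traj=100, rng=rng)
```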

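To illustrate the difference between the non-decomposed and decomposed estimators, here is a sketch for a one-step problem with two binary sub-actions. It assumes that both policies factorise over the sub-actions and that the reward is a sum of per-sub-action components. Under those assumptions, importance sampling can be applied to each sub-space separately and the results summed. The numbers and function names are hypothetical, and this is not the code in policy_estimators.py.

```python
import numpy as np


def is_estimate(actions, rewards, pi_e, pi_b):
    """Standard (non-decomposed) importance sampling for a one-step problem.

    actions: (n,) joint action indices; rewards: (n,) observed rewards;
    pi_e, pi_b: (num_joint_actions,) action probabilities.
    """
    weights = pi_e[actions] / pi_b[actions]
    return float(np.mean(weights * rewards))


def decomposed_is_estimate(sub_actions, sub_rewards, pi_e_factors, pi_b_factors):
    """Decomposed IS: one IS estimate per sub-space, then summed.

    Assumes both policies factorise over sub-actions and the reward is the
    sum of the per-sub-space components in sub_rewards (shape (n, D)).
    """
    total = 0.0
    for d in range(sub_actions.shape[1]):
        a_d = sub_actions[:, d]
        w_d = pi_e_factors[d][a_d] / pi_b_factors[d][a_d]  # sub-space ratio only
        total += np.mean(w_d * sub_rewards[:, d])
    return float(total)


# Hypothetical toy problem: two binary sub-actions, additive reward.
r_factors = [np.array([0.0, 1.0]), np.array([0.0, 2.0])]
pi_b_factors = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
pi_e_factors = [np.array([0.2, 0.8]), np.array([0.3, 0.7])]
# Joint policies are products of the factors; joint index = 2 * a1 + a2.
pi_b = np.outer(*pi_b_factors).ravel()
pi_e = np.outer(*pi_e_factors).ravel()


def sample(n, rng):
    """Draw n one-step samples under the factored behavior policy."""
    sub_actions = np.stack(
        [rng.choice(2, size=n, p=p) for p in pi_b_factors], axis=1)
    sub_rewards = np.stack(
        [r_factors[d][sub_actions[:, d]] for d in range(2)], axis=1)
    actions = 2 * sub_actions[:, 0] + sub_actions[:, 1]
    return actions, sub_actions, sub_rewards.sum(axis=1), sub_rewards


true_value = sum(float(pe @ r) for pe, r in zip(pi_e_factors, r_factors))  # ≈ 2.2
rng = np.random.default_rng(0)
actions, sub_actions, rewards, sub_rewards = sample(200_000, rng)
print(true_value,
      is_estimate(actions, rewards, pi_e, pi_b),
      decomposed_is_estimate(sub_actions, sub_rewards, pi_e_factors, pi_b_factors))
```

In this toy setting both estimators are unbiased. However, the per-sample variance of the decomposed estimator is 2.6, versus about 7.0 for standard IS, because each sub-space term only involves the ratio for its own sub-action.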
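With NumPy, persisting arrays to .npy files and reading them back takes one call per array. The naming scheme `<name>_<key>.npy` and the helpers below are hypothetical; load_datasets.py may use different file names and layouts. A temporary directory stands in for the datasets folder so the example leaves nothing behind.

```python
import os
import tempfile

import numpy as np


def save_dataset(folder, name, **arrays):
    """Save each array as <folder>/<name>_<key>.npy (hypothetical naming)."""
    os.makedirs(folder, exist_ok=True)
    for key, arr in arrays.items():
        np.save(os.path.join(folder, f"{name}_{key}.npy"), arr)


def load_dataset(folder, name, keys):
    """Load the arrays written by save_dataset back into a dict."""
    return {key: np.load(os.path.join(folder, f"{name}_{key}.npy"))
            for key in keys}


rng = np.random.default_rng(0)
states = rng.integers(0, 4, size=(10, 5))
actions = rng.integers(0, 4, size=(10, 5))
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "datasets")  # stand-in for the datasets folder
    save_dataset(folder, "demo", states=states, actions=actions)
    loaded = load_dataset(folder, "demo", ["states", "actions"])
```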
Repository: ai4ai-lab/Factored-Action-Spaces-for-OPE