This repository contains the implementation of BUMEX (Bounded Uncertainty Model-based Exploration), a reinforcement learning method that uses prior model knowledge to guide exploration and accelerate learning.
📄 Paper: Smart Exploration in Reinforcement Learning Using Bounded Uncertainty Models (2025 64th IEEE Conference on Decision and Control (CDC), Rio de Janeiro, Brazil)
BUMEX leverages bounded uncertainty models to compute Q-function bounds via convex optimization. These bounds are then used to direct the RL agent's exploration toward actions whose value is still uncertain. We provide implementations of BUMEX alongside baseline exploration strategies for comparison.
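To give a rough feel for the kind of problem involved, here is a minimal sketch assuming a simple box (interval) uncertainty set over the next-state distribution; the paper's actual model set and formulation may differ. An upper bound on Q(s, a) is then the optimal value of a small linear program:

```python
import cvxpy as cp
import numpy as np

def q_upper_bound(r_sa, v_up, p_lo, p_up, gamma=0.95):
    """LP upper bound on Q(s, a) over all transition vectors in the box set."""
    p = cp.Variable(len(v_up))
    objective = cp.Maximize(r_sa + gamma * (p @ v_up))
    constraints = [p >= p_lo, p <= p_up, cp.sum(p) == 1]
    prob = cp.Problem(objective, constraints)
    prob.solve()  # pass solver=cp.MOSEK if MOSEK is installed
    return prob.value

# Toy usage: three successor states with a loose interval model
v_up = np.array([0.0, 1.0, 2.0])   # upper value bounds of successor states
p_lo = np.array([0.1, 0.1, 0.1])   # elementwise lower bounds on p
p_up = np.array([0.8, 0.8, 0.8])   # elementwise upper bounds on p
print(q_upper_bound(r_sa=0.0, v_up=v_up, p_lo=p_lo, p_up=p_up))
```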
Implemented exploration policies (two of the baselines are sketched after the list):
- BUMEX (Exploring Policy): Our novel bounded uncertainty-based exploration
- Epsilon-Greedy: Standard epsilon-greedy exploration
- UCB1: Upper Confidence Bound algorithm
- UCRL2: Optimistic reinforcement learning
- Thompson Sampling (PSRL): Posterior sampling for RL
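For orientation, a minimal sketch of the two simplest baselines; the function names and signatures are illustrative, not the repository's API:

```python
import numpy as np

def epsilon_greedy(q_s, epsilon, rng):
    """Greedy action w.p. 1 - epsilon, uniform random action otherwise."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_s)))
    return int(np.argmax(q_s))

def ucb1(q_s, counts_s, t):
    """UCB1: optimism bonus sqrt(2 ln t / n(s, a)); try untried actions first."""
    if np.any(counts_s == 0):
        return int(np.argmin(counts_s))
    return int(np.argmax(q_s + np.sqrt(2.0 * np.log(t) / counts_s)))

rng = np.random.default_rng(0)
q_s = np.array([0.1, 0.5, 0.2])
print(epsilon_greedy(q_s, epsilon=0.1, rng=rng))      # usually picks action 1
print(ucb1(q_s, counts_s=np.array([3, 9, 1]), t=13))  # bonus favours action 2
```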
 
```bash
# Install dependencies
pip install numpy matplotlib cvxpy gymnasium scipy mosek

# Run single experiments
python experiments/frozen_lake.py
python experiments/cartpole.py
python experiments/taxi.py

# Run statistical comparisons with Monte Carlo
python experiments/monte_carlo.py

# Analyze results in Jupyter notebook
jupyter notebook notebooks/results_comparison.ipynb
```

```
├── experiments/               # Experiment runners
├── src/                       # Core implementations
│   ├── exploring_policy.py    # BUMEX implementation
│   ├── epsilon_greedy.py      # Baseline policies
│   ├── ucb1.py
│   ├── ucrl2.py
│   ├── thompson_sampling.py
│   ├── *_wrapper.py           # Environment interfaces
│   └── utils.py               # Utilities
├── config/                    # JSON configuration files  
└── notebooks/                 # Visualization of results
```
FrozenLake: Discrete grid world with stochastic transitions
- Simple benchmark for tabular RL methods
- Configurable grid size and slip probability
- BUMEX uses a model set that encodes grid-world adjacency together with the exact reward function (see the sketch below)
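A minimal sketch of what such an adjacency-based model set could look like, assuming elementwise interval bounds on the transition probabilities and, for brevity, one bound pair per state rather than per state-action pair (the repository's construction will differ):

```python
import numpy as np

def adjacency_bounds(n_rows, n_cols):
    """Interval transition bounds that rule out jumps to non-adjacent cells."""
    n = n_rows * n_cols
    p_lo = np.zeros((n, n))            # no transition is forced to occur
    p_up = np.zeros((n, n))            # non-neighbours keep upper bound 0
    for s in range(n):
        r, c = divmod(s, n_cols)
        neighbours = [s]               # bumping into a wall keeps the agent put
        if r > 0:          neighbours.append(s - n_cols)
        if r < n_rows - 1: neighbours.append(s + n_cols)
        if c > 0:          neighbours.append(s - 1)
        if c < n_cols - 1: neighbours.append(s + 1)
        p_up[s, neighbours] = 1.0      # any distribution over neighbours is admissible
    return p_lo, p_up
```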
 
CartPole: Pole balancing with continuous states
- Finite-state abstraction wrapper reduces the continuous problem to tabular RL (see the sketch below)
- Physics-based samples are used to generate the uncertainty model set
- Model-bound caching for computational speedup
- BUMEX uses an uncertain transition model and the exact reward function
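A hedged sketch of such a finite-state abstraction; the bin counts, the clipping ranges for the unbounded velocities, and the function itself are assumptions, not the wrapper's actual interface:

```python
import numpy as np

def discretize(obs, bins_per_dim=6):
    """Map a continuous CartPole observation to a single tabular state index."""
    # Clipping ranges are assumptions: position and angle use the Gymnasium
    # termination thresholds; the unbounded velocities are clipped at +/-3.
    lows  = np.array([-2.4, -3.0, -0.21, -3.0])
    highs = np.array([ 2.4,  3.0,  0.21,  3.0])
    ratios = (np.clip(obs, lows, highs) - lows) / (highs - lows)
    idx = np.minimum((ratios * bins_per_dim).astype(int), bins_per_dim - 1)
    # Flatten the four per-dimension bin indices into one integer state
    return int(np.ravel_multi_index(idx, (bins_per_dim,) * 4))

print(discretize(np.array([0.0, 0.1, 0.02, -0.3])))  # some state in [0, 6**4)
```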
 
Taxi: Discrete pickup/delivery task
- Slightly more complicated grid-world benchmark for tabular RL methods
- BUMEX uses a model set that encodes grid-world adjacency together with an uncertain reward model (see the sketch below)
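As a hypothetical illustration of an uncertain reward model (the interval representation is an assumption; Taxi-v3's true rewards are -1 per step, +20 for a successful drop-off, and -10 for an illegal pickup/drop-off):

```python
import numpy as np

n_states, n_actions = 500, 6                  # Taxi-v3 dimensions
r_lo = np.full((n_states, n_actions), -10.0)  # pessimistic reward bound
r_up = np.full((n_states, n_actions), 20.0)   # optimistic reward bound
# An optimistic Q-bound would then use r_up (and a pessimistic one r_lo)
# together with the adjacency-constrained transition set.
```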
 
Dependencies:
- numpy: Numerical computations
- cvxpy: Convex optimization for Q-bounds computation
- mosek: High-performance optimization solver (recommended for best performance)
- gymnasium: RL environments
- scipy: Spatial computations
- matplotlib: Visualization
Note: While other solvers can be used with CVXPY, MOSEK provides significantly better performance for the convex optimization problems in BUMEX.
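For example, CVXPY selects the solver via the `solver` argument of `solve` (MOSEK requires a separate license; the toy problem below is only for demonstration):

```python
import cvxpy as cp

x = cp.Variable()
prob = cp.Problem(cp.Minimize(cp.square(x - 1)))
prob.solve(solver=cp.MOSEK)  # or e.g. prob.solve(solver=cp.ECOS)
print(prob.value, x.value)
```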
The codebase is functional with key features implemented. Ongoing consolidation includes:
- Code cleanup and documentation improvements
- Performance optimizations
- Enhanced error handling
 
If you use this code, please cite our paper:
```bibtex
@inproceedings{hulst2025,
  title={Smart Exploration in Reinforcement Learning Using Bounded Uncertainty Models},
  author={van Hulst, J. S. and Heemels, W. P. M. H. and Antunes, D. J.},
  booktitle={2025 64th IEEE Conference on Decision and Control (CDC)},
  publisher={IEEE},
  archivePrefix={arXiv},
  eprint={2504.05978},
  url={https://arxiv.org/abs/2504.05978},
  year={2025}
}
```

ArXiv preprint: https://arxiv.org/abs/2504.05978