JoinGym is an efficient and lightweight query optimization environment for reinforcement learning (RL). JoinGym also comes with a new dataset of intermediate result cardinalities for imdb/joingym
.
First, install Gymnasium in Python 3.9. Then, to install JoinGym, run
cd join-optimization
pip install -e .
You can verify that JoinGym is installed correctly by running test_env.py
provided in the main directory.
As shown in that test file, creating a JoinGym environment is as simple as
import gymnasium as gym
import join_optimization # register JoinGym
env = gym.make(
"join_optimization_left-v0",
db_schema=db_schema,
join_contents=join_contents,
)
where db_schema
is the database schema (e.g. imdb/schema.txt
) and join_contents
is a map from query_id
to the IR cardinalities of that query (e.g. imdb/joingym/q1_0.json
).
JoinGym adheres to the standard Gymnasium API, with two key methods.
- state, info = env.reset(options={query_id=x})
- next_state, reward, done, _, info = env.step(action)
We provide implementations of DQN, PPO, SAC and TD3 in the algorithms
folder. These implementations were modified from CleanRL to handle action masks and prioritized replay.
To get started, install
PyTorch, cpprb, and wandb.
Our scripts accept two flags.
First, --enable-bushy
can be used to enable bushy plans; otherwise, only left-deep plans are allowed by default. Second, --disable-cartesian-product
can be used to disable Cartesian product (CP)actions; otherwise, CPs will be allowed by default.
For example, to run PPO on the left-deep environment and without CPs,
python test_ppo.py --disable-cartesian-product
To run SAC on the bushy environment and with CPs,
python test_sac.py --enable-bushy
By default, these scripts will run JoinGym with our whole dataset of imdb/joingym
. You can modify the environment initialization to use a subset of queries, or to use queries from the Join Order Benchmark (JOB). Data for the JOB is stored in imdb/job
.
To cite this work, please use the following BibTex.
@inproceedings{
wang2024joingym,
title={JoinGym: An Efficient Join Order Selection Environment},
author={Wang$^\star$, Kaiwen and Wang$^\star$, Junxiong and Li, Yueying and Kallus, Nathan and Trummer, Immanuel and Sun, Wen},
booktitle={Reinforcement Learning Conference},
year={2024},
url={https://openreview.net/forum?id=dQahvyT8p4}
}