This repository consolidates code for experiments and evaluation of cooperative query decomposition.
To simplify onboarding for others, the Mohair repository has been added as a submodule to this repository at `subprojects/libmohair`. This is a break from my typical convention, made to match meson's conventions for subprojects.
This section is a bootstrap guide for trying out the code in this repository.
Note that I may have forgotten to include some steps necessary for installation. If this is the case, let me know or file an issue in the mohair issue tracker.
The C++ code in this repository depends on Arrow, Substrait, and DuckDB. I am trying to simplify installation of dependencies, but for now this is only done for macOS using Homebrew.
I created a Homebrew tap, which is located at drin/homebrew-hatchery:
# Opening my tap is optional
brew tap drin/hatchery
brew install apache-arrow-substrait
# In case my tap isn't tapped
# brew install drin/hatchery/apache-arrow-substrait
# and then the other formulas
brew install duckdb-substrait
# this is not yet working
# brew install skytether-mohair
To build C++ code, I use meson. To manage python code, I use poetry.
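As a convenience, the tools mentioned in this guide can be checked up front. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper, and the executable names (`git`, `meson`, `ninja`, `git-lfs`, `poetry`) are assumed to match what your installation puts on `PATH`.

```shell
# Print the names of any commands from the argument list that are not on PATH.
# Hypothetical helper; it is not provided by this repository.
missing_tools() {
  _missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      # Append the tool name, space-separated
      _missing="${_missing:+$_missing }$tool"
    fi
  done
  printf '%s\n' "$_missing"
}

# Tools referenced by the build and python steps in this guide
needed="$(missing_tools git meson ninja git-lfs poetry)"
if [ -n "$needed" ]; then
  echo "missing tools: $needed"
else
  echo "all tools found"
fi
```

If anything is reported missing, install it before continuing; `poetry` is only needed for the python code.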
To build the C++ code:
brew install meson ninja git-lfs
git clone https://github.com/drin/mohair-experiments.git
pushd mohair-experiments
# Optional: If you're not doing development, then you can let meson handle subprojects
# git submodule init -- subprojects/libmohair
# git submodule update -- subprojects/libmohair
# This is needed to pull the duckdb database from git LFS. Note it's ~2G
git lfs install --local
git lfs pull
# "build-dir" is the name I use for my build directory
meson setup build-dir
# Use the build-dir for compilation artifacts; also, this should grab mohair if necessary
meson compile -C build-dir
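If `git lfs pull` was skipped or interrupted, LFS-tracked files remain small text pointers rather than the real data. A rough way to detect this is to look for the pointer header that git LFS writes as the first line of a pointer file. This is only a sketch: `is_lfs_pointer` is a hypothetical helper, and since the path of the database file is not stated here, it is taken from the command line.

```shell
# Succeed if the file starts with the git LFS pointer header, which means the
# real content has not been downloaded yet. Hypothetical helper.
is_lfs_pointer() {
  [ -f "$1" ] || return 1
  # The header line is 42 bytes long, so reading 42 bytes is enough
  head -c 42 "$1" | LC_ALL=C grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Check every path given on the command line (does nothing without arguments)
for path in "$@"; do
  if is_lfs_pointer "$path"; then
    echo "still an LFS pointer (run 'git lfs pull'): $path"
  else
    echo "not an LFS pointer: $path"
  fi
done
```

Save it as a script and pass it the path of the LFS-tracked database file.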
Managing the python code is a work in progress: it requires installing the python API for duckdb, and I still want to figure out how to do that in the correct virtualenv.
A very simple way to sanity check that everything went well:
# This is compiled from `src/cpp/toolbox/explain-substrait.cpp`
./build-dir/explain-substrait
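The same check can be wrapped so that a missing binary and a failing run are reported differently. This is a sketch under two assumptions: that `explain-substrait` takes no arguments, and that it exits with status 0 on success. `run_check` is a hypothetical helper, not something the repository provides.

```shell
# Run a binary with no arguments and report how it exited.
# Assumption: a zero exit status means the sanity check passed.
run_check() {
  bin="$1"
  if [ ! -f "$bin" ] || [ ! -x "$bin" ]; then
    echo "not found or not executable: $bin"
    return 127
  fi
  if "$bin"; then
    echo "ok: $bin"
  else
    status=$?
    echo "failed with exit status $status: $bin"
    return "$status"
  fi
}

# Path taken from the sanity check above
run_check ./build-dir/explain-substrait \
  || echo "sanity check did not pass; try re-running: meson compile -C build-dir"
```

A "not found" message usually points at the build step, while a non-zero exit status points at the runtime dependencies.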
The binary does the following:
1. Convert a hardcoded SQL query to substrait (using `duckdb::Connection::GetSubstrait`).
2. Convert the substrait plan into a duckdb physical plan (using the TableFunction `translate_mohair`).
The TableFunction used in step 2 is registered by the substrait extension for duckdb, which
is automatically loaded. The duckdb-substrait formula we install from the drin/hatchery
tap (see the libmohair subproject) compiles a customized substrait extension with a
(mildly) customized duckdb.