drin/mohair-experiments

Overview

This repository consolidates code for experiments and evaluation of cooperative query decomposition.

Organization

To simplify onboarding for others, the Mohair repository is included as a git submodule at subprojects/libmohair. This departs from my usual layout in order to follow meson's conventions for subprojects.

Getting Started

This section is a bootstrap guide for trying out the code in this repository.

Installation

Note that I may have forgotten to include some steps necessary for installation. If this is the case, let me know or file an issue in the mohair issue tracker.

Dependencies

The C++ code in this repository depends on Arrow, Substrait, and DuckDB. I am trying to simplify installation of dependencies, but for now this is only done for macOS using Homebrew.

I created a Homebrew tap, which is located at drin/homebrew-hatchery:

# Tapping is optional
brew tap drin/hatchery
brew install apache-arrow-substrait

# If you skipped the tap step, use the fully qualified formula name instead
# brew install drin/hatchery/apache-arrow-substrait

# and then the other formulas
brew install duckdb-substrait

# this is not yet working
# brew install skytether-mohair

Build systems

To build C++ code, I use meson. To manage Python code, I use poetry.
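Before building, it can help to confirm that the tools this README relies on are on your PATH. The sketch below is not part of the repository: `missing_tools` is a hypothetical helper, and the tool list is taken from the commands used in this guide.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Prints each named tool that is NOT found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in this README: meson and ninja for C++, git-lfs for the
# database pull, poetry for the Python code.
missing=$(missing_tools meson ninja git git-lfs poetry)
if [ -n "$missing" ]; then
    # $missing is intentionally unquoted so the names print on one line
    echo "Missing tools:" $missing
else
    echo "All build tools found"
fi
```

The check only reports what is missing; it does not install anything or verify versions.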

Building C++

To build the C++ code:

brew install meson ninja git-lfs

git clone https://github.com/drin/mohair-experiments.git
pushd mohair-experiments

# Optional: If you're not doing development, then you can let meson handle subprojects
# git submodule init   -- subprojects/libmohair
# git submodule update -- subprojects/libmohair

# This is needed to pull the duckdb database from git LFS. Note that it's ~2 GB
git lfs install --local
git lfs pull

# "build-dir" is the name I use for my build directory
meson setup      build-dir

# Use the build-dir for compilation artifacts; also, this should grab mohair if necessary
meson compile -C build-dir
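If `git lfs pull` was skipped or interrupted, the database file in your checkout is still a small text pointer rather than the ~2G database. A minimal sketch for detecting that case follows. `is_lfs_pointer` is a hypothetical helper, and `path/to/database.file` is a placeholder because this README does not name the file; `git lfs ls-files` lists the LFS-tracked paths.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Returns 0 if the file is a Git LFS pointer, i.e. its real content has not
# been pulled yet. Pointer files are short text files whose first line
# begins with "version https://git-lfs.github.com/spec/".
is_lfs_pointer() {
    [ -f "$1" ] || return 1
    head -c 64 "$1" 2>/dev/null |
        LC_ALL=C grep -q '^version https://git-lfs\.github\.com/spec/'
}

# Placeholder path: substitute the LFS-tracked file in your checkout.
db_file="${1:-path/to/database.file}"
if is_lfs_pointer "$db_file"; then
    echo "$db_file is still an LFS pointer; run: git lfs pull"
fi
```

Only the first few bytes are read, so the check stays cheap even when the file has been fully downloaded.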

Building Python

Work in progress; this requires installing the Python API for DuckDB, and I still need to figure out how to install it into the correct virtualenv.

Testing the build

A very simple way to sanity check that everything went well:

# This is compiled from `src/cpp/toolbox/explain-substrait.cpp`
./build-dir/explain-substrait

The binary does the following:

  1. Convert a hardcoded SQL query to substrait (using duckdb::Connection::GetSubstrait).
  2. Convert the substrait plan into a duckdb physical plan (using TableFunction translate_mohair).

The TableFunction used in step 2 is registered by the substrait extension for duckdb, which is loaded automatically. The duckdb-substrait formula installed from the drin/hatchery tap (see the libmohair subproject) compiles a customized substrait extension with a (mildly) customized duckdb.
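To make the sanity check scriptable, the binary can be wrapped so that a missing build and a failed run are reported differently. This is a sketch, not part of the repository: `check_binary` is a hypothetical helper, and it assumes the binary follows the usual convention of exiting with status 0 on success, which this README does not state.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Runs a built binary and reports whether it exited cleanly.
# Returns 0 on success, 1 if the binary is missing or exits non-zero.
# Assumption: the binary exits with status 0 when it succeeds.
check_binary() {
    bin="$1"
    if [ ! -x "$bin" ]; then
        echo "not built: $bin"
        return 1
    fi
    # Output is discarded here; drop the redirection to inspect the plans.
    if "$bin" >/dev/null 2>&1; then
        echo "ok: $bin"
    else
        echo "failed: $bin"
        return 1
    fi
}

# Path from the sanity check above; run from the repository root.
check_binary ./build-dir/explain-substrait || echo "see the build steps before retrying"
```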

About

Various experiments for a query processing paper focusing on mohair and cooperative query decomposition
