We are aiming to have some limited join support before that, using code that will be reusable later within the distributed SQL framework. But even that will involve quite a bit of code restructuring. TBH, this doesn't seem like a good starter project for someone who isn't familiar with the codebase.
Have you considered writing a Presto connector for CockroachDB? Presto is a full distributed SQL query engine with pluggable connectors (data sources) and supports distributed joins, including joins between different connectors.
We support batch index joins for tables that are indexed on the join key and otherwise support broadcast and distributed hash joins.
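For readers less familiar with the terminology, here is a minimal single-process sketch of the build/probe idea behind a hash join. It is not Presto's implementation; the function name `hash_join` and the list-of-dicts row representation are assumptions made for illustration only.

```python
from collections import defaultdict


def hash_join(probe_rows, build_rows, key):
    """Inner equi-join of two lists of dict rows on a shared column `key`.

    Hypothetical helper for illustration: build a hash table on one side,
    then probe it with each row from the other side.
    """
    # Build phase: group the build-side rows by join key.
    table = defaultdict(list)
    for row in build_rows:
        if row[key] is not None:  # SQL semantics: NULL never matches
            table[row[key]].append(row)

    # Probe phase: look up each probe-side row in the hash table.
    joined = []
    for row in probe_rows:
        if row[key] is None:
            continue
        for match in table.get(row[key], []):
            # On a column-name clash the probe-side value wins.
            joined.append({**match, **row})
    return joined
```

In a broadcast join every worker would receive the whole build side, while a distributed hash join would partition both sides by the join key first. An index join skips the build phase entirely and looks up matching rows through an existing index.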
@electrum Presto appears targeted at analytics, while CockroachDB is targeted at transactional workloads. Beyond that, Presto is written in Java while CockroachDB is written in Go. Calling out to Java for SQL execution doesn't seem good from a performance perspective for transactional workloads.
@petermattis You're correct, Presto is definitely targeted at analytics, although the engine itself is capable of low latency queries. We have an internal connector at Facebook based on a sharded MySQL backend that can do complex, multi-way index join queries for reporting workloads in hundreds of milliseconds: https://www.youtube.com/watch?v=Gf9JqvNNRZg
I'm definitely not suggesting calling out to or trying to use Presto within CockroachDB. As you say, that would be horrible for transactional workloads, nor is it technically feasible. However, it could be a good complement for other workloads like reporting, analytics, ETL, batch pipelines, combining heterogeneous data sources, etc., and might also serve as a stopgap.
Support JOIN in all of its wonderful incarnations. The initial implementation should focus on correctness and not worry about optimizing the join order based on table statistics.
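As an illustration of what a correctness-first starting point could look like, here is a minimal nested-loop join sketch. It is not CockroachDB code; the function name, the `kind` and `right_columns` parameters, and the list-of-dicts row representation are all hypothetical.

```python
def nested_loop_join(left, right, on, kind="inner", right_columns=()):
    """Join two lists of dict rows with an arbitrary predicate `on(l, r)`.

    Hypothetical baseline: compares every pair of rows, so it is O(n * m),
    but it works for any join condition and needs no statistics.
    `kind` is "inner" or "left"; `right_columns` names the columns to
    fill with None for unmatched left rows in a left outer join.
    """
    joined = []
    for l in left:
        matched = False
        for r in right:
            # The predicate is responsible for SQL NULL semantics.
            if on(l, r):
                matched = True
                joined.append({**l, **r})
        if kind == "left" and not matched:
            # Left outer join: keep the left row, pad the right side with NULLs.
            joined.append({**l, **{c: None for c in right_columns}})
    return joined
```

A baseline like this is slow, but it can serve as a reference to check faster strategies against once they exist.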