sql: support JOIN #2970

Closed
petermattis opened this Issue Oct 30, 2015 · 10 comments

Projects

None yet

6 participants

@petermattis
Contributor

Support JOIN in all of its wonderful incarnations. The initial implementation should focus on correctness on not worry about optimizing the join order based on table statistics.

@petermattis petermattis added the SQL label Oct 30, 2015
@petermattis petermattis added this to the 1.0 milestone Oct 30, 2015
@petermattis petermattis added enhancement and removed SQL labels Feb 13, 2016
@Freeaqingme

Has there been any work done on this feature behind the scenes? If not, is there perhaps some design documentation already available?

It could be fun to try to give this one a shot to contribute...

@RaduBerinde
Member

We have been thinking about the more general problem of how to distribute SQL computation across the cluster, there is an RFC at https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/distributed_sql.md

We are aiming to have some limited join support before that, using code that will be reusable later within the distributed SQL framework. But even that will involve quite a bit of code restructuring, TBH this doesn't seem like a good starter project for someone who isn't familiar with the codebase.

@Freeaqingme

Alright. I will let this one slide then, see if there's something interesting with the help-wanted label.

Tnx!

@dt
Member
dt commented May 2, 2016

@Freeaqingme
A few options come to mind that might be well suited for getting started with the codebase:

  • CREATE TABLE ... AS #2483
  • TEMP table support #5807
  • Add support for password-based auth to the pgwire protocol #6457
@electrum
electrum commented May 4, 2016

Have you considered writing a Presto connector for CockroachDB? Presto is a full distributed SQL query engine with pluggable connectors (data sources) and supports distributed joins, including joins between different connectors.

We support batch index joins for tables that are indexed on the join key and otherwise support broadcast and distributed hash joins.

@petermattis
Contributor

@electrum Presto appears targeted at analytics, while CockroachDB is targeted at transactional workloads. Beyond that, Presto is written in Java while CockroachDB is written in Go. Calling out to Java for SQL execution doesn't seem good from a performance perspective for transactional workloads.

@electrum
electrum commented May 5, 2016

@petermattis You're correct, Presto is definitely targeted at analytics, although the engine itself is capable of low latency queries. We have an internal connector at Facebook based on a sharded MySQL backend that can do complex, multi-way index join queries for reporting workloads in hundreds of milliseconds: https://www.youtube.com/watch?v=Gf9JqvNNRZg

I'm definitely not suggesting calling out to or trying to use Presto within CoackroachDB -- as you say, that would be horrible for transactional workloads, nor is it technically feasible. However, it could be a good complement for other workloads like reporting, analytics, ETL, batch pipelines, combining heterogeneous data sources, etc., and might also serve as a stop gap.

@electrum
electrum commented May 5, 2016

Unrelated, I really like that design document and all the rest of the documentation for the project. It's probably the best documented project I've seen and is a model for others to strive towards.

@petermattis
Contributor

CockroachDB speaks the postgres wire protocol and our SQL is similar to the PostgreSQL dialect. The existing presto-postgres connector might work (with some adjustments).

Thanks for the note about the documentation. Is it nice to to hear those efforts are being recognized and appreciated.

@tamird
Member
tamird commented Jul 2, 2016

I'm going to close this now #7202 is in. There's more work to be done, but the spirit of this issue is implemented.

@tamird tamird closed this Jul 2, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment