sql: support JOIN #2970

Closed
petermattis opened this Issue Oct 30, 2015 · 10 comments

6 participants

petermattis (Contributor) commented Oct 30, 2015

Support JOIN in all of its wonderful incarnations. The initial implementation should focus on correctness and not worry about optimizing the join order based on table statistics.
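A correctness-first join can be as simple as a nested loop that compares every pair of rows and ignores join ordering entirely. The Go sketch below is hypothetical: `Row`, `JoinType`, `JoinedRow`, and `nestedLoopJoin` are illustrative names, not CockroachDB code. It covers only INNER and LEFT OUTER joins, and it uses a nil right row to stand in for SQL NULLs.

```go
package main

import "fmt"

// Row is a hypothetical row representation: column name -> integer value.
type Row map[string]int

// JoinType selects between the two join flavors this sketch supports.
type JoinType int

const (
	InnerJoin JoinType = iota
	LeftOuterJoin
)

// JoinedRow pairs a left row with a matching right row. Right is nil when a
// LEFT OUTER JOIN found no match (standing in for a row of SQL NULLs).
type JoinedRow struct {
	Left, Right Row
}

// nestedLoopJoin evaluates pred on every (left, right) pair. It does no join
// ordering or index use, so it costs O(len(left) * len(right)) comparisons.
func nestedLoopJoin(left, right []Row, pred func(l, r Row) bool, typ JoinType) []JoinedRow {
	var out []JoinedRow
	for _, l := range left {
		matched := false
		for _, r := range right {
			if pred(l, r) {
				out = append(out, JoinedRow{Left: l, Right: r})
				matched = true
			}
		}
		// LEFT OUTER JOIN keeps unmatched left rows, padded with a nil right side.
		if !matched && typ == LeftOuterJoin {
			out = append(out, JoinedRow{Left: l, Right: nil})
		}
	}
	return out
}

func main() {
	users := []Row{{"id": 1}, {"id": 2}, {"id": 3}}
	orders := []Row{{"user_id": 1, "amount": 10}, {"user_id": 1, "amount": 5}, {"user_id": 3, "amount": 7}}
	onID := func(l, r Row) bool { return l["id"] == r["user_id"] }

	fmt.Println(len(nestedLoopJoin(users, orders, onID, InnerJoin)))     // 3
	fmt.Println(len(nestedLoopJoin(users, orders, onID, LeftOuterJoin))) // 4
}
```

The inner join yields 3 rows. The left outer join yields 4, because user 2 has no orders and is kept with a nil right side. Because the predicate is an arbitrary function, this form handles non-equality join conditions as well. That generality is what makes it a reasonable correctness baseline before any optimization.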

petermattis added the SQL label Oct 30, 2015

petermattis added this to the 1.0 milestone Oct 30, 2015

jess-edwards referenced this issue Oct 30, 2015 in Product Roadmap #2132 (Closed, 49 of 78 tasks complete)

mjibson referenced this issue Dec 2, 2015 in sql: increase the sqllogic test coverage #3292 (Closed, 7 of 7 tasks complete)

petermattis added C-enhancement and removed SQL labels Feb 13, 2016

Freeaqingme commented Apr 27, 2016

Has there been any work done on this feature behind the scenes? If not, is there perhaps some design documentation already available?

It could be fun to try to give this one a shot to contribute...

RaduBerinde (Member) commented Apr 27, 2016

We have been thinking about the more general problem of how to distribute SQL computation across the cluster; there is an RFC at https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/distributed_sql.md

We are aiming to have some limited join support before that, using code that will be reusable later within the distributed SQL framework. But even that will involve quite a bit of code restructuring; TBH, this doesn't seem like a good starter project for someone who isn't familiar with the codebase.

Freeaqingme commented Apr 27, 2016

Alright, I'll let this one slide then and see if there's something interesting with the help-wanted label.

Tnx!

dt (Member) commented May 2, 2016

@Freeaqingme
A few options come to mind that might be well suited for getting started with the codebase:

  • CREATE TABLE ... AS #2483
  • TEMP table support #5807
  • Add support for password-based auth to the pgwire protocol #6457

electrum commented May 4, 2016

Have you considered writing a Presto connector for CockroachDB? Presto is a full distributed SQL query engine with pluggable connectors (data sources) and supports distributed joins, including joins between different connectors.

We support batch index joins for tables that are indexed on the join key and otherwise support broadcast and distributed hash joins.
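A hash join, one of the strategies mentioned above, builds an in-memory hash table keyed on one input's join column and then probes it with the other input. The Go sketch below is hypothetical and is not Presto's or CockroachDB's implementation. `Row` and `hashJoin` are illustrative names. It handles only an inner equi-join on integer keys and assumes every row carries its join column.

```go
package main

import "fmt"

// Row is a hypothetical row representation: column name -> integer value.
type Row map[string]int

// hashJoin computes an inner equi-join where left[leftKey] == right[rightKey].
// Assumption: every row contains its join column (a missing column reads as 0).
func hashJoin(left, right []Row, leftKey, rightKey string) [][2]Row {
	// Build phase: bucket the right-hand rows by their join key.
	table := make(map[int][]Row, len(right))
	for _, r := range right {
		k := r[rightKey]
		table[k] = append(table[k], r)
	}
	// Probe phase: for each left row, emit one pair per row in the matching bucket.
	var out [][2]Row
	for _, l := range left {
		for _, r := range table[l[leftKey]] {
			out = append(out, [2]Row{l, r})
		}
	}
	return out
}

func main() {
	users := []Row{{"id": 1}, {"id": 2}, {"id": 3}}
	orders := []Row{{"user_id": 1, "amount": 10}, {"user_id": 1, "amount": 5}, {"user_id": 3, "amount": 7}}
	for _, pair := range hashJoin(users, orders, "id", "user_id") {
		fmt.Println(pair[0]["id"], pair[1]["amount"])
	}
}
```

Building the table on the smaller input keeps memory use down. Each probe is then a single map lookup, so the total work grows roughly linearly with the input sizes rather than with their product. The trade-off is that this approach only works for equality conditions.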

petermattis (Contributor) commented May 5, 2016

@electrum Presto appears targeted at analytics, while CockroachDB is targeted at transactional workloads. Beyond that, Presto is written in Java while CockroachDB is written in Go. Calling out to Java for SQL execution doesn't seem good from a performance perspective for transactional workloads.

electrum commented May 5, 2016

@petermattis You're correct, Presto is definitely targeted at analytics, although the engine itself is capable of low latency queries. We have an internal connector at Facebook based on a sharded MySQL backend that can do complex, multi-way index join queries for reporting workloads in hundreds of milliseconds: https://www.youtube.com/watch?v=Gf9JqvNNRZg

I'm definitely not suggesting calling out to or trying to use Presto within CockroachDB -- as you say, that would be horrible for transactional workloads, nor is it technically feasible. However, it could be a good complement for other workloads like reporting, analytics, ETL, batch pipelines, and combining heterogeneous data sources, and might also serve as a stopgap.

electrum commented May 5, 2016

Unrelated, I really like that design document and all the rest of the documentation for the project. It's probably the best documented project I've seen and is a model for others to strive towards.

petermattis (Contributor) commented May 5, 2016

CockroachDB speaks the postgres wire protocol and our SQL is similar to the PostgreSQL dialect. The existing presto-postgres connector might work (with some adjustments).

Thanks for the note about the documentation. It is nice to hear that those efforts are being recognized and appreciated.

tamird (Collaborator) commented Jul 2, 2016

I'm going to close this now that #7202 is in. There's more work to be done, but the spirit of this issue is implemented.

tamird closed this Jul 2, 2016
