New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support correlated subqueries #3288

Open
mjibson opened this Issue Dec 2, 2015 · 27 comments

Comments

@mjibson
Copy link
Member

mjibson commented Dec 2, 2015

https://github.com/cockroachdb/sqllogictest/blob/a88396b84bb1fe62edf2e072e585d68c0b2ecdca/test/select1.test#L109

Fails with:

logic_test.go:420: ../../sqllogictest/test/select1.test:109: expected success, but found pq: qualified name "x.b" not found

I believe this is because the subquery references both x and t1 (where t1 is from the parent FROM clause), but only x is in the visible tables list.


The description above was conflating two bugs into one. This is a real bug, but it isn't caused by the linked query.

@petermattis

This comment has been minimized.

Copy link
Contributor

petermattis commented Dec 2, 2015

Yeah, this is what I meant the other day when I stated that only basic subqueries work. Subqueries can utilize variables from their surrounding context, but we don't handle that yet.

@tamird

This comment has been minimized.

Copy link
Collaborator

tamird commented Dec 2, 2015

For reference, the query:

query IIIII nosort
SELECT a+b*2+c*3+d*4+e*5,
       CASE WHEN a<b-3 THEN 111 WHEN a<=b THEN 222
        WHEN a<b+3 THEN 333 ELSE 444 END,
       abs(b-c),
       (a+b+c+d+e)/5,
       a+b*2+c*3
  FROM t1
 WHERE (e>c OR e<d)
   AND d>e
   AND EXISTS(SELECT 1 FROM t1 AS x WHERE x.b<t1.b)
 ORDER BY 4,2,1,3,5

Subqueries can utilize variables from their surrounding context, but we don't handle that yet.

I don't think that's what's happening here.

@tamird

This comment has been minimized.

Copy link
Collaborator

tamird commented Dec 2, 2015

Closing in favour of #3291.

@tamird tamird closed this Dec 2, 2015

@petermattis

This comment has been minimized.

Copy link
Contributor

petermattis commented Dec 2, 2015

You're correct that this is not a correlated subquery, but we need to support those as well.

@petermattis petermattis reopened this Dec 2, 2015

@petermattis petermattis changed the title sql: table aliasing in subqueries needs to inherit from parent tables sql: support correlated subqueries Dec 2, 2015

@bdarnell

This comment has been minimized.

Copy link
Member

bdarnell commented Feb 5, 2016

Yikes! Correlated subqueries look scary (I had no idea such a thing was possible) and seem impossible to support efficiently except in limited cases (when they can be transformed into joins, which is what amazon redshift seems to do). I'm entirely comfortable with not supporting them.

@petermattis

This comment has been minimized.

Copy link
Contributor

petermattis commented Feb 5, 2016

Apparently correlated subqueries can always be transformed into joins. At least, I recall reading literature that stated that. The point of using a correlated subquery instead of a join is that the query can sometimes be written more naturally that way. I don't think we should support correlated subqueries before we support joins and then we should only support them by transforming them into joins.

@petermattis petermattis added this to the 1.0 milestone Feb 12, 2016

@petermattis petermattis removed the SQL label Feb 13, 2016

@knz knz added the A-sql-semantics label Feb 21, 2016

@mjibson

This comment has been minimized.

Copy link
Member

mjibson commented Aug 1, 2016

@knz is this done?

@knz

This comment has been minimized.

Copy link
Member

knz commented Aug 1, 2016

Not yet, wanna take it?

@mjibson

This comment has been minimized.

Copy link
Member

mjibson commented Aug 1, 2016

I was just making sure this wasn't done with the other JOIN work and that the issue should remain open.

@jordanlewis

This comment has been minimized.

Copy link
Member

jordanlewis commented Jan 18, 2017

This feature is required by the ActiveRecord query issued in #12783.

In case it's useful, here is a simple query that requires this feature that Postgres supports and we don't. In Postgres:

jordan=# SELECT nspname, (SELECT n.nspname) FROM pg_namespace n LIMIT 1;
 nspname  | nspname
----------+----------
 pg_toast | pg_toast
(1 row)

In CockroachDB:

root@:26257> SELECT nspname, (SELECT n.nspname) FROM pg_namespace n LIMIT 1;
pq: source name "n" not found in FROM clause
@jordanlewis

This comment has been minimized.

Copy link
Member

jordanlewis commented Jan 19, 2017

The example I posted seems to be a limited subcase of generalized correlated queries. Since ActiveRecord only needs that limited subcase, I've opened #12993 to track this particular subcase.

@jordanlewis

This comment has been minimized.

Copy link
Member

jordanlewis commented Jan 19, 2017

I take it back. The query in #12783 does actually need full correlated subquery support.

@knz knz assigned andy-kimball and unassigned jordanlewis and knz Apr 9, 2018

@knz knz added the A-sql-planning label May 9, 2018

timshannon added a commit to lexLibrary/lexLibrary that referenced this issue Jun 7, 2018

ivan added a commit to ivan/postgrex that referenced this issue Jul 9, 2018

ivan added a commit to ivan/postgrex that referenced this issue Jul 11, 2018

knz added a commit to knz/cockroach that referenced this issue Jul 11, 2018

sql: help the user understand unsupported correlation
Prior to this patch, a client trying to use an unsupported correlated
query would encounter an obscure error like "column v does not exist"
or "no data source matches prefix".

Given that the new optimizer code can determine whether a query is
correlated, we can use this information to enhance the error message.

Before:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
```

After:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a
HINT: some correlated subqueries are not supported yet - see
      cockroachdb#3288

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
HINT: some correlated subqueries are not supported yet - see
      cockroachdb#3288
```

Note: some correlated queries do not benefit from this improvement,
specifically those for which the optimizer code aborts early before it
has detected correlation (e.g. because of some other unrelated
feature).

Release note (sql change): CockroachDB will now report a hint in the
error message if it encounters a correlated query that it does not
support yet.

knz added a commit to knz/cockroach that referenced this issue Jul 11, 2018

sql: help the user understand unsupported correlation
Prior to this patch, a client trying to use an unsupported correlated
query would encounter an obscure error like "column v does not exist"
or "no data source matches prefix".

Given that the new optimizer code can determine whether a query is
correlated, we can use this information to enhance the error message.

Before:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
```

After:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a
HINT: some correlated subqueries are not supported yet - see
      cockroachdb#3288

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
HINT: some correlated subqueries are not supported yet - see
      cockroachdb#3288
```

Note: some correlated queries do not benefit from this improvement,
specifically those for which the optimizer code aborts early before it
has detected correlation (e.g. because of some other unrelated
feature).

Release note (sql change): CockroachDB will now report a hint in the
error message if it encounters a correlated query that it does not
support yet.

craig bot pushed a commit that referenced this issue Jul 11, 2018

Merge #27396
27396: sql: help the user understand unsupported correlation r=knz a=knz

Fixes #24684.

Prior to this patch, a client trying to use an unsupported correlated
query would encounter an obscure error like "column v does not exist"
or "no data source matches prefix".

Given that the new optimizer code can determine whether a query is
correlated, we can use this information to enhance the error message.

Before:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
```

After:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a
HINT: some correlated subqueries are not supported yet - see
      #3288

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
HINT: some correlated subqueries are not supported yet - see
      #3288
```

Note: some correlated queries do not benefit from this improvement,
specifically those for which the optimizer code aborts early before it
has detected correlation (e.g. because of some other unrelated
feature).

Release note (sql change): CockroachDB will now report a hint in the
error message if it encounters a correlated query that it does not
support yet.


Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>

knz added a commit to knz/cockroach that referenced this issue Jul 11, 2018

sql: help the user understand unsupported correlation
Prior to this patch, a client trying to use an unsupported correlated
query would encounter an obscure error like "column v does not exist"
or "no data source matches prefix".

Given that the new optimizer code can determine whether a query is
correlated, we can use this information to enhance the error message.

Before:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
```

After:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a
HINT: some correlated subqueries are not supported yet - see
      cockroachdb#3288

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
HINT: some correlated subqueries are not supported yet - see
      cockroachdb#3288
```

Note: some correlated queries do not benefit from this improvement,
specifically those for which the optimizer code aborts early before it
has detected correlation (e.g. because of some other unrelated
feature).

Release note (sql change): CockroachDB will now report a hint in the
error message if it encounters a correlated query that it does not
support yet.

craig bot pushed a commit that referenced this issue Jul 11, 2018

Merge #27390 #27396
27390: sql: properly reject nested generators in ROWS FROM r=knz a=knz

Fixes #27389.

Prior to this patch, invalid uses of SRFs as arguments to other
functions in ROWS FROM were not properly rejected, and were only
caught at evaluation time (i.e. much too late).

This patch fixes it by rejecting these uses early.

Release note (bug fix): invalid uses of set-generating functions in
FROM clauses are now reported with the same error code as PostgreSQL.

27396: sql: help the user understand unsupported correlation r=knz a=knz

Fixes #24684.

Prior to this patch, a client trying to use an unsupported correlated
query would encounter an obscure error like "column v does not exist"
or "no data source matches prefix".

Given that the new optimizer code can determine whether a query is
correlated, we can use this information to enhance the error message.

Before:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
```

After:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a
HINT: some correlated subqueries are not supported yet - see
      #3288

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
HINT: some correlated subqueries are not supported yet - see
      #3288
```

Note: some correlated queries do not benefit from this improvement,
specifically those for which the optimizer code aborts early before it
has detected correlation (e.g. because of some other unrelated
feature).

Release note (sql change): CockroachDB will now report a hint in the
error message if it encounters a correlated query that it does not
support yet.


Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
@tim-o

This comment has been minimized.

Copy link
Contributor

tim-o commented Aug 12, 2018

Zendesk ticket #2713 has been linked to this issue.

@andy-kimball

This comment has been minimized.

Copy link
Contributor

andy-kimball commented Aug 24, 2018

Correlated subquery support is now part of 2.1, so closing this issue down.

@Freeaqingme

This comment has been minimized.

Copy link

Freeaqingme commented Sep 9, 2018

I'm running v2.1.0-beta.20180904, hoping to get Symfony+Doctrine ORM working with Cockroachdb. The query below, however, returns this error message:

SQLSTATE[42P01]: Undefined table: 7 ERROR: no data source matches prefix: a HINT: some correlated subqueries are not supported yet - see https://github.com/cockroachdb/cockroachissues/3288

Can you provide guidance? I'd assume that either the correlated subquery stuff somehow didn't make it into v2.1.0-beta.20180904, or that the error message is also used for a different issue.

  An exception occurred while executing 'SELECT                                              
                      a.attnum,                                                              
                      quote_ident(a.attname) AS field,                                       
                      t.typname AS type,                                                     
                      format_type(a.atttypid, a.atttypmod) AS complete_type, (SELECT tc.col  
  lcollate FROM pg_catalog.pg_collation tc WHERE tc.oid = a.attcollation) AS collation,      
                      (SELECT t1.typname FROM pg_catalog.pg_type t1 WHERE t1.oid = t.typbas  
  etype) AS domain_type,                                                                     
                      (SELECT format_type(t2.typbasetype, t2.typtypmod) FROM                 
                        pg_catalog.pg_type t2 WHERE t2.typtype = 'd' AND t2.oid = a.atttypi  
  d) AS domain_complete_type,                                                                
                      a.attnotnull AS isnotnull,                                             
                      (SELECT 't'                                                            
                       FROM pg_index                                                         
                       WHERE c.oid = pg_index.indrelid                                       
                          AND pg_index.indkey[0] = a.attnum                                  
                          AND pg_index.indisprimary = 't'                                    
                      ) AS pri,                                                              
                      (SELECT pg_get_expr(adbin, adrelid)                                    
                       FROM pg_attrdef                                                       
                       WHERE c.oid = pg_attrdef.adrelid                                      
                          AND pg_attrdef.adnum=a.attnum                                      
                      ) AS default,                                                          
                      (SELECT pg_description.description                                     
                          FROM pg_description WHERE pg_description.objoid = c.oid AND a.att  
  num = pg_description.objsubid                                                              
                      ) AS comment                                                           
                      FROM pg_attribute a, pg_class c, pg_type t, pg_namespace n             
                      WHERE n.nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast  
  ') AND c.relname = 'backward_dependencies' AND n.nspname = 'crdb_internal'                 
                          AND a.attnum > 0                                                   
                          AND a.attrelid = c.oid                                             
                          AND a.atttypid = t.oid                                             
                          AND n.oid = c.relnamespace                                         
                      ORDER BY a.attnum': 

/cc @andy-kimball @knz

@mjibson

This comment has been minimized.

Copy link
Member

mjibson commented Sep 10, 2018

sqlfmt formatted for the interested

@knz knz reopened this Sep 10, 2018

@knz

This comment has been minimized.

Copy link
Member

knz commented Sep 10, 2018

I think the issue should stay open until all correlated subqueries are supported, including the apply operator for those that cannot be decorrelated.

@andy-kimball Lauren and I are planning to make a writeup in the docs of the limits of decorrelation currently. Perhaps we could link to that from here.

@andy-kimball

This comment has been minimized.

Copy link
Contributor

andy-kimball commented Sep 10, 2018

The posted query has two issues that prevent decorrelation:

  1. Use of a subscript expression [0]; this is not currently supported by the cost-based optimizer and so triggers fallback to the heuristic planner.
  2. Absence of unique index assertions on the pg_catalog tables, so the optimizer cannot statically prove that correlated subqueries return <= 1 result.
@Freeaqingme

This comment has been minimized.

Copy link

Freeaqingme commented Sep 10, 2018

Thanks for clarifying.

Typically, I'd be more than happy to simply rewrite a query or two. In this case though, this query is generated by Doctrine when it tries to see what the current schema looks like when creating migration scripts (./doctrine migrations:diff)

As such, there's not an easy work-around where I can just rewrite one or two of my own queries. For the time being I'll just use mysql instead, but figured this bit of context might help in determining the appropriate priority for support of the posted query..

@petermattis petermattis removed this from the 2.1 milestone Oct 5, 2018

@knz knz added this to To do in SQL Planning via automation Oct 15, 2018

@knz knz added this to the 2.2 milestone Oct 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment