refactor SQL AST and fix relational scoping bugs#6316
Conversation
This commit refactors the SQL AST nodes to be more closely aligned with actual SQL syntax. There is now a sum type for SQL query elements so as to separate them out from ast.Op. The new ast.SQLQuery struct wraps this generic sum type and includes with/order-by/limit elements. We also reworked how schema tracking works to separate the concepts of closing a relational scope from unfurling a SQL relation back in pipe dataflow.
|
@mccanne: A SQL query based on one of the sqllogictests that's been working at tip of On tip of On this branch at commit 79f27f8: (If you've been running the sqllogictests via the target in the super Makefile and wondering why this didn't surface there, that's because this was from among the 1+ million random "fuzz"-style tests I'm currently working to turn into a nightly job since they take hours to run in Actions But I just did a one-off manual run on a beefy AWS instance since you'd sounded like your branch might need some extra testing.) |
|
@philrz this change to drop the automatic table alias for a file was deliberate since it was causing a problem with the new heuristic that you can't select a column that has the same name as the table alias for dynamic sources (so the query semantics don't change when going from no schema to schema). I'm pushing a change that puts back the automatic table alias for files that have schemas. This way, it will work for parquet and won't interfere with the dynamic heuristic. |
|
Thanks for the explanation @mccanne. Indeed, with the benefit of that change, I've verified that issue no longer occurs. With that one out of the way, I went to re-run the big set of tests again and it unfortunately uncovered something else, but it's not unique to your branch. It looks like it made its way into |
compiler/parser/parser.peg
Outdated
| SelectOp | ||
| = query:SQLQuery { | ||
| return &ast.SQLOp{ |
There was a problem hiding this comment.
Nit: Maybe rename this rule to SQLOp so it matches the AST node?
compiler/parser/parser.peg
Outdated
| orderby:OptOrderByClause | ||
| loff:OptSQLLimitOffset { | ||
| op := body.(ast.Op) | ||
| queryExpr := &ast.SQLQuery{ |
There was a problem hiding this comment.
Nit: Maybe just call this query?
| queryExpr := &ast.SQLQuery{ | |
| query := &ast.SQLQuery{ |
compiler/parser/parser.peg
Outdated
| SelectSetOperation | ||
| = first:SimpleSelect rest:(SetOp _ SimpleSelect)* { | ||
| out := first.(ast.Op) | ||
| SQLSetExpr |
There was a problem hiding this comment.
Nit: The expr suffix here feels off given how it's used elsewhere in this file but I don't have anything better to suggest.
compiler/sfmt/ast.go
Outdated
|
|
||
| default: | ||
| c.open("unknown operator: %T", p) | ||
| c.close() |
There was a problem hiding this comment.
Nit:
| default: | |
| c.open("unknown operator: %T", p) | |
| c.close() | |
| default: | |
| panic(p) |
compiler/sfmt/ast.go
Outdated
| c.write("materialized ") | ||
| } | ||
| c.open("(") | ||
| //c.first, c.head = true, true |
There was a problem hiding this comment.
| //c.first, c.head = true, true |
compiler/sfmt/ast.go
Outdated
| if query.From != nil { | ||
| c.head = true | ||
| c.op(p.From) | ||
| c.op(query.From) //XXX sb table expr |
| {v:2} | ||
| // === | ||
| {x:{y:2}} | ||
| {x:2} |
There was a problem hiding this comment.
This is surprising.
This is even more surprising:
$ go run ./cmd/super -c 'select (values {y:0,z:1})'
{"( values {y:0,z:1}\n)":0}Why does the new behavior make sense?
There was a problem hiding this comment.
I think this is the correct behavior...
; super -c "values {y:1+1}"
{y:2}
; super -c "select 1+1 as y"
{y:2}
; super -c "select (select 1+1 as y) as x"
{x:2}
; super -c "select (values {y:1+1}) as x"
{x:2}
There was a problem hiding this comment.
The subquery logic needs a bit more work. I have a branch that is waiting on merging this one...
super/compiler/semantic/expr.go
Line 1140 in fa404ad
This commit refactors the SQL AST nodes to be more closely aligned with actual SQL syntax. There is now a sum type for SQL query elements so as to separate them out from ast.Op and the new ast.SQLQuery struct wraps this generic sum type with with/order-by/limit elements.
We also reworked how schema tracking works to separate the concepts of closing a relational scope from unfurling a SQL relation back in pipe dataflow.
Fixes #6106