
refactor SQL AST and fix relational scoping bugs#6316

Merged
mccanne merged 3 commits into main from sql-clause
Oct 24, 2025

Collaborator

@mccanne mccanne commented Oct 23, 2025

This commit refactors the SQL AST nodes to align more closely with actual SQL syntax. There is now a sum type for SQL query elements, which separates them from ast.Op, and the new ast.SQLQuery struct wraps this sum type together with the with/order-by/limit elements.

We also reworked how schema tracking works to separate the concepts of closing a relational scope from unfurling a SQL relation back in pipe dataflow.
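
For readers unfamiliar with the pattern, the sum-type arrangement described above could be sketched in Go roughly as follows. This is only an illustration: every type, field, and function name here (SQLQueryBody, Select, Values, SetOp, describe) is hypothetical and not taken from the actual ast package; only the name SQLQuery and the with/order-by/limit wrapping come from the description.

```go
package main

import "fmt"

// SQLQueryBody is a hypothetical sum type: only the SQL node types below
// implement it, so a SQL body cannot be confused with a pipe operator.
type SQLQueryBody interface {
	sqlQueryBody() // unexported marker method closes the set of variants
}

// Illustrative variants; the real AST has its own node names and fields.
type Select struct {
	Columns []string
	From    string
}

type Values struct {
	Rows []string
}

type SetOp struct {
	Kind        string // e.g. "UNION"
	Left, Right SQLQueryBody
}

func (*Select) sqlQueryBody() {}
func (*Values) sqlQueryBody() {}
func (*SetOp) sqlQueryBody() {}

// SQLQuery wraps a body with the clauses that apply to the query as a whole.
type SQLQuery struct {
	With    []string // names of common table expressions (assumed shape)
	Body    SQLQueryBody
	OrderBy []string
	Limit   *int
}

// describe shows how a consumer dispatches over the closed set of variants.
func describe(b SQLQueryBody) string {
	switch b := b.(type) {
	case *Select:
		return "select from " + b.From
	case *Values:
		return fmt.Sprintf("values with %d rows", len(b.Rows))
	case *SetOp:
		return describe(b.Left) + " " + b.Kind + " " + describe(b.Right)
	default:
		panic(fmt.Sprintf("unknown SQL body: %T", b))
	}
}

func main() {
	limit := 10
	q := &SQLQuery{
		Body: &SetOp{
			Kind:  "UNION",
			Left:  &Select{Columns: []string{"col0"}, From: "tab0"},
			Right: &Values{Rows: []string{"{col0:1}"}},
		},
		OrderBy: []string{"col0"},
		Limit:   &limit,
	}
	fmt.Println(describe(q.Body))
}
```

The unexported marker method is what keeps the set closed: code outside the package cannot add a new variant, and a type switch over the interface can treat the default case as a programming error.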

Fixes #6106

Contributor

philrz commented Oct 23, 2025

@mccanne: A SQL query based on one of the sqllogictests that has been working at tip of main hits an error on this branch. Here are the repro details.

On tip of main:

$ super -version
Version: 8012b6d91

$ super -f parquet -o tab0.parquet -c "
values
  {col0:89,col1:91,col2:82},
  {col0:35,col1:97,col2:1},
  {col0:24,col1:86,col2:33}" &&
super -c "SELECT col0 AS col1 FROM tab0.parquet WHERE NOT col2 IN ( tab0.col0 );"

{col1:89}
{col1:35}
{col1:24}

On this branch at commit 79f27f8:

$ super -version
Version: 79f27f890

$ super -f parquet -o tab0.parquet -c "
values
  {col0:89,col1:91,col2:82},
  {col0:35,col1:97,col2:1},
  {col0:24,col1:86,col2:33}" &&
super -c "SELECT col0 AS col1 FROM tab0.parquet WHERE NOT col2 IN ( tab0.col0 );"

column "tab0": does not exist at line 1, column 59:
SELECT col0 AS col1 FROM tab0.parquet WHERE NOT col2 IN ( tab0.col0 );
                                                          ~~~~~~~~~

(If you've been running the sqllogictests via the target in the super Makefile and wondering why this didn't surface there, that's because this was from among the 1+ million random "fuzz"-style tests I'm currently working to turn into a nightly job, since they take hours to run in Actions. But I just did a one-off manual run on a beefy AWS instance since you'd sounded like your branch might need some extra testing.)

Collaborator Author

mccanne commented Oct 23, 2025

@philrz this change to drop the automatic table alias for a file was deliberate. The alias was causing a problem with the new heuristic that you can't select a column that has the same name as the table alias for dynamic sources (so that the query semantics don't change when going from no schema to schema). I'm pushing a change that puts back the automatic table alias for files that have schemas. This way, it will work for Parquet and won't interfere with the dynamic heuristic.
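
A minimal sketch of the rule described in this comment, assuming the alias is derived from the file name without its extension (as `tab0.parquet` giving `tab0` in the repro suggests). The helper `autoAlias` and its `hasSchema` parameter are hypothetical and do not reflect the actual code in this PR; how a source is determined to have a schema is left to the caller.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// autoAlias is a hypothetical helper, not the actual super implementation.
// It derives a table alias from a file path only when the source is known
// to carry a schema (e.g. Parquet); dynamic sources get no automatic alias.
func autoAlias(path string, hasSchema bool) (string, bool) {
	if !hasSchema {
		return "", false
	}
	base := filepath.Base(path)
	alias := strings.TrimSuffix(base, filepath.Ext(base))
	if alias == "" {
		return "", false
	}
	return alias, true
}

func main() {
	for _, c := range []struct {
		path      string
		hasSchema bool
	}{
		{"tab0.parquet", true},  // schema-bearing source: alias restored
		{"tab0.parquet", false}, // treated as dynamic: no automatic alias
	} {
		alias, ok := autoAlias(c.path, c.hasSchema)
		fmt.Printf("%s hasSchema=%v: alias=%q ok=%v\n", c.path, c.hasSchema, alias, ok)
	}
}
```

Under this rule, `tab0.col0` resolves for a schema-bearing file, while a dynamic source never gains an implicit alias that could shadow a column of the same name.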

Contributor

philrz commented Oct 23, 2025

Thanks for the explanation @mccanne. Indeed, with the benefit of that change, I've verified that issue no longer occurs.

With that one out of the way, I went to re-run the big set of tests again and it unfortunately uncovered something else, but it's not unique to your branch. It looks like it made its way into main recently and onward to your branch. I just opened #6318 to track that and gave @mattnibs a heads up.

Comment on lines +1904 to +1906
SelectOp
= query:SQLQuery {
return &ast.SQLOp{
Member

Nit: Maybe rename this rule to SQLOp so it matches the AST node?

orderby:OptOrderByClause
loff:OptSQLLimitOffset {
op := body.(ast.Op)
queryExpr := &ast.SQLQuery{
Member

Nit: Maybe just call this query?

Suggested change
queryExpr := &ast.SQLQuery{
query := &ast.SQLQuery{

SelectSetOperation
= first:SimpleSelect rest:(SetOp _ SimpleSelect)* {
out := first.(ast.Op)
SQLSetExpr
Member

Nit: The expr suffix here feels off given how it's used elsewhere in this file but I don't have anything better to suggest.

Collaborator Author

SQLBodySetOp

Comment on lines +676 to +679

default:
c.open("unknown operator: %T", p)
c.close()
Member

Nit:

Suggested change
default:
c.open("unknown operator: %T", p)
c.close()
default:
panic(p)

c.write("materialized ")
}
c.open("(")
//c.first, c.head = true, true
Member

Suggested change
//c.first, c.head = true, true

if query.From != nil {
c.head = true
c.op(p.From)
c.op(query.From) //XXX sb table expr
Member

What's this comment about?

{v:2}
// ===
{x:{y:2}}
{x:2}
Member

This is surprising.

This is even more surprising:

$ go run ./cmd/super -c 'select (values {y:0,z:1})'
{"(  values {y:0,z:1}\n)":0}

Why does the new behavior make sense?

Collaborator Author

I think this is the correct behavior...

; super -c "values {y:1+1}"
{y:2}
; super -c "select 1+1 as y"
{y:2}
; super -c "select (select 1+1 as y) as x" 
{x:2}
; super -c "select (values {y:1+1}) as x"
{x:2}

Collaborator Author

The subquery logic needs a bit more work. I have a branch that is waiting on merging this one...

func scalarSubqueryCheck(n ast.Node) *sem.ValuesOp {

@mccanne mccanne merged commit 97f26ff into main Oct 24, 2025
3 checks passed
@mccanne mccanne deleted the sql-clause branch October 24, 2025 02:14

Development

Successfully merging this pull request may close these issues.

panic: semSQLOp: SQL pipes can't have parents

3 participants