Skip to content

Consistent AST encoding of parentheses to enable correct spans for function arguments, subqueries etc. #1676

@graup

Description

@graup

First off, I'm relatively new to both this codebase and Rust in general. But I thought it might be useful to start this topic anyway.

Epic: #1548

There are a bunch of places where parentheses can appear in SQL. Currently these are not handled consistently in the AST. This makes it difficult or impossible to attach Spans to all of them. @Nyrox who did much of the work introducing Spanned mentioned using AttachedToken to encode opening and closing parenthesis tokens in the AST. I think it would be good to come up with a standard way that can be applied to all of those nodes. Also it seems in many instances this might be a breaking change.

Attaching parentheses tokens to the AST is important for accurate spans: The span for the Function node CALL(something) should go all the way from C to ), not just CALL or even worse, CALL(something. Currently there's no way to encode this information.

On the other hand, parentheses are not semantically meaningful for the AST, so I don't want to needlessly complicate the structure.

One thing that's a bit vague to me is if the parentheses should be attached to the 'thing' itself or its container? E.g. there is Function -> FunctionArguments -> FunctionArgumentsList, to which of those do the parens belong? Similarly, Expr::InSubquery has a subquery field; do the parens belong to the subquery or to the InSubquery? (Reading Fmt::Display for Expr::InSubquery would imply the parens belong to it since it renders them and not the subquery which is just an Expr – but the subquery field could be a new Subquery type, which then would be able to render its own parens... 🤔)

To make matters even more complicated, some of these parentheses are optional (see FunctionArguments and also the somewhat related OneOrManyWithParens).

Here are some nodes that need to know about their parentheses (I might have missed some; to be systematic I'd probably go through every mention of LParen/RParen in the parser).

in ast/mod.rs

  • Expr::Subquery
  • Expr::InSubquery
  • Expr::InUnnest
  • Expr::Exists
  • Expr::AllOp.right
  • FunctionArguments
  • FunctionArgumentList
  • Statement::Execute.has_parentheses

in ast/query.rs

  • Cte.closing_paren_token

Here's an initial idea: instead of adding opening_paren_token and closing_paren_token everywhere, how about creating a new struct ParensPair { opening: AttachedToken, closing: AttachedToken }? Or might we augment OneOrManyWithParens for this?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions