Limited Inline Comment Support in AST #2065

@RPG-Alex

Context

Currently, datafusion-sqlparser-rs strips out all inline comments (-- … and /* … */) while parsing, so they are lost. There is existing support for COMMENT ON … statements, and it would be good to support inline comments as well, given that they are often used for documenting SQL.

There is an open issue on lossless AST, but it has been open for about 5 years. Moreover, this topic is referenced in this PR, so there is precedent for moving forward with inline comment support.

Objective/Motivation

Capturing inline comments in the AST would be helpful for:

  1. Documentation Generation: This is the primary purpose of this issue: extending the existing COMMENT ON support so that downstream tools can generate documentation from the parsed AST.
  2. Linting Coverage: Following from 1, tracking inline comments would make it possible to check documentation for completeness. This is especially helpful where documentation has to be written as inline comments rather than COMMENT ON statements.

Example of proposed usage:

Inline Comment Associated with a table:
-- Table for storing values for users
CREATE TABLE IF NOT EXISTS users (
  id BIGINT PRIMARY KEY,
  -- Stores the username as text, required value for creating a user account.
  name TEXT NOT NULL
);

Expectation: The AST node for the users table would include a comment: Option<String> field populated with the comment on the preceding line: Table for storing values for users. Likewise, the name column would carry a comment: Option<String> containing Stores the username as text, required value for creating a user account. In general, every table and column would carry a comment: Option<String> derived from the -- comment immediately preceding its definition.
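To make the expectation concrete, here is a minimal sketch of the resulting shape. TableSketch, ColumnSketch, and expected_users_table are hypothetical, simplified stand-ins invented for this illustration; they are not the crate's actual CreateTable and ColumnDef definitions, which carry many more fields.

```rust
// Hypothetical, simplified stand-ins for the AST nodes discussed above.
// Only the fields needed to show where the comment would live are included.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnSketch {
    pub name: String,
    pub data_type: String,
    /// Inline comment attached to this column, if any.
    pub comment: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TableSketch {
    pub name: String,
    pub columns: Vec<ColumnSketch>,
    /// Inline comment attached to the CREATE TABLE statement, if any.
    pub comment: Option<String>,
}

/// Builds the value the `users` example above would be expected to produce.
pub fn expected_users_table() -> TableSketch {
    TableSketch {
        name: "users".to_string(),
        columns: vec![
            ColumnSketch {
                name: "id".to_string(),
                data_type: "BIGINT".to_string(),
                // No comment precedes the `id` column.
                comment: None,
            },
            ColumnSketch {
                name: "name".to_string(),
                data_type: "TEXT".to_string(),
                comment: Some(
                    "Stores the username as text, required value for creating a user account."
                        .to_string(),
                ),
            },
        ],
        comment: Some("Table for storing values for users".to_string()),
    }
}
```

Using Option<String> keeps nodes without a comment cheap to represent and mirrors how the issue describes the field.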

NOTE: For now, the assumption is that only comments on the line immediately preceding the field are parsed, though comments can also be placed on the same line as the field:

CREATE TABLE IF NOT EXISTS users ( -- Table for storing values for users
  id BIGINT PRIMARY KEY,
  name TEXT NOT NULL  -- Stores the username as text, required value for creating a user account.
);

It might prove fruitful to support both placements, and perhaps to let the caller specify at parse time which of them should be treated as relevant.
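One way to tell the two placements apart is to classify each source line before attaching anything. The sketch below is an assumption-laden illustration: LineKind and classify_line are hypothetical names, and the naive search for -- would misfire on a -- inside a string literal, so a real implementation would more likely work from the tokenizer's output than from raw lines.

```rust
/// Hypothetical classification of a single source line.
#[derive(Debug, PartialEq)]
pub enum LineKind<'a> {
    /// Empty or whitespace-only line.
    Blank,
    /// The line holds nothing but a `--` comment (the "preceding line" style).
    CommentOnly(&'a str),
    /// The line holds SQL, optionally followed by a same-line `--` comment.
    Code {
        code: &'a str,
        trailing: Option<&'a str>,
    },
}

/// Splits a line into its SQL part and its `--` comment part.
/// Assumption: `--` never appears inside a string literal on the line.
pub fn classify_line(line: &str) -> LineKind<'_> {
    let trimmed = line.trim();
    if trimmed.is_empty() {
        return LineKind::Blank;
    }
    match trimmed.find("--") {
        // The comment marker opens the line: a standalone comment.
        Some(0) => LineKind::CommentOnly(trimmed[2..].trim()),
        // SQL comes first, then a same-line comment.
        Some(pos) => LineKind::Code {
            code: trimmed[..pos].trim_end(),
            trailing: Some(trimmed[pos + 2..].trim()),
        },
        // No comment on this line at all.
        None => LineKind::Code {
            code: trimmed,
            trailing: None,
        },
    }
}
```

With this split, a caller could choose to honor only CommentOnly lines, only trailing comments, or both, which is the kind of per-parse choice suggested above.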

Inline Comments out of Scope
-- migration script v2025-01

CREATE TABLE IF NOT EXISTS users (
  id BIGINT PRIMARY KEY,
  -- Stores the username as text, required value for creating a user account.
  name TEXT NOT NULL
);

In the above example, a blank line separates the first comment from the CREATE TABLE line, so that comment should be disregarded and considered out of scope for the table's comment: Option<String> value.
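The blank-line rule can be sketched as a small line-based helper. leading_comment is a hypothetical function written for this illustration, not part of the crate. Joining a run of several consecutive -- lines into one comment is an additional assumption; the issue itself only discusses a single preceding comment line.

```rust
/// Returns the `--` comment attached to `lines[target]`, if any.
///
/// A comment is attached only when it sits on the line(s) directly above the
/// target. A blank line or a line of SQL ends the search, so a comment that
/// is separated from the target by a blank line is ignored.
/// Assumption: consecutive comment lines are joined with '\n'.
pub fn leading_comment(lines: &[&str], target: usize) -> Option<String> {
    let mut collected: Vec<&str> = Vec::new();
    let mut i = target;
    while i > 0 {
        // `get` avoids a panic if `target` is past the end of `lines`.
        let prev = lines.get(i - 1)?.trim();
        match prev.strip_prefix("--") {
            Some(text) => collected.push(text.trim()),
            // Blank line or SQL: the run of attached comments stops here.
            None => break,
        }
        i -= 1;
    }
    if collected.is_empty() {
        None
    } else {
        // Lines were gathered bottom-up; restore source order.
        collected.reverse();
        Some(collected.join("\n"))
    }
}
```

Applied to the out-of-scope example, the search starting at the CREATE TABLE line stops at the blank line and yields None, while the search starting at the name column still finds its comment.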

Proposed Behavior

Add an optional parse mode, Parser::parse_sql_with_comments or Parser::parse_sql_with_inline_comments, that retains inline comments and associates them with nearby fields/schema objects within the AST.

For CreateTable and ColumnDef, add a comment: Option<String> field that stores the associated comment, taken from the -- comment (and potentially the multi-line /* */ comment) that immediately precedes the definition, and/or from a comment on the same line, as shown above.
