Description
Context
Currently, datafusion-sqlparser-rs strips out all inline comments (-- … and /* … */). There is existing support for COMMENT ON … statements, and it would be good to support inline comments as well, given that they are often used for documenting SQL. However, these comments are currently lost during parsing.
There is an open issue on lossless AST, but it has been open for about 5 years. Moreover, this topic is referenced in this PR, so there is precedent for moving forward with inline comment support.
Objective/Motivation
Capturing inline comments in the AST would be helpful for:
- Documentation Generation: This is the primary purpose of this issue: to extend the existing COMMENT ON support so that downstream documentation can be generated from the parsed AST.
- Linting Coverage: Building on the previous point, tracking inline comments would make it possible to check documentation for completeness. This is especially helpful where documentation has to be written as inline comments rather than COMMENT ON comments.
Example of proposed usage:
Inline Comment Associated with a table:
-- Table for storing values for users
CREATE TABLE IF NOT EXISTS users (
id BIGINT PRIMARY KEY,
-- Stores the username as text, required value for creating a user account.
name TEXT NOT NULL
);
Expectation: The AST node for the users table would include a comment: Option<String> populated with the comment on the previous line: Table for storing values for users. Likewise, the column name would have an associated comment: Option<String> containing Stores the username as text, required value for creating a user account. In general, every table and column would carry a comment: Option<String> derived from the -- comment on the line immediately preceding it, and the value would be None when no such comment exists.
NOTE: For now, the assumption is that only the comment on the line immediately preceding the table or column is parsed, though comments can also be placed at the end of the same line:
CREATE TABLE IF NOT EXISTS users ( -- Table for storing values for users
id BIGINT PRIMARY KEY,
name TEXT NOT NULL -- Stores the username as text, required value for creating a user account.
);
It might prove fruitful to support both placements (preceding-line and same-line), and perhaps to let the caller specify at parse time which of them should be attached.
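To make the same-line placement concrete, here is a minimal, std-only sketch of how a trailing -- comment could be pulled off a single line of SQL. The function name trailing_comment is hypothetical and not part of sqlparser-rs; it works on raw text rather than on the tokenizer's output and only guards against -- inside single-quoted string literals.

```rust
/// Hypothetical helper: returns the text of a `--` comment that follows
/// some SQL on the same line. A line consisting only of a comment is not
/// treated as a trailing comment.
/// Simplification: only single-quoted string literals are skipped; quoted
/// identifiers and dollar-quoted strings are not handled.
fn trailing_comment(line: &str) -> Option<String> {
    let bytes = line.as_bytes();
    let mut in_string = false;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // Toggle on every quote; an escaped quote ('') toggles twice.
            b'\'' => in_string = !in_string,
            b'-' if !in_string && bytes.get(i + 1) == Some(&b'-') => {
                let code = line[..i].trim();
                let text = line[i + 2..].trim();
                return if code.is_empty() || text.is_empty() {
                    None
                } else {
                    Some(text.to_string())
                };
            }
            _ => {}
        }
        i += 1;
    }
    None
}
```

A real implementation would more likely attach comments while walking the token stream, since the tokenizer already knows where string literals and comments begin and end.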
Inline Comments out of Scope
-- migration script v2025-01

CREATE TABLE IF NOT EXISTS users (
id BIGINT PRIMARY KEY,
-- Stores the username as text, required value for creating a user account.
name TEXT NOT NULL
);
In the above example, a blank line separates the first comment from the CREATE TABLE statement, so that comment should be disregarded and treated as out of scope for the table's comment: Option<String> value. The comment directly above the name column is still in scope.
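The in-scope versus out-of-scope rule can be sketched as a small line-based check. This is only an illustration under the assumption that the source is available as a slice of lines; leading_comment is a hypothetical name, not an existing sqlparser-rs function.

```rust
/// Hypothetical helper: returns the `--` comment on the line directly above
/// `lines[idx]`. Anything else on that line (including a blank line) means
/// no comment is attached.
fn leading_comment(lines: &[&str], idx: usize) -> Option<String> {
    // `checked_sub` yields None for idx == 0, i.e. there is no previous line.
    let prev = lines.get(idx.checked_sub(1)?)?.trim();
    let text = prev.strip_prefix("--")?.trim();
    if text.is_empty() {
        None
    } else {
        Some(text.to_string())
    }
}
```

With this rule, a comment directly above CREATE TABLE is attached, while the migration script comment followed by a blank line yields None because the line directly above the statement is empty.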
Proposed Behavior
Add an optional parse mode, Parser::parse_sql_with_comments or Parser::parse_sql_with_inline_comments, that retains inline comments and associates them with the nearby fields/schema objects within the AST.
For CreateTable and ColumnDef, add a comment: Option<String> field that stores the associated comment, taken from the -- comment (and potentially a multi-line /* */ comment) that immediately precedes the table or column definition, and/or possibly from a comment on the same line as shown above.
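As a rough illustration of the proposed shape, the sketch below models the comment: Option<String> fields and a caller-selected attachment mode. All names here (CommentAttachment, TableDoc, ColumnDoc, pick_comment, expected_users_table) are hypothetical stand-ins and do not mirror the real CreateTable or ColumnDef definitions. Preferring the preceding-line comment when both are present is an assumption, not something the proposal settles.

```rust
/// Hypothetical option: which comment placements get attached to a node.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CommentAttachment {
    PrecedingLine,
    SameLine,
    Both,
}

/// Simplified stand-in for a column definition carrying a comment.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ColumnDoc {
    name: String,
    comment: Option<String>,
}

/// Simplified stand-in for a CREATE TABLE node carrying a comment.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TableDoc {
    name: String,
    comment: Option<String>,
    columns: Vec<ColumnDoc>,
}

/// Chooses the comment to store according to the requested mode.
/// Assumption: with `Both`, the preceding-line comment wins if present.
fn pick_comment(
    mode: CommentAttachment,
    preceding: Option<String>,
    same_line: Option<String>,
) -> Option<String> {
    match mode {
        CommentAttachment::PrecedingLine => preceding,
        CommentAttachment::SameLine => same_line,
        CommentAttachment::Both => preceding.or(same_line),
    }
}

/// The result one would expect for the `users` example in this issue.
fn expected_users_table() -> TableDoc {
    TableDoc {
        name: "users".to_string(),
        comment: Some("Table for storing values for users".to_string()),
        columns: vec![
            ColumnDoc { name: "id".to_string(), comment: None },
            ColumnDoc {
                name: "name".to_string(),
                comment: Some(
                    "Stores the username as text, required value for creating a user account."
                        .to_string(),
                ),
            },
        ],
    }
}
```

Keeping the mode as an explicit option would leave the default parse path unchanged, so existing callers that do not need comments would see no difference in the AST they receive.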