Description
Describe the bug
The latest release appears to introduce a regression in the SQL parser (somewhere between 49.0.2 and 50.0.0) that makes it less tolerant of column names that collide with SQL keywords such as limit and offset.
To Reproduce
Using datafusion-cli:
> create or replace table x as (
select 1 as offset
);
> select * from x;
+--------+
| offset |
+--------+
| 1 |
+--------+
-- NOTE: Disambiguates correctly
> select offset from x;
+--------+
| offset |
+--------+
| 1 |
+--------+
-- NOTE: Adding a literal in the front breaks disambiguation
> select 'a' as a, offset from x;
🤔 Invalid statement: SQL error: ParserError("Expected: end of statement, found: x at Line: 1, Column: 30")
-- NOTE: Adding a literal in the back - does not!
> select offset, 'a' as a from x;
+--------+---+
| offset | a |
+--------+---+
| 1 | a |
+--------+---+
1 row(s) fetched.
-- NOTE: Quoting helps, but is unfortunate
> select 'a' as a, "offset" from x;
+---+--------+
| a | offset |
+---+--------+
| a | 1 |
+---+--------+
1 row(s) fetched.
In DataFusion 49.0.2 disambiguation works fine in all of these cases, so this is a regression.
Same behaviour reproduces for other keywords like limit.
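Until this is resolved, anyone generating queries programmatically could quote the affected columns automatically. The following is only a minimal sketch of that workaround: `RESERVED`, `quote_ident` and `quote_if_keyword` are hypothetical names, not part of DataFusion or sqlparser, and the keyword list is an illustrative subset. It also assumes that quoted identifiers are matched case-sensitively, so names should be passed in the case they are stored in.

```rust
/// Illustrative subset of keywords that currently need quoting (assumption,
/// not taken from the parser's actual keyword table).
const RESERVED: &[&str] = &["offset", "limit"];

/// Wrap an identifier in double quotes, doubling any embedded double quote.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Quote the identifier only if it collides with one of the listed keywords.
fn quote_if_keyword(name: &str) -> String {
    if RESERVED.iter().any(|kw| kw.eq_ignore_ascii_case(name)) {
        quote_ident(name)
    } else {
        name.to_string()
    }
}

fn main() {
    let cols = ["a", "offset"];
    let select_list: Vec<String> = cols.iter().map(|&c| quote_if_keyword(c)).collect();
    // prints: select a, "offset" from x
    println!("select {} from x", select_list.join(", "));
}
```

This only works around the problem for generated SQL; hand-written queries would still need manual quoting.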
Expected behavior
SQL and its many dialects are rich with keywords, so context-aware parsing really helps to keep queries free of excessive quoting.
It so happens that an offset column is present in hundreds of our datasets, so this behavior change is highly impactful, and I hope it can be reversed.
Additional context
The cause might be in the sqlparser crate (between versions 0.55.0 and 0.58.0). I will attempt to isolate the issue there and create a linked ticket.