Fix tokenizer over-consuming character after ->> operator #84
Open
RagingKore wants to merge 1 commit into TylerBrinks:main from
Conversation
…s#70)

`TokenizeLongArrow` called `_state.Next()` after confirming the second `>` and then delegated to `ConsumeForBinOp`, which calls `_state.Next()` again. The result was that the character immediately following `->>` was silently eaten. When that character was the opening single quote of a string literal, subsequent tokenization walked past the closing quote and raised "Unterminated string literal". This only surfaced when `->>` was written without whitespace before the following token (e.g. `meta->>'x'`). The existing tests all used `meta ->> 'x'`, where the swallowed character happened to be the harmless space.

The fix drops the redundant `_state.Next()`, aligning `->>` with the already-correct pattern used by `#>>` in `TokenizeHash`. Regression tests are added against the dialects that genuinely support `->>` (PostgreSQL, DuckDB, MySQL, SQLite, Redshift, Generic). `->>` is a PostgreSQL-originated extension and is not part of ANSI/ISO SQL, so dialects that use other JSON extraction mechanisms (Snowflake, BigQuery, MS SQL Server, Hive, Databricks, Oracle, ANSI) are intentionally excluded.
Fixes #70.
Reproduction (from the issue)
Throws:
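The original snippet and exception text did not survive rendering of this page. A minimal query of the same shape reproduces the failure; the table name is illustrative, while the `meta->>'description'` expression is the one quoted later in this description:

```sql
-- Illustrative only: any query in which ->> is immediately followed by a
-- string literal (no intervening whitespace) triggered the bug.
select meta->>'description'
from events;
-- Parser raised: "Unterminated string literal"
```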
Root cause
`TokenizeLongArrow` called `_state.Next()` after confirming the second `>`, then handed off to `ConsumeForBinOp`, which calls `_state.Next()` again. The character immediately following `->>` was therefore silently consumed. When it was the opening `'` of a string literal, subsequent tokenization walked past the closing quote and raised "Unterminated string literal".

This only surfaced when `->>` was written without whitespace before the following token (e.g. `meta->>'description'`). Every existing test used `meta ->> 'description'`, where the eaten character was the harmless space, which is why the bug slipped through.

The fix drops the redundant `_state.Next()`, aligning `->>` with the already-correct `#>>` pattern in `TokenizeHash`.

Tests

Added `LongArrowJsonExtractionTests` covering tokenization, the parser-level AST, and the exact multi-line SQL from the bug report.

The new tests are deliberately scoped to the dialects that genuinely support `->>` as a JSON extraction operator: PostgreSQL, DuckDB, MySQL (since 5.7.13), SQLite (since 3.38), Redshift, and Generic. `->>` is a PostgreSQL-originated extension and is not part of ANSI/ISO SQL. Dialects that use other JSON extraction mechanisms (Snowflake's `col:path`, BigQuery's `JSON_VALUE`, MS SQL's `JSON_VALUE`, Hive/Databricks' `get_json_object`, Oracle, ANSI) are intentionally excluded.
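The double-advance described under "Root cause" can be sketched in miniature. The following Python model is illustrative only (the real implementation is C# in SqlParser-cs; function and class names here merely mirror `TokenizeLongArrow` and `ConsumeForBinOp`): calling the cursor's `next()` both before and inside the binary-operator helper swallows the quote, while the fix leaves the single advance to the helper.

```python
class State:
    """Minimal cursor over the input, standing in for the tokenizer state."""
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def next(self):
        ch = self.peek()
        self.pos += 1
        return ch

def consume_for_bin_op(state):
    # Advances past the operator's final character before emitting the token.
    state.next()
    return "LongArrow"

def tokenize_long_arrow_buggy(state):
    # Bug: next() here AND inside consume_for_bin_op -> two advances,
    # so the character after ->> is silently eaten.
    state.next()
    return consume_for_bin_op(state)

def tokenize_long_arrow_fixed(state):
    # Fix: drop the redundant next(); consume_for_bin_op does the one advance.
    return consume_for_bin_op(state)

def next_char_after_operator(tokenize):
    s = State("->>'x'")
    s.next(); s.next()   # '-' and first '>' consumed; second '>' confirmed via peek
    tokenize(s)
    return s.peek()      # the character the next tokenization step will see

print(next_char_after_operator(tokenize_long_arrow_buggy))  # x  (quote swallowed)
print(next_char_after_operator(tokenize_long_arrow_fixed))  # '  (quote preserved)
```

With the buggy version, the next step begins at `x` and scans past the closing quote, which is exactly the "Unterminated string literal" failure mode; the fixed version leaves the cursor on the opening `'` so the string literal tokenizes normally.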