fix(tokenizer): allow $ as PG identifier continuation char #43
Per the PostgreSQL lexical syntax docs, unquoted identifiers may contain `$` after the first character (e.g. `schema$1` is a single identifier, not `schema` followed by the placeholder `$1`). The tokenizer was stopping at `$` for both dialects, which broke ProxySQL's `set_parser_algorithm=3` path for inputs like `SET search_path = schema$1`: the walker only saw `schema`, and the trailing `$1` fell through as a separate token.

The first-character constraint is preserved: `$<letter>` at the start of a token still emits `TK_ERROR` (this covers `$user`, `$bareword`, etc., which are not valid PG tokens at that position). Numeric placeholders (`$1`) and dollar-quoted strings (`$$...$$`) are unaffected, because their branches in `next_token_impl()` run before the identifier scanner.

MySQL behaviour is unchanged: `$` still terminates an unquoted identifier in MySQL mode. Note that this reflects the tokenizer's existing behaviour rather than MySQL's grammar, which permits `$` in unquoted identifiers.

Tests: added 4 cases in `test_set.cpp` covering PG mid-ident `$`, multi-`$` idents, PG `$<word>` still erroring, and MySQL unchanged.
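The rule described above can be illustrated with a small standalone scanner. This is a minimal sketch and not ParserSQL's actual code: `Dialect`, `Kind`, `Token`, `is_ident_continue`, and `tokenize` are hypothetical names chosen for the example, and dollar-quoted strings are deliberately left out to keep it short. It only assumes what the description states: the `$` branch runs before the identifier scanner, and `$` counts as a continuation character in PostgreSQL mode only.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical types for illustration; not ParserSQL's real API.
enum class Dialect { MySQL, PostgreSQL };
enum class Kind { Identifier, Placeholder, Error, Other };

struct Token {
    Kind kind;
    std::string text;
};

static bool is_ident_start(unsigned char c) {
    return std::isalpha(c) || c == '_';
}

// `$` continues an identifier only in PostgreSQL mode.
static bool is_ident_continue(unsigned char c, Dialect d) {
    if (std::isalnum(c) || c == '_') return true;
    return c == '$' && d == Dialect::PostgreSQL;
}

std::vector<Token> tokenize(std::string_view in, Dialect d) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (std::isspace(c)) { ++i; continue; }
        const std::size_t start = i;

        // The `$` branch is checked before the identifier scanner, so a
        // leading `$` never reaches the continuation rule.
        // (Dollar-quoted strings are omitted from this sketch.)
        if (c == '$' && d == Dialect::PostgreSQL) {
            ++i;
            const bool numeric =
                i < in.size() && std::isdigit(static_cast<unsigned char>(in[i]));
            if (numeric) {
                while (i < in.size() &&
                       std::isdigit(static_cast<unsigned char>(in[i]))) ++i;
                out.push_back({Kind::Placeholder, std::string(in.substr(start, i - start))});
            } else {
                // `$<letter>` at token start: consume the word, report an error.
                while (i < in.size() &&
                       is_ident_continue(static_cast<unsigned char>(in[i]), d)) ++i;
                out.push_back({Kind::Error, std::string(in.substr(start, i - start))});
            }
            continue;
        }

        if (is_ident_start(c)) {
            ++i;
            while (i < in.size() &&
                   is_ident_continue(static_cast<unsigned char>(in[i]), d)) ++i;
            out.push_back({Kind::Identifier, std::string(in.substr(start, i - start))});
            continue;
        }

        // Anything else becomes a single-character token in this sketch.
        ++i;
        out.push_back({Kind::Other, std::string(in.substr(start, 1))});
    }
    return out;
}
```

With this sketch, `tokenize("schema$1", Dialect::PostgreSQL)` yields one identifier token, while the same input in `Dialect::MySQL` yields `schema` followed by further tokens. Keeping the dialect check inside the continuation predicate leaves the first-character rule untouched, which matches the constraint the description calls out.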
ParserSQL 1.0.7 (PR ProxySQL/ParserSQL#43, tag v1.0.7) allows `$` as a PostgreSQL identifier continuation char, per the PG lexical-syntax spec. Before this, `SET search_path = schema$1` truncated to `schema` and the trailing `$1` fell through as a placeholder, leaving case 169 of pgsql-set_parameter_validation_test-t failing under set_parser_algorithm=3 even after the walker-side fixes in commit b54aaa5.

End-to-end validated locally: setparser_parsersql_test-t now passes its byte-exact regression cases for `SET search_path = schema$1` and `SET search_path = my$schema$2_name` (292/292 total).

reg_test_4072-show-warnings-t: drop the slow-consumer sleep from 9us to 8us. The 9us margin re-flaked across three groups (legacy-g2, legacy-g2-genai, mysql84-g2) on the prior CI run. 8us shaves ~20% off the sleep floor (~16s vs ~20s at 10us) while keeping enough back-pressure to still exercise the original #4072 reproducer. Doc comment updated with the new ratio and the noted history of the 9us attempt.

SHA256 parsersql-1.0.7.tar.gz: c4029f6bf0a1774ecbcb95eb842fe6aa682a8ba36ec2badf49241d1ff61c5608
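The sleep-floor figures in the commit message are easier to follow with the arithmetic written out. As a rough sketch, assume the floor is the per-iteration sleep $s$ multiplied by a fixed iteration count $N$. The count is not stated in the message; it is inferred here from the quoted ~20s at 10us.

```latex
\begin{align*}
T(s) &= N \cdot s \\
N &\approx \frac{20\,\mathrm{s}}{10\,\mu\mathrm{s}} = 2 \times 10^{6} \\
T(9\,\mu\mathrm{s}) &\approx 18\,\mathrm{s} \\
T(8\,\mu\mathrm{s}) &\approx 16\,\mathrm{s}, \qquad \frac{20 - 16}{20} = 20\%
\end{align*}
```

Under that assumption, the ~20% saving is measured against the 10us baseline. Relative to the 9us attempt, the reduction is roughly 11%.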
Summary
- Allow `$` as an identifier continuation char in PostgreSQL mode (per PG lexical-syntax docs).
- `$<letter>` at token start still emits `TK_ERROR`.

Why
ProxySQL's `set_parser_algorithm=3` path was breaking on inputs like `SET search_path = schema$1`: the walker only saw `schema` and the trailing `$1` fell through as a placeholder, producing `pgsql-set_parameter_validation_test-t` failures. The PG spec explicitly allows `$` in unquoted identifiers after the first character.

Test plan
- New cases in `tests/test_set.cpp` covering: PG mid-ident `$`, multi-`$` idents, PG `$<word>` still erroring, MySQL behaviour unchanged
- `make test`: 1258 tests pass, 0 fail
- `setparser_parsersql_test-t`: strict byte-exact tests for `SET search_path = schema$1` and `SET search_path = my$schema$2_name` both pass

Summary by CodeRabbit
Release Notes
New Features
- Unquoted PostgreSQL identifiers may contain `$` characters (after the first character), e.g., `schema$1` is recognized as a single identifier.
Tests