Skip to content

fix(tokenizer): allow $ as PG identifier continuation char#43

Merged
renecannao merged 1 commit into
mainfrom
fix/pg-dollar-in-ident
May 23, 2026
Merged

fix(tokenizer): allow $ as PG identifier continuation char#43
renecannao merged 1 commit into
mainfrom
fix/pg-dollar-in-ident

Conversation

@renecannao
Copy link
Copy Markdown
Collaborator

@renecannao renecannao commented May 23, 2026

Summary

  • Tokenizer now allows $ as an identifier continuation char in PostgreSQL mode (per PG lexical-syntax docs).
  • First-character constraint preserved: $<letter> at token start still emits TK_ERROR.
  • MySQL behaviour unchanged.

Why

ProxySQL's set_parser_algorithm=3 path was breaking on inputs like SET search_path = schema$1 — the walker only saw schema and the trailing $1 fell through as a placeholder, producing pgsql-set_parameter_validation_test-t failures. The PG spec explicitly allows $ in unquoted identifiers after the first character.

Test plan

  • 4 new tests in tests/test_set.cpp covering: PG mid-ident $, multi-$ idents, PG $<word> still erroring, MySQL behaviour unchanged
  • make test — 1258 tests pass, 0 fail
  • End-to-end validated in ProxySQL: spliced libsqlparser.a into deps and ran setparser_parsersql_test-t — strict byte-exact tests for SET search_path = schema$1 and SET search_path = my$schema$2_name both pass

Summary by CodeRabbit

Release Notes

  • New Features

    • PostgreSQL identifiers can now contain $ characters (after the first character), e.g., schema$1 is recognized as a single identifier.
  • Tests

    • Added regression tests for PostgreSQL identifier parsing and MySQL compatibility verification.

Review Change Stack

Per the PostgreSQL lexical syntax docs, unquoted identifiers may contain
`$` after the first character (e.g. `schema$1` is a single identifier,
not `schema` followed by the placeholder `$1`). The tokenizer was
stopping at `$` for both dialects, which broke ProxySQL's
set_parser_algorithm=3 path for inputs like `SET search_path = schema$1`
-- the walker only saw `schema` and the trailing `$1` fell through as a
separate token.

The first-character constraint is preserved: `$<letter>` at the start of
a token still emits TK_ERROR (covers $user, $bareword, etc., which are
not valid PG tokens at that position). Numeric placeholders (`$1`) and
dollar-quoted strings (`$$...$$`) are unaffected -- their branches in
next_token_impl() run before the identifier scanner.

MySQL behaviour is unchanged: `$` still terminates an unquoted MySQL
identifier (MySQL doesn't allow `$` in identifiers without backticks).

Tests: added 4 cases in test_set.cpp covering PG mid-ident `$`,
multi-`$` idents, PG `$<word>` still erroring, and MySQL unchanged.
@renecannao renecannao merged commit 4f9fe62 into main May 23, 2026
7 of 8 checks passed
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 23, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3ae83bdd-9bb7-4450-b3e3-992df528adaa

📥 Commits

Reviewing files that changed from the base of the PR and between 9226f87 and 78c5908.

📒 Files selected for processing (2)
  • include/sql_parser/tokenizer.h
  • tests/test_set.cpp

📝 Walkthrough

Walkthrough

The tokenizer's identifier scanning loop was refactored to permit $ as a continuation character in PostgreSQL identifiers only after the first character, enabling tokens like schema$1 to be recognized as single identifiers. Regression tests for PostgreSQL SET parsing and a MySQL guardrail ensure the feature works correctly and does not affect other SQL dialects.

Changes

PostgreSQL $ identifier continuation

Layer / File(s) Summary
Tokenizer $ continuation logic
include/sql_parser/tokenizer.h
The identifier-character loop in scan_identifier_or_keyword computes an is_cont flag and extends PostgreSQL rules to accept $ as a continuation character only when it occurs after the first character, allowing multi-part identifiers like schema$1.
Regression and guardrail tests
tests/test_set.cpp
Adds PostgreSQL SET parsing regression tests for $ continuation (e.g., schema$1, my$schema$2_name) and error cases (e.g., reserved $word tokens), plus MySQL guardrail tests to verify $ continuation does not leak into MySQL parsing.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A tokenizer hops with PostgreSQL cheer,
Allowing dollars mid-identifiers here:
schema$one now scans as a whole,
While MySQL stands guard at the goal. 🐰💰

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/pg-dollar-in-ident

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

renecannao added a commit to sysown/proxysql that referenced this pull request May 23, 2026
ParserSQL 1.0.7 (PR ProxySQL/ParserSQL#43, tag v1.0.7) allows `$` as a
PostgreSQL identifier continuation char, per the PG lexical-syntax spec.
Before this, `SET search_path = schema$1` truncated to `schema` and the
trailing `$1` fell through as a placeholder, leaving case 169 of
pgsql-set_parameter_validation_test-t failing under set_parser_algorithm=3
even after the walker-side fixes in commit b54aaa5. End-to-end
validated locally: setparser_parsersql_test-t now passes its byte-exact
regression cases for `SET search_path = schema$1` and
`SET search_path = my$schema$2_name` (292/292 total).

reg_test_4072-show-warnings-t: drop the slow-consumer sleep from 9us to
8us. The 9us margin re-flaked across three groups (legacy-g2,
legacy-g2-genai, mysql84-g2) on the prior CI run. 8us shaves ~20% off the
sleep floor (~16s vs ~20s at 10us) while keeping enough back-pressure to
still exercise the original #4072 reproducer. Doc comment updated with
the new ratio and the noted history of the 9us attempt.

SHA256 parsersql-1.0.7.tar.gz: c4029f6bf0a1774ecbcb95eb842fe6aa682a8ba36ec2badf49241d1ff61c5608
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant