feat: add StringTokenScannerSymbols for configurable multi-character delimiters (fixes #195) #1303
Open
jmoraleda wants to merge 2 commits into HubSpot:master from
Closes #195.
Python's Jinja2 allows full customization of the six delimiter strings via its `Environment` constructor (`block_start_string`, `block_end_string`, `variable_start_string`, `variable_end_string`, `comment_start_string`, `comment_end_string`), plus `line_statement_prefix` and `line_comment_prefix`. Jinjava had no equivalent for arbitrary delimiter strings, which made it impractical to use Jinja-style templating in contexts where `{{`, `{%`, or `{#` appear as literal content (e.g. LaTeX documents, some JSON schemas, or Kubernetes YAML with Helm-style markers).

What this PR adds:
A new `StringTokenScannerSymbols` class with a builder API that allows all six delimiter strings to be configured independently, with no constraint on length or shared prefix characters.

Changes:
- `StringTokenScannerSymbols` (new): builder-configured `TokenScannerSymbols` implementation. Uses Unicode Private Use Area sentinel characters as internal token-kind discriminators, so `Token.newToken()` dispatches correctly without changes to `Token`.
- `TokenScanner`: adds a string-matching scan path (`getNextTokenStringBased()`) that is activated when `symbols.isStringBased()` is true. The original char-based path is completely unchanged. Also supports `lineStatementPrefix` and `lineCommentPrefix`, matching Python Jinja2 semantics, including indented prefixes.
- `TokenScannerSymbols`: adds `isStringBased()` (default `false`), six delimiter-length accessors (`getTagStartLength()` etc.), and two optional line-prefix accessors (`getLineStatementPrefix()`, `getLineCommentPrefix()`). All default implementations preserve existing behaviour.
- `TagToken`, `ExpressionToken`, `NoteToken`: replace hardcoded delimiter offsets with calls to the new length accessors on `symbols`. This is a correctness fix that affects all `TokenScannerSymbols` implementations, not just `StringTokenScannerSymbols`: `ExpressionToken.parse()` was calling `WhitespaceUtils.unwrap(image, "{{", "}}")` with literal strings regardless of the configured symbols, so any custom char-based subclass (like the one in `CustomTokenScannerSymbolsTest`) would silently fail to strip its expression delimiters. The fix uses `symbols.getExpressionStart()` and `symbols.getExpressionEnd()` instead.
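For reviewers, the core idea of a string-matching scan path can be pictured with the minimal sketch below. It is not the PR's `getNextTokenStringBased()`: `StringDelimiterScanner`, `Kind`, `Token`, and `scan()` are hypothetical names, a plain constructor stands in for the builder, and whitespace control, quoting, raw blocks, and line prefixes are deliberately ignored. What it shows is one rule that arbitrary-length delimiters with shared prefixes call for: at each position, try the longest start delimiter first, so a delimiter that is a prefix of another (here `<<` versus `<<%`) never shadows it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of string-based delimiter scanning (not the PR's implementation).
 * Delimiters may be any length and may share a prefix, e.g. "<<" and "<<%".
 * Whitespace control, quoting, raw blocks and line prefixes are deliberately ignored.
 */
public final class StringDelimiterScanner {

  /** Token kinds produced by this sketch. */
  public enum Kind { TEXT, EXPRESSION, TAG, NOTE }

  /** A scanned token; the image includes its delimiters. */
  public record Token(Kind kind, String image) {}

  /** One configured start/end delimiter pair. */
  private record Delimiter(Kind kind, String start, String end) {}

  private final List<Delimiter> delimiters = new ArrayList<>();

  public StringDelimiterScanner(String expressionStart, String expressionEnd,
                                String tagStart, String tagEnd,
                                String noteStart, String noteEnd) {
    delimiters.add(new Delimiter(Kind.EXPRESSION, expressionStart, expressionEnd));
    delimiters.add(new Delimiter(Kind.TAG, tagStart, tagEnd));
    delimiters.add(new Delimiter(Kind.NOTE, noteStart, noteEnd));
    // Try longer start delimiters first so "<<" never shadows "<<%" at the same position.
    delimiters.sort(Comparator.comparingInt((Delimiter d) -> d.start().length()).reversed());
  }

  public List<Token> scan(String input) {
    List<Token> tokens = new ArrayList<>();
    int textStart = 0; // start of the pending run of plain text
    int pos = 0;
    while (pos < input.length()) {
      Delimiter open = matchStart(input, pos);
      int close = open == null ? -1 : input.indexOf(open.end(), pos + open.start().length());
      if (close < 0) {
        pos++; // no delimiter here, or an unterminated one: keep it as plain text
        continue;
      }
      if (pos > textStart) {
        tokens.add(new Token(Kind.TEXT, input.substring(textStart, pos)));
      }
      int after = close + open.end().length();
      tokens.add(new Token(open.kind(), input.substring(pos, after)));
      pos = after;
      textStart = after;
    }
    if (textStart < input.length()) {
      tokens.add(new Token(Kind.TEXT, input.substring(textStart)));
    }
    return tokens;
  }

  /** Returns the longest start delimiter beginning at pos, or null if none does. */
  private Delimiter matchStart(String input, int pos) {
    for (Delimiter d : delimiters) {
      if (input.startsWith(d.start(), pos)) {
        return d;
      }
    }
    return null;
  }
}
```

For example, configured with `<<`/`>>`, `<<%`/`%>>`, and `<<#`/`#>>`, scanning `a <<x>> b <<% if y %>>c<<# note #>>` yields six tokens: TEXT `a `, EXPRESSION `<<x>>`, TEXT ` b `, TAG `<<% if y %>>`, TEXT `c`, and NOTE `<<# note #>>`. A first-match scan over unsorted delimiters would instead risk reading `<<% if y %>>` as an expression.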
The char-based scan path and all existing `TokenScannerSymbols` subclasses are completely unaffected. The new length accessors on `TokenScannerSymbols` default to the length of the corresponding delimiter string (for example, `getExpressionStart().length()`), which for `DefaultTokenScannerSymbols` is always `2`. The full test suite passes without modification.
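To make the length-accessor change concrete, here is a minimal sketch under stated assumptions: `DelimiterLengthSketch`, `SketchSymbols`, `BraceSymbols`, `LatexSymbols`, and `unwrapExpression` are hypothetical names, not Jinjava's API, and only the expression delimiter pair is modelled. It illustrates the two halves of the change described above: length accessors that default to the length of the configured string, and token code that slices images by those lengths rather than by a hard-coded offset.

```java
/**
 * Hypothetical sketch (not Jinjava's classes): derive delimiter lengths from the
 * configured strings and slice token images by those lengths instead of a fixed 2.
 */
public final class DelimiterLengthSketch {

  /** Trimmed-down stand-in for a symbols class; only the expression pair is modelled. */
  public abstract static class SketchSymbols {
    public abstract String getExpressionStart();

    public abstract String getExpressionEnd();

    // Default length accessors: two-character delimiters such as "{{" still yield 2.
    public int getExpressionStartLength() {
      return getExpressionStart().length();
    }

    public int getExpressionEndLength() {
      return getExpressionEnd().length();
    }
  }

  /** The familiar two-character delimiters. */
  public static final class BraceSymbols extends SketchSymbols {
    @Override public String getExpressionStart() { return "{{"; }
    @Override public String getExpressionEnd() { return "}}"; }
  }

  /** LaTeX-friendly delimiters of unequal length (an illustrative choice). */
  public static final class LatexSymbols extends SketchSymbols {
    @Override public String getExpressionStart() { return "\\VAR{"; }
    @Override public String getExpressionEnd() { return "}"; }
  }

  /** Strips the configured expression delimiters from a token image and trims the body. */
  public static String unwrapExpression(String image, SketchSymbols symbols) {
    int startLength = symbols.getExpressionStartLength();
    int endLength = symbols.getExpressionEndLength();
    if (!image.startsWith(symbols.getExpressionStart())
        || !image.endsWith(symbols.getExpressionEnd())
        || image.length() < startLength + endLength) {
      throw new IllegalArgumentException("not an expression image: " + image);
    }
    return image.substring(startLength, image.length() - endLength).trim();
  }
}
```

With `LatexSymbols`, `unwrapExpression("\\VAR{ x }", ...)` returns `"x"`, whereas a hard-coded offset of 2 on each side would leave `AR{ x` behind. With `BraceSymbols`, both approaches agree, which is why the default path keeps its existing behaviour.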