Allow Unicode characters in `Selector`s #510

axunonb · 2025-11-06T15:15:25Z

Changes

Selectors in Placeholders may now contain most Unicode characters when ParserSettings.SelectorCharFilter = SelectorFilterType.VisualUnicodeChars.

Disallowed characters are:

* Characters with special functions: {}[]()\.?,
* 68 non-visual Unicode characters: Control Characters (U+0000–U+001F, U+007F), Format Characters (Category: Cf), Directional Formatting (Category: Cf), Invisible Separator, Common Combining Marks (Category: Mn), Whitespace Characters (non-glyph spacing). All other Unicode characters are allowed in a selector.

Merged from feat: Filter Selector chars by allowlist or blocklist #511:

* Add class CharSet. It represents a set of characters that supports efficient storage and lookup for both ASCII and non-ASCII characters. It is used in the Parser as allow list or block list. The time for parsing placeholders decreases by ~25% compared to v3.2.0 to v3.6.1.
* Update Parser to use CharSet and handle the defined FilterType.
* Refactor ParserSettings: reorder members, update internal properties to better align with class CharSet.
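
To illustrate the idea behind a character set that serves as either an allow list or a block list, here is a minimal sketch. It is not the `CharSet` class from this PR: the names `SimpleCharSet` and `SelectorCharCheck`, and the choice of a flag array for ASCII plus a `HashSet<char>` for everything else, are assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, NOT the SmartFormat CharSet implementation.
public sealed class SimpleCharSet
{
    // One flag per ASCII code point (0..127) gives an array-indexed lookup.
    private readonly bool[] _ascii = new bool[128];

    // Non-ASCII characters are stored in a hash set.
    private readonly HashSet<char> _nonAscii = new HashSet<char>();

    public SimpleCharSet(IEnumerable<char> chars)
    {
        foreach (var c in chars) Add(c);
    }

    public void Add(char c)
    {
        if (c < 128) _ascii[c] = true;
        else _nonAscii.Add(c);
    }

    public bool Contains(char c) => c < 128 ? _ascii[c] : _nonAscii.Contains(c);
}

public static class SelectorCharCheck
{
    // Allow list: a char is valid only if it is in the set.
    // Block list: a char is valid unless it is in the set.
    public static bool IsAllowed(char c, SimpleCharSet set, bool isAllowList)
        => set.Contains(c) == isAllowList;
}
```

With this shape, the same lookup serves both modes: an allow list could hold the alphanumeric characters, while a block list could hold the non-visual ones.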

Example:

const string expected = "The Value";
var settings = new SmartSettings 
    { Parser = new ParserSettings { SelectorCharFilter = SelectorFilterType.VisualUnicodeChars } };
var smart = Smart.CreateDefaultSmartFormat(settings);
// Use the Unicode string as a selector of the placeholder
var template = "{Chinese 汉字测试}";
// Instead of the Dictionary, any other type supporting Unicode characters can be used
var result = smart.Format(template, new Dictionary<string, string> { { "Chinese 汉字测试", expected } });
Assert.That(result, Is.EqualTo(expected));

ParserSettings.SelectorCharFilter = SelectorFilterType.Alphanumeric is the default and allows alphanumeric characters plus _ and -.
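
The two filter modes described above can be approximated with standard .NET character classification. The sketch below is only an approximation under stated assumptions: it does not reproduce the PR's list of 68 non-visual characters, it treats `Alphanumeric` as ASCII-only, it treats the plain space (U+0020) as allowed because the example above uses one in a selector, and the class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch, NOT the SmartFormat implementation.
public static class SelectorFilterSketch
{
    // Characters with special functions, as listed in the PR description.
    private const string Reserved = "{}[]()\\.?,";

    // Assumption: "alphanumeric" means ASCII letters and digits, plus '_' and '-'.
    public static bool IsAlphanumericSelectorChar(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9') || c == '_' || c == '-';

    // Category-based approximation of "non-visual"; broader than the PR's 68 characters.
    public static bool IsLikelyNonVisual(char c)
    {
        if (c == ' ') return false; // assumption: the plain space stays allowed
        if (char.IsControl(c) || char.IsWhiteSpace(c)) return true;
        var category = CharUnicodeInfo.GetUnicodeCategory(c);
        return category == UnicodeCategory.Format          // Cf
            || category == UnicodeCategory.NonSpacingMark; // Mn
    }

    public static bool IsVisualUnicodeSelectorChar(char c) =>
        Reserved.IndexOf(c) < 0 && !IsLikelyNonVisual(c);
}
```

A character such as 汉 fails the alphanumeric check but passes the visual Unicode check, which is why the example template needs `SelectorFilterType.VisualUnicodeChars`.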

Benchmark

After implementing class CharSet in Parser:
Parser.ParseFormat("{SomePlaceholder1}{SomePlaceholder2}{SomePlaceholder3}{SomePlaceholder4}{SomePlaceholder5}");

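| Method | N | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| v3.6.1 | 1000 | 1,065 us | 231 us | 12.7 us | 7.48 | 0.02 | - | 406.25 KB | 3.25 |
| This PR | 1000 | 776 us | 81 us | 4.5 us | 5.65 | 0.03 | - | 406.25 KB | 3.25 |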

27% faster

Resolves #454

codecov · 2025-11-06T15:22:49Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98%. Comparing base (058f615) to head (ab623c1).

Additional details and impacted files

@@         Coverage Diff          @@
##           main   #510    +/-   ##
====================================
+ Coverage    97%    98%    +1%     
====================================
  Files        99    100     +1     
  Lines      3431   3558   +127     
====================================
+ Hits       3339   3484   +145     
+ Misses       92     74    -18

Refactored internal `ParserSettings` to convert instance-level properties and methods to static or const members.

karljj1

Looks great :)

axunonb · 2025-11-07T11:57:25Z

Thanks for the review.

The PR in its present form represents a major breaking change because it fundamentally alters how format strings are parsed:
Any existing application that uses format strings with previously illegal characters (like spaces or non-ASCII Unicode) and relied on the parser to throw an error or fail parsing will now have those format strings parse successfully.

[Edit]
Resolved with #511 merged in this PR

src/SmartFormat/Core/Settings/ParserSettings.cs

Implement proposals from review:
* SelectorFilterType.Alphanumeric: alphanumeric characters (upper and lower case), plus '_' and '-'
* SelectorFilterType.VisualUnicodeChars: All Unicode characters are allowed in a selector, except 68 non-visual characters: Control Characters (U+0000–U+001F, U+007F), Format Characters (Category: Cf), Directional Formatting (Category: Cf), Invisible Separator, Common Combining Marks (Category: Mn), Whitespace Characters (non-glyph spacing).

imprima

Excellent

sonarqubecloud · 2025-11-11T12:18:36Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Allow Unicode characters in Selectors

a43748e

Resolves #454

axunonb requested a review from karljj1 November 6, 2025 15:15

Refactor internal ParserSettings to use static or const members

6b1dbec

Refactored internal `ParserSettings` to convert instance-level properties and methods to static or const members.

axunonb force-pushed the pr/unicode-in-selectors branch from a9ef0c5 to 6b1dbec November 6, 2025 17:21

karljj1 previously approved these changes Nov 7, 2025

feat: Filter Selector chars by allowlist or blocklist (#511)

b787b49

axunonb dismissed karljj1’s stale review via b787b49 November 9, 2025 19:43

axunonb requested a review from karljj1 November 9, 2025 19:52

imprima reviewed Nov 11, 2025

src/SmartFormat/Core/Settings/ParserSettings.cs (outdated, resolved)

imprima reviewed Nov 11, 2025

src/SmartFormat/Core/Settings/ParserSettings.cs (outdated, resolved)

axunonb requested a review from imprima November 11, 2025 09:58

karljj1 previously approved these changes Nov 11, 2025

Make NonVisualUnicodeCharacters read-only

8cb11ce

axunonb dismissed karljj1’s stale review via 8cb11ce November 11, 2025 12:15

imprima approved these changes Nov 11, 2025

axunonb merged commit 3a4ad6f into main Nov 11, 2025
3 of 5 checks passed

axunonb deleted the pr/unicode-in-selectors branch November 11, 2025 12:17
