Skip to content

allowing thousand separators + fix the consumed indicator#111

Merged
CarlVerret merged 2 commits intomasterfrom
lemire/allowthousands
Apr 27, 2026
Merged

allowing thousand separators + fix the consumed indicator#111
CarlVerret merged 2 commits intomasterfrom
lemire/allowthousands

Conversation

@lemire
Copy link
Copy Markdown
Collaborator

@lemire lemire commented Apr 27, 2026

This addresses two issues. The first is a small bug in the way we report characters_consumed from TryParseNumber: in the UTF-16 path of FastFloatParser and the UTF-8 path of FastDoubleParser, we were advancing the start pointer past leading whitespace before parsing the number, then setting characters_consumed = pns.characters_consumed — which is measured from the new start, so the leading whitespace was silently dropped from the count. The two sibling paths already tracked a leading_spaces counter and did the addition; this PR brings the other two in line. While I was there I also turned the ThrowArgumentException() on empty input in FastFloatParser.TryParseNumber(byte*, ...) into a return false, since Try* should not throw.

The second change is motivated by issue #110, where FastFloatParser.ParseFloat("1,234,567.89") returns 1 while float.Parse("1,234,567.89", CultureInfo.InvariantCulture) returns 1234567.9. The reason is that the BCL’s Parse(string, IFormatProvider) overload defaults to NumberStyles.Float | NumberStyles.AllowThousands, and we did not honour AllowThousands at all. To make matching that behaviour possible without changing our defaults, this PR adds an optional thousands_separator parameter (default ',' for the char overloads and (byte)',' for the byte overloads) on every public Parse* / TryParse* entry point. When NumberStyles.AllowThousands is in styles and the thousands separator differs from the decimal separator, ParsedNumberString will accept a separator that lies strictly between two digits in the
integer part of the number; otherwise the previous behaviour is preserved. The European convention (thousands_separator: '.', decimal_separator: ',') works as you would expect. Default behaviour is unchanged: without AllowThousands, parsing "1,234" still stops at the comma and consumes one character.

The change to ParsedNumberString.ParseNumberString was deliberately small. I switched the integer-part loop from raw pointer arithmetic to a counter so that the reported digit count is correct in the presence of separators, and I extended the rare reparse path used when there are more than nineteen digits so that it skips separators while computing the truncated mantissa and exponent. The fractional, exponent, sign, and SIMD code paths are untouched.

Tests cover both fixes. There are new theories for characters_consumed correctness across the string, char*, and byte* overloads of both parsers, including a "walk through the buffer" test that would have caught the original bug. There are also tests for the new parameter: BCL-parity for typical inputs like 1,234,567.89, the European convention, no-op behaviour when AllowThousands is absent, the byte* path, and rejection of a leading separator. The 24-digit case (1,234,567,890,123,456,789,012,345) exercises the long-mantissa reparse path.

A short benchmark comparison (BenchmarkDotNet ShortRunJob, three iterations on canada/mesh/synthetic) shows no regression.

Note that this does not change the default styles for ParseFloat(string) to include AllowThousands — doing so would change the meaning of inputs that today parse to a partial result, which seemed too aggressive to do in the same change. The PR #110 test case will succeed once the caller opts in by passing NumberStyles.Float | NumberStyles.AllowThousands.

@lemire lemire requested a review from CarlVerret April 27, 2026 15:18
@lemire
Copy link
Copy Markdown
Collaborator Author

lemire commented Apr 27, 2026

@nietras Would that work?

@CarlVerret CarlVerret merged commit 08263c1 into master Apr 27, 2026
1 check passed
@nietras
Copy link
Copy Markdown
Contributor

nietras commented Apr 29, 2026

Would that work?

@lemire I would think so, will try to test in Sep soon hopefully. Thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants