Replace dragonbox with zmij for float-to-string conversion#100650
Replace dragonbox with zmij for float-to-string conversion#100650Algunenano merged 11 commits intoClickHouse:masterfrom
Conversation
|
Workflow [PR], commit [e0e5d09] Summary: ❌
AI ReviewSummaryThis PR replaces ClickHouse Rules
Final VerdictStatus: ✅ Approve |
|
I like this change... ;) ping me when ready ;p |
|
Perf analysis: So I think it's genuinely worth exploring this and investigate how complex it it to adjust zmij to match our current output |
Replace the dragonbox library with zmij (https://github.com/vitaut/zmij) for converting floating-point numbers to their shortest decimal string representation. zmij is 1.5x-3x faster than dragonbox on x86-64. The output format differs from dragonbox in several ways: - Positive exponents include a '+' sign: "e+16" instead of "e16" - Two-digit exponent padding: "e-09" instead of "e-9" - Fixed/scientific threshold: [-4, 15] instead of [-6, 20] - NaN sign is preserved: "-nan" instead of "nan" These are all valid shortest representations that round-trip correctly. Test reference files need to be updated to reflect the new format. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d2e6260 to
2d066c3
Compare
Take master's improved integer fast path (rounding for Float32 exp 25-30 and Float64 exp 54-62) and replace `jkj::dragonbox::to_chars_n` calls with `zmij::detail::write`. Remove dead `result` variable and unused error code. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix. |
|
The failures of "Flaky check" in "functions_bad_arguments" will be fixed by #101994. |
|
Before merging, label as a backward incompatible change. |
Why? The output is identical in all cases as far as I know. The expectation, unless there is a bug, is that the end user won't see any difference |
5404eda to
9b62ba2
Compare
|
This is a really good change IMO: |
Ergus
left a comment
There was a problem hiding this comment.
At the end it was simpler than what I expected
ClickHouse#100650 (comment) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLVM Coverage Report
Changed lines: 100.00% (12/12) | lost baseline coverage: 6 line(s) · Uncovered code |
|
The |
|
The flaky check failure is fixed in #102148, let's update the branch. |
d513a38
Replace the dragonbox library with zmij for converting floating-point numbers to their shortest decimal string representation. Both are based on the Schubfach algorithm, but zmij includes several optimizations (fewer multiplications, SIMD support, fewer branches) that make it 1.5x–3x faster on x86-64:
(Benchmarked with clang -O3 -march=native, best of 5 iterations, 10M values per distribution)
zmij is by Victor Zverovich (author of
{fmt}), MIT-licensed, zero dependencies.Adjusted
The output format from zmij differs from our dragonbox version in several ways — all valid shortest representations that round-trip correctly:
e16e+16e-9e-09[-6, 20][-4, 15]nan-nanIt was adjusted in this commit
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Replace dragonbox with zmij for 1.5x-3x faster float-to-string conversion.
Documentation entry for user-facing changes