
Replace dragonbox with zmij for float-to-string conversion #100650

Merged
Algunenano merged 11 commits into ClickHouse:master from Algunenano:zmij-float-to-string on Apr 10, 2026


@Algunenano
Member

@Algunenano Algunenano commented Mar 24, 2026

Replace the dragonbox library with zmij for converting floating-point numbers to their shortest decimal string representation. Both are based on the Schubfach algorithm, but zmij includes several optimizations (fewer multiplications, SIMD support, fewer branches) that make it 1.5x–3x faster on x86-64:

Test                                        Dragonbox         zmij    Speedup
Float64 uniform [0, 1e6)                      26.23 ns       9.07 ns      2.89x
Float64 small [0, 1)                          25.60 ns      11.17 ns      2.29x
Float64 scientific [1e-100, 1e100)            20.17 ns      13.77 ns      1.46x
Float64 integers (as double)                  22.19 ns       8.94 ns      2.48x
Float32 uniform [0, 1e6)                      18.78 ns      10.37 ns      1.81x

(Benchmarked with clang -O3 -march=native, best of 5 iterations, 10M values per distribution)
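"Shortest decimal string representation" means the fewest significant digits that still parse back to the identical double. The sketch below is only a slow brute-force oracle, added here for illustration and not part of the PR: the helper name `shortestRoundTrip` is made up, and it simply tries increasing `%.*g` precision until `strtod` returns the same value. Schubfach-based libraries such as dragonbox and zmij compute these digits directly instead of searching, which is where their speed comes from.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Slow brute-force oracle (hypothetical helper, not the PR's code and not how
// Schubfach-based libraries work): try 1..17 significant digits with "%.*g"
// and return the first string that strtod parses back to exactly x.
// At powers of two it can occasionally return one digit more than the true
// shortest, because "%.*g" always picks the nearest p-digit candidate.
std::string shortestRoundTrip(double x)
{
    char buf[64];
    for (int precision = 1; precision <= 17; ++precision)
    {
        std::snprintf(buf, sizeof(buf), "%.*g", precision, x);
        if (std::strtod(buf, nullptr) == x)
            return buf;
    }
    return buf; // only reached for NaN, which never compares equal to itself
}
```

Because the C `%g` conversion always writes a signed exponent with at least two digits, this oracle produces `1e+16` and `1e-09`, the same exponent style as zmij's raw output.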

zmij is by Victor Zverovich (author of {fmt}), MIT-licensed, zero dependencies.

Adjusted output format

The output format from zmij differs from our dragonbox version in several ways, although all of them are valid shortest representations that round-trip correctly:

Difference                      dragonbox    zmij
Positive exponent sign          e16          e+16
Exponent zero-padding           e-9          e-09
Fixed/scientific threshold      [-6, 20]     [-4, 15]
NaN sign                        nan          -nan
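The claim that both columns round-trip can be checked mechanically: a correctly rounded parser must map both spellings in each row to the same double. A minimal sketch using `strtod` (the helper name `sameDouble` is invented for this example):

```cpp
#include <cmath>
#include <cstdlib>

// Hypothetical helper: parse both strings with strtod and report whether they
// denote the same double. NaN never compares equal to itself, so two NaNs are
// treated as "same" regardless of their sign bit.
bool sameDouble(const char * a, const char * b)
{
    const double x = std::strtod(a, nullptr);
    const double y = std::strtod(b, nullptr);
    if (std::isnan(x) || std::isnan(y))
        return std::isnan(x) && std::isnan(y);
    return x == y;
}
```

The NaN check deliberately ignores the sign; `std::signbit` would still tell `nan` and `-nan` apart, which is why that row is a visible difference even though no numeric value changes.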

zmij's output was adjusted to match the previous format in this commit.
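One way to picture the exponent part of that adjustment is a post-processing step that rewrites zmij-style exponents into the old style. This is an illustrative sketch under that assumption only: the commit may instead change zmij's writer directly, and `toLegacyExponent` is a hypothetical name. It does not touch the NaN sign or the fixed/scientific threshold.

```cpp
#include <cstddef>
#include <string>

// Hypothetical post-processing step (not the PR's code): rewrite a
// zmij-style exponent ("e+16", "e-09") into the older style ("e16", "e-9").
std::string toLegacyExponent(const std::string & s)
{
    const std::size_t e = s.find('e');
    if (e == std::string::npos)
        return s; // fixed notation, "nan", "inf": nothing to rewrite

    std::string exponent = s.substr(e + 1);
    std::string sign;
    if (!exponent.empty() && (exponent[0] == '+' || exponent[0] == '-'))
    {
        if (exponent[0] == '-')
            sign = "-"; // keep a negative sign, drop a positive one
        exponent.erase(0, 1);
    }

    // Drop zero-padding but keep at least one digit.
    const std::size_t first = exponent.find_first_not_of('0');
    exponent = (first == std::string::npos) ? "0" : exponent.substr(first);

    return s.substr(0, e) + "e" + sign + exponent;
}
```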

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Replace dragonbox with zmij for 1.5x-3x faster float-to-string conversion.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@Algunenano Algunenano added the ci-performance performance only label Mar 24, 2026
@Algunenano Algunenano self-assigned this Mar 24, 2026
@clickhouse-gh
Contributor

clickhouse-gh Bot commented Mar 24, 2026

Workflow [PR], commit [e0e5d09]

Summary:

job_name / test_name                       status    info           comment
Stateless tests (amd_msan, flaky check)    failure
  02835_drop_user_during_session (x5)      FAIL      cidb           IGNORED
Stress test (arm_release)                  failure
  Server died                              FAIL      cidb, issue    ISSUE EXISTS
  Logical error: Can't adjust last granule because it has A rows, but try to subtract B rows (num_read_rows = C, total_rows_per_granule = D, rows_per_granule = [E], debug: max_rows=F, rows_from_read=G, rows_from_finalize_loop=H, rows_from_finalize_post=I, ranges_processed=J, skipped_marks=K, use_query_condition_cache=L, can_read_incomplete_granules=M) (STID: 4089-428f)
                                           FAIL      cidb           IGNORED

AI Review

Summary

This PR replaces dragonbox with zmij for float-to-string conversion, updates build wiring (submodule, CMake, fast-test submodule list), removes dragonbox-specific test/example code, and adds a dedicated stateless formatting test. I did not find additional correctness, safety, performance, or ClickHouse-rule violations in the current PR head beyond issues already raised earlier in inline discussion.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added pr-performance Pull request with some performance improvements submodule changed At least one submodule changed in this PR. labels Mar 24, 2026
@Ergus
Member

Ergus commented Mar 24, 2026

I like this change... ;) ping me when ready ;p

@Algunenano
Member Author

Perf analysis:

● Summary

  AMD (x86)

  ┌───────────┬───────────┬──────────────────────────────┬────────────────────────────────────────────────────────┐
  │ Magnitude │ Direction │             Test             │                        Verdict                         │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 2.18x     │ faster    │ float_formatting #26         │ New — improvement (0 slower/faster in 209 master runs) │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.94x     │ faster    │ float_formatting #30         │ New — improvement (0/225)                              │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.89x     │ faster    │ float_formatting #23         │ New — improvement (0/229)                              │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.59x     │ faster    │ float_formatting #28         │ New — improvement (0/228)                              │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.48x     │ faster    │ apply_patch_parts_join #6    │ Flaky (9 slower + 109 unstable / 714)                  │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.48x     │ faster    │ apply_patch_parts_join #4    │ Flaky (9 slower + 104 unstable / 714)                  │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.45x     │ faster    │ float_formatting #10         │ New — improvement (0/237)                              │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.38x     │ faster    │ float_parsing #2             │ New — improvement (0/713)                              │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.28x     │ faster    │ float_parsing #4             │ New — improvement (0/713)                              │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.28x     │ faster    │ float_parsing #3             │ New — improvement (0/713)                              │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.81x     │ slower    │ uniq #49                     │ Rarely on master (1/106)                               │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.43x     │ slower    │ reading_from_file #0         │ Flaky (13 slower + 351 unstable / 713)                 │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.41x     │ slower    │ array_fill #2                │ Flaky (3 slower + 17 unstable / 713)                   │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.21x     │ slower    │ destroy_aggregate_states #1  │ Flaky (8 slower + 10 unstable / 713)                   │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.20x     │ slower    │ formats_columns_sampling #9  │ Rarely on master (2/651)                               │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.19x     │ faster    │ hashed_dictionary_sharded #3 │ Flaky? (0 slower but unrelated to PR)                  │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.17x     │ slower    │ prefetch_in_aggregation #10  │ Rarely on master (0/509) — investigate                 │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.14x     │ slower    │ set_index #1                 │ Rarely on master (0/715) — investigate                 │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.14x     │ faster    │ function_tokens #0           │ Flaky (14 slower / 715)                                │
  ├───────────┼───────────┼──────────────────────────────┼────────────────────────────────────────────────────────┤
  │ 1.11x     │ slower    │ formats_columns_nullable #0  │ Too few runs (0/76)                                    │
  └───────────┴───────────┴──────────────────────────────┴────────────────────────────────────────────────────────┘

  ARM (Graviton)

  ┌───────────┬───────────┬──────────────────────────────────────┬──────────────────────────────────────────────────┐
  │ Magnitude │ Direction │                 Test                 │                     Verdict                      │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 2.34x     │ faster    │ float_formatting #24                 │ New — improvement (3 slower / 465, never faster) │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 2.24x     │ faster    │ destroy_aggregate_states #0          │ Flaky (28 slower + 485 unstable / 1420)          │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 2.23x     │ faster    │ float_formatting #23                 │ New — improvement (5/473)                        │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.98x     │ faster    │ float_formatting #8                  │ New — improvement (3/454)                        │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.79x     │ faster    │ float_formatting #6                  │ New — improvement (2/454)                        │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.74x     │ faster    │ float_formatting #29                 │ New — improvement (2/453)                        │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.60x     │ faster    │ float_formatting #27                 │ New — improvement (1/446)                        │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.45x     │ faster    │ float_formatting #9                  │ New — improvement (2/465)                        │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.36x     │ faster    │ set_index #3                         │ Flaky (163 slower / 1419)                        │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.26x     │ faster    │ float_parsing #2                     │ New — improvement (0/1419)                       │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.24x     │ faster    │ float_parsing #4                     │ New — improvement (0/1419)                       │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.23x     │ faster    │ float_parsing #3                     │ New — improvement (0/1419)                       │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.24x     │ slower    │ destroy_aggregate_states #1          │ Flaky (15 slower + 46 unstable / 1420)           │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.18x     │ faster    │ vectorize_aggregation_combinators #0 │ Flaky (99 slower / 1419)                         │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.12x     │ slower    │ ascii #10                            │ Flaky (10/1007)                                  │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.11x     │ slower    │ window_functions_downstream          │ Flaky (7 slower + 16 unstable / 1420)            │
  ├───────────┼───────────┼──────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ 1.11x     │ slower    │ function_tokens #0                   │ Flaky (30/1419)                                  │
  └───────────┴───────────┴──────────────────────────────────────┴──────────────────────────────────────────────────┘

  Verdict

  Genuine improvements: 8 on AMD, 10 on ARM. All float_formatting and float_parsing speedups are real: none of these tests has been seen as faster on master. The float_formatting improvements reach 2.3x on ARM and 2.2x on AMD, directly from replacing dragonbox with zmij.

  Genuine regressions: 0 confirmed. uniq #49 (1.81x slower on AMD) is borderline: it has only 106 master runs, 1 of them slower. prefetch_in_aggregation #10 and set_index #1 are 1.14-1.17x slower with no slower runs in master history; that is small enough to be noise but worth monitoring.

  Dismissed as flaky: 10 entries across both architectures.

So I think it's genuinely worth exploring this and investigating how complex it is to adjust zmij to match our current output.

@Algunenano
Member Author

@Ergus If you like this kind of change, you might also like #100649 and #100671

Algunenano and others added 2 commits April 1, 2026 15:22
Replace the dragonbox library with zmij (https://github.com/vitaut/zmij)
for converting floating-point numbers to their shortest decimal string
representation. zmij is 1.5x-3x faster than dragonbox on x86-64.

The output format differs from dragonbox in several ways:
- Positive exponents include a '+' sign: "e+16" instead of "e16"
- Two-digit exponent padding: "e-09" instead of "e-9"
- Fixed/scientific threshold: [-4, 15] instead of [-6, 20]
- NaN sign is preserved: "-nan" instead of "nan"

These are all valid shortest representations that round-trip correctly.
Test reference files need to be updated to reflect the new format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
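The threshold difference in the commit message above is the least obvious of the four. Assuming it denotes the range of the decimal exponent `e` (value written as d.ddd x 10^e) for which fixed notation is used, the hypothetical `render` helper below shows how the same shortest digits come out under each range. It writes exponents in the old dragonbox style so that only the threshold effect is visible; it is not ClickHouse's formatting code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not ClickHouse code. `digits` holds the shortest
// significant digits with no leading or trailing zeros, and the value is
// d1.d2d3... x 10^exp10. Fixed notation is used when lo <= exp10 <= hi;
// otherwise scientific notation with a dragonbox-style exponent ("e16").
std::string render(const std::string & digits, int exp10, int lo, int hi)
{
    if (exp10 < lo || exp10 > hi)
    {
        std::string out(1, digits[0]);
        if (digits.size() > 1)
            out += "." + digits.substr(1);
        return out + "e" + std::to_string(exp10);
    }
    if (exp10 < 0)
        return "0." + std::string(static_cast<std::size_t>(-exp10 - 1), '0') + digits;

    // Number of digits before the decimal point.
    const std::size_t int_len = static_cast<std::size_t>(exp10) + 1;
    if (digits.size() <= int_len)
        return digits + std::string(int_len - digits.size(), '0');
    return digits.substr(0, int_len) + "." + digits.substr(int_len);
}
```

With `render("1", 16, ...)`, the dragonbox range [-6, 20] gives `10000000000000000`, while the zmij range [-4, 15] gives `1e16`, which is why the threshold had to be adjusted to keep output unchanged.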
@Algunenano Algunenano force-pushed the zmij-float-to-string branch from d2e6260 to 2d066c3 April 1, 2026 14:13
@Algunenano Algunenano removed the ci-performance performance only label Apr 1, 2026
Algunenano and others added 3 commits April 1, 2026 16:24
Take master's improved integer fast path (rounding for Float32 exp 25-30
and Float64 exp 54-62) and replace `jkj::dragonbox::to_chars_n` calls
with `zmij::detail::write`. Remove dead `result` variable and unused
error code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
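The general idea behind an integer fast path like the one mentioned in the commit above can be sketched as follows. This is an assumption about the technique, not ClickHouse's code: the hypothetical `integerFastPath` only covers doubles below 2^53, where plain integer formatting already yields the shortest round-trip digits, and it declines everything else, including the larger exponent ranges that the commit handles with rounding.

```cpp
#include <cmath>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical sketch of an integer fast path (not ClickHouse's code).
// For |x| < 2^53 adjacent doubles are at most 1 apart, so if x is a whole
// number its integer digits are already the shortest round-trip digits.
std::optional<std::string> integerFastPath(double x)
{
    constexpr double two_pow_53 = 9007199254740992.0;

    // The negated comparison also rejects NaN; infinities fail the range check.
    // Zero is left to the general path so that -0.0 keeps its sign.
    if (!(std::fabs(x) < two_pow_53) || x == 0.0 || std::trunc(x) != x)
        return std::nullopt;

    return std::to_string(static_cast<std::int64_t>(x));
}
```

Above 2^53 the spacing between doubles exceeds 1, so the exact integer value can have more digits than the shortest representation; that is the case where rounding becomes necessary.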
@Algunenano
Copy link
Copy Markdown
Member Author

● Here are the results:           
                                                                                                                                           
  AMD (x86-64)                                                                                                                             
   
  ┌───────┬────────┬─────────────────────────────┬───────────────────────────┬────────────────────┐                                        
  │  Mag  │  Dir   │            Test             │ Master slower/total (30d) │      Verdict       │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.93x │ faster │ float_formatting #24        │ 0/189                     │ New — improvement  │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤
  │ 1.88x │ faster │ float_formatting #23        │ 0/220                     │ New — improvement  │                                        
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.62x │ slower │ apply_patch_parts_join #5   │ 17/652 + 104 unstable     │ Flaky              │                                        
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.30x │ faster │ float_formatting #8         │ 0/203                     │ New — improvement  │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.29x │ faster │ float_parsing #3            │ 0/652                     │ New — improvement  │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.28x │ faster │ float_parsing #4            │ 0/652                     │ New — improvement  │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.25x │ faster │ float_parsing #2            │ 0/652                     │ New — improvement  │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.25x │ faster │ polymorphic_parts_s #0      │ 0/653                     │ New — improvement  │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.21x │ faster │ apply_patch_parts_join #7   │ 14/652 + 104 unstable     │ Flaky (but faster) │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.20x │ faster │ index_bulk_filtering #0     │ 6/652 + 15 unstable       │ Flaky (but faster) │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.19x │ faster │ complex_array_creation #0   │ 2/652                     │ Rarely on master   │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.18x │ faster │ formats_columns_sampling #9 │ 2/590                     │ Rarely on master   │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.16x │ faster │ complex_array_creation #1   │ 0/652                     │ New — improvement  │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.14x │ faster │ function_tokens #0          │ 17/653                    │ Flaky (but faster) │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.12x │ faster │ float_formatting #28        │ 0/216                     │ New — improvement  │
  ├───────┼────────┼─────────────────────────────┼───────────────────────────┼────────────────────┤                                        
  │ 1.11x │ slower │ ascii #3                    │ 29/446                    │ Flaky              │
  └───────┴────────┴─────────────────────────────┴───────────────────────────┴────────────────────┘                                        
                               
  ARM (aarch64)                                                                                                                            
                               
  ┌───────┬────────┬───────────────────────────────────────────────┬───────────────────────────┬────────────────────┐
  │  Mag  │  Dir   │                     Test                      │ Master slower/total (30d) │      Verdict       │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤
  │ 2.83x │ slower │ sort_patterns #1                              │ 0/1302 + 117 unstable     │ Unstable           │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤
  │ 2.19x │ slower │ destroy_aggregate_states #0                   │ 16/1305 + 433 unstable    │ Flaky/Unstable     │                      
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 2.11x │ faster │ float_formatting #23                          │ 3/427                     │ New — improvement  │                      
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.84x │ faster │ float_formatting #12                          │ 2/408                     │ New — improvement  │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.44x │ slower │ apply_patch_parts_join #5                     │ 21/1304 + 199 unstable    │ Flaky              │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.44x │ slower │ set_index #1                                  │ 159/1302                  │ Flaky              │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.37x │ faster │ reading_from_file #0                          │ 23/1305 + 565 unstable    │ Flaky/Unstable     │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.32x │ faster │ float_formatting #29                          │ 1/437                     │ New — improvement  │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.30x │ slower │ set_index #3                                  │ 292/1302                  │ Flaky              │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.25x │ slower │ apply_patch_parts_join #7                     │ 18/1304 + 199 unstable    │ Flaky              │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.24x │ slower │ window_functions_downstream_multithreading #4 │ 25/1303                   │ Flaky              │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.23x │ faster │ float_parsing #4                              │ 0/1303                    │ New — improvement  │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.23x │ faster │ join_convert_outer_to_inner #3                │ 19/1303                   │ Flaky (but faster) │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.22x │ faster │ float_parsing #3                              │ 0/1303                    │ New — improvement  │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.22x │ faster │ float_parsing #2                              │ 0/1303                    │ New — improvement  │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.19x │ faster │ storage_join_direct_join #1                   │ 24/1303                   │ Flaky (but faster) │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.18x │ faster │ join_convert_outer_to_inner #1                │ 20/1303                   │ Flaky (but faster) │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.16x │ faster │ small_block_contention #0                     │ 14/1302 + 69 unstable     │ Flaky              │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.16x │ slower │ ascii #0                                      │ 107/950                   │ Flaky              │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.13x │ faster │ re2_regex_caching #5                          │ 12/1302 + 69 unstable     │ Flaky (but faster) │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.13x │ faster │ re2_regex_caching #3                          │ 13/1302 + 72 unstable     │ Flaky (but faster) │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.12x │ faster │ function_tokens #0                            │ 31/1303                   │ Flaky (but faster) │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.11x │ slower │ window_functions_downstream_multithreading #0 │ 7/1303                    │ Flaky              │
  ├───────┼────────┼───────────────────────────────────────────────┼───────────────────────────┼────────────────────┤                      
  │ 1.10x │ slower │ insert_values_with_expressions #1             │ 87/1303                   │ Flaky              │
  └───────┴────────┴───────────────────────────────────────────────┴───────────────────────────┴────────────────────┘                      
                               
  Summary                                                                                                                                  
   
  Genuine regressions: 0 on both architectures. Every "slower" entry is a known flaky test on master.                                      
                               
  Genuine improvements:                                                                                                                    
  - AMD: 9 new improvements — float_formatting #24 (1.93x), #23 (1.88x), #8 (1.30x), #28 (1.12x); float_parsing #2-4 (1.25-1.29x);
  complex_array_creation #1 (1.16x); polymorphic_parts_s #0 (1.25x)                                                                        
  - ARM: 6 new improvements — float_formatting #23 (2.11x), #12 (1.84x), #29 (1.32x); float_parsing #2-4 (1.22-1.23x)
                                                                                                                                           
  The float_parsing speedups are an indirect effect — shorter zmij output means less work for the parser when round-tripping. The          
  polymorphic_parts_s and complex_array_creation improvements are likely from the same cause (faster toString in INSERT paths).            
                                                                                                                                           
                               
❯ > The polymorphic_parts_s and complex_array_creation improvements are likely from the same cause (faster toString in INSERT paths). <<< Please check the queries, they don't use toString
                                                                                                                                           
● You're right, let me look at those queries:                                                                                              
                                                                                                                                         
  - polymorphic_parts_s #0: INSERT INTO hits_wide(UserID) SELECT rand() FROM numbers(50)                                                   
  - complex_array_creation #1: SELECT count() FROM zeros(10000000) WHERE NOT ignore([[], [zero]])                                          
                                                                                                                                           
  Neither involves float formatting at all. These are 0 slower on master out of 652-653 runs, which makes them look like real changes, but they have nothing to do with our PR. They must be improvements from other code that landed in master between the base commit and the test run. Since our PR is rebased on an older master, the "new" binary includes recent master changes too.

@Algunenano Algunenano requested a review from Ergus April 2, 2026 10:43
@Algunenano Algunenano assigned Ergus and unassigned Algunenano Apr 2, 2026
@Algunenano Algunenano marked this pull request as ready for review April 2, 2026 10:43
@alexey-milovidov
Member

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

Comment thread tests/queries/0_stateless/04075_float_to_string_format.sql Outdated
@Algunenano Algunenano requested review from Ergus and removed request for Ergus April 7, 2026 18:25
@alexey-milovidov
Member

The failures of "Flaky check" in "functions_bad_arguments" will be fixed by #101994.

Comment thread tests/queries/0_stateless/04075_float_to_string_format.sql Outdated
@alexey-milovidov
Member

Before merging, label as a backward incompatible change.

@Algunenano
Member Author

> Before merging, label as a backward incompatible change.

Why? The output is identical in all cases as far as I know. The expectation, unless there is a bug, is that the end user won't see any difference.

@Algunenano Algunenano force-pushed the zmij-float-to-string branch from 5404eda to 9b62ba2 April 8, 2026 10:55
@Algunenano
Member Author

This is a really good change IMO:

  ┌──────────────────────┬─────────────────────────────────────┬────────────┬────────┬───────┬───────┐
  │         Test         │                Query                │ Arch/Shard │ Change │  Old  │  New  │
  ├──────────────────────┼─────────────────────────────────────┼────────────┼────────┼───────┼───────┤
  │ float_formatting #24 │ toString(0xFFFFFFFF / rand64())     │ amd/6      │ -47.9% │ 414ms │ 216ms │
  ├──────────────────────┼─────────────────────────────────────┼────────────┼────────┼───────┼───────┤
  │ float_formatting #30 │ toString(reinterpretAsFloat64(...)) │ amd/6      │ -46.4% │ 486ms │ 261ms │
  ├──────────────────────┼─────────────────────────────────────┼────────────┼────────┼───────┼───────┤
  │ float_formatting #29 │ toString(reinterpretAsFloat64(...)) │ amd/6      │ -29.1% │ 484ms │ 343ms │
  ├──────────────────────┼─────────────────────────────────────┼────────────┼────────┼───────┼───────┤
  │ float_formatting #11 │ toString(number / 7)                │ amd/6      │ -32.3% │ 291ms │ 197ms │
  ├──────────────────────┼─────────────────────────────────────┼────────────┼────────┼───────┼───────┤
  │ float_formatting #10 │ toString(number / 3)                │ amd/6      │ -27.9% │ 239ms │ 172ms │
  ├──────────────────────┼─────────────────────────────────────┼────────────┼────────┼───────┼───────┤
  │ float_formatting #8  │ toString(toFloat64(number))         │ amd/6      │ -22.7% │ 280ms │ 216ms │
  ├──────────────────────┼─────────────────────────────────────┼────────────┼────────┼───────┼───────┤
  │ float_parsing #3     │ toFloat64(toString(rand()...))      │ amd/3      │ -22.4% │ 101ms │ 79ms  │
  ├──────────────────────┼─────────────────────────────────────┼────────────┼────────┼───────┼───────┤
  │ float_parsing #4     │ toFloat64(toString(0xFFFF...))      │ amd/3      │ -22.4% │ 101ms │ 79ms  │
  ├──────────────────────┼─────────────────────────────────────┼────────────┼────────┼───────┼───────┤
  │ float_parsing #2     │ toFloat64(toString(1/rand()...))    │ amd/3      │ -20.7% │ 83ms  │ 66ms  │
  ├──────────────────────┼─────────────────────────────────────┼────────────┼────────┼───────┼───────┤
  │ float_parsing #4     │ toFloat64(toString(0xFFFF...))      │ arm/3      │ -18.8% │ 135ms │ 109ms │
  ├──────────────────────┼─────────────────────────────────────┼────────────┼────────┼───────┼───────┤
  │ float_parsing #2     │ toFloat64(toString(1/rand()...))    │ arm/3      │ -18.2% │ 117ms │ 95ms  │
  ├──────────────────────┼─────────────────────────────────────┼────────────┼────────┼───────┼───────┤
  │ float_parsing #3     │ toFloat64(toString(rand()...))      │ arm/3      │ -17.9% │ 140ms │ 115ms │
  └──────────────────────┴─────────────────────────────────────┴────────────┴────────┴───────┴───────┘

  All improvements, no real regressions. The biggest wins are on amd64 float formatting (~1.5-1.9x), with ~1.2x gains on arm for the float_parsing queries (which include a toString step that benefits from zmij).
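The "Change" column and the "~1.5-1.9x" / "~1.2x" figures are two views of the same timings: change is (new - old) / old, and speedup is old / new. The sketch below recomputes both from the Old/New columns; the helpers `change_percent`, `speedup` and `speedup_range` are hypothetical and not part of ClickHouse's performance-test tooling. Recomputing from the rounded millisecond values can differ from the table by a fraction of a percentage point, presumably because the report works from unrounded timings (for instance 414 ms to 216 ms gives -47.8%, not -47.9%).

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// One row of the comparison table: old and new query time in milliseconds.
struct Row {
    double old_ms;
    double new_ms;
};

// Relative change in percent, as in the "Change" column (negative = faster).
double change_percent(double old_ms, double new_ms) {
    assert(old_ms > 0);
    return (new_ms - old_ms) / old_ms * 100.0;
}

// Speedup factor: how many times faster the new binary is.
double speedup(double old_ms, double new_ms) {
    assert(new_ms > 0);
    return old_ms / new_ms;
}

// Smallest and largest speedup across a non-empty set of rows.
std::pair<double, double> speedup_range(const std::vector<Row>& rows) {
    assert(!rows.empty());
    double lo = speedup(rows.front().old_ms, rows.front().new_ms);
    double hi = lo;
    for (const Row& r : rows) {
        const double s = speedup(r.old_ms, r.new_ms);
        lo = std::min(lo, s);
        hi = std::max(hi, s);
    }
    return {lo, hi};
}
```

For the three arm `float_parsing` rows (135/109, 117/95 and 140/115 ms), `speedup_range` returns roughly 1.22x to 1.24x, in line with the "~1.2x" quoted above.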

Member

@Ergus Ergus left a comment


In the end it was simpler than I expected.

Comment thread tests/queries/0_stateless/04075_float_to_string_format.sql
ClickHouse#100650 (comment)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread src/CMakeLists.txt
@clickhouse-gh
Contributor

clickhouse-gh Bot commented Apr 9, 2026

LLVM Coverage Report

Metric       Baseline   Current   Δ
Lines        84.00%     84.00%    +0.00%
Functions    90.90%     90.90%    +0.00%
Branches     76.50%     76.50%    +0.00%

Changed lines: 100.00% (12/12) | lost baseline coverage: 6 line(s)


@alexey-milovidov
Member

The "Can't adjust last granule" error in CI is a known issue. The fix is in #101641.

@alexey-milovidov
Member

The flaky check failure is fixed in #102148, let's update the branch.

@Algunenano Algunenano added this pull request to the merge queue Apr 10, 2026
Merged via the queue into ClickHouse:master with commit d513a38 Apr 10, 2026
160 of 163 checks passed
@Algunenano Algunenano deleted the zmij-float-to-string branch April 10, 2026 11:12
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 10, 2026

Labels

pr-performance: Pull request with some performance improvements
pr-synced-to-cloud: The PR is synced to the cloud repo
submodule changed: At least one submodule changed in this PR.

5 participants