Fix sparse serialization replacing -0.0 with +0.0 #100983
ColumnVector::isDefaultAt used == operator to check for default values, but IEEE 754 defines -0.0 == +0.0 as true. This caused sparse serialization to treat -0.0 as a default value, losing the sign bit on deserialization. Use memcmp for floating-point types (float, double, BFloat16) to perform bitwise comparison, correctly distinguishing -0.0 from +0.0. Closes ClickHouse#98637
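The difference between the two comparisons in the commit message above can be sketched in isolation. This is a minimal illustration, not the ClickHouse code: `isDefaultByValue` and `isDefaultByBits` are hypothetical free functions standing in for the old and new behaviour of the default check.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-alone helpers; not ClickHouse's actual ColumnVector code.

// Value comparison: IEEE 754 defines -0.0 == +0.0 as true,
// so both zeros are reported as the default.
template <typename T>
bool isDefaultByValue(T value)
{
    return value == T{};
}

// Bitwise comparison: only the all-zero bit pattern (+0.0) is the default.
// -0.0 has its sign bit set, so memcmp reports a difference.
template <typename T>
bool isDefaultByBits(T value)
{
    const T zero{};
    return std::memcmp(&value, &zero, sizeof(T)) == 0;
}

// Small self-check showing where the two predicates disagree.
inline void checkZeroHandling()
{
    assert(isDefaultByValue(-0.0));   // sign is ignored
    assert(isDefaultByBits(+0.0));    // +0.0 is still the default
    assert(!isDefaultByBits(-0.0));   // -0.0 is kept as a real value
}
```

The two predicates agree on every input except negative zero, which is the only case the fix needs to change.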
Workflow [PR], commit [26ddd57] Summary: ✅
AI Review
Summary
This PR fixes sparse serialization so
Missing context
ClickHouse Rules
Final Verdict
1473882 to 5406aa1
Replace memcmp-based bit comparison with std::bit_cast to an unsigned integer type of matching size. This makes the intent clearer and avoids the need for a temporary zero variable. Co-authored-by: tuanpach <tuanpach@users.noreply.github.com>
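A rough sketch of the `std::bit_cast` variant described in this commit message might look as follows. The names `UIntOfSize`, `bitsOf`, and `isDefaultByBits` are hypothetical and chosen for illustration; the actual change in the PR may be structured differently. `std::bit_cast` requires C++20, so the sketch falls back to `std::memcpy` on older standards purely so that it still compiles there.

```cpp
#include <bit>      // std::bit_cast (C++20)
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical trait: unsigned integer type with a given size in bytes.
// A 2-byte type such as BFloat16 would map to std::uint16_t.
template <std::size_t N> struct UIntOfSize;
template <> struct UIntOfSize<2> { using type = std::uint16_t; };
template <> struct UIntOfSize<4> { using type = std::uint32_t; };
template <> struct UIntOfSize<8> { using type = std::uint64_t; };

// Reinterpret the object representation of `value` as type To.
template <typename To, typename From>
To bitsOf(const From & value)
{
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<To>(value);
#else
    To out;                                  // pre-C++20 fallback
    std::memcpy(&out, &value, sizeof(To));
    return out;
#endif
}

// Default means "all bits zero"; no temporary zero variable is needed.
template <typename T>
bool isDefaultByBits(T value)
{
    using Bits = typename UIntOfSize<sizeof(T)>::type;
    return bitsOf<Bits>(value) == Bits{0};
}

inline void checkBitCastVariant()
{
    assert(isDefaultByBits(0.0));    // +0.0: all bits zero
    assert(!isDefaultByBits(-0.0));  // -0.0: sign bit set
}
```

Comparing against an integer zero states the intent ("all bits are zero") directly, which is the readability gain the commit message mentions.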
5406aa1 to ff41e35
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

The failures of "Flaky check" in "functions_bad_arguments" will be fixed by #101994.

The MSan stress test failure (MemorySanitizer: use-of-uninitialized-value, STID 4179-5154 or 4148-3044) is a known pre-existing issue unrelated to this PR. Fix: #102158

The flaky check failure is fixed in #102148, let's update the branch.
LLVM Coverage Report
Changed lines: 100.00% (5/5)
Hi — this PR may need backporting to Affected code: Why: The bug is in If this should be backported, consider adding |
|
@clickgapai it shouldn't. Please suggest this only when backporting is necessary (i.e. critical problem)
Thank you for the feedback, @rienath. You're absolutely right. I suggested backporting this fix to stable branches, but that was premature. This is a correctness issue with sparse serialization of floating-point values (the The fix should land in the next major release. Backporting should only happen if:
I'll be more conservative about backport suggestions going forward — reserve them for genuine emergencies, not just "nice to have" correctness improvements. Appreciate you keeping me honest on this. |
Changelog category
Bug Fix
Changelog entry
Fix sparse serialization losing the sign of negative zero (-0.0) for BFloat16, Float32, and Float64 columns.
What this PR does
ColumnVector::isDefaultAt used the == operator to check whether a value is the default (zero). However, IEEE 754 defines -0.0 == +0.0 as true, so -0.0 was incorrectly treated as a default value by sparse serialization. On deserialization, the default slot was filled with +0.0, losing the sign bit.

This PR changes isDefaultAt to use memcmp for floating-point types (float, double, BFloat16) so that -0.0 and +0.0 are distinguished by their bit representation.

Root cause:
Fix:
Reproducer:
Closes #98637
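
For readers who want to see the failure mode end to end, the sign loss described under "What this PR does" can be imitated with a toy sparse format. This is not the PR's reproducer and not ClickHouse's serialization code: `ToySparse`, `toSparse`, and `fromSparse` are hypothetical names, and the format (offsets plus values of non-default elements) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy sparse form (assumed for illustration): positions and values of
// non-default elements; every other slot is implied to be the default.
struct ToySparse
{
    std::size_t size = 0;
    std::vector<std::size_t> offsets;
    std::vector<double> values;
};

// Old behaviour: value equality, so -0.0 counts as default.
bool isDefaultByValue(double x) { return x == 0.0; }

// New behaviour: bitwise equality, so only +0.0 counts as default.
bool isDefaultByBits(double x)
{
    const double zero = 0.0;
    return std::memcmp(&x, &zero, sizeof(double)) == 0;
}

// Keep only the elements that the predicate does not consider default.
template <typename IsDefault>
ToySparse toSparse(const std::vector<double> & column, IsDefault is_default)
{
    ToySparse out;
    out.size = column.size();
    for (std::size_t i = 0; i < column.size(); ++i)
    {
        if (!is_default(column[i]))
        {
            out.offsets.push_back(i);
            out.values.push_back(column[i]);
        }
    }
    return out;
}

// Rebuild the dense column; skipped slots are filled with +0.0.
std::vector<double> fromSparse(const ToySparse & sparse)
{
    std::vector<double> column(sparse.size, 0.0);
    for (std::size_t i = 0; i < sparse.offsets.size(); ++i)
        column[sparse.offsets[i]] = sparse.values[i];
    return column;
}

inline void roundTripExample()
{
    const std::vector<double> column{1.5, -0.0, 0.0};

    // Value-based check drops -0.0, so it comes back as +0.0.
    const auto lossy = fromSparse(toSparse(column, isDefaultByValue));
    assert(!std::signbit(lossy[1]));

    // Bitwise check stores -0.0 explicitly, so the sign survives.
    const auto exact = fromSparse(toSparse(column, isDefaultByBits));
    assert(std::signbit(exact[1]));
}
```

With the value-based predicate the round trip stores one element; with the bitwise predicate it stores two, which is the small space cost of keeping the sign bit.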