Skip to content

Fix sparse serialization replacing -0.0 with +0.0#100983

Merged
rienath merged 12 commits into
ClickHouse:masterfrom
takumihara:fix-sparse-negative-zero-v2
Apr 25, 2026
Merged

Fix sparse serialization replacing -0.0 with +0.0#100983
rienath merged 12 commits into
ClickHouse:masterfrom
takumihara:fix-sparse-negative-zero-v2

Conversation

@takumihara
Copy link
Copy Markdown
Contributor

Changelog category

Bug Fix

Changelog entry

Fix sparse serialization losing the sign of negative zero (-0.0) for BFloat16, Float32, and Float64 columns.

What this PR does

ColumnVector::isDefaultAt used the == operator to check whether a value is the default (zero). However, IEEE 754 defines -0.0 == +0.0 as true, so -0.0 was incorrectly treated as a default value by sparse serialization. On deserialization, the default slot was filled with +0.0, losing the sign bit.

This PR changes isDefaultAt to use memcmp for floating-point types (float, double, BFloat16) so that -0.0 and +0.0 are distinguished by their bit representation.

Root cause:

// Before
bool isDefaultAt(size_t n) const override { return data[n] == T{}; }
// -0.0 == +0.0 → true (IEEE 754) → treated as default → sign lost

Fix:

// After
bool isDefaultAt(size_t n) const override
{
    if constexpr (std::is_floating_point_v<T> || std::is_same_v<T, BFloat16>)
    {
        T zero{};
        return memcmp(&data[n], &zero, sizeof(T)) == 0;
    }
    else
    {
        return data[n] == T{};
    }
}

Reproducer:

CREATE TABLE t (f Float64) Engine = MergeTree ORDER BY tuple()
    SETTINGS ratio_of_defaults_for_sparse_serialization = 0.01;
INSERT INTO t VALUES (-0.0), (1.0), (2.0);
SELECT hex(reinterpretAsFixedString(f)) FROM t ORDER BY rowNumberInAllBlocks();
-- Before: 0000000000000000 (sign lost)
-- After:  0000000000000080 (sign preserved)

Closes #98637

ColumnVector::isDefaultAt used == operator to check for default values,
but IEEE 754 defines -0.0 == +0.0 as true. This caused sparse
serialization to treat -0.0 as a default value, losing the sign bit
on deserialization.

Use memcmp for floating-point types (float, double, BFloat16) to
perform bitwise comparison, correctly distinguishing -0.0 from +0.0.

Closes ClickHouse#98637
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 28, 2026

Workflow [PR], commit [26ddd57]

Summary:


AI Review

Summary

This PR fixes sparse serialization so -0.0 is no longer treated as default for BFloat16, Float32, and Float64 in ColumnVector::isDefaultAt, preserving the sign bit on round-trip. The change and new stateless coverage look correct, and I did not find blocker/major issues in the diff.

Missing context
  • ⚠️ No CI logs or test run results were available in the review context.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 28, 2026
@tuanpach tuanpach added the can be tested Allows running workflows for external contributors label Mar 28, 2026
Comment thread src/Columns/ColumnVector.h
@tuanpach tuanpach self-assigned this Mar 28, 2026
@takumihara takumihara force-pushed the fix-sparse-negative-zero-v2 branch from 1473882 to 5406aa1 Compare March 29, 2026 06:54
Replace memcmp-based bit comparison with std::bit_cast to an
unsigned integer type of matching size. This makes the intent
clearer and avoids the need for a temporary zero variable.

Co-authored-by: tuanpach <tuanpach@users.noreply.github.com>
@takumihara takumihara force-pushed the fix-sparse-negative-zero-v2 branch from 5406aa1 to ff41e35 Compare March 29, 2026 08:25
@alexey-milovidov
Copy link
Copy Markdown
Member

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

@alexey-milovidov
Copy link
Copy Markdown
Member

The failures of "Flaky check" in "functions_bad_arguments" will be fixed by #101994.

@alexey-milovidov
Copy link
Copy Markdown
Member

The MSan stress test failure (MemorySanitizer: use-of-uninitialized-value, STID 4179-5154 or 4148-3044) is a known pre-existing issue unrelated to this PR. Fix: #102158

@alexey-milovidov
Copy link
Copy Markdown
Member

The flaky check failure is fixed in #102148, let's update the branch.

Comment thread src/Columns/ColumnVector.h Outdated
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 25, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.80% 83.70% -0.10%
Functions 91.10% 91.10% +0.00%
Branches 76.20% 76.20% +0.00%

Changed lines: 100.00% (5/5) · Uncovered code

Full report · Diff report

@rienath rienath self-assigned this Apr 25, 2026
@rienath rienath added this pull request to the merge queue Apr 25, 2026
Merged via the queue into ClickHouse:master with commit ab88ac3 Apr 25, 2026
165 checks passed
@clickgapai
Copy link
Copy Markdown
Contributor

Hi — this PR may need backporting to 26.3 (LTS), 26.2, 26.1, 25.8 (LTS), but no backport label was found.

Affected code: ColumnVector::isDefaultAt — core code present in all supported branches.

Why: The bug is in ColumnVector::isDefaultAt which has existed since ColumnSparse was introduced in 2021 (commit 577d571, 2021-03-12), predating all supported branches. Any user with sparse serialization enabled (ratio_of_defaults_for_sparse_serialization < 1.0) and -0.0 floating-point values experiences silent data corruption (sign bit loss). The fix is a 1-function, low-risk change.

If this should be backported, consider adding pr-must-backport or a version-specific label (e.g. v26.3-must-backport). Ignore this if backporting is not applicable.

@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 25, 2026
@rienath
Copy link
Copy Markdown
Member

rienath commented Apr 26, 2026

@clickgapai it shouldn't. Please suggest this only when backporting is necessary (i.e. critical problem)

@clickgapai
Copy link
Copy Markdown
Contributor

Thank you for the feedback, @rienath. You're absolutely right.

I suggested backporting this fix to stable branches, but that was premature. This is a correctness issue with sparse serialization of floating-point values (the -0.0 vs +0.0 sign bit loss), not a critical crash or data corruption that demands immediate backport.

The fix should land in the next major release. Backporting should only happen if:

  1. Users report actual data loss from this behavior in production, OR
  2. The maintainers explicitly decide it's critical enough to warrant a patch release

I'll be more conservative about backport suggestions going forward — reserve them for genuine emergencies, not just "nice to have" correctness improvements.

Appreciate you keeping me honest on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sparse serialization replaces -0.0 with +0.0 in BFloat16/Float32/Float64 columns

6 participants