Add Hash output format #84607

Merged
rschu1ze merged 1 commit into ClickHouse:master from rschu1ze:hash-op
Jul 29, 2025

@rschu1ze rschu1ze commented Jul 28, 2025

Resolves #77096

Implements the simplest possible approach:

  • No separate hash value per row (as in SELECT cityHash64(*) FROM (SELECT ...)); instead, a single hash value is calculated over all rows.
  • SipHash128 is hard-coded as the hash algorithm; it is not possible to specify an alternative algorithm via a setting.
  • The input rows are hashed in the order they are produced. To get a stable hash value across executions, users need to ensure the same row order (e.g. with ORDER BY).
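
To illustrate the order dependence, here is a minimal Python sketch. It is not the ClickHouse implementation: `fingerprint` is a hypothetical helper, BLAKE2b with a 16-byte digest stands in for SipHash128 (which the Python standard library does not expose), and the length-prefixed `repr` encoding of values is an assumption made only for this example. The digests it produces will therefore not match what FORMAT Hash returns.

```python
import hashlib


def fingerprint(rows):
    """Feed every value of every row into ONE running hash state.

    Hypothetical helper for illustration only; BLAKE2b (128-bit digest)
    is a stand-in for SipHash128.
    """
    state = hashlib.blake2b(digest_size=16)
    for row in rows:
        for value in row:
            encoded = repr(value).encode("utf-8")
            # Length-prefix each value so adjacent values cannot run together.
            state.update(len(encoded).to_bytes(8, "little"))
            state.update(encoded)
    return state.hexdigest()


rows = [("abc", 42), ("def", 42)]

# Same rows, same order: identical fingerprint.
same_order = fingerprint(rows) == fingerprint(list(rows))

# Same rows, different order: the fingerprint changes.
reordered = fingerprint(rows) == fingerprint(list(reversed(rows)))

# Sorting first (the analogue of ORDER BY) makes the result stable again.
stable_after_sort = fingerprint(sorted(rows)) == fingerprint(sorted(reversed(rows)))

print(same_order, reordered, stable_after_sort)  # prints: True False True
```

The sort step plays the role that ORDER BY plays in the query: it fixes the order in which rows reach the hash state.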

This should still cover 99.9% of all use cases.

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added Hash as a new output format. It calculates a single hash value for all columns and rows of the result. This is useful for calculating a "fingerprint" of the result, for example, in use cases where data transfer is a bottleneck. Example: SELECT arrayJoin(['abc', 'def']), 42 FORMAT Hash returns e5f9e676db098fdb9530d2059d8c23ef.
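
The "data transfer is a bottleneck" use case can be sketched in Python as follows. This is only an illustration under stated assumptions: `result_fingerprint` is a hypothetical helper, the two row lists simulate results held on two different servers, JSON is an arbitrary serialization chosen for the example, and BLAKE2b with a 16-byte digest again stands in for SipHash128.

```python
import hashlib
import json


def result_fingerprint(rows):
    """Hash a whole result set into one 128-bit hex digest (stand-in hash)."""
    state = hashlib.blake2b(digest_size=16)
    for row in rows:
        state.update(json.dumps(row).encode("utf-8") + b"\n")
    return state.hexdigest()


# Simulated results on two servers; the replica differs in a single value.
source_rows = [[i, f"user_{i}"] for i in range(10_000)]
replica_rows = [list(row) for row in source_rows]
replica_rows[5000][1] = "corrupted"

# Comparing the results directly would mean shipping the full payload ...
full_payload_bytes = len(json.dumps(source_rows).encode("utf-8"))
# ... whereas comparing fingerprints only needs 32 hex characters per side.
fingerprint_bytes = len(result_fingerprint(source_rows))

matches = result_fingerprint(source_rows) == result_fingerprint(replica_rows)
print(fingerprint_bytes, full_payload_bytes > fingerprint_bytes, matches)
# prints: 32 True False
```

A mismatch only says that the two results differ somewhere; locating the differing rows still requires a closer look at the data.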

clickhouse-gh bot commented Jul 28, 2025

Workflow [PR], commit [41d81c3]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (amd_asan, distributed plan, sequential) | | failure | | |
| | 00002_log_and_exception_messages_formatting | FAIL | | |
| Stateless tests (amd_msan, parallel, 2/2) | | failure | | |
| | 03401_normal_projection_with_part_offset | FAIL | | |

@clickhouse-gh clickhouse-gh bot added the pr-feature Pull request with new product feature label Jul 28, 2025
@alexey-milovidov

This comment was marked as resolved.

@alexey-milovidov alexey-milovidov self-assigned this Jul 28, 2025

@alexey-milovidov alexey-milovidov left a comment

LGTM!

@rschu1ze
Member Author

Stateless tests (amd_asan, distributed plan, sequential):

Stateless tests (amd_msan, parallel, 2/2):

  • 03401_normal_projection_with_part_offset timed out. This has happened only twice in 2025, and I am reluctant to exclude the test from msan just because of that...

@rschu1ze rschu1ze added this pull request to the merge queue Jul 29, 2025
Merged via the queue into ClickHouse:master with commit c4f8b07 Jul 29, 2025
123 of 125 checks passed
@rschu1ze rschu1ze deleted the hash-op branch July 29, 2025 09:34
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jul 29, 2025
Labels

  • pr-feature: Pull request with new product feature
  • pr-synced-to-cloud: The PR is synced to the cloud repo

Development

Successfully merging this pull request may close these issues.

Introduce output FORMAT Hash

3 participants