Support Interval data types in Arrow Format #99519

Merged
alexey-milovidov merged 20 commits into ClickHouse:master from petern48:interval_arrow_streams
Mar 25, 2026

Conversation

@petern48
Contributor

@petern48 petern48 commented Mar 15, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Users can now write ClickHouse Interval data types to the Arrow format.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Previously, queries that wrote Interval data in the Arrow format failed with the following error:

Error on processing query: Code: 50. DB::Exception: The type 'IntervalNanosecond' of a column 'CAST('3', 'IntervalNanosecond')' is not supported for conversion into Arrow data format. (UNKNOWN_TYPE) (version 26.1.3.52 (official build))
(query: SELECT 3::IntervalNanosecond FORMAT ArrowStream;)

Now, these queries work successfully.

SELECT CAST('3', 'IntervalNanosecond')
FORMAT Arrow

Closes: #97849

@petern48 petern48 marked this pull request as ready for review March 15, 2026 06:29
Comment thread tests/queries/0_stateless/04036_arrow_interval_types.sh
Member

@alexey-milovidov alexey-milovidov left a comment

LGTM

@alexey-milovidov alexey-milovidov added the "can be tested" label (Allows running workflows for external contributors) Mar 16, 2026
@alexey-milovidov alexey-milovidov self-assigned this Mar 16, 2026
@clickhouse-gh
Contributor

clickhouse-gh Bot commented Mar 16, 2026

Workflow [PR], commit [10cdbe9]

AI Review

Summary

This PR adds Arrow support for ClickHouse Interval types in both directions: sub-second intervals (Nanosecond/Microsecond/Millisecond/Second) are exported as Arrow DURATION and imported back as interval types, while larger intervals remain INT64. The implementation and tests are coherent with that design, and I did not find correctness, safety, concurrency, or performance issues requiring changes.
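
The mapping described in this summary can be sketched as a small lookup model. This is only an illustration of the design as stated here, not the ClickHouse implementation: the function and constant names are hypothetical, the Arrow type labels are plain strings, and the time unit paired with each interval kind is an assumption based on its name (the thread itself only confirms `DURATION(NANO)` for `IntervalNanosecond`). Reading a plain INT64 column back as `Int64` is likewise an assumption.

```python
# Illustrative model of the type mapping described in the review summary.
# All names here are hypothetical; none are taken from the ClickHouse source.

# Intervals the summary says are exported as Arrow DURATION.
# The unit chosen for each kind is an assumption based on the interval's name.
DURATION_UNIT_BY_INTERVAL = {
    "IntervalNanosecond": "ns",
    "IntervalMicrosecond": "us",
    "IntervalMillisecond": "ms",
    "IntervalSecond": "s",
}

# Larger intervals, which the summary says remain INT64.
INT64_INTERVALS = frozenset({
    "IntervalMinute", "IntervalHour", "IntervalDay", "IntervalWeek",
    "IntervalMonth", "IntervalQuarter", "IntervalYear",
})


def arrow_type_for(interval_type: str) -> str:
    """Export direction: ClickHouse interval type name -> Arrow type label."""
    unit = DURATION_UNIT_BY_INTERVAL.get(interval_type)
    if unit is not None:
        return f"duration[{unit}]"
    if interval_type in INT64_INTERVALS:
        return "int64"
    raise ValueError(f"not an interval type: {interval_type}")


def clickhouse_type_for(arrow_type: str) -> str:
    """Import direction: Arrow type label -> ClickHouse type name."""
    for interval_type, unit in DURATION_UNIT_BY_INTERVAL.items():
        if arrow_type == f"duration[{unit}]":
            return interval_type
    if arrow_type == "int64":
        # Assumption: a plain INT64 column carries no interval metadata,
        # so it is read back as Int64 rather than as an interval type.
        return "Int64"
    raise ValueError(f"unsupported Arrow type in this sketch: {arrow_type}")
```

In this model, `clickhouse_type_for(arrow_type_for("IntervalNanosecond"))` returns the original interval type, while `IntervalYear` comes back as `Int64`, because Arrow's duration type only has second, millisecond, microsecond, and nanosecond units.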

Missing context
  • ⚠️ Full CI evidence is not yet available (Style check, Build (arm_tidy), Fast test, and Code Review are still in progress), so this review is based on code and tests in the diff.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the "pr-improvement" label (Pull request with some product improvements) Mar 16, 2026
12::IntervalQuarter AS q,
13::IntervalYear AS y
FORMAT $fmt
" | ${CLICKHOUSE_LOCAL} -q "SELECT * FROM file('-', '$fmt')"
Contributor

⚠️ This test only checks round-trip values, so it would still pass if the Arrow schema/type mapping regressed (for example, if IntervalNanosecond stopped being exported as Arrow DURATION(NANO) and became plain INT64).

Could we also assert the inferred types when reading back (e.g. via toTypeName/DESCRIBE on file('-', '$fmt')) so the test guards the new type mapping behavior, not only numeric payload preservation?
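
The gap this comment points at can be illustrated with a toy round trip. The sketch below is not the ClickHouse test harness: the exporters, the `read_back` helper, and the string type labels are all hypothetical stand-ins, and the reader's behavior (INT64 read back as `Int64`) is an assumption. It only shows why a value comparison passes under a regressed mapping while a type assertion does not.

```python
# Toy round trip, for illustration only. Every name below is hypothetical.

def export_correct(value: int) -> tuple[str, int]:
    """Exporter with the intended mapping: IntervalNanosecond -> duration[ns]."""
    return ("duration[ns]", value)


def export_regressed(value: int) -> tuple[str, int]:
    """Exporter with a regressed mapping: IntervalNanosecond -> plain int64."""
    return ("int64", value)


def read_back(column: tuple[str, int]) -> tuple[str, int]:
    """Infer a ClickHouse-style type name from the Arrow type label."""
    arrow_type, value = column
    # Assumption: a plain int64 column is read back as Int64.
    inferred = "IntervalNanosecond" if arrow_type == "duration[ns]" else "Int64"
    return inferred, value


def values_match(exporter, value: int) -> bool:
    """What a value-only round-trip test checks."""
    _, got = read_back(exporter(value))
    return got == value


def types_match(exporter, value: int, expected_type: str) -> bool:
    """What an additional type assertion checks."""
    inferred, _ = read_back(exporter(value))
    return inferred == expected_type
```

Here `values_match` holds for both exporters, but `types_match(..., "IntervalNanosecond")` holds only for `export_correct`, which is the distinction the suggested `toTypeName`/`DESCRIBE` check would capture.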

Contributor Author

This was a really great catch. There was existing code that made it return int64 for all interval types. Just fixed it.

@petern48 petern48 marked this pull request as draft March 17, 2026 15:17
@petern48 petern48 marked this pull request as ready for review March 18, 2026 22:38
@petern48
Contributor Author

@alexey-milovidov I believe the remaining failures are unrelated. Can we move forward regardless, or do you need me to keep retrying until it's all green?

The test_polymorphic_parts failures are occurring in other PRs. The test failures for Integration tests (amd_llvm_coverage, 5/5) are all from timeouts, and are also occurring in other PRs.

@alexey-milovidov
Member

Sorry, can't merge with failed checks.

@alexey-milovidov
Member

But the PR is good and approved, so as soon as we fix them, we can merge.

@petern48 petern48 marked this pull request as draft March 20, 2026 06:32
@clickhouse-gh
Contributor

clickhouse-gh Bot commented Mar 24, 2026

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 84.10% | 84.10% | +0.00% |
| Functions | 24.40% | 24.40% | +0.00% |
| Branches | 76.60% | 76.70% | +0.10% |

PR changed-lines coverage: 92.59% (75/81, 0 noise lines excluded)

@alexey-milovidov alexey-milovidov marked this pull request as ready for review March 25, 2026 00:09
@alexey-milovidov alexey-milovidov merged commit 6d5e51f into ClickHouse:master Mar 25, 2026
152 of 153 checks passed
@alexey-milovidov
Member

Thanks, this is a great change!

@petern48 petern48 deleted the interval_arrow_streams branch March 25, 2026 00:14
@robot-ch-test-poll2 robot-ch-test-poll2 added the "pr-synced-to-cloud" label (The PR is synced to the cloud repo) Mar 25, 2026
@EmeraldShift
Contributor

EmeraldShift commented Apr 8, 2026

Any chance this can be backported to 26.3 LTS? It will be quite sad to not have this in the LTS release for a long time.

Labels

  • can be tested: Allows running workflows for external contributors
  • pr-improvement: Pull request with some product improvements
  • pr-synced-to-cloud: The PR is synced to the cloud repo

Development

Successfully merging this pull request may close these issues.

Support Interval types in format ArrowStream

4 participants