Support Nullable(Tuple) for Arrow, ArrowStream, ORC, legacy Parquet formats #101272

Merged
nihalzp merged 38 commits into ClickHouse:master from nihalzp:support-arrow-orc-nullable-tuple
Apr 15, 2026

@nihalzp (Member) commented Mar 30, 2026

The following queries now work.

-- Setup
SET allow_experimental_nullable_tuple_type = 1;
SET engine_file_truncate_on_insert = 1;

Arrow

INSERT INTO TABLE FUNCTION file('test.arrow', 'Arrow', 'c0 Nullable(Tuple(UInt32, String))') VALUES ((1, 'a')), (NULL), ((3, 'c'));
SELECT c0 FROM file('test.arrow', 'Arrow', 'c0 Nullable(Tuple(UInt32, String))');

ArrowStream

INSERT INTO TABLE FUNCTION file('test.arrowstream', 'ArrowStream', 'c0 Nullable(Tuple(UInt32, String))') VALUES ((1, 'a')), (NULL), ((3, 'c'));
SELECT c0 FROM file('test.arrowstream', 'ArrowStream', 'c0 Nullable(Tuple(UInt32, String))');

ORC

INSERT INTO TABLE FUNCTION file('test.orc', 'ORC', 'c0 Nullable(Tuple(UInt32, String))') VALUES ((1, 'a')), (NULL), ((3, 'c'));
SELECT c0 FROM file('test.orc', 'ORC', 'c0 Nullable(Tuple(UInt32, String))');

Legacy Parquet (Arrow-based reader):

INSERT INTO TABLE FUNCTION file('test.parquet', 'Parquet', 'c0 Nullable(Tuple(UInt32, String))') VALUES ((1, 'a')), (NULL), ((3, 'c'));
SELECT c0 FROM file('test.parquet', 'Parquet', 'c0 Nullable(Tuple(UInt32, String))') SETTINGS input_format_parquet_use_native_reader_v3 = 0;
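
The examples above use tuples whose elements are non-nullable. As a hedged sketch (not taken from this PR's tests), the same pattern can be applied to a tuple whose first element is itself Nullable, so that one row carries a tuple-level NULL and another carries a NULL only inside the tuple. The file name `test_inner_null.arrow` and the alias `tuple_is_null` are hypothetical, and the output is deliberately not asserted here.

```sql
-- Same setup as above: enable the experimental type and allow overwriting the file.
SET allow_experimental_nullable_tuple_type = 1;
SET engine_file_truncate_on_insert = 1;

-- Row 1: fully populated tuple.
-- Row 2: the whole tuple is NULL.
-- Row 3: the tuple is present, but its first element is NULL.
INSERT INTO TABLE FUNCTION file('test_inner_null.arrow', 'Arrow', 'c0 Nullable(Tuple(Nullable(UInt32), String))') VALUES ((1, 'a')), (NULL), ((NULL, 'c'));

-- Read the column back and flag which rows are NULL at the tuple level.
SELECT c0, c0 IS NULL AS tuple_is_null FROM file('test_inner_null.arrow', 'Arrow', 'c0 Nullable(Tuple(Nullable(UInt32), String))');
```

If the round-trip behaves as described in this PR, only the second row should be NULL at the tuple level; swapping `'Arrow'` for `'ArrowStream'` or `'ORC'` (with a matching file name) would exercise the other writers and readers.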

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Support Nullable(Tuple) for Arrow, ArrowStream, ORC, legacy Parquet formats.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

clickhouse-gh Bot (Contributor) commented Mar 30, 2026

Workflow [PR], commit [7d950a7]


AI Review

Summary

This PR adds support for Nullable(Tuple(...)) round-trips in Arrow, ArrowStream, ORC, and legacy Parquet, including null-map propagation for nested tuple elements and broader stateless coverage. The core implementation looks solid, but one important edge-case remains under-tested: reading with LowCardinality(Nullable(Tuple(...))) type hints. Verdict: request changes until this gap is covered.

Missing context
  • ⚠️ No CI report/log links were provided in the review request, so I could not validate runtime regressions/failures beyond code and tests in this checkout.
Findings
  • ⚠️ Majors
    • [tests/queries/0_stateless/04064_tuple_inside_nullable_arrow_orc_roundtrip.sql:1+, tests/queries/0_stateless/04065_tuple_inside_nullable_parquet_roundtrip.sql:1+] Regression coverage still misses LowCardinality(Nullable(Tuple(...))) read-hint scenarios.
    • Impact: current changes explicitly touch LowCardinality + nullable handling paths; without dedicated tests this can silently regress.
    • Suggested fix: add at least one read-hint case per affected reader path (Arrow/ArrowStream/ORC and legacy Parquet) with LowCardinality(Nullable(Tuple(...))), including rows with both struct-level NULL and inner nullable values.
Tests
  • ⚠️ Add regression tests for LowCardinality(Nullable(Tuple(...))) read hints across updated readers to lock in the behavior this PR targets.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required actions:
    • Add targeted regression coverage for LowCardinality(Nullable(Tuple(...))) read-hint paths in the new format tests.

@clickhouse-gh clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label Mar 30, 2026
Comment thread src/Processors/Formats/Impl/ArrowColumnToCHColumn.cpp Outdated
Comment thread src/Processors/Formats/Impl/NativeORCBlockInputFormat.cpp Outdated
@nihalzp nihalzp requested a review from Avogar March 30, 2026 21:55
Comment thread src/Processors/Formats/Impl/NativeORCBlockInputFormat.cpp Outdated
Comment thread src/DataTypes/NestedUtils.cpp Outdated
@Avogar Avogar self-assigned this Apr 2, 2026
@alexey-milovidov (Member) commented

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

@Avogar (Member) left a comment

LGTM, but the comment about LowCardinality(Nullable) from the AI review sounds valid to me. Let's try to fix it and add a test.

@Avogar (Member) left a comment

LGTM. Let's just fix the test 02384_nullable_low_cardinality_as_dict_in_arrow in the flaky check. It cannot run in parallel with itself because it uses the file table function with a constant file name. Let's make the file name unique using currentDatabase(). Or, better, rewrite the test as a bash test with clickhouse-local to avoid leaving trash files in the user_files directory.
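
A minimal sketch of the first suggestion, under assumptions: it relies on the test runner giving each test run its own database, so that a file name built from `currentDatabase()` does not collide when the same test runs in parallel. The suffix `_lc_nullable.arrow` and the column definition are hypothetical and are not the contents of the actual test.

```sql
-- Allow re-running the insert against the same file.
SET engine_file_truncate_on_insert = 1;

-- Build the file name from the current database instead of a constant,
-- so parallel runs of the same test write to different files.
INSERT INTO TABLE FUNCTION file(currentDatabase() || '_lc_nullable.arrow', 'Arrow', 'c0 LowCardinality(Nullable(String))') VALUES ('a'), (NULL), ('c');

-- Read back from the same per-database file.
SELECT c0 FROM file(currentDatabase() || '_lc_nullable.arrow', 'Arrow', 'c0 LowCardinality(Nullable(String))');
```

This still leaves a file behind in user_files, which is why the comment prefers the clickhouse-local variant.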

Comment thread tests/queries/0_stateless/02384_nullable_low_cardinality_as_dict_in_arrow.sh Outdated
clickhouse-gh Bot (Contributor) commented Apr 15, 2026

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 84.00% | 84.00% | +0.00% |
| Functions | 90.90% | 90.90% | +0.00% |
| Branches | 76.50% | 76.50% | +0.00% |

Changed lines: 97.87% (138/141)

@nihalzp nihalzp added this pull request to the merge queue Apr 15, 2026
Merged via the queue into ClickHouse:master with commit fc17de3 Apr 15, 2026
161 checks passed
@nihalzp nihalzp deleted the support-arrow-orc-nullable-tuple branch April 15, 2026 17:21
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 15, 2026
Labels

  • pr-improvement: Pull request with some product improvements
  • pr-synced-to-cloud: The PR is synced to the cloud repo