Skip to content

delta-kernel-rs: switch to default-engine-native-tls, keep MSan exclusion#96856

Merged
alexey-milovidov merged 23 commits into
ClickHouse:masterfrom
abonander:master
Jun 13, 2026
Merged

delta-kernel-rs: switch to default-engine-native-tls, keep MSan exclusion#96856
alexey-milovidov merged 23 commits into
ClickHouse:masterfrom
abonander:master

Conversation

@abonander

@abonander abonander commented Feb 13, 2026

Copy link
Copy Markdown
Contributor

Changelog category (leave one):

  • Build/Testing/Packaging Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

  • Build delta-kernel-rs with the default-engine-native-tls feature so that its own HTTP client (reqwest) can reuse the OpenSSL that ClickHouse already links in. This is only a partial native-tls enablement: object_store still forces reqwest/rustls-tls-native-roots, so reqwest ends up with both the native-tls and rustls backends. delta-kernel-rs remains disabled under MSan for now because ring is still compiled in via object_store (both transitively through rustls and directly for AWS request HMAC signing), and its hand-written assembly does not generate the symbols MSan requires (tracked upstream at Pluggable Crypto / Update reqwest 0.13 apache/arrow-rs-object-store#585).

@clickhouse-gh

clickhouse-gh Bot commented Feb 13, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [aa76a0b]

Summary:


AI Review

Summary

This PR changes the delta-kernel-rs build wiring from default-engine-rustls to default-engine-native-tls, keeps the MSan exclusion because ring is still present through object_store/rustls, and adds the OpenSSL include path needed by openssl-sys. I found no new correctness, packaging, or metadata issues worth an inline review comment.

Missing context / blind spots
  • ⚠️ I could not run a local build because the relevant submodules are not initialized in this checkout and cmake is unavailable. The current Praktika report fetch showed no failed jobs, but GitHub checks for aa76a0be3379849517188824c10f205b4d190703 were still in progress when reviewed; completed CI would close the remaining build-validation gap.
Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added pr-build Pull request with build/testing/packaging improvement submodule changed At least one submodule changed in this PR. labels Feb 13, 2026
@alexey-milovidov alexey-milovidov self-assigned this Feb 14, 2026
@abonander

abonander commented Feb 16, 2026

Copy link
Copy Markdown
Contributor Author

It looks like ring is still being dragged in via the object_store dependency, which uses it in one place to calculate HMACs: https://github.com/apache/arrow-rs-object-store/blob/030f29d59ec055741e5f4de55e8a004d8fa44d5b/src/util.rs#L46

We can fork it to replace that with the openssl crate as an optional feature, but it would be a breaking change to object_store because ring is buried in the cloud feature: https://docs.rs/crate/object_store/latest/features#cloud

It looks like it also depends on rustls for RSA signing in the GCP implementation, which we could also replace with openssl: https://github.com/apache/arrow-rs-object-store/blob/030f29d59ec055741e5f4de55e8a004d8fa44d5b/src/gcp/credential.rs#L153-L166

Actually, rustls-pki-types is a lower-level dependency that implements PKI parsing for rustls, so it should still be safe to depend on because it doesn't do any cryptography itself. Instead, this code directly uses ring for RSA signing as well.

@alexey-milovidov

Copy link
Copy Markdown
Member

Let's send patches to upstream to allow configuring these libraries that way.

@abonander

Copy link
Copy Markdown
Contributor Author

It turns out there is an open PR to object-store to make the crypto pluggable: apache/arrow-rs-object-store#585

And someone even provided an example of how to integrate it with OpenSSL: apache/arrow-rs-object-store#585 (comment)

This would then move the responsibility to implement it back to our fork of delta-kernel-rs, so at least we'd have some more control over it.

There's been no activity on the PR since February 5th, however. I'm going to try following up.

alexey-milovidov and others added 2 commits May 15, 2026 22:54
…ops `ring`

The switch to `default-engine-native-tls` removes `ring` from the TLS
stack but not from `object_store` itself - `object_store`'s `cloud`
feature (required by `aws`, `azure`, `gcp`, `http`) still pulls in
`ring` for HMAC. Because `ring`'s hand-written assembly does not emit
the symbols MSan expects, the MSan build still fails on link.

CI: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=96856&sha=4f2a80afef07e5b6165794a3091634294cee5f99&name_0=PR&name_1=Build%20%28amd_msan%29
PR: ClickHouse#96856

Re-add the MSan exclusion so the build passes; keep the `native-tls`
switch and the `OPENSSL_INCLUDE_DIR` / `CFLAGS` fix - those are
independently useful. Tracking upstream:
apache/arrow-rs-object-store#585

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread contrib/delta-kernel-rs-cmake/CMakeLists.txt
@alexey-milovidov alexey-milovidov changed the title enable DeltaLake engine under MSan, use native-tls feature delta-kernel-rs: switch to default-engine-native-tls, keep MSan exclusion May 17, 2026
alexey-milovidov and others added 2 commits May 19, 2026 22:54
This commit is a no-op against the public repo CI (which uses the target
branch's workflows). Keeping these files at the fork's previous state so
the merge of master can be pushed without the `workflow` OAuth scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai, the CH Inc sync check is failing on the following three jobs in the sync PR https://github.com/ClickHouse/clickhouse-private/pull/49493, all of which appear unrelated to this build-system-only change (the diff is purely contrib/delta-kernel-rs-cmake/CMakeLists.txt plus the vendored Rust crate update for default-engine-native-tls):

Could you investigate / link the tracking issues here? Merged origin/master into the branch in 877fb74b8e2 to refresh CI.

@groeneai

Copy link
Copy Markdown
Contributor

@alexey-milovidov all three are pre-existing flakes that are unrelated to this PR. Public CIDB confirms:

1. Build (arm_tidy) - chronic CI infrastructure flake. Public CIDB:

day        failures  distinct_prs
2026-05-19 27        17
2026-05-18 18        13
2026-05-17 9         7
2026-05-16 27        10
2026-05-15 10        9
2026-05-14 9         7
2026-05-13 18        15

118 failure rows across 59 distinct PRs over the last 7 days, 0 on master, sustained for 14+ days. Today's distribution: #105131 x18, #104816 x6, #105010 x5, #100399 x4, #105020 x3, plus 13 other unrelated PRs, none touching delta-kernel-rs or any CMake contrib. This is the chronic clang-tidy-cache.py / Build ClickHouse arm_tidy noise we see on essentially every sync PR.

2. Integration tests E2E -> test_e2e_catalogs/test.py::test_*[onelake] - private-only test (not in the OSS repo). Public CIDB has zero FAIL rows for this test. The only public hits are 34 OK rows from my merged PR #104531 (Iceberg REST pagination), which run on a public stand-in. The OneLake glue itself lives in clickhouse-private, so the triage and fix have to happen there. @scanhex12 is already iterating on this surface: #104803 (Fix onelake insert test, open) and #105302 (Fix azure init params, open). Suggest pinging the CH Inc Cloud team or @scanhex12 on the private sync run.

3. Integration tests (amd_msan, 2/6) -> test_shared_merge_tree_coordinated_merges/test.py::test_coordinated_merges_no_contention - also private-only (SharedMergeTree is Cloud). Public CIDB has zero rows for this test. The PR keeps delta-kernel-rs excluded under MSan (only flips the TLS feature flag), so this MSan test cannot be exercising delta-kernel code. Please escalate to the CH Inc SharedMergeTree team to confirm the actual failure mode in the sync.

Net: none of the three are caused by this PR. Suggest retriggering the sync or asking the CH Inc Cloud team to clear it once they confirm the two private tests are pre-existing.

Session: cron:clickhouse-ci-task-worker:20260519-231500

…msan

# Conflicts:
#	.github/workflows/pull_request.yml
Workflow files in this branch were stale or had been deleted as a
workaround for GitHub's workflow-scope check. Restoring them to match
`master` so the PR diff no longer contains those unrelated changes.
@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Jun 5, 2026
@alexey-milovidov

Copy link
Copy Markdown
Member

Merged current master into the branch (it was ~709 commits behind, merge-base 2026-06-02). The only conflict-free merge brought no .github/ changes, so the workflow-scope situation is untouched. The net PR diff vs master remains exactly the contrib/delta-kernel-rs-cmake/CMakeLists.txt change (9+/3-).

Verified the build change locally on aarch64 (ENABLE_DELTA_KERNEL_RS=ON, SANITIZE=OFF): a clean rebuild of delta_kernel_ffi with the default-engine-native-tls feature and the corrected OPENSSL_INCLUDE_DIR / CFLAGS include path compiles successfully against current master (Finished release profile, only 4 pre-existing unnecessary unsafe block warnings in the vendored FFI, no errors). The static archive grew from ~332 MB to ~382 MB, consistent with replacing the rustls stack with native-tls + statically-linked OpenSSL.

No unresolved review threads; the sole clickhouse-gh thread is resolved. Added the can be tested label so CI runs against the refreshed branch — the prior PR red was just the can_be_tested label gate, and @groeneai already confirmed the three CH Inc sync jobs are pre-existing flakes unrelated to this build-system-only change.

@clickhouse-gh

clickhouse-gh Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

📊 Cloud Performance Report

✅ AI verdict: no_change — no significant changes across 39 queries analysed

This change is build-only: it switches the delta-kernel-rs TLS backend from rustls to native-tls and fixes OpenSSL include paths in CMake. It affects only Delta Lake / object-store TLS plumbing and does not alter the server's query-execution path. The flagged improvements on ClickBench Q4 (-11.2%) and Q15 (-13.6%) cannot plausibly be caused by it, so both are downgraded to inconclusive as run-to-run variance; Q18's borderline -5.7% is likewise noise. No actionable performance change in this PR.

clickbench

⚠️ 3 inconclusive

Flagged queries (3 of 43)
Query Verdict Baseline med (ms) PR med (ms) Change q-value Hint
⚠️ 4 not_sure 268 238 -11.2% <0.0001 PR only swaps delta-kernel TLS backend in CMake; cannot touch Q4's execution path. -11% is run-to-run variance.
⚠️ 15 not_sure 250 216 -13.6% <0.0001 Build-only delta-kernel change; unrelated to this SELECT. -13.6% is run-to-run variance, not a PR effect.
⚠️ 18 not_sure 1392 1312 -5.7% <0.0001 Already not_sure; borderline -5.7% on a query untouched by a build-only change. Noise.

q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on.

tpch_adapted_1_official

🟢 No significant changes

Debug info
  • StressHouse run: 1df2c50e-9080-4552-9078-62ef75e671d5
  • MIRAI run: dc2554f5-5db3-4c4e-9688-2ef66dc0cc42
  • PR check IDs:
    • clickbench_665921_1780866475
    • clickbench_665935_1780866475
    • tpch_adapted_1_official_665946_1780866475
    • tpch_adapted_1_official_665953_1780866475

…rg::alter

PR ClickHouse#106102 (merged 2026-06-05 17:43 UTC) refactored Iceberg::alter to
declare last_version and compression_method outside an if/else that
assigns them in both branches. Clang-tidy cppcoreguidelines-init-variables
flags the bare declarations and Build (arm_tidy) is built with
-warnings-as-errors, so the build fails for every PR whose CI ran on
master after this commit.

CIDB shows 30+ unrelated PRs hitting this in the last 2 days
(e.g. ClickHouse#89360, ClickHouse#103540, ClickHouse#106120, ClickHouse#106386, ClickHouse#106404, ClickHouse#106522, ClickHouse#105102,
ClickHouse#106590).

Initialize both variables to safe defaults at declaration. They are
unconditionally overwritten in both if and else branches before use,
so behavior is unchanged. The new defaults are only relevant if a
future code path skips both assignments, in which case last_version=0
and compression_method=CompressionMethod::None are sane no-op values
(the same defaults the old structured-binding form would produce
through default-construction of the destructured aggregate).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit d72cac5)
Comment thread contrib/delta-kernel-rs-cmake/CMakeLists.txt Outdated
alexey-milovidov and others added 2 commits June 7, 2026 18:56
The previous comment claimed that switching to `default-engine-native-tls`
"removes `ring` from the TLS stack". That is misleading: `object_store` forces
`reqwest/rustls-tls-native-roots` onto `reqwest`, and Cargo unifies features, so
`reqwest` is built with both the `native-tls` and `rustls` backends. `ring`
therefore remains — both transitively via `rustls` and directly via
`object_store` (AWS request HMAC signing). Rewrite the comment to describe this
as a partial `native-tls` enablement, which is why the MSan exclusion stays.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member

Picked this up via /continue-pr:

  • Merged current master (the branch was ~309 commits behind). The merge collapsed the unrelated Iceberg::alter arm_tidy init-vars commit, which is now already on master — the PR diff is back to build-config only (contrib/delta-kernel-rs-cmake/CMakeLists.txt).
  • Addressed the remaining Bugbot thread about the Rustls/ring wiring. Verified against the vendored Cargo.lock that reqwest is built with both native-tls and rustls (the latter forced by object_store), so ring is not removed. Rewrote the MSan-exclusion comment to describe this accurately as a partial native-tls enablement, and updated the changelog entry in the description to match.

CI was fully green on the previous head; a fresh run will start on the new commit.

@clickhouse-gh

clickhouse-gh Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 0.00% 84.50% +84.50%
Functions 0.00% 92.30% +92.30%
Branches 0.00% 77.20% +77.20%

Full report

@alexey-milovidov alexey-milovidov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost noop, but still makes sense.

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 13, 2026
Merged via the queue into ClickHouse:master with commit 56c04d6 Jun 13, 2026
166 checks passed
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-build Pull request with build/testing/packaging improvement pr-synced-to-cloud The PR is synced to the cloud repo submodule changed At least one submodule changed in this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants