feat: Enable native c2r by default, add debug asserts by andygrove · Pull Request #3649 · apache/datafusion-comet

andygrove · 2026-03-08T21:17:15Z

Which issue does this PR close?

Closes #.

Rationale for this change

~I see a ~~10% speedup in TPC-H @ 1TB with this feature enabled.~~ edit: this was actually due to using native_datafusion scan. However, I see no perf regression from this PR, and it reduces GC pressure by avoiding many java allocations.

What changes are included in this PR?

Change config default
Add debug asserts before unsafe code
Update golden files

How are these changes tested?

comphead · 2026-03-09T15:37:50Z

native/core/src/execution/jni_api.rs

 ) -> jni::sys::jobject {
    try_unwrap_or_throw(&e, |mut env| {
        // Get the context
+        debug_assert!(c2r_handle != 0, "columnarToRowConvert: c2r_handle is null");


can c2r_handle be negative?

jlong is a signed 64-bit integer, so yes, c2r_handle can technically be negative since memory addresses above 2^63 appear as negative values when stored in a signed long. However, the only invalid value we need to check for is 0 (null pointer). Negative values are valid memory addresses on 64-bit systems.

Sorry for the AI response. Claude didn't ask me first 😞

comphead · 2026-03-09T15:38:12Z

native/core/src/execution/jni_api.rs

    c2r_handle: jlong,
 ) {
    try_unwrap_or_throw(&e, |_env| {
+        debug_assert!(c2r_handle != 0, "columnarToRowClose: c2r_handle is null");


comphead · 2026-03-09T15:39:55Z

native/core/src/execution/columnar_to_row.rs

                let src_start = start_idx * $elem_size;
+                debug_assert!(
+                    src_start + byte_len <= values_slice.len() * $elem_size,
+                    "bulk_copy_range: source slice out of bounds: src_start={}, byte_len={}, values_len={}",


should we output elem.size as well?

Good idea, done.

comphead · 2026-03-09T15:43:08Z

native/Cargo.toml

 inherits = "release"
 lto = false          # Skip LTO for faster linking
 codegen-units = 16   # Parallel codegen (faster compile, slightly larger binary)
+debug-assertions = true


should it be always true, even for releases?

we wouldn't want to enable the assertions for releases because it would defeat the point of having the unsafe code that skips bounds checks

Oh I see, I thought it is enabled for release and it was my concern, for ci it is fine.

andygrove · 2026-03-09T16:05:52Z

moving to draft while I run new benchmarks to confirm the benefit

andygrove · 2026-03-09T21:13:07Z

It looks like a debug assertion did fail, but no detailed were logged. I am enabling RUST_BACKTRACE next.

Enable RUST_BACKTRACE=1 in all CI workflows to get full stack traces when debug assertions or other panics occur. Also add elem_size to the bulk_copy_range debug assert message for better diagnostics.

andygrove · 2026-03-09T22:41:25Z

I am still not seeing a stack trace or reason for the failure.

2026-03-09T22:34:13.1918758Z [info] - SPARK-32069: Improve error message on reading unexpected directory (316 milliseconds)
2026-03-09T22:34:13.2165413Z 22:34:13.215 WARN org.apache.hadoop.hive.common.FileUtils: File file:/__w/datafusion-comet/datafusion-comet/apache-spark/sql/hive/target/tmp/hive_execution_test_group/warehouse-afa5d96b-8cb8-47a1-b7c5-386c60c294b3/src does not exist; Force to delete it.
2026-03-09T22:34:13.2166765Z 
2026-03-09T22:34:13.2167568Z 22:34:13.216 ERROR org.apache.hadoop.hive.common.FileUtils: Failed to delete file:/__w/datafusion-comet/datafusion-comet/apache-spark/sql/hive/target/tmp/hive_execution_test_group/warehouse-afa5d96b-8cb8-47a1-b7c5-386c60c294b3/src
2026-03-09T22:34:13.2169091Z 
2026-03-09T22:34:13.3359262Z [info] HashAggregationQueryWithControlledFallbackSuite:
2026-03-09T22:34:14.1173565Z thread caused non-unwinding panic. aborting.
2026-03-09T22:34:49.4551378Z Warning: Unable to read from client, please check on client for further details of the problem.

andygrove · 2026-03-09T22:45:20Z

Added panic = "unwind" to help with this. It looks like the same assertion is failing in main branch.

parthchandra

lgtm

andygrove · 2026-03-10T00:02:19Z

test failure is an existing issue in main branch and is fixed in #3652.

I will rebase this PR once that is merged

comphead

Thanks @andygrove the CI fails presumably because of OOM issues

comphead · 2026-03-10T00:06:01Z

test failure is an existing issue in main branch and is fixed in #3652.

I will rebase this PR once that is merged

The tests passed on CI, it says sbt.ForkMain 40713 failed with exit code 134

andygrove · 2026-03-10T00:06:56Z

test failure is an existing issue in main branch and is fixed in #3652.
I will rebase this PR once that is merged

The tests passed on CI, it says sbt.ForkMain 40713 failed with exit code 134

A debug assert is triggered because of unaligned memory access.

andygrove · 2026-03-10T00:10:59Z

test failure is an existing issue in main branch and is fixed in #3652.
I will rebase this PR once that is merged

The tests passed on CI, it says sbt.ForkMain 40713 failed with exit code 134

A debug assert is triggered because of unaligned memory access.

Actually, I am a bit confused now. The error is:

Comet native panic: panicked at core/src/execution/shuffle/spark_unsafe/row.rs:293:29:
misaligned pointer dereference: address must be a multiple of 0x8 but is 0xc1a80000c

I confirmed that the changes in #3652. do fix this, but not sure why this only shows up when enabling debug assertions 😕

andygrove · 2026-03-10T00:11:33Z

test failure is an existing issue in main branch and is fixed in #3652.
I will rebase this PR once that is merged

The tests passed on CI, it says sbt.ForkMain 40713 failed with exit code 134

A debug assert is triggered because of unaligned memory access.

Actually, I am a bit confused now. The error is:
Comet native panic: panicked at core/src/execution/shuffle/spark_unsafe/row.rs:293:29:
misaligned pointer dereference: address must be a multiple of 0x8 but is 0xc1a80000c
I confirmed that the changes in #3652. do fix this, but not sure why this only shows up when enabling debug assertions 😕

I suppose the debug assertion must be in Rust

parthchandra · 2026-03-10T00:32:02Z

The expression causing panic may be in the debug_assert. It would then show up only in debug. The issue would be missed silently!

andygrove added 3 commits March 8, 2026 15:16

Enable native c2r

bd21d70

change nodeName

529a655

am update golden files

264360e

andygrove marked this pull request as ready for review March 8, 2026 23:18

andygrove requested review from comphead, mbutrovich and parthchandra March 8, 2026 23:18

enable debug assertions in CI profile

d949cac

comphead reviewed Mar 9, 2026

View reviewed changes

andygrove marked this pull request as draft March 9, 2026 15:48

andygrove marked this pull request as ready for review March 9, 2026 16:31

feat: enable RUST_BACKTRACE in CI and add elem_size to debug assert

504ea28

Enable RUST_BACKTRACE=1 in all CI workflows to get full stack traces when debug assertions or other panics occur. Also add elem_size to the bulk_copy_range debug assert message for better diagnostics.

feat: set panic=unwind in CI profile to capture debug assertion messages

59c0214

parthchandra approved these changes Mar 9, 2026

View reviewed changes

comphead approved these changes Mar 10, 2026

View reviewed changes

Merge remote-tracking branch 'apache/main' into enable-native-c2r

44fe20c

andygrove merged commit 98b4484 into apache:main Mar 10, 2026
139 of 159 checks passed

Conversation

andygrove commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

andygrove commented Mar 9, 2026

Uh oh!

andygrove commented Mar 9, 2026

Uh oh!

andygrove commented Mar 9, 2026

Uh oh!

andygrove commented Mar 9, 2026

Uh oh!

parthchandra left a comment

Choose a reason for hiding this comment

Uh oh!

andygrove commented Mar 10, 2026

Uh oh!

comphead left a comment

Choose a reason for hiding this comment

Uh oh!

comphead commented Mar 10, 2026

Uh oh!

andygrove commented Mar 10, 2026

Uh oh!

andygrove commented Mar 10, 2026

Uh oh!

andygrove commented Mar 10, 2026

Uh oh!

parthchandra commented Mar 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

andygrove commented Mar 8, 2026 •

edited

Loading