
feat: add JVM UDF framework for native execution #4232

Open
andygrove wants to merge 2 commits into apache:main from andygrove:jvm-udf-framework


@andygrove
Member

@andygrove andygrove commented May 5, 2026

Which issue does this PR close?

Part of #4193

Rationale for this change

This PR adds the core JVM UDF framework, which lets Comet invoke JVM-side UDF implementations that operate on Arrow data via JNI. This allows us to quickly implement expressions with 100% Spark compatibility without re-implementing them in native Rust code: we call existing Java/Spark code but operate on Arrow data, which avoids an expensive fallback to Spark.

What changes are included in this PR?

The framework consists of:

JVM side:

  • CometUDF trait — interface that JVM UDF implementations must satisfy
  • CometUdfBridge — JNI entry point that native execution calls to invoke a UDF; handles class instantiation caching, Arrow FFI import/export, and result validation
  • CometLambdaRegistry — thread-safe registry bridging plan-time Spark expressions to execution-time UDF lookup
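
To make the JVM-side roles above concrete, here is a minimal sketch of the three pieces working together. It is not the code in this PR: `SketchUdf`, `AddColumns`, `SketchUdfBridge`, and `SketchRegistry` are hypothetical names, the method signatures are assumptions, and plain `long[]` arrays stand in for Arrow vectors so the example runs without Arrow or JNI.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the UDF contract: one call processes a whole
// batch of columns. long[] arrays stand in for Arrow vectors (assumption).
interface SketchUdf {
    long[] evaluate(long[][] inputs, int numRows);
}

// Example UDF: adds two input columns element-wise.
final class AddColumns implements SketchUdf {
    @Override
    public long[] evaluate(long[][] inputs, int numRows) {
        long[] out = new long[numRows];
        for (int i = 0; i < numRows; i++) {
            out[i] = inputs[0][i] + inputs[1][i];
        }
        return out;
    }
}

// Hypothetical bridge: resolves a UDF by class name, caches the instance,
// and checks that the result has exactly one value per input row.
final class SketchUdfBridge {
    private static final Map<String, SketchUdf> INSTANCES = new ConcurrentHashMap<>();

    static long[] invoke(String className, long[][] inputs, int numRows) {
        // Instantiate each UDF class at most once per JVM, then reuse it.
        SketchUdf udf = INSTANCES.computeIfAbsent(className, SketchUdfBridge::instantiate);
        long[] result = udf.evaluate(inputs, numRows);
        if (result == null || result.length != numRows) {
            throw new IllegalStateException(
                className + " returned a result that does not match row count " + numRows);
        }
        return result;
    }

    private static SketchUdf instantiate(String className) {
        try {
            return (SketchUdf) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate UDF class " + className, e);
        }
    }
}

// Hypothetical registry: plan-time code registers a value under a fresh key,
// and execution-time code looks it up by that key from any thread.
final class SketchRegistry<T> {
    private final Map<String, T> entries = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    String register(T value) {
        String key = "lambda-" + nextId.incrementAndGet();
        entries.put(key, value);
        return key;
    }

    T lookup(String key) {
        T value = entries.get(key);
        if (value == null) {
            throw new IllegalArgumentException("No entry registered for " + key);
        }
        return value;
    }

    void remove(String key) {
        entries.remove(key);
    }
}
```

In this sketch, a caller would invoke `SketchUdfBridge.invoke(AddColumns.class.getName(), columns, numRows)`. Keying the cache by class name means the reflective lookup happens once rather than once per batch, which is the motivation for instantiation caching in a per-batch entry point.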

Native (Rust) side:

  • JvmScalarUdfExpr — DataFusion PhysicalExpr that delegates evaluation to a JVM-side CometUDF via JNI and the Arrow C Data Interface
  • CometUdfBridge JNI handle in jni-bridge — caches class/method references
  • JvmScalarUdf protobuf message — serde format for transmitting UDF invocations from plan to execution

Planner integration:

  • ExprStruct::JvmScalarUdf handling in the native planner

This is the framework only — individual expression implementations (e.g., array_exists) will be added in follow-up PRs.

How are these changes tested?

  • Rust compilation verified (cargo check passes for all affected crates)
  • End-to-end testing will come with the first expression implementation in a follow-up PR

Add a framework that allows Comet to invoke JVM-side UDF implementations
operating on Arrow data via JNI, avoiding expensive fallback to Spark while
maintaining 100% Spark compatibility for expressions not yet implemented
natively in Rust.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@comphead comphead left a comment

Btw @andygrove can we use this framework for regexp udfs?

@andygrove
Member Author

> Btw @andygrove can we use this framework for regexp udfs?

Yes, there is an example in #4170.

It is perfect for regexp because we get 100% compatibility with almost no effort, enabled by default.

@comphead
Contributor

comphead commented May 5, 2026

I'm also wondering whether we can use this framework for user UDFs 🤔 Currently it is a huge drawback in Comet that we fall back to Spark for user-defined functions, because there is no way to transpile custom user code to the native side. Can this framework be offered to users as an alternative? Depending on UDF complexity, it may or may not be easy to rewrite custom user code from a Spark UDF to a Comet Java UDF. For example, I anticipate some problems if the user works at the row level, i.e. updates specific values in a row, which might be more complicated in Arrow Java, but it is still promising.

@andygrove
Member Author

> I'm also wondering whether we can use this framework for user UDFs 🤔 Currently it is a huge drawback in Comet that we fall back to Spark for user-defined functions, because there is no way to transpile custom user code to the native side. Can this framework be offered to users as an alternative? Depending on UDF complexity, it may or may not be easy to rewrite custom user code from a Spark UDF to a Comet Java UDF. For example, I anticipate some problems if the user works at the row level, i.e. updates specific values in a row, which might be more complicated in Arrow Java, but it is still promising.

I am already working on enabling this in #4233.
