Add Java vectorized scalar function support (1.5)#648

Merged
staticlibs merged 1 commit into duckdb:v1.5-variegata from staticlibs:scalar_functions_15
Apr 12, 2026
Conversation

@staticlibs
Collaborator

This is a backport of PR #630 to the v1.5-variegata stable branch.

Summary

This PR adds the implementation of Java Scalar Functions (UDFs) in
duckdb-java, using a vectorized callback model for execution.

It introduces function registration, callback bridging, typed vector
read/write APIs, documentation, and test coverage for supported types.
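
As a rough illustration of what "vectorized callback model" means here, the sketch below calls the user callback once per chunk of rows rather than once per row. The `ChunkFunction` interface and the `execute` driver are hypothetical stand-ins written for this example only; the actual method signatures of DuckDBVectorizedScalarFunction, DuckDBDataChunkReader and DuckDBWritableVector are not shown in this description and may differ.

```java
import java.util.Arrays;

// Hypothetical stand-ins for illustration only; not the duckdb-java API.
final class VectorizedCallbackSketch {
    /** User callback: receives a whole chunk of input rows and fills the output chunk. */
    interface ChunkFunction {
        void apply(int[] input, int rowCount, int[] output);
    }

    /** Plays the role of the engine: splits a column into chunks and invokes the callback per chunk. */
    static int[] execute(int[] column, int chunkSize, ChunkFunction fn) {
        int[] result = new int[column.length];
        for (int start = 0; start < column.length; start += chunkSize) {
            int rowCount = Math.min(chunkSize, column.length - start);
            int[] in = Arrays.copyOfRange(column, start, start + rowCount);
            int[] out = new int[rowCount];
            fn.apply(in, rowCount, out); // one callback invocation covers rowCount rows
            System.arraycopy(out, 0, result, start, rowCount);
        }
        return result;
    }

    /** Example callback: adds one to every row of the chunk. */
    static int[] addOne(int[] column, int chunkSize) {
        return execute(column, chunkSize, (in, n, out) -> {
            for (int row = 0; row < n; row++) {
                out[row] = in[row] + 1;
            }
        });
    }
}
```

The point of this shape is that the per-call overhead (in the real bridge, the JNI transition) is paid once per chunk instead of once per row.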

What this PR adds

  • New public API on DuckDBConnection:
    • registerScalarFunction(String name, String[] parameterTypes, String returnType, DuckDBVectorizedScalarFunction function)
  • New callback contract and vector APIs:
    • DuckDBVectorizedScalarFunction
    • DuckDBDataChunkReader
    • DuckDBReadableVector
    • DuckDBWritableVector
  • JNI/C bridge needed to connect Java callbacks to DuckDB native scalar callback execution
  • SQL type parsing helper used by the string-based Java registration API
  • Scalar UDF documentation (UDF.MD) and README reference
  • Dedicated test suite (TestScalarFunctions) plus binding-level regression tests

Main design decisions

1) Prioritize Java-side logic

The design keeps most registration and type wiring logic in Java,
with JNI used only for unavoidable callback bridging
responsibilities.

2) Keep JNI additions minimal and essential

JNI is limited to:

  • native callback pointer/state installation
  • JVM thread attach/detach from DuckDB execution threads
  • callback lifecycle and error propagation
  • required helpers for logical type parsing and safe VARCHAR extraction

3) Performance-focused vector path

The UDF execution path uses dedicated typed vector classes
(DuckDBReadableVector/DuckDBWritableVector) instead of generic
JDBC row/object paths, to reduce overhead in callback hot loops:

  • primitive typed access/write APIs
  • direct output vector writes
  • explicit null-mask handling
  • reduced boxing/unboxing and object allocation
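
The points above can be sketched with a small typed vector: primitive `long` access over a buffer, a bitmask for nulls, and a hot loop that writes straight into the output without creating boxed `Long` objects. `TypedVectorSketch` and all of its methods are assumptions made for this example; they are not the DuckDBReadableVector/DuckDBWritableVector API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical typed vector for illustration only; not the duckdb-java API.
final class TypedVectorSketch {
    private final ByteBuffer data;  // 8 bytes per row, native byte order
    private final long[] validity;  // bit set = row is valid (non-null)
    private final int rowCount;

    TypedVectorSketch(int rowCount) {
        this.rowCount = rowCount;
        this.data = ByteBuffer.allocateDirect(rowCount * Long.BYTES).order(ByteOrder.nativeOrder());
        this.validity = new long[(rowCount + 63) / 64];
        Arrays.fill(validity, -1L); // all rows start out valid
    }

    int rowCount() { return rowCount; }

    private void check(int row) {
        if (row < 0 || row >= rowCount) throw new IndexOutOfBoundsException("row " + row);
    }

    boolean isNull(int row) {
        check(row);
        return (validity[row >>> 6] & (1L << (row & 63))) == 0;
    }

    void setNull(int row) {
        check(row);
        validity[row >>> 6] &= ~(1L << (row & 63));
    }

    long getLong(int row) {
        check(row);
        return data.getLong(row * Long.BYTES);
    }

    void setLong(int row, long value) {
        check(row);
        data.putLong(row * Long.BYTES, value);
        validity[row >>> 6] |= 1L << (row & 63); // writing a value marks the row valid
    }

    /** Hot loop: primitive reads, direct output writes, explicit null propagation, no boxing. */
    static void doubleNonNull(TypedVectorSketch in, TypedVectorSketch out) {
        for (int row = 0; row < in.rowCount(); row++) {
            if (in.isNull(row)) {
                out.setNull(row);
            } else {
                out.setLong(row, in.getLong(row) * 2);
            }
        }
    }
}
```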

Correctness and hardening included

  • DECIMAL output validates declared precision/scale
  • VARCHAR helper validates row bounds
  • VARCHAR null rows are guarded in Java and JNI
  • Vector code uses ByteOrder.nativeOrder() consistently
  • UBIGINT read/write is endian-correct
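
For context on the last two items, here is a minimal sketch of an endian-correct UBIGINT round trip using only standard JDK calls. The buffer is set to `ByteOrder.nativeOrder()` and the 64 bits are moved with `getLong`/`putLong`, so no manual byte shuffling is involved. The class and method names are hypothetical and only illustrate the idea; the PR's actual implementation may be structured differently.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not the duckdb-java implementation.
final class UBigIntSketch {
    private static final BigInteger TWO_POW_64 = BigInteger.ONE.shiftLeft(64);

    /** Allocates a buffer of 8-byte slots that uses the platform's native byte order. */
    static ByteBuffer newVector(int rows) {
        return ByteBuffer.allocate(rows * Long.BYTES).order(ByteOrder.nativeOrder());
    }

    /** Writes an unsigned 64-bit value; rejects values outside [0, 2^64). */
    static void write(ByteBuffer buf, int row, BigInteger value) {
        if (value.signum() < 0 || value.compareTo(TWO_POW_64) >= 0) {
            throw new IllegalArgumentException("value out of UBIGINT range: " + value);
        }
        // longValue() keeps the low 64 bits, which is the unsigned bit pattern we want to store
        buf.putLong(row * Long.BYTES, value.longValue());
    }

    /** Reads the 64-bit pattern back and interprets it as unsigned. */
    static BigInteger read(ByteBuffer buf, int row) {
        long bits = buf.getLong(row * Long.BYTES);
        return new BigInteger(Long.toUnsignedString(bits));
    }
}
```

Because both the write and the read go through the buffer's configured byte order, the same code behaves consistently on little-endian and big-endian hosts.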

Testing

  • Added broad scalar UDF coverage in TestScalarFunctions

Co-Authored-By: Luis Fernando Kauer <lfkauer@yahoo.com.br>
@staticlibs staticlibs merged commit 225c062 into duckdb:v1.5-variegata Apr 12, 2026
11 checks passed
@staticlibs staticlibs deleted the scalar_functions_15 branch April 12, 2026 21:17
staticlibs added a commit to staticlibs/duckdb-java that referenced this pull request Apr 12, 2026
This is a backport of the PR duckdb#648 to `v1.5-variegata` stable branch.

This PR is a continuation of duckdb#630; it adds support for writing DuckDB
table functions in Java.

Documentation is added to UDF.md.

Testing: new test added