Background
datafusion-java provides a JVM binding to DataFusion via JNI. To distribute it through Maven Central, we need a packaging strategy that delivers the compiled Rust native library (.so / .dylib / .dll) alongside the Java classes so that consumers get a working artifact with a single dependency declaration — no separate native install step.
Goal
Publish a single artifact to Maven Central that works out of the box on:
- Linux x86_64
- Linux aarch64
- macOS x86_64
- macOS aarch64
Windows (x86_64) support is desirable but out of scope for the initial release. The design should leave room to add it later without restructuring.
Proposed approach: single fat JAR
Bundle all platform-specific native libraries in one published JAR, organized by OS/arch under a known resource path:
org/apache/datafusion/linux/amd64/libdatafusion_jni.so
org/apache/datafusion/linux/aarch64/libdatafusion_jni.so
org/apache/datafusion/darwin/x86_64/libdatafusion_jni.dylib
org/apache/datafusion/darwin/aarch64/libdatafusion_jni.dylib
At runtime, a loader class detects the current OS/arch, extracts the matching library from the JAR to a temp file, and calls System.load() on the absolute path. A System.loadLibrary() attempt should come first so users can override with a system-installed build.
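The loading sequence described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual loader: the class name `NativeLoaderSketch`, its method names, and the temp-file naming are assumptions made for the example. Only the resource layout and the `System.loadLibrary()` then `System.load()` order come from the proposal.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

/** Hypothetical loader sketch; class and method names are illustrative only. */
final class NativeLoaderSketch {
    private static final String BASE = "org/apache/datafusion";
    private static final String LIB = "datafusion_jni";
    private static boolean loaded = false;

    /** Maps os.name / os.arch values onto the resource layout listed in the proposal. */
    static String resourcePath(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        boolean arm64 = arch.equals("aarch64") || arch.equals("arm64");
        boolean x64 = arch.equals("amd64") || arch.equals("x86_64");
        if (!arm64 && !x64) {
            throw new UnsupportedOperationException("Unsupported architecture: " + osArch);
        }
        if (os.startsWith("linux")) {
            return BASE + "/linux/" + (arm64 ? "aarch64" : "amd64") + "/lib" + LIB + ".so";
        }
        if (os.startsWith("mac") || os.startsWith("darwin")) {
            return BASE + "/darwin/" + (arm64 ? "aarch64" : "x86_64") + "/lib" + LIB + ".dylib";
        }
        throw new UnsupportedOperationException("Unsupported OS: " + osName);
    }

    /** Tries a system-installed library first, then falls back to the copy bundled in the JAR. */
    static synchronized void load() throws IOException {
        if (loaded) {
            return;
        }
        try {
            System.loadLibrary(LIB);
            loaded = true;
            return;
        } catch (UnsatisfiedLinkError notInstalled) {
            // No build found on java.library.path; fall through to the bundled copy.
        }
        String resource = resourcePath(System.getProperty("os.name"), System.getProperty("os.arch"));
        String suffix = resource.substring(resource.lastIndexOf('.'));
        try (InputStream in = NativeLoaderSketch.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Native library not bundled: " + resource);
            }
            Path tmp = Files.createTempFile("lib" + LIB, suffix);
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            // System.load() requires an absolute path.
            System.load(tmp.toAbsolutePath().toString());
            loaded = true;
        }
    }
}
```

Keeping `resourcePath` a pure function of the two system properties makes the platform mapping easy to unit test without a native library present. Note that the JVM typically reports `amd64` on Linux and `x86_64` on macOS for the same hardware architecture, which is why the layout uses different directory names per OS.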
This mirrors the approach used by Apache DataFusion Comet (referenced only as prior art for fat-JAR packaging — datafusion-java is not otherwise related to Comet or Spark). The alternative — publishing one JAR per platform with Maven classifiers — is also viable but pushes platform selection onto consumers and complicates dependency declarations.
Work items
- […] System.load(), with a System.loadLibrary() fallback. Include temp-file locking to handle concurrent JVMs.
- […] target/classes/... path before mvn package runs.
- […] mvn deploy.
Future work
- […] .dll and the win32 path segment so this becomes a build-matrix change. Windows complicates temp-file cleanup (a loaded DLL cannot be deleted), so extract to a versioned path and let the OS handle cleanup.
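One possible way to combine the temp-file locking and versioned extraction path mentioned above is sketched below. This is an illustration under assumptions rather than a proposed implementation: the class `VersionedExtractorSketch`, the `<dir>/<version>/<fileName>` layout, and the `.lock` / `.part` file names are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical extractor sketch; names and directory layout are illustrative only. */
final class VersionedExtractorSketch {
    /**
     * Extracts the library to dir/version/fileName at most once. A lock file guards the
     * write against other JVMs; the synchronized keyword guards against other threads in
     * this JVM, where a second lock attempt would throw OverlappingFileLockException.
     */
    static synchronized Path extract(InputStream source, Path dir, String version, String fileName)
            throws IOException {
        Path versionDir = Files.createDirectories(dir.resolve(version));
        Path target = versionDir.resolve(fileName);
        Path lockFile = versionDir.resolve(fileName + ".lock");
        try (FileChannel channel = FileChannel.open(
                     lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            if (!Files.exists(target)) {
                // Write to a partial file first so a reader never sees a half-written library.
                Path partial = Files.createTempFile(versionDir, fileName, ".part");
                try {
                    Files.copy(source, partial, StandardCopyOption.REPLACE_EXISTING);
                    Files.move(partial, target, StandardCopyOption.ATOMIC_MOVE);
                } finally {
                    Files.deleteIfExists(partial);
                }
            }
        }
        return target;
    }
}
```

Because the target path is stable for a given version, a second JVM reuses the already-extracted file instead of writing a new one, and nothing has to delete the library while it is loaded. Stale versions would still accumulate until something cleans the directory, which this sketch does not address.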