Publish fat JAR with platform-specific native libraries to Maven Central #33

@andygrove

Background

datafusion-java provides a JVM binding to DataFusion via JNI. To distribute it through Maven Central, we need a packaging strategy that delivers the compiled Rust native library (.so / .dylib / .dll) alongside the Java classes so that consumers get a working artifact with a single dependency declaration — no separate native install step.

Goal

Publish a single artifact to Maven Central that works out of the box on:

  • Linux x86_64
  • Linux aarch64
  • macOS x86_64
  • macOS aarch64

Windows (x86_64) support is desirable but out of scope for the initial release. The design should leave room to add it later without restructuring.

Proposed approach: single fat JAR

Bundle all platform-specific native libraries in one published JAR, organized by OS/arch under a known resource path:

org/apache/datafusion/linux/amd64/libdatafusion_jni.so
org/apache/datafusion/linux/aarch64/libdatafusion_jni.so
org/apache/datafusion/darwin/x86_64/libdatafusion_jni.dylib
org/apache/datafusion/darwin/aarch64/libdatafusion_jni.dylib

At runtime, a loader class detects the current OS/arch, extracts the matching library from the JAR to a temp file, and calls System.load() on the absolute path. A System.loadLibrary() attempt should come first so users can override with a system-installed build.
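
A minimal sketch of that load order, assuming a hypothetical class name NativeLoader and helper names that are not part of the existing codebase. It relies on the JVM reporting os.arch as "amd64" on Linux and "x86_64" on macOS for the same hardware, which is why the resource layout above uses both spellings. Concurrency handling is left out here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

/** Hypothetical loader sketch; all names here are illustrative. */
final class NativeLoader {
    private static final String LIB_NAME = "datafusion_jni";
    private static final String RESOURCE_ROOT = "org/apache/datafusion";
    private static boolean loaded;

    private NativeLoader() {}

    /** Maps the os.name system property to the OS path segment. */
    static String osSegment(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("linux")) return "linux";
        if (os.contains("mac") || os.contains("darwin")) return "darwin";
        throw new UnsupportedOperationException("Unsupported OS: " + osName);
    }

    /** Maps os.arch to the segment used in the resource layout for that OS. */
    static String archSegment(String osSegment, String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("aarch64") || arch.equals("arm64")) return "aarch64";
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            return osSegment.equals("linux") ? "amd64" : "x86_64";
        }
        throw new UnsupportedOperationException("Unsupported arch: " + osArch);
    }

    /** Builds the classpath resource path of the bundled library. */
    static String resourcePath(String osName, String osArch) {
        String os = osSegment(osName);
        String file = "lib" + LIB_NAME + (os.equals("darwin") ? ".dylib" : ".so");
        return RESOURCE_ROOT + "/" + os + "/" + archSegment(os, osArch) + "/" + file;
    }

    /** Tries a system-installed library first, then the copy bundled in the JAR. */
    static synchronized void load() throws IOException {
        if (loaded) return;
        try {
            System.loadLibrary(LIB_NAME); // lets users override via java.library.path
            loaded = true;
            return;
        } catch (UnsatisfiedLinkError ignored) {
            // No system-installed build found; fall back to the bundled library.
        }
        String resource = resourcePath(
            System.getProperty("os.name"), System.getProperty("os.arch"));
        try (InputStream in = NativeLoader.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new UnsatisfiedLinkError("No bundled native library at " + resource);
            }
            String suffix = resource.substring(resource.lastIndexOf('.'));
            Path tmp = Files.createTempFile(LIB_NAME, suffix);
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            System.load(tmp.toAbsolutePath().toString()); // System.load needs an absolute path
            loaded = true;
        }
    }
}
```

Java classes that declare native methods would call NativeLoader.load() from a static initializer before the first JNI call.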

This mirrors the approach used by Apache DataFusion Comet (referenced only as prior art for fat-JAR packaging — datafusion-java is not otherwise related to Comet or Spark). The alternative — publishing one JAR per platform with Maven classifiers — is also viable but pushes platform selection onto consumers and complicates dependency declarations.

Work items

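  • Add a native loader class that first tries System.loadLibrary() so a system-installed build can override the bundled one, then falls back to detecting OS/arch, extracting the matching library from the resource path, and loading it via System.load(). Include temp-file locking to handle concurrent JVMs.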
  • Set up cross-compilation for the four target platforms (Linux x86_64, Linux aarch64, macOS x86_64, macOS aarch64). Options: a CI matrix that produces per-arch artifacts, or Docker + OSXCross for cross-platform builds from a single host.
  • Wire the build so compiled libraries land at the correct target/classes/... path before mvn package runs.
  • Add a GitHub Actions release workflow: matrix builds per platform produce native libs as artifacts; a final job assembles them into the resource tree and runs mvn deploy.
  • Configure Maven Central / Sonatype publishing: staging repo, GPG signing, POM metadata.
  • Document the release process.
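
One way the temp-file locking in the first work item could look, as a sketch only. LockedExtractor and extractOnce are hypothetical names. The approach assumes a sibling ".lock" file guarded by java.nio FileLock, which coordinates separate JVMs; threads inside one JVM are serialized by the synchronized keyword, because overlapping FileLock requests within a single JVM throw instead of blocking.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper; names are illustrative. */
final class LockedExtractor {
    private LockedExtractor() {}

    /**
     * Copies source to target unless target already exists. An exclusive lock on
     * a sibling ".lock" file ensures only one JVM writes the file at a time.
     */
    static synchronized Path extractOnce(InputStream source, Path target) throws IOException {
        Path dest = target.toAbsolutePath();
        Files.createDirectories(dest.getParent());
        Path lockFile = dest.resolveSibling(dest.getFileName() + ".lock");
        try (FileChannel channel = FileChannel.open(
                 lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) { // blocks until other JVMs release the lock
            if (!Files.exists(dest)) {
                // Write to a partial file first so readers never see a half-copied library.
                Path partial = Files.createTempFile(dest.getParent(), "extract", ".tmp");
                Files.copy(source, partial, StandardCopyOption.REPLACE_EXISTING);
                Files.move(partial, dest, StandardCopyOption.ATOMIC_MOVE);
            }
        }
        return dest;
    }
}
```

The partial file lives in the same directory as the destination so the final rename stays on one filesystem, which is what makes the atomic move possible.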

Future work

  • Windows x86_64 support. The loader OS enum should already account for .dll and the win32 path segment so this becomes a build-matrix change. Windows complicates temp-file cleanup because a DLL cannot be deleted while it is loaded, so extract to a versioned path and let the OS handle cleanup.
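
A sketch of what an OS enum with Windows already accounted for might look like, together with a versioned extraction path. The enum name Os, the "datafusion-jni-" directory prefix, and the method names are assumptions for illustration. The Windows file name has no "lib" prefix, matching what System.mapLibraryName produces on that platform.

```java
import java.nio.file.Path;

/** Hypothetical OS enum; names and the directory prefix are illustrative. */
enum Os {
    LINUX("linux", "lib", ".so"),
    DARWIN("darwin", "lib", ".dylib"),
    WINDOWS("win32", "", ".dll"); // present in the enum before any Windows build exists

    final String pathSegment;
    private final String prefix;
    private final String suffix;

    Os(String pathSegment, String prefix, String suffix) {
        this.pathSegment = pathSegment;
        this.prefix = prefix;
        this.suffix = suffix;
    }

    /** Platform-specific file name, e.g. libdatafusion_jni.so or datafusion_jni.dll. */
    String libraryFileName(String baseName) {
        return prefix + baseName + suffix;
    }

    /**
     * Stable per-version location under tmpDir. Because the path does not change
     * between runs, a later JVM can reuse the file instead of deleting a loaded DLL.
     */
    Path versionedExtractionPath(Path tmpDir, String version, String arch) {
        return tmpDir
            .resolve("datafusion-jni-" + version)
            .resolve(pathSegment)
            .resolve(arch)
            .resolve(libraryFileName("datafusion_jni"));
    }
}
```

With this shape, adding Windows would mean building the DLL and placing it under the win32 segment, with no change to the loader logic.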
