dev/release/README.md: 50 additions, 0 deletions
@@ -158,3 +158,53 @@ svn ls https://dist.apache.org/repos/dist/release/datafusion | grep datafusion-j
svn delete -m "delete old DataFusion Java release" \
  https://dist.apache.org/repos/dist/release/datafusion/datafusion-java-0.1.0
```

## Binary Release: Multi-Platform JAR

Source tarballs are the official Apache release artifact, but consumers
also expect a JAR on Maven Central that bundles native libraries
for the common platforms. This section covers building that JAR and
staging it for publication.

### Prerequisites (release manager machine)

- macOS host (Apple Silicon or Intel)
- Docker Desktop running with BuildKit enabled
- Java 17+
- Rust toolchain via rustup
- `gpg` configured with a key listed in the ASF KEYS file
- `xmllint` on `PATH` (pre-installed on macOS; `libxml2-utils` on Debian/Ubuntu)
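The prerequisites above can be checked before starting a long build. This is a sketch, not part of the repository: `require_cmds` and `java_major` are hypothetical helpers. `java_major` extracts the major version from `java -version` output and, unlike a plain split on `.`, also maps the legacy `1.8.0_292` scheme to `8`.

```shell
# Hypothetical preflight helpers for the release-manager machine.
# Neither function exists in the datafusion-java repository.

# Report every command in "$@" that is not on PATH; return 1 if any is missing.
require_cmds() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Print the Java major version parsed from `java -version` output ($1).
java_major() {
  local version
  # The version is the first double-quoted field on the "version" line.
  version=$(printf '%s\n' "$1" | awk -F '"' '/version/ {print $2; exit}')
  case "$version" in
    1.*) version=${version#1.} ;;  # legacy scheme: "1.8.0_292" -> "8.0_292"
  esac
  # Keep only the leading digits: "17.0.2" -> "17", "21-ea" -> "21".
  version=${version%%[!0-9]*}
  printf '%s\n' "$version"
}

# Example use (commented out so defining the helpers has no side effects):
# require_cmds docker java cargo rustup gpg xmllint unzip || exit 1
# major=$(java_major "$(java -version 2>&1)")
# [ "${major:-0}" -ge 17 ] || { echo "Java 17+ is required" >&2; exit 1; }
```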

### Build the multi-platform JAR

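`build-release.sh` builds two Linux Docker images (one for `linux/amd64`,
one for `linux/arm64`) and runs a container from each; the container
clones the repository and builds the native `.so` library. The script
then builds the two macOS `.dylib` libraries directly on the host: one
for the host architecture and one cross-compiled for the other. All four
libraries are copied into the JAR's resource tree at
`org/apache/datafusion/<os>/<arch>/libdatafusion_jni.<ext>`, where
`<os>/<arch>` is `linux/amd64`, `linux/aarch64`, `darwin/x86_64`, or
`darwin/aarch64`, and the JAR is then installed into a temporary local
Maven repository.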
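The directory names are not uniform across platforms: 64-bit x86 is `amd64` on Linux but `x86_64` on macOS. The sketch below maps `uname -s`/`uname -m` output to the resource path that `build-release.sh` writes for each platform, which helps when checking a JAR by hand. `native_resource_path` is a hypothetical helper, not part of the repository. The naming likely mirrors the JVM's `os.arch` values on each platform, but that is an assumption, since the Java-side loader is not part of this diff.

```shell
# Hypothetical helper: print the JAR resource path of the native library
# for a given `uname -s` ($1) and `uname -m` ($2). Directory names match
# the ones build-release.sh creates under core/target/classes.
native_resource_path() {
  local os arch ext
  case "$1" in
    Linux)  os=linux;  ext=so ;;
    Darwin) os=darwin; ext=dylib ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
  case "$os/$2" in
    linux/x86_64)                  arch=amd64 ;;    # Linux x86-64 uses "amd64"
    linux/aarch64 | linux/arm64)   arch=aarch64 ;;
    darwin/x86_64)                 arch=x86_64 ;;   # macOS Intel keeps "x86_64"
    darwin/arm64 | darwin/aarch64) arch=aarch64 ;;  # Apple Silicon reports "arm64"
    *) echo "unsupported arch: $2 on $1" >&2; return 1 ;;
  esac
  printf 'org/apache/datafusion/%s/%s/libdatafusion_jni.%s\n' "$os" "$arch" "$ext"
}

# Example: native_resource_path "$(uname -s)" "$(uname -m)"
```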

```shell
./dev/release/build-release.sh
```

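By default, the containers clone `main` from
`https://github.com/apache/datafusion-java.git`, while the macOS libraries
and the JAR are built from the local checkout. For a release, pass the
release tag with `-b` (and a repository URL with `-r` if needed) and check
out the same tag locally, so that all four libraries come from the same
commit. `-t` sets the tag of the two builder images.

When the build finishes, the script prints the local Maven repo path and
the JAR path, and lists the bundled native libraries. To inspect the JAR
again later, set `JAR` to the printed JAR path and confirm that all four
native libraries are present: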

```shell
unzip -l "$JAR" | grep -E 'libdatafusion_jni\.(so|dylib)$'
```
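The check can also be scripted so that a missing platform fails loudly instead of relying on someone reading the listing. This sketch assumes the four resource paths produced by `build-release.sh`; `check_native_libs` is a hypothetical helper that reads `unzip -l` output on stdin and compares the last column of each line exactly.

```shell
# Hypothetical helper: return 1 (and name the gaps) unless the `unzip -l`
# listing on stdin contains all four bundled native libraries.
check_native_libs() {
  local listing missing=0 path
  listing=$(cat)
  for path in \
    org/apache/datafusion/linux/amd64/libdatafusion_jni.so \
    org/apache/datafusion/linux/aarch64/libdatafusion_jni.so \
    org/apache/datafusion/darwin/x86_64/libdatafusion_jni.dylib \
    org/apache/datafusion/darwin/aarch64/libdatafusion_jni.dylib; do
    # unzip -l prints the entry name as the last field of each line.
    if ! printf '%s\n' "$listing" \
        | awk -v p="$path" '$NF == p { found = 1 } END { exit !found }'; then
      echo "missing from JAR: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: unzip -l "$JAR" | check_native_libs && echo "all four native libraries present"
```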

### Publish to Apache Nexus staging

Once the local Maven repo from `build-release.sh` looks correct, use
`publish-to-maven.sh` to sign the artifacts and upload them to Apache
Nexus staging, passing your ASF username and the local Maven repo path:

```shell
./dev/release/publish-to-maven.sh -u <asf-username> -r <local-maven-repo-path>
```

The script prompts for your ASF password and GPG passphrase, creates a
staging repository on `repository.apache.org`, signs every artifact,
uploads the signed artifacts, and closes the staging repository. Before
promoting the staging repository to a release, check in the Nexus UI
that the staged artifacts are correct.
dev/release/build-release.sh: 175 additions, 0 deletions
@@ -0,0 +1,175 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Build a multi-platform datafusion-java JAR bundling native libs for
# linux/amd64, linux/aarch64, darwin/x86_64, and darwin/aarch64. The
# resulting JAR is installed into a temporary local Maven repository
# whose path is printed at the end. This script must run on a macOS host.

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null && pwd)"
PROJECT_HOME="$(cd "${SCRIPT_DIR}/../.." >/dev/null && pwd)"

REPO="https://github.com/apache/datafusion-java.git"
BRANCH="main"
IMGTAG="latest"

function usage {
  cat <<EOF
Usage: $(basename "$0") [-r repo] [-b branch] [-t tag]

  -r repo     git repo URL to clone inside the linux build containers
              (default: ${REPO})
  -b branch   git branch or tag to check out inside the containers
              (default: ${BRANCH})
  -t tag      docker image tag for the two builder images
              (default: ${IMGTAG})

Produces a multi-platform datafusion-java JAR in a temporary local Maven
repo; the repo path is printed at the end. The host must be macOS.
EOF
  exit 1
}

while getopts "r:b:t:h" opt; do
  case "$opt" in
    r) REPO="$OPTARG" ;;
    b) BRANCH="$OPTARG" ;;
    t) IMGTAG="$OPTARG" ;;
    h|*) usage ;;
  esac
done

if [ "$(uname -s)" != "Darwin" ]; then
  echo "This script must run on a macOS host (got: $(uname -s))." >&2
  exit 1
fi

JAVA_VERSION=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}' | awk -F '.' '{print $1}')
if [ -z "$JAVA_VERSION" ] || [ "$JAVA_VERSION" -lt 17 ]; then
  echo "Java 17+ is required. Found: $(java -version 2>&1 | head -n 1)" >&2
  exit 1
fi

HOST_ARCH="$(uname -m)" # arm64 (Apple Silicon) or x86_64 (Intel)

case "$HOST_ARCH" in
  arm64)
    HOST_DARWIN_DIR="aarch64"
    OTHER_DARWIN_TARGET="x86_64-apple-darwin"
    OTHER_DARWIN_DIR="x86_64"
    ;;
  x86_64)
    HOST_DARWIN_DIR="x86_64"
    OTHER_DARWIN_TARGET="aarch64-apple-darwin"
    OTHER_DARWIN_DIR="aarch64"
    ;;
  *)
    echo "Unsupported macOS arch: $HOST_ARCH" >&2
    exit 1
    ;;
esac

CONTAINER_AMD64="datafusion-java-amd64-builder-container"
CONTAINER_ARM64="datafusion-java-arm64-builder-container"
IMAGE_AMD64="datafusion-java-rm-amd64:${IMGTAG}"
IMAGE_ARM64="datafusion-java-rm-arm64:${IMGTAG}"

CLEANUP=1
cleanup() {
  [ "$CLEANUP" != "0" ] || return 0
  echo "Cleaning up build containers..."
  docker rm -f "$CONTAINER_AMD64" "$CONTAINER_ARM64" >/dev/null 2>&1 || true
  CLEANUP=0
}
trap cleanup SIGINT SIGTERM EXIT

echo "Cleaning leftover builder containers from any prior interrupted run"
docker rm -f "$CONTAINER_AMD64" "$CONTAINER_ARM64" >/dev/null 2>&1 || true

echo "Cleaning previous Java and native build output"
(cd "$PROJECT_HOME" && ./mvnw -q clean)
(cd "$PROJECT_HOME/native" && cargo clean)

echo "Building amd64 builder image"
docker build --no-cache \
  --platform=linux/amd64 \
  -t "$IMAGE_AMD64" \
  "$SCRIPT_DIR/datafusion-java-rm"

echo "Building arm64 builder image"
docker build --no-cache \
  --platform=linux/arm64 \
  -t "$IMAGE_ARM64" \
  "$SCRIPT_DIR/datafusion-java-rm"

echo "Building linux/amd64 native lib"
docker run --name "$CONTAINER_AMD64" \
  --platform=linux/amd64 \
  "$IMAGE_AMD64" "$REPO" "$BRANCH"

echo "Building linux/aarch64 native lib"
docker run --name "$CONTAINER_ARM64" \
  --platform=linux/arm64 \
  "$IMAGE_ARM64" "$REPO" "$BRANCH"

JVM_TARGET_DIR="$PROJECT_HOME/core/target/classes/org/apache/datafusion"

mkdir -p "$JVM_TARGET_DIR/linux/amd64"
docker cp \
  "$CONTAINER_AMD64:/opt/datafusion-java-rm/datafusion-java/native/target/release/libdatafusion_jni.so" \
  "$JVM_TARGET_DIR/linux/amd64/"

mkdir -p "$JVM_TARGET_DIR/linux/aarch64"
docker cp \
  "$CONTAINER_ARM64:/opt/datafusion-java-rm/datafusion-java/native/target/release/libdatafusion_jni.so" \
  "$JVM_TARGET_DIR/linux/aarch64/"

echo "Building macOS native libs on the host (host=$HOST_ARCH)"
rustup target add "$OTHER_DARWIN_TARGET"

(cd "$PROJECT_HOME/native" && cargo build --release)
(cd "$PROJECT_HOME/native" && cargo build --release --target "$OTHER_DARWIN_TARGET")

mkdir -p "$JVM_TARGET_DIR/darwin/$HOST_DARWIN_DIR"
cp "$PROJECT_HOME/native/target/release/libdatafusion_jni.dylib" \
  "$JVM_TARGET_DIR/darwin/$HOST_DARWIN_DIR/"

mkdir -p "$JVM_TARGET_DIR/darwin/$OTHER_DARWIN_DIR"
cp "$PROJECT_HOME/native/target/$OTHER_DARWIN_TARGET/release/libdatafusion_jni.dylib" \
  "$JVM_TARGET_DIR/darwin/$OTHER_DARWIN_DIR/"

echo "Installing JAR into local Maven repo"
LOCAL_REPO=$(mktemp -d /tmp/datafusion-java-staging-repo-XXXXXX)
(cd "$PROJECT_HOME" && ./mvnw \
  "-Dmaven.repo.local=$LOCAL_REPO" \
  "-Ddatafusion.native.profile=release" \
  -DskipTests install)

echo ""
echo "===================================================================="
echo "Multi-platform JAR installed to local Maven repo: $LOCAL_REPO"
JAR_PATH=$(find "$LOCAL_REPO/org/apache/datafusion/datafusion-java" -name 'datafusion-java-*.jar' \
  -not -name '*-sources.jar' -not -name '*-javadoc.jar' | head -n 1)
echo "JAR: $JAR_PATH"
echo "Bundled native libraries:"
unzip -l "$JAR_PATH" | grep -E 'libdatafusion_jni\.(so|dylib)$' || true
echo "===================================================================="
dev/release/datafusion-java-rm/Dockerfile: 71 additions, 0 deletions
@@ -0,0 +1,71 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
FROM ubuntu:20.04

USER root

ENV DEBIAN_FRONTEND=noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN=true
ENV LC_ALL=C

RUN export LC_ALL=C \
    && apt-get update \
    && apt-get install --no-install-recommends -y \
        ca-certificates \
        build-essential \
        curl \
        wget \
        git \
        llvm \
        clang \
        libssl-dev \
        cmake \
        cpio \
        libxml2-dev \
        patch \
        bzip2 \
        libbz2-dev \
        zlib1g-dev \
        default-jdk \
        unzip \
        gcc-10 \
        g++-10 \
        cpp-10
ENV CC="gcc-10"
ENV CXX="g++-10"

# protoc: pick the release asset for the container's arch, read from
# $(uname -m), which follows the image's `docker build --platform`
RUN PB_REL="https://github.com/protocolbuffers/protobuf/releases" \
    && ARCH="$(uname -m)" \
    && case "$ARCH" in \
        x86_64) PB_ARCH="x86_64" ;; \
        aarch64) PB_ARCH="aarch_64" ;; \
        *) echo "Unsupported arch: $ARCH" >&2; exit 1 ;; \
    esac \
    && curl -LO "$PB_REL/download/v30.2/protoc-30.2-linux-${PB_ARCH}.zip" \
    && unzip "protoc-30.2-linux-${PB_ARCH}.zip" -d /root/.local \
    && rm "protoc-30.2-linux-${PB_ARCH}.zip"
ENV PATH="$PATH:/root/.local/bin"

# Rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"

COPY build-native-libs.sh /opt/datafusion-java-rm/build-native-libs.sh
WORKDIR /opt/datafusion-java-rm

ENTRYPOINT ["/opt/datafusion-java-rm/build-native-libs.sh"]
dev/release/datafusion-java-rm/build-native-libs.sh: 45 additions, 0 deletions
@@ -0,0 +1,45 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Build the datafusion_jni release native lib inside a Linux container.
# The container's --platform determines the target arch; we just build.

set -euo pipefail

REPO=${1:-}
BRANCH=${2:-}

if [ -z "$REPO" ] || [ -z "$BRANCH" ]; then
  echo "Usage: $0 <git-repo-url> <branch-or-tag>" >&2
  exit 1
fi

echo "Building datafusion_jni for $(uname -m) from ${REPO}/${BRANCH}"

rm -rf datafusion-java
git clone "$REPO" datafusion-java
cd datafusion-java
git checkout "$BRANCH"

cd native
cargo build --release

echo "Built $(pwd)/target/release/libdatafusion_jni.so"
ls -l target/release/libdatafusion_jni.so