[HUDI-4985] Update ARM64 Docker demo images to use latest Hudi version #13902

Merged
xushiyan merged 2 commits into apache:master from deepakpanda93:br_dockerdemo_HUDI-4985
Sep 19, 2025

@deepakpanda93
Collaborator

Change Logs

Update ARM64 Docker demo images to use latest Hudi version

Impact

None

Risk level (write none, low medium or high below)

none

Documentation Update

We will raise another PR to make the necessary documentation changes for the Docker demo.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Sep 16, 2025
Collaborator

Did you add libleveldbjni.so because of https://issues.apache.org/jira/browse/HADOOP-16614?

Member

@deepakpanda93 what about the other PR?
#13829 (comment)
I wanted to continue discussing the use of the official Hadoop image instead of maintaining a custom one.

If you're moving to another PR, please make sure the two are linked. Otherwise the previous review is lost when the new PR is raised.

Collaborator Author

Hello @xushiyan

For now, a Hadoop 3.3.4 Docker image is not available. The Spark version we are working with supports Hadoop 3.3.4, which is why I am still using the base image approach.

The other PR targeted Hudi 1.0.2, and I think Apache guidelines may not allow pushing to the 1.0.2 branch, so I created this PR against the master branch.

Collaborator Author

Hello @rangareddy

The Docker image runs on an AARCH64 platform, and the absence of a compatible libleveldbjni library was preventing the HistoryServer from starting. To resolve this, I added the appropriate libleveldbjni version that supports AARCH64.

Error:

ERROR applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer

java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /tmp/libleveldbjni-64-1-8976797200427282087.8: /tmp/libleveldbjni-64-1-8976797200427282087.8: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a AARCH64-bit platform)]
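
Before starting the HistoryServer, one way to check the first part of this error (no library found in java.library.path) is to look for the shared object in each directory on that path. The following is a minimal sketch, not part of this PR: the function name `find_native_lib` and its arguments are hypothetical. It only checks that the file exists and does not verify that the library was built for AARCH64.

```shell
# Hypothetical helper: print the first directory in a colon-separated
# library path that contains the named shared object; return 1 if none does.
# It checks presence only, not the architecture the library was built for.
find_native_lib() {
  lib_name=$1
  lib_path=$2
  old_ifs=$IFS
  IFS=:
  for dir in $lib_path; do
    if [ -n "$dir" ] && [ -f "$dir/$lib_name" ]; then
      IFS=$old_ifs
      printf '%s\n' "$dir"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}
```

A call such as `find_native_lib libleveldbjni.so "$SOME_LIBRARY_PATH"` (where the variable is an assumption) would print the matching directory or exit non-zero.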

Member

@xushiyan xushiyan left a comment

Please confirm the plan as commented, and fix the Hive sync script issue. Then we're good to go.

  HIVE_JDBC=`ls ${HIVE_HOME}/lib/hive-jdbc-*.jar | grep -v handler | tr '\n' ':'`
  fi
- HIVE_JARS=$HIVE_METASTORE:$HIVE_SERVICE:$HIVE_EXEC:$HIVE_JDBC
+ HIVE_JARS=$HIVE_METASTORE:$HIVE_SERVICE:$HIVE_EXEC:$HIVE_JDBC:${HIVE_HOME}/lib/calcite-core-1.16.0.jar:${HIVE_HOME}/lib/libfb303-0.9.3.jar

Member

This could break if the jars are not available. How do we ensure the jars are there when a user runs this script? You're changing this because of the Docker setup change, right? But this script is not intended only for the Docker demo, so you'll need to find a way to decouple this.

Member

@deepakpanda93 this is the only blocking issue for this PR.

Collaborator Author

@xushiyan The jars calcite-core-1.16.0.jar and libfb303-0.9.3.jar are already available in the Docker image, so we added them to the HiveSyncTool classpath to make it work without dependency errors.
No manual effort is needed to fetch the required jars; they are auto-resolved by the build system.

Member

I meant that this script is not used only within the Docker demo; users can run it as a standalone job for Hive sync. Outside of the Docker demo, how can these two jars always be available? Please have Aditya review PRs and provide guidance when in doubt.
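
One possible way to decouple the script from the Docker image is to add the extra jars to the classpath only when they exist, so a standalone run without them behaves as before. This is a sketch under that assumption and not the change made in this PR; the helper name `append_existing_jars` is hypothetical.

```shell
# Hypothetical helper: append each jar that exists on disk to a
# colon-separated classpath and skip any jar that is missing.
append_existing_jars() {
  classpath=$1
  shift
  for jar in "$@"; do
    if [ -f "$jar" ]; then
      # Add a ':' separator only when the classpath is non-empty.
      classpath="${classpath:+$classpath:}$jar"
    fi
  done
  printf '%s\n' "$classpath"
}

# Usage sketch, reusing the variable names from the script under review:
# HIVE_JARS=$(append_existing_jars \
#   "$HIVE_METASTORE:$HIVE_SERVICE:$HIVE_EXEC:$HIVE_JDBC" \
#   "${HIVE_HOME}/lib/calcite-core-1.16.0.jar" \
#   "${HIVE_HOME}/lib/libfb303-0.9.3.jar")
```

With this shape, a missing jar is skipped silently. Whether Hive sync then works without those jars depends on the Hive version in use, which this sketch does not address.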

- ARG HADOOP_VERSION=2.8.4
- ARG HIVE_VERSION=2.3.3
+ ARG HADOOP_VERSION=3.3.4
+ ARG HIVE_VERSION=3.1.3

Member

So, to confirm, this PR is the first step, and it makes these changes:

  • Build the same Docker stack as the current Docker demo and make sure it works on both amd64 and arm64.
  • Upgrade Hadoop to 3.3.4 and Hive to 3.1.3.
  • Users still follow the same steps to run the demo apps: first build the bundle jars, then specify this yml file for Docker Compose to run.

The next step is to add a notebook container to the stack so that users can work with it through a UI.

After that, we need to avoid building jars and simplify the Hadoop and other setup to keep the demo lightweight. @deepakpanda93 @rangareddy please confirm this is the plan to go with.

Collaborator

@rangareddy rangareddy Sep 17, 2025

Yes

We are planning a few enhancements for the upcoming releases:

  • Update the Hudi Docker demo steps on the official Hudi website with clearer, more user-friendly instructions.
  • Add Jupyter notebooks to the Docker setup for easier experimentation and tutorials.
  • Test compatibility on both Docker amd64 and arm64 architectures to ensure broader platform support.
  • Adopt Apache’s official Docker images instead of maintaining custom-built ones.
  • Use Hudi jars from Maven Central rather than relying on locally built artifacts, ensuring consistency and reproducibility.

@hudi-bot
Collaborator

CI report:

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

Comment on lines 20 to +23
HUDI_DEMO_ENV=$1
WS_ROOT=`dirname $SCRIPT_PATH`
COMPOSE_FILE_NAME="docker-compose_hadoop284_hive233_spark353_amd64.yml"
if [ "$HUDI_DEMO_ENV" = "--mac-aarch64" ]; then
COMPOSE_FILE_NAME="docker-compose_hadoop284_hive233_spark353_arm64.yml"
if [ "$(uname -m)" = "arm64" ]; then

Member

@deepakpanda93 in a follow-up PR, clean up HUDI_DEMO_ENV, since this change leaves it unused. Also update any docs (search the asf-site branch) that show how to use this script.
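
For the follow-up, note that `uname -m` reports `arm64` on Apple Silicon macOS but `aarch64` on ARM Linux hosts, so a check against `arm64` alone matches only the former. A minimal sketch of a mapping that accepts both is shown here; the helper name `compose_arch_suffix` is hypothetical and the suffixes are taken from the existing compose file names.

```shell
# Hypothetical helper: map a `uname -m` value to the architecture suffix
# used in the compose file names (amd64 or arm64).
compose_arch_suffix() {
  case "$1" in
    arm64|aarch64) echo "arm64" ;;
    x86_64|amd64)  echo "amd64" ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}

# Usage sketch:
# ARCH_SUFFIX=$(compose_arch_suffix "$(uname -m)") || exit 1
```

Failing on an unknown architecture, rather than defaulting to amd64, is a design choice in this sketch; the script in the PR may reasonably choose a default instead.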

# set up root directory
WS_ROOT=`dirname $SCRIPT_PATH`
COMPOSE_FILE_NAME="docker-compose_hadoop284_hive233_spark353_amd64.yml"
if [ "$HUDI_DEMO_ENV" = "--mac-aarch64" ]; then

Member

same as the setup demo script

@xushiyan xushiyan merged commit e3e8b73 into apache:master Sep 19, 2025
61 of 62 checks passed