[HUDI-4985] Update ARM64 Docker demo images to use latest Hudi version #13902
xushiyan merged 2 commits into apache:master from
Conversation
Did you add libleveldbjni.so due to https://issues.apache.org/jira/browse/HADOOP-16614?
@deepakpanda93 what about the other PR?
#13829 (comment)
I wanted to continue discussing the issue of using the official Hadoop image instead of maintaining a custom one.
If you're moving to another PR, please make sure they are linked; otherwise the previous review is lost by raising this new PR.
Hello @xushiyan
For now, an official Hadoop 3.3.4 Docker image is not available, and the Spark version we are working with requires Hadoop 3.3.4. This is why I am still on the base-image approach.
The other PR was for Hudi 1.0.2, and per Apache guidelines it may not be allowed to push to the 1.0.2 branch, so I created this PR against the master branch.
Hello @rangareddy
The Docker image runs on an AARCH64 platform, and the absence of a compatible libleveldbjni library was preventing the HistoryServer from starting. To resolve this, I added the appropriate libleveldbjni version that supports AARCH64.
Error:
ERROR applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /tmp/libleveldbjni-64-1-8976797200427282087.8: /tmp/libleveldbjni-64-1-8976797200427282087.8: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a AARCH64-bit platform)]
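The UnsatisfiedLinkError above comes from loading an amd64 build of the leveldbjni native library on an AArch64 host. A minimal sketch of the kind of platform check involved, assuming nothing about the actual Hudi scripts; the function name and messages are illustrative:

```shell
#!/bin/sh
# Sketch: decide whether a host needs the aarch64 build of libleveldbjni.
# The function name and the mapping are illustrative assumptions, not code
# from the Hudi repo. Linux reports "aarch64" for ARM64, macOS reports "arm64".
needs_aarch64_leveldbjni() {
  case "$1" in
    aarch64|arm64) return 0 ;;  # ARM64 hosts: the amd64 .so cannot be loaded
    *)             return 1 ;;  # x86_64 hosts: keep the amd64 .so
  esac
}

if needs_aarch64_leveldbjni "$(uname -m)"; then
  echo "use aarch64 libleveldbjni"
else
  echo "use amd64 libleveldbjni"
fi
```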
xushiyan left a comment
Please confirm the plan as commented, and fix the hive sync script issue. Then we're good to go.
    HIVE_JDBC=`ls ${HIVE_HOME}/lib/hive-jdbc-*.jar | grep -v handler | tr '\n' ':'`
  fi
- HIVE_JARS=$HIVE_METASTORE:$HIVE_SERVICE:$HIVE_EXEC:$HIVE_JDBC
+ HIVE_JARS=$HIVE_METASTORE:$HIVE_SERVICE:$HIVE_EXEC:$HIVE_JDBC:${HIVE_HOME}/lib/calcite-core-1.16.0.jar:${HIVE_HOME}/lib/libfb303-0.9.3.jar
This could break if the jars are not available. How do we ensure the jars are there when a user runs this script? You're fixing this because of the Docker setup change, right? But this script is not only intended for the Docker demo, so you'll need to figure out some way to decouple this.
@deepakpanda93 this is the only blocking issue for this pr
@xushiyan The jars like calcite-core-1.16.0.jar and libfb303-0.9.3.jar are already available in the Docker image; we added them to the HiveSyncTool classpath to make it work without dependency errors.
No manual effort is needed to fetch the required jars; they are auto-resolved by the build system.
I meant that this script is not only used within the Docker demo; users can run it as a standalone job for hive sync. Outside of the Docker demo, how can these two jars always be available? Please have Aditya review PRs and give your guidance when in doubt.
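One way to decouple the demo-only jars from the generic hive sync script, as the review asks, is to append a jar to the classpath only when it exists on disk. This is a sketch under assumptions: the helper name is invented, the `HIVE_HOME` default is illustrative, and the other `HIVE_*` variables stand in for the ones the real script computes:

```shell
#!/bin/sh
# Sketch: append optional jars to the hive sync classpath only if present,
# so the script still works outside the Docker demo where these jars may
# be missing. append_if_present is a hypothetical helper, not Hudi code.
HIVE_HOME="${HIVE_HOME:-/opt/hive}"

append_if_present() {
  # $1 = current classpath, $2 = candidate jar; echoes the new classpath
  if [ -f "$2" ]; then
    echo "$1:$2"
  else
    echo "$1"
  fi
}

# Placeholders for the jars the real script resolves earlier.
HIVE_JARS="${HIVE_METASTORE:-}:${HIVE_SERVICE:-}:${HIVE_EXEC:-}:${HIVE_JDBC:-}"
HIVE_JARS="$(append_if_present "$HIVE_JARS" "${HIVE_HOME}/lib/calcite-core-1.16.0.jar")"
HIVE_JARS="$(append_if_present "$HIVE_JARS" "${HIVE_HOME}/lib/libfb303-0.9.3.jar")"
```

With this guard, the demo image (where the jars exist) gets the extended classpath, while a standalone run simply skips them instead of producing a classpath entry that points at a missing file.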
- ARG HADOOP_VERSION=2.8.4
- ARG HIVE_VERSION=2.3.3
+ ARG HADOOP_VERSION=3.3.4
+ ARG HIVE_VERSION=3.1.3
So to confirm, this PR is the first step and makes these changes:
- keep the same Docker stack as the current docker demo, and make sure it works on both amd64 and arm64
- upgrade Hadoop to 3.3.4 and Hive to 3.1.3
- users still follow the same steps to run the demo apps: first build the bundle jars, then pass this yml to docker compose
The next step is to add a notebook container to the stack so users can work with it through a UI.
Then we need to avoid building jars and simplify the Hadoop and other setup to keep the demo lightweight. @deepakpanda93 @rangareddy please confirm this is the plan to go with.
Yes.
We are planning a few enhancements for the upcoming releases:
- Update the Hudi Docker demo steps on the official Hudi website with clearer, more user-friendly instructions.
- Add Jupyter notebooks to the Docker setup for easier experimentation and tutorials.
- Test compatibility on both Docker amd64 and arm64 architectures to ensure broader platform support.
- Adopt Apache’s official Docker images instead of maintaining custom-built ones.
- Use Hudi jars from Maven Central rather than relying on locally built artifacts, ensuring consistency and reproducibility.
  HUDI_DEMO_ENV=$1
  WS_ROOT=`dirname $SCRIPT_PATH`
  COMPOSE_FILE_NAME="docker-compose_hadoop284_hive233_spark353_amd64.yml"
- if [ "$HUDI_DEMO_ENV" = "--mac-aarch64" ]; then
+ if [ "$(uname -m)" = "arm64" ]; then
    COMPOSE_FILE_NAME="docker-compose_hadoop284_hive233_spark353_arm64.yml"
@deepakpanda93 in a follow-up PR, clean up HUDI_DEMO_ENV since you made it unused. Also update any docs (search the asf-site branch) that show how to use this script accordingly.
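The arch-based selection in the diff above can be sketched as follows. The compose file names are the ones shown in the diff; handling both `arm64` (macOS) and `aarch64` (Linux) spellings of `uname -m` output is an added assumption, not something the diff itself does:

```shell
#!/bin/sh
# Sketch of arch-based compose file selection. File names come from the
# diff under review; the aarch64 alternative in the case arm is an
# assumption to cover Linux ARM64 hosts, where uname -m prints "aarch64".
COMPOSE_FILE_NAME="docker-compose_hadoop284_hive233_spark353_amd64.yml"
case "$(uname -m)" in
  arm64|aarch64)
    COMPOSE_FILE_NAME="docker-compose_hadoop284_hive233_spark353_arm64.yml"
    ;;
esac
echo "$COMPOSE_FILE_NAME"
```

Detecting the architecture with `uname -m` removes the need for the user-facing `--mac-aarch64` flag, which is why `HUDI_DEMO_ENV` becomes unused.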
  # set up root directory
  WS_ROOT=`dirname $SCRIPT_PATH`
  COMPOSE_FILE_NAME="docker-compose_hadoop284_hive233_spark353_amd64.yml"
- if [ "$HUDI_DEMO_ENV" = "--mac-aarch64" ]; then
Change Logs
Update ARM64 Docker demo images to use latest Hudi version
Impact
None
Risk level (write none, low, medium or high below)
none
Documentation Update
We will raise another PR to do the necessary document changes for docker demo.
Contributor's checklist