
[jvm-packages] update local dev build process #4640

Merged
merged 1 commit into dmlc:master from thesuperzapper:fix_jvm_build on Jul 16, 2019

@thesuperzapper (Contributor) commented Jul 4, 2019

This PR changes the dev (non-CI) build process for xgboost4j to use a locally defined Dockerfile. The main purpose of this is to upgrade to gcc 5.X in line with PR #4538, as the old container can no longer build XGBoost.
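For context, a hedged sketch of how a CentOS 6 based image can pick up gcc 5.x through Software Collections (devtoolset-4 ships GCC 5.3); the exact packages and versions in this PR's Dockerfile may differ:

```bash
# Hypothetical steps a CentOS 6 dev image might use to get gcc 5.x (devtoolset-4 = GCC 5.3).
yum install -y centos-release-scl
yum install -y devtoolset-4-gcc devtoolset-4-gcc-c++ devtoolset-4-binutils

# Put the newer toolchain on PATH for the current shell, then verify.
source /opt/rh/devtoolset-4/enable
gcc --version
```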

@thesuperzapper (Contributor, Author) commented Jul 4, 2019

@CodingCat, this is the change we discussed for updating the Dockerfile used for local builds. Thoughts?

@CodingCat (Member) commented Jul 7, 2019

I don't understand why we need such changes to enable the new gcc. Doesn't it just require an update of the Docker image?

@thesuperzapper (Contributor, Author) commented Jul 7, 2019

It's better if we have the Docker image defined in the codebase, so people can see what image they are using to build.

The image I have used here is pretty much the same one we use in CI for JVM.

@CodingCat (Member) commented Jul 7, 2019

> It's better if we have the Docker image defined in the codebase, so people can see what image they are using to build.
>
> The image I have used here is pretty much the same one we use in CI for JVM.

  1. I do not think we should rebuild the image by installing everything every time. There are many, many times I use Docker to run unit tests on Linux (because of the different behavior of Linux vs. Mac), and rebuilding will waste a lot of time when we are in the dev cycle.

  2. Even if we install everything, we should not duplicate Dockerfiles, but rather use a bash script that refers to the one used by CI.

@thesuperzapper (Contributor, Author) commented Jul 7, 2019

@CodingCat are you referring to the MVN download, or the building/tagging of the Dockerfile?

  • The new build.sh script asks whether you want to rebuild the Docker image.
  • In terms of pointing at the CI one, there are a few differences, e.g. we don't use ENTRYPOINT or gosu. (And isn't it better if this is kept slightly separate, so changes to the CI image don't break what we are doing?)
  • I agree that we should look into stopping the MVN download each time; we can probably do this with a volume mount (see the sketch after this list).
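To illustrate the volume-mount idea in the last bullet, a minimal sketch (the image tag, mount paths, and Maven invocation are placeholders, not necessarily what this PR ends up using):

```bash
# Reuse a host-side Maven repository inside the build container so the jars
# are not re-downloaded on every run.
docker run --rm \
  -v "$PWD":/xgboost \
  -v "$HOME/.m2":/root/.m2 \
  xgboost4j-dev:latest \
  mvn -f /xgboost/jvm-packages/pom.xml package
```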
@CodingCat (Member) commented Jul 7, 2019

> @CodingCat are you referring to the MVN download, or the building/tagging of the Dockerfile?
>
>   • The new build.sh script asks whether you want to rebuild the Docker image.
>   • In terms of pointing at the CI one, there are a few differences, e.g. we don't use ENTRYPOINT or gosu. (And isn't it better if this is kept slightly separate, so changes to the CI image don't break what we are doing?)
>   • I agree that we should look into stopping the MVN download each time; we can probably do this with a volume mount.

I am talking about the yum install part.

@thesuperzapper (Contributor, Author) commented Jul 8, 2019

@CodingCat with this PR, if you run ./dev/build.sh it will ask, "do you want to rebuild/tag the Docker image?" If you say "no", it will just use a locally cached version (if available).

However, I agree that we should stop MVN trying to download the jars every time. (I will look into mounting the user's home folder.)

@CodingCat (Member) commented Jul 8, 2019

what's the glibc version used in this PR?

@thesuperzapper (Contributor, Author) commented Jul 8, 2019

As it's based on centos6, it uses ldd (GNU libc) 2.12.
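For reference, a quick way to confirm the glibc version an image ships with (the image tag here is a placeholder):

```bash
# CentOS 6 based images typically report: ldd (GNU libc) 2.12
docker run --rm xgboost4j-dev:latest ldd --version | head -n 1
```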

Resolved review threads: jvm-packages/dev/change_version.sh (outdated, 2 threads), jvm-packages/dev/package.sh (outdated), jvm-packages/dev/build.sh (outdated, 2 threads), jvm-packages/dev/Dockerfile
@thesuperzapper (Contributor, Author) commented Jul 8, 2019

@CodingCat I have made some changes:

  • The Maven cache is now mounted inside the ./dev folder, meaning it will not re-download the jars every time. (This folder has been added to .gitignore.)
  • I have renamed the files as you asked.
  • I have re-added the license comments to change_version.sh.
  • I have removed the prompt to rebuild the Docker image. (It will now just rebuild if the Dockerfile changes, or if the image has never been built locally before; see the sketch after this list.)
  • I have stopped build-linux.sh from creating a build.log.
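A minimal sketch of what such a "rebuild only when needed" check could look like (the Dockerfile path and the image tag scheme are assumptions, not necessarily what build.sh does):

```bash
#!/usr/bin/env bash
# Rebuild the dev image only if it is missing locally or the Dockerfile changed.
set -euo pipefail

DOCKERFILE="jvm-packages/dev/Dockerfile"          # assumed path
TAG="$(sha256sum "$DOCKERFILE" | cut -c1-12)"     # fingerprint the Dockerfile
IMAGE="xgboost4j-dev:${TAG}"                      # assumed tag scheme

# No image with this tag means the Dockerfile is new or has changed: build it.
if [[ -z "$(docker images -q "$IMAGE")" ]]; then
  docker build -t "$IMAGE" -f "$DOCKERFILE" jvm-packages/dev
fi
```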

Question:

  • Assuming you don't want the interactive prompt for skipping tests, what flag for build-linux.sh do you want me to create? (--skip-tests, -DskipTests, etc?)
@CodingCat (Member) left a comment

let's use --skip-tests

Resolved review thread: jvm-packages/dev/change_version.sh (outdated)

@thesuperzapper force-pushed the thesuperzapper:fix_jvm_build branch from cb88f1f to 1aaf4a6 on Jul 8, 2019

@thesuperzapper reopened this on Jul 8, 2019

@thesuperzapper (Contributor, Author) commented Jul 8, 2019

@CodingCat I have now made it so that tests can be skipped using --skip-tests, and Maven caches the jars locally in ./dev/.m2.

The way you run the build is: ./dev/build-linux.sh [--skip-tests]

@thesuperzapper (Contributor, Author) commented Jul 10, 2019

@CodingCat I have attempted to make a build-linux.cmd for the Windows Docker environment, but this can be a bit flaky, as package-linux.sh sometimes has its line endings changed to Windows style when you git clone.

Should we COPY package-linux.sh into the image via the Dockerfile, or are you happy having it referenced from a local volume mount?

@thesuperzapper (Contributor, Author) commented Jul 10, 2019

@CodingCat don't worry, I fixed it with a .gitattributes file.
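For illustration, forcing LF line endings for shell scripts via .gitattributes typically looks like this (an assumed entry, not necessarily the exact file added in this PR):

```bash
# Ensure *.sh files always check out with LF endings, regardless of core.autocrlf,
# so they run cleanly inside the Linux container on Windows checkouts.
echo "*.sh text eol=lf" >> .gitattributes
```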

@trams (Contributor) commented Jul 11, 2019

I tried 7cf9f94 in my locally checked-out xgboost folder, and build-linux.sh failed with:

    CMake Error: The current CMakeCache.txt directory /xgboost/build/CMakeCache.txt is different than the directory /home/projects/xgboost/build where CMakeCache.txt was created. This may result in binaries being created in the wrong place. If you are not sure, reedit the CMakeCache.txt
    CMake Error: The source "/xgboost/CMakeLists.txt" does not match the source "/home/projects/xgboost/CMakeLists.txt" used to generate cache. Re-run cmake with a different source directory.
    building Java wrapper
    cd ..
    mkdir -p build
    cd build
    cmake .. -DUSE_OPENMP:BOOL=ON -DUSE_HDFS:BOOL=OFF -DUSE_AZURE:BOOL=OFF -DUSE_S3:BOOL=OFF -DUSE_CUDA:BOOL=OFF -DJVM_BINDINGS:BOOL=ON
    Traceback (most recent call last):
      File "create_jni.py", line 92, in <module>
        run("cmake .. " + " ".join(args) + maybe_generator)
      File "create_jni.py", line 51, in run
        subprocess.check_call(command, shell=True, **kwargs)
      File "/opt/python/lib/python3.7/subprocess.py", line 341, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command 'cmake .. -DUSE_OPENMP:BOOL=ON -DUSE_HDFS:BOOL=OFF -DUSE_AZURE:BOOL=OFF -DUSE_S3:BOOL=OFF -DUSE_CUDA:BOOL=OFF -DJVM_BINDINGS:BOOL=ON' returned non-zero exit status 1.

I think it is because I have already built it locally.

I'll retry it on a clean clone.

@thesuperzapper (Contributor, Author) commented Jul 11, 2019

@trams this usually happens when you have previously attempted to build the project on your local machine (which is almost certainly not running centos6).

Please try deleting the ./build directory under the project root.
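For example, from the repository root (assuming the stale CMake cache lives in ./build):

```bash
# Remove the CMake cache left behind by a previous host-side build so the
# in-container build regenerates it with container paths.
rm -rf build/
```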

@trams (Contributor) commented Jul 12, 2019

Thank you for the reply.

Should we check for this folder, or should we use another folder name for the in-Docker build?

@thesuperzapper (Contributor, Author) commented Jul 12, 2019

@CodingCat do you want me to rename the cmake build directory for JVM builds?

How about: ./build_jvm?

@CodingCat (Member) commented Jul 15, 2019

> @CodingCat do you want me to rename the cmake build directory for JVM builds?
>
> How about: ./build_jvm?

Is it really necessary? I personally think it will make the process error-prone. Cleaning everything after building for another arch is not something "too strict"....

Though I didn't verify it, if we have a different output folder, would we even be able to run the Linux and Mac builds in parallel? And if so, would we be messing up the object files? Even if not, I would choose to keep the build step as strict as possible to avoid any unexpected behavior.

@thesuperzapper force-pushed the thesuperzapper:fix_jvm_build branch from 8435774 to a9fca1b on Jul 16, 2019

@thesuperzapper force-pushed the thesuperzapper:fix_jvm_build branch from a9fca1b to 0c4329d on Jul 16, 2019

@thesuperzapper (Contributor, Author) commented Jul 16, 2019

@CodingCat I have squashed the commits. Is this good to merge now?

@CodingCat merged commit 6323ef9 into dmlc:master on Jul 16, 2019

9 of 10 checks passed

  • continuous-integration/appveyor/pr: Waiting for AppVeyor build to complete
  • Jenkins Linux: Build Stage built successfully
  • Jenkins Linux: Formatting Check Stage built successfully
  • Jenkins Linux: Get sources Stage built successfully
  • Jenkins Linux: Test Stage built successfully
  • Jenkins Win64: Build Stage built successfully
  • Jenkins Win64: Get sources Stage built successfully
  • Jenkins Win64: Test Stage built successfully
  • continuous-integration/jenkins/pr-merge: This commit looks good
  • continuous-integration/travis-ci/pr: The Travis CI build passed
@CodingCat (Member) commented Jul 16, 2019

LGTM

sperlingxx added commits to alipay/ant-xgboost that referenced this pull request on Jul 17, 2019

@trams (Contributor) commented Jul 17, 2019

Sorry for polluting this closed pull request.

I re-ran the jvm dev scripts on my Linux dev machine and realised what problem I had.
The problem is that all files created inside the Docker container are owned by root.

So if I have a folder in which I develop and build locally, and then use these scripts to build a jar, I cannot go back to developing locally without manually cleaning all the artifacts, because they are all owned by root.

As far as I understand, you should not have this problem on OS X or Windows.

Why do I want to use these scripts if I already have a Linux machine? The reason is that I need to build the native shared library inside a CentOS 6 Docker container, otherwise I won't be able to use xgboost-spark on my cluster, because that cluster runs an old OS with an old libc.

That being said, there are some workarounds (mainly this one: https://www.jujens.eu/posts/en/2017/Jul/02/docker-userns-remap/), but I think after applying it I won't be able to use the ci_build scripts.

What do you think about me modifying the Docker container and the scripts to use an entrypoint.sh, like all the other CI build scripts do?
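For context, the entrypoint pattern used by the CI scripts generally looks something like the following (a hedged sketch, not the actual entrypoint.sh; the CI_BUILD_UID/CI_BUILD_GID variable names and the `builder` user are assumptions): the container creates a user matching the host UID/GID and runs the build through gosu, so files written to the bind-mounted source tree are not owned by root.

```bash
#!/usr/bin/env bash
# Hypothetical entrypoint.sh: drop from root to the calling host user before
# running the build, so artifacts on the mounted volume keep host ownership.
set -euo pipefail

# Host UID/GID, passed in via `docker run -e ...` (assumed variable names).
HOST_UID="${CI_BUILD_UID:-1000}"
HOST_GID="${CI_BUILD_GID:-1000}"

# Create a matching group/user inside the container (ignore if they already exist).
groupadd -f -g "$HOST_GID" builder
useradd -o -m -u "$HOST_UID" -g "$HOST_GID" builder 2>/dev/null || true

# gosu switches to the unprivileged user and executes the requested command.
exec gosu "$HOST_UID:$HOST_GID" "$@"
```

It would be invoked roughly as: docker run --rm -e CI_BUILD_UID=$(id -u) -e CI_BUILD_GID=$(id -g) -v "$PWD":/xgboost <dev-image> ./jvm-packages/dev/package-linux.sh (image name and paths are again placeholders).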
