[MXNET-553] Restructure dockcross dockerfiles to fix caching #11302
Conversation
Would appreciate a review from @marcoabreu and @lebeg.
Excellent work @KellenSunderland! Do you think there is any need to make the CentOS installations also robust to compiler changes? Maybe mentioning #11257 would be good as well.
Nice. Is this a bug in Docker? Shall we report it?
@lebeg I think calling the compiler with $CC should be sufficient for CentOS, at least for the time being. We can always update in the future if needed. @larroy I think there's a feature request that will make this easier in the future; hopefully v19 will include it. It is certainly a breaking change from versions prior to 17, but it was made in response to some valid security concerns. There's also a workaround, but I think implementing it would add quite a lot of complexity.
Great job, Kellen! Thanks a lot! I think ccache is now not being applied to all builds. Examples: On some others, it works: Have a look at the execution durations to see whether ccache was actually used or not. Things that still don't work, and haven't worked before, are GPU builds. Those are a separate issue.
LGTM. I can try what I asked myself, as I'm not sure my question was well understood.
@larroy Sounds good, we can also chat offline about it in more detail if I've misunderstood. @marcoabreu I'm not sure I fully understand the examples you've given. Do they relate to the issue that this PR addresses (Docker cache misses)?
./configure
# Manually specify x86 gcc versions so that this script remains compatible with dockcross (which uses an ARM-based gcc
# by default).
CC=/usr/bin/gcc CXX=/usr/bin/g++ ./configure
Can't we set this environment variable in the Dockerfile and then unset it, since it's specific to dockcross?
We want to ensure we're preserving the CC and CXX env vars from the 'dockcross/linux-armv7' image, so to do this we'd have to store those and then restore them. That seems more complicated to me than just overriding them for a single command.
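The per-command scoping can be sketched as follows (a minimal illustration; the cross-compiler path below is a hypothetical stand-in for the dockcross toolchain):

```shell
#!/bin/sh
# Simulate the dockcross environment, where CC points at a cross-compiler.
export CC=/usr/bin/arm-linux-gnueabihf-gcc

# Prefix-style override: CC is changed only for this one command...
during=$(CC=/usr/bin/gcc sh -c 'echo "$CC"')

# ...and the original environment is untouched afterwards.
after=$(sh -c 'echo "$CC"')

echo "during override: $during"   # /usr/bin/gcc
echo "after override:  $after"    # /usr/bin/arm-linux-gnueabihf-gcc
```

The alternative (export, run, then restore the saved values) needs three extra lines and a temporary variable per override, which is the complexity referred to above.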
What would be the advantage, readability?
Yes, I got the feeling that your changes are causing some builds to not use ccache anymore. It'd be good if you could verify that.
@marcoabreu Gotcha. I think what would be useful would be to include some more information about what ccache is doing in the logs. I'll see if I can create a (separate) PR to do that, and then rebase.
That's an excellent idea! Maybe just print the cache hit/miss statistics after a build (not globally using ccache -s, as it would include everything).
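One way to scope the statistics to a single build is to zero the counters immediately before it. This is a CI configuration sketch, not the exact change that later landed:

```shell
# Per-build ccache reporting (illustrative CI fragment).
# Resetting the counters first means the final report covers only this
# build, unlike a bare `ccache -s`, which shows host-wide totals.
ccache --zero-stats     # reset hit/miss counters before the build
make -j"$(nproc)"       # the actual build step goes here
ccache --show-stats     # report hit/miss statistics for this build only
```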
Doesn't really work for me:
pllarroy@186590d670bd:0:~/devel/mxnet/mxnet$ ci/build.py -p android_arm64
build.py: 2018-06-16 23:12:21,380 Building container tagged 'mxnetci/build.android_arm64' with docker
build.py: 2018-06-16 23:12:21,380 Running command: 'docker build -f docker/Dockerfile.build.android_arm64 --build-arg USER_ID=1177751787 --cache-from mxnetci/build.android_arm64 -t mxnetci/build.android_arm64 docker'
Sending build context to Docker daemon 103.4kB
Step 1/27 : FROM dockcross/base:latest
---> a4a05890b715
Step 2/27 : MAINTAINER Pedro Larroy "pllarroy@amazon.com"
---> Using cache
---> 7592fc7b9c5c
Step 3/27 : COPY --from=ccachebuilder /usr/local/bin/ccache /usr/local/bin/ccache
invalid from flag value ccachebuilder: pull access denied for ccachebuilder, repository does not exist or may require 'docker login'
Traceback (most recent call last):
File "ci/build.py", line 357, in <module>
sys.exit(main())
File "ci/build.py", line 283, in main
build_docker(platform, docker_binary, registry=args.docker_registry)
File "ci/build.py", line 84, in build_docker
check_call(cmd)
File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 291, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'build', '-f', 'docker/Dockerfile.build.android_arm64', '--build-arg', 'USER_ID=1177751787', '--cache-from', 'mxnetci/build.android_arm64', '-t', 'mxnetci/build.android_arm64', 'docker']' returned non-zero exit status 1.
Could you elaborate, Pedro? Kellen removes the multi-head in his PR, while your log seems to be referencing a Dockerfile that still has it. Could it be possible you checked out the wrong commit?
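For context, a `COPY --from=<name>` line only resolves if the Dockerfile still declares a build stage with that name; otherwise Docker falls back to treating the name as a registry image, producing exactly the "pull access denied" error in the log above. A hedged sketch of the multi-head pattern in question (stage contents are illustrative, not the actual Dockerfile):

```dockerfile
# Multi-head variant: a named builder stage compiles ccache, and a later
# stage copies the binary out of it. If the builder stage is removed but
# the COPY line is left behind, the build fails as shown in the log.
FROM ubuntu:16.04 AS ccachebuilder
RUN apt-get update && apt-get install -y build-essential curl
# ... build ccache from source here ...

FROM dockcross/base:latest
COPY --from=ccachebuilder /usr/local/bin/ccache /usr/local/bin/ccache
```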
@marcoabreu @larroy: No, actually the error Pedro pointed out was a valid error. I've fixed it and am investigating why the build succeeded earlier. Edit: looks like the earlier run didn't test Android ARM64, so that would explain it. Thanks for the testing, Pedro.
Ah, your PR did not change Android ARM64 because we just merged it, and that caused the race condition in the git merge. Thanks for elaborating.
Any other data you want to see here, @marcoabreu? No rush on the merge, but lmk if you need anything else.
I compared the two following runs:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11302/6/pipeline/58
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1012/pipeline/56/
I noticed the following slowdowns, some quite significant: CPU clang 3.9, CPU clang 3.9 MKLDNN, CPU clang 5.0, CPU clang 5.0 MKLDNN, CPU MKLDNN, CPU Openblas, GPU CMake, GPU CMake MKLDNN, GPU CUDA 9.1, GPU MKLDNN, NVidia Jetson ARMv8. The other ones have not been slowed down or improved. I think we should investigate this before we merge. What I noticed is that the CentOS ones are still exactly the same, while almost all other jobs got slower by factors of up to 10x.
Right, but why would we see speedups before the Docker cache is pushed?
Oops, I forgot to take that time into account. Thank you, and sorry for the inconvenience.
Thanks man, appreciate it. I'll make sure to monitor master for a couple of days and verify that the cache is being utilized.
…11302) * Add ccache reporting to CI * Restructure dockcross dockerfiles to fix caching
Description
This PR restructures the dockcross Dockerfiles to remove their multi-head components. These multi-head components made it difficult to properly provide a remote cache for the Dockerfiles and were slowing down builds. A simple workaround was to generalize the ubuntu_ccache.sh script so that it works with Ubuntu, Debian, and cross-compilation containers. We can now use this script the same way in a variety of containers, which eliminates the need for a special multi-head Docker stage.
Should address #11257.
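The core of such a distribution-agnostic ccache setup can be sketched roughly like this (illustrative only; the paths and variable names are assumptions, not the actual contents of ubuntu_ccache.sh):

```shell
#!/bin/sh
# Rough sketch of a distribution-agnostic ccache setup (hypothetical;
# not the actual ubuntu_ccache.sh). The idea: create symlinks so that
# compiler invocations are transparently routed through ccache, whether
# the image is Ubuntu, Debian, or a dockcross cross-compilation container.
set -e

WRAP_DIR="${WRAP_DIR:-/usr/local/bin/ccache-wrappers}"
CCACHE_BIN="${CCACHE_BIN:-/usr/local/bin/ccache}"

mkdir -p "$WRAP_DIR"
for compiler in gcc g++ cc c++; do
    # ccache inspects the name it was invoked as, so a symlink named
    # after the compiler makes it act as a caching wrapper for it.
    ln -sf "$CCACHE_BIN" "$WRAP_DIR/$compiler"
done

# Putting the wrapper dir first in PATH is what activates the redirection.
export PATH="$WRAP_DIR:$PATH"
```

Because the script takes the compiler names and ccache location as parameters rather than hard-coding an Ubuntu layout, the same file can run unchanged in each container, which is what lets the Dockerfiles drop the extra build head.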