Improve Eigen compatibility with CUDA (10.4.x) #4474
Update Eigen to the master branch as of Tue Sep 25 20:26:16 2018 +0200
- hg hash 66ba78bf7efa93f69a075830a87a010ed1b1fe30
- git hash 01ae86b9aad30b1e65cf1b749fd6cd9a645ac00d
Patch Tensorflow to follow Eigen internal changes
- cherry-pick changes from upstream repository
- add local changes for the latest updates
Improve Eigen compatibility with CUDA
- add support for cache-size queries on CUDA devices.
- extend support for matrix inversion on CUDA devices to matrices larger than 4x4; the size of the matrices that can be inverted is limited at runtime by the per-thread stack size.
- extend support for diagonal matrices on CUDA devices.
- fix deprecation warning in CUDA 10.0.
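As a rough illustration of why the per-thread stack size bounds the size of the matrices that can be inverted, the sketch below estimates the storage taken by fixed-size N x N matrices. It assumes (this is not stated in the PR) that the inversion keeps a small number of dense temporaries in thread-local storage; the `copies` parameter and the 1024-byte stack value in the usage note are purely illustrative, not values taken from Eigen or from this PR.

```python
def matrix_bytes(n, itemsize=8):
    """Storage in bytes of a dense n x n matrix (itemsize=8 for double)."""
    return n * n * itemsize


def fits_on_stack(n, stack_bytes, itemsize=8, copies=2):
    """Rough check: do `copies` dense n x n matrices fit in `stack_bytes`?

    `copies` is a hypothetical count of temporaries (e.g. input and result);
    the real number depends on the inversion algorithm and the compiler.
    """
    return copies * matrix_bytes(n, itemsize) <= stack_bytes
```

With these assumptions, `fits_on_stack(4, 1024)` holds (two 128-byte matrices), while `fits_on_stack(16, 1024)` does not (two 2048-byte matrices), so larger matrices would need a larger per-thread stack.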
@cmsbuild, please test
The tests are being triggered in jenkins.
A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_10_4_X/gcc700. @cmsbuild, @smuzaffar, @gudrutis, @mrodozov can you please review it and eventually sign? Thanks.
Comparison job queued.
Comparison is ready
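When I build all the externals including these changes, I run into a problem with the tensorflow-python3-sources package, where the build fails with a cryptic message:
Server terminated abruptly (error code: 14, error message: '', log file: '/data/user/fwyzard/patatrack/build/slc7_amd64_gcc700.patatrack/BUILD/slc7_amd64_gcc700/external/tensorflow-python3-sources/1.6.0-patatrack/build/72fcdeb2f560249cbc23c63d6d0200b0/server/jvm.out')
but jvm.out is empty.
If I re-run the same build command in the same build area, the second time it succeeds.
@smuzaffar, @davidlange6 have you seen this before? Do you have any suggestions?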
I haven't seen this, no. I see this possibly related issue:
bazelbuild/bazel#3020
You may have less memory than on other machines that we've used to build tensorflow? If it happens more than once in a while we may want to add --jobs=<something reasonable> to try to constrain the resources the build uses.
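A minimal sketch of what capping the parallelism could look like from a build wrapper. `--jobs` is a standard `bazel build` option; the helper names, the default of half the available CPUs, and the target label in the usage note are hypothetical and not taken from the CMS build recipe.

```python
import os
import subprocess


def bazel_build_command(target, jobs=None):
    """Return an argv list for `bazel build` with parallelism capped by --jobs."""
    if jobs is None:
        # Hypothetical default: half the CPUs, but at least one job.
        jobs = max(1, (os.cpu_count() or 2) // 2)
    return ["bazel", "build", f"--jobs={jobs}", target]


def run_build(target, jobs=None):
    """Run the build; raises CalledProcessError if bazel exits non-zero."""
    subprocess.run(bazel_build_command(target, jobs), check=True)
```

For example, `bazel_build_command("//some:target", jobs=4)` yields `["bazel", "build", "--jobs=4", "//some:target"]`.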
On Nov 15, 2018, at 7:18 AM, Andrea Bocci ***@***.***> wrote:
When I build all the externals including these changes, I run into a problem with the tensorflow-python3-sources package, where the build fails with a cryptic message:
Server terminated abruptly (error code: 14, error message: '', log file: '/data/user/fwyzard/patatrack/build/slc7_amd64_gcc700.patatrack/BUILD/slc7_amd64_gcc700/external/tensorflow-python3-sources/1.6.0-patatrack/build/72fcdeb2f560249cbc23c63d6d0200b0/server/jvm.out')
but jvm.out is empty.
If I re-run the same build command in the same build area, the second time it succeeds.
@davidlange6 have you seen this before ? do you have any suggestions ?
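Mhm, the strange thing is that it always goes like this:
• it successfully builds the python 2 version
• it fails to build the python 3 version
• at the second attempt, it successfully builds the python 3 version as well
I saw that bazel leaves some build files under $HOME/.cache/bazel. I did not check how much space it uses during a build; is it possible that it runs out of space when doing multiple builds?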
Could it be that the first time we started two bazel servers (one for python2 and the other for python3) and these two servers stepped on each other? Try adding
BuildRequires: tensorflow-python2-sources
in tensorflow-python3-sources to make sure that only one of these runs at a time.
On Nov 15, 2018, at 9:51 AM, Andrea Bocci ***@***.***> wrote:
Mhm, the strange thing is that it always goes like this:
• it builds successfully the python 2 version
• it fails the python 3 version
• at the second attempt, it builds successfully the python 3 version as well
I saw that bazel leaves some build files under $HOME/.cache/bazel . I did not check how much space it uses during a build, is it possible that it runs out of space when doing multiple builds?
Right - that's annoying. I have run into bazel filling my $HOME - but then you get an error message telling you that you are out of space...
Thanks, I will give it a try.
Is there any way to tell bazel to use a different temp directory?
On Nov 15, 2018, at 10:17 AM, Andrea Bocci ***@***.***> wrote:
Try adding
BuildRequires: tensorflow-python2-sources
in tensorflow-python3-sources to make sure that only one of these run at one time
Thanks, I will give it a try.
I have run into bazel filling my $HOME - but then you get an error message telling you that you are out of space...
Is there any way to tell bazel to use a different temp directory ?
Indeed - unfortunately we already tell bazel to use a different temp area (which gets very big). I think someone spent a bit of time on this bug, but we didn't solve it yet.
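For reference, a small sketch of one way a build script might point bazel at a different working area. `--output_user_root` is a bazel startup option, so it has to come before the command name; the helper name, path and target below are hypothetical, and this is not necessarily the mechanism the existing recipe uses.

```python
def bazel_with_user_root(output_user_root, command, *args):
    """Return a bazel argv whose output tree lives under `output_user_root`.

    Startup options such as --output_user_root precede the command
    (build, test, ...); command options and targets follow it.
    """
    return ["bazel", f"--output_user_root={output_user_root}", command, *args]
```

For example, `bazel_with_user_root("/build/tmp/bazel", "build", "//some:target")` places the startup option directly after `bazel` and before `build`.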
Update Eigen to the master branch as of Tue Sep 25 20:26:16 2018 +0200
Patch Tensorflow to follow Eigen internal changes
Improve Eigen compatibility with CUDA
the size of the matrices that can be inverted is limited at runtime by the per-thread stack size.