[WIP] Update TF to 2.4.1. #6545
A new Pull Request was created by @riga (Marcel R.) for branch IB/CMSSW_11_3_X/master. @cmsbuild, @smuzaffar, @mrodozov can you please review it and eventually sign? Thanks. |
last issue we had with |
Great, thank you! I gave you full access to my tensorflow v2.4.0 branch in case you want to test things on top.
since bazel uses its process wrapper again, but I couldn't yet spot the place where to disable that. |
@riga, https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_11_3_X/master/bazel-3.7.0-patches.patch should disable the bazel process wrapper use |
I tried bazel 3.7.2 in the morning. It builds with the existing patch. I haven't tried TF 2.4 yet |
@smuzaffar This was my expectation as well. Maybe there is another place where a patch is needed, I'll have a look. |
looks like you also need the following
|
@riga, also remove https://github.com/riga/tensorflow/blob/cms/v2.4.0/tensorflow/workspace.bzl#L393-L402 which should be picked up from cms |
I think I even had that at some point, but ran into another problem (will post it).
Right, thank you. |
The
(both 820 and 900). Do you know if I'm missing something? |
gcc version in this branch is GCC 9, so please use |
I'm experiencing the same issue with gcc900 based on the latest commit (ddec832).
I'm wondering why cuda isn't picked up from the cache. |
are you sure you have latest cmsdist IB/CMSSW_11_3_X/master branch + your changes? Also try to build in a fresh area. Use |
was there any progress on this? |
Pull request #6545 was updated. |
Conflicts: pip/requirements.txt tensorflow-requires.file
Pull request #6545 was updated. |
Hi @smuzaffar,
Unfortunately, nothing else is logged to track down the error. Did you perhaps see something similar in the 2.3 integration? |
@riga, I think issue here is that |
Thanks for the hint. Do you think we can use the version of |
yes we can and we should. I had updated TF to use grpc from our stack but I am afraid we are stuck with TF 2.3 unless grpc supports c++17 |
Just curious, why couldn't we build |
we need to build with c++17, otherwise it uses absl instead of string_view |
c++11…
Perhaps this thread?
https://stackoverflow.com/questions/59939678/grpc-auth-context-h-using-stditerator-produces-deprecated-warning-with-latest
… On Feb 15, 2021, at 3:21 PM, Matti Kortelainen ***@***.***> wrote:
I had updated TF to use grpc from our stack but I am afraid we are stuck with TF 2.3 unless grpc supports c++17
Just curious, why couldn't we build grpc with C++14?
|
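`std::iterator` is deprecated as of C++17, which is the warning the linked Stack Overflow question is about. Below is a minimal sketch of how one might list the places in a source tree that still use it before attempting a C++17 build. The function name, the regular expression, and the sample class are hypothetical and only illustrate the idea; this is not how grpc was actually patched in this PR.

```python
import re

# Matches the `std::iterator<...>` template itself, but not
# `std::iterator_traits<` or `std::iterator_category`.
STD_ITERATOR = re.compile(r"\bstd::iterator\s*<")


def find_std_iterator_uses(source):
    """Return (line_number, stripped_line) pairs that use std::iterator."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore trailing line comments
        if STD_ITERATOR.search(code):
            hits.append((number, line.strip()))
    return hits


# Hypothetical C++ snippet, used only to exercise the scanner.
SAMPLE = """\
class TokenIterator : public std::iterator<std::input_iterator_tag, int> {
  // std::iterator<...> mentioned in a comment only
  using traits = std::iterator_traits<int*>;
};
"""

if __name__ == "__main__":
    for number, line in find_std_iterator_uses(SAMPLE):
        print(f"line {number}: {line}")
```

Each hit would typically be fixed by dropping the base class and declaring the five member types (`iterator_category`, `value_type`, `difference_type`, `pointer`, `reference`) directly in the iterator class.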
I built grpc with
|
I had done all that but it still fails. Even latest version of grpc with protobuf 3.12 fails. |
I got somewhere using only the py3 recipe, now I'm at:
@smuzaffar could you tell us whether what you tried failed again with a segfault or with this same missing symbol (which has numerous issues related to it and looks like it can be fixed, or at least it's better than the segfault)? |
this was the exact error I got for py3 |
@mrodozov , note that TF uses its internal grpc, so building our grpc is not going to change TF build. I will push the changes for TF to use our GRPC. |
Pull request #6545 was updated. |
Ok, I have updated this PR to use grpc from cms externals. TF sources are not downloaded from cms-externals organization. |
Ah, GitHub did not update the comments above while my tab was open. I roughly did the same + I managed to build the latest grpc (1.35.0) after patching the parts of the code that were not C++17 compatible (at least the lines that error'd in the build) and updating protobuf to the latest version (3.14.0). I couldn't fully test this yet as TF is still compiling but I'll keep you posted. Edit: the same error persists. |
Pull request #6545 was updated. |
I went a bit down the rabbit hole to track down the missing symbol. First of all, all links in both The missing symbol
which is indeed missing in Also I noted that both gRPC and TF bring their own absl library, potentially in different versions, so we might consider building absl in cmsdist and linking it. |
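A minimal sketch of the kind of symbol hunt described above, assuming the usual `nm -D` output layout (optional address, one-letter type, mangled name). It computes which symbols one library needs that none of the candidate libraries define. The helper names and the mangled symbols in the demo are hypothetical; the actual missing symbol from this build is not reproduced here.

```python
import subprocess

# Symbol types treated as "defined": global text/data/bss/read-only data
# and weak definitions. This is a simplification of nm's full type table.
DEFINED_TYPES = set("TDBRWV")


def parse_nm(text):
    """Split `nm` output into (defined, undefined) sets of mangled names."""
    defined, undefined = set(), set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank lines and "file:" headers
        sym_type = fields[-2]
        name = fields[-1].split("@", 1)[0]  # drop symbol-version suffixes
        if sym_type == "U":
            undefined.add(name)
        elif sym_type in DEFINED_TYPES:
            defined.add(name)
    return defined, undefined


def unresolved(consumer_nm, provider_nms):
    """Symbols the consumer needs that no provider defines."""
    _, needed = parse_nm(consumer_nm)
    provided = set()
    for text in provider_nms:
        provided |= parse_nm(text)[0]
    return needed - provided


def nm_dynamic(path):
    """Run `nm -D` on a shared library (needs binutils; unused in the demo)."""
    result = subprocess.run(["nm", "-D", path], check=True,
                            capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical nm output for a consumer and one provider library.
    consumer = """
                 U _ZN4demo5helloEv
                 U malloc@GLIBC_2.2.5
0000000000001139 T _ZN4demo4mainEv
"""
    provider = """
0000000000000f00 T malloc@@GLIBC_2.2.5
0000000000001000 T _ZN4demo3byeEv
"""
    print(sorted(unresolved(consumer, [provider])))
```

In practice the strings would come from `nm_dynamic()` on the real shared objects. Comparing mangled rather than demangled names keeps the parsing simple, since demangled C++ signatures contain spaces.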
Hmm since the last commits we are using grpc from our externals, but grpc is installing absl as a submodule, and also compiles it with C++14 by default (I think) |
That's actually what I'm doing :) I updated gRPC to 1.35.0 and build with C++17. |
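Since the discussion hinges on gRPC, absl, and TF all being compiled with the same C++ standard, a quick consistency check over the compiler flags of each external can help. The sketch below is only an illustration: the package names and flag strings are hypothetical, and it assumes that the last `-std=` flag on a command line is the one that takes effect.

```python
import re

# Captures the standard in -std=c++NN or -std=gnu++NN (e.g. "14", "17", "1z").
STD_FLAG = re.compile(r"-std=(?:c|gnu)\+\+(\w+)")


def cxx_standard(flags):
    """Return the standard named by the last -std= flag, or None if absent."""
    matches = STD_FLAG.findall(flags)
    return matches[-1] if matches else None


def mismatched_standards(package_flags):
    """Group package names by C++ standard; return {} if they all agree."""
    groups = {}
    for name, flags in package_flags.items():
        groups.setdefault(cxx_standard(flags), []).append(name)
    return groups if len(groups) > 1 else {}


if __name__ == "__main__":
    # Hypothetical flags, not taken from the actual cmsdist spec files.
    flags = {
        "grpc": "-O2 -fPIC -std=c++14",
        "abseil": "-O2 -std=c++14",
        "tensorflow": "-O2 -std=c++17",
    }
    print(mismatched_standards(flags))
```

A non-empty result means at least two externals disagree on the standard, which is the situation suspected in the comments above for absl built via the gRPC submodule.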
closing in favor of #6674 |
This PR is meant to update TensorFlow to version 2.4.1, which seems to be the first release that is compatible with CUDA 11 in our software stack, so we can eventually aim for GPU support.
However, in terms of the usual manual patches we apply during the integration process, 2.4 drops SWIG and uses pybind11 instead to generate its bindings. This change interferes with some of the patches in our build process (no more use_default_shell_env @smuzaffar @mrodozov), which is why I opened the PR as "work-in-progress" to store discussions early on for later reference.
So far, the changes to cmsdist are minimal and most of the work needs to be done in the external tensorflow fork.
The general idea would be to enable GPU support in a second PR after 2.4.1 is successfully integrated.
@mialiu149 @vlimant @gkasieczka