Building TensorFlow with custom GCC requires hardcoded ld,nm and as #1713

Closed
akors opened this issue Sep 2, 2016 · 11 comments

Comments

@akors

akors commented Sep 2, 2016

Hi! I am using Fedora 23 and Ubuntu 16.04.1 LTS to build TensorFlow with GPU support.

GPU support requires a specific GCC version, 5.3, for successful compilation. Since this version is not available from the Fedora package repositories, it has to be compiled from source. Unfortunately, there are a few obstacles to using this version with bazel.

One of those is the following:
After configuring TensorFlow with the self-compiled GCC, and running bazel, the compilation stops with the following message:

gcc: error trying to exec 'as': execvp: No such file or directory
or
gcc: error trying to exec 'nm': execvp: No such file or directory
or
gcc: error trying to exec 'ld': execvp: No such file or directory

To work around this, one can compile GCC by hardcoding the paths to those tools, by adding this to the configuration line of GCC:
--with-ld=/bin/ld --with-nm=/bin/nm --with-as=/usr/bin/as

This does seem rather strange, however, because those programs are on the path, and the custom GCC without bazel is capable of producing working binaries just fine. I feel that in a well-working build system, this kind of hack should not be necessary.

There was a TensorFlow issue for this, tensorflow/tensorflow#2806, but the devs believe that this is a bazel problem.

@philwo
Member

philwo commented Sep 2, 2016

Hi @akors,

this is pretty strange, but we can figure out together what's going on here. :)

Could you please provide the following:

  • Output of "env" right before you run "bazel build", to see which environment variables you usually have set.
  • Output of "bazel build -s //your:target" (if the log is too long, the important part here is just the actual failing step, so we can see if Bazel correctly passes a good PATH to gcc).
  • The absolute path to the as / nm / ld tools that your gcc is supposed to use (I guess they're not really in /bin?).

Thanks!

@akors
Author

akors commented Sep 2, 2016

Output of "env" right before you run "bazel build", to see which environment variables you usually have set.

Here is the env output. Note that CC, CXX and LD_LIBRARY_PATH have been manually set by me.
https://gist.github.com/akors/3474216eabf1b777196b8a0bcd582b54

Output of "bazel build -s //your:target" (if the log is too long, the important part here is just the actual failing step, so we can see if Bazel correctly passes a good PATH to gcc).

The output is indeed quite long because it is cluttered with compiler warnings. I have pasted only the last few messages.
By the way, the command I run is
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package --verbose_failures

INFO: From Compiling tensorflow/core/ops/functional_grad.cc:
In file included from tensorflow/core/ops/functional_grad.cc:16:0:
./tensorflow/core/framework/function.h:457:15: warning: 'tensorflow::unused_grad_0' defined but not used [-Wunused-variable]
   static bool unused_grad_##ctr = SHOULD_REGISTER_OP_GRADIENT && \
               ^
./tensorflow/core/framework/function.h:454:3: note: in expansion of macro 'REGISTER_OP_GRADIENT_UNIQ'
   REGISTER_OP_GRADIENT_UNIQ(ctr, name, fn)
   ^
./tensorflow/core/framework/function.h:448:3: note: in expansion of macro 'REGISTER_OP_GRADIENT_UNIQ_HELPER'
   REGISTER_OP_GRADIENT_UNIQ_HELPER(__COUNTER__, name, fn)
   ^
tensorflow/core/ops/functional_grad.cc:56:1: note: in expansion of macro 'REGISTER_OP_GRADIENT'
 REGISTER_OP_GRADIENT("MapAccumulate", MapAccumulateGrad);
 ^
INFO: From Compiling tensorflow/core/kernels/batchtospace_op_gpu.cu.cc:
nvcc warning : option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr'.
nvcc warning : option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr'.
gcc: error trying to exec 'as': execvp: No such file or directory
ERROR: /home/akorsunsky/.local/src/tensorflow/tensorflow/core/kernels/BUILD:1530:1: output 'tensorflow/core/kernels/_objs/batchtospace_op_gpu/tensorflow/core/kernels/batchtospace_op_gpu.cu.pic.o' was not created.
ERROR: /home/akorsunsky/.local/src/tensorflow/tensorflow/core/kernels/BUILD:1530:1: not all outputs were created.
ERROR: /home/akorsunsky/.local/src/tensorflow/tensorflow/core/kernels/BUILD:1550:1: output 'tensorflow/core/kernels/_objs/depth_space_ops_gpu/tensorflow/core/kernels/spacetodepth_op_gpu.cu.pic.o' was not created.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 468.069s, Critical Path: 426.33s

The absolute path to the as / nm / ld tools that your gcc is supposed to use (I guess they're not really in /bin?).

Oops, my bad, they are actually in /usr/bin. However, this does not make a difference, the main point is that the binaries can't be found on either system, and I have to pass the correct path during compiler configuration.
On Fedora, they were indeed in /bin. On Ubuntu which I'm currently using for testing, they are in /usr/bin/.
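Since the tool locations differ between the two systems, the hardcoded flags could be derived from the current PATH instead of being typed by hand. The following is only a sketch: `print_binutils_flags` is a hypothetical helper name, and the `--with-ld`, `--with-nm` and `--with-as` flag names are simply the ones quoted earlier in this issue.

```shell
#!/bin/sh
# print_binutils_flags: emit one GCC configure flag per binutils tool,
# using whatever absolute path the current PATH resolves each tool to.
# (Hypothetical helper; the flag names are the ones quoted in this issue.)
print_binutils_flags() {
    for tool in ld nm as; do
        tool_path=$(command -v "$tool") || {
            echo "error: $tool not found on PATH" >&2
            return 1
        }
        printf '%s\n' "--with-$tool=$tool_path"
    done
}
```

The output could then be spliced into the GCC configure line, for example `configure --prefix=/opt/gcc-5.3 $(print_binutils_flags)`, where the prefix is an assumption based on the install path used in this thread.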

By the way, this is version 0.3.0 from the Ubuntu repositories:

Build label: 0.3.0
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Jun 10 11:38:23 2016 (1465558703)
Build timestamp: 1465558703
Build timestamp as int: 1465558703

If I compile bazel from the current HEAD, I get this weirdness:

Found non-responsive server process (pid=12635). Killing it.
.
ERROR: no such package '@local_config_cuda//crosstool': BUILD file not found on package path.
ERROR: no such package '@local_config_cuda//crosstool': BUILD file not found on package path.
INFO: Elapsed time: 0.752s

@aehlig
Contributor

aehlig commented Sep 2, 2016

After configuring TensorFlow with the self-compiled GCC, and running bazel, the compilation stops with the following message:

gcc: error trying to exec 'as': execvp: No such file or directory
or
gcc: error trying to exec 'nm': execvp: No such file or directory
or
gcc: error trying to exec 'ld': execvp: No such file or directory

To work around this, one can compile GCC by hardcoding the paths to those tools, by adding this to the configuration line of GCC:
--with-ld=/bin/ld --with-nm=/bin/nm --with-as=/usr/bin/as

Just to be sure, did you check that this is not related to the way you
built your gcc? I.e., that it is not expecting the dependent tools next to its
binary? Did you try your custom gcc to build and link a file (not using
bazel at all)?

Thanks,
Klaus

Klaus Aehlig
Google Germany GmbH, Erika-Mann-Str. 33, 80636 Muenchen
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschaeftsfuehrer: Matthew Scott Sucherman, Paul Terence Manicle

@akors
Author

akors commented Sep 2, 2016

@aehlig

Just to be sure, did you check that this is not related to the way you
built your gcc?

Yes, it works perfectly fine:

$ cat > helloworld.cpp << EOF
> 
> extern void sayhello();
> 
> int main()
> {
>     sayhello();
>     return 0;
> }
> EOF
$ cat > sayhello.cpp << EOF
> #include <iostream>
> 
> void sayhello()
> {
>     std::cout<<"Hello world!\n";
> }
> EOF
$ /opt/gcc-5.3-nh/bin/g++ helloworld.cpp sayhello.cpp -o helloworld
$ ./helloworld
Hello world!
$ 

@aehlig
Contributor

aehlig commented Sep 2, 2016

Output of "bazel build -s //your:target" (if the log is too long, the important part here is just the actual failing step, so we can see if Bazel correctly passes a good PATH to gcc).

The output is indeed quite long because it is cluttered with compiler warnings. I have pasted only the last few messages.
By the way, The command I run is
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package --verbose_failures

Can you please provide the (tail of the) output of "bazel build -s //your:target"? Note the "-s" option,
which shows the precise way the actions have been called.

Thanks,
Klaus

@akors
Author

akors commented Sep 2, 2016

@aehlig
Here's the tail of a build with the -s option:

>>>>> # //tensorflow/core/kernels:strided_slice_op [action 'Compiling tensorflow/core/kernels/strided_slice_op_inst_1.cc']
(cd /home/akorsunsky/.cache/bazel/_bazel_akorsunsky/e5ea3746ee36904dabb4939717fade59/execroot/tensorflow && \
  exec env - \
    PATH=/home/akorsunsky/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' '-frandom-seed=bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/strided_slice_op/tensorflow/core/kernels/strided_slice_op_inst_1.pic.o' -fPIC -DHAVE_CONFIG_H -iquote . -iquote bazel-out/local_linux-py3-opt/genfiles -iquote external/protobuf -iquote bazel-out/local_linux-py3-opt/genfiles/external/protobuf -iquote external/bazel_tools -iquote bazel-out/local_linux-py3-opt/genfiles/external/bazel_tools -iquote external/farmhash_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive -iquote external/gif_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/gif_archive -iquote external/highwayhash -iquote bazel-out/local_linux-py3-opt/genfiles/external/highwayhash -iquote external/jpeg_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/jpeg_archive -iquote external/png_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/png_archive -iquote external/re2 -iquote bazel-out/local_linux-py3-opt/genfiles/external/re2 -iquote external/eigen_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -iquote external/zlib_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive -iquote external/local_config_cuda -iquote bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda -isystem external/protobuf/src -isystem bazel-out/local_linux-py3-opt/genfiles/external/protobuf/src -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260 -isystem bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260 -isystem 
external/gif_archive/giflib-5.1.4/lib -isystem bazel-out/local_linux-py3-opt/genfiles/external/gif_archive/giflib-5.1.4/lib -isystem external/highwayhash -isystem bazel-out/local_linux-py3-opt/genfiles/external/highwayhash -isystem external/jpeg_archive/jpeg-9a -isystem bazel-out/local_linux-py3-opt/genfiles/external/jpeg_archive/jpeg-9a -isystem external/png_archive/libpng-1.2.53 -isystem bazel-out/local_linux-py3-opt/genfiles/external/png_archive/libpng-1.2.53 -isystem external/re2 -isystem bazel-out/local_linux-py3-opt/genfiles/external/re2 -isystem external/eigen_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -isystem external/zlib_archive/zlib-1.2.8 -isystem bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive/zlib-1.2.8 -isystem external/local_config_cuda/cuda/include -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda/cuda/include -isystem external/local_config_cuda/cuda -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda/cuda -fno-exceptions -DEIGEN_AVOID_STL_ARRAY '-DGOOGLE_CUDA=1' -pthread '-DGOOGLE_CUDA=1' -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -MD -MF bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/strided_slice_op/tensorflow/core/kernels/strided_slice_op_inst_1.pic.d -c tensorflow/core/kernels/strided_slice_op_inst_1.cc -o bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/strided_slice_op/tensorflow/core/kernels/strided_slice_op_inst_1.pic.o)
INFO: From Compiling tensorflow/core/kernels/spacetodepth_op_gpu.cu.cc:
nvcc warning : option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr'.
nvcc warning : option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr'.
gcc: error trying to exec 'as': execvp: No such file or directory
ERROR: /home/akorsunsky/.local/src/tensorflow/tensorflow/core/kernels/BUILD:1550:1: output 'tensorflow/core/kernels/_objs/depth_space_ops_gpu/tensorflow/core/kernels/spacetodepth_op_gpu.cu.pic.o' was not created.
ERROR: /home/akorsunsky/.local/src/tensorflow/tensorflow/core/kernels/BUILD:1550:1: not all outputs were created.
>>>>> # //tensorflow/core/kernels:strided_slice_op [action 'Compiling tensorflow/core/kernels/strided_slice_op_inst_3.cc']
(cd /home/akorsunsky/.cache/bazel/_bazel_akorsunsky/e5ea3746ee36904dabb4939717fade59/execroot/tensorflow && \
  exec env - \
    PATH=/home/akorsunsky/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' '-frandom-seed=bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/strided_slice_op/tensorflow/core/kernels/strided_slice_op_inst_3.pic.o' -fPIC -DHAVE_CONFIG_H -iquote . -iquote bazel-out/local_linux-py3-opt/genfiles -iquote external/protobuf -iquote bazel-out/local_linux-py3-opt/genfiles/external/protobuf -iquote external/bazel_tools -iquote bazel-out/local_linux-py3-opt/genfiles/external/bazel_tools -iquote external/farmhash_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive -iquote external/gif_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/gif_archive -iquote external/highwayhash -iquote bazel-out/local_linux-py3-opt/genfiles/external/highwayhash -iquote external/jpeg_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/jpeg_archive -iquote external/png_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/png_archive -iquote external/re2 -iquote bazel-out/local_linux-py3-opt/genfiles/external/re2 -iquote external/eigen_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -iquote external/zlib_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive -iquote external/local_config_cuda -iquote bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda -isystem external/protobuf/src -isystem bazel-out/local_linux-py3-opt/genfiles/external/protobuf/src -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260 -isystem bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260 -isystem 
external/gif_archive/giflib-5.1.4/lib -isystem bazel-out/local_linux-py3-opt/genfiles/external/gif_archive/giflib-5.1.4/lib -isystem external/highwayhash -isystem bazel-out/local_linux-py3-opt/genfiles/external/highwayhash -isystem external/jpeg_archive/jpeg-9a -isystem bazel-out/local_linux-py3-opt/genfiles/external/jpeg_archive/jpeg-9a -isystem external/png_archive/libpng-1.2.53 -isystem bazel-out/local_linux-py3-opt/genfiles/external/png_archive/libpng-1.2.53 -isystem external/re2 -isystem bazel-out/local_linux-py3-opt/genfiles/external/re2 -isystem external/eigen_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -isystem external/zlib_archive/zlib-1.2.8 -isystem bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive/zlib-1.2.8 -isystem external/local_config_cuda/cuda/include -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda/cuda/include -isystem external/local_config_cuda/cuda -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda/cuda -fno-exceptions -DEIGEN_AVOID_STL_ARRAY '-DGOOGLE_CUDA=1' -pthread '-DGOOGLE_CUDA=1' -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -MD -MF bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/strided_slice_op/tensorflow/core/kernels/strided_slice_op_inst_3.pic.d -c tensorflow/core/kernels/strided_slice_op_inst_3.cc -o bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/strided_slice_op/tensorflow/core/kernels/strided_slice_op_inst_3.pic.o)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 347.157s, Critical Path: 314.25s

The full log can be accessed here:
https://gist.github.com/akors/dabe16016ba4ccec8fa1f230325f2669
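The -s output above shows each action being started through `exec env - PATH=...`, that is, with an environment that contains nothing but PATH. A rough way to check tool lookup under the same conditions, outside of Bazel, might look like the sketch below. `resolve_in_clean_env` is a hypothetical helper name, not part of Bazel, and the sketch only tests plain PATH lookup; it does not reproduce what the crosstool wrapper or nvcc do further down the chain.

```shell
#!/bin/sh
# resolve_in_clean_env TOOL SEARCH_PATH
# Print where TOOL resolves when the environment holds nothing but
# PATH=SEARCH_PATH, mimicking the "exec env - PATH=..." wrapper seen in
# the -s output. (Hypothetical helper name; not part of Bazel.)
resolve_in_clean_env() {
    tool=$1
    search_path=$2
    # "env -i" starts from an empty environment, like "env -".
    # /bin/sh is given by absolute path so that the only lookup under
    # test is the one for TOOL.
    env -i PATH="$search_path" /bin/sh -c 'command -v "$1"' sh "$tool"
}
```

Calling it as `resolve_in_clean_env as "$action_path"`, with `action_path` set to the PATH value copied from the log, prints the resolved location or returns a non-zero status if the tool cannot be found through that PATH alone.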

@Phhere

Phhere commented Sep 9, 2016

Any update on this? I have the same problem with my local gcc

@aehlig
Contributor

aehlig commented Oct 6, 2016

There are two aspects to that issue.

(a) Passing through environment variables; this should be solved
with our environment variables design
https://github.com/bazelbuild/bazel/blob/master/site/designs/_posts/2016-06-21-environment.md
which is implemented in 0.3.2 and later. So, to get environment variables available, just add

  build --action_env=PATH
  build --action_env=LD_LIBRARY_PATH
  build --action_env=CC
  build --action_env=CXX

to the appropriate rc-file; you can also add specific values there. In particular, I would
recommend setting PATH in the user-workspace rc file (i.e., the .bazelrc next to the WORKSPACE)
such that the CC is the first {g,}cc on the search path.

(b) There is the "autoconfiguration" aspect, where bazel tries to be smart and infer properties
of the C compiler to be used. @lberki is a lot more knowledgeable about this aspect than I am.
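For point (a), the remark about adding specific values can be illustrated with a small sketch that prints rc-file lines using the `--action_env=NAME=VALUE` form. `print_action_env_lines` is a hypothetical helper, and the directory layout (custom GCC in its own bin directory, binutils in /usr/bin or /bin) is an assumption taken from the paths mentioned in this thread. Whether these settings are sufficient for the nvcc-driven actions is not confirmed here.

```shell
#!/bin/sh
# print_action_env_lines GCC_BIN_DIR
# Print .bazelrc lines that pin PATH, CC and CXX to a custom GCC.
# (Hypothetical helper; the directory layout is an assumption.)
print_action_env_lines() {
    gcc_bin=$1
    # Custom GCC first, then the system directories that hold as/ld/nm.
    printf '%s\n' "build --action_env=PATH=$gcc_bin:/usr/bin:/bin"
    # No value given: the variable is passed through from the invoking shell.
    printf '%s\n' "build --action_env=LD_LIBRARY_PATH"
    printf '%s\n' "build --action_env=CC=$gcc_bin/gcc"
    printf '%s\n' "build --action_env=CXX=$gcc_bin/g++"
}
# Example: print_action_env_lines /opt/gcc-5.3/bin >> .bazelrc
```

Putting the GCC directory first on PATH follows the recommendation that the intended CC be the first {g,}cc on the search path.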

@lberki
Contributor

lberki commented Oct 7, 2016

@aehlig: I'm not :)

I kind of agree with the sentiment that the autodetection should work better, but the smarter it is, the more surprising it is when it doesn't work, so a bit of care needs to be taken.

What do gcc -print-prog-name=as and gcc -print-prog-name=ld return? If they return the right paths, we could make our autoconfiguration script call that and make sure that those are accessible somehow.

(Note, though, that this command can return non-normalized paths, e.g. with .. path segments in them)

@akors
Author

akors commented Oct 7, 2016

In particular, I would recommend to set PATH in the user-workspace rc file

To make matters unnecessarily complicated, in the case of tensorflow, there are 2 compilers involved: the one that builds the CUDA kernels (used by nvcc), and the one that builds the rest. Those can be separate compilers, if I understand correctly. Not asking for the ultimate solution here, this is just a heads-up.

I kind of agree with the sentiment that the autodetection should work better, but the smarter it is, the more surprising it is when it doesn't work

I gotta say, getting "no such file or directory" errors for binaries that very much exist and are on the PATH, and that never appear when running gcc directly without bazel, is somewhat surprising to users as well ;)

What do gcc -print-prog-name=as and gcc -print-prog-name=ld return?

Just tried it out, and that depends on whether the values were hardcoded during compiler compilation or not:

$ /opt/gcc-5.3/bin/gcc -print-prog-name=as
/usr/bin/as
$ /opt/gcc-5.3-nh/bin/gcc -print-prog-name=as
as

Here, the first one is with hardcoded paths (configured with --with-as=/usr/bin/as), the second one is configured without that flag.
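The two outputs above suggest that an autoconfiguration step would have to handle both cases: an absolute path, which may need normalizing, and a bare name, which gcc looks up on PATH at run time. The sketch below is one hypothetical way to do that and is not Bazel's actual autoconfiguration code. `find_gcc_tool` is an invented name, and `readlink -f` assumes GNU coreutils.

```shell
#!/bin/sh
# find_gcc_tool GCC TOOL
# Print an absolute path for the helper program GCC would run as TOOL.
# (Hypothetical helper; not Bazel's autoconfiguration code.)
find_gcc_tool() {
    gcc_bin=$1
    tool=$2
    reported=$("$gcc_bin" -print-prog-name="$tool") || return 1
    case $reported in
        /*)
            # Absolute path, possibly with ".." segments: normalize it.
            readlink -f "$reported"
            ;;
        *)
            # Bare name: gcc would search PATH at run time, so do the same.
            command -v "$reported"
            ;;
    esac
}
```

For a bare name, the result depends on the PATH in effect when the function runs, which is the same dependency that separates an interactive shell from a Bazel action.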

@lberki
Contributor

lberki commented Oct 7, 2016

Yeah... unfortunately, passing PATH through to actions also has its own issues. Once upon a time, we came up with a rather complex plan to make this work as best as we can:

http://www.bazel.io/docs/designs/2016/06/21/environment.html

It's essentially a balancing act between actions being reproducible (and thus not dependent on the environment Bazel is run in) and convenience.
