Building TensorFlow with custom GCC requires hardcoded ld,nm and as #1713

akors · 2016-09-02T11:50:48Z

Hi! I am using Fedora 23 and Ubuntu 16.04.1 LTS to build TensorFlow with GPU support.

GPU support requires a specific GCC version, 5.3, for successful compilation. Since this version is not available from the Fedora package repositories, it has to be compiled from source. Unfortunately, there are a few obstacles to using this version with bazel.

One of those is the following:
After configuring TensorFlow with the self-compiled GCC, and running bazel, the compilation stops with the following message:

gcc: error trying to exec 'as': execvp: No such file or directory
or
gcc: error trying to exec 'nm': execvp: No such file or directory
or
gcc: error trying to exec 'ld': execvp: No such file or directory

To work around this, one can compile GCC by hardcoding the paths to those tools, by adding this to the configuration line of GCC:
--with-ld=/bin/ld --with-nm=/bin/nm --with-as=/usr/bin/as
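
Since the tools live in different directories on different distributions, the flags can be derived from the current PATH instead of being typed by hand. The following is only a sketch: the helper name `tool_flags`, the source directory and the install prefix in the usage comment are placeholders, not part of the original report.

```shell
#!/bin/sh
# Sketch: print one --with-<tool>=<absolute path> flag per requested tool.
# Assumption: the tools are external programs found on the current PATH.

tool_flags() {
    for tool in "$@"; do
        # command -v prints the resolved path of an external program.
        path=$(command -v "$tool") || { echo "missing: $tool" >&2; return 1; }
        printf '%s ' "--with-$tool=$path"
    done
    printf '\n'
}

# Hypothetical usage, run from a GCC build directory (paths are placeholders):
#   ../gcc-5.3.0/configure --prefix=/opt/gcc-5.3 $(tool_flags ld nm as)
```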

This does seem rather strange, however, because those programs are on the path, and the custom GCC without bazel is capable of producing working binaries just fine. I feel that in a well-working build system, hacks of this kind should not be necessary.

There was a TensorFlow issue for this: tensorflow/tensorflow#2806 but the devs believe that this is a bazel problem.


philwo · 2016-09-02T12:10:43Z

Hi @akors,

this is pretty strange, but we can figure out together what's going on here. :)

Could you please provide the following:

Output of "env" right before you run "bazel build", to see which environment variables you usually have set.
Output of "bazel build -s //your:target" (if the log is too long, the important part here is just the actual failing step, so we can see if Bazel correctly passes a good PATH to gcc).
The absolute path to the as / nm / ld tools that your gcc is supposed to use (I guess they're not really in /bin?).

Thanks!

akors · 2016-09-02T12:41:27Z

Output of "env" right before you run "bazel build", to see which environment variables you usually have set.

Here is the env output. Note that CC, CXX and LD_LIBRARY_PATH have been manually set by me.
https://gist.github.com/akors/3474216eabf1b777196b8a0bcd582b54

Output of "bazel build -s //your:target" (if the log is too long, the important part here is just the actual failing step, so we can see if Bazel correctly passes a good PATH to gcc).

The output is indeed quite long because it is cluttered with compiler warnings. I have pasted only the last few messages.
By the way, the command I run is
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package --verbose_failures

INFO: From Compiling tensorflow/core/ops/functional_grad.cc:
In file included from tensorflow/core/ops/functional_grad.cc:16:0:
./tensorflow/core/framework/function.h:457:15: warning: 'tensorflow::unused_grad_0' defined but not used [-Wunused-variable]
   static bool unused_grad_##ctr = SHOULD_REGISTER_OP_GRADIENT && \
               ^
./tensorflow/core/framework/function.h:454:3: note: in expansion of macro 'REGISTER_OP_GRADIENT_UNIQ'
   REGISTER_OP_GRADIENT_UNIQ(ctr, name, fn)
   ^
./tensorflow/core/framework/function.h:448:3: note: in expansion of macro 'REGISTER_OP_GRADIENT_UNIQ_HELPER'
   REGISTER_OP_GRADIENT_UNIQ_HELPER(__COUNTER__, name, fn)
   ^
tensorflow/core/ops/functional_grad.cc:56:1: note: in expansion of macro 'REGISTER_OP_GRADIENT'
 REGISTER_OP_GRADIENT("MapAccumulate", MapAccumulateGrad);
 ^
INFO: From Compiling tensorflow/core/kernels/batchtospace_op_gpu.cu.cc:
nvcc warning : option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr'.
nvcc warning : option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr'.
gcc: error trying to exec 'as': execvp: No such file or directory
ERROR: /home/akorsunsky/.local/src/tensorflow/tensorflow/core/kernels/BUILD:1530:1: output 'tensorflow/core/kernels/_objs/batchtospace_op_gpu/tensorflow/core/kernels/batchtospace_op_gpu.cu.pic.o' was not created.
ERROR: /home/akorsunsky/.local/src/tensorflow/tensorflow/core/kernels/BUILD:1530:1: not all outputs were created.
ERROR: /home/akorsunsky/.local/src/tensorflow/tensorflow/core/kernels/BUILD:1550:1: output 'tensorflow/core/kernels/_objs/depth_space_ops_gpu/tensorflow/core/kernels/spacetodepth_op_gpu.cu.pic.o' was not created.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 468.069s, Critical Path: 426.33s

The absolute path to the as / nm / ld tools that your gcc is supposed to use (I guess they're not really in /bin?).

Oops, my bad, they are actually in /usr/bin. However, this does not make a difference, the main point is that the binaries can't be found on either system, and I have to pass the correct path during compiler configuration.
On Fedora, they were indeed in /bin. On Ubuntu which I'm currently using for testing, they are in /usr/bin/.

By the way, this is version 0.3.0 from the Ubuntu repositories:

Build label: 0.3.0
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Jun 10 11:38:23 2016 (1465558703)
Build timestamp: 1465558703
Build timestamp as int: 1465558703

If I compile bazel from the current HEAD, I get this weirdness:

Found non-responsive server process (pid=12635). Killing it.
.
ERROR: no such package '@local_config_cuda//crosstool': BUILD file not found on package path.
ERROR: no such package '@local_config_cuda//crosstool': BUILD file not found on package path.
INFO: Elapsed time: 0.752s

aehlig · 2016-09-02T13:07:27Z

After configuring TensorFlow with the self-compiled GCC, and running bazel, the compilation stops with the following message:

gcc: error trying to exec 'as': execvp: No such file or directory
or
gcc: error trying to exec 'nm': execvp: No such file or directory
or
gcc: error trying to exec 'ld': execvp: No such file or directory

To work around this, one can compile GCC by hardcoding the paths to those tools, by adding this to the configuration line of GCC:
--with-ld=/bin/ld --with-nm=/bin/nm --with-as=/usr/bin/as

Just to be sure, did you check that this is not related to the way you
built your gcc? I.e., that it is not expecting the dependent tools next to
its binary. Did you try using your custom gcc to build and link a file
(not using bazel at all)?

Thanks,
Klaus

Klaus Aehlig
Google Germany GmbH, Erika-Mann-Str. 33, 80636 Muenchen
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschaeftsfuehrer: Matthew Scott Sucherman, Paul Terence Manicle

akors · 2016-09-02T13:19:35Z

@aehlig

Just to be sure, did you check that this is not related to the way you
built your gcc?

Yes, it works perfectly fine:

$ cat > helloworld.cpp << EOF
> 
> extern void sayhello();
> 
> int main()
> {
>     sayhello();
>     return 0;
> }
> EOF
$ cat > sayhello.cpp << EOF
> #include <iostream>
> 
> void sayhello()
> {
>     std::cout<<"Hello world!\n";
> }
> EOF
$ /opt/gcc-5.3-nh/bin/g++ helloworld.cpp sayhello.cpp -o helloworld
$ ./helloworld
Hello world!
$

aehlig · 2016-09-02T13:20:08Z

Output of "bazel build -s //your:target" (if the log is too long, the important part here is just the actual failing step, so we can see if Bazel correctly passes a good PATH to gcc).

The output is indeed quite long because it is cluttered with compiler warnings. I have pasted only the last few messages.
By the way, the command I run is
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package --verbose_failures

Can you please provide the (tail of the) output of "bazel build -s //your:target"? Note the "-s" option,
which shows the precise way the actions have been called.

Thanks,
Klaus


akors · 2016-09-02T15:53:10Z

@aehlig
Here's the tail of a build with the -s option:

>>>>> # //tensorflow/core/kernels:strided_slice_op [action 'Compiling tensorflow/core/kernels/strided_slice_op_inst_1.cc']
(cd /home/akorsunsky/.cache/bazel/_bazel_akorsunsky/e5ea3746ee36904dabb4939717fade59/execroot/tensorflow && \
  exec env - \
    PATH=/home/akorsunsky/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' '-frandom-seed=bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/strided_slice_op/tensorflow/core/kernels/strided_slice_op_inst_1.pic.o' -fPIC -DHAVE_CONFIG_H -iquote . -iquote bazel-out/local_linux-py3-opt/genfiles -iquote external/protobuf -iquote bazel-out/local_linux-py3-opt/genfiles/external/protobuf -iquote external/bazel_tools -iquote bazel-out/local_linux-py3-opt/genfiles/external/bazel_tools -iquote external/farmhash_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive -iquote external/gif_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/gif_archive -iquote external/highwayhash -iquote bazel-out/local_linux-py3-opt/genfiles/external/highwayhash -iquote external/jpeg_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/jpeg_archive -iquote external/png_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/png_archive -iquote external/re2 -iquote bazel-out/local_linux-py3-opt/genfiles/external/re2 -iquote external/eigen_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -iquote external/zlib_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive -iquote external/local_config_cuda -iquote bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda -isystem external/protobuf/src -isystem bazel-out/local_linux-py3-opt/genfiles/external/protobuf/src -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260 -isystem bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260 -isystem 
external/gif_archive/giflib-5.1.4/lib -isystem bazel-out/local_linux-py3-opt/genfiles/external/gif_archive/giflib-5.1.4/lib -isystem external/highwayhash -isystem bazel-out/local_linux-py3-opt/genfiles/external/highwayhash -isystem external/jpeg_archive/jpeg-9a -isystem bazel-out/local_linux-py3-opt/genfiles/external/jpeg_archive/jpeg-9a -isystem external/png_archive/libpng-1.2.53 -isystem bazel-out/local_linux-py3-opt/genfiles/external/png_archive/libpng-1.2.53 -isystem external/re2 -isystem bazel-out/local_linux-py3-opt/genfiles/external/re2 -isystem external/eigen_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -isystem external/zlib_archive/zlib-1.2.8 -isystem bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive/zlib-1.2.8 -isystem external/local_config_cuda/cuda/include -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda/cuda/include -isystem external/local_config_cuda/cuda -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda/cuda -fno-exceptions -DEIGEN_AVOID_STL_ARRAY '-DGOOGLE_CUDA=1' -pthread '-DGOOGLE_CUDA=1' -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -MD -MF bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/strided_slice_op/tensorflow/core/kernels/strided_slice_op_inst_1.pic.d -c tensorflow/core/kernels/strided_slice_op_inst_1.cc -o bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/strided_slice_op/tensorflow/core/kernels/strided_slice_op_inst_1.pic.o)
INFO: From Compiling tensorflow/core/kernels/spacetodepth_op_gpu.cu.cc:
nvcc warning : option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr'.
nvcc warning : option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr'.
gcc: error trying to exec 'as': execvp: No such file or directory
ERROR: /home/akorsunsky/.local/src/tensorflow/tensorflow/core/kernels/BUILD:1550:1: output 'tensorflow/core/kernels/_objs/depth_space_ops_gpu/tensorflow/core/kernels/spacetodepth_op_gpu.cu.pic.o' was not created.
ERROR: /home/akorsunsky/.local/src/tensorflow/tensorflow/core/kernels/BUILD:1550:1: not all outputs were created.
>>>>> # //tensorflow/core/kernels:strided_slice_op [action 'Compiling tensorflow/core/kernels/strided_slice_op_inst_3.cc']
(cd /home/akorsunsky/.cache/bazel/_bazel_akorsunsky/e5ea3746ee36904dabb4939717fade59/execroot/tensorflow && \
  exec env - \
    PATH=/home/akorsunsky/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' '-frandom-seed=bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/strided_slice_op/tensorflow/core/kernels/strided_slice_op_inst_3.pic.o' -fPIC -DHAVE_CONFIG_H -iquote . -iquote bazel-out/local_linux-py3-opt/genfiles -iquote external/protobuf -iquote bazel-out/local_linux-py3-opt/genfiles/external/protobuf -iquote external/bazel_tools -iquote bazel-out/local_linux-py3-opt/genfiles/external/bazel_tools -iquote external/farmhash_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive -iquote external/gif_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/gif_archive -iquote external/highwayhash -iquote bazel-out/local_linux-py3-opt/genfiles/external/highwayhash -iquote external/jpeg_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/jpeg_archive -iquote external/png_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/png_archive -iquote external/re2 -iquote bazel-out/local_linux-py3-opt/genfiles/external/re2 -iquote external/eigen_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -iquote external/zlib_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive -iquote external/local_config_cuda -iquote bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda -isystem external/protobuf/src -isystem bazel-out/local_linux-py3-opt/genfiles/external/protobuf/src -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260 -isystem bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260 -isystem 
external/gif_archive/giflib-5.1.4/lib -isystem bazel-out/local_linux-py3-opt/genfiles/external/gif_archive/giflib-5.1.4/lib -isystem external/highwayhash -isystem bazel-out/local_linux-py3-opt/genfiles/external/highwayhash -isystem external/jpeg_archive/jpeg-9a -isystem bazel-out/local_linux-py3-opt/genfiles/external/jpeg_archive/jpeg-9a -isystem external/png_archive/libpng-1.2.53 -isystem bazel-out/local_linux-py3-opt/genfiles/external/png_archive/libpng-1.2.53 -isystem external/re2 -isystem bazel-out/local_linux-py3-opt/genfiles/external/re2 -isystem external/eigen_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -isystem external/zlib_archive/zlib-1.2.8 -isystem bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive/zlib-1.2.8 -isystem external/local_config_cuda/cuda/include -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda/cuda/include -isystem external/local_config_cuda/cuda -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda/cuda -fno-exceptions -DEIGEN_AVOID_STL_ARRAY '-DGOOGLE_CUDA=1' -pthread '-DGOOGLE_CUDA=1' -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -MD -MF bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/strided_slice_op/tensorflow/core/kernels/strided_slice_op_inst_3.pic.d -c tensorflow/core/kernels/strided_slice_op_inst_3.cc -o bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/strided_slice_op/tensorflow/core/kernels/strided_slice_op_inst_3.pic.o)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 347.157s, Critical Path: 314.25s

The full log can be accessed here:
https://gist.github.com/akors/dabe16016ba4ccec8fa1f230325f2669
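
The `-s` output above shows each action being started with `exec env -` and only PATH set. Whether the tools resolve in such a stripped environment can be checked outside of Bazel with something like the sketch below. The function name `check_tools_in_clean_env` is made up for illustration, and the sketch only tests a plain PATH lookup; it does not reproduce whatever the crosstool wrapper or nvcc do to the environment afterwards.

```shell
#!/bin/sh
# Sketch: look up tools in an otherwise empty environment, similar to the
# `exec env - PATH=...` prefix in the Bazel action log.

check_tools_in_clean_env() {
    # $1 is the PATH to test; the remaining arguments are tool names.
    search_path=$1
    shift
    status=0
    for tool in "$@"; do
        # env -i clears the environment; only PATH is passed through.
        if found=$(env -i PATH="$search_path" /bin/sh -c 'command -v "$1"' sh "$tool"); then
            echo "$tool -> $found"
        else
            echo "$tool -> not found" >&2
            status=1
        fi
    done
    return $status
}

# Hypothetical usage with a shortened version of the PATH from the log:
#   check_tools_in_clean_env "/usr/local/bin:/usr/bin:/bin" as nm ld
```

If all three tools are reported here but the build still fails, the PATH that Bazel passes is probably not the culprit and the environment seen by the failing step itself would be the next thing to inspect.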

Phhere · 2016-09-09T08:01:42Z

Any update on this? I have the same problem with my local gcc

aehlig · 2016-10-06T12:20:37Z

There are two aspects to that issue.

(a) Passing through environment variables; this should be solved
with our environment variables design
https://github.com/bazelbuild/bazel/blob/master/site/designs/_posts/2016-06-21-environment.md
which is implemented in 0.3.2 and later. So, to get environment variables available, just add

  build --action_env=PATH
  build --action_env=LD_LIBRARY_PATH
  build --action_env=CC
  build --action_env=CXX

to the appropriate rc-file; you can also add specific values there. In particular, I would
recommend to set PATH in the user-workspace rc file (i.e., the .bazelrc next to the WORKSPACE)
such that the CC is the first {g,}cc on the search path.
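
As a sketch of that suggestion, the snippet below prints rc-file lines that pin PATH so that a custom compiler directory comes first. The helper name `bazelrc_lines`, the example directory `/opt/gcc-5.3/bin` and the trailing system directories are assumptions for illustration and need adjusting to the local setup. It relies on the `--action_env=NAME=VALUE` form of the flag.

```shell
#!/bin/sh
# Sketch: emit .bazelrc lines that forward the compiler-related variables.
# Assumption: the first argument is the bin directory of the custom GCC.

bazelrc_lines() {
    gcc_bin=$1
    # Fixed PATH value: custom GCC first, then (assumed) system directories.
    echo "build --action_env=PATH=$gcc_bin:/usr/bin:/bin"
    # Without "=value", Bazel takes the value from the invoking environment.
    echo "build --action_env=LD_LIBRARY_PATH"
    echo "build --action_env=CC"
    echo "build --action_env=CXX"
}

# Hypothetical usage, from the directory containing WORKSPACE:
#   bazelrc_lines /opt/gcc-5.3/bin >> .bazelrc
```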

(b) There is the "autoconfiguration" aspect, where bazel tries to be smart and infer properties
of the C compiler to be used. @lberki is a lot more knowledgeable about this aspect than I am.


lberki · 2016-10-07T14:04:15Z

@aehlig: I'm not :)

I kind of agree with the sentiment that the autodetection should work better, but the smarter it is, the more surprising it is when it doesn't work, so a bit of care needs to be taken.

What do gcc -print-prog-name=as and gcc -print-prog-name=ld return? If they return the right paths, we could make our autoconfiguration script call that and make sure that those are accessible somehow.

(Note, though, that that command can return non-normalized paths e.g. with .. path segments in it)

akors · 2016-10-07T14:16:08Z

In particular, I would recommend to set PATH in the user-workspace rc file

To make matters unnecessarily complicated, in the case of tensorflow, there are 2 compilers involved: the one that builds the CUDA kernels (used by nvcc), and the one that builds the rest. Those can be separate compilers, if I understand correctly. Not asking for the ultimate solution here, this is just a heads-up.

I kind of agree with the sentiment that the autodetection should work better, but the smarter it is, the more surprising it is when it doesn't work

I gotta say, getting "no such file or directory" errors for binaries that very much exist and are on the PATH, which doesn't happen when running directly without bazel, is somewhat surprising to users as well ;)

What do gcc -print-prog-name=as and gcc -print-prog-name=ld return?

Just tried it out, and that depends on whether the values were hardcoded when the compiler was built:

$ /opt/gcc-5.3/bin/gcc -print-prog-name=as
/usr/bin/as
$ /opt/gcc-5.3-nh/bin/gcc -print-prog-name=as
as

Here, the first one is with hardcoded paths (configured with --with-as=/usr/bin/as), the second one is configured without that flag.
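
Given those two kinds of output, anything that consumes `-print-prog-name` would have to handle both a full path and a bare name. A rough sketch of that handling follows; `resolve_prog_name` is a made-up helper, not something Bazel provides, and it assumes `realpath` (coreutils) is available for normalizing paths that contain `..` segments.

```shell
#!/bin/sh
# Sketch: turn the output of `gcc -print-prog-name=<tool>` into an absolute path.

resolve_prog_name() {
    raw=$1
    case "$raw" in
        */*)
            # GCC reported a path; normalize it (resolves ".." and symlinks).
            realpath "$raw"
            ;;
        *)
            # GCC reported a bare name; fall back to a lookup on the current PATH.
            command -v "$raw"
            ;;
    esac
}

# Hypothetical usage:
#   resolve_prog_name "$(/opt/gcc-5.3/bin/gcc -print-prog-name=as)"
```

In the bare-name case the result depends on the PATH of whoever runs the lookup, which is exactly the part that differs between a shell session and a Bazel action.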

lberki · 2016-10-07T14:24:14Z

Yeah... unfortunately, passing PATH through to actions also has its own issues. Once upon a time, we came up with a rather complex plan to make this work as best as we can:

http://www.bazel.io/docs/designs/2016/06/21/environment.html

It's essentially a balancing act between actions being reproducible (and thus not dependent on the environment Bazel is run in) and convenience.

akors mentioned this issue Sep 2, 2016

Building TF with custom GCC requires hardcoded ld,nm and as tensorflow/tensorflow#2806

Closed

aehlig added the under investigation label Sep 2, 2016

damienmg assigned philwo and aehlig Oct 6, 2016

This was referenced Oct 15, 2016

gcc: error trying to exec 'as': execvp: No such file or directory tensorflow/tensorflow#4984

Closed

add tensorflow module AndreasMadsen/my-setup#5

Merged

i3v mentioned this issue Dec 7, 2016

failed to install bazel on Red Hat 6.7 #760

Closed

philwo closed this as completed Jul 19, 2018
