Cannot make tensorflow to use a local gcc installation? #649

Closed
jesusandres-ferrer opened this issue Nov 24, 2015 · 22 comments

Comments

@jesusandres-ferrer

Hi,
I am stuck with an old system installation on this machine, so I installed gcc 4.8.1 locally, and thanks to the community I managed to compile Bazel (#629). However, I am now unable to compile TensorFlow. This is the error I get:

gcc: unrecognized option '-no-canonical-prefixes'
cc1plus: error: unrecognized command line option "-std=c++11"
cc1plus: warning: unrecognized command line option "-Wno-free-nonheap-object"
ERROR: $HOME/tensorflow/google/protobuf/BUILD:29:1: C++ compilation of rule '//google/protobuf:protobuf_lite' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object ... : com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.

I guess it is not picking up the correct gcc, because if I run:

/path/to/local/gcc.4.8.1/bin/gcc -std=c++11  example.cc 

everything is fine, but if I run

/usr/bin/gcc  -std=c++11 example.cc

I get the same error:

cc1plus: error: unrecognized command line option "-std=c++11"
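
One way to confirm which compiler a build would pick up is to probe every gcc on PATH for C++11 support. This is a minimal sketch, assuming bash; the helper names `list_on_path` and `supports_cxx11` are made up for illustration and are not part of Bazel or TensorFlow.

```bash
#!/bin/bash
# Print every executable called $1 found on PATH, in lookup order.
list_on_path() {
    local name=$1 dir
    local IFS=:
    for dir in $PATH; do
        if [ -n "$dir" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}

# Succeed if compiler $1 accepts -std=c++11 on a trivial program.
supports_cxx11() {
    echo 'int main() { return 0; }' |
        "$1" -std=c++11 -x c++ -fsyntax-only - >/dev/null 2>&1
}

# Report on each gcc that a PATH lookup could resolve to.
for cc in $(list_on_path gcc); do
    if supports_cxx11 "$cc"; then
        echo "$cc: accepts -std=c++11"
    else
        echo "$cc: rejects -std=c++11"
    fi
done
```

Note that a PATH lookup is only part of the picture: a tool path hard-coded in a CROSSTOOL file or wrapper script bypasses PATH entirely.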

If I add the --verbose_failures option, I get this additional information:

(cd $HOME/.cache/bazel/_bazel_$USER/2d4cdeea5be55811d371414ca0f7bd15/tensorflow && \
  exec env - \
    PATH=$HOME/jdk1.8.0_65/bin/:$HOME/local.gcc-4.8.1/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:$HOME/bin \
  third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -iquote . -iquote bazel-out/local_linux-opt/genfiles -iquote external/bazel_tools -iquote bazel-out/local_linux-opt/genfiles/external/bazel_tools -isystem google/protobuf/src -isystem bazel-out/local_linux-opt/genfiles/google/protobuf/src -isystem external/bazel_tools/tools/cpp/gcc3 -DHAVE_PTHREAD -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare '-Wno-error=unused-function' -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' '-frandom-seed=bazel-out/local_linux-opt/bin/google/protobuf/_objs/protobuf_lite/google/protobuf/src/google/protobuf/wire_format_lite.o' -MD -MF bazel-out/local_linux-opt/bin/google/protobuf/_objs/protobuf_lite/google/protobuf/src/google/protobuf/wire_format_lite.d -c google/protobuf/src/google/protobuf/wire_format_lite.cc -o bazel-out/local_linux-opt/bin/google/protobuf/_objs/protobuf_lite/google/protobuf/src/google/protobuf/wire_format_lite.o)

How can I make it use the correct gcc? The TensorFlow community redirected me to the Bazel community:
tensorflow/tensorflow#336

Thanks,

@kchodorow
Contributor

Try modifying the CROSSTOOL file: https://github.com/bazelbuild/bazel/blob/master/tools/cpp/CROSSTOOL#L86. This is where Bazel configures how to find gcc.

@jesusandres-ferrer
Author

Hi,
In order to compile Bazel, I had to modify the CROSSTOOL file as explained in #629 so that the Bazel compilation picks up the correct gcc. I then copied output/bazel to my bin directory. When I try to compile TensorFlow, it fails because it still picks up the wrong gcc.
Any idea?
Thanks,

@kchodorow
Contributor

The CROSSTOOL file isn't actually compiled into Bazel. You have to replace /usr/local/lib/bazel/base_workspace/tools/cpp/CROSSTOOL with your custom version for bazel to pick up on the change (and you shouldn't need to recompile Bazel).
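
As a rough sketch of that replacement step (assuming bash; `install_crosstool` is a hypothetical helper, and the installed path may differ on your system), the customized file can be copied over the installed one while keeping a backup:

```bash
#!/bin/bash
# Replace an installed CROSSTOOL with a customized copy.
# Usage: install_crosstool <custom-file> <installed-file>
install_crosstool() {
    local custom=$1 installed=$2
    # Keep the first original around so the change can be reverted.
    if [ -f "$installed" ] && [ ! -f "$installed.orig" ]; then
        cp -p "$installed" "$installed.orig" || return 1
    fi
    cp "$custom" "$installed"
}

# Example, using the path mentioned above (may require root):
# install_crosstool tools/cpp/CROSSTOOL \
#     /usr/local/lib/bazel/base_workspace/tools/cpp/CROSSTOOL
```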

@sethbruder

Hi casiciaco,

I faced a similar problem.
Like many others, I have updated gcc (4.8.2), JDK (1.8....), and Python (2.7.10) installations in places other than those hard-wired into Bazel and TensorFlow.
I was compiling tensorflow (9c3043ff3bf31a6a81810b4ce9e87ef936f1f529) and protobuf (55ad57a235c009d0414aed1781072adda0c89137).
I also encountered some other miscellaneous issues compiling TensorFlow -- again, mostly hard-wired environment assumptions.
My solution follows.

I have included this here, in reply to a Bazel issue, because the challenges of this build may provide some more general insight into the Bazel build experience for new users outside of the Google ecosystem.

Note: This is just a dump of steps offered in case it saves someone time. I have made no effort to optimize (e.g., factor out cut-and-pasted paths) and I had not previously used Bazel -- my interest was in compiling TensorFlow. No doubt there are much better ways to do this.

Environment

For all steps that followed, I set my PATH and LD_LIBRARY_PATH to reference my desired gcc, JDK, and Python installations.

Whenever I ran Bazel, I used the --verbose_failures flag. This is a huge help.
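
A minimal sketch of such an environment setup, assuming bash. The prefix is the devtoolset-2 location used in the scripts in this comment; the `lib64` subdirectory is an assumption, and JDK and Python directories would be prepended the same way.

```bash
#!/bin/bash
# Assumed toolchain prefix (the devtoolset-2 location used in this comment).
TOOLCHAIN=/opt/rh/devtoolset-2/root/usr

# Put the desired gcc ahead of /usr/bin/gcc in lookups.
export PATH="$TOOLCHAIN/bin:$PATH"

# Assumption: the toolchain's runtime libraries live under lib64.
# The ${VAR:+...} form avoids a trailing ':' when the variable was unset.
export LD_LIBRARY_PATH="$TOOLCHAIN/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Then build with verbose failure output, for example:
# bazel build --verbose_failures <target>
```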

Compiling Bazel

You already did this, but for completeness, here's how I did it.

To fix up the paths hard-coded into Bazel, I ran the script below from the root of the Bazel tarball. This seems to work for both Bazel 0.1.0 and 0.1.1.
You would need to update the /opt/rh... paths.
I was then able to run ./compile.sh.

#!/bin/bash

for file in tools/cpp/CROSSTOOL src/test/java/MOCK_CROSSTOOL; do

    for e in $( ls /opt/rh/devtoolset-2/root/usr/bin ); do
        sed -i 's/\"\/usr\/bin\/'$e'\"/\"\/opt\/rh\/devtoolset-2\/root\/usr\/bin\/'$e'\"/g' $file
    done

    sed -i 's/linker_flag: \"-B\/usr\/bin\/\"/linker_flag: \"-B\/opt\/rh\/devtoolset-2\/root\/usr\/bin\/\"/g' $file
    sed -i 's/cxx_builtin_include_directory: \"\/usr\/lib\/gcc\/\"/cxx_builtin_include_directory: \"\/opt\/rh\/devtoolset-2\/root\/usr\/lib\/gcc\/\"/g' $file
    sed -i 's/cxx_builtin_include_directory: \"\/usr\/local\/include\"/cxx_builtin_include_directory: \"\/opt\/rh\/devtoolset-2\/root\/usr\/local\/include\"/g' $file
    sed -i 's/cxx_builtin_include_directory: \"\/usr\/include\"/cxx_builtin_include_directory: \"\/opt\/rh\/devtoolset-2\/root\/usr\/include\"/g' $file

done
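
For readers adapting the script, the same rewrite can be sketched more compactly by using `|` as the sed delimiter, which avoids escaping every slash. This is an illustrative variant, not the script I ran: `retarget_crosstool` is a hypothetical name, it assumes GNU sed (`-i`), it assumes the new prefix contains no `|` or `&` characters, and unlike the loop above it rewrites every quoted `/usr/bin/<tool>` path, whether or not that tool exists under the new prefix.

```bash
#!/bin/bash
# Rewrite hard-coded /usr paths in a CROSSTOOL file to a new prefix.
# Usage: retarget_crosstool <file> <new-usr-prefix>
retarget_crosstool() {
    local file=$1 new_usr=$2
    sed -i \
        -e "s|\"/usr/bin/\([^\"/]*\)\"|\"$new_usr/bin/\1\"|g" \
        -e "s|linker_flag: \"-B/usr/bin/\"|linker_flag: \"-B$new_usr/bin/\"|g" \
        -e "s|cxx_builtin_include_directory: \"/usr/|cxx_builtin_include_directory: \"$new_usr/|g" \
        "$file"
}

# Example, with the prefix used in this comment:
# retarget_crosstool tools/cpp/CROSSTOOL /opt/rh/devtoolset-2/root/usr
```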

Compiling TensorFlow

Not having experience with Bazel, I found I had to copy the tools directory from the Bazel distribution into the local tensorflow and protobuf repositories (otherwise I got a JDK-related error), and then append the content of tools/bazel.rc from the tensorflow repository to the analogous file from the Bazel distribution.

I had to again fix up some hard-coded paths, modifying the script used for Bazel:

file=third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
e=gcc

sed -i "s/'\/usr\/bin\/"$e"'/'\/opt\/rh\/devtoolset-2\/root\/usr\/bin\/"$e"'/g" $file

for file in third_party/gpus/crosstool/CROSSTOOL; do

    for e in $( ls /opt/rh/devtoolset-2/root/usr/bin ); do
        sed -i 's/\"\/usr\/bin\/'$e'\"/\"\/opt\/rh\/devtoolset-2\/root\/usr\/bin\/'$e'\"/g' $file
    done

    sed -i 's/linker_flag: \"-B\/usr\/bin\/\"/linker_flag: \"-B\/opt\/rh\/devtoolset-2\/root\/usr\/bin\/\"/g' $file
    sed -i 's/cxx_builtin_include_directory: \"\/usr\/lib\/gcc\/\"/cxx_builtin_include_directory: \"\/opt\/rh\/devtoolset-2\/root\/usr\/lib\/gcc\/\"/g' $file
    sed -i 's/cxx_builtin_include_directory: \"\/usr\/local\/include\"/cxx_builtin_include_directory: \"\/opt\/rh\/devtoolset-2\/root\/usr\/local\/include\"/g' $file
    sed -i 's/cxx_builtin_include_directory: \"\/usr\/include\"/cxx_builtin_include_directory: \"\/opt\/rh\/devtoolset-2\/root\/usr\/include\"/g' $file

done

(Again, I've not optimized this script; I just cut and pasted bits from my earlier script with a few modifications.)

I also had linker errors requiring references to libm and librt. I overcame these by adding the following lines to third_party/gpus/crosstool/CROSSTOOL:

  linker_flag: "-lrt"
  linker_flag: "-lm"

This seems to be an oversight at least a few others bumped into (see tensorflow/tensorflow#332). Perhaps those libraries are indirectly referenced elsewhere in the preferred Google environment.

I found I had to modify matrix_inverse_op.cc. I'm not sure whether this is a bona fide TensorFlow bug or a compiler version mismatch, but without a fix I had a compile error (without deep investigation the code looks not-obviously-wrong; my quick fix was to do a substitution I would have expected the compiler to do):

-            Matrix::Identity(input.rows(), input.cols()));
+            Eigen::Matrix<Scalar, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>::Identity(input.rows(), input.cols()));

When building the PIP package, I had to modify tensorflow/python/BUILD and tensorflow/tensorflow.bzl to reference my Python includes. Without this modification, I got one of two errors: (1) "references a path outside of the execution root..." errors; or (2) "No such file or directory ... Python.h ..." errors.
It seems that Bazel requires the Python include directory (the one containing Python.h) to be found under the "execution root" (which for me is somewhere under ~/.cache/...). The "execution root" seems to include symbolic links to directories in the tensorflow repository, so I created a symbolic link, inside the tensorflow/third_party directory, to the Python include directory. I also created such a link to the numpy include directory. See tensorflow/tensorflow#327 and tensorflow/tensorflow#109.
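
The symlink step might look like the sketch below (assuming bash). `link_into_workspace` and the link names `python_include` and `numpy_include` are hypothetical; the matching edits to tensorflow/python/BUILD and tensorflow/tensorflow.bzl are not shown.

```bash
#!/bin/bash
# Create (or refresh) a symlink inside the workspace that points at an
# external directory, so the directory is reachable from the execution root.
link_into_workspace() {
    local target=$1 link=$2
    [ -d "$target" ] || { echo "not a directory: $target" >&2; return 1; }
    mkdir -p "$(dirname "$link")" && ln -sfn "$target" "$link"
}

# Example, run from the root of the tensorflow repository:
# link_into_workspace \
#     "$(python -c 'import sysconfig; print(sysconfig.get_paths()["include"])')" \
#     third_party/python_include
# link_into_workspace \
#     "$(python -c 'import numpy; print(numpy.get_include())')" \
#     third_party/numpy_include
```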

This produced a working GPU-capable build of TensorFlow.

Hope this helps,
Seth

@jesusandres-ferrer
Author

Hi, thanks,
this helped me get through part of the compilation, but I still get errors when protoc is executed. I think I am really close to being able to compile it.

(cd $HOME/.cache/bazel/_bazel_$USER/2d4cdeea5be55811d371414ca0f7bd15/tensorflow && \
  exec env - \
  bazel-out/host/bin/google/protobuf/protoc '--cpp_out=bazel-out/local_linux-opt/genfiles/' -I. tensorflow/core/example/example.proto tensorflow/core/example/feature.proto tensorflow/core/framework/allocation_description.proto tensorflow/core/framework/attr_value.proto tensorflow/core/framework/config.proto tensorflow/core/framework/device_attributes.proto tensorflow/core/framework/function.proto tensorflow/core/framework/graph.proto tensorflow/core/framework/kernel_def.proto tensorflow/core/framework/op_def.proto tensorflow/core/framework/step_stats.proto tensorflow/core/framework/summary.proto tensorflow/core/framework/tensor.proto tensorflow/core/framework/tensor_description.proto tensorflow/core/framework/tensor_shape.proto tensorflow/core/framework/tensor_slice.proto tensorflow/core/framework/types.proto tensorflow/core/kernels/reader_base.proto tensorflow/core/lib/core/error_codes.proto tensorflow/core/util/event.proto tensorflow/core/util/saved_tensor_slice.proto)

The error is the following:

bazel-out/host/bin/google/protobuf/protoc: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by bazel-out/host/bin/google/protobuf/protoc)

If I add LD_LIBRARY_PATH after env - it works.
Is there any way to export that variable into the build?

@jesusandres-ferrer
Author

Hi,
I have been able to hack around the problem by replacing protoc with a wrapper script:

mv bazel-out/host/bin/google/protobuf/protoc bazel-out/host/bin/google/protobuf/orig_protoc
cat <<EOF >bazel-out/host/bin/google/protobuf/protoc
#!/bin/bash
LD_LIBRARY_PATH=... $TENSOR_FLOW_DIR/bazel-out/host/bin/google/protobuf/orig_protoc "\$@"
EOF
chmod +x bazel-out/host/bin/google/protobuf/protoc

However, I now get another error:

ERROR: /path/to/tensorflow/tensorflow/core/BUILD:143:1: undeclared inclusion(s) in rule '//tensorflow/core:gpu_runtime':
this rule is missing dependency declarations for the following files included by 'tensorflow/core/common_runtime/gpu/gpu_device.cc':
  '/usr/local/cuda-7.0/include/cuda.h'
  '/usr/local/cuda-7.0/include/cufft.h'
  '/usr/local/cuda-7.0/include/cuComplex.h'
  '/usr/local/cuda-7.0/include/vector_types.h'
  '/usr/local/cuda-7.0/include/builtin_types.h'
  '/usr/local/cuda-7.0/include/device_types.h'
  '/usr/local/cuda-7.0/include/host_defines.h'
  '/usr/local/cuda-7.0/include/driver_types.h'
  '/usr/local/cuda-7.0/include/surface_types.h'
  '/usr/local/cuda-7.0/include/texture_types.h'
  '/usr/local/cuda-7.0/include/cuda_runtime.h'
  '/usr/local/cuda-7.0/include/host_config.h'
  '/usr/local/cuda-7.0/include/channel_descriptor.h'
  '/usr/local/cuda-7.0/include/cuda_runtime_api.h'
  '/usr/local/cuda-7.0/include/cuda_device_runtime_api.h'
  '/usr/local/cuda-7.0/include/driver_functions.h'
  '/usr/local/cuda-7.0/include/vector_functions.h'
  '/usr/local/cuda-7.0/include/vector_functions.hpp'.
Target //tensorflow/cc:tutorials_example_trainer failed to build

Any idea what is happening?

Thanks!

@damienmg
Contributor

damienmg commented Dec 2, 2015

Yes, you are using a recent version of Bazel, and we are fixing those bugs in the TensorFlow build. You can work around this by using Bazel 0.1.1.

@jesusandres-ferrer
Author

Hi,
I don't know why I cannot compile Bazel 0.1.1. I do the following:

  1. Edit the CROSSTOOL files as suggested above
  2. Add -Wl,-rpath,$HOME/local/lib64 to linker_options in CROSSTOOL and to LDFLAGS
  3. Make a gcc_wrapper.sh as discussed in #629 (Cannot compile bazel to use a local gcc)

Then I get the following error:

$CACHE/bazel-0.1.1/_bin/build-runfiles: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by $CACHE/bazel-0.1.1/_bin/build-runfiles )

when executing the command below:

 (cd $HOME/.cache/bazel/_bazel_$USER/837720c8406857ec50d4aa182a9ab224/bazel-0.1.1 &&   exec env -   $HOME/.cache/bazel/_bazel_$USER/837720c8406857ec50d4aa182a9ab224/bazel-0.1.1/_bin/build-runfiles bazel-out/local_linux-fastbuild/bin/src/main/java/bazel-main.runfiles_manifest bazel-out/local_linux-fastbuild/bin/src/main/java/bazel-main.runfiles)
$HOME/.cache/bazel/_bazel_$USER/837720c8406857ec50d4aa182a9ab224/bazel-0.1.1/_bin/build-runfiles: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by $HOME/.cache/bazel/_bazel_$USER/837720c8406857ec50d4aa182a9ab224/bazel-0.1.1/_bin/build-runfiles)

Can you help?

@sethbruder

Somebody with Bazel experience will be able to give you more guidance on a fix, but presumably, if build-runfiles is an ELF binary, then an env - ldd .../build-runfiles will not find the desired version of libstdc++.so.6 (the one defining the GLIBCXX... version being sought). So, barring some more specific advice, you could: (1) look at the command that compiles and links build-runfiles to make sure your ... -rpath ... path options are getting through to the compiler and thus to the linker; (2) check that a libstdc++... in that path actually defines the desired version (say, with objdump -x).
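
Those two checks could be scripted roughly as follows. This is a sketch assuming bash and GNU grep; the function names are made up for illustration. It scans the file for embedded version tags instead of parsing the ELF version tables, so treat a hit as a strong hint, not proof. `show_rpath` needs objdump from binutils.

```bash
#!/bin/bash
# List the GLIBCXX_* version tags embedded in a libstdc++ binary.
glibcxx_versions() {
    LC_ALL=C grep -a -o 'GLIBCXX_[0-9.]*' "$1" | sort -u
}

# Succeed if file $1 contains exactly the version tag $2,
# e.g. has_glibcxx /usr/lib64/libstdc++.so.6 GLIBCXX_3.4.14
has_glibcxx() {
    glibcxx_versions "$1" | grep -qxF "$2"
}

# Show any RPATH/RUNPATH entries recorded in an ELF binary, to see
# whether the -rpath option actually reached the linker.
show_rpath() {
    objdump -x "$1" | grep -E 'RPATH|RUNPATH'
}
```

Running `has_glibcxx` against both the system library and the one under the -rpath directory shows which of them can satisfy the version named in the error.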

(You may also want to look at whether you need such a recent libstdc++ at all. My non-default gcc 4.8.2 from devtoolset-2 uses my system libstdc++; to build with existing libraries, you could look at what devtoolset-2 is doing.)

Good luck!
Seth

@damienmg
Contributor

@casiciaco did @sethbruder's answer help? Do you still need help?

@ffmpbgrnn

Hi @damienmg, I am also struggling to compile TF on CentOS 6.6. I managed to build Bazel, but when I use Bazel to build TF, I get the following error:

/home/ubuntu/.cache/bazel/_bazel_panpingbo/d5c6b2c9646c7abc2a20991eb90f71f1/tensorflow/_bin/build-runfiles: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by /home/panpingbo/.cache/bazel/_bazel_ubuntu/d5c6b2c9646c7abc2a20991eb90f71f1/tensorflow/_bin/build-runfiles)

I ran into a similar problem when compiling Bazel; the fix there was to pass -Wl,-rpath,/home/local/lib64 when compiling build-runfiles. What should I do when building TF? Thanks.

@ffmpbgrnn

By the way, I'm using Bazel 0.1.2.

@weijianwen

@ffmpbgrnn I'll try and let you know. Perhaps it will take me 2~3 workdays due to too many chores at the end of the year.

@ffmpbgrnn

Hi @damienmg, when Bazel builds TensorFlow, which step generates build-runfiles?

@jesusandres-ferrer
Author

Hi,
I still have the same problem, so no success.

@ffmpbgrnn

Hi, I solved the runtime linking problem by simply adding

   linker_flag: "-L/home/user/clibs/lib64"
   linker_flag: "-lstdc++"
   linker_flag: "-Wl,-rpath,/home/user/clibs/lib64"

to third_party/gpus/crosstool/CROSSTOOL, hope it helps.

@hanwen
Contributor

hanwen commented Jan 20, 2016

Ping? Does this still need attention?

@jesusandres-ferrer
Author

Actually, you also need to modify compile.sh
and add this at the beginning:

export LDFLAGS="-Wl,-rpath,/path/to/your/lib64"

You can also add your exports for CC, CXX, JAVA_HOME, etc. there.

With that I was able to compile Bazel 0.1.1.
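
Put together, the lines prepended to compile.sh might look like the sketch below. The gcc and JDK locations are the ones from the PATH I posted at the top of this issue and are only placeholders, as is the rpath directory; whether every build step honors CC and CXX is an assumption.

```bash
#!/bin/bash
# Placeholder locations; adjust to your own local installs.
export CC="$HOME/local.gcc-4.8.1/bin/gcc"
export CXX="$HOME/local.gcc-4.8.1/bin/g++"
export JAVA_HOME="$HOME/jdk1.8.0_65"

# Embed the directory holding the newer libstdc++ in linked binaries,
# so they still find it when run with an empty environment (env -).
export LDFLAGS="-Wl,-rpath,/path/to/your/lib64"
```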

@davenso

davenso commented Apr 7, 2016

@ffmpbgrnn

In which part of the CROSSTOOL file did you add the following lines?

linker_flag: "-L/home/user/clibs/lib64"
linker_flag: "-lstdc++"
linker_flag: "-Wl,-rpath,/home/user/clibs/lib64"

And should the existing linker_flag entries in the file be removed?

@kchodorow
Contributor

You can add them anywhere. It looks like -lstdc++ is already there (https://github.com/tensorflow/tensorflow/blob/master/third_party/gpus/crosstool/CROSSTOOL#L53). You shouldn't need to remove any flags unless they don't work for you.
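
If you would like to script the edit, here is one possible sketch (assuming bash and a POSIX awk; `add_linker_flags` is a hypothetical helper, not part of Bazel). It appends the new entries directly after each existing `linker_flag: "-lstdc++"` line, which keeps them inside the same toolchain block and at the same indentation.

```bash
#!/bin/bash
# Append linker_flag entries after each existing -lstdc++ entry.
# Usage: add_linker_flags <crosstool-file> <flag>...
add_linker_flags() {
    local file=$1
    shift
    local flags
    flags=$(printf '%s\n' "$@")   # one flag per line
    FLAGS="$flags" awk '
        { print }
        index($0, "linker_flag: \"-lstdc++\"") {
            indent = $0
            sub(/[^ \t].*$/, "", indent)   # keep only leading whitespace
            n = split(ENVIRON["FLAGS"], f, "\n")
            for (i = 1; i <= n; i++)
                printf "%slinker_flag: \"%s\"\n", indent, f[i]
        }
    ' "$file" > "$file.new" && mv "$file.new" "$file"
}

# Example, with the flags from earlier in this thread:
# add_linker_flags third_party/gpus/crosstool/CROSSTOOL \
#     "-L/home/user/clibs/lib64" "-Wl,-rpath,/home/user/clibs/lib64"
```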

@i3v

i3v commented Dec 7, 2016

The link provided by @kchodorow seems to be dead. I'd like to add an updated link (even though, of course, it is still easy to work out where this file is from the link itself). Also, I'd like to mention that this modification might not be the only one you need.

@JackZ0

JackZ0 commented Apr 18, 2017

INFO: You can skip this first step by providing a path to the bazel binary as second argument:
INFO: ./compile.sh compile /path/to/bazel
🍃 Building Bazel from scratch.......
🍃 Building Bazel with Bazel.
.WARNING: /tmp/bazel_W3ILy3qg/out/external/bazel_tools/WORKSPACE:1: Workspace name in /tmp/bazel_W3ILy3qg/out/external/bazel_tools/WORKSPACE (@io_bazel) does not match the name given in the repository's definition (@bazel_tools); this will cause a build error in future versions.
INFO: Found 1 target...
INFO: From Compiling src/main/tools/build-runfiles.cc:
src/main/tools/build-runfiles.cc: In member function 'void RunfilesCreator::EnsureDirReadAndWritePerms(const std::string&)':
src/main/tools/build-runfiles.cc:359: warning: comparison between signed and unsigned integer expressions
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-free-nonheap-object"
ERROR: /home/TF/bazel/src/main/tools/BUILD:19:1: C++ compilation of rule '//src/main/tools:linux-sandbox' failed: process-wrapper failed: error executing command
(cd /tmp/bazel_W3ILy3qg/out/bazel-sandbox/6ebe2a0d-e078-4e44-8867-65ed0cf8810d-21/execroot/bazel &&
exec env -
PATH=/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/arm/usr/local/arm/arm-none-linux-gnueabi/bin:/home/jack/gdb-build/bin:/root/bin
/tmp/bazel_W3ILy3qg/out/execroot/bazel/bin/process-wrapper -1 5 - - /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/local-opt/bin/src/main/tools/_objs/linux-sandbox/src/main/tools/linux-sandbox.d '-frandom-seed=bazel-out/local-opt/bin/src/main/tools/_objs/linux-sandbox/src/main/tools/linux-sandbox.o' -iquote . -iquote bazel-out/local-opt/genfiles -iquote external/bazel_tools -iquote bazel-out/local-opt/genfiles/external/bazel_tools -isystem external/bazel_tools/tools/cpp/gcc3 -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c src/main/tools/linux-sandbox.cc -o bazel-out/local-opt/bin/src/main/tools/_objs/linux-sandbox/src/main/tools/linux-sandbox.o).
src/main/tools/linux-sandbox.cc: In function 'void CloseFds()':
src/main/tools/linux-sandbox.cc:104: error: 'nullptr' was not declared in this scope
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-free-nonheap-object"
Target //src:bazel failed to build
INFO: Elapsed time: 9.075s, Critical Path: 2.30s
