tfcompile fails to build #29

Open
jchia opened this issue Jun 12, 2021 · 0 comments

System information

  • Docker image: nvcr.io/nvidia/tensorflow:21.05-tf1-py3
  • Linux Ubuntu 20.04
  • TensorFlow installed from: NA (build problem, not installation problem)
  • TensorFlow version: 1.15.5
  • Python version: 3.8.5
  • Installed using virtualenv? pip? conda?: NA
  • Bazel version (if compiling from source): 0.24.1
  • GCC/Compiler version (if compiling from source): 9.3.0
  • CUDA/cuDNN version: unknown (whatever is in the Docker image)
  • GPU model and memory: 1080Ti

Describe the problem
From within a docker container running nvcr.io/nvidia/tensorflow:21.05-tf1-py3, tfcompile fails to build:

$ root@48f2340d016b:/opt/tensorflow/tensorflow-source# bazel build --config=opt --config=cuda //tensorflow/compiler/aot:tfcompile
...
INFO: Analysed target //tensorflow/compiler/aot:tfcompile (124 packages loaded, 11185 targets configured).
INFO: Found 1 target...
ERROR: /opt/tensorflow/tensorflow-source/tensorflow/compiler/aot/BUILD:190:1: C++ compilation of rule '//tensorflow/compiler/aot:embedded_protocol_buffers' failed (Exit 1)
tensorflow/compiler/aot/embedded_protocol_buffers.cc: In function ‘xla::StatusOr<std::__cxx11::basic_string<char> > tensorflow::tfcompile::CodegenModule(llvm::TargetMachine*, std::unique_ptr<llvm::Module>)’:
tensorflow/compiler/aot/embedded_protocol_buffers.cc:85:32: error: ‘CGFT_ObjectFile’ is not a member of ‘llvm::TargetMachine’
   85 |           llvm::TargetMachine::CGFT_ObjectFile)) {
      |                                ^~~~~~~~~~~~~~~
Target //tensorflow/compiler/aot:tfcompile failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 14.014s, Critical Path: 9.96s
INFO: 171 processes: 171 local.
FAILED: Build did NOT complete successfully

I have modified .tf_configure.bazelrc, but I think the changes are irrelevant to the failure:

build --action_env PYTHON_BIN_PATH="/usr/bin/python3.8"
build --action_env PYTHON_LIB_PATH="/usr/local/lib/python3.8/dist-packages"
build --python_path="/usr/bin/python3.8"
build:xla --define with_xla_support=true
build --config=xla
build --action_env TF_USE_CCACHE="0"
build --copt=-march=haswell
build:opt --define with_default_optimizations=true
build:v2 --define=tf_api_version=2
test --flaky_test_attempts=3
test --test_size_filters=small,medium
test --test_tag_filters=-benchmark-test,-no_oss,-oss_serial
test --build_tag_filters=-benchmark-test,-no_oss
test --test_tag_filters=-gpu
test --build_tag_filters=-gpu
build --action_env TF_CONFIGURE_IOS="0"

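I believe nvidia-tensorflow carries some LLVM-related changes relative to upstream, but the focus may have been on getting Python TensorFlow working on NVIDIA hardware, with less attention to rarely used parts such as tfcompile. The failure looks like a mismatch between the tfcompile source and the LLVM version it is built against: embedded_protocol_buffers.cc refers to llvm::TargetMachine::CGFT_ObjectFile, which that LLVM's TargetMachine does not declare. Upstream 1.15.5 builds without problems.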
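
One way to check the version-mismatch hypothesis is to look at where the LLVM headers used by the build actually declare CGFT_ObjectFile. The helper below is only a sketch: the function name find_symbol_in_headers is made up for illustration, and it assumes GNU grep, which the Ubuntu-based image should provide.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TensorFlow or Bazel).
# Prints every header under <dir> that mentions <symbol>,
# as "path:line:text". Returns non-zero if nothing matches.
# Usage: find_symbol_in_headers <dir> <symbol>
find_symbol_in_headers() {
  local dir="$1" symbol="$2"
  # -r: recurse, -n: show line numbers, --include: headers only (GNU grep)
  grep -rn --include='*.h' -e "$symbol" "$dir"
}
```

Inside the container it could be called as `find_symbol_in_headers "$(bazel info output_base)/external/llvm" CGFT_ObjectFile`. The `external/llvm` directory name is an assumption about how this tree names its LLVM repository and may differ. In upstream LLVM the CodeGenFileType enum was moved out of TargetMachine into the llvm namespace (llvm/Support/CodeGen.h) around LLVM 10, so a hit in CodeGen.h rather than in TargetMachine.h would suggest that the call site should use llvm::CGFT_ObjectFile instead.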
