Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

action_env variables (sometimes?) ignored in cc_library rules #12059

Open
Flamefire opened this issue Sep 7, 2020 · 12 comments
Open

action_env variables (sometimes?) ignored in cc_library rules #12059

Flamefire opened this issue Sep 7, 2020 · 12 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) stale Issues or PRs that are stale (no activity for 30 days) team-Rules-CPP Issues for C++ rules type: bug

Comments

@Flamefire
Copy link
Contributor

Description of the problem / feature request:

I'm setting some env variables via action_env but they are sometimes ignored when building a library.

Feature requests: what underlying problem are you trying to solve with this feature?

In this specific case I'm trying to build TensorFlow 2.3.0 with Bazel 3.4.1 and am setting CPATH via action_env but the compilation fails to find some headers and by inspecting the output I see CPATH not being passed to all compiler invocations.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Build TensorFlow according to instructions and set --action_env=CPATH=/foo and observe the output

Any other information, logs, or outputs that you want to share?

I observed the issue for tensorflow/python/tools/BUILD:226:10 C++ compilation of rule '//tensorflow/core/platform/default:mutex'. The definition for that is at https://github.com/tensorflow/tensorflow/blob/v2.3.0/tensorflow/core/platform/default/BUILD#L206:

cc_library(
    name = "mutex",
    srcs = [
        "mutex.cc",
        "mutex_data.h",
    ],
    hdrs = ["//tensorflow/core/platform:mutex.h"],
    tags = [
        "manual",
        "no_oss",
        "nobuilder",
    ],
    textual_hdrs = ["mutex.h"],
    deps = [
        "//tensorflow/core/platform",
        "//tensorflow/core/platform:macros",
        "//tensorflow/core/platform:thread_annotations",
        "//tensorflow/core/platform:types",
        "@nsync//:nsync_cpp",
    ],
)

Log output:

  (cd /tmp/easybuild-tmp/eb-tHWBp_/tmphugHLy-bazel-build/execroot/org_tensorflow && \
  exec env - \
    LD_LIBRARY_PATH=<redacted> \
    PATH=<redacted> \
    PWD=/proc/self/cwd \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/ppc-opt-exec-50AE0418/bin/tensorflow/core/platform/default/_objs/mutex/mutex.d '-frandom-seed=bazel-out/ppc-opt-exec-50AE0418/bin/tensorflow/core/platform/default/_objs/mutex/mutex.o' -D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote . -iquote bazel-out/ppc-opt-exec-50AE0418/bin -iquote external/com_google_absl -iquote bazel-out/ppc-opt-exec-50AE0418/bin/external/com_google_absl -iquote external/nsync -iquote bazel-out/ppc-opt-exec-50AE0418/bin/external/nsync -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections -g0 -g0 '-std=c++14' -c tensorflow/core/platform/default/mutex.cc -o bazel-out/ppc-opt-exec-50AE0418/bin/tensorflow/core/platform/default/_objs/mutex/mutex.o)
Execution platform: @local_execution_config_platform//:platform

As you can see only the default env vars are set but none of my action_envs (I set more than CPATH)

As the cc_library rule invocation is as simple as can be and I see that with other such invocations too (e.g. tensorflow/tensorflow#37861 (comment)) I assume a bug in Bazel or a misunderstanding of the rules by the TF team or me. In that case I'd kindly ask for clarification.

I've also checked that with TF 2.1.0 and Bazel 0.29.1 this was not an issue but worked as it should for the same file (at least the rule invocation was not changed)

However for other compilations it seems to work. E.g. for SUBCOMMAND: # @com_google_absl//absl/strings:strings [action 'Compiling external/com_google_absl/absl/strings/internal/memutil.cc', configuration: cab848d308c51b34c791977a9ba0d73e541cabe37f3e6da7b163b34d8bf29b6b, execution platform: @local_execution_config_platform//:platform] I see all my action_env variables being passed and that is also added with cc_library: https://github.com/abseil/abseil-cpp/blob/df3ea785d8c30a9503321a3d35ee7d35808f190d/absl/strings/BUILD.bazel#L30

What could be the reason for that?

@keith
Copy link
Member

keith commented Sep 8, 2020

Potentially related to #12049

@Flamefire
Copy link
Contributor Author

In examining the output I've seen that SUBCOMMAND: # <name> [action 'Compiling <cc>', configuration: <hash>] shows differing values for "configuration". For e.g. "5145bac2d59ecd1d49c139f258a492145fb9a97392eff5f5e72d0b94901ca2a6" all action_env values are there but for "10b8e9fd33cf42203bb848e842763cc669d976ce3de802a9b8c5cf443259dc2c" only the minimal env is used.

Does this help? What is this "configuration" and how is it influenced?

@benjaminp
Copy link
Collaborator

Likely to be #4008.

@Flamefire
Copy link
Contributor Author

I'm not sure it is the same because in my case it is not a genrule that fails but a cc_library. But let's see: Building TF with 1 process only it fails on '//tensorflow/cc:cc_op_gen_main' which is

cc_library(
    name = "cc_op_gen_main",
    srcs = [
        "framework/cc_op_gen.cc",
        "framework/cc_op_gen.h",
        "framework/cc_op_gen_main.cc",
    ],
    ...
)

That is only used by tf_gen_op_wrapper_cc:

def tf_gen_op_wrapper_cc(
        name,
        out_ops_file,
        pkg = "",
        op_gen = clean_dep("//tensorflow/cc:cc_op_gen_main"),
        deps = None,
        include_internal_ops = 0,
        # ApiDefs will be loaded in the order specified in this list.
        api_def_srcs = []):
    # Construct an op generator binary for these ops.
    tool = out_ops_file + "_gen_cc"
    if deps == None:
        deps = [pkg + ":" + name + "_op_lib"]
    tf_cc_binary(
        name = tool,
        copts = tf_copts(),
        linkopts = if_not_windows(["-lm", "-Wl,-ldl"]) + lrt_if_needed(),
        linkstatic = 1,  # Faster to link this one-time-use binary dynamically
        deps = [op_gen] + deps,
    )
    ...

So there it is in deps not tool. Not sure how to go further up the call chain. Can I make Bazel print what it is currently following?

@mihaimaruseac
Copy link

I think you can use -s?

@Flamefire
Copy link
Contributor Author

Flamefire commented Sep 11, 2020

That is --subcommands, yes. Already using it and lead me to seeing where the action_env is missing. Now I would like to know the rule invocation chain to see where it might be coming from and why the action_env is missing there.

I noticed that for SUBCOMMAND: # //tensorflow/core/framework:summary_proto_genproto [action 'ProtoCompile tensorflow/core/framework/summary.pb.cc', configuration: 8b4acdbe969a46c4d3fa5723b9b1930adc1b1ea5a72f6ca24e87d24e41e941cc, execution platform: @local_execution_config_platform//:platform] the env is also missing. According to #4008 you could add use_default_shell_env = True, but that is already there. So maybe not that issue?

Edit: My best trace so far is the change in "configuration" I mentioned in #12059 (comment)
Dependencies like llvm-project are compiled with action_env and the working configuration while the failing ones use another hash. What is that configuration? When/How does it change? How can I find that in the TF source?

@Flamefire
Copy link
Contributor Author

Ok, it looks like you were on the right track with #4008. I bisected the TF sources between 2.2 and 2.3 to find the commit which introduced the breaking changed and found it to be this one: tensorflow/tensorflow@f827c02 The parent commit 3006330ea0c3e88651195ac7c7de654291377ebb works. However in that case it isn't "tools" but "exec_tools" where the former worked.

It would be awesome if there could be some quick fix so action_env variables are passed through to dependencies build through exec_tools dependency

@mihaimaruseac
Copy link

Is still something we need to work on?

@Flamefire
Copy link
Contributor Author

Yes. As you noticed in tensorflow/tensorflow#43156 (comment) it isn't clear why and when tools or exec_tools are to be used and if exec_tools would be correct then there needs to be some way to pass environment variables through to it. Otherwise it will be impossible to build e.g. TF on some systems

@mihaimaruseac
Copy link

It's possible we no longer need exec_tools, it was an intermediate step in removing Py2 stuff. We can start making PRs to remove that and see what breaks/works.

Flamefire added a commit to Flamefire/tensorflow that referenced this issue Nov 16, 2020
Follow up to tensorflow#43156
Based on
bazelbuild/bazel#12059 (comment)
exec_tools might no longer be needed and hence can be replaced by tools.
This fixes various build failures caused by missing environment
variables in environments where they are required, e.g. using custom
compilers.
@oquenchil oquenchil added P3 We're not considering working on this, but happy to review a PR. (No assignee) type: bug and removed untriaged labels Nov 19, 2020
geetachavan1 pushed a commit to geetachavan1/tensorflow that referenced this issue Nov 23, 2020
Follow up to tensorflow#43156
Based on
bazelbuild/bazel#12059 (comment)
exec_tools might no longer be needed and hence can be replaced by tools.
This fixes various build failures caused by missing environment
variables in environments where they are required, e.g. using custom
compilers.
geetachavan1 pushed a commit to geetachavan1/tensorflow that referenced this issue Nov 23, 2020
Follow up to tensorflow#43156
Based on
bazelbuild/bazel#12059 (comment)
exec_tools might no longer be needed and hence can be replaced by tools.
This fixes various build failures caused by missing environment
variables in environments where they are required, e.g. using custom
compilers.
serach24 pushed a commit to serach24/Arbitor that referenced this issue Jun 5, 2021
Follow up to tensorflow#43156
Based on
bazelbuild/bazel#12059 (comment)
exec_tools might no longer be needed and hence can be replaced by tools.
This fixes various build failures caused by missing environment
variables in environments where they are required, e.g. using custom
compilers.
@Flamefire
Copy link
Contributor Author

This seems to be still an issue with Bazel 4.2.2, noticed with TF 2.8.4 when using CUDA where sometimes the --action_envs get ignored and compilation fails to find headers. E.g.:

Execution platform: @local_execution_config_platform//:platform
In file included from ./tensorflow/core/platform/status.h:32,
                 from ./tensorflow/c/tf_status_helper.h:20,
                 from tensorflow/c/env.cc:20:
bazel-out/ppc-opt-exec-50AE0418/bin/tensorflow/core/protobuf/error_codes.pb.h:10:10: fatal error: google/protobuf/port_def.inc: No such file or directory
   10 | #include <google/protobuf/port_def.inc>

This is really annoying especially as it is pretty much impossible to debug and/or isolate

Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Feb 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) stale Issues or PRs that are stale (no activity for 30 days) team-Rules-CPP Issues for C++ rules type: bug
Projects
None yet
Development

No branches or pull requests

6 participants