--action_env not passed to all spawn strategies #11163

Closed
deven-amd opened this issue Apr 20, 2020 · 5 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) stale Issues or PRs that are stale (no activity for 30 days) team-Local-Exec Issues and PRs for the Execution (Local) team type: bug

Comments

@deven-amd

ATTENTION! Please read and follow:

  • if this is a question about how to build / test / query / deploy using Bazel, or a discussion starter, send it to bazel-discuss@googlegroups.com
  • if this is a bug or feature request, fill the form below as best as you can.

Description of the problem / feature request:

When I build TF with --config=rocm, the settings from .tf_configure.bazelrc (which is included via .bazelrc) do not get passed through to some of the subcommands issued.

Feature requests: what underlying problem are you trying to solve with this feature?

In the TF build, .tf_configure.bazelrc is used to set (amongst other things) the action_envs needed to execute the subcommands correctly (in my builds, ROCM_PATH needs to be set correctly). When these envs are not set, it can lead to build errors. Mind you, this issue is not limited to action_envs. Other settings in the .tf_configure.bazelrc like compiler flags also do not get passed through, leading to incorrectly built .os

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

This is the tricky part. The easiest way to reproduce this is to do a TensorFlow build with --config=rocm and the bazel -s option specified, and then scan through the output to look for this issue.

What operating system are you running Bazel on?

Ubuntu 16.04

What's the output of bazel info release?

release 3.0.0

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

Replace this line with your answer.

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

Replace this line with your answer.

Have you found anything relevant by searching the web?

No

Any other information, logs, or outputs that you want to share?

I will attach a tarball that contains an example of this issue. It contains the following files:

root@ixt-rack-04:/common# tar -zcvf bazel_bug.tar.gz bazel_bug/
bazel_bug/
bazel_bug/build_rocm_python3
bazel_bug/build_rocm.log
bazel_bug/find_bug.py
bazel_bug/.tf_configure.bazelrc

The build_rocm_python3 is the script used to do the TF build
The build_rocm.log is the resulting log file
The .tf_configure.bazelrc is the file that contains the action_env settings
The find_bug.py is a simple python script to scan the log file and dump the first instance of the bug/issue

bazel_bug.tar.gz
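
For readers without the tarball, the kind of scan described above can be sketched as follows. This is not the attached find_bug.py, whose contents are not shown here; it is a minimal illustration that assumes each subcommand in the `bazel -s` log starts with a line beginning with `SUBCOMMAND:` and lists its environment as `NAME=value` lines below it. The function name `subcommands_missing_var` is hypothetical.

```python
def subcommands_missing_var(log_text, var):
    """Return the header line of every SUBCOMMAND block that never sets `var`.

    Assumption: a block starts at a line beginning with "SUBCOMMAND:" and
    runs until the next such line; env entries appear as "NAME=value" at the
    start of a (stripped) line.
    """
    blocks = []
    current = None
    for line in log_text.splitlines():
        if line.startswith("SUBCOMMAND:"):
            if current is not None:
                blocks.append(current)
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        blocks.append(current)

    prefix = var + "="
    return [
        block[0]
        for block in blocks
        if not any(l.strip().startswith(prefix) for l in block[1:])
    ]
```

Calling `subcommands_missing_var(open("build_rocm.log").read(), "ROCM_PATH")` would then list the subcommands that ran without ROCM_PATH, assuming the log follows the layout described above.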

@deven-amd
Author

/cc @chsigg

@jin
Member

jin commented Apr 30, 2020

Thanks for the detailed repro. I see that the ROCM_* env variables aren't properly configured into the execution environment for a whole bunch of actions in the build, even though --action_env is expected to do that.

I don't have ROCM installed, so I'm just trying with the default .tf_configure.bazelrc with a custom FOOBAR env var. With aquery:

$ cat .tf_configure.bazelrc
build --action_env PYTHON_BIN_PATH="/usr/bin/python3"
build --action_env PYTHON_LIB_PATH="/usr/local/lib/python3.7/dist-packages"
build --python_path="/usr/bin/python3"
build --config=xla
build:opt --copt=-march=native
build:opt --copt=-Wno-sign-compare
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true
build --action_env=FOOBAR=BARFOO
test --flaky_test_attempts=3
test --test_size_filters=small,medium
test:v1 --test_tag_filters=-benchmark-test,-no_oss,-gpu,-oss_serial
test:v1 --build_tag_filters=-benchmark-test,-no_oss,-gpu
test:v2 --test_tag_filters=-benchmark-test,-no_oss,-gpu,-oss_serial,-v1only
test:v2 --build_tag_filters=-benchmark-test,-no_oss,-gpu,-v1only
build --action_env TF_CONFIGURE_IOS="0"
$ bazel aquery 'deps(//tensorflow/tools/pip_package:build_pip_package)' > aquery.out

$ cat aquery.out | grep "Environment:.*" | head -n 5
  Environment: [FOOBAR=BARFOO, PYTHON_BIN_PATH=/usr/bin/python3, PYTHON_LIB_PATH=/usr/local/lib/python3.7/dist-packages, TF2_BEHAVIOR=1, TF_CONFIGURE_IOS=0, TF_ENABLE_XLA=1]
  Environment: [FOOBAR=BARFOO, PYTHON_BIN_PATH=/usr/bin/python3, PYTHON_LIB_PATH=/usr/local/lib/python3.7/dist-packages, TF2_BEHAVIOR=1, TF_CONFIGURE_IOS=0, TF_ENABLE_XLA=1]
  Environment: [FOOBAR=BARFOO, PYTHON_BIN_PATH=/usr/bin/python3, PYTHON_LIB_PATH=/usr/local/lib/python3.7/dist-packages, TF2_BEHAVIOR=1, TF_CONFIGURE_IOS=0, TF_ENABLE_XLA=1]
  Environment: [FOOBAR=BARFOO, PYTHON_BIN_PATH=/usr/bin/python3, PYTHON_LIB_PATH=/usr/local/lib/python3.7/dist-packages, TF2_BEHAVIOR=1, TF_CONFIGURE_IOS=0, TF_ENABLE_XLA=1]
  Environment: [FOOBAR=BARFOO, PYTHON_BIN_PATH=/usr/bin/python3, PYTHON_LIB_PATH=/usr/local/lib/python3.7/dist-packages, TF2_BEHAVIOR=1, TF_CONFIGURE_IOS=0, TF_ENABLE_XLA=1]

$ cat aquery.out | grep -e "action '" | wc -l
57059 # 57k actions

$ cat aquery.out | grep "Environment:.*" | wc -l
17551 # 17k of them have the custom environment

$ cat aquery.out | grep "Environment:.*FOOBAR=BARFOO" | wc -l
17551

$ cat aquery.out | grep "Environment:.*" | grep -v "FOOBAR" | wc -l
0

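Only 17551 of the 57059 actions report an Environment at all, and every one of those includes FOOBAR. This suggests that the following statement from the report is true, and that some (spawn) actions do not run with the set of 5 custom action_env variables: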

Other settings in the .tf_configure.bazelrc like compiler flags also do not get passed through, leading to incorrectly built .os
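
The grep counts above only give totals. A rough sketch like the one below would break the same aquery text output down per action, so the actions without the variable can be grouped by type. It assumes, as the greps do, that each action starts with a line beginning with `action '` and that its environment, when present, is on an `Environment: [...]` line; it also assumes a `Mnemonic:` line per action. The helper names `split_actions` and `classify` are made up for illustration.

```python
import re
from collections import Counter


def split_actions(aquery_text):
    """Split aquery text output into one list of lines per action."""
    chunks = []
    current = None
    for line in aquery_text.splitlines():
        if line.startswith("action '"):
            if current is not None:
                chunks.append(current)
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        chunks.append(current)
    return chunks


def classify(aquery_text, var):
    """Count actions that have `var`, lack it, or print no Environment line.

    Returns (counts, missing_by_mnemonic), both collections.Counter objects.
    """
    counts = Counter()
    missing_by_mnemonic = Counter()
    # Match the variable only at the start of a list entry: "[VAR=" or " VAR=".
    var_pattern = re.compile(r"[\[ ]" + re.escape(var) + "=")
    for chunk in split_actions(aquery_text):
        env_lines = [l for l in chunk if l.strip().startswith("Environment:")]
        mnemonic = next(
            (l.split(":", 1)[1].strip() for l in chunk
             if l.strip().startswith("Mnemonic:")),
            "?",
        )
        if not env_lines:
            counts["no_env_line"] += 1
            missing_by_mnemonic[mnemonic] += 1
        elif var_pattern.search(env_lines[0]):
            counts["has_var"] += 1
        else:
            counts["env_without_var"] += 1
            missing_by_mnemonic[mnemonic] += 1
    return counts, missing_by_mnemonic
```

Running `classify(open("aquery.out").read(), "FOOBAR")` and printing `missing_by_mnemonic.most_common()` would show which kinds of actions account for the roughly 39.5k entries without an Environment line.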

@jin jin added team-Local-Exec Issues and PRs for the Execution (Local) team untriaged labels Apr 30, 2020
@susinmotion susinmotion added P1 I'll work on this now. (Assignee required) type: bug and removed untriaged labels May 4, 2020
@jmmv jmmv changed the title .bazelrc settings not getting passed through (sometimes) to subcommands for TF builds --action_env not passed to all spawn strategies May 14, 2020
@meisterT meisterT added P2 We'll consider working on this in future. (Assignee optional) and removed P1 I'll work on this now. (Assignee required) labels Dec 8, 2020
@meisterT
Member

meisterT commented Dec 8, 2020

cc @larsrc-google

@github-actions

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Apr 18, 2023
@github-actions

github-actions bot commented May 4, 2023

This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please reach out to the triage team (@bazelbuild/triage). Thanks!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 4, 2023