Running shell tests with `bazel run` intermittently omits outputs #17754

crydell-ericsson · 2023-03-13T15:21:49Z

Description of the bug:

We have a use case in which we want to pipe the output of a Bazel test target, implemented as a shell script, to an external tool. Certain parts of the output are essential to pass to the external tool. We also want to be able to run this target with bazel run as well as with bazel test.

Assuming that the target is named :foo and that the external tool is named my_tool, we can do this with the command:

$ bazel run :foo | my_tool

This usually works, but my_tool will intermittently not receive the full intended output from the :foo run. The failures seem to be caused by a race condition that sometimes results in subprocesses being killed before they have had the opportunity to write to stdout. This problem seems to have been introduced by the commit 9051faa.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Create a new workspace with the files:

# File: BUILD

sh_test(
  name = "foo",
  srcs = ["foo.sh"],
)

# File: foo.sh

echo "useless echo"
echo "useful echo"
exit 0

Then, try to grep "useful echo" from the script output repeatedly until failure. This can for example be done with:

$ while bazel run :foo 2> /dev/null | grep "useful echo" > /dev/null ; do echo -n "." ; done && echo "Failed to grep 'useful echo'"

Alternatively, it's faster to use the --script_path option to create a separate script for running the test.

$ bazel run :foo --script_path=foo_script.sh
$ while ./foo_script.sh 2> /dev/null | grep "useful echo" > /dev/null ; do echo -n "." ; done && echo "Failed to grep 'useful echo'"

We have found that this usually fails within 1000 iterations, but under heavy load from other processes on the machine it has been observed to fail in fewer than 50.
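The loops above run until the first miss, so they never terminate if the race does not trigger. A minimal sketch of a bounded variant follows. The function name `first_missing_iteration` and the run cap are illustrative assumptions, not part of Bazel or of the original report.

```shell
# Run a command repeatedly until its stdout no longer contains the expected
# text. Prints the 1-based iteration at which the text went missing and
# returns 1, or prints "none" and returns 0 if every run produced it.
# (Hypothetical helper, named here for illustration only.)
first_missing_iteration() {
  local expected="$1" max_runs="$2" i=1
  shift 2
  while [ "$i" -le "$max_runs" ]; do
    # Same check as the loops above: discard stderr, grep stdout.
    if ! "$@" 2>/dev/null | grep -F -- "$expected" >/dev/null; then
      echo "$i"
      return 1
    fi
    i=$((i + 1))
  done
  echo "none"
}
```

With the generated script from above, this could be invoked as `first_missing_iteration "useful echo" 1000 ./foo_script.sh`.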

Which operating system are you running Bazel on?

Red Hat Enterprise Linux Server 7.9

What is the output of `bazel info release`?

No response

If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.

No response

What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD` ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response


meisterT · 2023-03-14T10:25:43Z

Can you please try this with bazel from head? We recently fixed issues with bazel run that sound similar.

crydell-ericsson · 2023-03-14T12:56:40Z

> Can you please try this with bazel from head? We recently fixed issues with bazel run that sound similar.

I do not currently have an environment set up where I can build the latest head, but I have tentatively tested the following:

1. Create a script with `bazel run :foo --script_path=foo_script.sh` (this was done with release 7.0.0-pre.20230302.1).
2. Replace the path to test-setup.sh on the last line of the generated script with a path to the test-setup.sh in the source repository of head (dbb09c9).
3. Run the same command as in the original issue, i.e. `while ./foo_script.sh 2> /dev/null | grep "useful echo" > /dev/null ; do echo -n "." ; done && echo "Failed to grep 'useful echo'"`

This still fails as described in the issue. While foo_script.sh was not built with a Bazel built from head, it does use the test-setup.sh found on head, which I reckon is where the problem is. I can try to investigate this further once I have an environment where I can build Bazel from head set up.
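The path replacement described in this comment could be scripted roughly as below. This is only a sketch: it assumes the last line of the generated script contains a single whitespace-delimited path ending in `test-setup.sh`, which is inferred from the description here rather than checked against Bazel's output. The function name `swap_test_setup` is hypothetical.

```shell
# Replace the test-setup.sh path on the last line of a generated run script.
# Assumes the replacement path contains no '|' or '&' characters, since both
# are special in the sed expression. A backup is kept as <script>.bak.
swap_test_setup() {
  local script="$1" new_path="$2"
  # '$' restricts the substitution to the last line of the file;
  # '|' is used as the delimiter so slashes in paths need no escaping.
  sed -i.bak "\$ s|[^[:space:]'\"]*test-setup\.sh|${new_path}|" "$script"
}
```

For example, `swap_test_setup foo_script.sh /path/to/bazel/tools/test/test-setup.sh`, where the second argument stands in for wherever the head checkout lives.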

crydell-ericsson · 2023-03-16T09:59:36Z

I have now tried this with a Bazel built from head (8a23169) and can verify that the problem still exists.

Fixes #17754.

What we have seen prior to this change was that sometimes, for quick tests, the output was swallowed. After a lot of poking it became clear that the culprit is the use of subshells and `tee`: if you remove `tee` completely from the picture, the behavior never shows up. The issue is that with a fast test, `tee` (or its parent subshell) seems to be killed before printing the output to stdout.

With this change, we reduce the number of subshells and processes to set up, which reduces the chance of the race condition but does not remove it. However, for practical purposes, the race condition is gone.

With the reproduction steps in #17754, and this command

```
for i in {1..10000}; do /tmp/bazel run :foo &> /tmp/log ; grep -q "useful echo" /tmp/log ; if [ $? -eq 0 ]; then echo -n '+'; else echo -n '-'; fi; done
```

a bazel from head fails ~3900 out of 10000 times. After this commit, it never failed.

Closes #17846. PiperOrigin-RevId: 518794237 Change-Id: I8c1862d3a274799b864f0f5f42b85d6df5af78c7 Co-authored-by: Tobias Werth <twerth@google.com>
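The measurement loop in the commit message prints one `+` or `-` per run and leaves the counting to the reader. A small sketch that reports the number of failing runs directly is shown below. The function name `count_missing_output` is a hypothetical helper, and the temporary log file stands in for the `/tmp/log` used in the commit message.

```shell
# Run a command a fixed number of times, capturing stdout and stderr to a
# log file as in the commit message, and print how many runs lacked the
# expected text. (Illustrative helper, not part of Bazel.)
count_missing_output() {
  local expected="$1" runs="$2" missing=0 i=0 log
  shift 2
  log=$(mktemp)
  while [ "$i" -lt "$runs" ]; do
    # The command's own exit status is ignored; only the output matters.
    "$@" >"$log" 2>&1 || true
    grep -qF -- "$expected" "$log" || missing=$((missing + 1))
    i=$((i + 1))
  done
  rm -f "$log"
  echo "$missing"
}
```

Assuming a Bazel binary at `/tmp/bazel` as in the commit message, `count_missing_output "useful echo" 10000 /tmp/bazel run :foo` would print the failure count for 10000 runs.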

ShreeM01 added type: bug untriaged team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Mar 13, 2023

zhengwei143 added team-CLI Console UI and removed team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Mar 14, 2023

zhengwei143 added the awaiting-user-response Awaiting a response from the author label Mar 14, 2023

tjgq assigned meisterT Mar 21, 2023

tjgq added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged awaiting-user-response Awaiting a response from the author labels Mar 21, 2023

meisterT mentioned this issue Mar 22, 2023

Use less subshells and tees in running tests with bazel run. #17846

Closed

copybara-service bot closed this as completed in c04f0d4 Mar 23, 2023

ShreeM01 mentioned this issue Mar 23, 2023

[6.2.0]Use less subshells and tees in running tests with bazel run. #17869

Merged
