Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running shell tests with bazel run intermittently omits outputs #17754

Closed
crydell-ericsson opened this issue Mar 13, 2023 · 3 comments
Closed
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-CLI Console UI type: bug

Comments

@crydell-ericsson
Copy link
Contributor

crydell-ericsson commented Mar 13, 2023

Description of the bug:

We have a use case wherein we want to pipe the output of a Bazel test target implemented with a shell script to an external tool. There are certain parts of the output that are essential to pass to the external tool. In addition, we want to be able to run this target with bazel run in addition to bazel test.

Assuming that the target is named :foo and that the external tool is named my_tool, we can do this with the command:

$ bazel run :foo | my_tool

This usually works, but my_tool will intermittently not receive the full intended output from the :foo run. The failures seem to be caused by a race condition that sometimes results in subprocesses being killed before they have had the opportunity to write to stdout. This problem seems to have been introduced by the commit 9051faa.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Create a new workspace with the files:

# File: BUILD

sh_test(
  name = "foo",
  srcs = ["foo.sh"],
)
# File: foo.sh

echo "useless echo"
echo "useful echo"
exit 0

Then, try to grep "useful echo" from the script output repeatedly until failure. This can for example be done with:

$ while bazel run :foo 2> /dev/null | grep "useful echo" > /dev/null ; do echo -n "." ; done && echo "Failed to grep 'useful echo'"

Alternatively, it's faster to use the --script_path option to create a separate script for running the test.

$ bazel run :foo --script_path=foo_script.sh
$ while ./foo_script.sh 2> /dev/null | grep "useful echo" > /dev/null ; do echo -n "." ; done && echo "Failed to grep 'useful echo'"

We have found that this usually fails within 1000 iterations, but under heavy load from other processes on the machine it been observed to go below 50.

Which operating system are you running Bazel on?

Red Hat Enterprise Linux Server 7.9

What is the output of bazel info release?

No response

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@ShreeM01 ShreeM01 added type: bug untriaged team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Mar 13, 2023
@zhengwei143 zhengwei143 added team-CLI Console UI and removed team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Mar 14, 2023
@meisterT
Copy link
Member

Can you please try this with bazel from head? We recently fixed issues with bazel run that sound similar.

@zhengwei143 zhengwei143 added the awaiting-user-response Awaiting a response from the author label Mar 14, 2023
@crydell-ericsson
Copy link
Contributor Author

crydell-ericsson commented Mar 14, 2023

Can you please try this with bazel from head? We recently fixed issues with bazel run that sound similar.

I do not currently have an environment set up where I can build the latest head, but I have tentatively tested the following:

  1. Create a script with bazel run :foo --script_path=foo_script.sh -- this was done with release 7.0.0-pre.20230302.1
  2. Replace the path to test-setup.sh on the last line of the generated script with a path to the test-setup.sh in the source repository of head (dbb09c9)
  3. Run the same command as in the original issue, i.e. while ./foo_script.sh 2> /dev/null | grep "useful echo" > /dev/null ; do echo -n "." ; done && echo "Failed to grep 'useful echo'"

This still fails as described in the issue. While foo_script.sh was not built with a Bazel built from head, it does use the test-setup.sh found on head, which I reckon is where the problem is. I can try to investigate this further once I have an environment where I can build Bazel from head set up.

@crydell-ericsson
Copy link
Contributor Author

I have now tried this with a Bazel built from head (8a23169) and can verify that the problem still exists.

@tjgq tjgq added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged awaiting-user-response Awaiting a response from the author labels Mar 21, 2023
meisterT added a commit that referenced this issue Mar 22, 2023
Fixes #17754.

What we have seen prior to this change was that sometimes for quick
tests the output was swallowed. After a lot of poking it became clear
that the culprit is the use of subshell and `tee`, e.g. if you remove
`tee` completely from the picture the behavior never shows up.

The issue is that with a fast test, `tee` seems to be killed (or its
parent subshell) before the printing the output to stdout.

With this change, we reduce the number of subshells and processes to set
up and reduce the chance of the race condition but not remove it.

However, for practical purposes, the race condition is gone.

With the reproduction steps in #17754, and this command
```
for i in {1..10000}; do /tmp/bazel run :foo &> /tmp/log ; grep -q "useful echo" /tmp/log ; if [ $? -eq 0 ]; then echo -n '+'; else  echo -n '-'; fi; done
```
a bazel from head fails ~3900 out of 10000 times.

After this commit, it never failed.
ShreeM01 pushed a commit to ShreeM01/bazel that referenced this issue Mar 23, 2023
Fixes bazelbuild#17754.

What we have seen prior to this change was that sometimes for quick tests the output was swallowed. After a lot of poking it became clear that the culprit is the use of subshell and `tee`, e.g. if you remove `tee` completely from the picture the behavior never shows up.

The issue is that with a fast test, `tee` seems to be killed (or its parent subshell) before the printing the output to stdout.

With this change, we reduce the number of subshells and processes to set up and reduce the chance of the race condition but not remove it.

However, for practical purposes, the race condition is gone.

With the reproduction steps in bazelbuild#17754, and this command
```
for i in {1..10000}; do /tmp/bazel run :foo &> /tmp/log ; grep -q "useful echo" /tmp/log ; if [ $? -eq 0 ]; then echo -n '+'; else  echo -n '-'; fi; done
```
a bazel from head fails ~3900 out of 10000 times.

After this commit, it never failed.

Closes bazelbuild#17846.

PiperOrigin-RevId: 518794237
Change-Id: I8c1862d3a274799b864f0f5f42b85d6df5af78c7
ShreeM01 added a commit that referenced this issue Mar 24, 2023
)

Fixes #17754.

What we have seen prior to this change was that sometimes for quick tests the output was swallowed. After a lot of poking it became clear that the culprit is the use of subshell and `tee`, e.g. if you remove `tee` completely from the picture the behavior never shows up.

The issue is that with a fast test, `tee` seems to be killed (or its parent subshell) before the printing the output to stdout.

With this change, we reduce the number of subshells and processes to set up and reduce the chance of the race condition but not remove it.

However, for practical purposes, the race condition is gone.

With the reproduction steps in #17754, and this command
```
for i in {1..10000}; do /tmp/bazel run :foo &> /tmp/log ; grep -q "useful echo" /tmp/log ; if [ $? -eq 0 ]; then echo -n '+'; else  echo -n '-'; fi; done
```
a bazel from head fails ~3900 out of 10000 times.

After this commit, it never failed.

Closes #17846.

PiperOrigin-RevId: 518794237
Change-Id: I8c1862d3a274799b864f0f5f42b85d6df5af78c7

Co-authored-by: Tobias Werth <twerth@google.com>
fweikert pushed a commit to fweikert/bazel that referenced this issue May 25, 2023
Fixes bazelbuild#17754.

What we have seen prior to this change was that sometimes for quick tests the output was swallowed. After a lot of poking it became clear that the culprit is the use of subshell and `tee`, e.g. if you remove `tee` completely from the picture the behavior never shows up.

The issue is that with a fast test, `tee` seems to be killed (or its parent subshell) before the printing the output to stdout.

With this change, we reduce the number of subshells and processes to set up and reduce the chance of the race condition but not remove it.

However, for practical purposes, the race condition is gone.

With the reproduction steps in bazelbuild#17754, and this command
```
for i in {1..10000}; do /tmp/bazel run :foo &> /tmp/log ; grep -q "useful echo" /tmp/log ; if [ $? -eq 0 ]; then echo -n '+'; else  echo -n '-'; fi; done
```
a bazel from head fails ~3900 out of 10000 times.

After this commit, it never failed.

Closes bazelbuild#17846.

PiperOrigin-RevId: 518794237
Change-Id: I8c1862d3a274799b864f0f5f42b85d6df5af78c7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-CLI Console UI type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants