Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_interrupt_kills_child failing in 0.11.0rc3 #4625

Closed
katre opened this issue Feb 12, 2018 · 8 comments
Closed

test_interrupt_kills_child failing in 0.11.0rc3 #4625

katre opened this issue Feb 12, 2018 · 8 comments
Assignees
Labels
category: misc > misc P1 I'll work on this now. (Assignee required) type: bug

Comments

@katre
Copy link
Member

katre commented Feb 12, 2018

https://ci.bazel.build/blue/organizations/jenkins/Global%2Fbazel-tests/detail/bazel-tests/500/tests

Test is failing on the platforms:

  • linux-x86_64,python_version=2
  • linux-x86_64,python_version=3
  • ubuntu_16.04-x86_64

Error:

INFO: $TEST_TMPDIR defined: output root default is '/home/ci/.cache/bazel/_bazel_ci/65ed5a6a0442d902ec235fee508231f8/bazel-sandbox/1483943166230936815/execroot/io_bazel/_tmp/2e8cdad9647bde18379f08ad523677f3' and max_idle_secs default is '15'.
INFO: $TEST_TMPDIR defined: output root default is '/home/ci/.cache/bazel/_bazel_ci/65ed5a6a0442d902ec235fee508231f8/bazel-sandbox/1483943166230936815/execroot/io_bazel/_tmp/2e8cdad9647bde18379f08ad523677f3' and max_idle_secs default is '15'.
Loading:
Loading: 0 packages loaded
Analyzing: target //foo:sleep-minute (10 packages loaded)
INFO: Analysed target //foo:sleep-minute (10 packages loaded).
INFO: Found 1 target...
[0 / 4] [-----] Creating source manifest for //foo:sleep-minute
Target //foo:sleep-minute up-to-date:
bazel-bin/foo/sleep-minute
INFO: Elapsed time: 1.451s, Critical Path: 0.08s
INFO: Build completed successfully, 4 total actions
INFO: Running command line: bazel-bin/foo/sleep-minute
-- Test log: -----------------------------------------------------------
------------------------------------------------------------------------
test_interrupt_kills_child FAILED: Shell script still running after SIGINT sent to server .
/home/ci/.cache/bazel/_bazel_ci/65ed5a6a0442d902ec235fee508231f8/bazel-sandbox/1483943166230936815/execroot/io_bazel/bazel-out/k8-fastbuild/bin/src/test/shell/integration/run_test.runfiles/io_bazel/src/test/shell/integration/run_test:206: in call to test_interrupt_kills_child
INFO[run_test 2018-02-12 16:26:54 (+0000)] Cleaning up workspace
@katre
Copy link
Member Author

katre commented Feb 12, 2018

Blocking release #3959

@katre
Copy link
Member Author

katre commented Feb 12, 2018

Failure is repeatable, but the entire test must be run with the RC, as so:

$ bazel build //src:bazel
...
INFO: Build completed successfully, 1 total action

$ cp bazel-bin/src/bazel /tmp/bazel-0.11.0rc3

$ /tmp/bazel-0.11.0rc3 test //src/test/shell/integration:run_test
....
INFO: Build completed, 1 test FAILED, 1244 total actions
//src/test/shell/integration:run_test                                    FAILED in 1 out of 3 in 76.5s
  Stats over 3 runs: max = 76.5s, min = 17.0s, avg = 36.9s, dev = 28.0s
    ERROR   .test_interrupt_kills_child (61.5s)

Executed 1 out of 1 test: 1 fails locally.

@katre
Copy link
Member Author

katre commented Feb 12, 2018

Test passes with the above procedure if I revert 28bd997.

@katre
Copy link
Member Author

katre commented Feb 12, 2018

Linking to #4608.

@agoulti
Copy link

agoulti commented Feb 12, 2018

Thanks for the repro, looking into this

bazel-io pushed a commit that referenced this issue Feb 13, 2018
*** Reason for rollback ***

#4625

What I thought was a short fix is turning into a long hunt, so I better roll this back to get the build green again.

I'm not yet 100% certain what the interactions are, but there's a chance that it's back to the drawing board.

*** Original change description ***

Fixing test-setup.sh occasionally missing stdout/stderr, on systems where "tail --pid" is supported.

The solutions aren't mine, the new test was taken from Ola's unknown commit and the way to avoid race condition courtesy of sethkoehler@

Mitigates #4608 for compatible Linux systems.

TESTED=manual scripts and new test case.
RELNOTES: None
PiperOrigin-RevId: 185482604
@agoulti
Copy link

agoulti commented Feb 13, 2018

I've rolled back the original change to get this green again. The problem is more subtle than I initially thought, so I don't have an ETA of when I'll get to the bottom of it.

If I understand this correctly, test-setup.sh does not affect anything in the "bazel run" behavior.
So, the object of the test ("SIGINT kills the underlying command in bazel run") shouldn't have been affected.

We are only modifying the wrapper under which the test itself was ran, and the way the test was ran. I'm still trying to figure out what effect it might have.

An obvious thought is, we added a level of indirection (the test is now ran as a subprocess of test-setup.sh rather than inside that process), which might make a difference; however, in my attempts, simply adding such a level of indirection wasn't enough to fail the test.

I will continue with the investigation tomorrow.

@katre
Copy link
Member Author

katre commented Feb 13, 2018

Rollback is 3904ac3.

@katre
Copy link
Member Author

katre commented Feb 20, 2018

The breakage was rolled back, so removing release blocker.

katre pushed a commit that referenced this issue Feb 20, 2018
*** Reason for rollback ***

#4625

What I thought was a short fix is turning into a long hunt, so I better roll this back to get the build green again.

I'm not yet 100% certain what the interactions are, but there's a chance that it's back to the drawing board.

*** Original change description ***

Fixing test-setup.sh occasionally missing stdout/stderr, on systems where "tail --pid" is supported.

The solutions aren't mine, the new test was taken from Ola's unknown commit and the way to avoid race condition courtesy of sethkoehler@

Mitigates #4608 for compatible Linux systems.

TESTED=manual scripts and new test case.
RELNOTES: None
PiperOrigin-RevId: 185482604
katre pushed a commit that referenced this issue Feb 20, 2018
…ess in a sub-shell.

Apparently, nested background processes interfere with SIGINT handling in bash. I don't 100% understand why and how, but I do have a small bash script that demonstrates the problem: script A that spawns a background process, sends it a SIGINT, and verifies it was received. The script works, *unless* run in the background by a process B; this extra layer of backgrounding cause process A's logic to stop working. See experimental/users/olaola/shell/ for examples. See also https://stackoverflow.com/questions/48847722/nested-background-processes-and-sigint-handling

*** Original change description ***

Fixing test-setup.sh occasionally missing stdout/stderr, on systems where "tail --pid" is supported.

The solutions aren't mine, the new test was taken from Ola's unknown commit and the way to avoid race condition courtesy of sethkoehler@

Mitigates #4608 for compatible Linux systems.

TESTED=presubmits, manual shell tests on new bazel
RELNOTES: None
PiperOrigin-RevId: 186312008
bazel-io pushed a commit that referenced this issue Feb 23, 2018
Baseline: 00d781a

Cherry picks:
   + ea2d4c4:
     Update stub_finds_runfiles_test to be a real sh_test.
   + d855d81:
     java,runfiles: fix bugs in runfiles library
   + 56aeb04:
     Fixing #4585: broken re-execution of orphaned actions.
   + cf3f81a:
     remote: Add support for HTTP Basic Auth
   + 28bd997:
     Fixing test-setup.sh occasionally missing stdout/stderr, on
     systems where "tail --pid" is supported.
   + 109e4b4:
     Automated rollback of commit
     7e6837c.
   + b3d52b1:
     Fix incorrect include directories when -no-canonical-prefixes is
     passed to clang
   + 3904ac3:
     Automated rollback of commit
     28bd997.
   + 1001141:
     Roll forward of
     3904ac33a983fd8faebba1
     b52bcac5a3ff942029
     (3904ac33a983fd8faebba
     1b52bcac5a3ff942029). Fix #4625 by running the test process in a
     sub-shell.

Incompatible changes:

  - ctx.fragments.jvm is not available anymore.

New features:

  - java,runfiles: You can now depend on
    `@bazel_tools//tools/runfiles:java-runfiles` to get a
    platform-independent runfiles library for Java. See JavaDoc of
    https://github.com/bazelbuild/bazel/blob/master/src/tools/runfiles
    /java/com/google/devtools/build/runfiles/Runfiles.java for usage
    information.

Important changes:

  - The --[no]experimental_disable_jvm command line option is not
    supported anymore.
  - Allow expanding TreeArtifacts for libraries_to_link
  - Proguarded Android binaries can be built with incremental dexing.
  - aar_import now supports assets.
  - Crash in OutputJar::Close has been fixed
  - generator_* attributes are nonconfigurable.
  - Introduces --[no]keep_state_after_build
  - Add support for merged object files needed for -flto-unit.
  - Fix how libraries to link is specified to archiver actions.
  - Replace //tools/defaults:android_jar with
    @bazel_tools//tools/android:android_jar.
    //tools/defaults:android_jar will be removed in a future release.
  - java_common.compile supports neverlink
  - Resolved an issue where a failure in the remote cache would not
    trigger local re-execution of an action.
mmorearty pushed a commit to Asana/bazel that referenced this issue Feb 23, 2018
Baseline: 00d781a

Cherry picks:
   + ea2d4c4:
     Update stub_finds_runfiles_test to be a real sh_test.
   + d855d81:
     java,runfiles: fix bugs in runfiles library
   + 56aeb04:
     Fixing bazelbuild#4585: broken re-execution of orphaned actions.
   + cf3f81a:
     remote: Add support for HTTP Basic Auth
   + 28bd997:
     Fixing test-setup.sh occasionally missing stdout/stderr, on
     systems where "tail --pid" is supported.
   + 109e4b4:
     Automated rollback of commit
     7e6837c.
   + b3d52b1:
     Fix incorrect include directories when -no-canonical-prefixes is
     passed to clang
   + 3904ac3:
     Automated rollback of commit
     28bd997.
   + 1001141:
     Roll forward of
     bazelbuild@3904ac33a983fd8faebba1
     b52bcac5a3ff942029
     (bazelbuild@3904ac33a983fd8faebba
     1b52bcac5a3ff942029). Fix bazelbuild#4625 by running the test process in a
     sub-shell.

Incompatible changes:

  - ctx.fragments.jvm is not available anymore.

New features:

  - java,runfiles: You can now depend on
    `@bazel_tools//tools/runfiles:java-runfiles` to get a
    platform-independent runfiles library for Java. See JavaDoc of
    https://github.com/bazelbuild/bazel/blob/master/src/tools/runfiles
    /java/com/google/devtools/build/runfiles/Runfiles.java for usage
    information.

Important changes:

  - The --[no]experimental_disable_jvm command line option is not
    supported anymore.
  - Allow expanding TreeArtifacts for libraries_to_link
  - Proguarded Android binaries can be built with incremental dexing.
  - aar_import now supports assets.
  - Crash in OutputJar::Close has been fixed
  - generator_* attributes are nonconfigurable.
  - Introduces --[no]keep_state_after_build
  - Add support for merged object files needed for -flto-unit.
  - Fix how libraries to link is specified to archiver actions.
  - Replace //tools/defaults:android_jar with
    @bazel_tools//tools/android:android_jar.
    //tools/defaults:android_jar will be removed in a future release.
  - java_common.compile supports neverlink
  - Resolved an issue where a failure in the remote cache would not
    trigger local re-execution of an action.
katre pushed a commit that referenced this issue Feb 28, 2018
…ess in a sub-shell.

Apparently, nested background processes interfere with SIGINT handling in bash. I don't 100% understand why and how, but I do have a small bash script that demonstrates the problem: script A that spawns a background process, sends it a SIGINT, and verifies it was received. The script works, *unless* run in the background by a process B; this extra layer of backgrounding cause process A's logic to stop working. See experimental/users/olaola/shell/ for examples. See also https://stackoverflow.com/questions/48847722/nested-background-processes-and-sigint-handling

*** Original change description ***

Fixing test-setup.sh occasionally missing stdout/stderr, on systems where "tail --pid" is supported.

The solutions aren't mine, the new test was taken from Ola's unknown commit and the way to avoid race condition courtesy of sethkoehler@

Mitigates #4608 for compatible Linux systems.

TESTED=presubmits, manual shell tests on new bazel
RELNOTES: None
PiperOrigin-RevId: 186312008
philwo pushed a commit that referenced this issue Mar 6, 2018
…ess in a sub-shell.

Apparently, nested background processes interfere with SIGINT handling in bash. I don't 100% understand why and how, but I do have a small bash script that demonstrates the problem: script A that spawns a background process, sends it a SIGINT, and verifies it was received. The script works, *unless* run in the background by a process B; this extra layer of backgrounding cause process A's logic to stop working. See experimental/users/olaola/shell/ for examples. See also https://stackoverflow.com/questions/48847722/nested-background-processes-and-sigint-handling

*** Original change description ***

Fixing test-setup.sh occasionally missing stdout/stderr, on systems where "tail --pid" is supported.

The solutions aren't mine, the new test was taken from Ola's unknown commit and the way to avoid race condition courtesy of sethkoehler@

Mitigates #4608 for compatible Linux systems.

TESTED=presubmits, manual shell tests on new bazel
RELNOTES: None
PiperOrigin-RevId: 186312008
bazel-io pushed a commit that referenced this issue Mar 6, 2018
Baseline: 00d781a

Cherry picks:
   + ea2d4c4:
     Update stub_finds_runfiles_test to be a real sh_test.
   + d855d81:
     java,runfiles: fix bugs in runfiles library
   + 56aeb04:
     Fixing #4585: broken re-execution of orphaned actions.
   + cf3f81a:
     remote: Add support for HTTP Basic Auth
   + 28bd997:
     Fixing test-setup.sh occasionally missing stdout/stderr, on
     systems where "tail --pid" is supported.
   + 109e4b4:
     Automated rollback of commit
     7e6837c.
   + b3d52b1:
     Fix incorrect include directories when -no-canonical-prefixes is
     passed to clang
   + 1001141:
     Roll forward of
     3904ac33a983fd8faebba1
     b52bcac5a3ff942029
     (3904ac33a983fd8faebba
     1b52bcac5a3ff942029). Fix #4625 by running the test process in a
     sub-shell.
   + fc98b44:
     android,windows: bugfix in aar_resources_extractor

Important changes:

  - Fixes regression building Android rules on Windows.
mmorearty pushed a commit to Asana/bazel that referenced this issue Mar 7, 2018
Baseline: 00d781a

Cherry picks:
   + ea2d4c4:
     Update stub_finds_runfiles_test to be a real sh_test.
   + d855d81:
     java,runfiles: fix bugs in runfiles library
   + 56aeb04:
     Fixing bazelbuild#4585: broken re-execution of orphaned actions.
   + cf3f81a:
     remote: Add support for HTTP Basic Auth
   + 28bd997:
     Fixing test-setup.sh occasionally missing stdout/stderr, on
     systems where "tail --pid" is supported.
   + 109e4b4:
     Automated rollback of commit
     7e6837c.
   + b3d52b1:
     Fix incorrect include directories when -no-canonical-prefixes is
     passed to clang
   + 1001141:
     Roll forward of
     bazelbuild@3904ac33a983fd8faebba1
     b52bcac5a3ff942029
     (bazelbuild@3904ac33a983fd8faebba
     1b52bcac5a3ff942029). Fix bazelbuild#4625 by running the test process in a
     sub-shell.
   + fc98b44:
     android,windows: bugfix in aar_resources_extractor

Important changes:

  - Fixes regression building Android rules on Windows.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: misc > misc P1 I'll work on this now. (Assignee required) type: bug
Projects
None yet
Development

No branches or pull requests

2 participants