Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows, testing: update tests to not call Bazel in a subshell #3148

Closed
laszlocsomor opened this issue Jun 8, 2017 · 4 comments
Closed

Windows, testing: update tests to not call Bazel in a subshell #3148

laszlocsomor opened this issue Jun 8, 2017 · 4 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) platform: windows type: process

Comments

@laszlocsomor
Copy link
Contributor

Description of the problem / feature request / question:

In Bazel's own shell integration tests, when they run Bazel it always starts a new server.
Culprit summary: MinGW Bash waits for all child processes of a process running in a subshell to terminate before terminating the subshell. Linux Bash's subshell exits as soon as the direct child process exited.

Finding the culprit required a solid day of debugging.

Bug journey

Does this always happen?

No, it only happens if Bazel runs in a subshell. If the test runs:

pid1=$(bazel info server_pid)
pid2=$(bazel info server_pid)
echo "pid1=$pid1, pid2=$pid2"

the two PIDs are different. It also runs very slowly (~half a minute), the reason I'll explain later.

Running them directly is fast and prints the same PID:

bazel info server_pid
bazel info server_pid

Why is a new server started every time?

Because the server.pid.txt file is missing. It took a long time to find out that it was the server itself deleting that file, as part of an orderly shutdown.

Why does the server shut down?

Not because anything asks it... It does so because --max_idle_secs is 15 for tests, which elapses, and the server neatly shuts down and cleans up after itself, WAI. This is why two Bazel invocations in a subshell takes about half a minute.

Why does the subshell wait for the server to time out idling?

Because it turns out MSYS (and Cygwin) Bash waits for all children of a process running in a subshell to terminate, even if those processes are in a different process group (CREATE_NEW_PROCESS_GROUP).

The Bazel client starts the server process with CREATE_NEW_PROCESS_GROUP. The process tree (observed in Sysinternals' Process Explorer) shows that as long as the parent is running, the child is displayed as a child, but after the parent terminates the child becomes a top-level process. MSYS however doesn't know or doesn't care, and waits for the child process to exit.

Can we convince MSYS to not wait for the child process?

I don't know. I tried:

  • double-CreateProcess (similar to double-fork idiom on Unixes), the middle process exiting thus orphaning the grandchild -- the process tree looks fine (orphaned process becomes a top-level one), yet Bash still waits for it
  • creating the child process in a different job object and breaking away from the current one to avoid nesting job objects (see CREATE_BREAKAWAY_FROM_JOB)
  • dynamically loading msys-2.0.dll and the fork and setsid methods, calling those to emulate a double-fork

None of the above works.

If possible, provide a minimal example to reproduce the problem:

C:\work>type minibazel.cc
#include <windows.h>
#include <stdio.h>

void log(const char *format, ...) {
  FILE* f = fopen("c:\\work\\minibazel.txt", "at");
  va_list ap;
  va_start(ap, format);
  fprintf(f, "(pid=%d) ", GetCurrentProcessId());
  vfprintf(f, format, ap);
  va_end(ap);
  fclose(f);

  va_start(ap, format);
  fprintf(stdout, "(pid=%d) ", GetCurrentProcessId());
  vfprintf(stdout, format, ap);
  va_end(ap);
}

int ExecuteDaemon(const char* argv0) {
  SECURITY_ATTRIBUTES sa;
  sa.nLength = sizeof(SECURITY_ATTRIBUTES);
  sa.bInheritHandle = FALSE;
  sa.lpSecurityDescriptor = NULL;

  PROCESS_INFORMATION processInfo = {0};
  STARTUPINFOA startupInfo = {0};

  char cmdline[1000];
  size_t len = strlen(argv0);
  strncpy(cmdline, argv0, len);
  cmdline[len] = ' ';
  cmdline[len + 1] = 'x';
  cmdline[len + 2] = 0;

  BOOL ok = CreateProcessA(
      /* lpApplicationName */ NULL,
      /* lpCommandLine */ cmdline,
      /* lpProcessAttributes */ NULL,
      /* lpThreadAttributes */ NULL,
      /* bInheritHandles */ TRUE,
      /* dwCreationFlags */ DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP,
      /* lpEnvironment */ NULL,
      /* lpCurrentDirectory */ NULL,
      /* lpStartupInfo */ &startupInfo,
      /* lpProcessInformation */ &processInfo);

  if (!ok) {
    log("ERROR[child] CreateProcess, err: %d\n", GetLastError());
    return 1;
  }

  CloseHandle(processInfo.hProcess);
  CloseHandle(processInfo.hThread);
  return 0;
}

int main(int argc, char** argv) {
  if (argc > 1) {
    log("INFO[child] Sleep 10 sec\n");
    Sleep(10000);
    log("INFO[child] Done\n");
    return 0;
  } else {
    log("INFO[parent] start -------------------\n");
    int x = ExecuteDaemon(argv[0]);
    log("INFO[parent] Created process, sleep 5 sec\n");
    Sleep(5000);
    log("INFO[parent] Done\n");
    return x;
  }
  return 0;
}

C:\work>cl minibazel.cc
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24213.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

minibazel.cc
Microsoft (R) Incremental Linker Version 14.00.24213.1
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:minibazel.exe
minibazel.obj
$ cat ./subshell.sh
#!/bin/bash
echo "$(date +%H:%M:%S) start subshell"
out=$(c:/work/minibazel.exe)
echo "$(date +%H:%M:%S) done subshell"

$ ./subshell.sh
15:24:01 start subshell
15:24:11 done subshell

Environment info

  • Operating System: Windows 10

  • Bazel version (output of bazel info release): all, the problem is not in Bazel AFAICT

Have you found anything relevant by searching the web?

Sadly, no.

@laszlocsomor laszlocsomor self-assigned this Jun 8, 2017
@laszlocsomor laszlocsomor added platform: windows P2 We'll consider working on this in future. (Assignee optional) labels Jun 8, 2017
@meteorcloudy
Copy link
Member

Such a clear explanation and nice reproduce! Thanks for looking into this!

bazel-io pushed a commit that referenced this issue Jun 9, 2017
This allows writing integration tests in Python
rather than shell, so we can avoid
#3148

This change also adds a test to verify that
running "bazel info server_pid" twice yields the
same PID. Again, this is testing that we indeed
avoid the aformentioned bug.

Change-Id: Ic800965b16ab87179370fd2cd43908286120e8d5
PiperOrigin-RevId: 158517192
bazel-io pushed a commit that referenced this issue Oct 6, 2017
In this commit:
- testenv.sh no longer runs Bazel in a subshell
- testenv.sh prints informative log messages that
  show the origin (testenv.sh) and the date
- testenv.sh logs to $TEST_log by appending to it,
  not by overwriting it

This change speeds up
//src/test/shell/bazel:bazel_windows_example_test from
~260s to ~165s.

On Windows, Bash (MSYS) waits for all subprocesses
of a subshell to terminate before allowing the
subshell itself to terminate too, even if some
subprocesses are daemons.

I think this is a bug, and it's the culprit of
#3148

Change-Id: If97ea6dfeb904af560b291a7acebd09e43863ab4
PiperOrigin-RevId: 171267462
@laszlocsomor laszlocsomor changed the title Windows: Bazel server is always restarted in shell integration tests Windows, testing: update tests to not call Bazel in a subshell Dec 14, 2017
@laszlocsomor
Copy link
Contributor Author

This bug is part of #4292 so I'm leaving it as P2 and adding the Q1-2018 label.

@meteorcloudy
Copy link
Member

You know what, calling Bazel in a subshell might no longer be a problem after 9c97bf9

@laszlocsomor
Copy link
Contributor Author

You're right!

laszlocsomor added a commit to laszlocsomor/bazel that referenced this issue Jul 30, 2018
testenv.sh no longer runs "bazel info bazel-bin"
and "bazel info bazel-genfiles", because the only
test that needs this information is
bazel_sandboxing_test.sh

This saves 1-2 seconds on Windows in every shell
test setup.

Also, on Windows we can safely run Bazel in a
subshell since bazelbuild#3148 is fixed.

Change-Id: If9873221d536f1acfd4849afbfc83b94311de2bd
bazel-io pushed a commit that referenced this issue Jul 31, 2018
testenv.sh no longer runs "bazel info bazel-bin"
and "bazel info bazel-genfiles", because the only
test that needs this information is
bazel_sandboxing_test.sh

This saves 1-2 seconds on Windows in every shell
test setup.

Also, on Windows we can safely run Bazel in a
subshell since #3148 is fixed.

Change-Id: If9873221d536f1acfd4849afbfc83b94311de2bd

Closes #5707.

Change-Id: I04e37c4d5f794017a6050253d8d1b8a8ac43a6d1
PiperOrigin-RevId: 206714921
werkt pushed a commit to werkt/bazel that referenced this issue Aug 2, 2018
testenv.sh no longer runs "bazel info bazel-bin"
and "bazel info bazel-genfiles", because the only
test that needs this information is
bazel_sandboxing_test.sh

This saves 1-2 seconds on Windows in every shell
test setup.

Also, on Windows we can safely run Bazel in a
subshell since bazelbuild#3148 is fixed.

Change-Id: If9873221d536f1acfd4849afbfc83b94311de2bd

Closes bazelbuild#5707.

Change-Id: I04e37c4d5f794017a6050253d8d1b8a8ac43a6d1
PiperOrigin-RevId: 206714921
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) platform: windows type: process
Projects
None yet
Development

No branches or pull requests

2 participants