Avoid deadlocks while joining the output streamer #817
streamer_deadline = time() + 3
while output_streamer.is_alive() and time() < streamer_deadline:
    sleep(0.1)
self._log_return_code(popen_process.poll())
This waiting around for the streamer is rather arbitrary.
In any expected case, is_alive() will return False and we continue on.
It's at least theoretically possible is_alive() returns True. This should mean the streamer is flooding the console with output or readline() is stuck waiting for EOF... or something worse.
That said, waiting around forever (like we were) would probably be worse, and moving forward prematurely presents the possibility of mixing subprocess output with briefcase output; although, at that point, the output wouldn't likely be very valuable anyway.
Agreed this is a bit arbitrary; but it's kinda forced on us by the limitations of the tools we have available. I can't see any obvious improvement to the approach that doesn't also involve deprecating Python 3.7 and 3.8 on Windows.
The one minor cosmetic improvement I can see is to add some console output to track when these conditions occur.
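As a rough sketch of the bounded wait discussed above, with console output when the deadline is hit: the helper name `wait_for_streamer`, its parameters, and the message text are all hypothetical and not taken from the PR.

```python
import threading
import time


def wait_for_streamer(streamer, timeout=3.0, poll_interval=0.1):
    """Poll until the thread exits or the deadline passes.

    Returns True if the thread finished, False if it was still alive.
    (Hypothetical helper; the PR inlines this loop instead.)
    """
    deadline = time.time() + timeout
    while streamer.is_alive() and time.time() < deadline:
        time.sleep(poll_interval)
    return not streamer.is_alive()


# A stand-in "streamer" that never finishes on its own.
stuck = threading.Event()
streamer = threading.Thread(target=stuck.wait, daemon=True)
streamer.start()

finished = wait_for_streamer(streamer, timeout=0.3)
if not finished:
    # Assumed wording; the actual message is not shown in this thread.
    print("Log stream hasn't terminated; abandoning stream.")

stuck.set()  # release the stand-in thread
```

Because the wait is bounded, the caller regains control even if the thread is stuck in a blocking read, at the cost of possibly abandoning some output.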
    mock_sub.cleanup.assert_called_once_with("testing", popen_process)


def test_stuck_streamer(mock_sub, popen_process, monkeypatch, capsys):
definitely open to any feedback for potential improvement
Yeah - this is hard to test; this is about as good as it's going to get AFAICT.
I've pushed a change that mocks time.time(); this allows us to shortcut the loop so it doesn't take a full 3 seconds to finish. Ideally, we'd mock time.sleep() as well - but if you do that, the parent thread never releases control to the child thread (at least, it doesn't in my testing), so some sleeping is required to test the behaviour of the subthread.
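One way the time.time() mocking could look, sketched with unittest.mock rather than the monkeypatch fixture used in the PR's test. The helper `wait_for_streamer` and the fake clock values are assumptions for illustration; time.sleep() is left real so the other thread still gets a chance to run.

```python
import itertools
import threading
import time
from unittest import mock


def wait_for_streamer(streamer, timeout=3.0, poll_interval=0.1):
    # Hypothetical stand-in for the polling loop under test.
    deadline = time.time() + timeout
    while streamer.is_alive() and time.time() < deadline:
        time.sleep(poll_interval)
    return not streamer.is_alive()


def run_stuck_streamer_check():
    release = threading.Event()
    streamer = threading.Thread(target=release.wait, daemon=True)
    streamer.start()

    # Fake clock: 0.0 sets the deadline (0 + 3), 1.0 keeps the loop going
    # for one real sleep, and every later reading is 4.0, past the deadline.
    fake_times = itertools.chain([0.0, 1.0], itertools.repeat(4.0))
    with mock.patch("time.time", side_effect=lambda: next(fake_times)):
        started = time.perf_counter()
        finished = wait_for_streamer(streamer)
        elapsed = time.perf_counter() - started

    release.set()  # let the stand-in thread exit
    return finished, elapsed


finished, elapsed = run_stuck_streamer_check()
```

Only the clock is faked, so the loop exits after a single real 0.1 second sleep instead of waiting out the full 3 seconds.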
This looks good to me, and works well in my testing. I'll wait for @mhsmith to confirm it works for him (since he was the original reporter), then merge once he's given a thumbs up.
    capsys.readouterr().out == "output line 1\n"
    "\n"
    "output line 3\n"
    "Stopping...\n"
Ok - this is really weird; I'm not seeing the CI failure on any of my test configurations. Investigating...
Found the source of the problem. Since the streamer thread isn't doing any substantial work, it was completely draining and exiting before we did our second stop check. The emergence of this behavior will be completely non-deterministic, as it depends on OS-level thread scheduling; however, we can strongly encourage the right behavior by adding a sleep to gently suggest to the OS that the streamer thread should suspend for a bit.
…terminate prematurely.
Works fine for me in both MSYS2 and cmd, when interrupting both Gradle and Logcat. Just one maintainability comment:
self.cleanup(label, popen_process)
output_streamer.join()
streamer_deadline = time.time() + 3
This timeout should always be the same duration as the timeout in cleanup, so both numbers should come from a single source.
Can you please help me understand why these waits should be the same?
As best I can tell, they will happen sequentially and quite independently of each other. Further, I was initially considering making this final wait shorter....lest the user gets frustrated that their first CTRL+C wasn't working out and they just started sending ever more...
Sorry, you're quite right, I got confused by the threading. The first timeout is to shut down the subprocess, and the second timeout is to shut down the thread reading the process output, which may be delayed slightly in buffers. So this is fine the way it is.
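The two sequential timeouts described here might be sketched as follows. This is an illustration only: `stop_and_drain` and `stream_output` are hypothetical names, and the terminate/wait/kill sequence is an assumption about what a cleanup step could do, not a copy of briefcase's cleanup().

```python
import subprocess
import sys
import threading
import time


def stream_output(process, sink):
    # Read lines until EOF; nothing else happens in this thread.
    for line in process.stdout:
        sink.append(line)


def stop_and_drain(process, streamer, process_timeout=3.0, streamer_timeout=3.0):
    # Phase 1: shut down the subprocess, with its own timeout.
    process.terminate()
    try:
        process.wait(timeout=process_timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.wait()

    # Phase 2: separately, give the reader thread time to drain any
    # output still sitting in buffers.
    deadline = time.time() + streamer_timeout
    while streamer.is_alive() and time.time() < deadline:
        time.sleep(0.1)
    return process.poll(), not streamer.is_alive()


# A long-running child process standing in for a build or log stream.
child_code = "import time; print('output line 1', flush=True); time.sleep(60)"
process = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    text=True,
)
lines = []
streamer = threading.Thread(target=stream_output, args=(process, lines), daemon=True)
streamer.start()

time.sleep(0.5)  # give the child a moment to start
return_code, drained = stop_and_drain(process, streamer)
```

The two waits run one after the other and guard different things, which is why they do not need to share a duration.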
I didn't want to add any more changes to this at this point but the originating issue of interrupting an Android build or the emulator got me trying to interrupt arbitrary tasks. I realized the output streamer is masking CTRL+C; therefore, a lot of upstream code is perceiving a failure leading to alerting the user and creating a briefcase log. Instead, the streamer should probably be re-raising the Additionally, fyi, I found a real-world use-case where
@rmartin16 Agreed that handling the KeyboardInterrupt in an output stream so that it doesn't generate a log would be preferable; but also agreed that (as well as any other improvements around streamer interaction) can be handled as future work.
Interrupting the joining of a thread with CTRL+C can lead to deadlocking certain versions of Python (python/cpython#89437).
Changes:
- Replaces calls to join() the streamer thread with waiting on is_alive().
- Instantiate the streamer thread as a daemon to prevent Python itself trying to join the thread.
- Remove all logic from streamer thread that is not necessary to stream output.
- As seen in Python itself, give some cushion of time for the CTRL+C to propagate to child processes.
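A minimal sketch of the first three changes, assuming a streamer that only forwards lines. The names `stream_lines`, `captured`, and the in-memory `source` are illustrative stand-ins, not briefcase code.

```python
import io
import threading
import time


def stream_lines(source, emit):
    # Only streaming happens here: read until EOF and hand each line on.
    for line in source:
        emit(line)


captured = []
source = io.StringIO("output line 1\n\noutput line 3\n")

streamer = threading.Thread(
    target=stream_lines,
    args=(source, captured.append),
    daemon=True,  # the interpreter will not try to join this thread at exit
)
streamer.start()

# Wait on is_alive() with a deadline rather than calling join().
deadline = time.time() + 3
while streamer.is_alive() and time.time() < deadline:
    time.sleep(0.1)
```

Keeping the thread's target this small means a stuck read is the only way it can outlive the deadline.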
briefcase can exit differently based on whether they are aborted by the user or encounter a different error.
adb is streaming logs, it will emit a return code of 0xC000013A when exiting from CTRL+C.
Fixes "On Windows, pressing Ctrl-C during an Android build or run causes briefcase to hang" #809
CC: @mhsmith
PR Checklist: