New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for build cancellation #42

Merged
merged 1 commit into from Dec 12, 2016

Conversation

Projects
None yet
4 participants
@neonichu
Member

neonichu commented Nov 25, 2016

Right now, cancellation is only supported when using the C API.

Missing:

  • Cancel running build when the CLI receives SIGINT
  • Tests

I opted to not refactor the ninja stuff for now, which has its own implementation of cancellation. I think that can be done in a separate PR.

@neonichu neonichu changed the title from WIP: Support for build cancellation to Support for build cancellation Dec 6, 2016

@neonichu

This comment has been minimized.

Show comment
Hide comment
@neonichu

neonichu Dec 6, 2016

Member

@ddunbar Got a basic test for the cancellation that is modelled after the existing Ninja one. I think we should also have one for testing the C API, but not sure what exactly to test there.

Member

neonichu commented Dec 6, 2016

@ddunbar Got a basic test for the cancellation that is modelled after the existing Ninja one. I think we should also have one for testing the C API, but not sure what exactly to test there.

@ddunbar

This looks like a fine start, but there are a number of race conditions (and a possible use after free) I noted inline.

For development purposes, can you split this into an infrastructure change which adds the cancellation support to LaneBasedExecutionQueue, with a unit test (since that is the most subtle area, and worth ensuring we have coverage of)?

It is important that we have coverage of things like the SIGINT -> SIGKILL escalation, because that has historically caused problems in practice for build systems.

We can test this by just shelling out to things like sleep or a shell script (or command line tool) which ignores SIGINT.

Show outdated Hide outdated lib/BuildSystem/LaneBasedExecutionQueue.cpp
Show outdated Hide outdated lib/BuildSystem/LaneBasedExecutionQueue.cpp
Show outdated Hide outdated lib/BuildSystem/LaneBasedExecutionQueue.cpp
@@ -332,6 +376,12 @@ class LaneBasedExecutionQueue : public BuildExecutionQueue {
return false;

This comment has been minimized.

@ddunbar

ddunbar Dec 6, 2016

Member

We should make sure to remove this pid from the watch set here too, right? I think we can just move removal from below up.

@ddunbar

ddunbar Dec 6, 2016

Member

We should make sure to remove this pid from the watch set here too, right? I think we can just move removal from below up.

This comment has been minimized.

@neonichu

neonichu Dec 7, 2016

Member

@neonichu
Show outdated Hide outdated include/llbuild/BuildSystem/BuildExecutionQueue.h
@@ -405,13 +409,78 @@ static int executeParseCommand(std::vector<std::string> args) {
class BasicBuildSystemFrontendDelegate : public BuildSystemFrontendDelegate {
std::unique_ptr<basic::FileSystem> fileSystem;
/// The previous SIGINT handler.
struct sigaction previousSigintHandler;

This comment has been minimized.

@ddunbar

ddunbar Dec 6, 2016

Member

This is never actually used (but probably should be).

@ddunbar

ddunbar Dec 6, 2016

Member

This is never actually used (but probably should be).

This comment has been minimized.

@neonichu

neonichu Dec 7, 2016

Member

Yep, it was saved but not restored. Now done in the destructor, along with closing the pipe.

@neonichu

neonichu Dec 7, 2016

Member

Yep, it was saved but not restored. Now done in the destructor, along with closing the pipe.

This comment has been minimized.

@neonichu

neonichu Dec 7, 2016

Member

@neonichu
Show outdated Hide outdated lib/BuildSystem/LaneBasedExecutionQueue.cpp
Show outdated Hide outdated lib/BuildSystem/LaneBasedExecutionQueue.cpp
@@ -2041,3 +2041,10 @@ bool BuildSystem::enableTracing(StringRef path,
bool BuildSystem::build(StringRef name) {
return static_cast<BuildSystemImpl*>(impl)->build(name);
}
void BuildSystem::cancel() {
if (impl) {

This comment has been minimized.

@ddunbar

ddunbar Dec 6, 2016

Member

I think we need a flag to track whether we are cancelling, to cause jobs to stop trying to enqueue work if we are trying to cancel.

We should fix the implementation to properly merge this and the getDelegate().isCancelled() API, so that inside the BuildSystem we only have one hook.

@ddunbar

ddunbar Dec 6, 2016

Member

I think we need a flag to track whether we are cancelling, to cause jobs to stop trying to enqueue work if we are trying to cancel.

We should fix the implementation to properly merge this and the getDelegate().isCancelled() API, so that inside the BuildSystem we only have one hook.

This comment has been minimized.

@neonichu

neonichu Dec 8, 2016

Member

I moved up the flag from the C API's delegate and merged it with the existing isCancelled() implementation in the frontend delegate class. Since we are calling cancel() on the frontend delegate from both the CLI and the C API, that seemed like the best place to track cancellation.

@neonichu

neonichu Dec 8, 2016

Member

I moved up the flag from the C API's delegate and merged it with the existing isCancelled() implementation in the frontend delegate class. Since we are calling cancel() on the frontend delegate from both the CLI and the C API, that seemed like the best place to track cancellation.

@ddunbar ddunbar assigned neonichu and unassigned ddunbar Dec 6, 2016

@ddunbar

ddunbar approved these changes Dec 9, 2016

Looks great, I found one more (minor) race and a couple minor cleanup, otherwise LGTM!

@ddunbar

This comment has been minimized.

Show comment
Hide comment
@ddunbar

ddunbar Dec 9, 2016

Member

@swift-ci please test

Member

ddunbar commented Dec 9, 2016

@swift-ci please test

@neonichu

This comment has been minimized.

Show comment
Hide comment
@neonichu

neonichu Dec 9, 2016

Member

@swift-ci please test

Member

neonichu commented Dec 9, 2016

@swift-ci please test

<rdar://problem/28676638> Need BuildSystem support for build
cancellation

- Implement the ability to cancel running builds
- Invoke `cancel()` on `SIGINT` when executing build systems commands via
CLI.
- Test for SIGINT handling and for SIGKILL escalation
- Unit test for cancellation in `LaneBasedExecutionQueue`

@neonichu neonichu merged commit 0fff081 into apple:master Dec 12, 2016

@neonichu neonichu deleted the neonichu:feature/cancellation branch Dec 12, 2016

@jrose-apple

This comment has been minimized.

Show comment
Hide comment
@jrose-apple

jrose-apple Dec 12, 2016

Member

Your latest change doesn't seem to build…

Member

jrose-apple commented Dec 12, 2016

Your latest change doesn't seem to build…

@jrose-apple

This comment has been minimized.

Show comment
Hide comment
@ddunbar

This comment has been minimized.

Show comment
Hide comment
@ddunbar

ddunbar Dec 12, 2016

Member

Looks like missing #include, unfortunately libc++ has a tendency to over export things in its headers so libstdc++ builds sometimes find missing #include statements.

Member

ddunbar commented Dec 12, 2016

Looks like missing #include, unfortunately libc++ has a tendency to over export things in its headers so libstdc++ builds sometimes find missing #include statements.

@neonichu

This comment has been minimized.

Show comment
Hide comment
@neonichu

neonichu Dec 12, 2016

Member

Sorry for that, fixed in e7a6df8

Member

neonichu commented Dec 12, 2016

Sorry for that, fixed in e7a6df8

@jrose-apple

This comment has been minimized.

Show comment
Hide comment
@jrose-apple

jrose-apple Dec 12, 2016

Member

With that the tests build but fail. I reverted in #46 to get the bots back.

Can llbuild move to gated commits too?

Member

jrose-apple commented Dec 12, 2016

With that the tests build but fail. I reverted in #46 to get the bots back.

Can llbuild move to gated commits too?

@ddunbar

This comment has been minimized.

Show comment
Hide comment
@ddunbar

ddunbar Dec 12, 2016

Member

I'm not sure what you mean, I believe this did go past CI previously. @neonichu can you take a look?

Member

ddunbar commented Dec 12, 2016

I'm not sure what you mean, I believe this did go past CI previously. @neonichu can you take a look?

@jrose-apple

This comment has been minimized.

Show comment
Hide comment
@jrose-apple

jrose-apple Dec 12, 2016

Member

It passed CI, then the branch was tweaked in response to review feedback. I assume that must have broken something.

By "gated commits" I mean that GitHub won't let you merge a PR without CI passing on the current version of the PR. You'd also have to be in the habit of always going through PRs, but that's not hard to get used to

Member

jrose-apple commented Dec 12, 2016

It passed CI, then the branch was tweaked in response to review feedback. I assume that must have broken something.

By "gated commits" I mean that GitHub won't let you merge a PR without CI passing on the current version of the PR. You'd also have to be in the habit of always going through PRs, but that's not hard to get used to

@ddunbar

This comment has been minimized.

Show comment
Hide comment
@ddunbar

ddunbar Dec 12, 2016

Member

I don't think that is what is going on here, I suspect non-determinism more. Almost all interesting commits already go through PRs.

Member

ddunbar commented Dec 12, 2016

I don't think that is what is going on here, I suspect non-determinism more. Almost all interesting commits already go through PRs.

@neonichu neonichu restored the neonichu:feature/cancellation branch Dec 12, 2016

@neonichu

This comment has been minimized.

Show comment
Hide comment
@neonichu

neonichu Dec 13, 2016

Member

Jordan is right, I did make some final changes which I thought were trivial, but the test in question is failing reliably on Linux now.

I also agree that enforcing CI to pass for PRs would be a good idea, especially if the other projects are already doing that.

Member

neonichu commented Dec 13, 2016

Jordan is right, I did make some final changes which I thought were trivial, but the test in question is failing reliably on Linux now.

I also agree that enforcing CI to pass for PRs would be a good idea, especially if the other projects are already doing that.

@neonichu

This comment has been minimized.

Show comment
Hide comment
@neonichu

neonichu Dec 13, 2016

Member

I'm unsure how to fix the test, I think it runs into a similar problem as the existing Ninja one, which is deactivated for Linux (see https://github.com/apple/swift-llbuild/blob/master/tests/Ninja/Build/signal-handling.ninja#L16-L17).

Additionally, the "sigkill-escalation" one hangs when run through lit, but seems to work fine when I perform the same commands manually, not sure how to debug that.

Member

neonichu commented Dec 13, 2016

I'm unsure how to fix the test, I think it runs into a similar problem as the existing Ninja one, which is deactivated for Linux (see https://github.com/apple/swift-llbuild/blob/master/tests/Ninja/Build/signal-handling.ninja#L16-L17).

Additionally, the "sigkill-escalation" one hangs when run through lit, but seems to work fine when I perform the same commands manually, not sure how to debug that.

@neonichu

This comment has been minimized.

Show comment
Hide comment
@neonichu

neonichu Dec 13, 2016

Member

Turns out the issue was sh vs. bash in the script used by the lit tests and it was masked by them both being bash on Darwin.

Member

neonichu commented Dec 13, 2016

Turns out the issue was sh vs. bash in the script used by the lit tests and it was masked by them both being bash on Darwin.

@aciidb0mb3r

This comment has been minimized.

Show comment
Hide comment
@aciidb0mb3r
Member

aciidb0mb3r commented Dec 13, 2016

😱

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment