Pending state is fragile to CI/builder failures #121

Closed
larsbergstrom opened this issue Dec 31, 2015 · 6 comments

Comments

@larsbergstrom

In cases where homu sets a pull to pending, but something goes wrong with the builders or CI systems, it's easy for homu to end up stuck but with nothing going on. Worse, the Synchronize button doesn't help, and issuing a retry appears to do nothing, either.

What's the right thing to do here? We've run into this a bit when either Travis CI or linode (where we host buildbot) or GitHub goes wonky under a DDOS, as we start having lost messages, aborted builders w/o any status messages, etc.

Today, I manually restart all the services on the server, go through GH and re-deliver messages, and close PRs as necessary to get things going again. It would be great if there were either:

  1. A different form of synchronize that just "forgot" all pending work and transitioned things back to the approved state
  2. Something we could put in the PRs (e.g., @homu reset?) to clear the pending back to approved

cc @Manishearth for more feedback/ideas and @metajack @edunham since it's been an ugly week or two :-)

@Manishearth
Contributor

retry force clean might be the right invocation.

I don't know if clean on its own is the reset you want; it clears build details but doesn't touch pending.

@frewsxcv

In my (probably controversial) opinion, if we run clean on every single build and it causes fewer issues than what we experience now, I'd consider that an improvement

@Manishearth
Contributor

That could probably be arranged in the (probably inevitable) fork. I think things like closing and reopening should clean. I'm okay with retry not cleaning.

@metajack

Why is a fork inevitable?

@barosl
Owner

barosl commented Jan 18, 2016

issuing a retry appears to do nothing, either.

In general, retry should trigger a rebuild even when the PR is in the pending state. If that didn't work, I guess it is more likely due to a bad interaction between Homu and Buildbot. Because Travis is informed of the state change directly through the GitHub webhooks, it is more reliable than Buildbot. Working with Buildbot using the auto branch is quite fragile.

  1. A different form of synchronize that just "forgot" all pending work and transitioned things back to the approved state

This is exactly what retry does. The problem might be in the fragile communication with Buildbot, which plays out as follows: Homu sets up the auto branch in the hope of Buildbot responding, but Buildbot's response packets are somehow lost. Homu does not wait for Travis in the same scenario, because the merge commit is reported directly to Travis through the GitHub webhooks. So I've been thinking the right fix is to introduce some timeouts on the connection between Homu and Buildbot.
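The timeout idea could look roughly like the sketch below. This is not Homu's actual code: `PullState`, `start_build`, `reset_if_stale`, and `sweep` are hypothetical names made up for illustration, and the only things taken from this thread are the "approved" and "pending" states. The assumption is that a pull which has sat in pending longer than some limit without any builder response gets moved back to approved so the queue can pick it up again.

```python
from dataclasses import dataclass
from typing import List, Optional

# State names follow the ones used in this thread; everything else is hypothetical.
APPROVED = "approved"
PENDING = "pending"


@dataclass
class PullState:
    """Hypothetical stand-in for a queued pull request (not Homu's real class)."""
    num: int
    status: str = APPROVED
    pending_since: Optional[float] = None

    def start_build(self, now: float) -> None:
        """Mark the pull as pending and remember when the build was requested."""
        self.status = PENDING
        self.pending_since = now

    def reset_if_stale(self, now: float, timeout: float) -> bool:
        """Move the pull back to approved if it has been pending for at least `timeout` seconds."""
        if self.status != PENDING or self.pending_since is None:
            return False
        if now - self.pending_since < timeout:
            return False
        self.status = APPROVED
        self.pending_since = None
        return True


def sweep(pulls: List[PullState], now: float, timeout: float) -> List[int]:
    """Reset every stale pending pull and return the numbers that were reset."""
    return [p.num for p in pulls if p.reset_if_stale(now, timeout)]
```

A periodic task could call `sweep(pulls, time.time(), timeout)` with a timeout somewhat longer than the slowest expected build, so that a lost Buildbot response eventually frees the pull instead of leaving it stuck.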

Also note that force and clean are Buildbot-specific commands, so they are not applicable to Travis. (Actually, I've just found that clean is mistakenly enabled for Travis-enabled repositories too; this seems to be a severe bug.)

  2. Something we could put in the PRs (e.g., @homu reset?) to clear the pending back to approved

As I said above, that's what retry does!

@larsbergstrom
Author

@barosl Aha! I'll close this, then. I suspect the issue is related to our buildbot problems, rather than things in homu itself.

We were also using force way too much during these errors, which made the problems even worse.

alexcrichton pushed a commit to alexcrichton/homu that referenced this issue Mar 19, 2018
Don't stop processing the queue when tree is closed
