build server: master builds unreliable #2191
Steps to Reproduce the Problem
I pushed f3e7630 to master.
Either the build is successful or I get an email.
Build failed without an error message (only "Sending interrupt signal to process") and without an email.
i don't understand the problem.
the build got interrupted as expected as a new build of master was queued due to your changes (https://build.libelektra.org/jenkins/blue/organizations/jenkins/libelektra/detail/master/215/pipeline).
AND you made sure to request to not get emails in that case.
so what do you want to be different now? Should we remove the 'abort should not send mail' section you wanted to have earlier this month?
Please take a look at what happened when. You are really fast with accusations. I started 215 when I wrote this issue.
As the ticket says, master builds should be more reliable. For starters, I do not want build jobs aborted without any apparent reason. An apparent reason would be another push to master shortly afterwards, but even that reason is questionable, as I have already stated multiple times: it is good to know which commit broke master. It would be better if you could re-prioritize instead of aborting.
i did. probably longer than i should have.
your commit started 214.
where is the reliability issue? as far as i can tell, the system behaved as expected and as you wanted it to behave.
there is no reprioritization of jenkins tasks as far as i know https://issues.jenkins-ci.org/browse/JENKINS-1878.
If you do not want to abort old runs of the same job for master, we can do so, but I wonder why you would not request exactly that instead of opening issues about reliability?
I only started one job today manually, immediately after 214 showed what I reported above.
In https://build.libelektra.org/jenkins/job/libelektra/job/master/215 it says "Aborted by user Aborted by Build#216", but in 214 I cannot find anything similar.
The next time it happens we hopefully have time and we can debug it before I need to issue a new build.
Btw. why are the Debian packages not published if the build server fails at a later stage (building the homepage)?
That the PRs do not build everything is also a major reliability issue. Failures in the homepage and in the Debian packages do not show up before they are in master.
Ah, I finally see what you mean...
Sadly we have no influence on that as it probably was caused by a network interruption or something similar.
I mentioned that the way the Jenkins job solves the "abort -> do not send mail" issue is hacky and that I did not like it (it matches on the error message). But I did not think of a case where it would affect the build server so soon.
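To illustrate why matching on the message is fragile, here is a minimal sketch in Python. It is not the actual pipeline code: the function name, the result strings, and the sample log lines are hypothetical, and only the marker text is taken from the log quoted at the top of this issue.

```python
# Marker text as quoted in the issue; everything else here is hypothetical.
ABORT_MARKER = "Sending interrupt signal to process"


def should_send_mail(build_result: str, log_text: str) -> bool:
    """Decide whether a failure mail goes out for a finished build.

    Assumption: a build counts as "aborted" purely because the marker
    appears in its log. That is the heuristic under discussion.
    """
    if build_result == "SUCCESS":
        return False
    if ABORT_MARKER in log_text:
        # Treated as an intentional abort, so the mail is suppressed.
        return False
    return True


# A compile error produces a mail.
print(should_send_mail("FAILURE", "error: undefined reference"))
# A build superseded by a newer one stays silent, as requested.
print(should_send_mail("ABORTED", "Aborted by Build#216\n" + ABORT_MARKER))
# An interruption from some other cause (for example a lost agent
# connection) that leaves the same line in the log also stays silent.
print(should_send_mail("FAILURE", ABORT_MARKER))
```

The third call shows the weak spot: if something other than a superseding build leaves the same line in the log, the heuristic cannot tell the two cases apart and no mail is sent.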
Can you point to a concrete example?
As pointed out several times before it is a compromise.
What about simply turning off the interruption of build jobs for master and instead waiting for, let's say, one minute before starting master jobs (so that pushes to master shortly after each other get ignored)?
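A rough sketch of the "wait a minute instead of aborting" idea, written in Python only to show the logic. It is not Jenkins configuration; the function name, the commit ids, and the 60 second default are made up for illustration.

```python
from typing import List, Tuple


def commits_to_build(pushes: List[Tuple[float, str]],
                     quiet_period: float = 60.0) -> List[str]:
    """Return the commits that would actually get a master build.

    pushes: (timestamp in seconds, commit id) pairs.
    A push is built only if no other push follows it within
    quiet_period seconds. Builds that do start are never aborted.
    """
    ordered = sorted(pushes)
    selected = []
    for i, (timestamp, commit) in enumerate(ordered):
        is_last = i == len(ordered) - 1
        if is_last or ordered[i + 1][0] - timestamp > quiet_period:
            selected.append(commit)
    return selected


# Hypothetical pushes: "a" and "b" are 30 s apart, "c" comes much later.
print(commits_to_build([(0, "a"), (30, "b"), (200, "c")]))  # ['b', 'c']
```

The trade-off is the one already raised in this thread: "a" never gets its own build, so if "b" breaks master it is not immediately clear which of the two commits caused it.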
If it is a lot of work to do that, simply document the current limitations.
Can you document what you know about the exceptions and their messages? I think you know a lot about that.
Seems like http://build.libelektra.org/jenkins/blue/organizations/jenkins/libelektra/detail/master/216/ was aborted some seconds before it published, so it was not an error of the build server but only bad luck.
Actually I don't. I only took a look when we needed to detect aborts, and I saw that the exception type depended on which command was being executed at the time.
I do not think that is possible.