build server: master builds unreliable #2191

Open
markus2330 opened this Issue Aug 18, 2018 · 8 comments

2 participants
@markus2330
Contributor

markus2330 commented Aug 18, 2018

Steps to Reproduce the Problem

I pushed f3e7630 to master.

Expected Result

Either the build succeeds or I get an email.

Actual Result

The build failed without an error message (only "Sending interrupt signal to process") and without an email:

https://build.libelektra.org/jenkins/blue/organizations/jenkins/libelektra/detail/master/214/pipeline/

System Information

  • Elektra Version: master
@ingwinlu

ingwinlu Aug 18, 2018

Contributor

I don't understand the problem.

The build was interrupted as expected, because a new build of master was queued due to your changes (https://build.libelektra.org/jenkins/blue/organizations/jenkins/libelektra/detail/master/215/pipeline).

AND you explicitly requested not to get emails in that case.

So what do you want to be different now? Should we remove the 'abort should not send mail' section you asked for earlier this month?

@markus2330

markus2330 Aug 18, 2018

Contributor

Please take a look at when what happened. You are really quick to make accusations. I started 215 when I wrote this issue.

> So what do you want to be different now?

As the ticket says, master builds should be more reliable. For starters, I do not want build jobs to be aborted without any apparent reason. (An apparent reason would be another push to master shortly afterwards. But even this reason is questionable, as I have already stated multiple times: it is good to know which commit broke master. It would be better to re-prioritize instead of aborting.)

@markus2330

markus2330 Aug 18, 2018

Contributor

I might have pushed to another branch (maybe debian?), though. I hope this info helps.

@ingwinlu

ingwinlu Aug 18, 2018

Contributor

> Please take a look at when what happened. You are really quick to make accusations. I started 215 when I wrote this issue.

I did, probably for longer than I should have.

Your commit started 214.
You later started a manual build (215, your user is mentioned) that aborted 214. No mail was sent for 214 because you specified that you did not want abort mail notifications.

Where is the reliability issue? As far as I can tell, the system behaved as expected, i.e. the way you wanted it to behave.

As far as I know, there is no reprioritization of Jenkins tasks: https://issues.jenkins-ci.org/browse/JENKINS-1878.

If you do not want old runs of the same master job to be aborted, we can change that, but I wonder why you did not request exactly that instead of opening an issue about reliability.
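A minimal model of the abort-previous policy described above may help separate the two cases discussed in this thread. It is a sketch, not the actual Jenkinsfile: the `Build` and `AbortPreviousScheduler` names are hypothetical, and it assumes that starting a build aborts every older running build of the same branch and records which build caused the abort (similar to the "Aborted by Build#216" note on 215).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Build:
    number: int
    branch: str
    status: str = "RUNNING"           # RUNNING or ABORTED in this model
    aborted_by: Optional[int] = None  # number of the build that caused the abort


class AbortPreviousScheduler:
    """Hypothetical model: a new build aborts older running builds of its branch."""

    def __init__(self) -> None:
        self.builds: List[Build] = []

    def start(self, number: int, branch: str) -> Build:
        for old in self.builds:
            if old.branch == branch and old.status == "RUNNING" and old.number < number:
                old.status = "ABORTED"
                old.aborted_by = number  # leaves a visible "aborted by" trace
        build = Build(number, branch)
        self.builds.append(build)
        return build


scheduler = AbortPreviousScheduler()
b214 = scheduler.start(214, "master")
b215 = scheduler.start(215, "master")
print(b214.status, b214.aborted_by)  # ABORTED 215
```

In this model an abort caused by a newer build always leaves an `aborted_by` entry, and the policy is per branch, so a push to another branch (e.g. debian) never aborts a master build. A master build that stops without such an entry was interrupted by something else.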

@markus2330

markus2330 Aug 18, 2018

Contributor

I started only one job manually today, immediately after 214 showed what I reported above.

In https://build.libelektra.org/jenkins/job/libelektra/job/master/215 it says "Aborted by user Aborted by Build#216", but I cannot find anything similar in 214.

Next time it happens, we hopefully have time to debug it before I need to trigger a new build.

Btw. why are the Debian packages not published if the build server fails at a later stage (building the homepage)?
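One possible explanation, assuming the pipeline runs its stages strictly in sequence and publishes only in a final stage: any failure or interruption in an earlier stage (such as the homepage build) means the publish stage never runs, even though the packages were already built. The stage names and the `run_pipeline` helper below are hypothetical and not taken from the actual Jenkinsfile.

```python
from typing import Callable, List, Tuple


def run_pipeline(stages: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run stages in order and stop at the first one that raises.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, step in stages:
        try:
            step()
        except Exception as exc:  # a failure (modelled as an exception) ends the pipeline
            print(f"stage {name!r} failed: {exc}")
            break
        completed.append(name)
    return completed


def fail_homepage() -> None:
    raise RuntimeError("homepage build failed")


stages = [
    ("build", lambda: None),
    ("debian-packages", lambda: None),  # packages are built here...
    ("homepage", fail_homepage),
    ("publish", lambda: None),          # ...but only published here
]
print(run_pipeline(stages))  # ['build', 'debian-packages']
```

Whether the real pipeline publishes all artifacts in one final stage is an assumption of this sketch.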

That PRs do not build everything is also a major reliability issue: failures in the homepage and in the Debian packages do not show up until they are in master.

@ingwinlu

ingwinlu Aug 18, 2018

Contributor

Ah, I finally see what you mean...

Sadly, we have no influence on that, as it was probably caused by a network interruption or something similar.

I mentioned that the way the Jenkins issue solves the "abort -> do not send mail" problem (matching on the error message) is hacky and that I did not like it. But I did not expect a case where it would affect the build server so soon.
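To illustrate why matching on the error message is fragile, here is a sketch of such a heuristic next to a hypothetical alternative. The marker string is the one from the log of build 214; the function names, and the idea of checking a recorded abort cause instead, are assumptions for illustration and not how the current setup works. A network interruption that produces the same message is indistinguishable from a deliberate abort, so no mail is sent.

```python
from typing import List

# Hypothetical marker list; a real setup might match several messages.
ABORT_MARKERS = ("Sending interrupt signal to process",)


def looks_aborted(log_tail: str) -> bool:
    """Heuristic: treat a run as aborted if its log contains an abort marker."""
    return any(marker in log_tail for marker in ABORT_MARKERS)


def should_send_mail_by_message(failed: bool, log_tail: str) -> bool:
    # Suppress mail for anything that merely *looks* like an abort.
    return failed and not looks_aborted(log_tail)


def should_send_mail_by_cause(failed: bool, causes: List[str]) -> bool:
    # Hypothetical alternative: suppress mail only when a newer build is
    # recorded as the cause, e.g. "Aborted by Build#216".
    superseded = any("Aborted by Build#" in cause for cause in causes)
    return failed and not superseded


# Build 214: interrupted (e.g. network problem), no "aborted by" cause recorded.
log_214 = "Sending interrupt signal to process"
print(should_send_mail_by_message(True, log_214))  # False: mail suppressed
print(should_send_mail_by_cause(True, []))         # True: mail would be sent
```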

> Btw. why are the Debian packages not published if the build server fails at a later stage (building the homepage)?

Can you point to a concrete example?

> That PRs do not build everything is also a major reliability issue: failures in the homepage and in the Debian packages do not show up until they are in master.

As pointed out several times before, it is a compromise.

@markus2330

markus2330 Aug 18, 2018

Contributor

What about simply turning off the interruption of build jobs for master and instead waiting, let's say, one minute before starting master jobs (so that pushes to master in quick succession get ignored)?
(For PRs it works very well anyway, so IMHO no changes are needed there.)
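The proposed policy (no aborts, one-minute wait) behaves like a debounce: a push only gets its own build if no further push to master arrives within the quiet period; otherwise the newer push supersedes it. A minimal sketch, assuming push times in seconds sorted in ascending order; the function name is hypothetical, and the sketch only models the proposal, not how (or whether) Jenkins can be configured this way.

```python
from typing import List


def builds_for_pushes(push_times: List[float], quiet_period: float = 60.0) -> List[int]:
    """Return the indices of pushes that would get their own build.

    Each push waits `quiet_period` seconds; if another push arrives in that
    window, the waiting push is dropped in favour of the newer one.
    Builds that have started are never aborted.
    """
    built: List[int] = []
    for i, t in enumerate(push_times):
        is_last = i + 1 == len(push_times)
        if is_last or push_times[i + 1] - t >= quiet_period:
            built.append(i)
    return built


# Pushes at t=0 s and t=20 s are coalesced; the push at t=200 s builds on its own.
print(builds_for_pushes([0, 20, 200]))  # [1, 2]
```

The trade-off raised earlier remains: the commit pushed at t=0 s is never built on its own, so a breakage can only be narrowed down to the pair of commits.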

If that is a lot of work, simply document the current limitations.

> matching on the error message

Can you document what you know about the exceptions and their messages? I think you know a lot about that.

> Can you point to a concrete example?

It seems that http://build.libelektra.org/jenkins/blue/organizations/jenkins/libelektra/detail/master/216/ was aborted a few seconds before it would have published, so it was not an error of the build server, just bad luck.

@ingwinlu

ingwinlu Aug 18, 2018

Contributor

> Can you document what you know about the exceptions and their messages? I think you know a lot about that.

Actually, I don't. I only looked into it when we needed to detect aborts, and I saw that their type depended on which command was being executed at the time.

> What about simply turning off the interruption of build jobs for master and instead waiting, let's say, one minute before starting master jobs (so that pushes to master in quick succession get ignored)?
> (For PRs it works very well anyway, so IMHO no changes are needed there.)

I do not think that is possible.
