@coqbot could report failures from GitLab CI and restart spurious ones #3

Zimmi48 · 2018-05-20T12:12:51Z

When a GitLab CI pipeline completes with failures, first check if the pipeline is up-to-date with respect to the PR.

If yes, check if some of the failures are spurious:

"runner system failure" detected by GitLab, example error message: ERROR: Job failed (system failure): Cannot connect to the Docker daemon at tcp://10.142.0.123:2376. Is the docker daemon running?

connection trouble:

error: RPC failed; HTTP 500 curl 22 The requested URL returned error: 500 Internal Server Error
fatal: The remote end hung up unexpectedly
ERROR: Job failed: exit code 1

uploading artifacts failed:

Uploading artifacts to coordinator... ok            id=71082452 responseStatus=201 Created token=G7Azf-fN
ERROR: Job failed: exit code 1

If some of them are, restart the corresponding jobs.
(Spurious failures that we have control over should be fixed at the source instead.)

Otherwise, post a message in the PR thread with the last few lines of the failing job logs and direct links to these logs.
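
The procedure above could be sketched roughly as follows. This is only an illustration, not the bot's actual code: the names `classify_failure`, `tail`, `FailedJob` and `plan_actions` are hypothetical, the regular expressions are derived solely from the three example messages quoted above, and the artifact-upload pattern in particular may be too broad for real traces.

```python
import re
from dataclasses import dataclass

# Patterns derived from the example error messages quoted in this issue.
# The category names and the exact regexes are assumptions for illustration.
SPURIOUS_PATTERNS = [
    ("runner system failure",
     re.compile(r"Job failed \(system failure\)")),
    ("connection trouble",
     re.compile(r"RPC failed; HTTP 500|The remote end hung up unexpectedly")),
    ("uploading artifacts failed",
     re.compile(r"Uploading artifacts to coordinator\.\.\. ok.*\r?\n\s*"
                r"ERROR: Job failed: exit code 1")),
]


def classify_failure(trace):
    """Return the name of the first spurious-failure category found, or None."""
    for name, pattern in SPURIOUS_PATTERNS:
        if pattern.search(trace):
            return name
    return None


def tail(trace, n=10):
    """Return the last n lines of a job trace, ignoring trailing whitespace."""
    return "\n".join(trace.rstrip().splitlines()[-n:])


@dataclass
class FailedJob:
    name: str   # job name as shown in the pipeline
    url: str    # direct link to the job log
    trace: str  # full text of the job log


def plan_actions(pipeline_sha, pr_head_sha, failed_jobs):
    """Decide, per failed job, whether to retry it or report it on the PR."""
    if pipeline_sha != pr_head_sha:
        return []  # pipeline is not up to date with the PR: do nothing
    actions = []
    for job in failed_jobs:
        reason = classify_failure(job.trace)
        if reason is not None:
            actions.append(("retry", job.name, reason))
        else:
            actions.append(("report", job.name, job.url + "\n" + tail(job.trace)))
    return actions
```

Keeping the decision step free of network calls makes it easy to test against saved traces before wiring it to GitLab and GitHub.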


Zimmi48 · 2018-06-12T15:49:42Z

This is going to be way easier than I thought thanks to the job webhook, the build trace API endpoint and the build retry API endpoint (cf. bf62f4b and https://docs.gitlab.com/ee/api/jobs.html#retry-a-job).
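
As a rough sketch of how these three pieces might fit together (not the code from bf62f4b): the helper names below are hypothetical, the base URL assumes gitlab.com, the token is a placeholder, and the webhook field names (`object_kind`, `build_status`, `project_id`, `build_id`) reflect my understanding of the GitLab job event payload and should be checked against the GitLab documentation. The requests are only constructed here, not sent.

```python
import urllib.request

# Assumption: the project is hosted on gitlab.com and uses API v4.
GITLAB_API = "https://gitlab.com/api/v4"


def failed_job_from_webhook(payload):
    """Return (project_id, job_id) if the payload describes a failed job, else None.

    The field names are assumed to match GitLab's job event payload.
    """
    if payload.get("object_kind") != "build":
        return None
    if payload.get("build_status") != "failed":
        return None
    return payload["project_id"], payload["build_id"]


def trace_request(project_id, job_id, token):
    """Build a GET request for the job's log (the trace endpoint)."""
    return urllib.request.Request(
        f"{GITLAB_API}/projects/{project_id}/jobs/{job_id}/trace",
        headers={"PRIVATE-TOKEN": token},
        method="GET",
    )


def retry_request(project_id, job_id, token):
    """Build a POST request that retries the job (the retry endpoint)."""
    return urllib.request.Request(
        f"{GITLAB_API}/projects/{project_id}/jobs/{job_id}/retry",
        headers={"PRIVATE-TOKEN": token},
        method="POST",
    )
```

A request built this way would be sent with `urllib.request.urlopen(req)`; the trace response body is the plain-text job log.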

Zimmi48 · 2018-06-13T12:04:22Z

Actually, something is missing from both the webhook payload and the trace: neither makes it possible to tell whether the failure is due to a failing runner. Cf. https://gitlab.com/gitlab-org/gitlab-ee/issues/6408

Zimmi48 · 2018-06-13T12:51:02Z

Another, unrelated obstacle to putting this into practice: we would have to stop relying on Heroku's free dynos, as the GitLab job webhook generates far too many requests to let the bot get its 7 hours of statutory sleep.

ejgallego · 2018-07-11T16:28:04Z

Please disable the report functionality until the "stale build problem" is fixed, as detailed in coq/coq#7871 (comment).

Zimmi48 · 2018-07-12T08:22:37Z

OK, this is fixed now.

ejgallego · 2018-07-12T12:37:46Z

Thanks!!!

Zimmi48 · 2018-07-16T21:24:48Z

This is basically implemented now and further enhancements can be treated in separate issues.

ejgallego · 2018-07-16T23:23:43Z

Thanks for this great work.

Zimmi48 added the enhancement label May 20, 2018

Zimmi48 mentioned this issue May 30, 2018

Gitlab: retry failed jobs once coq/coq#7642

Merged

Zimmi48 mentioned this issue Jul 4, 2018

Print something after the build completed if it wasn't a runner failure. coq/coq#7992

Merged

Zimmi48 closed this as completed Jul 16, 2018

Zimmi48 added this to the 0.1.0 milestone Sep 9, 2020
