Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ideas for reducing the CI runner load #248

Open
jfehrle opened this issue Oct 18, 2022 · 3 comments
Open

Ideas for reducing the CI runner load #248

jfehrle opened this issue Oct 18, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@jfehrle
Copy link
Member

jfehrle commented Oct 18, 2022

  • Suppose the PR has a failure only in ci_fiat_crypto and you make a fix to address that. It would be helpful to be able to rerun only that job and its prerequisites--or just the jobs that failed in the last full run, or a specific named group of jobs and their prerequisites. Of course, you would later run a full build before the merge. A possible user interface (if it can be implemented) is to define labels for each CI step and have the build process remove labels for jobs that succeed and add labels for jobs that fail or don't run because of failures in prerequisites.
  • Detect whether a build job is only updating documentation and limit the CI jobs accordingly
  • Apply time limits to jobs. For example, not long ago we saw the test suite timing out after 3 hours. Maybe a limit of 1 hour is enough?
  • Do we currently detect the number of cores on the runner and configure NJOBS to match?
@ticket-tagger ticket-tagger bot added the enhancement New feature or request label Oct 18, 2022
@Zimmi48
Copy link
Member

Zimmi48 commented Oct 19, 2022

A possible user interface (if it can be implemented) is to define labels for each CI step and have the build process remove labels for jobs that succeed and add labels for jobs that fail or don't run because of failures in prerequisites.

This looks technically feasible but also potentially too many labels being added and removed at each step, risking polluting the PR timeline.

Detect whether a build job is only updating documentation and limit the CI jobs accordingly

This should be feasible much more easily. I guess we would still want to run the linter, the build (dependency of the doc) and the doc jobs.

Apply time limits to jobs. For example, not long ago we saw the test suite timing out after 3 hours. Maybe a limit of 1 hour is enough?

I think this has been changed. To check in the Coq repo (but in any case, outside of scope here).

Do we currently detect the number of cores on the runner and configure NJOBS to match?

No, but I think we also control what our runners are and the question of whether the current value of NJOBS was appropriate was asked and answered. Anyway, also out of scope for here.

@SkySkimmer
Copy link
Contributor

I think this has been changed. To check in the Coq repo (but in any case, outside of scope here).

coq/coq#16669

@ejgallego
Copy link
Member

Detect whether a build job is only updating documentation and limit the CI jobs accordingly

If we could cache our builds that's solved by itself, also a bit the first point, tho in general we will have a very low hit rate for that kind of fixes. See also coq/coq#16201

Dune will detect the number of cores available, but in some runners setup that's dangerous if such runner drives parallel jobs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants