Add ability to re-run single jobs #432
@chrispat from the product team for this feedback. 😄
Any updates?
Would love to see this feature.
Any updates on this? In a parallelized workflow, we always spend 100 build minutes, even if only one of the 1-minute jobs fails.
💯 This is a much-needed addition. I maintain Beekeeper Studio, and random timeouts cause 1 in 5 jobs to fail fairly regularly. Being able to re-run only the jobs that failed would save us so much time. I would also love to be able to retry a single step, and to have the other jobs not abort when a single job fails.
This is a crucial feature for anyone doing multi-arch package builds and deployments with tooling that cannot build multiple architectures in parallel. Without this, much more complicated workflows are required to ensure packages don't get deployed multiple times just because one of the jobs failed.
This is a much-needed feature for my team, which is building a CI pipeline for building, testing, and deploying multiple packages for multiple platforms. Not having this feature seems very wasteful in both time and electrons.
Have to chime in. My matrix generates 61 jobs, and one of them usually fails because of where it collects data from. The next time I re-run all 61 jobs, another one fails...
Our team is currently exploding one workflow into many jobs. This is a terrible hack to get retries to work, because you can no longer make much sense of the status of a branch/commit (the Actions UI does not do any grouping at that point).
It's been a year. @TingluoHuang, is there any status update on this? Has it at least been considered?
Is there any update on this? We have migrated from GitLab CI to GitHub Actions and I am already regretting it. We have a pipeline that deploys infrastructure and takes around 1 hour; if something fails, retrying the whole pipeline is a big loss.
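I'm honestly quite disappointed that this doesn't seem to have been prioritized, and there has been no response as to when/if it can be expected (just that it was "definitely on the backlog" back in 2019). In order to provide some substance, and not just spam everyone with complaints and a +1, here's a summary of what I found with respect to workarounds: https://github.community/t/re-run-jobs/16145/11
Unfortunately, this is a bit clunky to use, as (at least last I checked) neither includes nor YAML anchors are supported in workflows, so code reuse and maintainability across projects will be a pain. Also, I don't really understand how I could make rules like "deploy to staging once all builds pass". The matrix feature is also really nice, and I'd hate to lose it.
There is another interesting hack, which stores the last run status in the cache and then skips the steps of jobs that already passed. How they used it:
```yaml
# Write a default status file. `cat`-ing it later emits a workflow command that sets
# the step output (`::set-output` has since been deprecated in favour of $GITHUB_OUTPUT).
- name: Set default run status
  run: echo "::set-output name=last_run_status::default" > last_run_status

# Restore the file saved by an earlier attempt of this same run (re-runs keep github.run_id).
# steps.date is not shown in the excerpt; it is assumed to output a timestamp so each
# attempt saves a fresh cache entry (existing cache keys cannot be overwritten).
- name: Restore last run status
  id: last_run
  uses: actions/cache@v2
  with:
    path: |
      last_run_status
    key: ${{ github.run_id }}-${{ matrix.os }}-${{ matrix.node-version }}-${{ matrix.webpack }}-${{ steps.date.outputs.date }}
    restore-keys: |
      ${{ github.run_id }}-${{ matrix.os }}-${{ matrix.node-version }}-${{ matrix.webpack }}-

- name: Set last run status
  id: last_run_status
  run: cat last_run_status

- name: Checkout ref
  uses: actions/checkout@v2
  with:
    ref: ${{ github.event.workflow_dispatch.ref }}

# Every expensive step is guarded like this, so a job that already passed becomes a no-op.
- name: Use Node.js ${{ matrix.node-version }}
  if: steps.last_run_status.outputs.last_run_status != 'success'
  uses: actions/setup-node@v1
  with:
    node-version: ${{ matrix.node-version }}

# ... build/test steps guarded by the same `if:` ...

# Not part of the excerpt: a final step must record success, or nothing is ever skipped.
- name: Save run status
  if: steps.last_run_status.outputs.last_run_status != 'success'
  run: echo "::set-output name=last_run_status::success" > last_run_status
```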
+1 we would be grateful for this feature.
+1 We are evaluating different CI providers right now (after potentially migrating from Travis), and I'm sure this is a really important feature for many other developers as well.
+1 we need this feature.
Looking for this feature too; otherwise, we have to split the workflow into multiple ones.
+1 please save the planet!
To avoid spamming everyone with notifications, please use GitHub's reaction buttons instead of commenting "+1 we want this". Thanks 😃
Did you know that *.visualstudio.com can do this in DevOps?
Hope to see it this quarter 🙏
Hello everyone! I strongly agree that this is something we need, and in fact it is something we're working on. However, this is part of Actions itself, not part of the runner application (meaning: the software in this repository). To keep things tidy for the runner team, the developers who are working on this application, I'm going to close this issue so that it stays off their bug list. This is being tracked in our feedback repository, which is where you can request features in GitHub Actions. Thanks for all the feedback, everyone, and I hope to see you in our feedback repo.
NOTE: Avoiding rerunning jobs that have already passed is about more than just saving compute cycles. It is also critical for avoiding the cumulative probability of failure across the different jobs, which can significantly increase the number of testing iterations needed to get all jobs passing. For example, if you have a GitHub Actions setup with seven independent jobs that run to test a PR, and there is a 20% chance of a random failure in any one of the seven PR builds, then the chance of at least one of the PR builds failing jumps to 1 - (1 - 0.2)^7 ≈ 0.79, or 79%! And if any job fails, you have to rerun all of the jobs, and the probability of failure the next time is still 79%, and so on. The result is that it can take many PR testing iterations to get all of the jobs to pass. This occurs relatively frequently, for example, in the Trilinos PR testing system (which currently uses a custom PR testing system that also lacks the ability to rerun individual jobs, and where each job has a non-trivial random probability of failure). What this means is that if you can't rerun single jobs that fail, then you just can't effectively scale to a large number of testing jobs. As another example, if you have 100 GitHub Actions jobs, each with just a 1% chance of failure (which is about how often merely fetching dependencies fails in a GitHub Actions job), then the cumulative probability of failure across these 100 jobs is 1 - (1 - 0.01)^100 ≈ 0.63, or 63%! But if you can rerun individual jobs, the number of jobs that still need to pass after that first run (with its 63% cumulative probability of failure) goes way down, and getting a full set of passing jobs becomes much more probable. If just a single job failed in the first run of all the GHA jobs (due to a random failure), then rerunning that one job has just a 1% chance of failing, or a 99% chance of passing.
That reduces wasted computing resources and shortens the wall-clock time of the testing cycle. This is a big deal for projects that need many testing jobs and have a higher probability of failure in any individual job.
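The arithmetic above can be checked with a short script. This is a minimal sketch that assumes, as the comment does, that every job fails independently with the same probability `p` on every attempt; the function names are hypothetical. It compares full reruns against retrying only the failed jobs, both in rounds and in total job executions.

```python
"""Flaky CI jobs: rerun everything vs. rerun only the failed jobs.

Assumption (as in the comment above): each of n jobs fails independently
with the same probability p on every attempt. Real CI failures are often
correlated, so treat the numbers as a rough model.
"""


def prob_any_failure(n_jobs: int, p_fail: float) -> float:
    """Probability that at least one of n jobs fails: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p_fail) ** n_jobs


def expected_full_runs(n_jobs: int, p_fail: float) -> float:
    """Expected full runs until all jobs pass together.

    A full run succeeds with probability q = (1 - p)^n, so the number of
    runs is geometric with mean 1 / q.
    """
    return 1.0 / (1.0 - p_fail) ** n_jobs


def expected_rounds_failed_only(n_jobs: int, p_fail: float, tol: float = 1e-12) -> float:
    """Expected rounds when only failed jobs are retried.

    Each job needs G attempts with P(G > k) = p^k; the number of rounds is
    R = max(G_1, ..., G_n), so E[R] = sum_{k>=0} (1 - (1 - p^k)^n).
    """
    total, k = 0.0, 0
    while True:
        term = 1.0 - (1.0 - p_fail ** k) ** n_jobs
        total += term
        if term < tol:  # terms shrink monotonically as k grows
            return total
        k += 1


def expected_job_executions(n_jobs: int, p_fail: float, failed_only: bool) -> float:
    """Expected total job executions until every job has passed once."""
    if failed_only:
        # Each job is retried until it passes: mean 1 / (1 - p) attempts per job.
        return n_jobs / (1.0 - p_fail)
    # Every attempt re-executes all n jobs.
    return n_jobs * expected_full_runs(n_jobs, p_fail)


if __name__ == "__main__":
    for n, p in [(7, 0.2), (100, 0.01)]:
        print(
            f"n={n}, p={p}: P(any failure)={prob_any_failure(n, p):.2f}, "
            f"full runs={expected_full_runs(n, p):.2f}, "
            f"rounds (failed only)={expected_rounds_failed_only(n, p):.2f}, "
            f"job executions: all={expected_job_executions(n, p, False):.1f}, "
            f"failed only={expected_job_executions(n, p, True):.1f}"
        )
```

Under this model, the 7-job, 20% case needs about 4.8 full runs (about 33 job executions) on average when everything is rerun, versus 8.75 job executions spread over about 2.1 rounds when only failed jobs are retried.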
Seems like it is live now and we can rerun single jobs. I'm really grateful to the devs for implementing this 🙏 🎉 And now a tiny rant 😅 It's kinda broken when using a job matrix and Cypress parallel tests... Here is how it worked and why it is not working well with the failed-job rerun feature
Why does it not work when rerunning only one matrix job? Because the unique ID is the same, so Cypress considers this run finished and does not run the tests again. I did not find a way to force running them again with the same ID. What can be done about this is:
The first approach is slow since everything needs to be rerun. The second approach seems best at first because we could use … So, to sum up: maybe we could add some flag to mark the whole matrix as failed when one of the jobs inside the matrix fails? Then, when we rerun all failed jobs, it would rerun the whole matrix again. What do you think? Thanks
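One commonly used workaround (an assumption here, not necessarily one of the approaches the commenter listed) is to derive Cypress's `--ci-build-id` from both the run ID and the run attempt, which GitHub exposes as the `GITHUB_RUN_ID` and `GITHUB_RUN_ATTEMPT` environment variables. The helper below is hypothetical; it only shows that the ID then changes on every re-run.

```python
import os
from typing import Mapping, Optional


def cypress_ci_build_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Build an ID that is unique per workflow run *attempt*.

    GITHUB_RUN_ID stays the same when a run is re-run, so using it alone makes
    Cypress treat the re-run as the already-finished parallel run.
    GITHUB_RUN_ATTEMPT starts at 1 and increments on each re-run.
    """
    env = os.environ if env is None else env
    run_id = env["GITHUB_RUN_ID"]
    # Fall back to "1" if the variable is absent (assumption: e.g. older runners).
    attempt = env.get("GITHUB_RUN_ATTEMPT", "1")
    return f"{run_id}-{attempt}"


if __name__ == "__main__":
    first = cypress_ci_build_id({"GITHUB_RUN_ID": "123", "GITHUB_RUN_ATTEMPT": "1"})
    rerun = cypress_ci_build_id({"GITHUB_RUN_ID": "123", "GITHUB_RUN_ATTEMPT": "2"})
    print(first, rerun)  # the two IDs differ, so Cypress sees a new run
```

The ID would be passed as `cypress run --record --parallel --ci-build-id <id>`. The trade-off is that re-running a single matrix job then forms a new parallel group on its own, so that one machine is likely to end up running the whole spec list.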
I saw the release announcement for supporting re-running single jobs. Is this being released in phases or something? The GitHub Enterprise repos I'm working on still do not have any ability to re-run individual jobs. I thought maybe it just wouldn't work with old runs, so I kicked off new ones, and still nothing: just the "Re-run all jobs" option.
It is currently available on GitHub.com only and is slated to ship in the next update to GitHub Enterprise. In addition, there are still some issues related to reusable workflows that we are ironing out.
This is amazing work. I'm not seeing the option to re-run failed jobs for reusable workflows. I wasn't sure whether it's because the call to the reusable workflow we're using depends on another job. EDIT: For more context: the first job reads configuration and then passes the config to the reusable workflow call, which starts several jobs in a matrix.
We have temporarily disabled the feature for any run that references a reusable workflow while we iron out the issues. We hope to have those resolved towards the end of this week or early next week.
@debugger24 This is only my guess, but this is probably by design. Rerunning any job creates a whole new run attempt for the whole workflow. All jobs that do not depend on the job you want to rerun are "cloned" into the new run attempt. But to clone a job, you need its result first, so you need to wait for all jobs to finish.
Found another unexpected behaviour:
This is a problem for us: we have a manual check called "Rubocop" (submitted manually using reviewdog) that is required for a pull request to be merged. If we retry the workflow, all jobs pass, but the manual check is missing, so the PR can't be merged. Screenshots of such a case (1st with failed jobs, then 2nd re-run, but without the manual Rubocop status check)
Do you have any public issue opened for this case? I'd like to track the progress of this issue.
This is now working, thanks!
It is not yet, @madhavajay, so I created a feedback discussion to suggest that: https://github.com/orgs/community/discussions/73156 Leave an upvote or reaction over there! (Ideally, so should the 43 other people who upvoted the previous comment 😆)
@abhilash1in Are you sure that all jobs inside the workflow are finished? If they are not done yet, there will be no option to rerun. GitHub requires the full workflow to finish before it can be rerun.
Erm, okay, not all jobs had finished running when I was looking for the "Re-run failed jobs" button. But also, that doesn't make sense. If I see failed jobs, I should be able to re-run them individually without having to wait for all the jobs to finish.
Yeah, it would be nice to be able to do this. However, I think GitHub needs to store the state of the whole workflow run before you can rerun parts of it. Some jobs depend on the results of other jobs (even if those jobs failed). They probably wait for all jobs to be executed (skipped, failed, cancelled, or successful), store the workflow run state somewhere, and only then know which job states to "copy" from the previous run and which jobs to rerun.
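The mechanism guessed at in the last few comments can be sketched as a small graph problem. This is a hypothetical model, not GitHub's implementation: `needs` maps each job to the jobs it depends on (mirroring the workflow `needs:` key), planning refuses to start while any job is still running, failed or cancelled jobs are selected together with everything downstream of them, and the remaining results are copied into the new attempt.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Statuses treated as "finished" in this sketch.
TERMINAL = {"success", "failure", "cancelled", "skipped"}
# Statuses that trigger a rerun (assumption: cancelled jobs are retried too).
NEEDS_RERUN = {"failure", "cancelled"}


def plan_rerun(
    needs: Dict[str, List[str]], results: Dict[str, str]
) -> Tuple[Set[str], Dict[str, str]]:
    """Return (jobs to rerun, results copied from the previous attempt)."""
    running = sorted(job for job, status in results.items() if status not in TERMINAL)
    if running:
        # Without every result we cannot tell which jobs are safe to copy.
        raise RuntimeError(f"workflow run still in progress: {running}")

    # Reverse the dependency edges: job -> jobs that list it under `needs`.
    dependents: Dict[str, List[str]] = {}
    for job, deps in needs.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(job)

    # Breadth-first walk from every failed job to all of its transitive dependents.
    queue = deque(job for job, status in results.items() if status in NEEDS_RERUN)
    rerun = set(queue)
    while queue:
        job = queue.popleft()
        for child in dependents.get(job, []):
            if child not in rerun:
                rerun.add(child)
                queue.append(child)

    copied = {job: status for job, status in results.items() if job not in rerun}
    return rerun, copied


if __name__ == "__main__":
    needs = {"build": [], "lint": [], "test": ["build"], "deploy": ["test", "lint"]}
    results = {"build": "success", "lint": "success", "test": "failure", "deploy": "skipped"}
    rerun, copied = plan_rerun(needs, results)
    print(sorted(rerun), copied)  # ['deploy', 'test'] {'build': 'success', 'lint': 'success'}
```

In this sketch, a downstream job that was skipped only because its dependency failed is picked up by the graph walk, while a job skipped by its own `if:` condition, with no failed ancestor, is copied as-is.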
In Azure, I have to wait for the jobs to finish before I can rerun them...
Please add the ability to re-run single jobs of a workflow. This is such a basic feature.
Please keep the environment in mind while prioritizing features.
🙏