Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skipping a matrix job hangs the Pull Request when inner-matrix jobs are required status checks #952

Closed
minut1bc opened this issue Jan 30, 2021 · 9 comments
Labels
enhancement New feature or request

Comments

@minut1bc
Copy link

minut1bc commented Jan 30, 2021

Describe the bug
Jobs with a build matrix configuration can be skipped from the top level, before expanding the matrix. This is problematic when the inner-matrix jobs are set as required status checks for a branch protection rule, resulting in the Pull Request hanging forever.

The current dirty fix for this is copying the top level if conditional, for each and every step inside the job, which results in a lot of duplicated code, as well as still having to create and run each inner-matrix job, even though every step is skipped. Additionally, this has the downside of setting the job result to success, instead of skipped. This may be unwanted behaviour in cases where another job would depend on this one, and on it's result being skipped for some conditions.

It would be possible to instead only require the whole matrix as the single status check, but this is not ideal in cases where only some of the inner-matrix jobs are actually a requirement. Again, splitting off into two jobs, with required checks and optional checks, would present a lot of duplicated code, as well as extra internal work for creating the additional top level job.

It would be a lot more efficient to internally expand the job matrix, without running all the jobs, and instantly skipping them in such cases.

The Pull Request that demonstrates this behaviour can be found here, with the workflow code here.

To Reproduce
Steps to reproduce the behavior:

  1. Create a job, with conditional if: false, and a build matrix strategy.
  2. Create a branch protection rule for main, and add one of the inner matrix jobs as a required status check
  3. Branch off main, push a commit, and create a Pull Request from the new branch into main.
  4. The workflow is now hanging, as the top level job was skipped, and it is waiting for the inner-matrix job.

Expected behavior
The workflow would realise that the whole matrix job was skipped, and thus all it's inner-matrix jobs are implicitly skipped as well, and the Pull Request would not hang.

Runner Version and Platform

GitHub-hosted runner, on ubuntu-latest.

What's not working?

The Pull Request is hanging, as the job was skipped.

@minut1bc minut1bc added the bug Something isn't working label Jan 30, 2021
billyvg added a commit to getsentry/sentry that referenced this issue Mar 3, 2021
This reverts commit d8c003f.

This doesn't work because of a bug in GH runners: actions/runner#952

Because the backend test has 2 matrix instances, when the job is
skipped, it does not create the needed Checks to satisfy our branch
protection requirements
billyvg added a commit to getsentry/sentry that referenced this issue Mar 3, 2021
…" (#24214)

This reverts commit d8c003f.

This doesn't work because of a bug in GH runners: actions/runner#952

Because the backend test has 2 matrix instances, when the job is
skipped, it does not create the needed Checks to satisfy our branch
protection requirements
@hross hross added the papercut label Mar 30, 2021
@hross
Copy link
Contributor

hross commented Mar 30, 2021

I don't think we would implement this behavior because we would not want to implicitly make a decision about what to do in the "deadlock" case (is it really skipped?) and would want to make sure there aren't cases where you could maliciously cause deadlocks to skip checks.

TLDR this is probably not a high enough priority feature with enough edge cases that it will not make the cut in our current implementation.

@hross hross closed this as completed Mar 30, 2021
@j-mie6
Copy link

j-mie6 commented Mar 30, 2021

(is it really skipped?)

@hross It seems to me that if the entire job was skipped then yes, each job in the matrix has been skipped transitively. The behaviour here isn't an implicit decision based on resolving any deadlock, but specifically those that are caused by the parent job being skipped. Simply put, the matrix was never expanded, and so the jobs were never even created - but we now wait for these non-existent jobs forever. That is, to me, really not the desired behaviour you'd expect and the fix is both computationally wasteful (spinning up extra jobs that are just going to be torn back down again immediately) and a little jarring.

@j-mie6
Copy link

j-mie6 commented Mar 30, 2021

It's actually even worse than a little jarring. I've got the quickfix implemented on my CI, so that when we skip a job it will expand out the matrix and then skip each of the steps in turn. The result is incredibly misleading:

image

Each of those jobs on the left were meant to have been skipped. But the matrix has to be expanded first because I am blocking on their completion. However, while it skips the sub-steps for each expanded matrix job, it is marking them as completed: this is incredibly misleading and has caused me to overlook some very nasty pipeline failures!

@billyvg
Copy link

billyvg commented Mar 30, 2021

@j-mie6 We're facing the same issues where we'd like to skip certain test suites based on the type of file that was changed (front vs backend), but in the PR it just looks like all of the suites were run and passed. This also ends up throwing off our CI metrics that we collect so now we have to assume that jobs under a certain threshold are "skipped".

I'd be ok if we had a way to short-circuit a workflow with a certain status (in this case, skipped).

@hross hross reopened this Mar 31, 2021
@hross
Copy link
Contributor

hross commented Mar 31, 2021

(reopening this based on the above feedback)

@hross hross added enhancement New feature or request and removed papercut bug Something isn't working labels Mar 31, 2021
@macobo
Copy link

macobo commented Nov 4, 2021

👍 for solving this issue. It comes up if you have:

  1. A large test suite requiring parallelization or need matrix testing for other reasons
  2. A monorepo where you only want to run tests that have changed.
  3. Want to use automerge to merge when "required" tests pass

This use-case is currently broken due to this bug.

@ethomson
Copy link
Contributor

Hi! I agree that this is a pain point for sure. However, suggestions for GitHub Actions functionality belong in the GitHub feedback forum, where the product managers and engineers will be able to review them. This repository is really only focused on the runner application, which is a very small component of GitHub Actions. If you could open this issue there, that would be very helpful for us! 😄

In the meantime, I'm closing this issue here. Thanks again!

@billyvg
Copy link

billyvg commented Dec 20, 2021

Created the suggestion post here community/community#9141

@ethomson
Copy link
Contributor

Thanks @billyvg !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants