Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect commit is being used in PR builds #36

Closed
briantist opened this issue Apr 24, 2022 · 6 comments · Fixed by #38
Closed

Incorrect commit is being used in PR builds #36

briantist opened this issue Apr 24, 2022 · 6 comments · Fixed by #38
Labels
bug Something isn't working

Comments

@briantist
Copy link
Collaborator

Originally posted here:

Re: using the merge SHA:

I am beginning to worry that there may be a problem with this.. I noticed in a previous PR, and now in a new one, that the docs build is always 1 push behind, somehow.

In this PR: ansible-collections/community.hashi_vault#223

I made my second push, which contained both of these commits (in this order):

The docs build ran and failed: https://github.com/ansible-collections/community.hashi_vault/runs/5401472234?check_suite_focus=true

The HEAD checkout it did was using the merge_commit_sha, 0a8a68704a308eba6eac1decd31ac031b7593049

The build failed because the lookup plugin contained a reference to the module; this problem existed in my first push. It was fixed in the second, but is building as though it wasn't.

If I look directly at that commit, it looks like my first push, wherein the module was missing:
ansible-collections/community.hashi_vault@0a8a687

I don't have time to dig into it further now, but it looks a lot like this:
actions/checkout#237

I had been putting this off a bit due to this resuable workflow issue, in part because I saw the issues around the same time and couldn't be certain if they were related or not, and in part because even if they were unrelated, the issue made debugging and troubleshooting difficult.

With that issue solved, I am still seeing this happen.

Symptoms

This is easiest seen when docs are being published. The latest push will trigger the build and the docs built will be the contents of your previous push. I say "push" because a push containing multiple commits does not seem to build the docs of the second-to-last commit, but rather the last commit of the last push.

In a repo where the docs are not published the easiest way to see this would be to add errors egregious enough that the build will fail.

Triggers

I do not know yet all of the conditions under which this does or does not occur:

  • is merge SHA field itself flawed?
    • it's not readily clear what it is supposed to represent, and my original tests showed it having the same content as the merge ref when both were available
  • is it related to reusable workflows somehow?
    • my original tests were in direct workflows
    • could test that again (to be sure) and also test the equivalent with reusable workflows
  • is it related to pull_request vs pull_request_target workdlows?
    • I don't remember what my original test used as a trigger, will need to confirm
    • could be a confounding factor with reusable workflows
  • is it a regression of on pull request, checkout@v2 built some ref other than the one that was claimed actions/checkout#237 ?

Resolution

Ideally we can more clearly identify the triggers, and if it's a bug on GitHub's side we can report it.. past experience shows that niche issues like this will take a long time to address, if we can even get them to.

We may want to change the precedence of when we use the merge ref vs the merge SHA field.

For now we start with the SHA because it's easier to know if it's available or not, to then fall back to the merge commit ref.

Since we have nowhere to retrieve the merge ref and instead we build it, we need to either try the checkout and ignore failure, or do additional git operations and/or API calls to determine its validity if we want to start with that and "fall back" to merge SHA.

briantist added a commit to briantist/community.hashi_vault that referenced this issue Apr 24, 2022
@briantist
Copy link
Collaborator Author

@felixfontein if you have a chance to look into whether this is happening in community.docker that would be great 🙏.
Any ideas are appreciated too.

@felixfontein
Copy link
Collaborator

I'm seeing the same behavior in community.docker: ansible-collections/community.docker#340 After the initial commit I pushed two times, and both times the previous version got published (as you described).

@briantist
Copy link
Collaborator Author

Thanks a lot @felixfontein , I was following along and saw the same behavior as well. I can do some more tests in my GHA test repo that try to compare reusable vs. not reusable, pull_request vs. pull_request_target etc.

@felixfontein
Copy link
Collaborator

Maybe #37 can help to get some more information on what is actually built.

@briantist
Copy link
Collaborator Author

I think I have figured this out:

Via this PR and these workflows:

I have confirmed that:

  • the issue is not related to reusable workflows
  • the issue affects both the pull_request and pull_request_target events equally

It seems that the merge_commit_sha is a deeply confusing member, so much so that it was deprecated, and then un-deprecated: https://developer.github.com/changes/2013-04-25-deprecating-merge-commit-sha/

This page has some detailed explanation of what this value is: https://docs.github.com/en/rest/pulls/pulls#get-a-pull-request

It's related to the "mergeability" check that you see that tells you if a PR can be merged cleanly. It turns out this check happens asynchronously in a background job.

As described in the above link, the mergeable field will then be updated to true or false based on the result of that check, and if it's not done, the value of mergeable is null.

Like this: https://github.com/briantist/gha-junk/runs/6148748842?check_suite_focus=true#step:3:1886

We know for a fact that jobs being queued still exhibit this problem (in my repo they often have to wait a long time until the rest of CI runs) so I can only guess that the event data that gets sent to a workflow event is gathered rather immediately, when the job is queued, and not when the job runs; in fact the initial event data is probably what triggers both the workflow and anything else (like this mergeability background job), and so it's likely that the mergeable field is always null in these jobs, and that subsequently, the commit is always wrong, or perhaps behind (still containing the last value).

This also (partially kinda sorta) explains why this value is missing on the first event, when the PR is first opened (the mergeable field is completely missing too).


What can we do

  • we could do some variation of using the merge ref; as I mentioned that's not as straightforward as checking this sha value, but we could do it
    • we know we can't use the merge ref on closed events, so we still want to use the SHA as a fallback, and we would prefer it to be correct; but perhaps it is correct on a closed event, since there's no need for a new check
  • we could make API calls to wait until the mergeability check is finished, and pull down an updated SHA
    • I think this would be quick, but I don't know for certain

@briantist
Copy link
Collaborator Author

I'm also going back over the original PR/issue to ensure we don't lose sight of the reasons for not using the PR's HEAD in the first place:

@briantist briantist added the bug Something isn't working label Apr 24, 2022
briantist added a commit to ansible-collections/community.hashi_vault that referenced this issue Apr 29, 2022
* add vault_kv1_get module

* add vault_kv2_get module

* kv1 sanity and missing condition

* kv2 sanity, sample, missing condition

* add kv1 response fixture

* add kv2 response fixture

* add units for kv1_get

* kv1 more thorough tests

* add units for kv2_get

* fix seealso references

* distinguish kv paths from api paths in integration

* kv2_get units - group(1) should always exist

* add integration tests for kv1_get

* add vault_ci_kv2_destroy_all test plugin

* set up versioned secrets in initial configuration

* add kv2_get integration tests

* add codecov flags for modules

* style nit

* add codecov flags for kv_get lookups

* add lookup vault_kv1_get

* add units for lookup vault_kv1_get

* fix kv1_get lookup units

* really fix it this time

* assertion niceties

* add vault_kv2_get lookup

* add units for kv2_get lookup

* add integration for kv1_get lookup

* add integration for kv2_get lookup

* add check mode coverage to module integrations

* docsite updates

* update seealso references

* add examples

* ansible-community/github-docs-build#36

* update action group
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants