Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide guidance for workflows broken by @4? #472

Open
nedbat opened this issue Dec 15, 2023 · 84 comments
Open

Provide guidance for workflows broken by @4? #472

nedbat opened this issue Dec 15, 2023 · 84 comments
Labels
documentation Improvements or additions to documentation

Comments

@nedbat
Copy link

nedbat commented Dec 15, 2023

What files would you like to change?

I can see that @4 brought improvements in speed and reliability, that's great. But it also broke a number of common patterns in workflows. Can you provide some guidance about ways to achieve the same outcomes with the new version? This might involved new features in actions/download-artifact.

For example:

What are your suggested changes?

I'm looking for help, I don't have solutions yet myself.

@nedbat nedbat added the documentation Improvements or additions to documentation label Dec 15, 2023
@henryiii
Copy link

henryiii commented Dec 15, 2023

Personally, I think two changes would really help: If there was some way for download-artifact to extract multiple artifacts into a single directory, and if there was some way to use a glob to express the artifacts to download. They could even be together, such as specifying name: coverage-* as the name extracting all matching files into one directory, like setting one name does. Or it could be a toggle, like merge: true/false/auto.

FYI, this affects basically every cibuildwheel user (there are over a thousand), as it was the way you'd merge binaries on different platforms before uploading to PyPI. It also affects the recommendations from the PyPI and Scientific-Python Development Guidelines, etc. It also affects publishing websites that are built in multiple steps (like building WebAssembly components).

FYI, people are merging the v4 updates without realizing the breakage, as Dependabot is making them separately1, so if they appear broken, they still go with it assuming it's the v3/v4 incompatibility, and if it's only used for the release step (common with cibuildwheel workflows), then they won't even know it's broken until they try to release. I'm currently having to rush a release of cmake for Python to fix a segfault and someone already merged v4 updates, so that pipeline is broken. It would be great if this breaking change could be listed in the dependabot update directly, rather than behind a link, but I assume that's too late.

Happy to update as much as I can to whatever the recommendation is. My current plan (in pypa/cibuildwheel#1699 & scientific-python/cookie#335) is:

    - uses: actions/upload-artifact@v4
      with:
        name: Wheel-${{ matrix.<whatever> }}
...

    - uses: actions/download-artifact@v4
      with:
        path: all

    - name: Merge files
      run: |
        mkdir dist
        mv all/Wheel-*/* .

But besides the extra shell step, will also download everything, even if there's stuff like coverage files that are not needed for that job.

Footnotes

  1. I'm going to start recommending the grouped dependabot updates feature that was recently released soon! That would have helped here.

@webknjaz
Copy link

webknjaz commented Dec 15, 2023

Even in the context wider than just uploading to PyPI, when I build a bunch of wheels on different platforms, there should be a way to collect them all for testing (examples: https://github.com/aio-libs/yarl/blob/88900a1/.github/workflows/ci-cd.yml#L208-L265 / https://github.com/aio-libs/frozenlist/blob/b5ffcd7/.github/workflows/ci-cd.yml#L207-L264)

@webknjaz
Copy link

webknjaz commented Dec 15, 2023

This also breaks the use of composite actions that themselves make use of either upload-artifact or download-artifact:

I'm curious if the change is an attempt to reduce occurrences of the race condition I reported a while back: actions/toolkit#1235 / actions/download-artifact#140 (comment).

@woodruffw
Copy link

As an additional datapoint here: GitHub's own docs for publishing to PyPI via OIDC encourage the "upload-then-download" pattern, which (I believe) will be broken by these action changes when used with multiple uploaders (which is common in Python, for builds on different hosts or platforms).

(This has the unfortunate potential to break a large percentage of Python publishing workflows.)

Ref: https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-pypi

@per1234
Copy link

per1234 commented Dec 15, 2023

If there was some way for download-artifact to extract multiple artifacts into a single directory

From my experiments while trying to find a way to adapt the hundreds of workflows I maintain that rely on the ability to upload multiple times from matrix jobs to the same artifact for transfer to a dependent job that consumes the content of the artifact, I find the actions/download-artifact@v4 action already has that capability.

For example, a workflow job with these steps will succeed and show that the files from the foo and bar artifacts are present under the baz folder as expected.

- uses: actions/download-artifact@v4
  with:
    name: foo
    path: baz

- uses: actions/download-artifact@v4
  with:
    name: bar
    path: baz

- run: ls baz

If you found some conditions where the action is not able to do this, please do submit a report to the issue tracker for the actions/download-artifact as I agree that it will be a common use pattern for those of us who are migrating from using a single artifact to multiple artifacts for transferring files from one job to another.

if there was some way to use a glob to express the artifacts to download

I agree. This request is already tracked at actions/download-artifact#6

@henryiii
Copy link

henryiii commented Dec 15, 2023

The problem with repeating the action over and over is that for a normal matrix for coverage or wheel builds, there might now be dozens of artifacts (one per Python version × OS × arch × PyPy/CPython, for example). Also, if one or more jobs fail, you still might want to upload coverage, which means you'd also need to do something like always(). Last I checked I think cibuildwheel could produce about 70+ different wheels if you targeted everything it can handle1. You can also dynamically generate the matrix, see https://iscinumpy.dev/post/cibuildwheel-2-10-0/#only, which is how MyPyC does it. A normal coverage set is about 20 or so (5 Python versions + 3 PyPy versions × 3 OSs).

Footnotes

  1. I'm absolutely describing "worst case" - often (and for the default cibuildwheel examples), it's just 3 OSs, with all the wheels being built in one job per OS. Plus an SDist job. So 4 artifacts, which isn't terrible listed out.

@webknjaz
Copy link

(my attempt to summarize the important pain point contexts)

@per1234 your example features a linear workflow with a known static artifact names, but this issue refers to a different practice of CI configuration which involves matrixes.

Note all other people who commented here, combined maintain hundreds of workflows too. What unites us is that we all come from the Python ecosystem and rely on the ability to upload a number of artifacts from jobs in matrixes of varying sizes and different factor naming strategies. We essentially need a way to combine many files coming from different jobs (not necessarily one file per job, though — but usually, all uploaded in one go) in a "synchronization point" — a job that waits for that matrix and does something with those files.

Two major contexts in which this is happening are:

  1. Publishing a bunch of Python distribution packages to PyPI — this has to happen together for all files, once we know all the previous jobs are complete.
  2. Combining coverage data produced in separate jobs into one, for determining if the overall metrics meet the configured threshold.

With different artifact names, we'd need to loop over the matrix artifacts somehow to get all those files together. And GitHub Actions workflows don't have syntax for such loops. Moreover, this means every project reinventing a wheel of naming the workflows differently and figuring out how to extract only matching ones into that sync job.

@ofek
Copy link

ofek commented Dec 17, 2023

I don't have time to experiment with upgrading this action this weekend but I plan to manually download like this https://github.com/WordPress/openverse/blob/00a26a65e93fe8ea2785d4a26c3ebce2efc935db/.github/actions/load-img/action.yml#L26-L35

However, rather than an enumeration I will perform pattern matching after I figure out the APIs.

@webknjaz
Copy link

I don't have time to experiment with upgrading this action this weekend but I plan to manually download like this https://github.com/WordPress/openverse/blob/00a26a65e93fe8ea2785d4a26c3ebce2efc935db/.github/actions/load-img/action.yml#L26-L35

However, rather than an enumeration I will perform pattern matching after I figure out the APIs.

I also have some hacking ideas with random names, pattern matching and CLI. Working on a composite action based experiment...

@TWiStErRob
Copy link

There are mentions of separate upgrades of v3/v4 here breaking and I've had problems in many of my repositories. If anyone is using Renovate and has incompatibility problems, this rule can fix the problem easily (I added it to my user level Renovate config to affect all repos at once.):

"packageRules": [
	{
		"groupName": "actions/upload-artifact and actions/download-artifact",
		"description": "Artifact management is coupled in GitHub Actions. https://github.blog/changelog/2023-12-14-github-actions-artifacts-v4-is-now-generally-available/",
		"matchManagers": ["github-actions"],
		"matchPackageNames": [
			"actions/upload-artifact",
			"actions/download-artifact"
		],
		"matchUpdateTypes": ["major"]
	}
]

Note: this above is really explicit, it could work without matching updateTypes and managers too, if you want to generalize.

@per1234
Copy link

per1234 commented Dec 17, 2023

your example features a linear workflow with a known static artifact names

@webknjaz as indicated by the quote in that comment, my example is specifically in response to @henryiii's comment implying that the actions/download-artifact is unable to download multiple artifacts to the same path:

If there was some way for download-artifact to extract multiple artifacts into a single directory

The sole purpose of my example was to supplement my comment that I have found that to be untrue. The actions/download-artifact is able to download multiple artifacts to the same path.

You seem to somehow be interpreting my comment as being intended to dispute the impact of the change of only allowing one upload per artifact. It was not.

Note all other people who commented here, combined maintain hundreds of workflows too. What unites us is that we all come from the Python ecosystem

That is false. I also maintain hundreds of workflows that were broken by this change and these workflows have nothing to do with Python. It is counterproductive to imply that this impact is specific to Python. The impact is much more widespread than that.

@webknjaz
Copy link

@per1234 I apologize! I meant to augment the context, not dismiss your input. I did indeed misinterpret your comment, though, and felt like it turns the discussion away from the most problematic part of the breaking change. I didn't mean to imply that this problem is only present in our ecosystem, just wanted to give the context of who all those other people besides you.

@webknjaz
Copy link

webknjaz commented Dec 18, 2023

@ofek @henryiii @hynek @woodruffw so this is the UX I'm shooting for: https://github.com/webknjaz/re-actors--download-artifact--test/actions/runs/7242013168/workflow#L38-L42. Feel free to lurk and experiment, but don't rely on the Git identifiers — I'll make my usual versioning setup with immutable tags and rolling branches and will probably force-push the current thing a couple of times later today, before declaring it ready. It gets artifacts matching the wildcard into a temporary directory and then moves the underlying dirs into the user-requested path.

I'll also rename the action repo — originally, I thought of it as a wrapper for the actions/download-artifact action but I ended up not calling it and calling GH CLI directly instead. Keeping it as is would be misleading. Still, it'll keep using the original action's inputs list (with the same names and defaults). With that in mind, I was thinking of something like "fuse" / "meld" / "merge" / "combine" / "squash" and/or maybe "group" in the name. E.g. re-actors/meld-artifact-group.
Accepting the naming ideas until I'm finished with the polishing. Also, need a name for the marketplace thing that may be different (but not necessarily).

@burnettk
Copy link

burnettk commented Feb 2, 2024

i was following the https://github.com/actions/download-artifact/blob/main/docs/MIGRATION.md, which talks about setting the artifact name to something like my-artifact-${{ matrix.runs-on }}. this won't actually work, right? this results in artifact names like "my-artifact-" for me, which seems to imply matrix.runs-on is not a thing you can call in this way.

@TWiStErRob

This comment was marked as resolved.

@burnettk
Copy link

burnettk commented Feb 2, 2024

ok, fair enough, thanks.

@henryiii
Copy link

henryiii commented Feb 2, 2024

That works because the matrix in the example has a runs-on key. You need to set it to your key(s), or you can use the less-descriptive but universal strategy.job-index, which works for any matrix but just uses numbers.

@henryiii
Copy link

henryiii commented Feb 2, 2024

(https://learn.scientific-python.org/development/guides/repo-review/ will even suggest this if you put in a repo that would fail with v4)

@jsoref
Copy link

jsoref commented Feb 2, 2024

#472 (comment) feedback is valid.

Conceptually, here's a change to (hopefully) improve that experience:

 In v4, Artifacts are immutable (unless deleted).
-So you must change each of the uploaded Artifacts to have a different name
+So you give each uploaded Artifact a distinct name
+(in this example the matrix has a `.runs-on` property which is used to provide a distinct name,
+but how you choose a distinct name will vary)
 and filter the downloads by name to achieve the same effect:

@nowakweronika
Copy link

In v4, Artifacts are immutable. So you must change each of the uploaded Artifacts to have a different name:

name: Upload unit tests results
  if: always()
- uses: actions/upload-artifact@v3
+ uses: actions/upload-artifact@v4
  with:
-     name: test-results
+     name: test-results-unit
          path: |
          */build/test-results/unit
name: Upload instrumentation tests results
  if: always()
- uses: actions/upload-artifact@v3
+ uses: actions/upload-artifact@v4
  with:
-     name: test-results
+     name: test-results-instrumentation
          path: |
          */build/test-results/instrumentation

And then you can merge it:

merge:
  runs-on: ubuntu-latest
  needs:
    - android-tests
    - unit-tests
  steps:
    - name: Merge Artifacts
      uses: actions/upload-artifact/merge@v4
      with:
        name: test-results
        pattern: test-results-*

You can see full PR here: https://github.com/appunite/Loudius/pull/190/files

@rcdailey
Copy link

rcdailey commented Mar 9, 2024

Version 4 has been ultimately a downgrade for me, as it removes the ability to replace archives. I have a job that runs in my workflow for only certain architectures that does apple code signing. In this case, it downloads a artifact (which contains a binary), signs it, and then re-uploads the artifact, replacing the original. But I cannot do this with v4. The merge action does not address this use case either.

My artifacts are named after architectures in a matrix build; for example: linux-x64, win-x64, osx-x64. It's important that the names remain the same, even if they get replaced, because my final release job iterates the artifacts and attaches them to a Github Release. So it needs to be able to find osx-x64, not something like osx-x64-signed, because I won't have equivalent versions for linux, for example (e.g. linux-x64-signed doesn't make sense). The name the artifacts start with is the name they have to end with.

Immutable packages don't need to change, but we should have the ability to replace them.

@ofek
Copy link

ofek commented Mar 9, 2024

The merge action does not address this use case either.

I just successfully upgraded to v4 a few days ago where I do the exact same thing that you're talking about... pypa/hatch@03b1bca#diff-c4330c2cfbadba3e17aef4105738e5644a48e08e86096a32e2a6625fcd83b957

rcdailey added a commit to recyclarr/recyclarr that referenced this issue Mar 9, 2024
The Apple Signing step grabs an existing artifact, signs the binaries,
and then re-uploads the artifact, replacing the original one. The
upgrade to v4 of the `actions/upload-artifact` action broke this
behavior due to a breaking change documented in [the release notes][1]:

> Due to how Artifacts are created in this new version, it is no longer
> possible to upload to the same named Artifact multiple times. You must
> either split the uploads into multiple Artifacts with different names,
> or only upload once. Otherwise you will encounter an error.

I left [a comment][2] on a relevant issue as to why I am not able to
upgrade to v4 for now.

This change simply reverts the action back to v3 for now, even though
this will still cause a Node v16 deprecation warning in workflow runs.

[1]: https://github.com/actions/upload-artifact/blob/main/README.md#breaking-changes
[2]: actions/upload-artifact#472 (comment)
@rcdailey
Copy link

rcdailey commented Mar 9, 2024

The merge action does not address this use case either.

I just successfully upgraded to v4 a few days ago where I do the exact same thing that you're talking about... pypa/hatch@03b1bca#diff-c4330c2cfbadba3e17aef4105738e5644a48e08e86096a32e2a6625fcd83b957

Your workflow file appears to be structured differently than mine. Here is how I'm doing things:
https://github.com/recyclarr/recyclarr/blob/master/.github/workflows/build.yml

Taking a peek at your workflow, it looks like you make staged- artifacts for every platform, including macos. However in my case, I do not have additional preparation work to do, so technically only macos needs staging artifacts. Upgrading to v4 shouldn't force me to create unnecessary staging/temporary artifacts when in v3 a simple "reupload" was sufficient for the special cases.

@jsoref
Copy link

jsoref commented Mar 10, 2024

@rcdailey there's now a overwrite: true option...

rcdailey added a commit to recyclarr/recyclarr that referenced this issue Mar 10, 2024
The Apple Signing step grabs an existing artifact, signs the binaries,
and then re-uploads the artifact, replacing the original one. The
upgrade to v4 of the `actions/upload-artifact` action broke this
behavior due to a breaking change documented in [the release notes][1]:

> Due to how Artifacts are created in this new version, it is no longer
> possible to upload to the same named Artifact multiple times. You must
> either split the uploads into multiple Artifacts with different names,
> or only upload once. Otherwise you will encounter an error.

I left [a comment][2] on a relevant issue as to why I am not able to
upgrade to v4 for now.

This change simply reverts the action back to v3 for now, even though
this will still cause a Node v16 deprecation warning in workflow runs.

[1]: https://github.com/actions/upload-artifact/blob/main/README.md#breaking-changes
[2]: actions/upload-artifact#472 (comment)
@rcdailey
Copy link

@rcdailey there's now a overwrite: true option...

How embarrassing; I completely missed this in the documentation. Thank you very much for pointing it out. I tested with this setting and it works great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests