cache trying to use git for windows' tar.exe on self-hosted runner, failing to find correct gzip #1311

Open
jeremyd2019 opened this issue Jan 12, 2023 · 28 comments
Labels
bug Something isn't working

Comments

@jeremyd2019

jeremyd2019 commented Jan 12, 2023

On a self-hosted Windows runner, I just noticed that caches are failing now with errors like:

"C:\Program Files\Git\usr\bin\tar.exe" --posix -cf cache.tgz --exclude cache.tgz -P -C C:/runner/_work/msys2-autobuild/msys2-autobuild --files-from manifest.txt --force-local -z
/bin/sh: line 1: gzip: command not found
/usr/bin/tar: cache.tgz: Cannot write: Broken pipe
/usr/bin/tar: Child returned status 127
/usr/bin/tar: Error is not recoverable: exiting now
Warning: Failed to save: "C:\Program failed with error: The process 'C:\Program Files\Git\usr\bin\tar.exe' failed with exit code 2

Caches used to work fine, using the version of tar.exe in system32. My understanding is that the programs in git for windows usr\bin are supposed to be regarded as internal to Git for Windows and should not be relied upon by external things like this. @dscho?

Current theory is that Git for Windows' tar is looking on the path for gzip, either not finding it in my case, or in other cases finding one incompatible with that tar (due to being based on a different Cygwin).

@jeremyd2019 jeremyd2019 added the "bug" (Something isn't working) label Jan 12, 2023
@jeremyd2019
Author

Maybe it is failing to find gzip because git for windows' usr\bin is not on my path? But it really shouldn't be on the path.

@me-and

me-and commented Jan 12, 2023

I just came to report what is, I suspect, the same bug. Do you have Cygwin installed on these runners?

The problem I'm seeing is that Git for Windows is based on MSYS2, which is downstream from Cygwin, and in particular is based on Cygwin DLLs that are a few years old. If you have a more recent regular Cygwin installation on the same system, and it's in the PATH, the caching code will still run the GfW tar, but that will try to link against the Cygwin DLL from your Cygwin installation, and promptly fall over with the error you're seeing.

Assuming it is the same problem, I've just created a simple testcase using GitHub hosted runners, at https://github.com/me-and/repro-cygwin-cache-woe.

@jeremyd2019
Author

jeremyd2019 commented Jan 13, 2023

I don't have cygwin on the path, but I could see where that could cause problems too.

What's worse, we just discovered this over at MSYS2:

exportVariable('MSYS', 'winsymlinks:nativestrict')

This exports the variable for all subsequent steps too, switching the default behavior of any MSYS2 used later in the workflow.

So there are at least two pretty major issues with this change:

  1. tar is looking on the path for the compression program, which may not be on the path, or may be an incompatible variant (such as one from a Cygwin or different MSYS2 install).
  2. the code here is setting (and/or potentially overriding an existing setting) an environment variable globally, affecting all future actions unexpectedly (based on whether something used a cache prior to those steps)
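To illustrate the second point, here is a minimal Python sketch (not the action's actual code, which is TypeScript) of handing a variable only to the one child process that needs it instead of exporting it for every later step. The helper name run_with_scoped_env is made up for this example; the MSYS value is the one quoted above.

```python
import os
import subprocess
import sys


def run_with_scoped_env(cmd, extra_env):
    """Run cmd with extra_env visible only to that child process.

    The parent's environment (os.environ) is copied, never modified,
    so nothing leaks into whatever runs afterwards.
    """
    env = os.environ.copy()
    env.update(extra_env)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in for the tar invocation: a child that just echoes the variable.
    child = [sys.executable, "-c", "import os; print(os.environ.get('MSYS'))"]
    result = run_with_scoped_env(child, {"MSYS": "winsymlinks:nativestrict"})
    print("child saw:", result.stdout.strip())
    print("parent still has:", os.environ.get("MSYS"))
```

With this shape, the setting would reach tar but not any MSYS2 program used later in the workflow.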

lazka added a commit to msys2/msys2-autobuild that referenced this issue Jan 14, 2023
@lazka

This comment was marked as outdated.

lazka added a commit to lazka/setup-msys2 that referenced this issue Jan 14, 2023
@lazka
Contributor

lazka commented Jan 14, 2023

I've created a separate issue for the env var issue, to not sidetrack this issue: #1312

lazka added a commit to msys2/setup-msys2 that referenced this issue Jan 14, 2023
@me-and

me-and commented Jan 14, 2023

The problem I'm seeing is that Git for Windows is based on MSYS2, which is downstream from Cygwin, and in particular is based on Cygwin DLLs that are a few years old.

No, MSYS2 is based on the latest Cygwin.

"A few years" was an overstatement (I think I got confused by the calendar turning over, and overcompensated), but I believe Git for Windows is, as of v2.38.0.windows.1, based on Cygwin 3.3.6, while the latest upstream Cygwin release is v3.4.3.

Downgrading to use Cygwin v3.3.6 gets the tar call working, so I'm pretty sure that the Cygwin DLL compatibility is the problem (although I don't think just switching library versions is the correct solution).

@dscho

dscho commented Jan 23, 2023

Git for Windows is based on MSYS2,

correct

which is downstream from Cygwin,

correct

and in particular is based on Cygwin DLLs that are a few years old.

Incorrect. Your information is more than 7 years old; Git for Windows v1.x was based on MSys, which indeed was forked off of an ancient Cygwin version. But MSYS2 frequently updates to the latest Cygwin; Git for Windows lags behind a little to take the edge off some of the more fragile developments. At the moment, Git for Windows uses a derivative of Cygwin v3.3.6 (which was released on September 6th 2022, hardly "several years old").

the caching code will still run the GfW tar, but that will try to link against the Cygwin DLL from your Cygwin installation

That, too, is incorrect. Git for Windows' tar.exe links to msys-2.0.dll and will never "magically" link to cygwin1.dll. In other words, it will never pick up the DLL from the Cygwin installation.

So how about the MSYS2 installation that's in C:\msys64 on hosted runners? Git for Windows' tar.exe won't use C:\msys64\usr\bin\msys-2.0.dll either because there is a msys-2.0.dll right next to tar.exe, in C:\Program Files\Git\usr\bin, and that will be used.

My understanding is that the programs in git for windows usr\bin are supposed to be regarded as internal to Git for Windows and should not be relied upon by external things like this. @dscho?

I tried to make the case, and I even almost got a change accepted to prevent the internal tools of Git for Windows from being picked up. However, too many users already relied on those tools being in the PATH, and we could not merge that change. I've made my peace with it.

@jeremyd2019
Author

OK, in that case this bug is that it is trying to find a gzip.exe on the path (rather than right next to tar.exe), and either not finding one or finding one that doesn't work with that tar.exe. My theory for the cygwin situation is that the code for starting a cygwin process from a cygwin process was getting confused, rather than actually trying to load the wrong cygwin/msys dll.

@jeremyd2019 jeremyd2019 changed the title from "cache trying to use git for windows' tar.exe on self-hosted runner, failing" to "cache trying to use git for windows' tar.exe on self-hosted runner, failing to find correct gzip" Jan 23, 2023
@me-and

me-and commented Jan 23, 2023

@dscho you're obviously correct about versions; I'd realised I'd vastly overstated the age of Git for Windows' builds, and corrected myself.

The thing that pointed me at Git for Windows / Cygwin compatibility is having run the same tests on GitHub runners using old Cygwin versions: when I built using an old Cygwin archive, the cache action worked. When I build using an archive that used v3.4.0, the cache action fails.

@jeremyd2019 gave me an idea that does work: if I remove Cygwin's zstd from the PATH before running the caching action, things seem to work. This seems baffling to me: why would Git for Windows tar be able to call Cygwin zstd when the cygwin1.dll is from 3.3.6, but not when it's from 3.4.0? The version of zstd doesn't change between the two, only the cygwin1.dll version.

I've demonstrated this behaviour with a test run at https://github.com/me-and/repro-cygwin-cache-woe/actions/runs/3990249408. This is a matrix test, with the following variables:

  • Should Cygwin be added to the system PATH?
  • Should Cygwin's zstd be renamed before running the cache action?
  • Which Cygwin mirror snapshot should be used: 3 Dec 2022 (Cygwin v3.3.6), 4 Dec 2022 (Cygwin v3.4.0), or the latest from mirrors.kernel.org (currently v3.4.5)?

When either (a) Cygwin isn't in the PATH, or (b) Cygwin is in the PATH but Cygwin's zstd isn't, everything works reliably. When using the v3.4.x builds, I see the "Broken pipe" error. When using older Cygwin builds, I'm fairly sure I've seen it Just Work™ previously, but I'm now seeing a different error: "0 [main] zstd (2092) C:\cygwin\bin\zstd.exe: *** fatal error - cygheap base mismatch detected - 0x210351408/0x18034C408. // This problem is probably due to using incompatible versions of the cygwin DLL."

@dscho

dscho commented Jan 24, 2023

this bug is that it is trying to find a gzip.exe on the path

@jeremyd2019 I guess it depends on your point of view whether you consider this a bug. The cache library clearly expects tar to be in the PATH, why not also gzip? The easiest fix for you might be to set your runner up with a gzip in the PATH.

I've demonstrated this behaviour with a test run at https://github.com/me-and/repro-cygwin-cache-woe/actions/runs/3990249408.

Curious. Thank you for the record, that's very helpful. In https://github.com/me-and/repro-cygwin-cache-woe/actions/runs/3990249408/jobs/6843755788#step:5:8 you clearly see the problem: C:\Program Files\Git\usr\bin\tar.exe tries to call Cygwin's zstd.exe and then runs into the woes where it detects a Cygwin heap mismatch.

What throws me is that I seem to remember that we go out of our way in the MSYS2 runtime to differentiate enough from the Cygwin runtime so that the MSYS2 runtime's heap is not mistaken for a Cygwin runtime heap.

However, the Cygwin runtime startup code run as part of C:\cygwin\bin\zstd.exe's startup clearly misidentifies the MSYS2 runtime heap for a Cygwin heap. Maybe I misremember? Well, let's see.

clicketyclick

Half an hour later, I am even more puzzled than before. We do call hook_or_detect_cygwin() to detect whether the spawned executable is a Cygwin (or in MSYS2's case, an MSYS) program. And there, we clearly look for the correct DLL (and should not pick up executables linking to "the other DLL"): https://github.com/msys2/msys2-runtime/blob/108a4aca5610d4e4d74caaa65fce3342e36fd10e/winsup/cygwin/hookapi.cc#L378-L387

@me-and

me-and commented Jan 25, 2023

this bug is that it is trying to find a gzip.exe on the path

@jeremyd2019 I guess it depends on your point of view whether you consider this a bug. The cache library clearly expects tar to be in the PATH, why not also gzip? The easiest fix for you might be to set your runner up with a gzip in the PATH.

So this bit isn't true, at least as of the latest versions of the library, which are hard-coded to use Git for Windows' tar, despite then not specifying a path for zstd.

Possibly at least part of the fix here is for the action to use an explicit, hard-coded path to zstd or gzip, at least when it's also using an explicit hard-coded path to tar.

@jeremyd2019
Author

So this bit isn't true, at least as of the latest versions of the library, which are hard-coded to use Git for Windows' tar, despite then not specifying a path for zstd.

Possibly at least part of the fix here is for the action to use an explicit, hard-coded path to zstd or gzip, at least when it's also using an explicit hard-coded path to tar.

I think their intent is to look for zstd on the PATH, I don't think GfW includes a /usr/bin/zstd - it doesn't really have a reason to need it, and that /usr/bin is really just for things necessary for git. I think they intend to find the zstd from https://github.com/actions/runner-images/blob/main/images/win/scripts/Installers/Install-Zstd.ps1 on the PATH and use that.

I also think it's kind of silly to be sitting here guessing what their intentions are in an issue on their repository, that no developer has commented on. I don't know if they're stumped and hoping we puzzle out a solution, or if they just don't care about these cases.

@me-and

me-and commented Jan 26, 2023

So this bit isn't true, at least as of the latest versions of the library, which are hard-coded to use Git for Windows' tar, despite then not specifying a path for zstd.
Possibly at least part of the fix here is for the action to use an explicit, hard-coded path to zstd or gzip, at least when it's also using an explicit hard-coded path to tar.

I think their intent is to look for zstd on the PATH, I don't think GfW includes a /usr/bin/zstd - it doesn't really have a reason to need it, and that /usr/bin is really just for things necessary for git. I think they intend to find the zstd from https://github.com/actions/runner-images/blob/main/images/win/scripts/Installers/Install-Zstd.ps1 on the PATH and use that.

GfW doesn't have zstd, no, and I'm pretty sure you're right that the code is going to be picking up zstd from that script on GitHub hosted runners at least. I suspect there isn't much deliberate intent here, though, and an appropriate and effective solution would be to hard-code the path to that zstd executable, just as the path to the tar executable is hardcoded.

If nobody else gets to it, I'll look at writing up a patch to do that when I get a chance. Although "when I get a chance" is probably not going to be until next month at the earliest, and I'd be very happy for someone else to do the work…

@dscho

dscho commented Feb 1, 2023

What throws me is that I seem to remember that we go out of our way in the MSYS2 runtime to differentiate enough from the Cygwin runtime so that the MSYS2 runtime's heap is not mistaken for a Cygwin runtime heap.

However, the Cygwin runtime startup code run as part of C:\cygwin\bin\zstd.exe's startup clearly misidentifies the MSYS2 runtime heap for a Cygwin heap. Maybe I misremember? Well, let's see.

clicketyclick

Half an hour later, I am even more puzzled than before. We do call hook_or_detect_cygwin() to detect whether the spawned executable is a Cygwin (or in MSYS2's case, an MSYS) program. And there, we clearly look for the correct DLL (and should not pick up executables linking to "the other DLL")

I believe that I have identified the issue and implemented a viable work-around in git-for-windows/msys2-runtime#48. If you want to verify this claim, please install a Git for Windows snapshot instead of the official Git for Windows release on the runner (snapshots are very similar to official Git for Windows releases, they are even code-signed by me, the only thing setting them apart is that the snapshots have "funny" version numbers reported by git version).

@me-and

me-and commented Feb 3, 2023

@dscho I'll check this as soon as I can, thank you! There's probably going to be some delay – I've had some urgent personal issues come up that have taken priority over a lot of my life – but it's on my to-do list.

@jeremyd2019
Author

For the record, the original issue I reported still occurs with GfW 2.40.0.windows.1 installed. I did not expect that to help, because in this case the error is not the result of finding another "cygwin" gzip.exe, but in not finding a gzip.exe on the PATH at all. In this case it really should look next to tar.exe for gzip.exe.

@dscho

dscho commented Mar 20, 2023

@jeremyd2019 since the tar.exe we're talking about comes from MSYS2, and since you're a frequent contributor to that project, how about teaching that tar package the trick to append the directory containing tar.exe to the PATH when searching for gzip.exe (or for that matter, any (de-)compressor)?
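For comparison, the same idea can be applied from the caller's side rather than inside tar. The rough Python sketch below assumes the caller already knows the full path to tar.exe, and appends tar's own directory to the PATH of the child process only. It is not a patch to the MSYS2 tar package, and env_with_tool_dir is a hypothetical helper name.

```python
import os


def env_with_tool_dir(tool_path, base_env=None):
    """Return a copy of the environment with tool_path's directory appended to PATH.

    Appending (not prepending) keeps any compressor already on the PATH as the
    first choice and only adds the sibling directory as a fallback.
    """
    env = dict(os.environ if base_env is None else base_env)
    tool_dir = os.path.dirname(tool_path)
    entries = [e for e in env.get("PATH", "").split(os.pathsep) if e]
    if tool_dir and tool_dir not in entries:
        entries.append(tool_dir)
    env["PATH"] = os.pathsep.join(entries)
    return env
```

The returned mapping would then be passed as the env argument when spawning tar, so the extra PATH entry never reaches later steps.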

@OscarCarrilloTR

Has anybody fixed this gzip warning on their self-hosted Windows servers?

@jeremyd2019
Author

I think the reason this doesn't happen on the GitHub-hosted runners is that they have a zstd.exe on the PATH. Putting one on the PATH would probably take care of it. What's buggy is the supposed fallback case, when zstd is not present and gzip should be used: it doesn't take into account that GNU tar looks for gzip on the PATH too, rather than having it linked in as in the bsdtar that it used to use from Windows\System32. There is a gzip.exe right next to the tar.exe that this code took the effort to track down inside Git for Windows, but it is not on the PATH. They should most likely pass the full path to gzip.exe to tar.
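A minimal Python sketch of that last suggestion, under several assumptions: the real action is written in TypeScript, tar_create_args is a hypothetical helper, and the argument list simply mirrors the failing command from the original report. GNU tar's --use-compress-program option is real, but tar hands the program to a shell, so whether a Windows path containing spaces survives unquoted is untested here; it may need quoting or an MSYS-style path.

```python
from pathlib import Path


def tar_create_args(tar_path, archive, manifest, workdir):
    """Build a tar 'create' command line, preferring the gzip that sits next to tar.

    Falls back to plain -z (tar searches the PATH for gzip) when no sibling
    gzip exists, which is the behaviour that fails in the report above.
    """
    tar = Path(tar_path)
    # "tar.exe" -> "gzip.exe", "tar" -> "gzip"
    sibling_gzip = tar.with_name("gzip" + tar.suffix)
    args = [
        str(tar), "--posix", "-cf", archive, "--exclude", archive,
        "-P", "-C", workdir, "--files-from", manifest, "--force-local",
    ]
    if sibling_gzip.is_file():
        args += ["--use-compress-program", str(sibling_gzip)]
    else:
        args.append("-z")
    return args
```

The point of the design is only that the compressor is resolved by the same code that resolves tar, instead of being left to whatever the PATH happens to contain.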

@annlumia

annlumia commented Sep 6, 2023

Footprint

@xv-aleksandr-b

Any reliable workarounds?

@jeremyd2019
Author

if I cared about the cache, I'd put zstd.exe on the PATH on my runner

@xv-aleksandr-b

if I cared about the cache, I'd put zstd.exe on the PATH on my runner

it worked for me. Thanks!

@xv-aleksandr-b

Update: a cache created on the self-hosted Windows runner with the zstd hack mentioned above cannot be used for some reason. When I try to retrieve it, I get "Error: Failed to restore cache entry. Exiting as fail-on-cache-miss is set".
Switching to a cloud image, e.g. windows-latest, fixes the issue.

@OscarCarrilloTR

if I cared about the cache, I'd put zstd.exe on the PATH on my runner

it worked for me. Thanks!

What is the correct path for zstd.exe? @xv-aleksandr-b

@xv-aleksandr-b

xv-aleksandr-b commented Dec 4, 2023

if I cared about the cache, I'd put zstd.exe on the PATH on my runner

it worked for me. Thanks!

What is the correct path for zstd.exe? @xv-aleksandr-b

I just downloaded the latest version from the FB repo, put it in a custom folder, and added that folder to the PATH environment variable.
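After a PATH change like this, one way to see what would actually be picked up is to resolve the tool names the way an ordinary PATH lookup does. A small Python sketch, assuming Python is available on the runner; resolve_tools is a hypothetical helper, and the tool names are just the ones discussed in this thread.

```python
import shutil


def resolve_tools(names=("tar", "gzip", "zstd"), path=None):
    """Map each tool name to the executable a PATH search finds, or None.

    path=None means "use the current process's PATH"; pass a string to
    test a different search path without touching the environment.
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in resolve_tools().items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

Running it from a workflow step, rather than from an interactive shell, shows the environment the runner service itself sees, which may differ until the runner is restarted.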

@lukedukeus

Update: a cache created on the self-hosted Windows runner with the zstd hack mentioned above cannot be used for some reason. When I try to retrieve it, I get "Error: Failed to restore cache entry. Exiting as fail-on-cache-miss is set". Switching to a cloud image, e.g. windows-latest, fixes the issue.

Currently facing the same issue, has anyone been able to get cache working on a self hosted runner?

@lukedukeus

Update: a cache created on the self-hosted Windows runner with the zstd hack mentioned above cannot be used for some reason. When I try to retrieve it, I get "Error: Failed to restore cache entry. Exiting as fail-on-cache-miss is set". Switching to a cloud image, e.g. windows-latest, fixes the issue.

Currently facing the same issue, has anyone been able to get cache working on a self hosted runner?

🤦 restarting the runner fixed the issue
