
tar corrupts archive contents on macOS #2619

Closed
1 of 7 tasks
mzabaluev opened this issue Feb 4, 2021 · 40 comments
Labels
Area: Common Tools · bug report · investigate (collect additional information, like space on disk, other tool incompatibilities, etc.) · OS: macOS

Comments

@mzabaluev

mzabaluev commented Feb 4, 2021

Description
The bsdtar binary installed in the virtual environment produces tarballs with corrupted file contents.
This was first observed months ago:
actions/upload-artifact#151
https://github.community/t/bizarre-issue-with-tar-archive-for-macos-build/145377

The silent corruption resulted in our GitHub workflows producing unusable release assets for months before anyone noticed, and cost some of our team half a day of rubbing our eyes, running our build system locally to check for broken output, and questioning our sanity. #1534 provides a band-aid by installing GNU tar into a location that is not listed in PATH, but it assumes that workflow users are either aware of this bug or have other reasons not to use bsdtar.

Area for Triage:
Apple

Question, Bug, or Feature?:
Bug

Virtual environments affected

  • Ubuntu 16.04
  • Ubuntu 18.04
  • Ubuntu 20.04
  • macOS 10.15
  • macOS 11.0
  • Windows Server 2016 R2
  • Windows Server 2019

Image version
20210130.1

Expected behavior
The tarballs produced by tar contain files with their original content intact.

Actual behavior
A tarball created to pack some binaries unpacks to corrupted files that cannot be executed.

Repro steps
This workflow:

https://github.com/mzabaluev/jormungandr/blob/2d9ad4c97a7051d2a19bec5bfc42c36243d37f23/.github/workflows/release.yml#L273

The resulting macOS asset with corrupted binaries: https://github.com/mzabaluev/jormungandr/releases/download/nightly.20210204/jormungandr-0.10.0-nightly.20210204-x86_64-apple-darwin-generic.tar.gz

@mzabaluev
Author

This incident, and the time for which it has remained unfixed since it was first reported and discussed in the community forums, disturbs me enough that I would like the resolution to include a post-mortem analysis of how the problem occurred.

@maxim-lobanov
Contributor

@mzabaluev, bsdtar comes pre-installed by Apple on a clean macOS system; it is not something we install during image generation on our side. So I guess it is better to report this issue on an Apple forum.
Also, the images contain GNU tar, so you can use it instead of bsdtar.

@mzabaluev
Author

@maxim-lobanov I admit it was a bit presumptuous of me to think that Apple, having actually good QA, would not let that kind of breakage go unnoticed... But I appear to be right: tar on my laptop running macOS 11.2, the same version as reported by the virtual environment, does not corrupt locally built binaries on archival.

I don't have a macOS 10.15 installation at hand to verify, but I assume that if this issue were present on Macs in general, we would have heard about it.

@Darleev
Contributor

Darleev commented Feb 5, 2021

Hello @mzabaluev,
Could you please provide minimal steps to reproduce the issue to speed up the investigation?
We are looking forward to your reply.

@Darleev added the OS: macOS, Area: Common Tools, and investigate labels and removed the needs triage label on Feb 5, 2021
@spinicist

Hello,
I reported this issue to GitHub support back in December. Links to my reports are in the top post.

@maxim-lobanov I can confirm that tar works as expected on my multiple physical Mac machines. It is only on the GitHub runners that this problem manifests.

@Darleev The minimal steps appear to be running an Action that uses the tar command on a Mac runner. I assume it is possible for someone at GitHub to SSH into a runner and run interactive commands? I think it would be instructive to see the output of running tar on some files like this.

@andy-mishechkin andy-mishechkin self-assigned this Feb 5, 2021
@mzabaluev
Author

It does not seem to be straightforward to reproduce: in a test run that archived a smaller hello-world binary, the file came out identified as Mach-O (macOS won't let me run it after unpacking the downloaded artifact, so it may still be corrupted). With the larger build linked in the description, the extracted files are identified only as "data" by the file utility, and judging by the archive being about 1/3 of its normal size, their content has far more redundancy than uncorrupted binaries should.

@mzabaluev
Author

Will try with some larger binary from the system and checksums, some time later tonight.

@mzabaluev
Author

Tried archiving clang and ld copied from /Library/Developer/CommandLineTools/usr/bin, no corruption seen with these files.

@andy-mishechkin
Contributor

Hello, @mzabaluev
I've tried to reproduce your archive and checked the hashes of the original and extracted files; everything works without any errors:
https://github.com/andy-mishechkin/Azure-PipeLine-Test/runs/1862611131?check_suite_focus=true
The workflow YAML is here:
https://github.com/andy-mishechkin/Azure-PipeLine-Test/blob/master/.github/workflows/test.yml
Could you provide other repro steps that produce a corrupt archive? And do you get the corruption consistently, or only sometimes?
Thank you

@Cyberbeni

actions/cache#403 (comment)

What makes it really weird...if I add sleep 10 after cargo test and before creating the tar, it works fine.

@mzabaluev
Author

It seems to occur with binary outputs of two compiler/linker toolchains based on LLVM: clang and rustc.
It does not occur with every output, either: a small hello-world binary is not corrupted.

@mzabaluev
Author

My only plausible guess is that the virtual filesystem used by the macOS runner has a bug that is triggered by specific file access patterns of the linker and bsdtar.

@mzabaluev
Author

mzabaluev commented Feb 10, 2021

Could you provide other repro steps that produce a corrupt archive? And do you get the corruption consistently, or only sometimes?

The workflow linked in the description has been producing corrupt binaries regularly since at least the end of October. It does not corrupt with 100% regularity, though: this release had two tarballs with macOS binaries built with different CPU optimization settings, and only one of the archives had its contents corrupted.

@spinicist

Thanks @mzabaluev for digging this deep!

@maxim-lobanov
Contributor

@mzabaluev, could you please add sleep 10 after the cargo / rust invocation, before using tar, to check if it helps as @Cyberbeni mentioned above?

It looks like the process that creates / builds the files keeps them locked for some additional time after it finishes, and tar archives files that have not been fully written yet. That would explain why we can't reproduce it with tar directly and why sleep 10 resolves the issue.

We don't tweak the filesystem in any way on our side during image generation, so there shouldn't be any difference from a local system.

Does it reproduce for you only with Rust? We still can't reproduce the issue on a small project.

@Cyberbeni

(I also had this issue using the @actions/cache npm package in my GitHub Action after building with the Swift compiler, which is also LLVM-based; it was fixed in v1.0.6, since that uses GNU tar on macOS.)

@mzabaluev
Author

@mzabaluev, could you please add sleep 10 after the cargo / rust invocation, before using tar, to check if it helps as @Cyberbeni mentioned above?

It seems to eliminate the corruption, indeed. In a run with the same steps but without sleep 10, tar archives file contents that differ from what shasum had read just prior.

It looks like the process that creates / builds the files keeps them locked for some additional time after it finishes, and tar archives files that have not been fully written yet.

I doubt it: the compiler process that built the jormungandr binary exits some 7 minutes before tar is invoked, yet both binaries end up corrupted in the archive. cargo runs the compiler as a regular child process; it is not left running in the background when the command exits. Nor are the files incomplete: the file utility does not recognize the Mach-O magic at the head of the extracted files, meaning their content is completely different. The size of both files also corresponds to what they would likely be originally, but gzip is able to compress the contents to about 1/3 of the expected archive size, meaning the corrupted content has much lower entropy than real binaries would.

Perhaps the idle 10 seconds let some settling happen in an OS or virtualization storage layer that is otherwise prevented by busy processes?

We don't tweak the filesystem in any way on our side during image generation, so there shouldn't be any difference from a local system.

You do run it under VMware on a virtual disk volume, though.

Does it reproduce for you only with Rust? We still can't reproduce the issue on a small project.

I was unable to reproduce it with a smaller Rust project, or by archiving copies of some large binaries installed in the system.

@mzabaluev
Author

I would use DTrace to see how the linker and tar access the files, but it does not seem to be possible with the cloud-hosted runners.

@nikita-bykov nikita-bykov self-assigned this Feb 18, 2021
tjni added a commit to tjni/offstage that referenced this issue Feb 21, 2021
Use GNU tar due to a strange issue with BSD tar on OSX runners.

For more information, see: actions/runner-images#2619
@andy-mishechkin andy-mishechkin removed their assignment Feb 25, 2021
@dsame
Contributor

dsame commented Mar 1, 2021

@mzabaluev it seems the problem is that the build has not completed at the moment you calculate the checksum and start to tar the binaries.

I've just confirmed that the 10.15 macOS tar works as expected on any artificial binary files generated with dd if=/dev/random, and I suspect there's something wrong with the binaries produced by the build.

While I'm still investigating the cargo step, can you please try disabling any parallel build settings, if there are any?

KonishchevDmitry added a commit to KonishchevDmitry/investments that referenced this issue Jan 5, 2022
umbynos added a commit to arduino/imgtool-packing that referenced this issue Feb 10, 2022
tar in macos is not working correctly on the ghactions hosted runners actions/runner-images#2619
umbynos added a commit to arduino/imgtool-packing that referenced this issue Feb 11, 2022
tar in macos is not working correctly on the ghactions hosted runners actions/runner-images#2619
@dsame
Contributor

dsame commented Feb 16, 2022

The problem happens when archiving sparse files (a common case being binaries created by LLVM tools): Apple's tar removes the "holes" during compression and cannot restore the executable to its original size.

There is no known workaround except to either use GNU tar or apply the "-S" option with the "-x" mode of Apple's tar.

@dsame dsame closed this as completed Feb 16, 2022
umbynos added a commit to arduino/imgtool-packing that referenced this issue Feb 16, 2022
tar in macos is not working correctly on the ghactions hosted runners actions/runner-images#2619
umbynos added a commit to arduino/imgtool-packing that referenced this issue Feb 16, 2022
tar in macos is not working correctly on the ghactions hosted runners actions/runner-images#2619
umbynos added a commit to arduino/imgtool-packing that referenced this issue Feb 18, 2022
* Update README.md

* add license, the same used in imgtool repo

* add patches, apply them on top of 1.8.0

0009 is used to patch cryptography version used because of: healthchecks/healthchecks#565

* [WIP] add first draft of release wf

[TODO] remove hardcoded version and use ${GITHUB_REF/refs\/tags\//}

add bash as default shell to find zip on win

use 7zip on win to archive, zip is not installed by default

remove ${{ github.workspace }} from win, it does not get expanded correctly

* fix mac archive being corrupted

tar in macos is not working correctly on the ghactions hosted runners actions/runner-images#2619

* add build using qemu and crosscompile with docker containers

* fix path of volume binding, without the absolute path the volume is empty

* fix permission problem: dist dir is created in the container with different user/grp

* try to fix armv6 and v7

* install all qemu platforms, the build time does not increase

* use version 4.2 of pyinstaller [it has the bootloader 🎉 for Linux-32-arm] pyinstaller/pyinstaller#6532 (comment)

* use arm32v5 instead of arm32v6 as target arch. Debian is not available for armv6.
arm32v6 arch should be able to run arm32v5 binaries

* test for pyinstaller guys

* bring back runner version because of glibc too recent

* try to run file produced by pyinstaller

* fix imgtool not starting. imgtool has to be run from `scripts/` folder and not from `imgtool/` one. Otherwise it will pick up the wrong main.py

* use `env.PROJECT_NAME`

* finalize CI: add correct trigger, create-release step, step names & cleanup

* Apply suggestions from code review

Co-authored-by: per1234 <accounts@perglass.com>

* better organize the print output

Co-authored-by: per1234 <accounts@perglass.com>

* use env vars to factor out path strings

Co-authored-by: per1234 <accounts@perglass.com>
Daniel-Bloom-dfinity added a commit to dfinity/icx-proxy that referenced this issue Mar 1, 2022
`tar` is [broken](actions/runner-images#2619) on macOS. Use `gtar` instead.
jneira added a commit to jneira/cabal that referenced this issue Mar 18, 2022
remko added a commit to remko/waforth that referenced this issue Jun 10, 2022
Hamuko added a commit to Hamuko/anifunnel that referenced this issue May 3, 2023
Hamuko added a commit to Hamuko/anifunnel that referenced this issue May 3, 2023
NullHypothesis added a commit to TileDB-Inc/TileDB-Go that referenced this issue Feb 29, 2024
Our macOS job's tar often fails with the error message "Can't restore
time".  The following issue's detective work revealed that the problem
is caused by Apple's tar choking on extracting sparse files:
actions/runner-images#2619

This commit adds the `-S` flag to tar, which fixes the problem.