Fix TarReader handling of GNU sparse format 1.0 (PAX) — resolve GNU.sparse.name and GNU.sparse.realsize#125283
Draft
Tagging subscribers to this area: @dotnet/area-system-io
…rse.name and GNU.sparse.realsize Co-authored-by: lewing <24063+lewing@users.noreply.github.com>
…sertions for data stream integrity Co-authored-by: lewing <24063+lewing@users.noreply.github.com>
Copilot AI changed the title
[WIP] Fix TarReader to handle GNU sparse format 1.0 correctly
Fix TarReader handling of GNU sparse format 1.0 (PAX) — resolve GNU.sparse.name and GNU.sparse.realsize
Mar 6, 2026
This was referenced Mar 7, 2026
`TarReader` was ignoring the `GNU.sparse.name` and `GNU.sparse.realsize` PAX extended attributes, causing ~46% of entries from bsdtar-created archives (e.g., .NET SDK tarballs built on macOS/APFS) to expose internal placeholder paths like `GNUSparseFile.0/real-file.dll` and incorrect sizes.

## Changes

`TarHeader.cs`

- `PaxEaGnuSparseName` (`GNU.sparse.name`) and `PaxEaGnuSparseRealSize` (`GNU.sparse.realsize`) constants
- `_gnuSparseRealSize` field, kept separate from `_size` (which drives archive data stream reading) to avoid corrupting stream positioning
- `_gnuSparseRealSize` is carried over in the copy constructor used for format conversion

`TarHeader.Read.cs`: `ReplaceNormalAttributesWithExtended()`

- `path` → `_name`, then override with `GNU.sparse.name` if present (replaces the `GNUSparseFile.0/…` placeholder with the real path)
- `size` → `_size`; `GNU.sparse.realsize` is captured into `_gnuSparseRealSize` without touching `_size`

`TarEntry.cs`

- `Length` returns `_gnuSparseRealSize` when set, otherwise falls back to existing behavior

Test: `TarReader.GetNextEntry.Tests.cs`

- `GnuSparse10Pax_NameAndLengthResolvedFromExtendedAttributes` (both `copyData` variants): verifies the resolved name, the real size from `GNU.sparse.realsize`, and that `DataStream` still contains only the stored sparse bytes (confirming `_size` was not overridden)

Original prompt
This section details the original issue you should resolve
<issue_title>TarReader doesn't handle GNU sparse format 1.0 (PAX) - exposes GNUSparseFile.0 placeholder paths</issue_title>
<issue_description>## Description
`System.Formats.Tar.TarReader` does not handle GNU sparse format 1.0 entries encoded via PAX extended attributes. When reading such entries, `TarEntry.Name` returns the internal placeholder path (containing `GNUSparseFile.0`) instead of the real file name, and `TarEntry.Length` returns the stored (sparse) size rather than the real file size.

GNU sparse format 1.0 stores the real name and size in PAX extended attributes:

- `GNU.sparse.name`: the real file path
- `GNU.sparse.realsize`: the real file size

`TarHeader.ReplaceNormalAttributesWithExtended()` processes standard PAX attributes like `path`, `size`, `mtime`, etc., but does not process `GNU.sparse.name` or `GNU.sparse.realsize`.

### How this occurs in practice
macOS ships bsdtar (libarchive), which detects sparse files by default during archive creation. .NET DLLs on APFS have zero-filled PE alignment sections that APFS stores as filesystem holes, causing bsdtar to treat them as sparse and encode them with the GNU sparse PAX format.
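To check whether a given archive contains such entries, the PAX extended attributes can be inspected directly. The following is a minimal sketch, not the repro program from this issue: `SparseScan` and `FindSparseEntries` are hypothetical names, and the only library members assumed are `TarReader`, `TarReader.GetNextEntry()`, and `PaxTarEntry.ExtendedAttributes` from `System.Formats.Tar`.

```csharp
using System;
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;

public static class SparseScan
{
    // Hypothetical helper (not part of System.Formats.Tar): lists PAX entries
    // that carry a GNU.sparse.name extended attribute, pairing the name the
    // reader reports with the name recorded in the attribute.
    public static List<(string ReportedName, string SparseName)> FindSparseEntries(Stream archive)
    {
        var hits = new List<(string ReportedName, string SparseName)>();

        // leaveOpen: true so the caller keeps ownership of the stream.
        using var reader = new TarReader(archive, leaveOpen: true);

        while (reader.GetNextEntry() is TarEntry entry)
        {
            // Only PAX entries expose an extended attributes dictionary.
            if (entry is PaxTarEntry pax &&
                pax.ExtendedAttributes.TryGetValue("GNU.sparse.name", out var sparseName))
            {
                hits.Add((entry.Name, sparseName));
            }
        }

        return hits;
    }
}
```

For a `.tar.gz` file, the caller would wrap the file stream in a `GZipStream` in decompress mode before passing it in. On a runtime where the reader already resolves `GNU.sparse.name`, the two names in each pair would be expected to match.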
The tar command producing the affected archive was:
When .NET's `TarReader` reads these archives, ~46% of entries have incorrect names containing `GNUSparseFile.0`.

## Reproduction Steps
### Option 1: With an affected tar.gz file
Download an affected tarball (a .NET SDK built on macOS):
dotnet-sdk-11.0.100-ci-osx-x64.tar.gz
Then run the repro program (below) against it.
### Option 2: Create a sparse tar.gz on macOS
On a Mac, create a sparse file and archive it:
Then read it on any platform with the repro program below.
### Repro Program
Program.cs:
tar-repro.csproj:
## Expected behavior

For entries with `GNU.sparse.name` and `GNU.sparse.realsize` PAX extended attributes:

- `entry.Name` should return the value of `GNU.sparse.name` (e.g., `./shared/Microsoft.NETCore.App/11.0.0-ci/Microsoft.CSharp.dll`)
- `entry.Length` should return the value of `GNU.sparse.r...
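
The name and size resolution described in this issue can be sketched in isolation. This is an illustration only, not the actual `TarHeader` code: `SparseAttributeSketch` and `Resolve` are hypothetical names, and the stored size is deliberately left untouched because it is assumed to still describe how many bytes are present in the archive for the entry.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class SparseAttributeSketch
{
    // Hypothetical helper: computes the name and length an entry should report,
    // given the values stored in the header and the PAX extended attributes.
    public static (string Name, long Length) Resolve(
        string storedName,
        long storedSize,
        IReadOnlyDictionary<string, string> extendedAttributes)
    {
        // Standard PAX behavior: "path" overrides the header name.
        string name = storedName;
        if (extendedAttributes.TryGetValue("path", out var path))
        {
            name = path;
        }

        // GNU sparse 1.0: "GNU.sparse.name" holds the real path and wins over
        // the GNUSparseFile.0/... placeholder.
        if (extendedAttributes.TryGetValue("GNU.sparse.name", out var sparseName))
        {
            name = sparseName;
        }

        // The reported length prefers "GNU.sparse.realsize" when it parses as a
        // plain decimal number; storedSize itself is never modified.
        long length = storedSize;
        if (extendedAttributes.TryGetValue("GNU.sparse.realsize", out var realSizeText) &&
            long.TryParse(realSizeText, NumberStyles.None, CultureInfo.InvariantCulture, out long realSize))
        {
            length = realSize;
        }

        return (name, length);
    }
}
```

Keeping the real size separate from the stored size mirrors the constraint stated above: the stored size still has to drive how much data is read from the archive for the entry.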