Skip to content

[release/10.0-preview4] Tar: Adjust the way we write GNU longlink and longpath metadata #114942

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Apr 23, 2025

Backport of #114940 to release/10.0-preview4

/cc @carlossanlop

Customer Impact

  • Customer reported
  • Found internally

LongLink and LongPath entries in GNU tar entries should have a null terminator in their data section.

We found this issue in the Unified Build system, specifically in Debian 12. The Debian packaging tools only allow TAR files with entries in the GNU format. This Tar entry format writes the metadata differently than other Tar entry formats. We are seeing that the deb packaging steps are failing because the tar files we generate are considered 'malformed'.

We found that Debian 13 does not have this issue because it uses a version of libdpkg that has a fix that can handle LongLink and LongPath entry types where the data section is not null terminated: https://git.dpkg.org/cgit/dpkg/dpkg.git/diff/?id=2b0229b8f . Since libdpkg in Debian 12 does not have this fix, it is unable to properly read our entries, as we don’t consider an ending null character as part of the data section of these entries.

When comparing GNU entries we create and the GNU entries that other tools create, we also found other fields that can be hardened. LongLink and LongPath entry types are exclusive to the GNU format, and are metadata entries used to store longer-than-expected link or name strings (respectively). They always precede their actual entry. Users don’t really see this entry or directly interact with it, we take care of them internally. Most of the metadata fields in these entries are not used, but we need to make sure they are well formed so other tools can read them properly and don’t unexpectedly consider an entry as ‘malformed’:

Most of the regular header metadata fields in these entries are not relevant, but we need to make sure they are well formed so other tools can read them properly:

The mtime, ctime and atime fields need to be set to the Unix Epoch so it shows up as zeros (0x30) by default. We were storing them as null (0x0) characters instead. For consistency, if a regular entry happens to get its mtime set to the Epoch, it should also be shown as zeros (0x30).
The uid, gid, uname and gname fields were not being set. These fields aren't really important in a metadata entry, but other tools set this to the default value (uid=gid=0 and uname=gname=root). Based on the fact that 'a value' is preferred for these fields, we can set them to the same value that the real entry has (not root).

Regression

  • Yes
  • No

TarWriter has had this behavior since the beginning.

Testing

Adding tests for each new behavior:

  • Check uid, gid, uname and gname, the mtime, ctime and atime values TarWriter writes directly in the produced stream bytes.
  • Check the null character in the metadata entries is present when these entries are created and ensure the final TarEntry is properly formed.
  • Confirm that the checksum is properly formed.
  • We already have tests that verify roundtripping of TarReader and TarWriter.
  • Also crack open the produced tar files in external tools like 7-zip to confirm they look fine.

Risk

Low.

  • We were writing GNU entries like this since the creation of TarWriter. They were not properly formatted, now they are.

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-formats-tar
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR backports changes to improve the GNU tar entry formatting in TarWriter. Key changes include:

  • Adding helper methods to determine if the link and name fields are too long for the regular header.
  • Introducing helper methods to create a long metadata stream with a null terminator.
  • Adjusting the GNU long metadata header to use the main entry’s uid, gid, and timestamp (using UnixEpoch).
Comments suppressed due to low confidence (1)

src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs:363

  • Add a Debug.Assert or null check for _name in IsNameTooLongForRegularField to ensure that _name is not null, assuming that _name must always be non-null.
private bool IsNameTooLongForRegularField() => (Encoding.UTF8.GetByteCount(_name) + 1) > FieldLengths.Name;

@github-actions github-actions bot force-pushed the backport/pr-114940-to-release/10.0-preview4 branch from 1ba3e52 to 096cb1b Compare April 23, 2025 07:02
@carlossanlop carlossanlop added the Servicing-consider Issue for next servicing release review label Apr 23, 2025
@github-actions github-actions bot force-pushed the backport/pr-114940-to-release/10.0-preview4 branch from 096cb1b to 005c139 Compare April 23, 2025 19:45
@github-actions github-actions bot force-pushed the backport/pr-114940-to-release/10.0-preview4 branch 2 times, most recently from 31fcde9 to ee39151 Compare April 23, 2025 20:18
The Tar entry types Longlink and Longpath, which are exclusive to the GNU format, are metadata entries used to store longer-than-expected link or name strings (respectively). They precede their actual entry, and they are not visible to the user, only the data they hold is visible.

Most of the regular header metadata in this metadata entries is not relevant, but we need to make sure it is well formed so other tools can read them properly.

There are 3 fields that were not getting their values stored as expected by other tools:

- The mtime, ctime and atime fields need to be set to the Unix Epoch so it shows up as zeros (0x30) by default. We were storing them as null (0x0) characters instead. For consistency, if a regular entry happens to get its mtime set to the Epoch, it should also be shown as zeros (0x30).
- The uid, gid, uname and gname fields were not being set. These fields aren't really important in a metadata entry, but other tools set this to the default value (uid=gid=0 and uname=gname=root). Based on the fact that 'a value' is preferred for these fields, we can set them to the same value that the real entry has (not root).
- The long text value (either link or name) should be null terminated, and this extra character should be counted in the size of the longpath. The library libdpkg, used by the dpkg-deb tool in Debian, now expects a null terminator in longlink and longpath data starting in Debian 13. Without the terminator, these entries are considered malformed.
…ck those values in the LongLink and LongPath stream bytes themselves. Also should check uid=gid=0 and the copying of uname and gname into the LongLink/LongPath entries. Also should confirm that the data length in LongLink and LongPath entries is counting a null terminator in the data.
…value is 0 (except the last char). Adjust the new tests to take care of that.
@github-actions github-actions bot force-pushed the backport/pr-114940-to-release/10.0-preview4 branch from ee39151 to 63a7f83 Compare April 23, 2025 22:13
@carlossanlop carlossanlop added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Apr 24, 2025
@carlossanlop
Copy link
Contributor

/ba-g failures are unrelated to tar

@carlossanlop carlossanlop merged commit a4175df into release/10.0-preview4 Apr 24, 2025
83 of 88 checks passed
@carlossanlop carlossanlop deleted the backport/pr-114940-to-release/10.0-preview4 branch April 24, 2025 15:52
@github-actions github-actions bot locked and limited conversation to collaborators May 25, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Formats.Tar Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants