Skip to content

Tar: Adjust the way we write GNU longlink and longpath metadata #114940

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 24, 2025

Conversation

carlossanlop
Copy link
Contributor

@carlossanlop carlossanlop commented Apr 23, 2025

The Tar entry types Longlink and Longpath, which are exclusive to the GNU format, are metadata entries used to store longer-than-expected link or name strings (respectively). They precede their actual entry, and they are not visible to the user, only the data they hold is visible.

Most of the regular header metadata in this metadata entries is not relevant, but we need to make sure it is well formed so other tools can read them properly.

There are 3 fields that were not getting their values stored as expected by other tools:

  • The mtime, ctime and atime fields need to be set to the Unix Epoch so it shows up as zeros (0x30) by default. We were storing them as null (0x0) characters instead. For consistency, if a regular entry happens to get its mtime set to the Epoch, it should also be shown as zeros (0x30).
  • The uid, gid, uname and gname fields were not being set. These fields aren't really important in a metadata entry, but other tools set this to the default value (uid=gid=0 and uname=gname=root). Based on the fact that 'a value' is preferred for these fields, we can set them to the same value that the real entry has (not root).
  • The long text value (either link or name) should be null terminated, and this extra character should be counted in the size of the longpath. The library libdpkg, used for packagin in Debian, can handle a missing null terminator in longlink and longpath data starting with Debian 13, but we use Debian 12, and in that version, without the terminator, these entries are considered malformed.

The Tar entry types Longlink and Longpath, which are exclusive to the GNU format, are metadata entries used to store longer-than-expected link or name strings (respectively). They precede their actual entry, and they are not visible to the user, only the data they hold is visible.

Most of the regular header metadata in this metadata entries is not relevant, but we need to make sure it is well formed so other tools can read them properly.

There are 3 fields that were not getting their values stored as expected by other tools:

- The mtime, ctime and atime fields need to be set to the Unix Epoch so it shows up as zeros (0x30) by default. We were storing them as null (0x0) characters instead. For consistency, if a regular entry happens to get its mtime set to the Epoch, it should also be shown as zeros (0x30).
- The uid, gid, uname and gname fields were not being set. These fields aren't really important in a metadata entry, but other tools set this to the default value (uid=gid=0 and uname=gname=root). Based on the fact that 'a value' is preferred for these fields, we can set them to the same value that the real entry has (not root).
- The long text value (either link or name) should be null terminated, and this extra character should be counted in the size of the longpath. The library libdpkg, used by the dpkg-deb tool in Debian, now expects a null terminator in longlink and longpath data starting in Debian 13. Without the terminator, these entries are considered malformed.
@carlossanlop carlossanlop self-assigned this Apr 23, 2025
@Copilot Copilot AI review requested due to automatic review settings April 23, 2025 01:47
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adjusts GNU longlink and longpath metadata handling in TAR archives to ensure that fields like mtime, ctime, atime, uid, gid, uname, and gname are set correctly per GNU specifications and that long fields are properly null terminated.

  • Refactors long name and link checks by introducing helper methods that account for the null terminator.
  • Updates synchronous and asynchronous methods to generate and write long metadata streams.
  • Revises header metadata settings to copy real entry values and use UnixEpoch instead of DateTimeOffset.MinValue.

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-formats-tar
See info in area-owners.md if you want to be subscribed.

@carlossanlop
Copy link
Contributor Author

/backport to release/10.0-preview4

Copy link
Contributor

Started backporting to release/10.0-preview4: https://github.com/dotnet/runtime/actions/runs/14608870946

@jkoritzinsky
Copy link
Member

Slight adjustment to the statement about Debian: dpkg in Debian 12 and older expects the null terminator. Dpkg in Debian 13 can function safely without the null terminator.

@carlossanlop
Copy link
Contributor Author

Slight adjustment to the statement about Debian: dpkg in Debian 12 and older expects the null terminator. Dpkg in Debian 13 can function safely without the null terminator.

Ah thanks for the clarification. I understood the opposite. I read the chat conversation again.

…ck those values in the LongLink and LongPath stream bytes themselves.

Also should check uid=gid=0 and the copying of uname and gname into the LongLink/LongPath entries.
Also should confirm that the data length in LongLink and LongPath entries is counting a null terminator in the data.
@carlossanlop
Copy link
Contributor Author

Shouldn't the bot copy commits to the backport PR which where pushed after the /backport command?

@carlossanlop
Copy link
Contributor Author

/backport to release/10.0-preview4

Copy link
Contributor

Started backporting to release/10.0-preview4: https://github.com/dotnet/runtime/actions/runs/14611981688

@akoeplinger
Copy link
Member

Shouldn't the bot copy commits to the backport PR which where pushed after the /backport command?

no it needs another comment trigger like you did

Copy link
Contributor

Started backporting to release/10.0-preview4: https://github.com/dotnet/runtime/actions/runs/14626729273

…pply the if check >=0 to mode. Adjust tests to verify mode as well.
@carlossanlop
Copy link
Contributor Author

/backport to release/10.0-preview4

Copy link
Contributor

Started backporting to release/10.0-preview4: https://github.com/dotnet/runtime/actions/runs/14627136529

@carlossanlop
Copy link
Contributor Author

/backport to release/10.0-preview4

Copy link
Contributor

Started backporting to release/10.0-preview4: https://github.com/dotnet/runtime/actions/runs/14627268867

@am11
Copy link
Member

am11 commented Apr 23, 2025

@tmds, @carlossanlop, in extended attributes name, is it intentional that a tar generated on Windows will have trailing backslash and unix with forwardslash?

return _typeFlag is TarEntryType.Directory or TarEntryType.DirectoryList ?
$"{dirName}/PaxHeaders.{Environment.ProcessId}/{fileName}{Path.DirectorySeparatorChar}" :
$"{dirName}/PaxHeaders.{Environment.ProcessId}/{fileName}";

Perhaps we want:

-      $"{dirName}/PaxHeaders.{Environment.ProcessId}/{fileName}{Path.DirectorySeparatorChar}" : 
+      $"{dirName}/PaxHeaders.{Environment.ProcessId}/{fileName}/" : 

@carlossanlop
Copy link
Contributor Author

@tmds, @carlossanlop, in extended attributes name, is it intentional that a tar generated on Windows will have trailing backslash and unix with forwardslash?

Can we discuss that in a separate issue, please? This is unrelated to the bug we're fixing.

@am11
Copy link
Member

am11 commented Apr 23, 2025

@carlossanlop noticed it in the next method PR is touching, so that's why I've mentioned:

image

@carlossanlop
Copy link
Contributor Author

That's GenerateExtendedAttributeName, which is used by PAX. Not something this PR is touching. Feel free to open a separate issue.

Copy link
Member

@ericstj ericstj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable and @carlossanlop let me know he's worked with @NikolaMilosavljevic
to validate e2e that this fixes the broken deb packages.

Make sure we track any follow up work in an issue or a PR to main.

@carlossanlop
Copy link
Contributor Author

@am11 is there a reason for the downvote?

@carlossanlop
Copy link
Contributor Author

/backport to release/10.0-preview4

Copy link
Contributor

Started backporting to release/10.0-preview4: https://github.com/dotnet/runtime/actions/runs/14629079402

@tmds
Copy link
Member

tmds commented Apr 24, 2025

is it intentional that a tar generated on Windows will have trailing backslash and unix with forwardslash?

It should all be forward slashes. And as _name is expected to only use be using forward slashes as directory separators, GenerateExtendedAttributeName shouldn't use Path.GetDirectoryName/Path.GetFileName as on Windows it would also treat backward slashes as directory separators.

@carlossanlop
Copy link
Contributor Author

/ba-g failures are unrelated to tar

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants