-
Notifications
You must be signed in to change notification settings - Fork 5k
Tar: Adjust the way we write GNU longlink and longpath metadata #114940
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The Tar entry types Longlink and Longpath, which are exclusive to the GNU format, are metadata entries used to store longer-than-expected link or name strings (respectively). They precede their actual entry, and they are not visible to the user, only the data they hold is visible. Most of the regular header metadata in this metadata entries is not relevant, but we need to make sure it is well formed so other tools can read them properly. There are 3 fields that were not getting their values stored as expected by other tools: - The mtime, ctime and atime fields need to be set to the Unix Epoch so it shows up as zeros (0x30) by default. We were storing them as null (0x0) characters instead. For consistency, if a regular entry happens to get its mtime set to the Epoch, it should also be shown as zeros (0x30). - The uid, gid, uname and gname fields were not being set. These fields aren't really important in a metadata entry, but other tools set this to the default value (uid=gid=0 and uname=gname=root). Based on the fact that 'a value' is preferred for these fields, we can set them to the same value that the real entry has (not root). - The long text value (either link or name) should be null terminated, and this extra character should be counted in the size of the longpath. The library libdpkg, used by the dpkg-deb tool in Debian, now expects a null terminator in longlink and longpath data starting in Debian 13. Without the terminator, these entries are considered malformed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adjusts GNU longlink and longpath metadata handling in TAR archives to ensure that fields like mtime, ctime, atime, uid, gid, uname, and gname are set correctly per GNU specifications and that long fields are properly null terminated.
- Refactors long name and link checks by introducing helper methods that account for the null terminator.
- Updates synchronous and asynchronous methods to generate and write long metadata streams.
- Revises header metadata settings to copy real entry values and use UnixEpoch instead of DateTimeOffset.MinValue.
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs
Show resolved
Hide resolved
Tagging subscribers to this area: @dotnet/area-system-formats-tar |
/backport to release/10.0-preview4 |
Started backporting to release/10.0-preview4: https://github.com/dotnet/runtime/actions/runs/14608870946 |
Slight adjustment to the statement about Debian: dpkg in Debian 12 and older expects the null terminator. Dpkg in Debian 13 can function safely without the null terminator. |
Ah thanks for the clarification. I understood the opposite. I read the chat conversation again. |
…ck those values in the LongLink and LongPath stream bytes themselves. Also should check uid=gid=0 and the copying of uname and gname into the LongLink/LongPath entries. Also should confirm that the data length in LongLink and LongPath entries is counting a null terminator in the data.
Shouldn't the bot copy commits to the backport PR which where pushed after the /backport command? |
/backport to release/10.0-preview4 |
Started backporting to release/10.0-preview4: https://github.com/dotnet/runtime/actions/runs/14611981688 |
no it needs another comment trigger like you did |
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs
Outdated
Show resolved
Hide resolved
Started backporting to release/10.0-preview4: https://github.com/dotnet/runtime/actions/runs/14626729273 |
…pply the if check >=0 to mode. Adjust tests to verify mode as well.
/backport to release/10.0-preview4 |
Started backporting to release/10.0-preview4: https://github.com/dotnet/runtime/actions/runs/14627136529 |
/backport to release/10.0-preview4 |
Started backporting to release/10.0-preview4: https://github.com/dotnet/runtime/actions/runs/14627268867 |
@tmds, @carlossanlop, in extended attributes name, is it intentional that a tar generated on Windows will have trailing backslash and unix with forwardslash? runtime/src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs Lines 1177 to 1179 in ca72cdf
Perhaps we want: - $"{dirName}/PaxHeaders.{Environment.ProcessId}/{fileName}{Path.DirectorySeparatorChar}" :
+ $"{dirName}/PaxHeaders.{Environment.ProcessId}/{fileName}/" : |
Can we discuss that in a separate issue, please? This is unrelated to the bug we're fixing. |
@carlossanlop noticed it in the next method PR is touching, so that's why I've mentioned: ![]() |
That's GenerateExtendedAttributeName, which is used by PAX. Not something this PR is touching. Feel free to open a separate issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable and @carlossanlop let me know he's worked with @NikolaMilosavljevic
to validate e2e that this fixes the broken deb packages.
Make sure we track any follow up work in an issue or a PR to main.
@am11 is there a reason for the downvote? |
/backport to release/10.0-preview4 |
Started backporting to release/10.0-preview4: https://github.com/dotnet/runtime/actions/runs/14629079402 |
It should all be forward slashes. And as |
/ba-g failures are unrelated to tar |
The Tar entry types Longlink and Longpath, which are exclusive to the GNU format, are metadata entries used to store longer-than-expected link or name strings (respectively). They precede their actual entry, and they are not visible to the user, only the data they hold is visible.
Most of the regular header metadata in this metadata entries is not relevant, but we need to make sure it is well formed so other tools can read them properly.
There are 3 fields that were not getting their values stored as expected by other tools: