-
Notifications
You must be signed in to change notification settings - Fork 5.1k
[release/10.0-preview4] Tar: Adjust the way we write GNU longlink and longpath metadata #114942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release/10.0-preview4] Tar: Adjust the way we write GNU longlink and longpath metadata #114942
Conversation
Tagging subscribers to this area: @dotnet/area-system-formats-tar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR backports changes to improve the GNU tar entry formatting in TarWriter. Key changes include:
- Adding helper methods to determine if the link and name fields are too long for the regular header.
- Introducing helper methods to create a long metadata stream with a null terminator.
- Adjusting the GNU long metadata header to use the main entry’s uid, gid, and timestamp (using UnixEpoch).
Comments suppressed due to low confidence (1)
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs:363
- Add a Debug.Assert or null check for _name in IsNameTooLongForRegularField to ensure that _name is not null, assuming that _name must always be non-null.
private bool IsNameTooLongForRegularField() => (Encoding.UTF8.GetByteCount(_name) + 1) > FieldLengths.Name;
1ba3e52
to
096cb1b
Compare
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs
Outdated
Show resolved
Hide resolved
096cb1b
to
005c139
Compare
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs
Outdated
Show resolved
Hide resolved
31fcde9
to
ee39151
Compare
The Tar entry types Longlink and Longpath, which are exclusive to the GNU format, are metadata entries used to store longer-than-expected link or name strings (respectively). They precede their actual entry, and they are not visible to the user, only the data they hold is visible. Most of the regular header metadata in this metadata entries is not relevant, but we need to make sure it is well formed so other tools can read them properly. There are 3 fields that were not getting their values stored as expected by other tools: - The mtime, ctime and atime fields need to be set to the Unix Epoch so it shows up as zeros (0x30) by default. We were storing them as null (0x0) characters instead. For consistency, if a regular entry happens to get its mtime set to the Epoch, it should also be shown as zeros (0x30). - The uid, gid, uname and gname fields were not being set. These fields aren't really important in a metadata entry, but other tools set this to the default value (uid=gid=0 and uname=gname=root). Based on the fact that 'a value' is preferred for these fields, we can set them to the same value that the real entry has (not root). - The long text value (either link or name) should be null terminated, and this extra character should be counted in the size of the longpath. The library libdpkg, used by the dpkg-deb tool in Debian, now expects a null terminator in longlink and longpath data starting in Debian 13. Without the terminator, these entries are considered malformed.
…ck those values in the LongLink and LongPath stream bytes themselves. Also should check uid=gid=0 and the copying of uname and gname into the LongLink/LongPath entries. Also should confirm that the data length in LongLink and LongPath entries is counting a null terminator in the data.
…value is 0 (except the last char). Adjust the new tests to take care of that.
…pply the if check >=0 to mode. Adjust tests to verify mode as well.
ee39151
to
63a7f83
Compare
/ba-g failures are unrelated to tar |
Backport of #114940 to release/10.0-preview4
/cc @carlossanlop
Customer Impact
LongLink and LongPath entries in GNU tar entries should have a null terminator in their data section.
We found this issue in the Unified Build system, specifically in Debian 12. The Debian packaging tools only allow TAR files with entries in the GNU format. This Tar entry format writes the metadata differently than other Tar entry formats. We are seeing that the deb packaging steps are failing because the tar files we generate are considered 'malformed'.
We found that Debian 13 does not have this issue because it uses a version of libdpkg that has a fix that can handle LongLink and LongPath entry types where the data section is not null terminated: https://git.dpkg.org/cgit/dpkg/dpkg.git/diff/?id=2b0229b8f . Since libdpkg in Debian 12 does not have this fix, it is unable to properly read our entries, as we don’t consider an ending null character as part of the data section of these entries.
When comparing GNU entries we create and the GNU entries that other tools create, we also found other fields that can be hardened. LongLink and LongPath entry types are exclusive to the GNU format, and are metadata entries used to store longer-than-expected link or name strings (respectively). They always precede their actual entry. Users don’t really see this entry or directly interact with it, we take care of them internally. Most of the metadata fields in these entries are not used, but we need to make sure they are well formed so other tools can read them properly and don’t unexpectedly consider an entry as ‘malformed’:
Most of the regular header metadata fields in these entries are not relevant, but we need to make sure they are well formed so other tools can read them properly:
The mtime, ctime and atime fields need to be set to the Unix Epoch so it shows up as zeros (0x30) by default. We were storing them as null (0x0) characters instead. For consistency, if a regular entry happens to get its mtime set to the Epoch, it should also be shown as zeros (0x30).
The uid, gid, uname and gname fields were not being set. These fields aren't really important in a metadata entry, but other tools set this to the default value (uid=gid=0 and uname=gname=root). Based on the fact that 'a value' is preferred for these fields, we can set them to the same value that the real entry has (not root).
Regression
TarWriter has had this behavior since the beginning.
Testing
Adding tests for each new behavior:
Risk
Low.