Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tar: Improve unseekable stream handling #84279

Merged
merged 22 commits into from Jun 1, 2023

Conversation

carlossanlop
Copy link
Member

@carlossanlop carlossanlop commented Apr 4, 2023

Fixes #76690

Addresses the issue reported in dotnet/sdk-container-builds#192 (comment) by @rainersigwald

Before this change, it wasn't possible to write into an archive a file entry that contained an unseekable data stream. The undelying cause was that we were calling Length to try to get the total number of bytes of the data stream, but it is unsupported when the stream's CanSeek is false.

The way I fixed this was by separating the code that writes an entry into two possible paths:

  • One path for entries with seekable data streams, which remains mostly unmodified.

  • A new path for entries with unseekable streams, which changes the order in which we collect the entry information into the buffer:

    • First, write into the buffer all the metadata except for the size field.
    • Then, write the data stream into the archive (this is possible because we know the exact location where the data should start), counting the number of bytes that are written.
    • Afterwards, write that collected length value in the location for the size field in the buffer.
    • Finally, calculate the checksum, then write the buffer into the archive.

@ghost
Copy link

ghost commented Apr 4, 2023

Tagging subscribers to this area: @dotnet/area-system-formats-tar
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #76690

Addresses the issue reported in dotnet/sdk-container-builds#192 (comment) by @rainersigwald

Before this change, it wasn't possible to write into an archive a file entry that contained an unseekable data stream. The undelying cause was that we were calling Length to try to get the total number of bytes of the data stream, but it is unsupported when the stream's CanSeek is false.

The way I fixed this was by separating the code that writes an entry into two possible paths:

  • One path for entries with seekable data streams, which remains mostly unmodified.

  • A newnew path for entries with unseekable streams, which changes the order in which we collect the entry information into the buffer:

    • First, write into the buffer all the metadata except for the size field.
    • Then, write the data stream into the archive (this is possible because we know the exact location where the data should start), counting the number of bytes that are written.
    • Afterwards, write that collected length value in the location for the size field in the buffer.
    • Finally, calculate the checksum, then write the buffer into the archive.
Author: carlossanlop
Assignees: carlossanlop
Labels:

area-System.Formats.Tar

Milestone: -

@carlossanlop
Copy link
Member Author

carlossanlop commented Apr 4, 2023

I'll remove the new file TarFile.CreateFromDirectoryAsync.File.Roundtrip.cs and will submit it in a separate PR. It isn't related to this change.
Edit: Merged in separate PR: #84303

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it LGTM, but it would be nice if we could reduce the code duplication before merging.

Thank you @carlossanlop !

@adamsitnik adamsitnik added this to the 8.0.0 milestone May 17, 2023
Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please apply my suggestions before merging.

@carlossanlop
Copy link
Member Author

Failures are unrelated.

@carlossanlop carlossanlop merged commit 84a7be7 into dotnet:main Jun 1, 2023
101 of 105 checks passed
@carlossanlop carlossanlop deleted the TarUnseekableDataStream branch June 1, 2023 03:35
@dotnet dotnet locked as resolved and limited conversation to collaborators Jul 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve handling when setting an unseekable stream as a tar entry's DataStream
3 participants