Support Tar with large entry size #96209

Closed
ivanjx opened this issue Dec 20, 2023 · 2 comments

Comments


ivanjx commented Dec 20, 2023

Description

Currently I cannot use TarReader.GetNextEntryAsync when the archive contains an entry larger than 8 GB (9 GB in my case). For such an entry, the size field in the header is not written in octal. Instead, it is constructed like this:
[0x80, 0, 0, 0, <size in bytes, big endian>]
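
For illustration, here is a minimal sketch of how a reader could accept both encodings of the size field. It is not the System.Formats.Tar implementation; `TarSizeField` and `ReadSize` are hypothetical names. It assumes the usual header layout (a 12-byte size field at offset 124) and that a binary size fits in the last 8 bytes of the field, as in the layout described above. The 8 GB threshold is consistent with the octal form: 11 octal digits can represent at most 8^11 - 1 = 8,589,934,591 bytes.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

static class TarSizeField
{
    // Offset and length of the size field in a 512-byte tar header.
    const int SizeOffset = 124;
    const int SizeLength = 12;

    // Hypothetical helper: decodes the size field in either encoding.
    public static long ReadSize(ReadOnlySpan<byte> header)
    {
        ReadOnlySpan<byte> field = header.Slice(SizeOffset, SizeLength);

        if (field[0] == 0x80)
        {
            // Binary form: marker byte, zero padding, then a big-endian integer.
            // Assumption: the value fits in the last 8 bytes of the field.
            return BinaryPrimitives.ReadInt64BigEndian(field.Slice(4, 8));
        }

        // Octal form: ASCII digits, padded or terminated by NUL or space.
        string digits = Encoding.ASCII.GetString(field).Trim('\0', ' ');
        return digits.Length == 0 ? 0 : Convert.ToInt64(digits, 8);
    }
}
```

Calling `TarSizeField.ReadSize(header)` on a 512-byte header buffer returns the entry size in bytes regardless of which form the writer used. BinaryPrimitives is used instead of reversing the bytes so that the result does not depend on the endianness of the machine.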

Reproduction Steps

using TarReader tar = new TarReader(tarStream, true);
List<TarEntry> entries = new List<TarEntry>();

while (true)
{
    TarEntry? entry = await tar.GetNextEntryAsync();

    if (entry == null)
    {
        break;
    }

    entries.Add(entry);
}
...

Expected behavior

No error: the entry should be read successfully.

Actual behavior

The following error is thrown:

Unable to parse number.
   at System.Formats.Tar.TarHelpers.ThrowInvalidNumber()
   at System.Formats.Tar.TarHelpers.ParseOctal[T](ReadOnlySpan`1 buffer)
   at System.Formats.Tar.TarHeader.TryReadCommonAttributes(Span`1 buffer, TarEntryFormat initialFormat)
   at System.Formats.Tar.TarHeader.TryReadAttributes(TarEntryFormat initialFormat, Span`1 buffer)
   at System.Formats.Tar.TarHeader.<TryGetNextHeaderAsync>d__48.MoveNext()
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Formats.Tar.TarReader.<TryGetNextEntryHeaderAsync>d__15.MoveNext()
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Formats.Tar.TarReader.<GetNextEntryInternalAsync>d__13.MoveNext()
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at xxxxx.<GetEntriesAsync>d__0.MoveNext() in [redacted]

Regression?

No response

Known Workarounds

This is currently my own tar parser implementation:

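public async Task<TarEntry[]> GetEntriesAsync(Stream tarStream, CancellationToken cancellationToken)
{
    byte[] entryHeaderBuff = new byte[512];
    List<TarEntry> entries = new List<TarEntry>();

    while (true)
    {
        cancellationToken.ThrowIfCancellationRequested();
        entryHeaderBuff[0] = 0; // Reset flag.

        // ReadAsync may return fewer bytes than requested, so read until
        // the full 512-byte header is buffered or the stream ends.
        int read = await tarStream.ReadAtLeastAsync(
            entryHeaderBuff,
            entryHeaderBuff.Length,
            throwOnEndOfStream: false,
            cancellationToken: cancellationToken);

        if (read < entryHeaderBuff.Length ||
            entryHeaderBuff[0] == 0)
        {
            // No more entries.
            break;
        }

        string fileName = Encoding.ASCII.GetString(entryHeaderBuff, 0, 100).Trim('\0');
        long fileSize;

        if (entryHeaderBuff[124] == 0x80)
        {
            // Binary size: the last 8 bytes of the field, big endian.
            // Reversing the bytes assumes a little-endian machine.
            byte[] sizeBuff = new byte[8];
            entryHeaderBuff
                .AsSpan()
                .Slice(124 + 4, 8)
                .CopyTo(sizeBuff);
            Array.Reverse(sizeBuff);
            fileSize = BitConverter.ToInt64(sizeBuff);
        }
        else
        {
            // Octal size: ASCII digits, possibly padded with NULs or spaces.
            string sizeStr = Encoding.ASCII.GetString(entryHeaderBuff, 124, 12).Trim('\0', ' ');
            fileSize = Convert.ToInt64(sizeStr, 8);
        }

        // TarEntry here is presumably a custom (name, size) type:
        // System.Formats.Tar.TarEntry has no public constructor.
        entries.Add(new TarEntry(fileName, fileSize));

        // Skip the entry data, which is padded to a multiple of 512 bytes.
        tarStream.Seek((fileSize + 511) / 512 * 512, SeekOrigin.Current);
    }

    return entries.ToArray();
}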

Configuration

No response

Other information

No response

dotnet-issue-labeler bot added the needs-area-label label Dec 20, 2023
ghost added the untriaged label Dec 20, 2023
vcsjones added the area-System.Formats.Tar label and removed the needs-area-label label Dec 20, 2023
ghost commented Dec 20, 2023

Tagging subscribers to this area: @dotnet/area-system-formats-tar
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: ivanjx
Assignees: -
Labels: untriaged, area-System.Formats.Tar

Milestone: -

ivanjx commented Dec 21, 2023

Closing in favor of #93763.

ivanjx closed this as completed Dec 21, 2023
ghost removed the untriaged label Dec 21, 2023
github-actions bot locked and limited conversation to collaborators Jan 20, 2024