
System.Formats.Tar — OverflowException on GNU binary-encoded UID/GID #127006

@javiercn

Description

Summary

System.Formats.Tar.TarReader throws System.OverflowException when reading
tar archives that contain GNU binary-encoded UID or GID values exceeding
Int32.MaxValue (2,147,483,647). GNU tar uses a binary encoding for numeric
fields that don't fit in the standard 8-byte octal representation: the high bit
of the first byte is set, and the remaining bytes form a big-endian integer.
.NET's TarReader parses these into int, causing a checked arithmetic
overflow when the decoded value exceeds Int32.MaxValue.
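The encoding described above can be illustrated with a small decoder. This is a sketch, not System.Formats.Tar code: `DecodeGnuBinary` is a hypothetical helper. It assumes a positive value whose first byte is exactly the 0x80 marker, and it does not handle negative (0xFF-prefixed) values, which this issue does not involve.

```csharp
using System;

// UID field bytes from statuses@1.5.0.
byte[] uidField = { 0x80, 0x00, 0x00, 0x00, 0xB6, 0x5A, 0x65, 0x38 };

long uid = DecodeGnuBinary(uidField);
Console.WriteLine(uid);                // 3059377464
Console.WriteLine(uid > int.MaxValue); // True

// Hypothetical helper: decodes a positive GNU binary field, assuming the
// first byte is exactly the 0x80 marker and the remaining bytes hold a
// big-endian unsigned integer. Negative (0xFF-prefixed) values are not handled.
static long DecodeGnuBinary(ReadOnlySpan<byte> field)
{
    if (field.Length < 2 || field.Length > 8 || field[0] != 0x80)
        throw new FormatException("Expected a 2..8-byte field starting with 0x80.");

    long value = 0;
    foreach (byte b in field[1..])
    {
        // Shift in one byte at a time, most significant byte first.
        value = (value << 8) | b;
    }
    return value; // at most 7 data bytes, so at most 2^56 - 1
}
```

Decoding into a 64-bit `long` keeps the full 56-bit range, which is why the value itself is valid even though it does not fit in an `int`.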

I ran into this while writing code that unpacks npm packages. For example, statuses@1.5.0 ships a tarball with GNU binary-encoded UID fields, some of which decode to values above Int32.MaxValue, triggering this bug.

This is not a one-off: it happens with many packages downloaded from npm.

Impact

Any .NET application using TarReader to extract npm tarballs (or any tar archive with GNU binary-encoded headers) may hit this unexpectedly. The exception is unrecoverable — there is no way to skip the offending entry or configure TarReader to ignore UID/GID overflow.

Real-world affected package

statuses@1.5.0 from the npm registry triggers this bug. It is a transitive
dependency of express, koa, send, finalhandler, and many other popular
packages. Its tarball uses GNU binary-encoded UID fields (byte 0x80 prefix):

Entry                              UID (decoded)    Overflow?
─────────────────────────────────  ──────────────   ─────────
PaxHeader/package/HISTORY.md       3,059,377,464    YES (> Int32.MaxValue)
package/HISTORY.md                 4,284,114,232    YES (> Int32.MaxValue)
PaxHeader/package/codes.json       3,227,149,624    YES (> Int32.MaxValue)
package/package.json               5,924,152        no
package/index.js                   526,017,848      no
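The "Overflow?" column follows from a single comparison: a decoded UID overflows an `int` exactly when it exceeds `int.MaxValue`. The sketch below reproduces that column from the decoded values in the table; the values are copied from the table, and the output formatting is only illustrative.

```csharp
using System;

// Decoded UIDs copied from the table above.
(string Entry, long Uid)[] samples =
{
    ("PaxHeader/package/HISTORY.md", 3_059_377_464),
    ("package/HISTORY.md", 4_284_114_232),
    ("PaxHeader/package/codes.json", 3_227_149_624),
    ("package/package.json", 5_924_152),
    ("package/index.js", 526_017_848),
};

foreach (var (entry, uid) in samples)
{
    // An int UID can hold at most 2,147,483,647.
    bool overflows = uid > int.MaxValue;
    Console.WriteLine($"{entry,-30} {uid,15:N0} {(overflows ? "YES" : "no")}");
}
```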

The UID hex bytes for an overflowing entry: 80 00 00 00 b6 5a 65 38 — the
0x80 prefix signals GNU binary encoding, and the decoded value 0xB65A6538
= 3,059,377,464 exceeds Int32.MaxValue.

Root Cause

In System.Formats.Tar, the TarHelpers.ParseOctal<T>() method parses numeric
fields from tar headers. When it encounters GNU binary encoding (first byte has
high bit 0x80 set), it decodes the remaining bytes as a big-endian integer and
converts to the target type T. For UID and GID fields, T is int (signed
32-bit).
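The decode-then-convert failure can be modeled with .NET generic math (.NET 7 or later). This is a simplified, hypothetical stand-in and not the actual System.Formats.Tar source: `ParseGnuBinary<T>` is an illustrative name, and the sketch assumes the conversion to `T` behaves like `INumberBase<T>.CreateChecked`, which throws `OverflowException` when the value does not fit in `T`.

```csharp
using System;
using System.Numerics;

byte[] uidField = { 0x80, 0x00, 0x00, 0x00, 0xB6, 0x5A, 0x65, 0x38 };

// With T = long the decoded value fits.
Console.WriteLine(ParseGnuBinary<long>(uidField)); // 3059377464

try
{
    // With T = int (as for UID/GID) the checked conversion fails.
    Console.WriteLine(ParseGnuBinary<int>(uidField));
}
catch (OverflowException)
{
    Console.WriteLine("OverflowException when T is int");
}

// Hypothetical simplified model: decode the big-endian bytes after the
// 0x80 marker into a long, then convert to T with a checked conversion.
// Assumes an 8-byte field, so the long never overflows (56 data bits).
static T ParseGnuBinary<T>(ReadOnlySpan<byte> field) where T : INumberBase<T>
{
    long value = 0;
    foreach (byte b in field[1..])
        value = (value << 8) | b;

    // Throws OverflowException if value is outside T's range.
    return T.CreateChecked(value);
}
```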

The problem: GNU binary encoding can represent values up to 2^56 - 1 in an
8-byte field (7 data bytes after the 0x80 marker). When the decoded value
exceeds Int32.MaxValue (2,147,483,647), the checked conversion to int
throws OverflowException.

Real-world example from statuses@1.5.0:

  • UID field bytes: 80 00 00 00 B6 5A 65 38
  • 0x80 prefix → GNU binary encoding
  • Remaining 7 bytes → 0x000000B65A6538 = 3,059,377,464
  • checked((int)3059377464L) → OverflowException (> 2,147,483,647)
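The steps listed above can be traced directly in C#. This sketch reads the whole 8-byte field with `BinaryPrimitives.ReadUInt64BigEndian`, masks off the 0x80 marker byte to keep the 7 data bytes, and then performs the checked cast to `int`. It illustrates the arithmetic only; it is not how TarReader is implemented.

```csharp
using System;
using System.Buffers.Binary;

byte[] uidField = { 0x80, 0x00, 0x00, 0x00, 0xB6, 0x5A, 0x65, 0x38 };

// Step 1: the high bit of the first byte marks GNU binary encoding.
bool isGnuBinary = (uidField[0] & 0x80) != 0;

// Step 2: read all 8 bytes big-endian, then clear the marker byte,
// leaving the 7 data bytes (0x000000B65A6538).
ulong raw = BinaryPrimitives.ReadUInt64BigEndian(uidField);
long decoded = (long)(raw & 0x00FF_FFFF_FFFF_FFFFUL);

Console.WriteLine($"GNU binary: {isGnuBinary}, decoded: {decoded}");

// Step 3: the checked cast to int fails because decoded > int.MaxValue.
try
{
    int uid = checked((int)decoded);
    Console.WriteLine($"UID = {uid}");
}
catch (OverflowException)
{
    Console.WriteLine($"{decoded} > {int.MaxValue}: OverflowException");
}
```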

Environment

  • .NET 10

Reproduction Steps

Minimal Reproduction

Single-file C# app. Run it with dotnet run in a .NET 9 or later project,
or paste it into any top-level-statements Program.cs. No external files are needed.

// Repro: System.Formats.Tar OverflowException on GNU binary-encoded UID
//
// Run: dotnet run
// Requires: .NET 9+ (System.Formats.Tar is inbox)

using System.Formats.Tar;
using System.IO.Compression;
using System.Text;

// ──────────────────────────────────────────────────────────────────────────
// Part 1: Synthetic tarball — GNU binary-encoded UID > Int32.MaxValue
// ──────────────────────────────────────────────────────────────────────────
//
// Builds a valid .tar.gz in memory with GNU binary-encoded UID = 0xB65A6538
// (3,059,377,464) — matching the real encoding found in statuses@1.5.0.
// No network, no files on disk.

Console.WriteLine("=== Part 1: Synthetic tarball (GNU binary UID) ===");
Console.WriteLine();

var syntheticTgz = CreateSyntheticTarball();
Console.WriteLine($"  Created {syntheticTgz.Length}-byte .tgz in memory");

try
{
    using var ms = new MemoryStream(syntheticTgz);
    using var gz = new GZipStream(ms, CompressionMode.Decompress);
    using var reader = new TarReader(gz);

    while (await reader.GetNextEntryAsync() is { } entry)
    {
        Console.WriteLine($"  Entry: {entry.Name}, UID={entry.Uid}");
    }

    Console.WriteLine("  [OK] No exception (may be fixed in your .NET version).");
}
catch (OverflowException ex)
{
    Console.WriteLine($"  [BUG] OverflowException: {ex.Message}");
    Console.WriteLine($"  Stack: {ex.StackTrace?.Split('\n').FirstOrDefault()?.Trim()}");
}

Console.WriteLine();

// ──────────────────────────────────────────────────────────────────────────
// Part 2: Real npm package — statuses@1.5.0 (transitive dep of express)
// ──────────────────────────────────────────────────────────────────────────
//
// Uncomment to download and reproduce against the real npm package.
// statuses@1.5.0 has GNU binary-encoded UID fields with values > Int32.MaxValue.

Console.WriteLine("=== Part 2: Real npm package (statuses@1.5.0) ===");
Console.WriteLine();

// var httpClient = new HttpClient();
// var tarballUrl = "https://registry.npmjs.org/statuses/-/statuses-1.5.0.tgz";
// Console.WriteLine($"  Downloading {tarballUrl} ...");
// var tgzBytes = await httpClient.GetByteArrayAsync(tarballUrl);
// Console.WriteLine($"  Downloaded {tgzBytes.Length} bytes");
// try
// {
//     using var ms2 = new MemoryStream(tgzBytes);
//     using var gz2 = new GZipStream(ms2, CompressionMode.Decompress);
//     using var reader2 = new TarReader(gz2);
//     while (await reader2.GetNextEntryAsync() is { } entry2)
//     {
//         Console.WriteLine($"  Entry: {entry2.Name}, UID={entry2.Uid}");
//     }
//     Console.WriteLine("  [OK] No exception (may be fixed in your .NET version).");
// }
// catch (OverflowException ex2)
// {
//     Console.WriteLine($"  [BUG] OverflowException on statuses@1.5.0: {ex2.Message}");
//     Console.WriteLine();
//     Console.WriteLine("  This package has GNU binary-encoded UID fields:");
//     Console.WriteLine("    PaxHeader/package/HISTORY.md — UID hex: 80 00 00 00 b6 5a 65 38");
//     Console.WriteLine("    Decoded UID: 3,059,377,464 (> Int32.MaxValue = 2,147,483,647)");
// }

Console.WriteLine("  (Uncomment the code block above to test with the real npm package)");
Console.WriteLine();

// ──────────────────────────────────────────────────────────────────────────
// Helper: build a synthetic .tar.gz with GNU binary-encoded UID
// ──────────────────────────────────────────────────────────────────────────

static byte[] CreateSyntheticTarball()
{
    // Tar header layout (512 bytes per entry):
    //   [0..100)   name
    //   [100..108) mode     (octal)
    //   [108..116) uid      (octal or GNU binary)  ← overflow field
    //   [116..124) gid      (octal or GNU binary)  ← overflow field
    //   [124..136) size     (octal)
    //   [136..148) mtime    (octal)
    //   [148..156) checksum
    //   [156]      typeflag ('0' = regular file)
    //   [257..263) magic    ("ustar\0")
    //   [263..265) version  ("00")

    using var tarMs = new MemoryStream();

    var content = Encoding.UTF8.GetBytes(
        """{"name":"overflow-test","version":"1.0.0"}""");

    // UID 0xB65A6538 = 3,059,377,464 — same value found in statuses@1.5.0
    WriteTarEntry(tarMs, "package/package.json", content, uid: 0xB65A6538);

    // End-of-archive marker: two 512-byte zero blocks
    tarMs.Write(new byte[1024]);

    // Gzip compress
    var tarBytes = tarMs.ToArray();
    using var gzMs = new MemoryStream();
    using (var gzStream = new GZipStream(gzMs, CompressionLevel.Optimal, leaveOpen: true))
    {
        gzStream.Write(tarBytes);
    }
    return gzMs.ToArray();
}

static void WriteTarEntry(Stream stream, string name, byte[] content, long uid)
{
    var header = new byte[512];

    // Name
    Encoding.ASCII.GetBytes(name).AsSpan(0, Math.Min(name.Length, 100))
        .CopyTo(header);

    // Mode: 0644
    WriteOctal(header.AsSpan(100, 8), 0x1A4);

    // UID — GNU binary encoding: high bit set, big-endian value in remaining 7 bytes
    // This is the encoding that triggers the OverflowException in .NET's TarReader.
    WriteGnuBinary(header.AsSpan(108, 8), uid);

    // GID: 0 (octal — won't overflow)
    WriteOctal(header.AsSpan(116, 8), 0);

    // Size
    WriteOctal(header.AsSpan(124, 12), content.Length);

    // Mtime
    WriteOctal(header.AsSpan(136, 12), DateTimeOffset.UtcNow.ToUnixTimeSeconds());

    // Typeflag: regular file
    header[156] = (byte)'0';

    // USTAR magic + version
    "ustar\0"u8.CopyTo(header.AsSpan(257));
    header[263] = (byte)'0';
    header[264] = (byte)'0';

    // Checksum: sum of all header bytes, with checksum field treated as spaces
    header.AsSpan(148, 8).Fill((byte)' ');
    long cksum = 0;
    foreach (var b in header) cksum += b;
    WriteOctal(header.AsSpan(148, 7), cksum);
    header[155] = (byte)' ';

    stream.Write(header);
    stream.Write(content);

    // Pad to 512-byte boundary
    var pad = (512 - content.Length % 512) % 512;
    if (pad > 0) stream.Write(new byte[pad]);
}

static void WriteOctal(Span<byte> field, long value)
{
    var octal = Convert.ToString(value, 8).PadLeft(field.Length - 1, '0');
    for (var i = 0; i < octal.Length && i < field.Length - 1; i++)
        field[i] = (byte)octal[i];
    field[^1] = 0; // null terminator
}

static void WriteGnuBinary(Span<byte> field, long value)
{
    // GNU binary encoding: first byte = 0x80, remaining bytes = big-endian value
    field[0] = 0x80;
    for (var i = field.Length - 1; i >= 1; i--)
    {
        field[i] = (byte)(value & 0xFF);
        value >>= 8;
    }
}

Expected output (on an affected .NET version)

=== Part 1: Synthetic tarball (GNU binary UID) ===

  Created 135-byte .tgz in memory
  [BUG] OverflowException: Arithmetic operation resulted in an overflow.
  Stack: at System.Formats.Tar.TarHelpers.ParseOctal[T](ReadOnlySpan`1 buffer)

=== Part 2: Real npm package (statuses@1.5.0) ===

  (Uncomment the code block above to test with the real npm package)

When Part 2 is uncommented:

=== Part 2: Real npm package (statuses@1.5.0) ===

  Downloading https://registry.npmjs.org/statuses/-/statuses-1.5.0.tgz ...
  Downloaded 5482 bytes
  [BUG] OverflowException on statuses@1.5.0: Arithmetic operation resulted in an overflow.

  This package has GNU binary-encoded UID fields:
    PaxHeader/package/HISTORY.md — UID hex: 80 00 00 00 b6 5a 65 38
    Decoded UID: 3,059,377,464 (> Int32.MaxValue = 2,147,483,647)

Expected behavior

TarReader reads the archive successfully.

Actual behavior

TarReader throws System.OverflowException ("Arithmetic operation resulted in an overflow.") from TarHelpers.ParseOctal<T>.

Regression?

No response

Known Workarounds

N/A

Configuration

.NET 10.0

Other information

No response
