Description
Summary
System.Formats.Tar.TarReader throws System.OverflowException when reading
tar archives that contain GNU binary-encoded UID or GID values exceeding
Int32.MaxValue (2,147,483,647). GNU tar uses a binary encoding for numeric
fields that don't fit in the standard 8-byte octal representation: the high bit
of the first byte is set, and the remaining bytes form a big-endian integer.
.NET's TarReader parses these into int, causing a checked arithmetic
overflow when the decoded value exceeds Int32.MaxValue.
I ran into this while writing code that unpacks npm packages. For example, statuses@1.5.0 ships a tarball with GNU binary-encoded UID fields, some of which decode to values greater than Int32.MaxValue (2^31 - 1), triggering this bug.
This is not a one-off: it happens for many packages downloaded from npm.
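As a rough illustration of why GNU tar falls back to binary encoding at all, the sketch below computes the largest value an octal field can hold. It assumes the last byte of the field is a NUL terminator, as the WriteOctal helper in the repro does; MaxOctalValue and NeedsGnuBinary are hypothetical names, not part of System.Formats.Tar.

```csharp
using System;

// Largest value an octal field of the given width can hold, assuming the final
// byte is a NUL terminator (so an 8-byte field carries 7 octal digits).
static long MaxOctalValue(int fieldLength) => (1L << (3 * (fieldLength - 1))) - 1;

// True when the value cannot be written as octal digits in a field of that width.
static bool NeedsGnuBinary(long value, int fieldLength) => value > MaxOctalValue(fieldLength);

Console.WriteLine(MaxOctalValue(8));                  // 2097151
Console.WriteLine(NeedsGnuBinary(1000, 8));           // False
Console.WriteLine(NeedsGnuBinary(3_059_377_464, 8));  // True
```

Under that assumption, any UID above 2,097,151 needs the binary form, which is far below the Int32.MaxValue threshold at which TarReader fails.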
Impact
Any .NET application that uses TarReader to extract npm tarballs (or any other tar archive with GNU binary-encoded headers) may hit this unexpectedly. The API offers no way around the failure: the offending entry cannot be skipped, and TarReader cannot be configured to ignore UID/GID overflow.
Real-world affected package
statuses@1.5.0 from the npm registry triggers this bug. It is a transitive
dependency of express, koa, send, finalhandler, and many other popular
packages. Its tarball uses GNU binary-encoded UID fields (byte 0x80 prefix):
Entry                              UID (decoded)   Overflow?
─────────────────────────────────  ──────────────  ─────────
PaxHeader/package/HISTORY.md       3,059,377,464   YES (> Int32.MaxValue)
package/HISTORY.md                 4,284,114,232   YES (> Int32.MaxValue)
PaxHeader/package/codes.json       3,227,149,624   YES (> Int32.MaxValue)
package/package.json               5,924,152       no
package/index.js                   526,017,848     no
The UID hex bytes for an overflowing entry: 80 00 00 00 b6 5a 65 38 — the
0x80 prefix signals GNU binary encoding, and the decoded value 0xB65A6538
= 3,059,377,464 exceeds Int32.MaxValue.
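A table like the one above can be produced without TarReader by walking the raw 512-byte headers. The following is a minimal sketch, not the runtime's code: ListUids, DecodeField, and TryReadBlock are hypothetical helpers, the input is assumed to be an already decompressed tar stream, checksums are not validated, and binary-encoded values are assumed to be non-negative and to fit in 63 bits.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Usage: one synthetic header with the UID bytes quoted above, then zero blocks.
var archive = new byte[1536];
Encoding.ASCII.GetBytes("package/HISTORY.md").CopyTo(archive, 0);
new byte[] { 0x80, 0x00, 0x00, 0x00, 0xB6, 0x5A, 0x65, 0x38 }.CopyTo(archive, 108);
Encoding.ASCII.GetBytes("00000000000").CopyTo(archive, 124); // size = 0

foreach (var (entryName, entryUid) in ListUids(new MemoryStream(archive)))
{
    string note = entryUid > int.MaxValue ? "overflows int" : "fits in int";
    Console.WriteLine($"{entryName}: {entryUid} ({note})");
}

// Returns the name and UID of every header in an uncompressed tar stream.
static List<(string Name, long Uid)> ListUids(Stream tar)
{
    var entries = new List<(string Name, long Uid)>();
    var block = new byte[512];
    while (TryReadBlock(tar, block))
    {
        // An all-zero block marks the end of the archive.
        if (Array.TrueForAll(block, b => b == 0)) break;

        Span<byte> nameField = block.AsSpan(0, 100);
        int nul = nameField.IndexOf((byte)0);
        string name = Encoding.ASCII.GetString(nul >= 0 ? nameField[..nul] : nameField);
        long uid = DecodeField(block.AsSpan(108, 8));
        long size = DecodeField(block.AsSpan(124, 12));
        entries.Add((name, uid));

        // Skip the entry's data, which is padded to a multiple of 512 bytes.
        for (long remaining = (size + 511) / 512; remaining > 0; remaining--)
        {
            if (!TryReadBlock(tar, block)) return entries;
        }
    }
    return entries;
}

// Decodes a numeric header field: GNU binary if the high bit of the first byte
// is set, otherwise octal digits with optional padding.
static long DecodeField(ReadOnlySpan<byte> field)
{
    if ((field[0] & 0x80) != 0)
    {
        long value = 0;
        for (int i = 1; i < field.Length; i++) value = (value << 8) | field[i];
        return value;
    }
    long octal = 0;
    bool sawDigit = false;
    foreach (byte b in field)
    {
        if (b >= '0' && b <= '7') { octal = octal * 8 + (b - '0'); sawDigit = true; }
        else if (sawDigit) break; // NUL or space terminates the number
    }
    return octal;
}

// Fills the block completely; returns false if the stream ends first.
static bool TryReadBlock(Stream stream, byte[] block)
{
    int read = 0;
    while (read < block.Length)
    {
        int n = stream.Read(block, read, block.Length - read);
        if (n == 0) return false;
        read += n;
    }
    return true;
}
```

For a real package, the same helper could be pointed at a GZipStream wrapped around the .tgz bytes, since it only reads forward and never seeks.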
Root Cause
In System.Formats.Tar, the TarHelpers.ParseOctal<T>() method parses numeric
fields from tar headers. When it encounters GNU binary encoding (first byte has
high bit 0x80 set), it decodes the remaining bytes as a big-endian integer and
converts to the target type T. For UID and GID fields, T is int (signed
32-bit).
The problem: GNU binary encoding can represent values up to 2^56 - 1 in an
8-byte field (7 data bytes after the 0x80 marker). When the decoded value
exceeds Int32.MaxValue (2,147,483,647), the checked conversion to int
throws OverflowException.
Real-world example from statuses@1.5.0:
- UID field bytes: 80 00 00 00 B6 5A 65 38
- 0x80 prefix → GNU binary encoding
- Remaining 7 bytes → 0x000000B65A6538 = 3,059,377,464
- (int)3059377464 → OverflowException (> 2,147,483,647)
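The conversion step can be reproduced in isolation. This sketch only illustrates the language-level behavior of a checked narrowing cast. It is not the code inside TarHelpers, and the comparison with uint and unchecked casts is included for contrast rather than as a proposed fix.

```csharp
using System;

long decoded = 0xB65A6538; // 3,059,377,464, the value decoded from the UID field

Console.WriteLine(decoded > int.MaxValue); // True

try
{
    // A checked narrowing conversion throws when the value does not fit in int.
    int uid = checked((int)decoded);
    Console.WriteLine(uid);
}
catch (OverflowException ex)
{
    Console.WriteLine($"checked cast: {ex.GetType().Name}");
}

// An unchecked cast silently wraps around to a negative number.
Console.WriteLine(unchecked((int)decoded)); // -1235589832

// The same value fits in an unsigned 32-bit integer or a long.
Console.WriteLine((uint)decoded); // 3059377464
```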
Environment
Reproduction Steps
Minimal Reproduction
Single-file C# app. Run with dotnet run in a .NET 9 or later project,
or paste into any top-level-statements Program.cs. No external files needed.
// Repro: System.Formats.Tar OverflowException on GNU binary-encoded UID
//
// Run: dotnet run
// Requires: .NET 9+ (System.Formats.Tar is inbox)
using System.Formats.Tar;
using System.IO.Compression;
using System.Text;
// ──────────────────────────────────────────────────────────────────────────
// Part 1: Synthetic tarball — GNU binary-encoded UID > Int32.MaxValue
// ──────────────────────────────────────────────────────────────────────────
//
// Builds a valid .tar.gz in memory with GNU binary-encoded UID = 0xB65A6538
// (3,059,377,464) — matching the real encoding found in statuses@1.5.0.
// No network, no files on disk.
Console.WriteLine("=== Part 1: Synthetic tarball (GNU binary UID) ===");
Console.WriteLine();
var syntheticTgz = CreateSyntheticTarball();
Console.WriteLine($" Created {syntheticTgz.Length}-byte .tgz in memory");
try
{
    using var ms = new MemoryStream(syntheticTgz);
    using var gz = new GZipStream(ms, CompressionMode.Decompress);
    using var reader = new TarReader(gz);
    while (await reader.GetNextEntryAsync() is { } entry)
    {
        Console.WriteLine($" Entry: {entry.Name}, UID={entry.Uid}");
    }
    Console.WriteLine(" [OK] No exception (may be fixed in your .NET version).");
}
catch (OverflowException ex)
{
    Console.WriteLine($" [BUG] OverflowException: {ex.Message}");
    Console.WriteLine($" Stack: {ex.StackTrace?.Split('\n').FirstOrDefault()?.Trim()}");
}
Console.WriteLine();
// ──────────────────────────────────────────────────────────────────────────
// Part 2: Real npm package — statuses@1.5.0 (transitive dep of express)
// ──────────────────────────────────────────────────────────────────────────
//
// Uncomment to download and reproduce against the real npm package.
// statuses@1.5.0 has GNU binary-encoded UID fields with values > Int32.MaxValue.
Console.WriteLine("=== Part 2: Real npm package (statuses@1.5.0) ===");
Console.WriteLine();
// var httpClient = new HttpClient();
// var tarballUrl = "https://registry.npmjs.org/statuses/-/statuses-1.5.0.tgz";
// Console.WriteLine($" Downloading {tarballUrl} ...");
// var tgzBytes = await httpClient.GetByteArrayAsync(tarballUrl);
// Console.WriteLine($" Downloaded {tgzBytes.Length} bytes");
// try
// {
//     using var ms2 = new MemoryStream(tgzBytes);
//     using var gz2 = new GZipStream(ms2, CompressionMode.Decompress);
//     using var reader2 = new TarReader(gz2);
//     while (await reader2.GetNextEntryAsync() is { } entry2)
//     {
//         Console.WriteLine($" Entry: {entry2.Name}, UID={entry2.Uid}");
//     }
//     Console.WriteLine(" [OK] No exception (may be fixed in your .NET version).");
// }
// catch (OverflowException ex2)
// {
//     Console.WriteLine($" [BUG] OverflowException on statuses@1.5.0: {ex2.Message}");
//     Console.WriteLine();
//     Console.WriteLine(" This package has GNU binary-encoded UID fields:");
//     Console.WriteLine(" PaxHeader/package/HISTORY.md — UID hex: 80 00 00 00 b6 5a 65 38");
//     Console.WriteLine(" Decoded UID: 3,059,377,464 (> Int32.MaxValue = 2,147,483,647)");
// }
Console.WriteLine(" (Uncomment the code block above to test with the real npm package)");
Console.WriteLine();
// ──────────────────────────────────────────────────────────────────────────
// Helper: build a synthetic .tar.gz with GNU binary-encoded UID
// ──────────────────────────────────────────────────────────────────────────
static byte[] CreateSyntheticTarball()
{
    // Tar header layout (512 bytes per entry):
    //   [0..100)   name
    //   [100..108) mode (octal)
    //   [108..116) uid (octal or GNU binary)  ← overflow field
    //   [116..124) gid (octal or GNU binary)  ← overflow field
    //   [124..136) size (octal)
    //   [136..148) mtime (octal)
    //   [148..156) checksum
    //   [156]      typeflag ('0' = regular file)
    //   [257..263) magic ("ustar\0")
    //   [263..265) version ("00")
    using var tarMs = new MemoryStream();
    var content = Encoding.UTF8.GetBytes(
        """{"name":"overflow-test","version":"1.0.0"}""");
    // UID 0xB65A6538 = 3,059,377,464 — same value found in statuses@1.5.0
    WriteTarEntry(tarMs, "package/package.json", content, uid: 0xB65A6538);
    // End-of-archive marker: two 512-byte zero blocks
    tarMs.Write(new byte[1024]);
    // Gzip compress
    var tarBytes = tarMs.ToArray();
    using var gzMs = new MemoryStream();
    using (var gzStream = new GZipStream(gzMs, CompressionLevel.Optimal, leaveOpen: true))
    {
        gzStream.Write(tarBytes);
    }
    return gzMs.ToArray();
}
static void WriteTarEntry(Stream stream, string name, byte[] content, long uid)
{
    var header = new byte[512];
    // Name
    Encoding.ASCII.GetBytes(name).AsSpan(0, Math.Min(name.Length, 100))
        .CopyTo(header);
    // Mode: 0644
    WriteOctal(header.AsSpan(100, 8), 0x1A4);
    // UID — GNU binary encoding: high bit set, big-endian value in remaining 7 bytes
    // This is the encoding that triggers the OverflowException in .NET's TarReader.
    WriteGnuBinary(header.AsSpan(108, 8), uid);
    // GID: 0 (octal — won't overflow)
    WriteOctal(header.AsSpan(116, 8), 0);
    // Size
    WriteOctal(header.AsSpan(124, 12), content.Length);
    // Mtime
    WriteOctal(header.AsSpan(136, 12), DateTimeOffset.UtcNow.ToUnixTimeSeconds());
    // Typeflag: regular file
    header[156] = (byte)'0';
    // USTAR magic + version
    "ustar\0"u8.CopyTo(header.AsSpan(257));
    header[263] = (byte)'0';
    header[264] = (byte)'0';
    // Checksum: sum of all header bytes, with checksum field treated as spaces
    header.AsSpan(148, 8).Fill((byte)' ');
    long cksum = 0;
    foreach (var b in header) cksum += b;
    WriteOctal(header.AsSpan(148, 7), cksum);
    header[155] = (byte)' ';
    stream.Write(header);
    stream.Write(content);
    // Pad to 512-byte boundary
    var pad = (512 - content.Length % 512) % 512;
    if (pad > 0) stream.Write(new byte[pad]);
}
static void WriteOctal(Span<byte> field, long value)
{
    var octal = Convert.ToString(value, 8).PadLeft(field.Length - 1, '0');
    for (var i = 0; i < octal.Length && i < field.Length - 1; i++)
        field[i] = (byte)octal[i];
    field[^1] = 0; // null terminator
}
static void WriteGnuBinary(Span<byte> field, long value)
{
    // GNU binary encoding: first byte = 0x80, remaining bytes = big-endian value
    field[0] = 0x80;
    for (var i = field.Length - 1; i >= 1; i--)
    {
        field[i] = (byte)(value & 0xFF);
        value >>= 8;
    }
}
Expected output
=== Part 1: Synthetic tarball (GNU binary UID) ===
Created 135-byte .tgz in memory
[BUG] OverflowException: Arithmetic operation resulted in an overflow.
Stack: at System.Formats.Tar.TarHelpers.ParseOctal[T](ReadOnlySpan`1 buffer)
=== Part 2: Real npm package (statuses@1.5.0) ===
(Uncomment the code block above to test with the real npm package)
When Part 2 is uncommented:
=== Part 2: Real npm package (statuses@1.5.0) ===
Downloading https://registry.npmjs.org/statuses/-/statuses-1.5.0.tgz ...
Downloaded 5482 bytes
[BUG] OverflowException on statuses@1.5.0: Arithmetic operation resulted in an overflow.
This package has GNU binary-encoded UID fields:
PaxHeader/package/HISTORY.md — UID hex: 80 00 00 00 b6 5a 65 38
Decoded UID: 3,059,377,464 (> Int32.MaxValue = 2,147,483,647)
Expected behavior
TarReader reads the archive successfully, without throwing.
Actual behavior
TarReader throws System.OverflowException while reading the entry header.
Regression?
No response
Known Workarounds
N/A
Configuration
.NET 10.0
Other information
No response