Fix ZipArchive Update removing data descriptors #126447
bwinsley wants to merge 26 commits into dotnet:main
Pull request overview
Adds a regression test in System.IO.Compression to validate ZIP archives created in “streaming” mode (non-seekable stream, data-descriptor bit set) remain structurally valid after reopening in ZipArchiveMode.Update and adding an entry.
Changes:
- Added System.Buffers.Binary usage to inspect ZIP bytes.
- Added a new [Theory] test that creates a streaming ZIP, validates data-descriptor markers, updates the archive by adding an entry, then validates readability/structure again.
… in compressed data
…ditions into functions
@dotnet-policy-service agree company="Displayr"
One could argue that entries with the DataDescriptor bit aren't sequentially readable anyway: the size of the entry is not known up front (the size in the local header is 0), so you don't know where the data descriptor is, and scanning for its magic-bytes signature is not reliable. I agree that we should preserve the bit and the data descriptor, though.
ComputeEntryEndOffsets();
for (int i = 0; i < _entries.Count; i++)
Can we avoid iterating over _entries twice?
Unfortunately not without heavy refactoring, as ComputeEntryEndOffsets() iterates backwards while this loop on L371 iterates forwards and needs the values from ComputeEntryEndOffsets(). This should still be O(n) anyway, and ComputeEntryEndOffsets() is not an expensive function.
I mean, you can do something like:
_entries[i].EndOfLocalEntryData = i < _entries.Count - 1
    ? _entries[i + 1].OffsetOfLocalHeader
    : _centralDirectoryStart;
It might actually make sense to do this when the central directory is first read. That way we don't need to care about OriginallyInTheArchive.
Excellent pickup. I've moved it to where we read the central directory now.
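The suggestion in this thread can be sketched as a single forward pass. This is a minimal illustration, not the code in the PR: EntrySketch and EndOffsetSketch are hypothetical stand-ins for the relevant ZipArchiveEntry fields, and the sketch assumes the entries are sorted by local header offset and stored contiguously before the central directory.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the two ZipArchiveEntry fields discussed above.
public sealed class EntrySketch
{
    public long OffsetOfLocalHeader { get; set; }
    public long EndOfLocalEntryData { get; set; }
}

public static class EndOffsetSketch
{
    // Assumption: entries are sorted by OffsetOfLocalHeader, so each entry
    // ends where the next local header begins, and the last entry ends
    // where the central directory starts.
    public static void AssignEndOffsets(List<EntrySketch> entries, long centralDirectoryStart)
    {
        for (int i = 0; i < entries.Count; i++)
        {
            entries[i].EndOfLocalEntryData = i < entries.Count - 1
                ? entries[i + 1].OffsetOfLocalHeader
                : centralDirectoryStart;
        }
    }
}
```

Because the boundary comes from the neighbouring offset rather than from the compressed size, any trailing data descriptor is included without reading the stream.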
…scriptor rewrites
… data descriptor rewrites" This reverts commit 1fc1762.
… rewrites in ZipArchive Update mode
@rzikm @alinpahontu2912 I have implemented the suggestions where possible and left replies where not. Please let me know if anything is missing or needs updating!
  entriesToWrite = new(_entries.Count);
- foreach (ZipArchiveEntry entry in _entries)
+ for (int i = 0; i < _entries.Count; i++)
Same question about changing this foreach loop.
  // return value is true if we allocated an extra field for 64 bit headers, un/compressed size
- private async Task<bool> WriteLocalFileHeaderAsync(bool isEmptyFile, bool forceWrite, CancellationToken cancellationToken)
+ private async Task<bool> WriteLocalFileHeaderAsync(bool isEmptyFile, bool forceWrite, CancellationToken cancellationToken, bool preserveDataDescriptor = false)
Usually the CancellationToken is the last parameter. What do we think about this, @rzikm?
// directly into the header, so a data descriptor is not needed.
if (isEmptyFile || _archive.ArchiveStream.CanSeek)
{
    if (preserveDataDescriptor && (_generalPurposeBitFlag & BitFlagValues.DataDescriptor) != 0)
Won't (_generalPurposeBitFlag & BitFlagValues.DataDescriptor) != 0 always be true if preserveDataDescriptor is true?
Fix ZipArchive Update mode corruption when entries have data descriptors (bit 3)
Fixes #126344
Description
When opening an existing ZIP in ZipArchiveMode.Update and disposing (which triggers the write-back), the offset calculations in WriteFileCalculateOffsets and the stream seeking in WriteLocalFileHeaderAndDataIfNeeded / WriteLocalFileHeaderAndDataIfNeededAsync did not account for the data descriptor bytes (12–24 bytes depending on format) that follow compressed data when general purpose bit flag bit 3 is set. This caused new or shifted entries to overwrite data descriptors, corrupting the archive.
This commonly affects archives created on non-seekable streams (which always set bit 3) or by external tools such as Java's ZipOutputStream, Azure blob storage SDKs, and similar.
This is a .NET 10 regression introduced by PR #102704's selective-rewrite optimization.
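To make the 12–24 byte range concrete, the sketch below computes a data descriptor's size from the two things that vary in the ZIP format: whether the optional 4-byte signature (0x08074b50) is present, and whether the two size fields are 4 bytes each or 8 bytes each (ZIP64). DataDescriptorSketch and its members are hypothetical names used only for illustration; they are not part of System.IO.Compression.

```csharp
using System;

// Hypothetical helper for illustration only; not part of System.IO.Compression.
public static class DataDescriptorSketch
{
    // Optional marker that may precede the descriptor fields.
    public const uint Signature = 0x08074B50;

    // Layout: [signature (4, optional)] CRC-32 (4) compressed size (4 or 8)
    // uncompressed size (4 or 8).
    public static int GetSize(bool hasSignature, bool isZip64)
    {
        int sizeField = isZip64 ? 8 : 4;
        return (hasSignature ? 4 : 0) + 4 + 2 * sizeField;
    }

    // First byte after the entry when a data descriptor follows the data.
    public static long GetEndOfEntry(long offsetOfCompressedData, long compressedLength,
                                     bool hasSignature, bool isZip64)
        => offsetOfCompressedData + compressedLength + GetSize(hasSignature, isZip64);
}
```

An end offset computed as the offset of the compressed data plus the compressed length therefore falls 12 to 24 bytes short of where the entry really ends whenever bit 3 is set.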
Changes
Offset calculation fix (ZipArchive.cs, ZipArchive.Async.cs)
- ComputeEntryEndOffsets(): New method that precomputes EndOfLocalEntryData for each originally-in-archive entry in a single O(n) reverse pass. Since _entries is sorted by local header offset, each entry's end boundary is the next original entry's OffsetOfLocalHeader, or _centralDirectoryStart for the last entry. This naturally accounts for any trailing data (data descriptors, padding, etc.) without any stream I/O.
- WriteFileCalculateOffsets: Now uses entry.EndOfLocalEntryData instead of GetOffsetOfCompressedData() + CompressedLength, which did not include data descriptor bytes.
- WriteFile() and WriteFileAsync() call ComputeEntryEndOffsets() before processing entries.
Metadata-only seek fix (ZipArchiveEntry.cs, ZipArchiveEntry.Async.cs)
- The metadata-only path seeks to the absolute EndOfLocalEntryData position instead of using a relative seek by _compressedSize. This single seek correctly advances past both compressed data and any trailing data descriptor, with zero stream reads.
Bit 3 (data descriptor flag) handling
- WriteLocalFileHeaderInitialize now clears bit 3 only when the header is actually being written (not when it's skipped for unchanged entries) and only for seekable/empty-file paths. This means unchanged entries that skip header writing preserve their original bit 3 flag, keeping the local header, central directory, and on-disk data descriptor consistent; no save/restore or seek-back patching is needed.
Testing
Three new regression tests in zip_UpdateTests.cs, all parameterized on bool async via [Theory]/[MemberData] to exercise the sync Dispose and async DisposeAsync code paths:
- Update_DataDescriptorSignature_IsCorrectlyWrittenAndPreserved: Creates a 3-entry archive via a non-seekable stream (forces bit 3 / data descriptors) with NoCompression, reopens in Update mode, adds a new entry. Performs a structural binary walk of the updated archive: EOCD → central directory → local file headers, verifying bit 3 flags on original entries and data descriptor signatures at the correct computed byte offsets.
- Update_DataDescriptorWithDeletedEntry_PreservesArchive: Creates a 5-entry archive via a non-seekable stream, reopens in Update mode, deletes a middle entry. Verifies the remaining 4 entries are readable with correct data. Exercises offset recalculation in ComputeEntryEndOffsets when entries with data descriptors are removed and subsequent entries must shift.
- Update_DataDescriptorWithMetadataOnlyChange_PreservesArchive: Creates a 3-entry archive via a non-seekable stream, reopens in Update mode, changes LastWriteTime on the middle entry without opening its stream (exercises the metadata-only rewrite path), adds a new entry. Reopens in Read mode and verifies all original entries' data intact, the metadata change was preserved, and the new entry is correct.
Test results
All 1739 tests pass with 0 errors, 0 failures (1733 existing + 6 new test runs).
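The setup these tests rely on can be sketched as follows. This is an illustrative sketch, not the test code from the PR: NonSeekableWriteStream and StreamingZipSketch are hypothetical helpers, and the only format facts assumed are that a local file header starts with the signature 0x04034b50 and stores its general purpose bit flag as a little-endian 16-bit value at offset 6.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical write-only, non-seekable wrapper used to force streaming mode.
public sealed class NonSeekableWriteStream : Stream
{
    private readonly Stream _inner;
    public NonSeekableWriteStream(Stream inner) => _inner = inner;
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);
}

public static class StreamingZipSketch
{
    // Writes a one-entry archive through a non-seekable stream and returns its bytes.
    public static byte[] CreateStreamingZip(string name, string content)
    {
        using var backing = new MemoryStream();
        using (var archive = new ZipArchive(new NonSeekableWriteStream(backing), ZipArchiveMode.Create, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.CreateEntry(name, CompressionLevel.NoCompression);
            using Stream s = entry.Open();
            s.Write(Encoding.UTF8.GetBytes(content));
        }
        return backing.ToArray();
    }

    // Checks bit 3 (0x0008) of the first local file header's flags.
    public static bool FirstEntryHasDataDescriptorBit(byte[] zip)
    {
        uint signature = BinaryPrimitives.ReadUInt32LittleEndian(zip.AsSpan(0, 4));
        if (signature != 0x04034B50) throw new InvalidDataException("Not a local file header.");
        ushort flags = BinaryPrimitives.ReadUInt16LittleEndian(zip.AsSpan(6, 2));
        return (flags & 0x0008) != 0;
    }

    // Reopens the archive in Update mode on a seekable stream and adds one entry.
    public static byte[] AddEntryInUpdateMode(byte[] zip, string name, string content)
    {
        using var ms = new MemoryStream();
        ms.Write(zip);
        ms.Position = 0;
        using (var archive = new ZipArchive(ms, ZipArchiveMode.Update, leaveOpen: true))
        {
            using Stream s = archive.CreateEntry(name).Open();
            s.Write(Encoding.UTF8.GetBytes(content));
        }
        return ms.ToArray();
    }
}
```

Calling CreateStreamingZip, then AddEntryInUpdateMode, and finally reopening the result in Read mode mirrors the create, update, verify shape described above; whether the final read succeeds depends on the runtime including this fix.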