Add Flush to ZipArchive #24149

twsouthwick · 2017-11-15T17:58:36Z

When creating a package on .NET Core, you have to call Dispose or Close to get the contents written out to the underlying stream. This works on .NET Framework, but fails on .NET Core:

[Fact]
public void FlushSavesPackageContents()
{
   var uri = new Uri("/something", UriKind.Relative);
   var contentType = "something/other";

   byte[] CreatePackage()
   {
       using (var ms = new MemoryStream())
       {
           using (var package = Package.Open(ms, FileMode.Create))
           {
               package.CreatePart(uri, contentType);

               Assert.Empty(ms.ToArray());

               package.Flush();

               // All data should be written after calling flush
               return ms.ToArray();
           }
       }
   }

   var bytes = CreatePackage();

   Assert.NotEmpty(bytes);

   using (var ms = new MemoryStream(bytes))
   using (var package = Package.Open(ms))
   {
       Assert.True(package.PartExists(uri));
       Assert.Equal(contentType, package.GetPart(uri).ContentType);
   }
}

twsouthwick · 2017-11-15T18:01:28Z

It appears that this is due to not flushing the underlying zip. In .NET Framework it is flushed, but not in .NET Core. I'll try adding that in and see if it works.

twsouthwick · 2017-11-15T19:17:31Z

@ianhays ZipArchive does not have a Flush method. Is there a way to flush the current archive state to a stream, or is this part of the issue to get streaming support for compression?

ianhays · 2017-11-15T20:52:17Z

I'm not sure why ZipArchive in netcore doesn't have a flush method. It was ported to CoreFX without one. It could be that it was removed since it was internal-only in netfx and not used in netcore.

We could probably add it as public, at least for ZipArchiveMode.Create. I'm not sure how that would affect being able to open some entries though. We would need to add lots of tests to first verify the netfx Flushing behavior - which may be tricky since it's not exposed through ZipArchive - then try to match that in core. At that point a streaming API may be the better solution.

twsouthwick · 2017-11-15T21:29:28Z

Yeah - I took a shot at just calling the WriteFile method (which is what the Dispose method calls) in lieu of a flush and it caused a lot of problems. It seems to close streams, so it expects to be called in the dispose. Exposing it would be nice, but I agree that there will be a need to add a bunch of tests for it.

We seem to be finding many issues with System.IO.Packaging in the DocumentFormat.OpenXml library that would be fixed with the streaming API...

daniel-white · 2018-01-18T20:14:14Z

Any progress on this? we use DocumentFormat.OpenXml and it has intentional bugs that keep parity between .net fx and .net core. see dotnet/Open-XML-SDK#380

ianhays · 2018-01-18T20:57:00Z

There hasn't been any progress that I'm aware of, no. I've updated the title and the tags with the current state of the issue and marked it as up-for-grabs.

The work item here is to plan an API addition of Flush to ZipArchive. To do that, a formal proposal will have to be made and the design questions in the comments above should be answered.

ghost · 2018-07-21T19:37:21Z

So I have just hit this issue too. Porting some code from FX to .Net. Need ZipPackage to be able to flush the stream. Is there any workaround ? Else, I can take on adding the API to ZipArchive.

odhanson · 2018-07-23T13:31:40Z

Sorry, replaced my github account. So the message above is from me.

I have made the changes to add a Flush API and added tests to verify. I will follow through the procedures for getting the API and change approved.

SinxHe · 2018-08-10T08:37:29Z

when will the flush function will be added?

odhanson · 2018-08-11T15:24:55Z

@SinxHe , unfortunately fixing this issue is a bit more tricky than I initially anticipated. As I have managed to work around this issue for now, I am not sure when I will be able to get back to work on a fix for this.

twsouthwick · 2018-08-20T16:34:50Z

@odhanson Thanks for taking a look at this. Can you share a fork with your current progress so someone else can build off of what you've tried?

odhanson · 2018-08-20T17:09:24Z

@twsouthwick

Sure. I have made a fork under my account:
https://github.com/odhanson/corefx/tree/dev/Flush

There are two commits, the first adds the API and some tests. The second implements the API, and adds some tests that failed my implementation :(. So the good news is, that we do have a bunch of tests to base off of.

MathewRimmington · 2020-06-25T16:55:02Z

Hi, I'm wondering if there has been any progress on this issue? I'm seeing an out of memory exception when trying to create a huge spreadsheet. As the OpenXmlWriter is writing contents the memory usage continues to creep up, but there is currently no way to flush the data so the entire contents of the sheet remain in memory.

I'm also curious to know what the work around is.

carlossanlop · 2021-11-18T16:11:41Z

Thank you all for the patience. We are planning work for .NET 7 and fixing the bugs affecting System.IO.Packaging is most likely going to be among our top priorities for the Compression area specifically.

@odhanson Thank you for the code you provided: https://github.com/odhanson/corefx/tree/dev/Flush . It was extremely helpful!

I debugged it to understand the problem better (I adapted them to dotnet/runtime of course) and I would like to share the following feedback:

The Flush method is writing the entries into the target stream via WriteFile(), just like Dispose, but without disposing the internal streams afterwards. This means we can call Flush inside Dispose to avoid the code duplication. It's a simple enough change.
I noticed that you added a line to clear the entries at the end of WriteFile. I assume you added this after running the unit tests and noticing that, after calling Flush(), the file was getting the entries written twice.
To fix this problem, we need to add logic to detect flushing without disposal separately for Update and for Create. This wouldn't be a trivial change.

The logic to handle creation and update is too convoluted and intermixed. So before making any Flush-related changes, I'm thinking we could potentially benefit from refactoring the code using the strategy pattern, similarly to what we did for FileStream in .NET 6.

After we refactor the code, we could implement Flush more easily, and have clear and separate logic for creation and for updating.

I also need to keep in mind Ian's concern shared above:

We could probably add it as public, at least for ZipArchiveMode.Create. I'm not sure how that would affect being able to open some entries though. We would need to add lots of tests to first verify the netfx Flushing behavior - which may be tricky since it's not exposed through ZipArchive - then try to match that in core. At that point a streaming API may be the better solution.

@twsouthwick I'll continue investigating how feasible it is to add Flush and will let you know if/when I have an API proposal.

carlossanlop · 2021-11-21T18:39:19Z

@twsouthwick (and/or anyone else monitoring this issue),

I have some questions that arised after I read this:
https://en.wikipedia.org/wiki/ZIP_(file_format)#Structure

Let's ignore for a moment what .NET Framework is doing. What behavior do you expect from Flush when...

The stream is non-seekable?
The stream is seekable?

Non-seekable stream

Let's say you're working with a stream that is being sent to you from the network.

Deleting

If we delete an entry, then call Flush, we can't really delete the file, so will simply append an updated Central Directory at the end of the stream that would exclude the "deleted" file from the list.
Alternative: we save the location where the old CD starts, then we overwrite the CD with the new one, which would exclude the "deleted" entry from its existing list.

Note: We currently cannot open a ZipArchive with ZipArchiveMode.Update if the stream is non-seekable, but I'm exploring the option of removing the restriction if it's feasible.

Adding

If we add a new file, then call Flush, we append the new entry at the end of the file, then immediately afterwards write an updated Central Directory that contains the new file.
Alternative: we save the location where the old CD starts, then we write the new entry right where the old CD starts (overwriting it, obviously) then we would have to write a new CD after the file.

Seekable stream

Let's say you're working with a zip file loaded from disk using a FileStream:

Deleting

If we delete a file, then call Flush, we could in theory rewrite the whole archive from start to end, ensuring we skip the deleted file and updating the CD list at the end to exclude the deleted file.
Alternative (more complicated): We iterate through the list of entries, find the first one marked as deleted, located the first byte of its header in the archive stream, then start overwriting it with the subsequent entries that are not marked as deleted, until all are rewritten, and finally write the updated CD.

Adding

I think the behavior would be the same as described for non-seekable.

Updating a file entry's stream

Similar to deleting, we mark it as a file whose contents changed, then we either rewrite the whole archive from start to end, or
We find the first modified entry in the CD list and start overwriting everything after that file, including it, then write the updated CD with the new sizes.

ashahabov · 2022-09-15T10:36:39Z

If I understood correctly, the fix is planned for .NET 8 time?😥

exelsior · 2023-12-05T19:53:28Z

Any progress on this?

exelsior · 2023-12-07T22:18:27Z

Hello, could you please recommend third party package with zip flush support?

edwardneal · 2024-05-29T06:55:24Z

I recently submitted #102704, which implements the logic @carlossanlop describes in Flush for seekable streams (although I've admittedly only just seen the comment - wish I'd seen it earlier!) Hopefully this disposes of the more complex case.

The only ZipArchiveMode which can modify a non-seekable stream is Create. The first Flush here is simple, and I'm tempted say we should throw an exception for any subsequent Flush calls:

At that point, we're no longer Creating an archive, we're modifying it
Any modifications we perform are going to be technically valid but strange. Adding a new archive entry is broadly fine, but modifying an archive entry will mean that we abandon the original copy and re-append it again to the file with the updated CD. This'll result in archives which are much larger than they need to be, and which will return a different set of results when streamed (because the abandoned entry will be detected until the CD is parsed.)

ianhays changed the title ~~Calling Flush on System.IO.Packaging.Package does not flush contents~~ Add Flush to ZipArchive Jan 18, 2018

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 5.0 milestone Jan 31, 2020

ericstj added area-System.IO.Compression and removed area-System.IO.Packaging labels Feb 19, 2020

carlossanlop modified the milestones: 5.0.0, Future Jun 18, 2020

carlossanlop added this to To do in System.IO - Compression Jan 15, 2021

carlossanlop mentioned this issue Nov 18, 2021

ZipArchive: Apply strategy pattern depending on ZipArchiveMode #61820

Closed

ghost added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Nov 21, 2021

carlossanlop mentioned this issue Dec 10, 2021

System.IO.Compression work planned for .NET 7 #62658

Closed

28 tasks

jeffhandley modified the milestones: Future, 7.0.0 Jan 9, 2022

jeffhandley modified the milestones: 7.0.0, 8.0.0 Jul 9, 2022

ashahabov mentioned this issue Sep 29, 2022

No support of System.IO.Packaging.Package.Save()/Flush() in .NET Core dotnet/Open-XML-SDK#1211

Closed

ViktorHofer modified the milestones: 8.0.0, Future Jul 12, 2023

ulrichb mentioned this issue Nov 7, 2023

ZipArchive in Update mode loads whole ZIP file into memory #94455

Closed

GerardSmit mentioned this issue Dec 11, 2023

Is it possible to save FileSystem changes to disk? xoofx/zio#81

Closed

edwardneal mentioned this issue May 28, 2024

SpreadsheetDocument.Dispose() throws System.ObjectDisposedException dotnet/Open-XML-SDK#1702

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Flush to ZipArchive #24149

Add Flush to ZipArchive #24149

twsouthwick commented Nov 15, 2017

twsouthwick commented Nov 15, 2017

twsouthwick commented Nov 15, 2017

ianhays commented Nov 15, 2017

twsouthwick commented Nov 15, 2017

daniel-white commented Jan 18, 2018

ianhays commented Jan 18, 2018

ghost commented Jul 21, 2018

odhanson commented Jul 23, 2018

SinxHe commented Aug 10, 2018

odhanson commented Aug 11, 2018

twsouthwick commented Aug 20, 2018

odhanson commented Aug 20, 2018 •

edited

Loading

MathewRimmington commented Jun 25, 2020

carlossanlop commented Nov 18, 2021 •

edited

Loading

carlossanlop commented Nov 21, 2021 •

edited

Loading

ashahabov commented Sep 15, 2022

exelsior commented Dec 5, 2023

exelsior commented Dec 7, 2023

edwardneal commented May 29, 2024

Add Flush to ZipArchive #24149

Add Flush to ZipArchive #24149

Comments

twsouthwick commented Nov 15, 2017

twsouthwick commented Nov 15, 2017

twsouthwick commented Nov 15, 2017

ianhays commented Nov 15, 2017

twsouthwick commented Nov 15, 2017

daniel-white commented Jan 18, 2018

ianhays commented Jan 18, 2018

ghost commented Jul 21, 2018

odhanson commented Jul 23, 2018

SinxHe commented Aug 10, 2018

odhanson commented Aug 11, 2018

twsouthwick commented Aug 20, 2018

odhanson commented Aug 20, 2018 • edited Loading

MathewRimmington commented Jun 25, 2020

carlossanlop commented Nov 18, 2021 • edited Loading

carlossanlop commented Nov 21, 2021 • edited Loading

Non-seekable stream

Deleting

Adding

Seekable stream

Deleting

Adding

Updating a file entry's stream

ashahabov commented Sep 15, 2022

exelsior commented Dec 5, 2023

exelsior commented Dec 7, 2023

edwardneal commented May 29, 2024

odhanson commented Aug 20, 2018 •

edited

Loading

carlossanlop commented Nov 18, 2021 •

edited

Loading

carlossanlop commented Nov 21, 2021 •

edited

Loading