Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.IO.Packaging part stream has a memory leak when writing a large stream #23750

Closed
twsouthwick opened this issue Oct 5, 2017 · 25 comments
Closed

Comments

@twsouthwick
Copy link
Member

This was found through the DocumentFormat.OpenXML library which uses System.IO.Packaging extensively (original issue: dotnet/Open-XML-SDK#244). The issue logged there is trying to generate a large Excel document which uses a working set of around 10mb on .NET 4.7, while it grows quite quickly until hitting a OutOfMemoryException. I've simplified the issue to remove the dependency on DocumentFormat.OpenXML and it appears to be isolated to writing to a Part within a Package.

Source

using System;
using System.IO;
using System.IO.Packaging;

namespace MemoryRepro
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var fs = new FileStream(Path.GetTempFileName(), FileMode.Create, FileAccess.ReadWrite))
            using (var package = Package.Open(fs, FileMode.Create))
            {
                var part = package.CreatePart(new Uri("/part", UriKind.Relative), "something/sometype");

                using (var stream = part.GetStream())
                using (var writer = new StreamWriter(stream))
                {
                    for (var i = 0; i < int.MaxValue; i++)
                    {
                        writer.Write("hello");
                    }
                }
            }
        }
    }
}

Project

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net47</TargetFramework>
    <!--<TargetFramework>netcoreapp2.0</TargetFramework>-->
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="System.IO.Packaging" Version="4.4.0" />
  </ItemGroup>

</Project>

This repro code appears to have a working set of around 60mb running on .NET 4.7, while it grows very quickly on .NET Core 2.0

The error on .NET Core 2.0 is:

Unhandled Exception: System.IO.IOException: Stream was too long.
   at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.Compression.WrappedStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.IO.StreamWriter.Write(String value)
   at MemoryRepro.Program.Main(String[] args) in c:\users\tasou\source\repos\MemoryRepro\MemoryRepro\Program.cs:line 21
@twsouthwick twsouthwick changed the title System.IO.Packaging part stream has a memory leak only on .NET Core System.IO.Packaging part stream has a memory leak when writing a large stream Oct 5, 2017
@twsouthwick
Copy link
Member Author

twsouthwick commented Oct 5, 2017

I did some more investigating, and it's only seen on .NET Core because the .NET 4.6 build just contains type forwarders to the System.IO.Packaging classes in WindowsBase.

It appears that the reason the memory is growing is due to the ZipArchiveMode passed into the ZipArchive. The following is what the packaging library is doing (simplified to show the cause):

static void Main(string[] args)
{
    // Causes memory leak
    var mode = ZipArchiveMode.Update;
    // Does not leak
    //var mode = ZipArchiveMode.Create;

    using (var fs = new FileStream(Path.GetTempFileName(), FileMode.Create, FileAccess.ReadWrite))
    using (var archive = new ZipArchive(fs, mode, true, System.Text.Encoding.UTF8))
    {
        var entry = archive.CreateEntry("entry", CompressionLevel.NoCompression);

        using (var stream = entry.Open())
        using (var writer = new StreamWriter(stream))
        {
            for (var i = 0; i < int.MaxValue; i++)
            {
                writer.Write("hello");
            }
        }
    }
}

The Package creates the ZipArchive here: https://source.dot.net/#System.IO.Packaging/System/IO/Packaging/ZipPackage.cs,349.

@twsouthwick
Copy link
Member Author

Looks like there are a few other open issues with performance issues for ZipArchive. @ianhays Any thoughts?

@stephentoub
Copy link
Member

The following is what the packaging library is doing (simplified to show the cause)

@twsouthwick, it's doing the exact same thing on both .NET Framework and .NET Core?

@twsouthwick
Copy link
Member Author

The distinction between .NET Framework and .NET Core was a red herring; the .NET Framework build was using a System.IO.Packaging that redirects to the implementation in WindowsBase. When I ran the code in corefx manually against .NET Framework, it repros the same. So, yes, it is doing the same thing on both .NET Framework and .NET Core.

@stephentoub
Copy link
Member

So, yes, it is doing the same thing on both .NET Framework and .NET Core.

I'm a little confused :) What is doing the same thing? Is ZipArchive behaving the same between .NET Framework and .NET Core, but System.IO.Packaging is behaving differently between the two? Or is the issue you cited with Packaging also reproing with .NET Framework?

@twsouthwick
Copy link
Member Author

Sorry about that.

  • ZipArchvie is behaving the same between .NET Framework and .NET Core. The ZipArchive snippet is used in the .NET Core implementation of System.IO.Packaging; I don't know what is used in the .NET Framework implementation since it's in WindowsBase.
  • System.IO.Packaging is behaving differently between .NET Framework and .NET Core. On .NET Framework, the original code snippet works; I can create a package and write an arbitrarily large number of items to it. On .NET Core, the original snippet will crash due to memory usage.

@stephentoub
Copy link
Member

Thanks, @twsouthwick. I took a quick look. It looks like Package in WindowsBase uses its own internal ZipArchive that has streaming support.

@twsouthwick
Copy link
Member Author

That would explain it. Are there plans to add streaming support to ZipArchive in CoreFX?

@ianhays
Copy link
Contributor

ianhays commented Oct 9, 2017

There's an issue open for it that's had some good discussion but hasn't moved forward due to it being a lower priority addition.

@cmerat
Copy link

cmerat commented Oct 24, 2017

This has caused different behavior when using the OpenXml SDK library between NetFX and Core. It makes it impossible to create huge spreadsheets without busting memory limits. The implications are rather big as the library can no longer be used if the dataset is large enough in .NET Core.

@twsouthwick
Copy link
Member Author

@ianhays Is there a timeline as to when a design may be available? What can be done to help move that forward? This issue is blocking for people who generate large office documents on the fly and it would be nice to know what steps need to be taken.

@ianhays
Copy link
Contributor

ianhays commented Oct 26, 2017

There is not a timeline currently. More "likes" on the issue always helps as does discussion of the API on the issue and comments explaining use cases.

It's pretty high up on the priority list as far as compression investments go.

@LarinLive
Copy link

IGNOREME: I am commenting here so I can easily find this issue again in the future, since I can't do that by subscribing alone.

@igorko
Copy link

igorko commented May 21, 2018

Voting for this to be fixed.

@srini1978
Copy link

I have an Asp.net core application but my target framework is .net 4.6.2 . based on what I understand, I should not face this issue.

@twsouthwick
Copy link
Member Author

@srini1978 The System.IO.Packaging on .NET Framework and .NET Core use different implementations of ZipArchive. Running on .NET Framework will not hit this issue.

@ivaylo-botusharov
Copy link

I am also voting for this problem to be fixed.

@Konsts
Copy link

Konsts commented Jul 2, 2018

catch this bug too! In .net core the DocX library for working with .doc files does not work correctly
The issue was opened half a year ago and there are no changes =(

@twsouthwick
Copy link
Member Author

@ianhays Is this (and the zip streaming feature) being tracked for .NET Core 2.2?

@ianhays
Copy link
Contributor

ianhays commented Sep 28, 2018

@twsouthwick They are not being tracked for .NET Core 2.2 or 3.0 to my knowledge.

cc: @terrajobst

@LarinLive
Copy link

Hello. Almost a year has passed but the bug seems not going to be fixed. This error blocks using of .NET Core in our reporting solution.

@alanisaac
Copy link
Contributor

+1 -- this is blocking my team from creating a scalable Excel file writing solution in .NET Core using Open-XML-SDK.

@ericstj
Copy link
Member

ericstj commented Feb 27, 2019

I took a closer look at this. It's very much related to the ZipArchiveEntry behavior mentioned here; https://github.com/dotnet/corefx/issues/11669#issuecomment-468016815
Also mentioned by @twsouthwick above.

When opening a Package with ReadWrite access the underlying archive is opened in Update mode which causes all entries to be buffered completely to a MemoryStream. The MemoryStream has an upper limit of int.MaxValue, so in addition to causing this to use a lot more memory than it needs, it also means that the upper limit of entries it can deal with is int.MaxValue.

When opening a Package with FileAccess.Read or FileAccess.Write you won't hit the case where ZipArchiveEntry stores uncompressed data in memory. This can permit you to work with Packages with large files: only open them for FileAccess.Read or FileAccess.Write.

Today there is a bug in the .NETCore implementation of Package which blocks the use of FileAccess.Write. I have a fix for that which I'll submit shortly which should unblock the issue pointed out in the original posting here.

Although S.IO.Packaging on desktop does define "streaming" support, that's not actually at play here as far as I can tell. One thing that is at play is that on desktop the System.IO.Packaging implementation of zip had a fancier stream for the update case. It behaved in a similar way to ZipArchiveEntry where updates would decompress the entire entry for access, but it would back that decompressed stream in a mix of memory+file. https://referencesource.microsoft.com/#WindowsBase/Base/MS/Internal/IO/Packaging/CompressStream.cs,716

There's still the scenario of a package opened for Update that needs to Read / Write large files. For that, let's let issue #1544 track the improvement to ZipArchiveEntry to permit it and dotnet/corefx#31362 track using that in System.IO.Packaging.

@salvois
Copy link

salvois commented Jan 26, 2020

Greetings,
apparently this issue is still present. Creating a streamed XLSX file using the OpenXmlWriter of Office Open XML SDK 2.10.0, using System.IO.Packaging 4.7.0, on .net core 2.1, memory consumption is not constant but proportional to the amount of data written. This does not happen on .net framework 4.6.2, which exhibits constant, low memory usage as expected.
Any hint is appreciated.
Thanks

@twsouthwick
Copy link
Member Author

@salvois Looks like the issue described here is slightly different. The issue you'll want to track is #1544

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests