New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GZipStream & DeflateStream calling Flush after write or read is not equivalent to calling Dispose. #3669

Closed
redknightlois opened this Issue Oct 6, 2015 · 5 comments

Comments

Projects
None yet
5 participants
@redknightlois

redknightlois commented Oct 6, 2015

.Dispose() will purge buffers and finish writing data while .Flush() will not force that to happen.
While in other Streams the .Flush() call will ensure that all the data is written, it is not the case for GZipStream and DefrateStream.

For example if we modify RoundtripCompressDecompress in the System.IO.Compression.Tests project in the following way, all tests using it will fail.

private async Task RoundtripCompressDecompress(bool useAsync, bool useGzip, int chunkSize, int totalSize, CompressionLevel level)
{
    byte[] data = new byte[totalSize];
    new Random(42).NextBytes(data);

    var compressed = new MemoryStream();
    var compressor = useGzip ? (Stream)new GZipStream(compressed, level, true) : new DeflateStream(compressed, level, true);
    {
        for (int i = 0; i < data.Length; i += chunkSize) // not using CopyTo{Async} due to optimizations in MemoryStream's implementation that avoid what we're trying to test
        {
            switch (useAsync)
            {
                case true: await compressor.WriteAsync(data, i, chunkSize); break;
                case false: compressor.Write(data, i, chunkSize); break;
            }
        }
    }
    compressor.Flush();
    compressed.Position = 0;

    var decompressed = new MemoryStream();
    var decompressor = useGzip ? (Stream)new GZipStream(compressed, CompressionMode.Decompress, true) : new DeflateStream(compressed, CompressionMode.Decompress, true);
    {
        if (useAsync)
            decompressor.CopyTo(decompressed, chunkSize);
        else
            await decompressor.CopyToAsync(decompressed, chunkSize, CancellationToken.None);
    }
    decompressor.Flush();

    Assert.Equal<byte>(data, decompressed.ToArray());

    compressor.Dispose();
    decompressor.Dispose();
}

That means that GZipStream & DefrateStream will not finish writing until they are effectively disposed (which could take some time if .Dispose() is not called immediately) and may also fail if chained.

This is particularly relevant for #1991 (cc @stephentoub)

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 6, 2015

Member

I believe this is by design, but @FiveTimesTheFun can comment.

Member

stephentoub commented Oct 6, 2015

I believe this is by design, but @FiveTimesTheFun can comment.

@joshfree joshfree changed the title from GZipStream & DefrateStream calling Flush after write or read is not equivalent to calling Dispose. to GZipStream & DeflateStream calling Flush after write or read is not equivalent to calling Dispose. Oct 7, 2015

@joshfree joshfree added this to the 1.0.0-rc1 milestone Oct 7, 2015

@FiveTimesTheFun

This comment has been minimized.

Show comment
Hide comment
@FiveTimesTheFun

FiveTimesTheFun Oct 19, 2015

Contributor

Although that seems to be the current implementation, I don't know why it should be. I would expect that Flush() on a Deflate or GZip stream would clear its internal buffers and write out the data.

Right now, Flush() just calls EnsureNotDisposed
https://github.com/dotnet/corefx/blob/master/src/System.IO.Compression/src/System/IO/Compression/DeflateStream.cs#L248

while Dispose() calls the private PurgeBuffers() which implements the flush behavior
https://github.com/dotnet/corefx/blob/master/src/System.IO.Compression/src/System/IO/Compression/DeflateStream.cs#L580

And PurgeBuffers() calls Flush(), which seems wrong, because it seems that would cause Dispose() to throw ObjectDisposedException after the first time.

I'm not sure why it is this way, I think that PurgeBuffers should no longer call Flush or throw an exception if the stream is disposed, I think Flush should throw an exception if the stream is disposed and then call PurgeBuffers, and Dispose should continue to call PurgeBuffers.

Contributor

FiveTimesTheFun commented Oct 19, 2015

Although that seems to be the current implementation, I don't know why it should be. I would expect that Flush() on a Deflate or GZip stream would clear its internal buffers and write out the data.

Right now, Flush() just calls EnsureNotDisposed
https://github.com/dotnet/corefx/blob/master/src/System.IO.Compression/src/System/IO/Compression/DeflateStream.cs#L248

while Dispose() calls the private PurgeBuffers() which implements the flush behavior
https://github.com/dotnet/corefx/blob/master/src/System.IO.Compression/src/System/IO/Compression/DeflateStream.cs#L580

And PurgeBuffers() calls Flush(), which seems wrong, because it seems that would cause Dispose() to throw ObjectDisposedException after the first time.

I'm not sure why it is this way, I think that PurgeBuffers should no longer call Flush or throw an exception if the stream is disposed, I think Flush should throw an exception if the stream is disposed and then call PurgeBuffers, and Dispose should continue to call PurgeBuffers.

@joshfree joshfree assigned ianhays and unassigned FiveTimesTheFun Oct 19, 2015

@joshfree joshfree modified the milestones: 1.0.0-rc2, 1.0.0-rc1 Oct 26, 2015

@ianhays

This comment has been minimized.

Show comment
Hide comment
@ianhays

ianhays Nov 5, 2015

Member

I agree that the current implementation doesn't seem to make much sense, but I assume it is that way for a reason, The same behavior is present on Desktop and in the docs, so modifying it would be a potentially breaking change.

If we're going to modify this, it should be in a way that doesn't break people porting from Desktop. I can think of a few primary correctness scenarios around Flush/Dispose for DeflateStream that we should have tests for:

  • flushing twice in a row (currently doesn't throw)
  • disposing twice in a row (currently doesn't throw)
  • flush then dispose (currently doesn't throw)
  • dispose then flush (currently throws ObjectDisposedException)

And then there's also the test in the OP that covers the time at which data is purged. There's a reason why Flush for DeflateStream is currently equivalent to EnsureNotDisposed, but I can't think of a good excuse for it - what is it about a DeflateStream that makes us not want to Flush before disposal? Is that reason good enough to justify Flush not Flushing?

The only reason I can think of is that the Dispose method writes out the footer information for the format and closes the stream. I could imagine a Flush that would only write out the user data and leave footer information to the Disposal method, but then we're back to failing the tests in the OP.

Member

ianhays commented Nov 5, 2015

I agree that the current implementation doesn't seem to make much sense, but I assume it is that way for a reason, The same behavior is present on Desktop and in the docs, so modifying it would be a potentially breaking change.

If we're going to modify this, it should be in a way that doesn't break people porting from Desktop. I can think of a few primary correctness scenarios around Flush/Dispose for DeflateStream that we should have tests for:

  • flushing twice in a row (currently doesn't throw)
  • disposing twice in a row (currently doesn't throw)
  • flush then dispose (currently doesn't throw)
  • dispose then flush (currently throws ObjectDisposedException)

And then there's also the test in the OP that covers the time at which data is purged. There's a reason why Flush for DeflateStream is currently equivalent to EnsureNotDisposed, but I can't think of a good excuse for it - what is it about a DeflateStream that makes us not want to Flush before disposal? Is that reason good enough to justify Flush not Flushing?

The only reason I can think of is that the Dispose method writes out the footer information for the format and closes the stream. I could imagine a Flush that would only write out the user data and leave footer information to the Disposal method, but then we're back to failing the tests in the OP.

@ianhays

This comment has been minimized.

Show comment
Hide comment
@ianhays

ianhays Nov 17, 2015

Member

I've thought about this more and the way I see it we have two choices:

  • Leave Flush as it is now as a public indirection to EnsureNotDisposed and stick to the the docs
  • Modify Flush to (actually) flush write data in the buffer but not flush footer data. This shouldn't fail the test in the OP now that I think of it.

There a lot of indirections going on here so I'll try to describe the current structure of the write/dispose system now without going into too much detail:

  • A call to DeflateStream.Write consists of two main operations: Zlib.Deflate(input bytes, ZFlushCode.NoFlush) and a write to the underlying output stream with the results of that Deflate.
  • A call to Dispose first calls Zlib.Deflate(any remaining bytes in buffer, ZFlushCode.Finish) and then writes to the underlying output stream with the results of Deflate. After doing this, it performs a write to the underlying output stream with the header data.
  • A call to Flush does nothing to the underlying stream of the input buffer.

If we wanted Flush to do something, I imagine it would look like so:

  • DeflateStream.Write is functionally unchanged
  • DeflateStream.Dispose is functionally unchanged
  • DeflateStream.Flush calls Zlib.Deflate(any remaining bytes in the buffer, ZFlushCode.SyncFlush) and writes the output the underlying output stream.

We would still have Flush call EnsureNotDisposed, but I can imagine this change would still break someone somewhere.

Thoughts?

Member

ianhays commented Nov 17, 2015

I've thought about this more and the way I see it we have two choices:

  • Leave Flush as it is now as a public indirection to EnsureNotDisposed and stick to the the docs
  • Modify Flush to (actually) flush write data in the buffer but not flush footer data. This shouldn't fail the test in the OP now that I think of it.

There a lot of indirections going on here so I'll try to describe the current structure of the write/dispose system now without going into too much detail:

  • A call to DeflateStream.Write consists of two main operations: Zlib.Deflate(input bytes, ZFlushCode.NoFlush) and a write to the underlying output stream with the results of that Deflate.
  • A call to Dispose first calls Zlib.Deflate(any remaining bytes in buffer, ZFlushCode.Finish) and then writes to the underlying output stream with the results of Deflate. After doing this, it performs a write to the underlying output stream with the header data.
  • A call to Flush does nothing to the underlying stream of the input buffer.

If we wanted Flush to do something, I imagine it would look like so:

  • DeflateStream.Write is functionally unchanged
  • DeflateStream.Dispose is functionally unchanged
  • DeflateStream.Flush calls Zlib.Deflate(any remaining bytes in the buffer, ZFlushCode.SyncFlush) and writes the output the underlying output stream.

We would still have Flush call EnsureNotDisposed, but I can imagine this change would still break someone somewhere.

Thoughts?

ianhays added a commit to ianhays/corefx that referenced this issue Dec 8, 2015

Modify System.IO.Compression.DeflateStream.Flush to actually flush.
The existing implementation of Flush and FlushAsync for System.IO.Compression.DeflateStream doesn't actually flush - it only checks that the stream isn't disposed. This commit modifies Flush to Flush and also adds unit tests around the new functionality.

A more detailed discussion is available at #3669

ianhays added a commit to ianhays/corefx that referenced this issue Dec 10, 2015

Modify System.IO.Compression.DeflateStream.Flush to actually flush.
The existing implementation of Flush and FlushAsync for System.IO.Compression.DeflateStream doesn't actually flush - it only checks that the stream isn't disposed. This commit modifies Flush to Flush and also adds unit tests around the new functionality.

A more detailed discussion is available at #3669
@ianhays

This comment has been minimized.

Show comment
Hide comment
@ianhays

ianhays Dec 11, 2015

Member

resolved in #4879

Member

ianhays commented Dec 11, 2015

resolved in #4879

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment