New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utility.Skip uses seek #336

Merged
merged 1 commit into from Jan 10, 2018

Conversation

Projects
None yet
3 participants
@diontools

diontools commented Jan 2, 2018

Hello.
I have a large zip file.(~100GB 871kEntries)
I tried below code.

var sw = Stopwatch.StartNew();
using (var stream = new FileStream(@"large.zip", FileMode.Open, FileAccess.Read, FileShare.Read))
using (var reader = ZipReader.Open(stream, new SharpCompress.Readers.ReaderOptions { LeaveStreamOpen = true }))
{
    while (reader.MoveToNextEntry())
    {
        //Console.WriteLine(reader.Entry.Key);
    }
}
Console.WriteLine(sw.Elapsed);

It took about 13 minutes in my machine.
However, when this PR is applied, it was shortened to 10 seconds.
Does this cause problems?

I'm sorry for my poor English.
Thank you for reading.

@adamhathcock

This comment has been minimized.

Owner

adamhathcock commented Jan 2, 2018

Use the Archive API for random access. The Reader API intentionally doesn’t use Seek

@simmotech

This comment has been minimized.

simmotech commented Jan 2, 2018

The change conditionally uses CanSeek to perform a Skip much quicker where possible - why is this considered wrong?

@diontools

This comment has been minimized.

diontools commented Jan 2, 2018

Thank you for reply.
I understood Reader spec.

Archive API is very useful, but since Entry is cached, memory space is needed.
In my file it will be about 400 MB.
I'd like to reduce a little more.

I'm trying to extract some files from Zip.
I hope a way to filter Entry.
For example, it is like this.

using (var zip = ZipArchive.Open(stream, ...))
{
    foreach (var entry in zip.LoadEntries())
    {
        if (IsTargetFile(entry.Key))
        {
            entry.WriteToDirectory(...);
        }
    }
}

Current ZipArchive.LoadEntries is protected, but I think that it would be nice if there was an API similar to this.
Thank you for reading.

@adamhathcock

This comment has been minimized.

Owner

adamhathcock commented Jan 2, 2018

Fair enough. I shouldn’t be so rigid. Hopefully no streams throw when accessing CanSeek

@diontools

This comment has been minimized.

diontools commented Jan 4, 2018

Thank you so much. PR closing.

@diontools diontools closed this Jan 4, 2018

@adamhathcock adamhathcock reopened this Jan 4, 2018

@adamhathcock adamhathcock merged commit d8c8dab into adamhathcock:master Jan 10, 2018

1 check passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment