Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utility.Skip uses seek #336

Merged
merged 1 commit into from Jan 10, 2018
Merged

Conversation

@diontools
Copy link

@diontools diontools commented Jan 2, 2018

Hello.
I have a large zip file.(~100GB 871kEntries)
I tried below code.

var sw = Stopwatch.StartNew();
using (var stream = new FileStream(@"large.zip", FileMode.Open, FileAccess.Read, FileShare.Read))
using (var reader = ZipReader.Open(stream, new SharpCompress.Readers.ReaderOptions { LeaveStreamOpen = true }))
{
    while (reader.MoveToNextEntry())
    {
        //Console.WriteLine(reader.Entry.Key);
    }
}
Console.WriteLine(sw.Elapsed);

It took about 13 minutes in my machine.
However, when this PR is applied, it was shortened to 10 seconds.
Does this cause problems?

I'm sorry for my poor English.
Thank you for reading.

diontools
@adamhathcock
Copy link
Owner

@adamhathcock adamhathcock commented Jan 2, 2018

Use the Archive API for random access. The Reader API intentionally doesn’t use Seek

@simmotech
Copy link

@simmotech simmotech commented Jan 2, 2018

The change conditionally uses CanSeek to perform a Skip much quicker where possible - why is this considered wrong?

@diontools
Copy link
Author

@diontools diontools commented Jan 2, 2018

Thank you for reply.
I understood Reader spec.

Archive API is very useful, but since Entry is cached, memory space is needed.
In my file it will be about 400 MB.
I'd like to reduce a little more.

I'm trying to extract some files from Zip.
I hope a way to filter Entry.
For example, it is like this.

using (var zip = ZipArchive.Open(stream, ...))
{
    foreach (var entry in zip.LoadEntries())
    {
        if (IsTargetFile(entry.Key))
        {
            entry.WriteToDirectory(...);
        }
    }
}

Current ZipArchive.LoadEntries is protected, but I think that it would be nice if there was an API similar to this.
Thank you for reading.

@adamhathcock
Copy link
Owner

@adamhathcock adamhathcock commented Jan 2, 2018

Fair enough. I shouldn’t be so rigid. Hopefully no streams throw when accessing CanSeek

@diontools
Copy link
Author

@diontools diontools commented Jan 4, 2018

Thank you so much. PR closing.

@diontools diontools closed this Jan 4, 2018
@adamhathcock adamhathcock reopened this Jan 4, 2018
@adamhathcock adamhathcock merged commit d8c8dab into adamhathcock:master Jan 10, 2018
1 check passed
1 check passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.