Implement pooling of decompressed folder readers #23

Merged
bodgit merged 13 commits into master from speedup
May 10, 2022

Conversation

bodgit
Owner

@bodgit bodgit commented May 2, 2022

Assuming the common use case:

r, err := sevenzip.OpenReader("archive.7z")
if err != nil {
        panic(err)
}
defer r.Close() // closes the archive once, when the function returns

// Open, read and close each file in archive order.
for _, f := range r.File {
        rc, err := f.Open()
        if err != nil {
                panic(err)
        }
        // ...
        rc.Close()
}

Comparing performance using the noopPool implementation (which doesn't pool so it's identical to the original behaviour) and the pool implementation:

$ benchstat old.txt new.txt 
name        old time/op  new time/op  delta
Bzip2-12    10.6ms ± 1%   2.3ms ± 1%  -78.54%  (p=0.008 n=5+5)
Copy-12     1.18ms ± 5%  1.19ms ± 2%     ~     (p=0.310 n=5+5)
Deflate-12  2.15ms ± 0%  1.22ms ± 2%  -43.43%  (p=0.008 n=5+5)
Delta-12    1.43ms ± 1%  1.16ms ± 0%  -19.03%  (p=0.008 n=5+5)
LZMA-12     15.2ms ± 2%   2.5ms ± 1%  -83.90%  (p=0.008 n=5+5)
LZMA2-12    7.88ms ± 1%  1.78ms ± 1%  -77.46%  (p=0.008 n=5+5)
BCJ2-12     3.73ms ± 1%  1.36ms ± 2%  -63.62%  (p=0.016 n=4+5)
Complex-12   35.4s ± 0%    0.2s ± 1%  -99.49%  (p=0.016 n=4+5)

The complex benchmark uses the 7-Zip/LZMA SDK archive which contains 633 files using a mix of compression methods. Reading this entire archive goes from taking 35.4 seconds to just 0.2 seconds!

The copy decompressor is the only one that doesn't improve (unsurprisingly) as it doesn't transform the bytes in any way.
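One way to see why the gain grows with the number of files is a rough cost model. The sketch below assumes an idealised archive: a single solid folder holding equally sized files, where reaching a file without a pooled reader means decompressing everything before it again. The real SDK archive mixes methods and folders, so this only illustrates the shape of the saving and is not a prediction of the benchmark numbers. The function names are made up for the illustration.

```go
package main

import "fmt"

// unitsWithoutPool models reopening the folder stream for every file:
// reaching file i (0-based) means decompressing files 0..i again.
func unitsWithoutPool(files int) int {
	total := 0
	for i := 0; i < files; i++ {
		total += i + 1
	}
	return total
}

// unitsWithPool models reusing a reader that is already positioned at the
// start of the next file: every file is decompressed exactly once.
func unitsWithPool(files int) int {
	return files
}

func main() {
	const files = 633 // file count quoted above for the SDK archive
	fmt.Println(unitsWithoutPool(files), unitsWithPool(files))
}
```

Under these assumptions the work drops from quadratic to linear in the number of files, which is consistent with the largest archive showing the largest improvement.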

Fixes #22

bodgit added 6 commits May 2, 2022 18:39
* noopPool doesn't store anything and automatically closes the passed
  SizeReadSeekCloser.
* pool maintains a fixed number of SizeReadSeekClosers, keyed by their
  offset, using an LRU strategy.
Implement Size and Seek methods. Seek can only seek forward in the file.
When calling Open on a File, the associated pool is used to find an
existing SizeReadSeekCloser for the underlying stream.

The Close method attempts to put the underlying SizeReadSeekCloser back in
the pool if it's not yet at the end of the stream.
Contains lots of files with a mix of compression methods.
@bodgit bodgit added the enhancement New feature or request label May 2, 2022
@bodgit bodgit self-assigned this May 2, 2022
bodgit added 6 commits May 2, 2022 20:45
Also fix cyclop, funlen, gochecknoglobals, gochecknoinits, gocognit &
gocyclo lint warnings.
Not sure it was doing anything useful. Make the passed reader implement
io.ByteReader.
@bodgit bodgit merged commit 629e548 into master May 10, 2022
@bodgit bodgit deleted the speedup branch May 10, 2022 23:53
Successfully merging this pull request may close these issues.

Improve performance