Performance #5
The benchmarks I posted on the flate2 issue were [these] from my deflate encoder. I simply replaced flate2 with flate2-oxide. (I don't think there is an easy way of using two versions of the same crate at the same time without renaming one of them.) I also have this project which I've used. We could add miniz-oxide to that, though for decompression it would probably be nice to alter it to do several runs for each file and take the average, as the noise tends to be a bit high. For adler32, there is the adler32 crate we could use.
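A minimal sketch of the repeated-runs idea, in case it helps: time several runs and take the average. The `bench_average` helper and its closure argument are made up for illustration; they are not from any of the projects mentioned above.

```rust
use std::time::Instant;

// Run the measured operation `runs` times and return the average time in
// seconds. `decompress` stands in for whatever is being benchmarked.
fn bench_average<F: FnMut()>(runs: u32, mut decompress: F) -> f64 {
    let start = Instant::now();
    for _ in 0..runs {
        decompress();
    }
    start.elapsed().as_secs_f64() / f64::from(runs)
}

fn main() {
    let avg = bench_average(10, || {
        // Replace with the actual decompression call on the test file.
    });
    println!("average: {:.6} s", avg);
}
```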
It turns out directly comparing with miniz-backed flate2 is a bit complicated, as the compiler doesn't like that there are two crates linking to two versions of a C library with the same name. Maybe we could hide the parts that still link to C code behind a feature. At least it's quite simple to compare to other pure-Rust libraries with the compression-tester binary by simply swapping flate2 with flate2-oxide.
I faced the same issue when writing the fuzzing setup. I solved the problem using objcopy, adding the prefix c_ to every symbol in one version of miniz. See: https://github.com/Frommi/miniz_oxide/blob/master/build_fuzz.sh
Ah, that's nice. Then we could do the same for performance comparisons.
I managed to improve match copying in tinfl_oxide a bit. Loading/storing u64 and other larger types didn't seem to help, but adding a special case for 3-length matches helped a little. Using a local variable as a loop counter for overlapping matches also helped a bit, so I think it might be worth making local copies of more values, like in the C code, so the compiler can put them in registers when it finds it beneficial.
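Roughly what I mean, as a sketch (this is not the actual tinfl_oxide code; names are illustrative): a dedicated path for length-3 matches, and a plain counted loop for the general overlapping case so the counter can live in a register.

```rust
// Illustrative sketch only: copy an LZ77 match of `len` bytes from `dist`
// bytes back in the output buffer.
fn copy_match(out: &mut [u8], out_pos: usize, dist: usize, len: usize) {
    let src_pos = out_pos - dist;
    if len == 3 {
        // Length-3 matches are common enough that a dedicated path avoids
        // the general loop setup.
        out[out_pos] = out[src_pos];
        out[out_pos + 1] = out[src_pos + 1];
        out[out_pos + 2] = out[src_pos + 2];
        return;
    }
    // General (possibly overlapping) case: a local counter in a simple loop.
    let mut i = 0;
    while i < len {
        out[out_pos + i] = out[src_pos + i];
        i += 1;
    }
}

fn main() {
    // Repeat "abc" three times via a distance-3, length-6 match.
    let mut buf = vec![0u8; 9];
    buf[..3].copy_from_slice(b"abc");
    copy_match(&mut buf, 3, 3, 6);
    assert_eq!(buf, b"abcabcabc".to_vec());
}
```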
This loop improves performance noticeably, as it is essentially a for-loop, but it reduces readability. The main path of
Yeah, C code can jump to random places using switch and goto, something miniz and many C/C++ decoders make use of. We need to find some way around that in Rust.
We could change
Yeah, we could probably abuse loops a bit.
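Something like this might work as the loop-based replacement for switch/goto; this is only a hedged sketch with made-up state names, not the real tinfl states. A loop around a match on a state enum, where assigning the next state plays the role of goto:

```rust
// Illustrative only; the real decompressor has many more states.
enum State {
    ReadHeader,
    DecodeBlock,
    Done,
}

fn run() {
    let mut state = State::ReadHeader;
    loop {
        state = match state {
            State::ReadHeader => {
                // ... parse the next block header ...
                State::DecodeBlock
            }
            State::DecodeBlock => {
                // ... decode symbols; could also jump back to ReadHeader ...
                State::Done
            }
            State::Done => break,
        };
    }
}

fn main() {
    run();
}
```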
Seems like we're still a bit behind on 32-bit (tested on an old Core (1) Duo laptop):
Well, yes. We use 64-bit buffers and even 64-bit magic regardless of the machine's bitness. There are ifdefs in miniz to fall back to byte-by-byte style functions on 32-bit.
I don't think decompression uses 64-bit buffers much, so I'm not sure why the difference is so massive there. Compression is more as expected (I actually expected more of a difference, given that we use some 64-bit stuff there). Also interesting is that using
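If we wanted to mirror the C ifdefs in Rust, one option (just a sketch; the alias name is made up) is to pick the bit-buffer width from the target's pointer width:

```rust
// Hypothetical: use a 64-bit bit buffer on 64-bit targets and fall back to
// a 32-bit one elsewhere, similar to what the ifdefs in miniz do.
#[cfg(target_pointer_width = "64")]
type BitBuffer = u64;

#[cfg(not(target_pointer_width = "64"))]
type BitBuffer = u32;

fn main() {
    println!("bit buffer is {} bits", 8 * std::mem::size_of::<BitBuffer>());
}
```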
Added some loops and local vars. Halfway to the miniz decompression time.
Nice work! Helped a fair bit on the 32-bit machine too:
Actually, I just realized the Action enum is 64-bit (as status and state are 32-bit values). I don't know whether this makes an impact on 32-bit or not.
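For what it's worth, an enum whose variants carry 32-bit payloads also needs space for the discriminant, so it typically ends up 8 bytes. A quick check (the variant names here are made up, not the actual Action definition):

```rust
use std::mem::size_of;

// Illustrative enum with 32-bit payloads; the discriminant plus padding
// usually brings the total to 8 bytes.
#[allow(dead_code)]
enum Action {
    None,
    Jump(u32), // e.g. a 32-bit state value
    End(u32),  // e.g. a 32-bit status value
}

fn main() {
    println!("size_of::<Action>() = {} bytes", size_of::<Action>());
}
```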
With my last commit, the decompression benchmark is now faster than miniz on my machine!
It also helped a fair bit on my rpi3 (still not quite as fast here though):
Have you looked at libdeflate? It has the fastest decompression that I've seen: https://quixdb.github.io/squash-benchmark/unstable
This is somewhat off topic, but an impl period is looming (next Monday), and I think miniz-oxide could be a perfect crate to contribute to: it should be possible to write comprehensive tests and benchmarks. I unfortunately don't have time to do the work required to make this crate super-friendly to contributors myself :(
@jrmuizel I am aware of the library; it's probably worth looking into for potential ideas for improving performance further. Granted, libdeflate doesn't support streaming, so the use case is a bit different. @matklad Yeah, sounds like an idea, or at the very least we could put something in the looking-for-contributors thread on the forums.
@oyvindln these are very cool results! I will have time in the near future, so I will be able to hop back and help. I have quite some history of changes to go through first though :)
Last few commits improved decompression performance by another ~5% over miniz:
The main change is that decompress_fast no longer uses ... I have an idea for the next improvement though: right now there are up to 14 bytes to be read from the input buffer per ...
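One common shape for that kind of batched input read, as a rough sketch (the function and variable names are made up, and this is not the actual miniz_oxide code): top up a 64-bit bit buffer from the input slice eight bytes at a time, falling back to byte-by-byte reads near the end of the buffer.

```rust
use std::convert::TryInto;

// Illustrative refill routine: keep at least 56 bits available in `bit_buf`.
fn fill_bit_buffer(input: &[u8], pos: &mut usize, bit_buf: &mut u64, num_bits: &mut u32) {
    if *num_bits < 56 && input.len() - *pos >= 8 {
        // Load 8 bytes in one go, but only keep the whole bytes that fit.
        let word = u64::from_le_bytes(input[*pos..*pos + 8].try_into().unwrap());
        let n_bytes = (64 - *num_bits) / 8;
        let mask = if n_bytes == 8 { !0 } else { (1u64 << (8 * n_bytes)) - 1 };
        *bit_buf |= (word & mask) << *num_bits;
        *pos += n_bytes as usize;
        *num_bits += 8 * n_bytes;
    } else {
        // Near the end of the input, read byte-by-byte instead.
        while *num_bits < 56 {
            match input.get(*pos) {
                Some(&b) => {
                    *bit_buf |= u64::from(b) << *num_bits;
                    *num_bits += 8;
                    *pos += 1;
                }
                None => break,
            }
        }
    }
}

fn main() {
    let input = [0xABu8, 0xCD, 0xEF, 0x01, 0x23, 0x45, 0x67, 0x89, 0x10];
    let (mut pos, mut bit_buf, mut num_bits) = (0usize, 0u64, 0u32);
    fill_bit_buffer(&input, &mut pos, &mut bit_buf, &mut num_bits);
    println!("pos={} num_bits={} bit_buf={:#018x}", pos, num_bits, bit_buf);
}
```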
Neat. I added some extra checks to avoid some overflow/out-of-bounds issues that showed up in fuzzing and from people using this library after zip-rs adopted it as the default. Some of it may have been redundant, though I was focused on fixing the bugs first. I was worried that it might have impacted performance, but it seems not. As for optimizing more, I guess you just have to try it out; it may help.
Hey, so I'm writing a library that handles parsing some compressed XML files and noticed that
And 32% of branch mispredictions (there are a lot of branches in my code, too, but nothing else comes close in terms of proportion).
I just recompiled my application with the flate2 "cloudflare_zlib" feature instead of the miniz_oxide backend, and it knocked nearly 30% off the runtime. Possibly a good place to mine for optimizations.
The system zlib backend was ~15% faster, but this is measuring my whole application including all the XML parsing I'm doing, so the actual differences are probably bigger. |
@oyvindln, can you please give a link for your performance tests?