Skip to content
This repository has been archived by the owner on Mar 28, 2024. It is now read-only.

Adjust CRC32 buffer size to 64 KB and use a buffering wrapper for IO #28

Merged
merged 13 commits into from
Nov 12, 2017

Conversation

julik
Copy link
Contributor

@julik julik commented Nov 4, 2017

Even though the size of the buffer is specific to the CRC32 implementation in zlib, the pattern of buffering writes is actually pretty common - and in the StreamCRC32 objects it is not very declarative. We reimplement it as an write proxy instead, which decouples the buffering stuff and makes it possible to use it in other scenarios as well.

This also adds a benchmark that proves, as correctly stated by @felixbuenemann that the 64KB buffer size is indeed the sweet spot as far as CRC32 is concerned (we intentionally perform 1-byte writes to get the slowest possible throughput and the smallest chunks). This will be beneficial to libraries like XSLX writers which are likely to be writing lots of small chunks in succession (as oposed to archive-from-file situations where IO.copy_stream can choose the right chunk size for us).

@felixbuenemann
Copy link
Contributor

This is great work. I was thinking of doing a proper benchmark myself, but didn't yet get around to it.

The sweet spot for CPU intensive operations like this is often related to the L1 cache size, which is 32KB for data on the CPU I was testing on (Intel Core i7-5557U) and most modern desktop and server CPUs.

@felixbuenemann
Copy link
Contributor

Btw. there's a typo "Single-bute" vs "Single-byte" in the pasted benchmark results of bench/buffered_crc32_bench.rb.

@julik
Copy link
Contributor Author

julik commented Nov 6, 2017

There is. There is also a thing where we don't need to pre-generate this blob :-P that will be rectified

@julik julik removed the request for review from davidenko87 November 11, 2017 14:42
@io = io
@uncompressed_size = 0
@compressed_size = 0
@io = ZipTricks::WriteAndTell.new(io)
@started_at = @io.tell
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have removed @started_at from #finish so it can be removed here as well as it isn't referenced anywhere else in the class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

This value is now maintained by WriteAndTell
@julik julik merged commit 35c3046 into master Nov 12, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants