Adjust CRC32 buffer size to 64 KB and use a buffering wrapper for IO #28

julik · 2017-11-04T15:42:07Z

Even though the size of the buffer is specific to the CRC32 implementation in zlib, the pattern of buffering writes is actually pretty common - and in the StreamCRC32 objects it is not very declarative. We reimplement it as a write proxy instead, which decouples the buffering and makes it possible to use it in other scenarios as well.
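To illustrate the idea of a buffering write proxy, here is a minimal sketch. The class name `SketchWriteBuffer` and its method set are hypothetical and are not taken from this PR; the only assumption about the wrapped object is that it responds to `<<`.

```ruby
require 'stringio'

# Hypothetical buffering write proxy (illustration only, not the class added in this PR).
# It accumulates small writes and forwards them to the wrapped object in larger chunks.
class SketchWriteBuffer
  def initialize(writable, buffer_size)
    @writable = writable          # anything that responds to <<
    @buffer_size = buffer_size    # flush threshold in bytes
    @buf = String.new(encoding: Encoding::BINARY)
  end

  # Append data to the internal buffer; forward it once the threshold is reached.
  def <<(data)
    @buf << data.b
    flush if @buf.bytesize >= @buffer_size
    self
  end

  # Forward whatever is buffered. Must be called once after the last write.
  def flush
    return self if @buf.empty?
    @writable << @buf
    # Start a fresh buffer in case the destination keeps a reference to the old one.
    @buf = String.new(encoding: Encoding::BINARY)
    self
  end
end

out = StringIO.new
buffered = SketchWriteBuffer.new(out, 4)
buffered << 'ab'   # stays in the buffer, nothing reaches `out` yet
buffered << 'cd'   # threshold of 4 bytes reached, 'abcd' is forwarded
buffered << 'e'
buffered.flush     # forwards the remaining byte
```

Because the proxy only relies on `<<`, the destination could just as well be a CRC32 accumulator or a socket, which is what makes the pattern reusable outside of checksumming.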

This also adds a benchmark that shows, as correctly stated by @felixbuenemann, that the 64 KB buffer size is indeed the sweet spot as far as CRC32 is concerned (we intentionally perform 1-byte writes to get the slowest possible throughput and the smallest chunks). This will be beneficial to libraries like XLSX writers, which are likely to write lots of small chunks in succession (as opposed to archive-from-file situations where IO.copy_stream can choose the right chunk size for us).
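A rough sketch of how such a comparison could be set up with only the standard library follows. It is not the benchmark from this PR (which uses benchmark-ips); the helper names `crc_per_write` and `crc_buffered` are made up for illustration, and the timings will vary by machine and Ruby version.

```ruby
require 'zlib'
require 'benchmark'

# Update the running CRC32 on every single write (the slow path for tiny chunks).
def crc_per_write(chunks)
  chunks.reduce(0) { |crc, chunk| Zlib.crc32(chunk, crc) }
end

# Collect chunks in a buffer and only update the CRC32 once the buffer is full.
def crc_buffered(chunks, buffer_size)
  crc = 0
  buf = String.new(encoding: Encoding::BINARY)
  chunks.each do |chunk|
    buf << chunk
    next if buf.bytesize < buffer_size
    crc = Zlib.crc32(buf, crc)
    buf.clear
  end
  # Account for whatever is left in the buffer after the last write.
  buf.empty? ? crc : Zlib.crc32(buf, crc)
end

# 200,000 single-byte writes, mirroring the "smallest chunks" scenario.
chunks = Array.new(200_000) { |i| (i % 256).chr }

Benchmark.bm(14) do |x|
  x.report('per-write')    { crc_per_write(chunks) }
  x.report('64 KB buffer') { crc_buffered(chunks, 64 * 1024) }
end
```

Both helpers must return the same checksum; only the number of calls into zlib differs.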

felixbuenemann · 2017-11-04T16:50:46Z

This is great work. I was thinking of doing a proper benchmark myself, but didn't yet get around to it.

The sweet spot for CPU intensive operations like this is often related to the L1 cache size, which is 32KB for data on the CPU I was testing on (Intel Core i7-5557U) and most modern desktop and server CPUs.

felixbuenemann · 2017-11-06T00:27:06Z

Btw. there's a typo "Single-bute" vs "Single-byte" in the pasted benchmark results of bench/buffered_crc32_bench.rb.

julik · 2017-11-06T00:27:57Z

There is. We also don't need to pre-generate this blob :-P That will be rectified.

Also: numbers get suspicious again

felixbuenemann · 2017-11-11T17:52:58Z

lib/zip_tricks/streamer/stored_writer.rb

-    @io = io
-    @uncompressed_size = 0
-    @compressed_size = 0
+    @io = ZipTricks::WriteAndTell.new(io)
    @started_at = @io.tell


You have removed @started_at from #finish, so it can be removed here as well; it isn't referenced anywhere else in the class.

This value is now maintained by WriteAndTell
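For readers unfamiliar with the pattern, the sketch below shows how a wrapper can track the write offset so that the writer itself no longer needs its own counters. `SketchWriteAndTell` is a hypothetical stand-in written for this illustration; it is not the code of `ZipTricks::WriteAndTell`.

```ruby
require 'stringio'

# Hypothetical offset-tracking wrapper (illustration only).
# It forwards writes to the wrapped object and counts the bytes that went through.
class SketchWriteAndTell
  def initialize(io)
    @io = io
    @pos = 0
  end

  # Forward the bytes and advance the recorded offset.
  def <<(bytes)
    @io << bytes
    @pos += bytes.bytesize
    self
  end

  # Number of bytes written through this wrapper so far.
  def tell
    @pos
  end
end

out = StringIO.new
writer = SketchWriteAndTell.new(out)
writer << 'PK' << 'payload'
writer.tell # the wrapper, not the caller, knows how many bytes were written
```

With the offset kept in one place, sizes can be derived from `tell` instead of being accumulated in separate instance variables.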

julik added 7 commits November 2, 2017 20:31

Extract the write buffer into a wrapper object

3f849b4

Simplify and document

a2cceda

Add a benchmark, and 64KB buffer is indeed optimal

8ff2d54

Codify the constants at the right values

16d4533

Add a WriteBuffer spec

449ebda

Add benchmark-ips to dev deps

9367370

The usual Rubocop appeasements

2baf57b

julik requested a review from davidenko87 November 4, 2017 16:17

julik added 2 commits November 6, 2017 01:12

Remove misleading comment

57d9b2b

Unpack creates integers, we need bytes instead

350bf5f

julik added 3 commits November 6, 2017 01:43

When we say "byte" it should be a byte

ca4905e

And more Rubocop ofc

9c77093

Resolve merge conflict

8021e8a

julik removed the request for review from davidenko87 November 11, 2017 14:42

felixbuenemann reviewed Nov 11, 2017

Remove unused offset variable

6a6eaef

julik merged commit 35c3046 into master Nov 12, 2017
