elm/bytes support #7

folkertdev · 2019-10-13T16:10:04Z

This PR adds support for the `Bytes` type from elm/bytes, and uses that package to implement the SHA1 logic. The end result is support for `Bytes` and roughly a 10x speedup for a 1 kB input. Hopefully, this fixes both aspects of #5.

I've tried to make small commits with descriptive messages. I'm also planning a Discourse post with some more general lessons from this optimization.

This is a big PR with changes to the public API and to most of the business logic. Two dependencies were removed (elm-utf-tools and list-extra) and one was added (danfishgold/base64-bytes). I really see this as a starting point, and I'm happy to not rush this PR, give more context, and make further changes.

This is the important one. Previously, an array was used to store the delta values, but this array would grow to effectively duplicate the input size in memory. Additionally, an array has overhead (get, push), and only the last 16 values of the array were ever used. So we can inline those 16 values. It's not pretty, but it seems to be 10x faster on a 1 kB input. That's worth it!
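To make the "last 16 values" idea concrete, here is a minimal sketch. It assumes the delta values correspond to the words of SHA-1's message schedule, where each new word depends only on the words 16, 14, 8 and 3 positions back. The names `rotateLeftBy` and `wordAfter`, and the choice to return only the newest word, are hypothetical and for illustration; this is not the code from this PR.

```elm
module ScheduleSketch exposing (rotateLeftBy, wordAfter)

import Bitwise


{-| Rotate a 32-bit word left, keeping the result unsigned.
-}
rotateLeftBy : Int -> Int -> Int
rotateLeftBy amount word =
    Bitwise.or (Bitwise.shiftRightZfBy (32 - amount) word) (Bitwise.shiftLeftBy amount word)
        |> Bitwise.shiftRightZfBy 0


{-| Slide a 16-word window forward `steps` times and return the newest word.

The arguments m16 .. m1 are the words 16 .. 1 positions back. No array is
involved: the window lives entirely in function arguments. A real
implementation would feed every new word into the round function instead of
only returning the last one.

-}
wordAfter : Int -> Int -> Int -> Int -> Int -> Int -> Int -> Int -> Int -> Int -> Int -> Int -> Int -> Int -> Int -> Int -> Int -> Int
wordAfter steps m16 m15 m14 m13 m12 m11 m10 m9 m8 m7 m6 m5 m4 m3 m2 m1 =
    if steps <= 0 then
        m1

    else
        wordAfter (steps - 1) m15 m14 m13 m12 m11 m10 m9 m8 m7 m6 m5 m4 m3 m2 m1
            (rotateLeftBy 1 (Bitwise.xor m16 (Bitwise.xor m14 (Bitwise.xor m8 m3))))
```

The long argument list is the price of avoiding the array: memory use stays constant regardless of input size, and there are no `get` or `push` calls in the loop.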

There was a weird issue where the result of `SHA1.fromBytes` would be non-deterministic when many large (200k elements) `Bytes` objects were allocated at once. The length and sum were equal, but the SHA1 result somehow was not. A fix, for unclear reasons, is to introduce a `Decode.andThen` guarding the decoder.

TSFoster · 2019-10-14T02:38:07Z

These both sound great. I’m not back at my computer for a couple of weeks, but I’ll check these out as soon as I can.

TSFoster · 2019-10-28T10:58:11Z

Thanks for your patience on this. I'm going to merge it now, and I think I'll publish a minor update first so as many as possible can benefit from the performance gains.

folkertdev added 15 commits October 11, 2019 16:10

wrap State and DeltaState

7aecbbe

The advantage is that converting between `State`, `DeltaState` and `Digest` is now free (with `--optimize`).

prepare for Bytes

238502d

Adds the elm/bytes dependency, which needs a bump in the elm/core version. Also adds two helpers that are needed later: `map16` and `iterate`.

add bytes-based chunker

f798c87

Use `Bytes.Decode` to read the 512-bit chunks that SHA1 uses.
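As an illustration of reading one 512-bit chunk with `Bytes.Decode`, here is a minimal sketch. It assumes a chunk is treated as 16 big-endian unsigned 32-bit words, as SHA-1 specifies. It collects the words in a list for brevity, whereas the PR mentions a `map16` helper; the names `chunk`, `chunkStep` and `example` are hypothetical.

```elm
module ChunkSketch exposing (chunk, example)

import Bytes exposing (Endianness(..))
import Bytes.Decode as Decode exposing (Decoder, Step(..))
import Bytes.Encode as Encode


{-| Decode one 512-bit chunk as 16 big-endian unsigned 32-bit words.
-}
chunk : Decoder (List Int)
chunk =
    Decode.loop ( 16, [] ) chunkStep


chunkStep : ( Int, List Int ) -> Decoder (Step ( Int, List Int ) (List Int))
chunkStep ( remaining, acc ) =
    if remaining <= 0 then
        -- Words were consed on in reverse, so restore the input order.
        Decode.succeed (Done (List.reverse acc))

    else
        Decode.map (\word -> Loop ( remaining - 1, word :: acc ))
            (Decode.unsignedInt32 BE)


{-| Decode a chunk from 64 zero bytes (64 bytes = 512 bits).
-}
example : Maybe (List Int)
example =
    Encode.sequence (List.repeat 64 (Encode.unsignedInt8 0))
        |> Encode.encode
        |> Decode.decode chunk
```

`Decode.decode` returns `Nothing` when fewer than 64 bytes are available, so a caller has to pad the final chunk before decoding it.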

remove tuple allocation

0f96e5c

This one is weird, but `(a,b) = (1,2)` will allocate the `(1,2)` tuple, only to then immediately destructure it. That is fine normally, but in a tight loop (like we have here) that allocation is expensive. Removing it gives a 20% speed increase.
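The two shapes being compared can be sketched as follows. Both functions are hypothetical and compute the same result; the only difference is whether each loop iteration builds a tuple and immediately destructures it. The speed difference is the one reported in the commit message above, not something this sketch measures.

```elm
module TupleSketch exposing (sumWithTuple, sumWithoutTuple)


{-| Builds a tuple on every iteration, then takes it apart again.
-}
sumWithTuple : Int -> Int -> Int -> Int
sumWithTuple n a b =
    if n <= 0 then
        a + b

    else
        let
            ( nextA, nextB ) =
                ( b, a + b )
        in
        sumWithTuple (n - 1) nextA nextB


{-| Same computation with two separate bindings and no tuple.
-}
sumWithoutTuple : Int -> Int -> Int -> Int
sumWithoutTuple n a b =
    if n <= 0 then
        a + b

    else
        let
            nextA =
                b

            nextB =
                a + b
        in
        sumWithoutTuple (n - 1) nextA nextB
```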

optimize bit operations in calculateDigestDeltas

de92bcb

non-deterministic tests

18c5e33

Something weird is going on with the tests allocating large ArrayBuffers all at once. This commit runs the same test 10 times, and only one of those 10 times does it fail...

cleaning up unused code

a4d869a

remove unused dependencies

9184c6c

elm-utf-tools is no longer needed; elm/bytes has a built-in UTF-8 encoder and decoder. list-extra is no longer needed either.

use danfishgold/base64-bytes to clean up base64 encoding

1cd6076

use bitwise operator laws to remove operations in tight loop

83951a7

cleanup

08413ec

`trim` can be removed. `wordToHex` requires that integers are unsigned, which makes the logic simpler. Add some comments.

change public api

776449a

fix typo

34cc665

folkertdev added 4 commits October 18, 2019 23:27

split large Bytes into smaller chunks

f25a7e0

There is some weirdness with large Bytes values. I hope to find the cause, but until then splitting into smaller chunks fixes those issues.

Remove bytes splitting

d5ba621

This is not needed when `iterate` uses `Decode.loop`. A previous version used an implementation of `iterate` that allocated less, but it could produce RangeErrors, which resulted in failed decoding. Anyhow, it's not needed any more.
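For readers unfamiliar with `Decode.loop`, here is a minimal sketch of how a helper like `iterate` could be built on top of it. The signature and the helper `iterateHelp` are assumptions for illustration; the actual `iterate` in this PR may differ.

```elm
module IterateSketch exposing (example, iterate)

import Bytes.Decode as Decode exposing (Decoder, Step(..))
import Bytes.Encode as Encode


{-| Run a state-threading decoder `n` times (hypothetical signature).
-}
iterate : Int -> (a -> Decoder a) -> a -> Decoder a
iterate n step initial =
    Decode.loop ( n, initial ) (iterateHelp step)


iterateHelp : (a -> Decoder a) -> ( Int, a ) -> Decoder (Step ( Int, a ) a)
iterateHelp step ( remaining, state ) =
    if remaining > 0 then
        -- Run one step, then ask Decode.loop to continue.
        Decode.map (\next -> Loop ( remaining - 1, next )) (step state)

    else
        Decode.succeed (Done state)


{-| Sum three bytes by threading the running total through `iterate`.
-}
example : Maybe Int
example =
    Encode.sequence (List.map Encode.unsignedInt8 [ 1, 2, 3 ])
        |> Encode.encode
        |> Decode.decode (iterate 3 (\total -> Decode.map ((+) total) Decode.unsignedInt8) 0)
```

The trade-off described above is visible here: every step allocates a `Loop` value and a tuple, but the iteration is driven by `Decode.loop` rather than by a chain of nested decoders.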

optimize conversion from byte values to Bytes

5aa98d5

flip order of map16

98dfee1

This is the more intuitive order. It doesn't matter when the same decoder is used 16 times, but this might be copied in the future, so it is better to have it be correct.

TSFoster changed the base branch from master to elm-bytes October 28, 2019 10:53

TSFoster merged commit b748d3c into TSFoster:elm-bytes Oct 28, 2019
