Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

elm/bytes support #7

Merged
merged 19 commits into from Oct 28, 2019
Merged

Conversation

folkertdev
Copy link
Contributor

@folkertdev folkertdev commented Oct 13, 2019

This PR adds support for the Bytes type from elm-bytes, and uses that package to implement the SHA1 logic. The end result is support for Bytes and roughly a 10X speedup for an input of 1Kb. Hopefully, this fixes both aspects of #5

I've tried to make small commits with descriptive messages. I'm also planning a discourse post with some more general lessons from this optimization.

This is a big PR with changes to the public api and to most of the business logic. Two dependencies were removed (elm-utf-tools and list-extra) and one added (danfishgold/base64-bytes). I really see this as a starting point and I'm happy to not rush this PR, give more context, and make further changes.

The advantage is that converting between State, DeltaState and Digest is now free (with --optimize)
adds the elm/bytes dependency, which needs a bump in the elm/core version.
also adds two helpers that are needed later: map16 and iterate
Use Bytes.Decode to read the 512-bit chunks that sha1 uses
This is the important one;

previously, an array was used to store the delta values. But, this array would grow to effectively duplicate the input size in memory.
Additionally, an array has overhead (get, push). And, only the last 16 values of the array were used.

So, we can inline those 16 values. It's not pretty, but seems 10X faster on a 1kb input size. That's worth it!
This one is weird, but `(a,b) = (1,2)` will allocate the `(1,2)` tuply, only to then immediately destructure.
That is fine normally, but in a tight loop (like we have here) that allocation is expensive. Gives a 20% speed increase
Something weird is going on with the tests allocating large ArrayBuffers all at once.
This commit runs the same test 10 times, and only one of those 10 times does it fail...
elm-utf-tools is no longer needed; elm/bytes has a built-in utf-8 encoder and decoder
list-extra is no longer needed
`trim` can be removed
wordToHex required that integers are unsigned; makes logic simpler
add some comments
There was a weird issue where the result of `SHA1.fromBytes` would be
non-deterministic when many large (200k elements) `Bytes` objects were
allocated at once. The length and sum were equal, the SHA1 result
somehow not.

A fix, for unclear reasons, is to introduce a `Decode.andThen` guarding
the decoder.
@TSFoster
Copy link
Owner

These both sound great. I’m not back at my computer for a couple of weeks, but I’ll check these out as soon as I can.

There is some weirdness with large Bytes values. I hope to find the cause, but until then splitting into smaller chunks fixes those issues.
This is not needed when `iterate` uses `Decode.loop`. A previous version
uses an implementation of `iterate` that allocated less. But it could
produce RangeErrors which result in failed decoding.

anyhow, it's not needed any more.
This is the more intuitive order. It doesn't matter if the same decoder is used 16 times
but this might be copied in the future, better have it be correct.
@TSFoster TSFoster changed the base branch from master to elm-bytes October 28, 2019 10:53
@TSFoster
Copy link
Owner

Thanks for your patience on this. I'm going to merge it now, and I think I'll publish a minor update first so as many as possible can benefit from the performance gains.

@TSFoster TSFoster merged commit b748d3c into TSFoster:elm-bytes Oct 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants