Optimize binary decoding of builtins #1810

sjakobi · 2020-05-24T22:25:31Z

* Use `decodeUtf8ByteArray` to avoid UTF-16-encoding the scrutinee.
* Optimize the pattern matching by grouping the patterns by length.

GHC currently doesn't produce static length information for string
literals. Consequently the pattern matching worked somewhat like this:
```
s <- decodeString

let len_s = length s

if len_s == length "Natural/build" && sameBytes s "Natural/build"
    then return NaturalBuild
    else if len_s == length "Natural/fold" && sameBytes s "Natural/fold"
             ...
```
Decoding `Sort`, the most extreme case, would involve a total of 32
conditional jumps from the length comparisons alone.
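The cost of such a chain grows with the position of the matching literal. As a rough model (plain Haskell over `String`, not the actual decoder or its Core), the hypothetical helper `lengthChecksUntilMatch` below counts the length comparisons a linear if/else chain performs before it reaches a given name. The candidate list is made up for illustration; a name in position k costs k length comparisons.

```haskell
-- Hypothetical candidate list, in the order a linear if/else chain
-- would test it. The real decoder has more builtins in a different order.
candidates :: [String]
candidates =
  [ "Natural/build", "Natural/fold"
  , "Bool", "None", "Text", "List", "Type", "Kind", "Sort"
  ]

-- | Number of length comparisons a linear chain performs before it
-- matches the scrutinee, or Nothing if no candidate matches.
-- Every candidate that is tried costs exactly one length comparison.
lengthChecksUntilMatch :: String -> Maybe Int
lengthChecksUntilMatch s = go 1 candidates
  where
    go _ [] = Nothing
    go n (c : cs)
      | length s == length c && s == c = Just n
      | otherwise = go (n + 1) cs
```

With this list, `"Sort"` is the ninth candidate and therefore pays for nine length comparisons; a longer list makes the last entry correspondingly more expensive.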

Judging by the Core, we can get that number down to 8 by grouping
the patterns by length: one to check the length of the decoded string,
and (unfortunately) still one for each of the 7 candidate literals of
length 4.
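A minimal sketch of the grouped version, assuming `ShortByteString` from `bytestring` as the decoded representation and a hypothetical `Builtin` type that contains only a handful of constructors. The length of the decoded name is inspected once, and only the candidates of that length are compared by content.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString.Short (ShortByteString)
import qualified Data.ByteString.Short as SBS

-- Hypothetical subset of the builtins, for illustration only.
data Builtin
  = NaturalBuild
  | NaturalFold
  | Bool
  | None
  | Text
  | List
  | Type
  | Kind
  | Sort
  deriving (Eq, Show)

-- | Branch on the length of the decoded name once, then compare contents
-- only against the candidates of that length. If every guard of an
-- alternative fails, matching falls through to the wildcard.
-- Note: '==' on ShortByteString still compares lengths internally
-- before comparing bytes.
decodeBuiltin :: ShortByteString -> Maybe Builtin
decodeBuiltin s = case SBS.length s of
  4
    | s == "Bool" -> Just Bool
    | s == "None" -> Just None
    | s == "Text" -> Just Text
    | s == "List" -> Just List
    | s == "Type" -> Just Type
    | s == "Kind" -> Just Kind
    | s == "Sort" -> Just Sort
  12 | s == "Natural/fold" -> Just NaturalFold
  13 | s == "Natural/build" -> Just NaturalBuild
  _ -> Nothing
```

Because `ShortByteString`'s `==` checks lengths before bytes, each of the length-4 guards still carries its own length check, which is the remaining overhead mentioned above.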

The number of string content comparisons should be unchanged.

With these optimizations, the time to decode the cache for cpkg drops by
7-9%, and decoding time for the Prelude drops by 13-16%.

This also changes the builtin encoding to use `encodeUtf8ByteArray` in order
to avoid UTF-16-encoding and decoding the builtin strings. I didn't check
the performance implications, though.

Context: #1804.


sjakobi · 2020-05-24T22:33:14Z

> Judging by the Core, we can get that number down to 8 by grouping
> the patterns by length: One to check the length of the decoded string,
> and (unfortunately) still one each for the 7 candidate literals of
> length 4.

These completely unnecessary length checks on the literals come from `ShortByteString`'s `Eq` instance. To avoid them, we could try using the `compareByteArrays#` primop instead. The `primitive` package has a nice backward-compatible wrapper for it, but unfortunately it's not currently exported: haskell/primitive#131
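A sketch of what calling `compareByteArrays#` directly could look like, assuming a `bytestring` version whose `Data.ByteString.Short.Internal` exposes `SBS` (a constructor or bundled pattern) over a `ByteArray#`. `sameBytesN` is a hypothetical helper, not the wrapper from `primitive`: it compares a fixed number of bytes and leaves the length check to the caller, which has already branched on the length of the decoded name.

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString.Short.Internal (ShortByteString (SBS))
import GHC.Exts (Int (I#), compareByteArrays#, isTrue#, (==#))

-- | Hypothetical helper: compare the first @n@ bytes of two
-- ShortByteStrings with the compareByteArrays# primop, without the
-- length comparison that the Eq instance performs.
--
-- Precondition (not checked): both arguments are at least @n@ bytes
-- long, e.g. because the caller already matched on the decoded
-- string's length. Violating it reads out of bounds.
sameBytesN :: Int -> ShortByteString -> ShortByteString -> Bool
sameBytesN (I# n) (SBS a) (SBS b) =
  -- compareByteArrays# returns 0# when the byte ranges are equal.
  isTrue# (compareByteArrays# a 0# b 0# n ==# 0#)
```

Inside the length-4 branch of a grouped decoder, each candidate would then cost only a content comparison, e.g. `sameBytesN 4 s "Sort"`.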

Gabriella439

Very nice work! 🙂

Gabriella439 approved these changes May 24, 2020


sjakobi added the merge me label May 24, 2020

sjakobi mentioned this pull request May 24, 2020

Compute length at compile time for literal strings haskell/bytestring#191

Merged

Merge branch 'master' into sjakobi/builtins-binary

1d7273a

mergify bot merged commit 93313dc into master May 25, 2020

mergify bot deleted the sjakobi/builtins-binary branch May 25, 2020 00:44
