Allow building `ByteString` and `Text` #4

oberblastmeister · 2023-02-10T01:06:19Z

The builder in ByteString is slow, and it would be really nice if we could use this library to build ByteString. We could do this by adding a phantom type parameter to Buffer (and Builder). Creating a Builder from a ByteString will force the parameter to ByteString, to ensure that we cannot create a Text from it. Creating a Builder from text will be polymorphic because we can combine it with ByteString. Also, the Builder closures can check for pinned-ness when reallocating the buffer. I think this would be nice for code resuse because we can use the same operations for both Text and ByteString. For example

(|>.) ∷ Buffer s ⊸ Char → Buffer s

The text was updated successfully, but these errors were encountered:

Bodigrim · 2023-02-11T17:22:04Z

Do you even need a phantom parameter? If you modify appendBounded / prependBounded to maintain pinnedness of underlying byte array, you can have runBuffer ∷ (Buffer ⊸ Buffer) ⊸ Text, supplying an unpinned array to the callback, and runBufferBS ∷ (Buffer ⊸ Buffer) ⊸ ByteString, supplying a pinned array.

oberblastmeister · 2023-02-11T22:11:02Z

You need a phantom type parameter, because otherwise you could do

invalidText = runBufferText $ fromText "adsfadsfadf" <> fromByteString invalidUtf8Sequence

Bodigrim · 2023-02-11T23:16:33Z

I don't see why invalidText would be invalid. Both fromText "adsfadsfadf" and fromByteString invalidUtf8Sequence are functions Buffer ⊸ Buffer. So if we compose them and apply to an unpinned array, we obtain a Text. If we apply their composition to a pinned array, it's a valid ByteString.

Ah, I got your point about invalidUtf8Sequence.

Bodigrim · 2023-02-12T10:56:14Z

@oberblastmeister I think these are actually orthogonal features.

One is to support ByteString as an output type. This could be achieved without any breakage to existing type signatures, PR is welcome.

Another is to support invalid UTF-8 ByteString as an input source. One can already supply ByteString as an input via fromAddr, but the documentation is explicit that the input should be a valid UTF-8 without \NUL. Is the use case for arbitrary binary data significant enough to justify a breaking change to Buffer type? I'm not quite convinced at the moment.

oberblastmeister · 2023-02-12T14:51:41Z

I don't think we have to break existing type signatures. We could add a new generic module, something like Data.Builder.Linear and do

type Builder = Builder.Linear.Builder Text

All functions in Data.Text.Builder.Linear would be specialized versions of functions from Data.Builder.Linear.

Bodigrim · 2023-02-12T18:26:49Z

Potentially. This is still a significant amount of work and future maintenance, which I'm not ready to commit to at the moment. I would like to see runBufferBS implemented and benchmarked against ByteString native builders first.

Bodigrim · 2023-03-02T23:22:38Z

I've got a spare moment and implemented runBufferBS, but have not benchmarked it against ByteString native builders.

oberblastmeister · 2023-03-03T00:00:20Z

Other ByteString builders are probably faster because they unbox the state and also don't support prepending, which removes an extra word from the buffer. Other than that, it should be pretty fast.

Bodigrim · 2023-03-03T23:18:08Z

I've added benchmarks. It's kinda inconclusive, ByteString is often on par or slightly faster unless the output gets extremely long. It's probably still a net-positive, by virtue of providing a unifed interface.

Bodigrim · 2023-03-06T21:03:22Z

Updated README with benchmarking results.

Bodigrim · 2023-03-12T10:32:23Z

Sigh, there is a significant performance regression between 0.1 and master, which I can only attribute to this change:

All
  Text
    1
      Data.Text.Builder.Linear: 
        35.1 ns ± 1.8 ns, 19% more than baseline
    10
      Data.Text.Builder.Linear: 
        196  ns ±  13 ns, 19% more than baseline
    100
      Data.Text.Builder.Linear:
        1.66 μs ± 103 ns, 18% more than baseline
    1000
      Data.Text.Builder.Linear: 
        14.9 μs ± 837 ns, 16% more than baseline
    10000
      Data.Text.Builder.Linear:
        154  μs ±  15 μs, 16% more than baseline
    100000
      Data.Text.Builder.Linear: 
        2.59 ms ± 219 μs, 17% more than baseline
    1000000
      Data.Text.Builder.Linear:
        16.6 ms ± 1.6 ms, 28% less than baseline

oberblastmeister · 2023-03-12T12:58:56Z

I tested this on ghc 9.2.5 and got the same result. If you check core, the change in performance is not due to an addition of one extra if branch, but because ghc does not inline (|>) and (<|). I would argue that this is the correct behavior, as we don't want to inline those functions as they are big. For some reason, ghc does only does a worker wrapper transformation on the arguments of (|>), and not the return. So in the benchmark,
you are paying for extra function calls, and an allocation of a constructor every time you call (|>) and (<|). I'm pretty sure that ghc should be unboxing the return, but we can also manually unbox it.

Bodigrim · 2023-03-12T23:37:28Z

appendBounded maxSrcLen appender (Buffer (Text dst dstOff dstLen)) = Buffer $ runST $ do
  let dstFullLen = sizeofByteArray dst
      newFullLen = dstOff + 2 * (dstLen + maxSrcLen)
  dstM <- unsafeThaw dst
  newM <- if dstOff + dstLen + maxSrcLen <= dstFullLen
    then pure dstM
    else A.resizeM dstM newFullLen
  srcLen ← appender newM (dstOff + dstLen)
  new ← A.unsafeFreeze newM
  pure $ Text new dstOff (dstLen + srcLen)

becomes

appendBounded
  :: Int
     -> (forall s. MArray s -> Int -> ST s Int) -> Buffer %1 -> Buffer
appendBounded
  = \ (maxSrcLen_a19J :: Int)
      (appender_a19K :: forall s. MArray s -> Int -> ST s Int)
      (ds_d1Kx :: Buffer) ->
      case maxSrcLen_a19J of { I# ipv_s2jz ->
      case ds_d1Kx of { Buffer bx_d1MY bx1_d1MZ bx2_d1N0 ->
      runRW#
        (\ (s_a1Og :: State# RealWorld) ->
           case <=#
                  (+# (+# bx1_d1MZ bx2_d1N0) ipv_s2jz) (sizeofByteArray# bx_d1MY)
           of {
             __DEFAULT ->
               case isByteArrayPinned# bx_d1MY of {
                 __DEFAULT ->
                   case newPinnedByteArray#
                          (+# bx1_d1MZ (*# 2# (+# bx2_d1N0 ipv_s2jz))) s_a1Og
                   of
                   { (# ipv1_a1OP, ipv2_a1OQ #) ->
                   case copyByteArray#
                          bx_d1MY bx1_d1MZ ipv2_a1OQ bx1_d1MZ bx2_d1N0 ipv1_a1OP
                   of s2#_a2jf
                   { __DEFAULT ->
                   case ((appender_a19K
                            (MutableByteArray ipv2_a1OQ) (I# (+# bx1_d1MZ bx2_d1N0)))
                         `cast` <Co:3> :: ...)
                          s2#_a2jf
                   of
                   { (# ipv3_X4, ipv4_X5 #) ->
                   case unsafeFreezeByteArray# ipv2_a1OQ ipv3_X4 of
                   { (# ipv5_a1P0, ipv6_a1P1 #) ->
                   case ipv4_X5 of { I# y_a2ia ->
                   Buffer ipv6_a1P1 bx1_d1MZ (+# bx2_d1N0 y_a2ia)
                   }
                   }
                   }
                   }
                   };
                 0# ->
                   case newByteArray#
                          (+# bx1_d1MZ (*# 2# (+# bx2_d1N0 ipv_s2jz))) s_a1Og
                   of
                   { (# ipv1_a2iF, ipv2_a2iG #) ->
                   case copyByteArray#
                          bx_d1MY bx1_d1MZ ipv2_a2iG bx1_d1MZ bx2_d1N0 ipv1_a2iF
                   of s2#_a2jf
                   { __DEFAULT ->
                   case ((appender_a19K
                            (MutableByteArray ipv2_a2iG) (I# (+# bx1_d1MZ bx2_d1N0)))
                         `cast` <Co:3> :: ...)
                          s2#_a2jf
                   of
                   { (# ipv3_X4, ipv4_X5 #) ->
                   case unsafeFreezeByteArray# ipv2_a2iG ipv3_X4 of
                   { (# ipv5_a1P0, ipv6_a1P1 #) ->
                   case ipv4_X5 of { I# y_a2ia ->
                   Buffer ipv6_a1P1 bx1_d1MZ (+# bx2_d1N0 y_a2ia)
                   }
                   }
                   }
                   }
                   }
               };
             1# ->
               case ((appender_a19K
                        (case unsafeEqualityProof of { UnsafeRefl v2_a1NZ ->
                         MutableByteArray (bx_d1MY `cast` <Co:2> :: ...)
                         })
                        (I# (+# bx1_d1MZ bx2_d1N0)))
                     `cast` <Co:3> :: ...)
                      s_a1Og
               of
               { (# ipv1_X4, ipv2_X5 #) ->
               case unsafeEqualityProof of { UnsafeRefl v2_a1NZ ->
               case unsafeFreezeByteArray# (bx_d1MY `cast` <Co:2> :: ...) ipv1_X4
               of
               { (# ipv3_a1P0, ipv4_a1P1 #) ->
               case ipv2_X5 of { I# y_a2ia ->
               Buffer ipv4_a1P1 bx1_d1MZ (+# bx2_d1N0 y_a2ia)
               }
               }
               }
               }
           })
      }
      }

It's very stupid that we have duplicating branches case newPinnedByteArray# ... and case newByteArray# .... Any ideas how to avoid it? I tried some tricks, but GHC insists to inline cases.

I'm pretty sure that ghc should be unboxing the return, but we can also manually unbox it.

How would you do this?

Bodigrim · 2023-03-15T01:28:23Z

I raised a GHC issue to track this: https://gitlab.haskell.org/ghc/ghc/-/issues/23122

oberblastmeister · 2023-03-15T23:14:33Z

You could unbox the buffer

newtype Buffer# = Buffer# (# Int#, Int#, ByteArray# #)

data Buffer = Buffer Buffer#

Then you can do manual worker wrapper transformation.

Though I think we can just leave it unboxed, because the Buffer is already unlifted and you can't use it with normal polymorphic functions anyway.

Bodigrim · 2023-03-22T22:03:36Z

I went ahead with a release as is. Thanks for the idea!

Bodigrim closed this as completed Mar 22, 2023

oberblastmeister mentioned this issue Mar 25, 2023

Unbox buffer #5

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow building `ByteString` and `Text` #4

Allow building `ByteString` and `Text` #4

oberblastmeister commented Feb 10, 2023

Bodigrim commented Feb 11, 2023 •

edited

Loading

oberblastmeister commented Feb 11, 2023

Bodigrim commented Feb 11, 2023 •

edited

Loading

Bodigrim commented Feb 12, 2023

oberblastmeister commented Feb 12, 2023

Bodigrim commented Feb 12, 2023

Bodigrim commented Mar 2, 2023 •

edited

Loading

oberblastmeister commented Mar 3, 2023

Bodigrim commented Mar 3, 2023

Bodigrim commented Mar 6, 2023

Bodigrim commented Mar 12, 2023

oberblastmeister commented Mar 12, 2023

Bodigrim commented Mar 12, 2023 •

edited

Loading

Bodigrim commented Mar 15, 2023

oberblastmeister commented Mar 15, 2023

Bodigrim commented Mar 22, 2023

Allow building ByteString and Text #4

Allow building ByteString and Text #4

Comments

oberblastmeister commented Feb 10, 2023

Bodigrim commented Feb 11, 2023 • edited Loading

oberblastmeister commented Feb 11, 2023

Bodigrim commented Feb 11, 2023 • edited Loading

Bodigrim commented Feb 12, 2023

oberblastmeister commented Feb 12, 2023

Bodigrim commented Feb 12, 2023

Bodigrim commented Mar 2, 2023 • edited Loading

oberblastmeister commented Mar 3, 2023

Bodigrim commented Mar 3, 2023

Bodigrim commented Mar 6, 2023

Bodigrim commented Mar 12, 2023

oberblastmeister commented Mar 12, 2023

Bodigrim commented Mar 12, 2023 • edited Loading

Bodigrim commented Mar 15, 2023

oberblastmeister commented Mar 15, 2023

Bodigrim commented Mar 22, 2023

Allow building `ByteString` and `Text` #4

Allow building `ByteString` and `Text` #4

Bodigrim commented Feb 11, 2023 •

edited

Loading

Bodigrim commented Feb 11, 2023 •

edited

Loading

Bodigrim commented Mar 2, 2023 •

edited

Loading

Bodigrim commented Mar 12, 2023 •

edited

Loading