Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow building ByteString and Text #4

Closed
oberblastmeister opened this issue Feb 10, 2023 · 16 comments
Closed

Allow building ByteString and Text #4

oberblastmeister opened this issue Feb 10, 2023 · 16 comments

Comments

@oberblastmeister
Copy link

The builder in ByteString is slow, and it would be really nice if we could use this library to build ByteString. We could do this by adding a phantom type parameter to Buffer (and Builder). Creating a Builder from a ByteString will force the parameter to ByteString, to ensure that we cannot create a Text from it. Creating a Builder from text will be polymorphic because we can combine it with ByteString. Also, the Builder closures can check for pinned-ness when reallocating the buffer. I think this would be nice for code resuse because we can use the same operations for both Text and ByteString. For example

(|>.)  Buffer s  Char  Buffer s
@Bodigrim
Copy link
Owner

Bodigrim commented Feb 11, 2023

Do you even need a phantom parameter? If you modify appendBounded / prependBounded to maintain pinnedness of underlying byte array, you can have runBuffer ∷ (Buffer ⊸ Buffer) ⊸ Text, supplying an unpinned array to the callback, and runBufferBS ∷ (Buffer ⊸ Buffer) ⊸ ByteString, supplying a pinned array.

@oberblastmeister
Copy link
Author

You need a phantom type parameter, because otherwise you could do

invalidText = runBufferText $ fromText "adsfadsfadf" <> fromByteString invalidUtf8Sequence

@Bodigrim
Copy link
Owner

Bodigrim commented Feb 11, 2023

I don't see why invalidText would be invalid. Both fromText "adsfadsfadf" and fromByteString invalidUtf8Sequence are functions Buffer ⊸ Buffer. So if we compose them and apply to an unpinned array, we obtain a Text. If we apply their composition to a pinned array, it's a valid ByteString.

Ah, I got your point about invalidUtf8Sequence.

@Bodigrim
Copy link
Owner

@oberblastmeister I think these are actually orthogonal features.

One is to support ByteString as an output type. This could be achieved without any breakage to existing type signatures, PR is welcome.

Another is to support invalid UTF-8 ByteString as an input source. One can already supply ByteString as an input via fromAddr, but the documentation is explicit that the input should be a valid UTF-8 without \NUL. Is the use case for arbitrary binary data significant enough to justify a breaking change to Buffer type? I'm not quite convinced at the moment.

@oberblastmeister
Copy link
Author

I don't think we have to break existing type signatures. We could add a new generic module, something like Data.Builder.Linear and do

type Builder = Builder.Linear.Builder Text

All functions in Data.Text.Builder.Linear would be specialized versions of functions from Data.Builder.Linear.

@Bodigrim
Copy link
Owner

Potentially. This is still a significant amount of work and future maintenance, which I'm not ready to commit to at the moment. I would like to see runBufferBS implemented and benchmarked against ByteString native builders first.

@Bodigrim
Copy link
Owner

Bodigrim commented Mar 2, 2023

I've got a spare moment and implemented runBufferBS, but have not benchmarked it against ByteString native builders.

@oberblastmeister
Copy link
Author

Other ByteString builders are probably faster because they unbox the state and also don't support prepending, which removes an extra word from the buffer. Other than that, it should be pretty fast.

@Bodigrim
Copy link
Owner

Bodigrim commented Mar 3, 2023

I've added benchmarks. It's kinda inconclusive, ByteString is often on par or slightly faster unless the output gets extremely long. It's probably still a net-positive, by virtue of providing a unifed interface.

@Bodigrim
Copy link
Owner

Bodigrim commented Mar 6, 2023

Updated README with benchmarking results.

@Bodigrim
Copy link
Owner

Sigh, there is a significant performance regression between 0.1 and master, which I can only attribute to this change:

All
  Text
    1
      Data.Text.Builder.Linear: 
        35.1 ns ± 1.8 ns, 19% more than baseline
    10
      Data.Text.Builder.Linear: 
        196  ns ±  13 ns, 19% more than baseline
    100
      Data.Text.Builder.Linear:
        1.66 μs ± 103 ns, 18% more than baseline
    1000
      Data.Text.Builder.Linear: 
        14.9 μs ± 837 ns, 16% more than baseline
    10000
      Data.Text.Builder.Linear:
        154  μs ±  15 μs, 16% more than baseline
    100000
      Data.Text.Builder.Linear: 
        2.59 ms ± 219 μs, 17% more than baseline
    1000000
      Data.Text.Builder.Linear:
        16.6 ms ± 1.6 ms, 28% less than baseline

@oberblastmeister
Copy link
Author

I tested this on ghc 9.2.5 and got the same result. If you check core, the change in performance is not due to an addition of one extra if branch, but because ghc does not inline (|>) and (<|). I would argue that this is the correct behavior, as we don't want to inline those functions as they are big. For some reason, ghc does only does a worker wrapper transformation on the arguments of (|>), and not the return. So in the benchmark,
you are paying for extra function calls, and an allocation of a constructor every time you call (|>) and (<|). I'm pretty sure that ghc should be unboxing the return, but we can also manually unbox it.

@Bodigrim
Copy link
Owner

Bodigrim commented Mar 12, 2023

appendBounded maxSrcLen appender (Buffer (Text dst dstOff dstLen)) = Buffer $ runST $ do
  let dstFullLen = sizeofByteArray dst
      newFullLen = dstOff + 2 * (dstLen + maxSrcLen)
  dstM <- unsafeThaw dst
  newM <- if dstOff + dstLen + maxSrcLen <= dstFullLen
    then pure dstM
    else A.resizeM dstM newFullLen
  srcLen  appender newM (dstOff + dstLen)
  new  A.unsafeFreeze newM
  pure $ Text new dstOff (dstLen + srcLen)

becomes

appendBounded
  :: Int
     -> (forall s. MArray s -> Int -> ST s Int) -> Buffer %1 -> Buffer
appendBounded
  = \ (maxSrcLen_a19J :: Int)
      (appender_a19K :: forall s. MArray s -> Int -> ST s Int)
      (ds_d1Kx :: Buffer) ->
      case maxSrcLen_a19J of { I# ipv_s2jz ->
      case ds_d1Kx of { Buffer bx_d1MY bx1_d1MZ bx2_d1N0 ->
      runRW#
        (\ (s_a1Og :: State# RealWorld) ->
           case <=#
                  (+# (+# bx1_d1MZ bx2_d1N0) ipv_s2jz) (sizeofByteArray# bx_d1MY)
           of {
             __DEFAULT ->
               case isByteArrayPinned# bx_d1MY of {
                 __DEFAULT ->
                   case newPinnedByteArray#
                          (+# bx1_d1MZ (*# 2# (+# bx2_d1N0 ipv_s2jz))) s_a1Og
                   of
                   { (# ipv1_a1OP, ipv2_a1OQ #) ->
                   case copyByteArray#
                          bx_d1MY bx1_d1MZ ipv2_a1OQ bx1_d1MZ bx2_d1N0 ipv1_a1OP
                   of s2#_a2jf
                   { __DEFAULT ->
                   case ((appender_a19K
                            (MutableByteArray ipv2_a1OQ) (I# (+# bx1_d1MZ bx2_d1N0)))
                         `cast` <Co:3> :: ...)
                          s2#_a2jf
                   of
                   { (# ipv3_X4, ipv4_X5 #) ->
                   case unsafeFreezeByteArray# ipv2_a1OQ ipv3_X4 of
                   { (# ipv5_a1P0, ipv6_a1P1 #) ->
                   case ipv4_X5 of { I# y_a2ia ->
                   Buffer ipv6_a1P1 bx1_d1MZ (+# bx2_d1N0 y_a2ia)
                   }
                   }
                   }
                   }
                   };
                 0# ->
                   case newByteArray#
                          (+# bx1_d1MZ (*# 2# (+# bx2_d1N0 ipv_s2jz))) s_a1Og
                   of
                   { (# ipv1_a2iF, ipv2_a2iG #) ->
                   case copyByteArray#
                          bx_d1MY bx1_d1MZ ipv2_a2iG bx1_d1MZ bx2_d1N0 ipv1_a2iF
                   of s2#_a2jf
                   { __DEFAULT ->
                   case ((appender_a19K
                            (MutableByteArray ipv2_a2iG) (I# (+# bx1_d1MZ bx2_d1N0)))
                         `cast` <Co:3> :: ...)
                          s2#_a2jf
                   of
                   { (# ipv3_X4, ipv4_X5 #) ->
                   case unsafeFreezeByteArray# ipv2_a2iG ipv3_X4 of
                   { (# ipv5_a1P0, ipv6_a1P1 #) ->
                   case ipv4_X5 of { I# y_a2ia ->
                   Buffer ipv6_a1P1 bx1_d1MZ (+# bx2_d1N0 y_a2ia)
                   }
                   }
                   }
                   }
                   }
               };
             1# ->
               case ((appender_a19K
                        (case unsafeEqualityProof of { UnsafeRefl v2_a1NZ ->
                         MutableByteArray (bx_d1MY `cast` <Co:2> :: ...)
                         })
                        (I# (+# bx1_d1MZ bx2_d1N0)))
                     `cast` <Co:3> :: ...)
                      s_a1Og
               of
               { (# ipv1_X4, ipv2_X5 #) ->
               case unsafeEqualityProof of { UnsafeRefl v2_a1NZ ->
               case unsafeFreezeByteArray# (bx_d1MY `cast` <Co:2> :: ...) ipv1_X4
               of
               { (# ipv3_a1P0, ipv4_a1P1 #) ->
               case ipv2_X5 of { I# y_a2ia ->
               Buffer ipv4_a1P1 bx1_d1MZ (+# bx2_d1N0 y_a2ia)
               }
               }
               }
               }
           })
      }
      }

It's very stupid that we have duplicating branches case newPinnedByteArray# ... and case newByteArray# .... Any ideas how to avoid it? I tried some tricks, but GHC insists to inline cases.

I'm pretty sure that ghc should be unboxing the return, but we can also manually unbox it.

How would you do this?

@Bodigrim
Copy link
Owner

I raised a GHC issue to track this: https://gitlab.haskell.org/ghc/ghc/-/issues/23122

@oberblastmeister
Copy link
Author

You could unbox the buffer

newtype Buffer# = Buffer# (# Int#, Int#, ByteArray# #)

data Buffer = Buffer Buffer#

Then you can do manual worker wrapper transformation.

Though I think we can just leave it unboxed, because the Buffer is already unlifted and you can't use it with normal polymorphic functions anyway.

@Bodigrim
Copy link
Owner

I went ahead with a release as is. Thanks for the idea!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants