Allow building ByteString and Text #4
Comments
Do you even need a phantom parameter? If you modify |
You need a phantom type parameter, because otherwise you could do
invalidText = runBufferText $ fromText "adsfadsfadf" <> fromByteString invalidUtf8Sequence
Ah, I got your point about |
@oberblastmeister I think these are actually orthogonal features. One is to support Another is to support invalid UTF-8 |
I don't think we have to break existing type signatures. We could add a new generic module, something like type Builder = Builder.Linear.Builder Text All functions in |
Potentially. This is still a significant amount of work and future maintenance, which I'm not ready to commit to at the moment. I would like to see |
I've got a spare moment and implemented |
Other |
I've added benchmarks. It's kinda inconclusive, |
Updated |
Sigh, there is a significant performance regression between
|
I tested this on ghc 9.2.5 and got the same result. If you check core, the change in performance is not due to an addition of one extra if branch, but because ghc does not inline |
appendBounded maxSrcLen appender (Buffer (Text dst dstOff dstLen)) = Buffer $ runST $ do
  let dstFullLen = sizeofByteArray dst
      newFullLen = dstOff + 2 * (dstLen + maxSrcLen)
  dstM <- unsafeThaw dst
  newM <- if dstOff + dstLen + maxSrcLen <= dstFullLen
    then pure dstM
    else A.resizeM dstM newFullLen
  srcLen ← appender newM (dstOff + dstLen)
  new ← A.unsafeFreeze newM
  pure $ Text new dstOff (dstLen + srcLen)

becomes

appendBounded
  :: Int
     -> (forall s. MArray s -> Int -> ST s Int) -> Buffer %1 -> Buffer
appendBounded
  = \ (maxSrcLen_a19J :: Int)
      (appender_a19K :: forall s. MArray s -> Int -> ST s Int)
      (ds_d1Kx :: Buffer) ->
      case maxSrcLen_a19J of { I# ipv_s2jz ->
      case ds_d1Kx of { Buffer bx_d1MY bx1_d1MZ bx2_d1N0 ->
      runRW#
        (\ (s_a1Og :: State# RealWorld) ->
           case <=#
                  (+# (+# bx1_d1MZ bx2_d1N0) ipv_s2jz) (sizeofByteArray# bx_d1MY)
           of {
             __DEFAULT ->
               case isByteArrayPinned# bx_d1MY of {
                 __DEFAULT ->
                   case newPinnedByteArray#
                          (+# bx1_d1MZ (*# 2# (+# bx2_d1N0 ipv_s2jz))) s_a1Og
                   of
                   { (# ipv1_a1OP, ipv2_a1OQ #) ->
                   case copyByteArray#
                          bx_d1MY bx1_d1MZ ipv2_a1OQ bx1_d1MZ bx2_d1N0 ipv1_a1OP
                   of s2#_a2jf
                   { __DEFAULT ->
                   case ((appender_a19K
                            (MutableByteArray ipv2_a1OQ) (I# (+# bx1_d1MZ bx2_d1N0)))
                         `cast` <Co:3> :: ...)
                          s2#_a2jf
                   of
                   { (# ipv3_X4, ipv4_X5 #) ->
                   case unsafeFreezeByteArray# ipv2_a1OQ ipv3_X4 of
                   { (# ipv5_a1P0, ipv6_a1P1 #) ->
                   case ipv4_X5 of { I# y_a2ia ->
                   Buffer ipv6_a1P1 bx1_d1MZ (+# bx2_d1N0 y_a2ia)
                   }
                   }
                   }
                   }
                   };
                 0# ->
                   case newByteArray#
                          (+# bx1_d1MZ (*# 2# (+# bx2_d1N0 ipv_s2jz))) s_a1Og
                   of
                   { (# ipv1_a2iF, ipv2_a2iG #) ->
                   case copyByteArray#
                          bx_d1MY bx1_d1MZ ipv2_a2iG bx1_d1MZ bx2_d1N0 ipv1_a2iF
                   of s2#_a2jf
                   { __DEFAULT ->
                   case ((appender_a19K
                            (MutableByteArray ipv2_a2iG) (I# (+# bx1_d1MZ bx2_d1N0)))
                         `cast` <Co:3> :: ...)
                          s2#_a2jf
                   of
                   { (# ipv3_X4, ipv4_X5 #) ->
                   case unsafeFreezeByteArray# ipv2_a2iG ipv3_X4 of
                   { (# ipv5_a1P0, ipv6_a1P1 #) ->
                   case ipv4_X5 of { I# y_a2ia ->
                   Buffer ipv6_a1P1 bx1_d1MZ (+# bx2_d1N0 y_a2ia)
                   }
                   }
                   }
                   }
                   }
               };
             1# ->
               case ((appender_a19K
                        (case unsafeEqualityProof of { UnsafeRefl v2_a1NZ ->
                         MutableByteArray (bx_d1MY `cast` <Co:2> :: ...)
                         })
                        (I# (+# bx1_d1MZ bx2_d1N0)))
                     `cast` <Co:3> :: ...)
                      s_a1Og
               of
               { (# ipv1_X4, ipv2_X5 #) ->
               case unsafeEqualityProof of { UnsafeRefl v2_a1NZ ->
               case unsafeFreezeByteArray# (bx_d1MY `cast` <Co:2> :: ...) ipv1_X4
               of
               { (# ipv3_a1P0, ipv4_a1P1 #) ->
               case ipv2_X5 of { I# y_a2ia ->
               Buffer ipv4_a1P1 bx1_d1MZ (+# bx2_d1N0 y_a2ia)
               }
               }
               }
               }
           })
      }
      }

It's very stupid that we have duplicating branches
How would you do this?
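
For readers following the arithmetic rather than the Core, the capacity check in the source version of appendBounded above can be modelled as a small pure function. This is only a sketch of the decision logic quoted in this thread: `Plan` and `planAppend` are hypothetical names introduced here for illustration, and the real function works on mutable arrays inside `ST`.

```haskell
-- Hypothetical model of the capacity check in appendBounded (names are not
-- from the library). All sizes are in bytes.
data Plan
  = ReuseInPlace -- enough room after the current contents, write in place
  | GrowTo Int   -- reallocate the underlying array to this many bytes
  deriving (Eq, Show)

-- | Decide whether an append of at most @maxSrcLen@ bytes fits.
--
-- Arguments mirror the variables in the quoted source:
-- full size of the array, offset of the payload, length of the payload,
-- and an upper bound on the number of bytes about to be appended.
planAppend :: Int -> Int -> Int -> Int -> Plan
planAppend dstFullLen dstOff dstLen maxSrcLen
  | dstOff + dstLen + maxSrcLen <= dstFullLen = ReuseInPlace
  | otherwise = GrowTo (dstOff + 2 * (dstLen + maxSrcLen))
```

Only the `GrowTo` case needs a fresh array, and that is the path on which the Core above branches once more on pinnedness before running the appender.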
I raised a GHC issue to track this: https://gitlab.haskell.org/ghc/ghc/-/issues/23122
You could unbox the buffer
newtype Buffer# = Buffer# (# Int#, Int#, ByteArray# #)
data Buffer = Buffer Buffer#
Then you can do manual worker wrapper transformation. Though I think we can just leave it unboxed, because the |
I went ahead with a release as is. Thanks for the idea!
The builder in ByteString is slow, and it would be really nice if we could use this library to build ByteString. We could do this by adding a phantom type parameter to Buffer (and Builder). Creating a Builder from a ByteString will force the parameter to ByteString, to ensure that we cannot create a Text from it. Creating a Builder from text will be polymorphic because we can combine it with ByteString. Also, the Builder closures can check for pinned-ness when reallocating the buffer. I think this would be nice for code reuse because we can use the same operations for both Text and ByteString. For example