Whither Text? #51
@phadej is there something intrinsic to how UTF-16 `Text` values are laid out that would mean this requires a large-scale revision of the library, or is it just a schlep? Or something in between?
@bitemyapp trifecta works on … The problem is that

```haskell
data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                     {-# UNPACK #-} !Int                -- offset
                     {-# UNPACK #-} !Int                -- length
```

but

```haskell
-- | A space efficient, packed, unboxed Unicode text type.
--
-- Internally, the 'Text' type is represented as an array of 'Word16' UTF-16 code units.
-- The offset and length fields in the constructor are in these units, not units of 'Char'.
data Text = Text
    {-# UNPACK #-} !Array -- payload (Word16 elements)
    {-# UNPACK #-} !Int   -- offset (units of Word16, not Char)
    {-# UNPACK #-} !Int   -- length (units of Word16, not Char)
    deriving (Typeable)

-- | Immutable array type.
data Array = Array {
      aBA :: ByteArray#
    }
```

And I'm not sure if one can convert from …
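For comparison, here is a small sketch that uses only the `text` and `bytestring` packages shipped with GHC. It measures the same string in `Char`s, in UTF-8 bytes and in UTF-16 bytes. Once non-ASCII or non-BMP characters appear the numbers disagree, so an offset or length from one representation can't be reused in the other, and getting from a UTF-16 `Text` to the UTF-8 `ByteString` that trifecta consumes needs a transcoding copy. The helper names (`charCount`, `utf8Bytes`, `utf16Bytes`) are illustrative only. Note also that text 2.0 and later switched to a UTF-8 internal representation; the definitions quoted above are from the 1.x series.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Number of Chars (Unicode code points) in a Text.
charCount :: T.Text -> Int
charCount = T.length

-- Size in bytes of the UTF-8 encoding (the kind of input trifecta reads).
utf8Bytes :: T.Text -> Int
utf8Bytes = BS.length . TE.encodeUtf8

-- Size in bytes of the UTF-16 encoding (the layout text 1.x uses internally).
utf16Bytes :: T.Text -> Int
utf16Bytes = BS.length . TE.encodeUtf16LE

main :: IO ()
main = mapM_ report samples
  where
    -- "na\239ve" contains U+00EF; "\128512" is U+1F600, outside the BMP,
    -- so UTF-16 needs a surrogate pair (two Word16 units) for it.
    samples = [T.pack "naive", T.pack "na\239ve", T.pack "\128512"]
    report t = print (charCount t, utf8Bytes t, utf16Bytes t)
```

Running `main` prints `(5,5,10)`, `(5,6,10)` and `(1,4,4)`. Even for plain ASCII the UTF-16 payload is twice the size of the UTF-8 one, so the `Word16` array can't simply be reinterpreted as a `ForeignPtr Word8` byte buffer.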
This is the essence of my worry: that it would force a larger rewrite to play nicely with …
The real reason was that massive amounts of code duplication would be required. I'm open to switching everything to …
Thanks to another IRC user, I was able to get `Text` parsing with Trifecta working via this code:
But the copying makes me unhappy. I asked on IRC, but no one really knew: why does Trifecta only support UTF-8 `ByteString`s as a first-class input stream?
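The workaround code itself isn't included above. As a minimal sketch of the general shape, assuming it encodes the `Text` to UTF-8 and hands the result to trifecta's `ByteString` entry point (something like `parseByteString p mempty` from `Text.Trifecta`), the copy happens in the encoding step. `viaUtf8` and `countNewlines` are hypothetical names, and `countNewlines` is a toy stand-in for a real parser so the example runs without trifecta installed.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical adapter: run any consumer of UTF-8 ByteStrings on a Text.
-- With text 1.x, encodeUtf8 transcodes UTF-16 to UTF-8 into a freshly
-- allocated ByteString, so the whole input is copied once up front.
viaUtf8 :: (BS.ByteString -> r) -> T.Text -> r
viaUtf8 consume = consume . TE.encodeUtf8

-- Toy stand-in for a ByteString-based parser (with trifecta this would be
-- something like `parseByteString p mempty`).
countNewlines :: BS.ByteString -> Int
countNewlines = BC.count '\n'

main :: IO ()
main = do
  print (viaUtf8 countNewlines (T.pack "one\ntwo\n"))
  print (viaUtf8 BS.length (T.pack "na\239ve"))
```

This prints `2` and then `6`. The adapter is cheap to write; the cost the thread is worried about is the extra allocation and transcoding pass over the whole input before parsing even starts.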