streamDecodeUtf8With: accumulate undecoded chunks correctly
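Previously, if an incomplete UTF-8 sequence was fed in over the course of
several continuations, we got the accounting and reporting wrong: the
undecoded bytes returned in Some covered only the incomplete input seen by
the most recent continuation, not all of the input still undecoded. The
undecoded bytes are now carried through each continuation and cleared once
the decoder is back in the UTF8_ACCEPT state.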

This fixes gh-70.
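To make the symptom concrete, here is a small sketch that feeds the three UTF-8 bytes of U+20AC (E2 82 AC) to streamDecodeUtf8 one byte per chunk and records what each step reports. It uses only the exported streamDecodeUtf8 function and the Some constructor of Decoding from Data.Text.Encoding; the helper names feedChunks and euroChunks are hypothetical, and the expected values assume a text release that includes this fix.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (Decoding (..), streamDecodeUtf8)

-- Hypothetical helper: feed each chunk to the continuation returned by
-- the previous step, recording the decoded text and the leftover
-- (undecoded) bytes reported after every chunk.
feedChunks :: [B.ByteString] -> [(T.Text, B.ByteString)]
feedChunks = go streamDecodeUtf8
  where
    go _ [] = []
    go step (c : cs) =
      case step c of
        Some txt leftover next -> (txt, leftover) : go next cs

-- U+20AC EURO SIGN is E2 82 AC in UTF-8; here each byte arrives in
-- its own chunk, so the character is split across three continuations.
euroChunks :: [B.ByteString]
euroChunks = map B.singleton [0xE2, 0x82, 0xAC]
```

With the fix, the leftover after the second chunk should be both bytes E2 82; under the behaviour described in the commit message it would have been only the byte 82.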
bos committed Feb 19, 2014
1 parent f5dade2 commit 02b409c
Showing 1 changed file with 9 additions and 5 deletions: Data/Text/Encoding.hs
@@ -238,12 +238,13 @@ streamDecodeUtf8 = streamDecodeUtf8With strictDecode
 -- | Decode, in a stream oriented way, a 'ByteString' containing UTF-8
 -- encoded text.
 streamDecodeUtf8With :: OnDecodeError -> ByteString -> Decoding
-streamDecodeUtf8With onErr = decodeChunk 0 0
+streamDecodeUtf8With onErr = decodeChunk B.empty 0 0
   where
   -- We create a slightly larger than necessary buffer to accommodate a
   -- potential surrogate pair started in the last buffer
-  decodeChunk :: CodePoint -> DecoderState -> ByteString -> Decoding
-  decodeChunk codepoint0 state0 bs@(PS fp off len) =
+  decodeChunk :: ByteString -> CodePoint -> DecoderState -> ByteString
+              -> Decoding
+  decodeChunk undecoded0 codepoint0 state0 bs@(PS fp off len) =
     runST $ (unsafeIOToST . decodeChunkToBuffer) =<< A.new (len+1)
      where
       decodeChunkToBuffer :: A.MArray s -> IO Decoding
@@ -281,8 +282,11 @@ streamDecodeUtf8With onErr = decodeChunk 0 0
               return $! textP arr 0 (fromIntegral n)
             lastPtr <- peek curPtrPtr
             let left = lastPtr `minusPtr` curPtr
-            return $ Some chunkText (B.drop left bs)
-                     (decodeChunk codepoint state)
+                undecoded = case state of
+                              UTF8_ACCEPT -> B.empty
+                              _           -> B.append undecoded0 (B.drop left bs)
+            return $ Some chunkText undecoded
+                     (decodeChunk undecoded codepoint state)
         in loop (ptr `plusPtr` off)
       desc = "Data.Text.Internal.Encoding.streamDecodeUtf8With: Invalid UTF-8 stream"

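The accounting idea behind the second hunk (keep the undecoded bytes in the state handed to the next continuation, and clear them once the decoder is back in an accepting state) can be illustrated with a deliberately simplified, pure model. Everything below (ModelState, Step, start, feed, stepByte, leadLength, decode) is hypothetical and not part of the text library: it assumes well-formed input, performs no validation, and tracks pending bytes one byte at a time instead of using the patch's B.drop left bs pointer arithmetic.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical model state: bytes of the incomplete sequence seen so far
-- (in order) and how many continuation bytes are still expected.
data ModelState = ModelState
  { pendingBytes :: [Word8]
  , stillNeeded  :: Int
  } deriving (Eq, Show)

-- Hypothetical result shaped loosely like 'Decoding': decoded characters,
-- the undecoded bytes, and the continuation for the next chunk.
data Step = Step String [Word8] ([Word8] -> Step)

start :: [Word8] -> Step
start = feed (ModelState [] 0)

-- The pending bytes live in the state passed to the next continuation,
-- so a sequence split over many chunks is always reported in full.
feed :: ModelState -> [Word8] -> Step
feed st0 chunk = Step (reverse out) (pendingBytes st) (feed st)
  where
    (out, st) = foldl stepByte ([], st0) chunk

-- Process one byte. Invalid input is not detected; this is only a sketch.
stepByte :: (String, ModelState) -> Word8 -> (String, ModelState)
stepByte (acc, ModelState pend need)
  b
  | need == 0 = case leadLength b of
      1 -> (chr (fromIntegral b) : acc, ModelState [] 0)
      n -> (acc, ModelState [b] (n - 1))
  | need == 1 = (decode (pend ++ [b]) : acc, ModelState [] 0)
  | otherwise = (acc, ModelState (pend ++ [b]) (need - 1))

-- Sequence length implied by a lead byte (assumes well-formed input).
leadLength :: Word8 -> Int
leadLength b
  | b < 0x80  = 1
  | b < 0xE0  = 2
  | b < 0xF0  = 3
  | otherwise = 4

-- Combine a complete multi-byte sequence into a Char.
decode :: [Word8] -> Char
decode [] = '\xFFFD'
decode (lead : rest) = chr (foldl addCont (fromIntegral lead .&. mask) rest)
  where
    mask = case length rest of
      1 -> 0x1F
      2 -> 0x0F
      _ -> 0x07
    addCont cp c = (cp `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
```

In this model, a chunk that completes one sequence and starts another reports only the new sequence's bytes, because the pending list is reset as soon as a character is emitted.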