Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Comparing changes

Choose two branches to see what's changed or to start a new pull request. If you need to, you can also compare across forks.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also compare across forks.
base fork: basvandijk/blaze-binary
base: 7ec3479196
...
head fork: basvandijk/blaze-binary
compare: 50aa57ec0e
Checking mergeability… Don't worry, you can still create the pull request.
  • 19 commits
  • 13 files changed
  • 0 commit comments
  • 2 contributors
Commits on May 08, 2012
@meiersi meiersi added streaming decoding - as fast as binary 53ceaf5
@meiersi meiersi make benchmarks more fair
- Do not inline 'decodeList' at all.
- Also measure time for a full copy of the input data for the decoders that
  work only for strict bytestrings. It is negligible, 10us vs 500us.
79e668a
@meiersi meiersi Merge pull request #1 from basvandijk/generic
Added generic default implementations for encode and decode
24fa733
@meiersi meiersi Merge remote-tracking branch 'origin/master' 2c8f134
@basvandijk Moved Encoding type synonym to - and exported it from Data.Blaze.Bina…
…ry.Encoding
88cf189
@basvandijk INLINE sumSize & Doc fixes & Variable renamings f886b9b
@meiersi meiersi made a trade: speed for proof obligations...
...the hackery should work out. Once finished, a critical eye from a GHC
developer cannot hurt :P
ce10dbf
@meiersi meiersi Merge pull request #2 from basvandijk/generic
Minor changes
c138d02
Commits on May 09, 2012
@meiersi meiersi renamed: StreamDecoding ~> IterDecoding a9d4190
@meiersi meiersi yet another decoder: its speed doesn't warrant the hacks b705b70
Commits on May 10, 2012
@meiersi meiersi catch stackoverflows 3044a1d
@meiersi meiersi added support for Int, Float, Double to iteratee decoder 615cd4e
@meiersi meiersi implemented iteratee style character decoding 0bdb3c1
Commits on May 11, 2012
@meiersi meiersi benchmarked attoparsec: we gain 2.4x wrt it 8479864
@meiersi meiersi implemented base-128 variable length rendering: 1.8x slower than LE 6f48692
@meiersi meiersi bytestrings are completely regarded as primitive values 5f81f58
@meiersi meiersi more thoughts on the design of blaze-binary 958154c
@meiersi meiersi yes, the README is written in Markdown 1d79b63
Commits on May 12, 2012
@basvandijk Use a benchmark section in the blaze-binary.cabal file instead of a s…
…eparate cabal file
50aa57e
View
72 README
@@ -1,72 +0,0 @@
-Overview
-========
-
-Projects such as cloud-haskell or acid-state crucially rely on performant,
-generic serialization. As I've developed the new bytestring builder, I
-wondered what speedup I could gain using it for encoding a Haskell value to an
-unambiguous binary representation. This library, `blaze-binary`, is the
-(current) result state of this experiment.
-
-In preliminary benchmarks on my i7, 64bit Linux machine, this library is 2 - 4
-times faster for binary encoding than both `binary-0.5.0.2` and
-`cereal-0.3.5.1`. I have not yet implemented the decoding-half of this
-library.
-
-As an additional improvement over binary and cereal, this library can
-also output a textual representation of the sequence of primitive values
-(e.g., `Int`s, `Double`s, and strict `ByteString`s). Moreover, the encoding
-for this stream of primitive values can be chosen at runtime without any
-performance impact. This allows for example a developer of a CloudHaskell
-application to analyse the messages sent and received without having access to
-the type of the data being sent. This is especially interesting for displaying
-error messages in the case of a failed parse of a received message.
-
-Encoding Implementation
-=======================
-
-The implementation uses a two step approach to encode a Haskell value to a
-sequence of bytes. In a first step, the Haskell value is converted to a
-stream of primitive values, where a primitive value is an `IntX` or `WordX`
-for `X` in `["", "8","16","32","64"]`, a `Float`, `Double`, `Integer`, or a
-`ByteString` value. The conversion uses a difference list representation of
-the primitive stream to ensure *O(1)*-concatentation. In the second step, the
-stream of primitive values is converted to a sequence of bytes using the new
-bytestring builder and its support for bounded encodings.
-
-This splitting of the encoding into a "flattening pass" and an "primtive
-encoding pass" results in the nice benefit that the encoding of the stream of
-primitive values can be chosen at runtime. Morover, it is more efficient, as
-the benchmarks demonstrate. In the beginning, I implemented a version that
-encodes the values directly using the new bytestring builder. This initial
-version did not result in any speedup with respect to binary and cereal. My
-current hypothesis is that the type of all of these builders leads to too many
-unknown and possibly even unsaturated calls, whereas the difference list for
-the stream of primitive values only results in calls to unknown THUNKs.
-Evaluating unknown THUNKs is the fastest unkwon call.
-
-
-High-level Encoding Format
-==========================
-
-In contrast to binary and cereal, this library encodes lists in a streaming
-fashion, tagging `(:)` with 1 and `[]` with 0. This results in only one pass
-through a list and reduces GC pressure as it retains less memory than the list
-serialization used by binary and cereal, which prefixes the list with the
-number of elements.
-
-We also do not use the `Put` monad. The monadic value-passing is just not
-required.
-
-
-Planned Changes
-===============
-
-For a first release, I think the following points need to be addressed.
-
- - provide efficient decoding support
-
-I'd also like to investigate the following issues.
-
- - measure performance of using base-128 variable length encodings
- - complete the API such that debugging of CloudHaskell works as described
- above
View
205 README.md
@@ -0,0 +1,205 @@
+Overview
+========
+
+Projects such as cloud-haskell or acid-state crucially rely on performant,
+generic serialization. As I've developed the new bytestring builder, I
+wondered what speedup I could gain using it for encoding a Haskell value to an
+unambiguous binary representation. This library, `blaze-binary`, is the
+(current) result state of this experiment.
+
+In preliminary benchmarks on my i7, 64bit Linux machine, this library is 2 - 4
+times faster for binary encoding than both `binary-0.5.0.2` and
+`cereal-0.3.5.1`. Decoding is as fast as `binary-0.5.0.2`, but allows feeding
+the input in a chunkwise fashion, like `attoparsec`. Our decoder is at least
+2x faster than using attoparsec directly.
+
+As an additional improvement over binary and cereal, this library can
+also output a textual representation of the sequence of primitive values
+(e.g., `Int`s, `Double`s, and strict `ByteString`s). Moreover, the encoding
+for this stream of primitive values can be chosen at runtime without any
+performance impact. This allows for example a developer of a CloudHaskell
+application to analyse the messages sent and received without having access to
+the type of the data being sent. This is especially interesting for displaying
+error messages in the case of a failed parse of a received message or to
+investigate the communication patterns of a running CloudHaskell application.
+
+
+Encoding Implementation
+=======================
+
+The implementation uses a two step approach to encode a Haskell value to a
+sequence of bytes. In a first step, the Haskell value is converted to a
+stream of primitive values, where a primitive value is an `IntX` or `WordX`
+for `X` in `["", "8","16","32","64"]`, a `Float`, `Double`, `Integer`, a
+`ByteString`, or a `Text` value. The conversion uses a difference list
+representation of the primitive stream to ensure *O(1)*-concatentation. In
+the second step, the stream of primitive values is converted to a sequence of
+bytes using the new bytestring builder and its support for bounded encodings.
+
+This splitting of the encoding into a "flattening pass" and an "primtive
+encoding pass" results in the nice benefit that the encoding of the stream of
+primitive values can be chosen at runtime. Morover, it is more efficient, as
+the benchmarks demonstrate. In the beginning, I implemented a version that
+encodes the values directly using the new bytestring builder. This initial
+version did not result in any speedup with respect to binary and cereal. My
+current hypothesis is that the type of all of these builders leads to too many
+unknown and possibly even unsaturated calls, whereas the difference list for
+the stream of primitive values only results in calls to unknown THUNKs.
+Evaluating unknown THUNKs is the fastest unkwon call.
+
+
+High-level Encoding Format
+==========================
+
+In contrast to binary and cereal, this library encodes lists in a streaming
+fashion, tagging `(:)` with 1 and `[]` with 0. This results in only one pass
+through a list and reduces GC pressure as it retains less memory than the list
+serialization used by binary and cereal, which prefixes the list with the
+number of elements.
+
+We also do not use the `Put` monad. The monadic value-passing is just not
+required.
+
+
+Encoding Primitive Values
+=========================
+
+
+I plan to implement two different encoding formats: one format optimized for
+compactness and one optimized for throughput. Both of these formats come in a
+tagged variant that allows decoding the stream of primitive values without
+access to the type.
+
+All results are prefixed with a 4-byte identifier. Currently, we use the
+following assignment of identifiers to formats.
+
+ 0xce,0xbb,0x2e,0x30 throughput, untagged
+ 0xce,0xbb,0x2e,0x31 throughput, tagged
+ 0xce,0xbb,0x2e,0x32 compact, untagged
+ 0xce,0xbb,0x2e,0x33 compact, tagged
+
+The compact and the throughput format only differ in how they encode `IntX`s,
+`WordX`s, and `Integer`s. For the common primitive values they use the
+following encodings.
+
+ - `Char`s are UTF-8 encoded.
+ - `Float`s are encoded as IEEE 754 values with their octets in little-endian
+ order.
+ - `Double`s are encoded as IEEE 754 values with their octets in
+ little-endian order.
+ - `ByteString`s are encoded with their length prefixed according to the
+ `Int` format.
+ - `Text` values are encoded using a zero-terminated, modified UTF-8 format
+ that works like UTF-8 except that it encodes `'\x0'` as `[0xC0,0x80]`.
+ This format never outputs a '0x00' for any Unicode codepoint and can
+ therefore be zero-terminated, which allows an efficient streaming
+ encoding.
+
+
+The compact format
+------------------
+
+This is the default format. It trades some performance for compactness and
+portability. `Int`s and `Word`s wider than 2 bytes are encoded using a
+variable length base-128 encoding, as used by (Google's protocol bufffers)[https://developers.google.com/protocol-buffers/docs/encoding].
+
+
+The throughput format
+---------------------
+
+This format is optmized for maximum throughput on 64bit, x86 machines. I
+assume they are the future server machines of choice. All primitive values are
+therefore encoded using a little-endian encoding.
+
+
+The tagged format
+-----------------
+
+Before every primitive value a tag-byte is written indicating the type of the
+following primitive value. This allows decoding a binary value to a
+human-readable stream of primitive values.
+
+
+API
+---
+
+In the first releases, all low-level encoding and decoding support is kept
+internal. This simplifies experimentation. There is one abstract type for
+`Encoder`s and one for `Decoder`s. The `Monoid` and `Monad` typeclasses are
+provided as combintors for them.
+
+ newtype Encoder a = ...
+ newtype Decoder a = ...
+
+ class Binary a where
+ toBinary :: Encoder a
+ fromBinary :: Decoder a
+
+Only one format is supported in the beginning. The untagged, throughput
+format. This format gives a good baseline for the possible speed of the
+implementation. We run an `Encoder` by converting it to a bytestring
+`Builder`.
+
+ encode :: Encoder a -> a -> Builder
+
+We run a `Decoder` by converting it to an
+(`Data.Attoparsec.ByteString.Result`)[http://hackage.haskell.org/packages/archive/attoparsec/0.10.1.1/doc/html/Data-Attoparsec-ByteString.html#t:Result].
+
+ decode :: Decoder a -> Result a
+
+Note that the decoder selects the appropriate format based on the 4-byte
+prefix.
+
+We provide convenience functions for the conversion to and from bytestrings.
+
+ toBinaryBuilder :: Binary a => a -> Builder
+ toBinaryByteString :: Binary a => a -> S.ByteString
+ toBinaryLazyByteString :: Binary a => a -> L.ByteString
+
+ fromBinaryByteString :: Binary a => S.ByteString -> Either String a
+ fromBinaryLazyByteString :: Binary a => L.ByteString -> Either String a
+
+
+
+Security Concerns
+=================
+
+Note that the input of the decoder is *untrusted* and may be an arbitrary
+sequence of bytes. The decoding implementation must make sure that for *any*
+bytestring either an error is reported via `Left` or a Haskell value
+satisfying *all* invariants of its type is returned. This entails for example
+that we must validate every `Text` value. This also excludes using functions
+such as `fromAscList` without having validated their input first.
+
+The benefit of implementing fully validating decoders is that we can use them
+for implementing public interfaces. If the cost of validation is too high then
+we can consider implementing a second `UnsafeBinary` typeclass whose decoder
+is only guaranteed to be correct for bytestrings in the range of the encoder.
+
+Note that we must also take care to provide good bounds on the resource usage
+of our implementation. This concerns heap space and stack space. Some
+implementations require considerable stack space. They might profit from
+catching `StackOverflow` exceptions and report them politely to their caller
+using a `Left` result.
+
+Note also that we must report overflows when decoding `Int` and `Word` values,
+as we cannot guarantee that using a truncated 64-bit number will work.
+
+
+TODO for a first release
+========================
+
+Implement the above API for the throughput format and benchmark against
+binary, cereal, and attoparsec to catch regressions.
+
+
+Future Work
+===========
+
+- Implement generic serialization (DONE by Bas van Dijk, needs benchmarking)
+- Implement all four suggested formats
+- Implement debugging decoder for tagged formats
+- Implement error reporting for tagged format that produces human-readable
+ output. It will still be flattened though.
+
+
View
113 bench/Benchmark.hs
@@ -13,10 +13,13 @@ module Main (main, testNewBinary) where
import Prelude hiding (words)
import Criterion.Main
import Control.DeepSeq
+import Control.Applicative
import Data.Blaze.Binary.Encoding (renderTextualUtf8, renderTagged)
import qualified Data.Blaze.Binary.Decoding as Blaze (Decoder, runDecoder)
import qualified Data.Blaze.Binary.ParamDecoding as ParamBlaze (Decoder, runDecoder, word8s, string)
+import qualified Data.Blaze.Binary.IterDecoding as IterBlaze (DStream, decodeWith, word8s, string )
+import qualified Data.Blaze.Binary.StreamDecoding as StreamBlaze (benchWord8s)
import qualified Data.ByteString as S
import qualified Data.ByteString.Internal as S
import qualified Data.ByteString.Lazy as L
@@ -33,6 +36,8 @@ import Data.Tree
import Data.Word
import qualified Data.Foldable as F (toList)
+import qualified Data.Attoparsec as A
+
------------------------------------------------------------------------------
-- Benchmark
@@ -40,7 +45,7 @@ import qualified Data.Foldable as F (toList)
-- | The number of repetitions to consider.
nRepl :: Int
-nRepl = 10000
+nRepl = 1000
-- We use NOINLINE to ensure that GHC has no chance of optimizing too much.
@@ -85,36 +90,57 @@ charData n = take n ['\0'..]
main :: IO ()
main = Criterion.Main.defaultMain $
[ bgroup ("decode (" ++ show nRepl ++ ")")
- [ bench "param-blaze-binary: string" $ nf
+ [ bench "param-blaze-binary: word8s" $ nf
+ (benchParamDecoder ParamBlaze.word8s . S.copy)
+ (Blaze.toByteString $ word8Data nRepl)
+ , bench "iter-blaze-binary: word8s" $ nf
+ (benchIterDecoder IterBlaze.word8s)
+ (Blaze.toByteString $ word8Data nRepl)
+ , bench "binary: word8s" $ nf (Binary.decode :: L.ByteString -> [Word8]) (Binary.encode $ word8Data nRepl)
+ , bench "attoparsec-noinline: word8s" $ nf
+ (benchAttoparsec attoBinaryWord8sNoInline)
+ (Blaze.toByteString $ word8Data nRepl)
+ , bench "param-blaze-binary: string" $ nf
(benchParamDecoder ParamBlaze.string)
(Blaze.toByteString $ charData nRepl)
- , bench "blaze-binary: string" $ nf
- (benchDecoder (Blaze.decode :: Blaze.Decoder String))
+ , bench "iter-blaze-binary: string" $ nf
+ (benchIterDecoder IterBlaze.string)
(Blaze.toByteString $ charData nRepl)
-- , bench "blaze-binary: word8sSimple" $ nf (benchDecoder Blaze.word8sSimple) (Blaze.toByteString $ word8Data nRepl)
-- , bench "cereal: word8s" $ nf (decodeLazy :: L.ByteString -> Either String [Word8]) (encodeLazy $ word8Data nRepl)
, bench "binary: string" $ nf (Binary.decode :: L.ByteString -> String) (Binary.encode $ charData nRepl)
-
- , bench "param-blaze-binary: word8s" $ nf
- (benchParamDecoder ParamBlaze.word8s)
+ , bench "stream-blaze-binary: word8s" $ nf
+ (StreamBlaze.benchWord8s . S.copy)
(Blaze.toByteString $ word8Data nRepl)
, bench "blaze-binary: word8s" $ nf
- (benchDecoder (Blaze.decode :: Blaze.Decoder [Word8]))
+ (benchDecoder (Blaze.decode :: Blaze.Decoder [Word8]) . S.copy)
(Blaze.toByteString $ word8Data nRepl)
+
+ , bench "blaze-binary: string" $ nf
+ (benchDecoder (Blaze.decode :: Blaze.Decoder String))
+ (Blaze.toByteString $ charData nRepl)
-- , bench "blaze-binary: word8sSimple" $ nf (benchDecoder Blaze.word8sSimple) (Blaze.toByteString $ word8Data nRepl)
-- , bench "cereal: word8s" $ nf (decodeLazy :: L.ByteString -> Either String [Word8]) (encodeLazy $ word8Data nRepl)
- , bench "binary: word8s" $ nf (Binary.decode :: L.ByteString -> [Word8]) (Binary.encode $ word8Data nRepl)
+ , bench "attoparsec-inlined: word8s" $ nf
+ (benchAttoparsec attoBinaryWord8s)
+ (Blaze.toByteString $ word8Data nRepl)
]
+
+ , bgroup "encode"
+ [ benchmarks "String " id (charData nRepl)
+ , benchmarks "[String] " id (stringData nRepl)
+ , benchmarks "testValue " id (testValue nRepl)
+ , benchmarks "Tree Int " id (treeIntData nRepl)
+ , benchmarks "Seq Int " id (seqIntData nRepl)
+ , benchmarks "[Int] " id (intData nRepl)
+ ]
]
- -- , bgroup "encode"
- -- [ benchmarks "testValue " id (testValue nRepl)
- -- , benchmarks "Tree Int " id (treeIntData nRepl)
- -- , benchmarks "Seq Int " id (seqIntData nRepl)
- -- , benchmarks "[Int] " id (intData nRepl)
- -- , benchmarks "[String] " id (stringData nRepl)
- -- ]
- -- ]
where
+ benchAttoparsec :: A.Parser a -> S.ByteString -> a
+ benchAttoparsec p bs = case A.eitherResult $ A.parse p bs of
+ Left msg -> error msg
+ Right x -> x
+
benchDecoder :: Blaze.Decoder a -> S.ByteString -> a
benchDecoder d bs = case Blaze.runDecoder d bs of
Left msg -> error msg
@@ -125,22 +151,27 @@ main = Criterion.Main.defaultMain $
Left msg -> error msg
Right x -> x
+ benchIterDecoder :: IterBlaze.DStream a -> S.ByteString -> a
+ benchIterDecoder d bs = case IterBlaze.decodeWith d bs of
+ Left msg -> error msg
+ Right x -> x
+
benchmarks :: forall a b. (Binary a, Blaze.Binary a, Serialize a, NFData a)
=> String -> (b -> a) -> b -> Benchmark
benchmarks name f x = bgroup (name ++ show nRepl)
- [ bgroup "decode"
- [ bench "blaze-binary" $ nf (benchDecoder Blaze.decode :: S.ByteString -> a) (Blaze.toByteString $ f x)
- -- , bench "blaze-binary tagged" $ whnf (L.length . renderTagged . Blaze.encode . f) x andrea
- , bench "cereal" $ nf (decodeLazy :: L.ByteString -> Either String a) (encodeLazy $ f x)
- , bench "binary" $ nf (Binary.decode :: L.ByteString -> a) (Binary.encode $ f x)
- ]
- -- , bgroup "encode"
- -- [ bench "blaze-binary" $ nf (L.length . Blaze.toLazyByteString . f) x
+ -- [ bgroup "decode"
+ -- [ bench "blaze-binary" $ nf (benchDecoder Blaze.decode :: S.ByteString -> a) (Blaze.toByteString $ f x)
-- -- , bench "blaze-binary tagged" $ whnf (L.length . renderTagged . Blaze.encode . f) x andrea
- -- , bench "cereal" $ nf (L.length . encodeLazy . f) x
- -- , bench "binary" $ nf (L.length . Binary.encode . f) x
+ -- , bench "cereal" $ nf (decodeLazy :: L.ByteString -> Either String a) (encodeLazy $ f x)
+ -- , bench "binary" $ nf (Binary.decode :: L.ByteString -> a) (Binary.encode $ f x)
-- ]
- ]
+ --, bgroup "encode"
+ [ bench "blaze-binary" $ nf (L.length . Blaze.toLazyByteString . f) x
+ -- , bench "blaze-binary tagged" $ whnf (L.length . renderTagged . Blaze.encode . f) x andrea
+ , bench "cereal" $ nf (L.length . encodeLazy . f) x
+ , bench "binary" $ nf (L.length . Binary.encode . f) x
+ ]
+ --]
-- | Testing the new binary encoding format.
testNewBinary :: Blaze.Binary a => a -> IO ()
@@ -154,3 +185,29 @@ instance NFData S.ByteString where
instance NFData a => NFData (Seq.Seq a) where
rnf = rnf . F.toList
+------------------------------------------------------------------------------
+-- Attoparsec
+------------------------------------------------------------------------------
+
+{-# INLINE genAttoBinaryWord8s #-}
+genAttoBinaryWord8s :: (A.Parser Word8) -> A.Parser [Word8]
+genAttoBinaryWord8s w8 = do
+ go
+ where
+ go = do
+ tag <- w8
+ case tag of
+ 0 -> return []
+ 1 -> (:) <$> w8 <*> go
+ _ -> fail $ "parseBinaryWord8s: unknown tag " ++ show tag
+
+attoBinaryWord8s :: A.Parser [Word8]
+attoBinaryWord8s = genAttoBinaryWord8s A.anyWord8
+
+attoBinaryWord8sNoInline :: A.Parser [Word8]
+attoBinaryWord8sNoInline = genAttoBinaryWord8s attoWord8_noinline
+
+{-# NOINLINE attoWord8_noinline #-}
+attoWord8_noinline :: A.Parser Word8
+attoWord8_noinline = A.anyWord8
+
View
30 bench/LICENSE
@@ -1,30 +0,0 @@
-Copyright (c) 2012, Simon Meier
-
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions are met:
-
- * Redistributions of source code must retain the above copyright
- notice, this list of conditions and the following disclaimer.
-
- * Redistributions in binary form must reproduce the above
- copyright notice, this list of conditions and the following
- disclaimer in the documentation and/or other materials provided
- with the distribution.
-
- * Neither the name of Simon Meier nor the names of other
- contributors may be used to endorse or promote products derived
- from this software without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
View
2  bench/Setup.hs
@@ -1,2 +0,0 @@
-import Distribution.Simple
-main = defaultMain
View
28 bench/blaze-binary-bench.cabal
@@ -1,28 +0,0 @@
-name: blaze-binary-bench
-version: 0.1.0.0
-synopsis: Benchmarks for the blaze-binary library
--- description:
-homepage: git://github.com/meiersi/blaze-binary.git
-license: BSD3
-license-file: LICENSE
-author: Simon Meier <iridcode@gmail.com>
-maintainer: Simon Meier <iridcode@gmail.com>
--- copyright:
-category: System
-build-type: Simple
-cabal-version: >=1.8
-
-executable blaze-binary-bench
- main-is: Benchmark.hs
- hs-source-dirs: src .
- build-depends: base
- , ghc-prim
- , containers == 0.4.*
- , array == 0.3.*
- , bytestring == 0.9.*
- , criterion
- , cereal == 0.3.*
- , binary == 0.5.*
- , bytestring-builder == 0.1.*
- , deepseq == 1.2.*
- , primitive == 0.4.*
View
17 blaze-binary.cabal
@@ -44,3 +44,20 @@ library
if impl(ghc >= 7.2.1)
cpp-options: -DGENERICS
+
+benchmark bench
+ type: exitcode-stdio-1.0
+ main-is: Benchmark.hs
+ hs-source-dirs: bench, src
+ build-depends: base >= 4 && < 5
+ , ghc-prim >= 0.2 && < 0.3
+ , containers >= 0.4 && < 0.5
+ , array >= 0.4 && < 0.5
+ , bytestring >= 0.9 && < 0.10
+ , criterion >= 0.6 && < 0.7
+ , cereal >= 0.3 && < 0.4
+ , binary >= 0.5 && < 0.6
+ , bytestring-builder >= 0.1 && < 0.2
+ , deepseq >= 1.3 && < 1.4
+ , primitive >= 0.4 && < 0.5
+ , attoparsec >= 0.10 && < 0.11
View
36 src/Data/Blaze/Binary.hs
@@ -60,15 +60,13 @@ import GHC.Generics
------------------------------------------------------------------------
-type Encoding t = t -> VStream
-
-- | If your compiler has support for the @DeriveGeneric@ and
--- @DefaultSignatures@ language extensions (@ghc >= 7.2.1@), the 'encode' and 'get'
+-- @DefaultSignatures@ language extensions (@ghc >= 7.2.1@), the 'encode' and 'decode'
-- methods will have default generic implementations.
--
-- To use this option, simply add a @deriving 'Generic'@ clause to your datatype
-- and declare a 'Binary' instance for it without giving a definition for
--- 'encode' and 'get'.
+-- 'encode' and 'decode'.
class Binary t where
-- | Encode a value in the Put monad.
encode :: Encoding t
@@ -309,12 +307,12 @@ instance (Binary a, Binary b) => Binary (Either a b) where
instance Binary S.ByteString where
{-# INLINE encode #-}
- encode = \bs -> int (S.length bs) <> byteString bs
+ encode = byteString
{-# INLINE decode #-}
- decode = D.int >>= D.byteStringSlice
+ decode = D.byteString
instance Binary L.ByteString where
- encode = (<> int 0) . L.foldrChunks (\bs s -> encode bs <> s) mempty
+ encode = L.foldrChunks (\bs s -> encode bs <> s) (encode S.empty)
decode = do
bs <- decode
if S.null bs
@@ -417,7 +415,7 @@ instance (Binary i, Ix i, Binary e, IArray UArray e) => Binary (UArray i e) wher
#ifdef GENERICS
------------------------------------------------------------------------
--- Generic Serialze
+-- Generic Binary
class GBinary f where
gEncode :: Encoding (f a)
@@ -436,8 +434,8 @@ instance Binary a => GBinary (K1 i a) where
{-# INLINE gDecode #-}
instance GBinary U1 where
- gEncode _ = mempty
- gDecode = pure U1
+ gEncode = const mempty
+ gDecode = pure U1
{-# INLINE gEncode #-}
{-# INLINE gDecode #-}
@@ -482,31 +480,31 @@ class EncodeSum f where
encodeSum :: (Num word, Bits word, Binary word) => word -> word -> Encoding (f a)
instance (EncodeSum a, EncodeSum b, GBinary a, GBinary b) => EncodeSum (a :+: b) where
- encodeSum !code !size s = case s of
- L1 x -> encodeSum code sizeL x
- R1 x -> encodeSum (code + sizeL) sizeR x
+ encodeSum !tag !size s = case s of
+ L1 x -> encodeSum tag sizeL x
+ R1 x -> encodeSum (tag + sizeL) sizeR x
where
sizeL = size `shiftR` 1
sizeR = size - sizeL
{-# INLINE encodeSum #-}
instance GBinary a => EncodeSum (C1 c a) where
- encodeSum !code _ x = encode code <> gEncode x
+ encodeSum !tag _ x = encode tag <> gEncode x
{-# INLINE encodeSum #-}
------------------------------------------------------------------------
checkDecodeSum :: (Ord word, Bits word, DecodeSum f) => word -> word -> D.Decoder (f a)
-checkDecodeSum size code | code < size = decodeSum code size
- | otherwise = fail "Unknown encoding for constructor"
+checkDecodeSum size tag | tag < size = decodeSum tag size
+ | otherwise = fail "Unknown encoding for constructor"
{-# INLINE checkDecodeSum #-}
class DecodeSum f where
decodeSum :: (Ord word, Num word, Bits word) => word -> word -> D.Decoder (f a)
instance (DecodeSum a, DecodeSum b, GBinary a, GBinary b) => DecodeSum (a :+: b) where
- decodeSum !code !size | code < sizeL = L1 <$> decodeSum code sizeL
- | otherwise = R1 <$> decodeSum (code - sizeL) sizeR
+ decodeSum !tag !size | tag < sizeL = L1 <$> decodeSum tag sizeL
+ | otherwise = R1 <$> decodeSum (tag - sizeL) sizeR
where
sizeL = size `shiftR` 1
sizeR = size - sizeL
@@ -526,7 +524,9 @@ newtype Tagged (s :: * -> *) b = Tagged {unTagged :: b}
instance (SumSize a, SumSize b) => SumSize (a :+: b) where
sumSize = Tagged $ unTagged (sumSize :: Tagged a Word64) +
unTagged (sumSize :: Tagged b Word64)
+ {-# INLINE sumSize #-}
instance SumSize (C1 c a) where
sumSize = Tagged 1
+ {-# INLINE sumSize #-}
#endif
View
37 src/Data/Blaze/Binary/Decoding.hs
@@ -34,7 +34,8 @@ data ParseException = ParseException String -- {-# UNPACK #-} !(Ptr Word8)
instance Exception ParseException where
newtype Decoder a = Decoder {
- unDecoder :: ForeignPtr Word8 -> Addr# -> Addr#
+ -- unDecoder :: ForeignPtr Word8 -> Addr# -> Addr#
+ unDecoder :: ForeignPtr Word8 -> Ptr Word8 -> Ptr Word8
-> State# RealWorld -> (# State# RealWorld, Addr#, a #)
}
@@ -45,17 +46,17 @@ instance Functor Decoder where
instance Applicative Decoder where
{-# INLINE pure #-}
- pure x = Decoder $ \_ ip0 _ s0 -> (# s0, ip0, x #)
+ pure x = Decoder $ \_ ip0 _ s0 -> (# s0, getAddr ip0, x #)
{-# INLINE (<*>) #-}
Decoder fIO <*> Decoder xIO = Decoder $ \fpbuf ip0 ipe s0 ->
case fIO fpbuf ip0 ipe s0 of
- (# s1, ip1, f #) -> case xIO fpbuf ip1 ipe s1 of
+ (# s1, ip1, f #) -> case xIO fpbuf (Ptr ip1) ipe s1 of
(# s2, ip2, x #) -> (# s2, ip2, f x #)
{-# INLINE liftIO #-}
liftIO :: IO a -> Decoder a
-liftIO (IO io) = Decoder $ \_ ip0 _ s0 -> case io s0 of
+liftIO (IO io) = Decoder $ \_ !(Ptr ip0) _ s0 -> case io s0 of
(# s1, x #) -> (# s1, ip0, x #)
{-# INLINE runIO #-}
@@ -69,7 +70,7 @@ instance Monad Decoder where
{-# INLINE (>>=) #-}
Decoder xIO >>= f = Decoder $ \fpbuf ip0 ipe s0 ->
case xIO fpbuf ip0 ipe s0 of
- (# s1, ip1, x #) -> unDecoder (f x) fpbuf ip1 ipe s1
+ (# s1, ip1, x #) -> unDecoder (f x) fpbuf (Ptr ip1) ipe s1
{-# INLINE fail #-}
fail msg = liftIO $ throw $ ParseException msg
@@ -89,21 +90,21 @@ requires n p = Decoder $ \buf@(Buffer ip ipe) ->
{-# INLINE storable #-}
storable :: forall a. Storable a => Decoder a
storable = Decoder $ \fpbuf ip0 ipe s0 ->
- let ip1 = plusAddr# ip0 size in
- if Ptr ip1 <= Ptr ipe
- then case runIO (peek (Ptr ip0)) s0 of
- (# s1, x #) -> (# s1, ip1, x #)
+ let ip1 = ip0 `plusPtr` size in
+ if ip1 <= ipe
+ then case runIO (peek (castPtr ip0 :: Ptr a)) s0 of
+ (# s1, x #) -> (# s1, getAddr ip1, x #)
else unDecoder
- (fail $ "less than the required " ++ show (I# size) ++ " bytes left.")
+ (fail $ "less than the required " ++ show size ++ " bytes left.")
fpbuf ip0 ipe s0
where
- !(I# size) = sizeOf (undefined :: a)
+ size = sizeOf (undefined :: a)
runDecoder :: Decoder a -> S.ByteString -> Either String a
runDecoder p (S.PS fpbuf off len) = S.inlinePerformIO $ do
withForeignPtr fpbuf $ \pbuf -> do
- let !(Ptr ip) = pbuf `plusPtr` off
- !(Ptr ipe) = Ptr ip `plusPtr` len
+ let !ip = pbuf `plusPtr` off
+ !ipe = ip `plusPtr` len
(`catch` handler) $ do
x <- IO $ \s0 -> case unDecoder p fpbuf ip ipe s0 of
(# s1, _, x #) -> (# s1, x #)
@@ -163,15 +164,19 @@ float = storable
double :: Decoder Double
double = storable
+{-# INLINE byteString #-}
+byteString :: Decoder S.ByteString
+byteString = int >>= byteStringSlice
+
{-# INLINE byteStringSlice #-}
byteStringSlice :: Int -> Decoder S.ByteString
byteStringSlice len = Decoder $ \fpbuf ip0 ipe s0 ->
- let ip1 = Ptr ip0 `plusPtr` len
+ let ip1 = ip0 `plusPtr` len
in
- if ip1 <= Ptr ipe
+ if ip1 <= ipe
then (# s0
, getAddr ip1
- , S.PS fpbuf (Ptr ip0 `minusPtr` unsafeForeignPtrToPtr fpbuf) len
+ , S.PS fpbuf (ip0 `minusPtr` unsafeForeignPtrToPtr fpbuf) len
#)
else unDecoder
(fail $ "less than the required " ++ show len ++ " bytes left.")
View
68 src/Data/Blaze/Binary/Encoding.hs
@@ -4,7 +4,7 @@
-- Module : Data.Blaze.Binary.Encoding
-- Copyright : 2012, Simon Meier <iridcode@gmail.com>
-- License : BSD3-style (see LICENSE)
---
+--
-- Maintainer : Simon Meier <iridcode@gmail.com>
-- Stability :
-- Portability : portable
@@ -14,8 +14,10 @@
-----------------------------------------------------------------------------
module Data.Blaze.Binary.Encoding (
+ Encoding
+
-- * Streams of values to be encoded
- VStream
+ , VStream
, render
, renderTagged
, renderTextualUtf8
@@ -42,7 +44,7 @@ module Data.Blaze.Binary.Encoding (
, char
, byteString
-
+
, builder
, (<>)
@@ -96,6 +98,8 @@ data VStreamRep =
| VBuilder !B.Builder VStreamRep
| VEmpty
+type Encoding t = t -> VStream
+
-- | A stream of values to be encoded.
newtype VStream = VStream { toVStreamRep :: VStreamRep -> VStreamRep }
@@ -107,6 +111,20 @@ instance Monoid VStream where
{-# INLINE mconcat #-}
mconcat = foldr mappend mempty
+-- | Binary encode a 'VStream' to a lazy bytestring 'B.Builder' using a
+-- compact Base-128 encoding for integers and words.
+renderCompact :: VStream -> B.Builder
+renderCompact = renderWith
+ (E.fromF E.word8) E.word16Base128LE E.word32Base128LE E.word64Base128LE
+ E.wordBase128LE
+ (E.fromF E.int8) E.int16ZigZagBase128LE E.int32ZigZagBase128LE E.int64ZigZagBase128LE
+ E.intZigZagBase128LE
+ E.charUtf8
+ (E.fromF E.floatLE) (E.fromF E.doubleLE)
+ (error "render: integer: implement")
+ (\x -> E.encodeWithB E.intZigZagBase128LE (S.length x) <> B.byteString x)
+ id
+
-- | Binary encode a 'VStream' to a lazy bytestring 'B.Builder'.
render :: VStream -> B.Builder
render = renderWith
@@ -114,7 +132,7 @@ render = renderWith
(fe E.int8) (fe E.int16LE) (fe E.int32LE) (fe E.int64LE) (fe (fromIntegral E.>$< E.int64LE))
E.charUtf8 (fe E.floatLE) (fe E.doubleLE)
(error "render: integer: implement")
- B.byteString
+ (\x -> B.int64LE (fromIntegral $ S.length x) <> B.byteString x)
id
where
{-# INLINE fe #-}
@@ -123,8 +141,8 @@ render = renderWith
-- | Binary encode a 'VStream' to a lazy bytestring 'B.Builder' using a tagged
-- format that allows to reconstruct the value stream.
renderTagged :: VStream -> L.ByteString
-renderTagged =
- B.toLazyByteString
+renderTagged =
+ B.toLazyByteString
. renderWith
(tf 0 E.word8) (tf 1 E.word16LE) (tf 2 E.word32LE) (tf 3 E.word64LE) (tf 4 (fromIntegral E.>$< E.word64LE))
(tf 5 E.int8) (tf 6 E.int16LE) (tf 7 E.int32LE) (tf 8 E.int64LE) (tf 9 (fromIntegral E.>$< E.int64LE))
@@ -154,7 +172,7 @@ renderWith w8 w16 w32 w64 w i8 i16 i32 i64 i c f d ibig bs b =
-- take care that inlining is possible once all encodings are fixed
\vs0 -> B.builder $ step (toVStreamRep vs0 VEmpty)
where
- step vs1 k (B.BufferRange op0 ope0) =
+ step vs1 k (B.BufferRange op0 ope0) =
go vs1 op0
where
go vs !op
@@ -194,7 +212,7 @@ renderTextualUtf8 vs0 =
go (VWord8 1 (VChar x vs)) = line "w8,c 1," (B.charUtf8 x) vs
go (VWord8 1 (VWord x vs)) = line "w8,w 1," (B.wordDec x) vs
go (VWord8 1 (VInt x vs)) = line "w8,i 1," (B.intDec x) vs
- go (VInt l (VByteString x vs))
+ go (VInt l (VByteString x vs))
| l > 0 = line "i,bs " (B.intDec l <> B.char8 ',' <> B.byteStringHexFixed x) vs
go (VWord8 x vs) = line "w8 " (B.word8Dec x) vs
go (VWord16 x vs) = line "w16 " (B.word16Dec x) vs
@@ -221,67 +239,65 @@ renderTextualUtf8 vs0 =
------------------------------
{-# INLINE float #-}
-float :: Float -> VStream
+float :: Encoding Float
float = VStream . VFloat
{-# INLINE double #-}
-double :: Double -> VStream
+double :: Encoding Double
double = VStream . VDouble
{-# INLINE integer #-}
-integer :: Integer -> VStream
+integer :: Encoding Integer
integer = VStream . VInteger
{-# INLINE word #-}
-word :: Word -> VStream
+word :: Encoding Word
word = VStream . VWord
{-# INLINE word8 #-}
-word8 :: Word8 -> VStream
+word8 :: Encoding Word8
word8 = VStream . VWord8
{-# INLINE word16 #-}
-word16 :: Word16 -> VStream
+word16 :: Encoding Word16
word16 = VStream . VWord16
{-# INLINE word32 #-}
-word32 :: Word32 -> VStream
+word32 :: Encoding Word32
word32 = VStream . VWord32
{-# INLINE word64 #-}
-word64 :: Word64 -> VStream
+word64 :: Encoding Word64
word64 = VStream . VWord64
{-# INLINE int #-}
-int :: Int -> VStream
+int :: Encoding Int
int = VStream . VInt
{-# INLINE int8 #-}
-int8 :: Int8 -> VStream
+int8 :: Encoding Int8
int8 = VStream . VInt8
{-# INLINE int16 #-}
-int16 :: Int16 -> VStream
+int16 :: Encoding Int16
int16 = VStream . VInt16
{-# INLINE int32 #-}
-int32 :: Int32 -> VStream
+int32 :: Encoding Int32
int32 = VStream . VInt32
{-# INLINE int64 #-}
-int64 :: Int64 -> VStream
+int64 :: Encoding Int64
int64 = VStream . VInt64
{-# INLINE char #-}
-char :: Char -> VStream
+char :: Encoding Char
char = VStream . VChar
{-# INLINE byteString #-}
-byteString :: S.ByteString -> VStream
+byteString :: Encoding S.ByteString
byteString = VStream . VByteString
{-# INLINE builder #-}
-builder :: B.Builder -> VStream
+builder :: Encoding B.Builder
builder = VStream . VBuilder
-
-
View
387 src/Data/Blaze/Binary/IterDecoding.hs
@@ -0,0 +1,387 @@
+{-# LANGUAGE MagicHash, RankNTypes, BangPatterns #-}
+-----------------------------------------------------------------------------
+-- |
+-- Module : Data.Blaze.Binary.Encoding
+-- Copyright : 2012, Simon Meier <iridcode@gmail.com>
+-- License : BSD3-style (see LICENSE)
+--
+-- Maintainer : Simon Meier <iridcode@gmail.com>
+-- Stability :
+-- Portability : portable
+--
+-- Iteratee-style decoding of binary values.
+--
+-----------------------------------------------------------------------------
+module Data.Blaze.Binary.IterDecoding where
+
+import Prelude hiding (catch)
+
+import Control.Applicative
+
+import qualified Data.ByteString.Internal as S
+
+import Foreign
+import GHC.Word
+import GHC.Int
+import GHC.Prim
+import GHC.Types
+
+data DStreamRep a =
+ DWord8 (Word# -> DStreamRep a)
+ | DInt (Int# -> DStreamRep a)
+ | DChar (Char# -> DStreamRep a)
+ | DByteString {-# UNPACK #-} !Int (S.ByteString -> DStreamRep a)
+ | DWord (Word# -> DStreamRep a)
+ | DFloat (Float# -> DStreamRep a)
+ | DDouble (Double# -> DStreamRep a)
+ | DWord64 (Word# -> DStreamRep a)
+ | DInt64 (Int# -> DStreamRep a)
+ | DWord32 (Word# -> DStreamRep a)
+ | DInt32 (Int# -> DStreamRep a)
+ | DWord16 (Word# -> DStreamRep a)
+ | DInt16 (Int# -> DStreamRep a)
+ | DInt8 (Int# -> DStreamRep a)
+ | DSlowWord8 (Ptr Word8) String (Word8 -> DStreamRep a)
+ -- ^ For reading a multi-byte primitive at the boundary.
+ | DFail String
+ | DReturn a
+
+newtype DStream a = DStream {
+ unDStream :: forall r. (a -> DStreamRep r) -> DStreamRep r
+ }
+
+instance Functor DStream where
+ {-# INLINE fmap #-}
+ fmap f = \s -> DStream $ \k -> unDStream s (k . f)
+
+instance Applicative DStream where
+ {-# INLINE pure #-}
+ pure = \x -> DStream $ \k -> k x
+
+ {-# INLINE (<*>) #-}
+ (<*>) = \sf sx -> DStream $ \k ->
+ unDStream sf (\f -> unDStream sx (\x -> k (f x)))
+
+instance Monad DStream where
+ return = pure
+
+ {-# INLINE (>>=) #-}
+ (>>=) = \sm f -> DStream $ \k -> unDStream sm (\m -> unDStream (f m) k)
+
+ fail msg = DStream $ \_ -> DFail msg
+
+word8 :: DStream Word8
+word8 = DStream $ \k -> DWord8 (\x -> k (W8# x))
+
+char :: DStream Char
+char = DStream $ \k -> DChar (\x -> k (C# x))
+
+word8s :: DStream [Word8]
+word8s = decodeList word8
+
+string :: DStream String
+string = decodeList char
+
+{-# NOINLINE decodeList #-}
+decodeList :: DStream a -> DStream [a]
+decodeList decode =
+ go
+ where
+ go = do
+ tag <- word8
+ case tag of
+ 0 -> return []
+ 1 -> force ((:) <$> decode <*> go)
+ _ -> fail $ "decodeList: unexpected tag " ++ show tag
+
+-- | Use 'force' to ensure that the finally returned value is in WHNF. This
+-- reduces memory usage, as it flattens all the one-argument PAPS that were
+-- built up. Note that flattening too early may result in an increased
+-- runtime, as then some arguments are copied multiple times.
+{-# INLINE force #-}
+force :: DStream a -> DStream a
+force ds = DStream $ \k -> unDStream ds (\x -> x `seq` (k x))
+
+decodeWith :: DStream a -> S.ByteString -> Either String a
+decodeWith ds0 (S.PS fpbuf off len) = S.inlinePerformIO $ do
+ withForeignPtr fpbuf $ \pbuf -> do
+ let !ip0 = pbuf `plusPtr` off
+ !ipe = ip0 `plusPtr` len
+
+ err :: String -> Ptr Word8 -> IO (Either String a)
+ err msg ip = return $ Left $ msg ++
+ " (at byte " ++ show (ip `minusPtr` ip0) ++
+ " of " ++ show len ++ ")"
+
+ unexpectedEOI loc =
+ err ("unexpected end-of-input while decoding " ++ loc)
+
+ go :: Ptr Word8 -> DStreamRep a -> IO (Either String a)
+ go !ip ds = case ds of
+ DReturn x -> return $ Right x
+
+ DFail msg -> err msg ip
+
+ DWord8 k -> readN 1 $ \ip' -> do (W8# x) <- peek $ castPtr ip
+ go ip' (k x)
+
+ DWord16 k -> readN 2 $ \ip' -> do (W16# x) <- peek $ castPtr ip
+ go ip' (k x)
+
+ DWord32 k -> readN 4 $ \ip' -> do (W32# x) <- peek $ castPtr ip
+ go ip' (k x)
+
+ DWord64 k -> readN 8 $ \ip' -> do (W64# x) <- peek $ castPtr ip
+ go ip' (k x)
+
+ DWord k -> readN (sizeOf (undefined :: Word)) $ \ip' -> do
+ (W# x) <- peek $ castPtr ip
+ go ip' (k x)
+
+ DInt8 k -> readN 1 $ \ip' -> do (I8# x) <- peek $ castPtr ip
+ go ip' (k x)
+ DInt16 k -> readN 2 $ \ip' -> do (I16# x) <- peek $ castPtr ip
+ go ip' (k x)
+ DInt32 k -> readN 4 $ \ip' -> do (I32# x) <- peek $ castPtr ip
+ go ip' (k x)
+ DInt64 k -> readN 8 $ \ip' -> do (I64# x) <- peek $ castPtr ip
+ go ip' (k x)
+
+ DInt k -> readN (sizeOf (undefined :: Int)) $ \ip' -> do
+ (I# x) <- peek $ castPtr ip
+ go ip' (k x)
+
+ DFloat k -> readN (sizeOf (undefined :: Float)) $ \ip' -> do
+ (F# x) <- peek $ castPtr ip
+ go ip' (k x)
+
+ DDouble k -> readN (sizeOf (undefined :: Double)) $ \ip' -> do
+ (D# x) <- peek $ castPtr ip
+ go ip' (k x)
+
+ DChar k
+ | ip `plusPtr` 4 <= ipe -> do
+ let peek8 = peekByteOff ip
+ w0 <- peek ip
+ case () of
+ _ | w0 < 0x80 -> do
+ let !c# = chr1# w0
+ go (ip `plusPtr` 1) (k c#)
+
+ | w0 < 0xe0 -> do
+ w1 <- peek8 1
+ let !c# = chr2# w0 w1
+ go (ip `plusPtr` 2) (k c#)
+
+ | w0 < 0xf0 -> do
+ w1 <- peek8 1; w2 <- peek8 2
+ let !c# = chr3# w0 w1 w2
+ go (ip `plusPtr` 3) (k c#)
+
+ | otherwise -> do
+ w1 <- peek8 1; w2 <- peek8 2; w3 <- peek8 3
+ let !c# = chr4# w0 w1 w2 w3
+ go (ip `plusPtr` 4) (k c#)
+
+ | otherwise ->
+ go ip (unDStream (slowCharUtf8 ip) (\ !(C# c#) -> k c#))
+
+ DSlowWord8 ipErr locErr k
+ | ip < ipe -> do x <- peek $ castPtr ip
+ go (ip `plusPtr` 1) (k x)
+ | otherwise -> unexpectedEOI locErr ipErr
+ where
+ {-# INLINE readN #-}
+ readN :: Int
+ -> (Ptr Word8 -> IO (Either String a))
+ -> IO (Either String a)
+ readN n io =
+ let ip' = ip `plusPtr` n in
+ if ip' <= ipe
+ then io ip'
+ else unexpectedEOI ("reading " ++ show n ++ " bytes") ip
+
+ -- start the decoding
+ go ip0 (unDStream ds0 DReturn)
+{-
+{-# INLINE fastCharUtf8 #-}
+fastCharUtf8 :: Ptr Word8 -> State# RealWorld -> (# State RealWorld, Char# #)
+fastCharUtf8 ip = \s0 ->
+ case runIO (peek ip0) s0 of
+ (# s1, w0 #)
+ | w0 < 0x80 -> (# s1, chr1 w0 #)
+
+ | w0 < 0xe0 ->
+ case runIO (peekByteOff ip0 1) s1 of
+ (# s2, w1 #) -> (# s2, chr2 w0 w1 #)
+
+ | w0 < 0xf0 ->
+ case runIO (peekByteOff ip0 1) s1 of
+ (# s2, w1 #) -> case runIO (peekByteOff ip0 2) s2 of
+ (# s3, w2 #) -> (# s3, chr3 w0 w1 w2 #)
+
+ | otherwise ->
+ case runIO (peekByteOff ip0 1) s1 of
+ (# s2, w1 #) -> case runIO (peekByteOff ip0 2) s2 of
+ (# s3, w2 #) -> case runIO (peekByteOff ip0 3) s3 of
+ (# s4, w3 #) -> (# s4, chr4 w0 w1 w2 w3 #)
+-}
+
+slowCharUtf8 :: Ptr Word8 -> DStream Char
+slowCharUtf8 ip = do
+ w0 <- word8'
+ case () of
+ _ | w0 < 0x80 -> return (chr1 w0)
+ | w0 < 0xe0 -> chr2 w0 <$> word8'
+ | w0 < 0xf0 -> chr3 w0 <$> word8' <*> word8'
+ | otherwise -> chr4 w0 <$> word8' <*> word8' <*> word8'
+ where
+ word8' = slowWord8 ip "char (UTF-8)"
+ chr1 w0 = C# (chr1# w0)
+ chr2 w0 w1 = C# (chr2# w0 w1)
+ chr3 w0 w1 w2 = C# (chr3# w0 w1 w2)
+ chr4 w0 w1 w2 w3 = C# (chr4# w0 w1 w2 w3)
+
+slowWord8 :: Ptr Word8 -> String -> DStream Word8
+slowWord8 ip msg = DStream (\k -> DSlowWord8 ip msg k)
+
+{-
+data Res a = Res !a {-# UNPACK #-} !(Ptr Word8)
+
+data Buffer = Buffer {-# UNPACK #-} !(Ptr Word8) -- ^ First input byte
+ {-# UNPACK #-} !(Ptr Word8) -- ^ First byte after
+
+data ParseException = ParseException String {-# UNPACK #-} !(Ptr Word8)
+ deriving( Show, Typeable )
+
+instance Exception ParseException where
+
+newtype Parser a = Parser { unParser :: Buffer -> IO (Res a) }
+
+instance Functor Res where
+ {-# INLINE fmap #-}
+ fmap f (Res x ip) = Res (f x) ip
+
+instance Functor Parser where
+ fmap f = Parser . fmap (fmap (fmap f)) . unParser
+
+instance Applicative Parser where
+ {-# INLINE pure #-}
+ pure x = Parser $ \(Buffer ip _) -> return (Res x ip)
+
+ {-# INLINE (<*>) #-}
+ Parser fIO <*> Parser xIO = Parser $ \ !buf@(Buffer _ ipe0) -> do
+ Res f ip1 <- fIO buf
+ Res x ip2 <- xIO (Buffer ip1 ipe0)
+ evaluate (Res (f x) ip2)
+
+instance Monad Parser where
+ {-# INLINE return #-}
+ return = pure
+
+ {-# INLINE (>>=) #-}
+ Parser xIO >>= f = Parser $ \ !buf@(Buffer _ ipe0) -> do
+ Res x ip1 <- xIO buf
+ unParser (f x) (Buffer ip1 ipe0)
+
+ {-# INLINE fail #-}
+ fail msg = Parser $ \(Buffer ip _) -> throw $ ParseException msg ip
+
+
+requires :: Int -> Parser a -> Parser a
+requires n p = Parser $ \buf@(Buffer ip ipe) ->
+ if ipe `minusPtr` ip >= n
+ then unParser p buf
+ else throw $ (`ParseException` ip) $
+ "required " ++ show n ++
+ " bytes, but there are only " ++ show (ipe `minusPtr` ip) ++
+ " bytes left."
+
+
+{-# INLINE word8 #-}
+word8 :: Parser Word8
+word8 = Parser $ \(Buffer ip ipe) -> do
+ let ip' = ip `plusPtr` 1
+ if ip' < ipe
+ then do x <- peek ip
+ return (Res x ip')
+ else throw $ (`ParseException` (ip' `plusPtr` (-1))) $
+ "less than the one byte left"
+
+word8sSimple :: Parser [Word8]
+word8sSimple = do
+ tag <- word8
+ case tag of
+ 0 -> return []
+ 1 -> (:) <$> word8 <*> word8s
+ _ -> fail $ "word8s: unexpected tag " ++ show tag
+
+word8s :: Parser [Word8]
+word8s =
+ go []
+ where
+ go xs = do
+ tag <- word8
+ case tag of
+ 0 -> return (reverse xs)
+ 1 -> do x <- word8
+ go (x:xs)
+ _ -> fail $ "word8s: unexpected tag " ++ show tag
+
+runParser :: Parser a -> S.ByteString -> Either String a
+runParser p (S.PS fpbuf off len) = S.inlinePerformIO $ do
+ withForeignPtr fpbuf $ \pbuf -> do
+ let !ip = pbuf `plusPtr` off
+ !ipe = ip `plusPtr` len
+ (`catch` handler) $ do
+ Res x _ <- unParser p (Buffer ip ipe)
+ return (Right x)
+ where
+ handler :: ParseException -> IO (Either String a)
+ handler (ParseException msg _) = return $ Left msg
+
+-}
+
+------------------------------------------------------------------------------
+-- UTF-8 decoding helpers
+------------------------------------------------------------------------------
+
+chr1# :: Word8 -> Char#
+chr1# (W8# x#) = (chr# (word2Int# x#))
+{-# INLINE chr1# #-}
+
+chr2# :: Word8 -> Word8 -> Char#
+chr2# (W8# x1#) (W8# x2#) =
+ (chr# (z1# +# z2#))
+ where
+ !y1# = word2Int# x1#
+ !y2# = word2Int# x2#
+ !z1# = uncheckedIShiftL# (y1# -# 0xC0#) 6#
+ !z2# = y2# -# 0x80#
+{-# INLINE chr2# #-}
+
+chr3# :: Word8 -> Word8 -> Word8 -> Char#
+chr3# (W8# x1#) (W8# x2#) (W8# x3#) =
+ (chr# (z1# +# z2# +# z3#))
+ where
+ !y1# = word2Int# x1#
+ !y2# = word2Int# x2#
+ !y3# = word2Int# x3#
+ !z1# = uncheckedIShiftL# (y1# -# 0xE0#) 12#
+ !z2# = uncheckedIShiftL# (y2# -# 0x80#) 6#
+ !z3# = y3# -# 0x80#
+{-# INLINE chr3# #-}
+
+chr4# :: Word8 -> Word8 -> Word8 -> Word8 -> Char#
+chr4# (W8# x1#) (W8# x2#) (W8# x3#) (W8# x4#) =
+ (chr# (z1# +# z2# +# z3# +# z4#))
+ where
+ !y1# = word2Int# x1#
+ !y2# = word2Int# x2#
+ !y3# = word2Int# x3#
+ !y4# = word2Int# x4#
+ !z1# = uncheckedIShiftL# (y1# -# 0xF0#) 18#
+ !z2# = uncheckedIShiftL# (y2# -# 0x80#) 12#
+ !z3# = uncheckedIShiftL# (y3# -# 0x80#) 6#
+ !z4# = y4# -# 0x80#
+{-# INLINE chr4# #-}
View
137 src/Data/Blaze/Binary/ParamDecoding.hs
@@ -20,6 +20,7 @@ import qualified Data.Blaze.Binary.Decoding as D
import Control.Applicative
import Control.Exception
+import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Data.Typeable
import qualified Data.ByteString.Internal as S
@@ -28,6 +29,7 @@ import GHC.Ptr
import GHC.Word
import GHC.Exts
import GHC.IO (IO(IO))
+import GHC.Conc.Sync (forkIOWithUnmask)
import Foreign
@@ -68,10 +70,15 @@ instance Exception DecodingException where
-- calling pattern, which is precompiled in contrast to the 'stg_ap_nv'
-- calling pattern.
+-- Highly unsafe trick: primitive decoders are only called via an unknown call
+-- and never inlined. Pattern mathcing on their result makes sure that their
+-- evaluation took place. At least for the primitive ones that return unboxed
+-- values.
+
type PrimDecoder a = Ptr Word8 -> State# RealWorld -> (# State# RealWorld, Addr#, a #)
-type PrimDecoderWord = Ptr Word8 -> State# RealWorld -> (# State# RealWorld, Addr#, Word# #)
+type PrimDecoderWord = Addr# -> (# Addr#, Word# #)
type PrimDecoderInt = Ptr Word8 -> State# RealWorld -> (# State# RealWorld, Addr#, Int# #)
-type PrimDecoderChar = Ptr Word8 -> State# RealWorld -> (# State# RealWorld, Addr#, Char# #)
+type PrimDecoderChar = Addr# -> (# Addr#, Char# #)
type PrimDecoderFloat = Ptr Word8 -> State# RealWorld -> (# State# RealWorld, Addr#, Float# #)
type PrimDecoderDouble = Ptr Word8 -> State# RealWorld -> (# State# RealWorld, Addr#, Double# #)
@@ -128,18 +135,19 @@ decodersLE !fpbuf !ipe =
{-# INLINE word #-}
word :: forall a. Storable a => (Ptr a -> IO Word) -> PrimDecoderWord
- word peekAt = \ip0 s0 ->
- case nextPtr ip0 of
- ip1 | ip1 <= ipe -> case runIO (peekAt (castPtr ip0)) s0 of
- (# s1, W# x #) -> (# s1, getPtr ip1, x #)
+ word peekAt = \ip0a ->
+ let ip0 = Ptr ip0a in
+ case nextPtr (castPtr ip0 :: Ptr a) of
+ ip1 | ip1 <= ipe -> case runIO (peekAt (castPtr ip0)) realWorld# of
+ (# _, W# x #) -> (# getPtr ip1, x #)
| otherwise ->
- case runIO (tooFewBytes ip0) s0 of
- (# s1, W# x #) -> (# s1, getPtr ip0, x #)
+ case runIO (tooFewBytes ip0) realWorld# of
+ (# _, W# x #) -> (# getPtr ip0, x #)
{-# INLINE int #-}
int :: forall a. Storable a => (Ptr a -> IO Int) -> PrimDecoderInt
int peekAt = \ip0 s0 ->
- case nextPtr ip0 of
+ case nextPtr (castPtr ip0 :: Ptr a) of
ip1 | ip1 <= ipe -> case runIO (peekAt (castPtr ip0)) s0 of
(# s1, I# x #) -> (# s1, getPtr ip1, x #)
| otherwise ->
@@ -185,22 +193,22 @@ decodersLE !fpbuf !ipe =
-}
charUtf8 :: PrimDecoderChar
- charUtf8 = \ip0 s0 ->
+ charUtf8 = \ip0a -> let ip0 = Ptr ip0a in
if ip0 < ipe then
- case runIO (peek ip0) s0 of
+ case runIO (peek ip0) realWorld# of
(# s1, w0 #)
- | w0 < 0x80 -> (# s1, getPtr (ip0 `plusPtr` 1), chr1 w0 #)
+ | w0 < 0x80 -> (# getPtr (ip0 `plusPtr` 1), chr1 w0 #)
| w0 < 0xe0 && ip0 `plusPtr` 2 <= ipe ->
case runIO (peekByteOff ip0 1) s1 of
(# s2, w1 #) ->
- (# s2, getPtr (ip0 `plusPtr` 2), chr2 w0 w1 #)
+ (# getPtr (ip0 `plusPtr` 2), chr2 w0 w1 #)
| w0 < 0xf0 && ip0 `plusPtr` 3 <= ipe ->
case runIO (peekByteOff ip0 1) s1 of
(# s2, w1 #) -> case runIO (peekByteOff ip0 2) s2 of
(# s3, w2 #) ->
- (# s3, getPtr (ip0 `plusPtr` 3), chr3 w0 w1 w2 #)
+ (# getPtr (ip0 `plusPtr` 3), chr3 w0 w1 w2 #)
| ip0 `plusPtr` 4 <= ipe ->
case runIO (peekByteOff ip0 1) s1 of
@@ -209,14 +217,14 @@ decodersLE !fpbuf !ipe =
(# s4, w3 #) ->
let x = chr4 w0 w1 w2 w3 in
if I# x <= 0x10ffff
- then (# s4, getPtr (ip0 `plusPtr` 4), chr# x #)
- else runIOChr ip0 (throw (DecodingException ("invalid Unicode codepoint: " ++ show (I# x)) ip0)) s4
+ then (# getPtr (ip0 `plusPtr` 4), chr# x #)
+ else runIOChr ip0 (throw (DecodingException ("invalid Unicode codepoint: " ++ show (I# x)) ip0))
- | otherwise -> runIOChr ip0 (tooFewBytes ip0) s1
- else runIOChr ip0 (tooFewBytes ip0) s0
+ | otherwise -> runIOChr ip0 (tooFewBytes ip0)
+ else runIOChr ip0 (tooFewBytes ip0)
where
- runIOChr ip io s0 = case runIO io s0 of
- (# s1, C# c #) -> (# s1, getPtr ip, c #)
+ runIOChr ip io = case runIO io realWorld# of
+ (# s1, C# c #) -> (# getPtr ip, c #)
------------------------------------------------------------------------------
@@ -226,9 +234,8 @@ decodersLE !fpbuf !ipe =
-- | One decoding step. Note that we use a 'Ptr Word8' because the
-- 'stg_ap_pnv' calling patterns is not precompiled in GHC.
type DecodeStep a =
- Ptr Word8 -- ^ Next byte to read
- -> State# RealWorld -- ^ World state before
- -> (# State# RealWorld, Addr#, a #)
+ Ptr Word8 -- ^ Next byte to read
+ -> (# Addr#, a #)
-- ^ World state, new next byte to read, and decoded value
-- | A decoder for Haskell values.
@@ -242,43 +249,44 @@ newtype Decoder a = Decoder {
-- | Convert an 'IO' action to a 'Decoder' action.
{-# INLINE ioToDecoder #-}
ioToDecoder :: IO a -> Decoder a
-ioToDecoder (IO io) = Decoder $ \_ !(Ptr ip0) s0 -> case io s0 of
- (# s1, x #) -> (# s1, ip0, x #)
+ioToDecoder (IO io) = Decoder $ \_ !(Ptr ip0) -> case io realWorld# of
+ (# _, x #) -> (# ip0, x #)
-- | A 'DecodeStep' that fails with the given message.
failStep :: String -> DecodeStep a
-failStep msg ip0 s0 =
- case runIO (throw (DecodingException msg ip0)) s0 of
+failStep msg ip0 =
+ case runIO (throw (DecodingException msg ip0)) realWorld# of
-- unreachable, but makes the type checker happy.
- (# s1, x #) -> (# s1, getPtr ip0, x #)
+ (# _, x #) -> (# getPtr ip0, x #)
-- Instances
------------
instance Functor Decoder where
- fmap f = \(Decoder io) -> Decoder $ \pd ip0 s0 ->
- case io pd ip0 s0 of
- (# s1, ip1, x #) -> (# s1, ip1, f x #)
+ {-# INLINE fmap #-}
+ fmap = \f (Decoder io) -> Decoder $ \pd ip0 ->
+ case io pd ip0 of
+ (# ip1, x #) -> (# ip1, f x #)
instance Applicative Decoder where
{-# INLINE pure #-}
- pure x = Decoder $ \_ !(Ptr ip0) s0 -> (# s0, ip0, x #)
+ pure x = Decoder $ \_ ip0 -> (# getPtr ip0, x #)
{-# INLINE (<*>) #-}
- Decoder fIO <*> Decoder xIO = Decoder $ \pd ip0 s0 ->
- case fIO pd ip0 s0 of
- (# s1, ip1, f #) -> case xIO pd (Ptr ip1) s1 of
- (# s2, ip2, x #) -> (# s2, ip2, f x #)
+ Decoder fIO <*> Decoder xIO = Decoder $ \pd ip0 ->
+ case fIO pd ip0 of
+ (# ip1, f #) -> case xIO pd (Ptr ip1) of
+ (# ip2, x #) -> (# ip2, f x #)
instance Monad Decoder where
{-# INLINE return #-}
return = pure
{-# INLINE (>>=) #-}
- Decoder xIO >>= f = Decoder $ \pd ip0 s0 ->
- case xIO pd ip0 s0 of
- (# s1, ip1, x #) -> unDecoder (f x) pd (Ptr ip1) s1
+ Decoder xIO >>= f = Decoder $ \pd ip0 ->
+ case xIO pd ip0 of
+ (# ip1, x #) -> unDecoder (f x) pd (Ptr ip1)
{-# INLINE fail #-}
fail = Decoder . const . failStep
@@ -295,16 +303,45 @@ runDecoder p (S.PS fpbuf off len) = S.inlinePerformIO $ do
!ipe = ip0 `plusPtr` len
!pd = decodersLE fpbuf ipe
- (`catch` (handler ip0)) $ do
- x <- IO $ \s0 -> case unDecoder p pd ip0 s0 of
- (# s1, _, x #) -> (# s1, x #)
- return (Right x)
+ decodeFast = (handle (decodingException ip0)) $ do
+ x <- IO $ \s0 -> case unDecoder p pd ip0 of
+ (# _, x #) -> (# s0, x #)
+ return (Right x)
+
+ -- For deeply nested messages our decoder might overflow the
+ -- stack. We report this error politely as a decoding failure
+ -- instead of killing all the pure code above us. As a
+ -- stackoverflow is an asynchronous exception, we have to ensure
+ -- that we are not masked. Note that we are essentially allocating
+ -- a fresh stack for the decoding using 'forkIOWithUnmask' :-)
+ decodeLarge = do
+ mv <- newEmptyMVar
+ _ <- forkIOWithUnmask $ \unmask ->
+ handle (allExceptions mv) $
+ handle (stackOverflow mv) $
+ unmask $ putMVar mv =<< (Right <$> decodeFast)
+ res <- takeMVar mv
+ case res of
+ Left e -> throw e
+ Right x -> return x
+
+ if len < 1024 then decodeFast else decodeLarge
where
- handler :: Ptr Word8 -> DecodingException -> IO (Either String a)
- handler ip0 (DecodingException msg ip) = return $ Left $
+ decodingException :: Ptr Word8 -> DecodingException -> IO (Either String a)
+ decodingException ip0 (DecodingException msg ip) = return $ Left $
msg ++
" (at byte " ++ show (ip `minusPtr` ip0) ++ " of " ++ show len ++ ")"
+ stackOverflow :: MVar (Either e (Either String a)) -> AsyncException -> IO ()
+ stackOverflow mv StackOverflow = putMVar mv $ Right $ Left $
+ "stack overflow: the message of size " ++ show len ++
+ " may be nested too deeply."
+ stackOverflow _ e = throw e
+
+ allExceptions :: MVar (Either SomeException a) -> SomeException -> IO ()
+ allExceptions mv e = putMVar mv (Left e)
+
+
-- Decoder construction
-----------------------
@@ -332,14 +369,14 @@ prim = error "PDecoder: prim - implement"
--------------------
word8 :: Decoder Word8
-word8 = Decoder $ \pd ip0 s0 -> case pdWord8 pd ip0 s0 of
- (# s1, ip1, w #) -> (# s1, ip1, W8# w #)
+word8 = Decoder $ \pd !(Ptr ip0) -> case pdWord8 pd ip0 of
+ (# ip1, w #) -> (# ip1, W8# w #)
word8s = decodeList word8
string = decodeList char
-{-# INLINABLE decodeList #-}
+{-# NOINLINE decodeList #-}
decodeList :: Decoder a -> Decoder [a]
decodeList x = go
where
@@ -401,8 +438,8 @@ byteString = \len -> prim (`pdByteString` len)
-}
char :: Decoder Char
-char = Decoder $ \pd ip0 s0 -> case pdChar pd ip0 s0 of
- (# s1, ip1, x #) -> (# s1, ip1, C# x #)
+char = Decoder $ \pd ip0 -> case pdChar pd (getPtr ip0) of
+ (# ip1, x #) -> (# ip1, C# x #)
{-# INLINE getAddr #-}
getAddr :: Ptr a -> Addr#
View
120 src/Data/Blaze/Binary/StreamDecoding.hs
@@ -0,0 +1,120 @@
+{-# LANGUAGE UnboxedTuples, BangPatterns #-}
+-----------------------------------------------------------------------------
+-- |
+-- Module : Data.Blaze.Binary.Encoding
+-- Copyright : 2012, Simon Meier <iridcode@gmail.com>
+-- License : BSD3-style (see LICENSE)
+--
+-- Maintainer : Simon Meier <iridcode@gmail.com>
+-- Stability :
+-- Portability : portable
+--
+-- Stream based decoding of binary values.
+--
+-----------------------------------------------------------------------------
+module Data.Blaze.Binary.StreamDecoding (
+ benchWord8s
+ ) where
+
+import Data.Word
+import Control.Applicative
+
+import qualified Data.ByteString.Internal as S
+
+import Foreign
+
+------------------------------------------------------------------------
+
+-- | The representation for a stream of values to be serialized.
+data VStream =
+ -- VChar {-# UNPACK #-} !Char VStreamRep
+ -- | VWord {-# UNPACK #-} !Word VStreamRep
+ VWord8 {-# UNPACK #-} !Word8 VStream
+ -- | VWord16 {-# UNPACK #-} !Word16 VStreamRep
+ -- | VWord32 {-# UNPACK #-} !Word32 VStreamRep
+ -- | VWord64 {-# UNPACK #-} !Word64 VStreamRep
+ -- | VInt {-# UNPACK #-} !Int VStreamRep
+ -- | VInt8 {-# UNPACK #-} !Int8 VStreamRep
+ -- | VInt16 {-# UNPACK #-} !Int16 VStreamRep
+ -- | VInt32 {-# UNPACK #-} !Int32 VStreamRep
+ -- | VInt64 {-# UNPACK #-} !Int64 VStreamRep
+ -- | VFloat {-# UNPACK #-} !Float VStreamRep
+ -- | VDouble {-# UNPACK #-} !Double VStreamRep
+ -- | VInteger !Integer VStreamRep
+ -- | VByteString !S.ByteString VStreamRep
+ -- | VBuilder !B.Builder VStreamRep
+ | VFail String
+ | VEmpty
+
+newtype Decoder a = Decoder { unDecoder :: VStream -> (# a, VStream #) }
+
+toVStream :: S.ByteString -> VStream
+toVStream (S.PS fpbuf off len) =
+ go ip0
+ where
+ pbuf = unsafeForeignPtrToPtr fpbuf
+ ip0 = pbuf `plusPtr` off
+ ipe = ip0 `plusPtr` len
+
+ go !ip
+ | ip < ipe = S.inlinePerformIO $ do
+ w <- peek ip
+ touchForeignPtr fpbuf
+ return $ VWord8 w (go (ip `plusPtr` 1))
+ | otherwise =
+ VEmpty
+
+runDecoder :: Decoder a -> S.ByteString -> Either String a
+runDecoder d bs = case unDecoder d (toVStream bs) of
+ (# _, VFail msg #) -> Left msg
+ (# x, _ #) -> Right x
+
+instance Functor Decoder where
+ {-# INLINE fmap #-}
+ fmap = \f d -> Decoder $ \vs0 -> case unDecoder d vs0 of
+ (# x, vs1 #) -> (# f x, vs1 #)
+
+
+instance Applicative Decoder where
+ {-# INLINE pure #-}
+ pure = \x -> Decoder $ \vs -> (# x, vs #)
+
+ {-# INLINE (<*>) #-}
+ (<*>) = \fd xd -> Decoder $ \vs0 ->
+ case unDecoder fd vs0 of
+ (# f, vs1 #) -> case unDecoder xd vs1 of
+ (# x, vs2 #) -> (# f x, vs2 #)
+
+instance Monad Decoder where
+ {-# INLINE return #-}
+ return = pure
+
+ {-# INLINE (>>=) #-}
+ (>>=) = \md f -> Decoder $ \vs0 ->
+ case unDecoder md vs0 of
+ (# _, vs1@(VFail _) #) -> (# error "impossible", vs1 #)
+ (# m, vs1 #) -> unDecoder (f m) vs1
+
+ -- We store the failure in the remainder of the stream to piggy-back failure
+ -- detection on the pattern matching of the stream constructor.-
+ fail msg = Decoder $ \_ -> (# error "Decoder:fail: impossible", VFail msg #)
+
+{-# INLINE word8 #-}
+word8 :: Decoder Word8
+word8 = Decoder $ \vs0 -> case vs0 of
+ VWord8 w vs1 -> (# w, vs1 #)
+ _ -> (# error "impossible", VFail "expected Word8, but got something else" #)
+
+word8s :: Decoder [Word8]
+word8s = do
+ tag <- word8
+ case tag of
+ 0 -> return []
+ 1 -> (:) <$> word8 <*> word8s
+ _ -> fail $ "word8s: unexpected tag " ++ show tag
+
+
+benchWord8s :: S.ByteString -> [Word8]
+benchWord8s bs = case runDecoder word8s bs of
+ Left msg -> error msg
+ Right x -> x

No commit comments for this range

Something went wrong with that request. Please try again.