Like Parsec, but two-way! (Experimental!)
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.

FormatParser: Parsec-inspired (Simultaneous) Parsing & Formatting

Parsec and friends makes writing parsers in Haskell a lovely experience. But, if you're like me, you find yourself irate with the duplication of effort that comes with writing separate formaters. After all, they both convey essentially the same information!

The following, from Real World Haskell is one of my favorite Parsec examples:

csvFile = endBy line eol
line = sepBy cell (char ',')
cell = many (noneOf ",\n")
eol = char '\n'

Beautiful! We just said what a csvFile was and had a parser! But there is certainly enough information there to format the data as well. Why can't we?

With FormatParser you can!

import Text.FormatParser.Primitives

csvFile = endBy line eol
line = sepBy cell (char ',')
cell = many (noneOf ",\n")
eol = char '\n'
> "abc,def,egh\nfoo,bar,blah\n" `parseBy` csvFile 
Just [["abc","def","egh"],["foo","bar","blah"]]
> [["abc","def","egh"],["foo","bar","blah"]] `formatBy` csvFile
Just "abc,def,egh\nfoo,bar,blah\n"

FormatParser also provides really cool binary parsing tools. But we'll need to get used to working with FormatParser Monads before we get to that...

Monadic Examples and Friends

So, how do these work as monads?

int :: FormatParser Char Int Int
int = do
    n <- show =|= many digit
    return (read n)
> "123" `parseBy` int
Just 123
> 123 `formatBy` int
Just "123"

For the most part, this will look familiar to users of Parsec. The only new thing is show =|=. This is for describing how the input value should be converted into the input value of that component of the parser (in this case, the many digits we want to show correspond to showing the input integer). This may be more clear if we wrote it as:

int :: FormatParser Char Int Int
int = do
    n <- (\inVal -> show inVal) =|= many digit
    return (read n)

Read it as many digit being equal to showing the input value.

From here we can make something more interesting. For example, let's parse a two-tuple of numbers:

twoTuple :: FormatParser Char (Int, Int) (Int, Int)
twoTuple = do
    char '('
    -- first number parse corresponds to first
    -- component of input
    a <- fst =|= int
    char ','
    -- first number parse corresponds to second
    -- component of input
    b <- snd =|= int
    char ')'
    return (a,b)
> "(1, 3)" `parseBy` twoTuple
Just (1,3)
> (1, 3) `formatBy` twoTuple
Just "( 1 , 3 )"

Now let's consider a strange format where a list of comma separated numbers where the first one tells us how many of the others we should parse.

strange :: FormatParser Char [Int] [Int]
strange = do
    -- The first number corresponds to the number of values we'll parse,
    -- the length of our input value
    len  <- length =|= int
    -- Then, using our friend from Control.Monad, we loop over those.
    vals <- forM [0 .. len - 1] $ \n -> do
        -- The comma separator
        char ','
        -- And then the value, which corresponds to the 
        -- value in position n of the input
        (!! n) =|= int
    return vals
> "3,23,45,21,5" `parseBy` strange
Just [23,45,21]
> [2^n | n <- [0..4]] `formatBy` strange 
Just "5,1,2,4,8,16"

Note on Style: If you need no information from the input value, make it polymorphic instead of (). For example:

-- Good
whitespace :: FormatParser Char a String
whitespace = const " " =|= many (oneOf " \t\n")

-- bad
char :: Char -> FormatParser Char () Char
char c = const c =|= oneOf [c]

The reason is that one wants to be able to write things like

    char ','

Instead of

    const () =|= whitespace
    const () =|= char ','

On the other hand, when you need an input of a parser with zero information, use a (). This avoids second rank types and makes them more flexible. For example, sepBy :: FormatParser s i o -> FormatParser s () o2 -> FormatParser s [i] [o].

Binary Parsing

Text.FormatParser.Binary provides a number of useful binary parsing primitives: bin8, bin32le (little Endian), bin32be (big Endian)... These are all polymorphic because of the different things that can be represented in 8 bits or 32 bits. If you want to get a 32 bit Int

    foo :: Int <- bin32be

and if you want a Float

    foo :: Float <- bin32be

For a more specific example, lets write a binary STL parser.

type Point = (Float, Float, Float)
type Triangle = (Point, Point, Point)

point :: FormatParser ByteString Point Point
point = manyT3 $ bin32le

bstl :: FormatParser ByteString [Triangle] [Triangle]
bstl = do
    header :: [Int]      <- (const [0..80]) =|= manyN 80 bin8
    len    :: Int        <- length =|= bin32le
    tris   :: [Triangle] <- manyN len $ do
                        normal   <- const (0,0,0) =|= point
                        triangle <- manyT3 point
                        unknown  <- const (0::Int) =|= bin16le
                        return triangle
    return tris