various refactoring; GHC 9.4 support #29

raehik · 2022-10-20T10:38:53Z

This is kind of a playground for things I had been thinking about. It should not be merged without lots of history clean up (I think more likely it would happen piecemeal, in a few chunks). Please throw any criticism or thoughts at me here!

To-dos

re-add stateful parser (temporarily removed to make reorganising easier)
refactor export names using @AndrasKovacs 's suggestions

Things done

Support GHC 9.4; Remove `FlatParse.Internal.UnboxedNumerics`

The shims in UnboxedNumerics allowed working with unboxed machine integers across various compiler versions. This was accomplished with lots of CPP macros.

Lots of the explicit unboxing is removed. Where previously we did unary/binary operations on unboxed values, we now wrap them with their respective boxed type constructor (e.g. W8# :: Word8). There is little CPP, and it no longer impacts any types.

Break up main parser module into smaller submodules

Basic is ~1100 lines long. Many primitive parsers and combinators don't interact with each other, so putting them in their own modules isn't particularly hard, and lets us work with smaller modules.

Integer parsers were a reason for this: I've pulled them into a 350 line module with more docs. FlatParse.Basic still exports everything at the end of the day.

Does this potentially impact performance due to differences in how programs are optimized cross-module? Or are we OK because almost everything is INLINEd?

Lay clear naming conventions

Parsers are prefixed get* (this used to be split between none, any*, read*). parse is another candidate.
CPS parsers are prefixed with*
"value checking" parsers like Word8 -> Parser e () are getXOf (idea taken from the CBOR library)
combinators are anything (not really touched)
*Unsafe suffix means unsafe, *# suffix means some arguments are unboxed

The default "view" taken is bytewise, so take and takeRest now refer to the ByteString functions; takeString, takeRestString do it for Strings (this is swapped from current).

Maybe there should be a text-focused interface exported separately, which would export all the old names e.g. char, string, take using String.

raehik · 2022-10-26T12:39:08Z

From a very quick benchmark check, it seems we haven't lost performance with the removal of many manually unboxed operations. Though I noticed an unusual pattern of worsening performance on newer GHCs.

AndrasKovacs · 2022-10-29T13:23:44Z

I find the get in the naming scheme to be redundant. Instead of getX or getXOf we can have X and XOf. Anyway, currently I'm leaning more towards the old naming scheme because that's closer to *parsec naming. Not to say though that the old naming scheme couldn't be made more consistent, like:

X parses a concrete value with type X.
anyX parses any value with type X (X could also denote a "subtype" of a type that we don't want to enshrine in newtype-s, like when returning Integer from anyPosInteger)
withAnyX for CPS'd versions. This is a bit awkward, because Any is redundant here (because CPS is useless for concrete X value parsers), but perhaps better than withX for consistency reasons.

So readInteger should become anyPosInteger, for instance. It's a good question though whether every X should have all variations, when the variations make at least some sense. Currently we're far from that, but it could be also a bit too much clutter in the library to have that many functions.

In this approach, we would be keeping *parsec conventions but aiming for more consistency than in *parsec libraries.

Alternatively, we could try to just rethink API and naming from the ground up with no regard for *parsecs at all. In this case, we could consider less orthodox things like using classes for primitive parsers and making use of TypeApplications. For example,

class PrimParser a where
 get     :: a -> Parser e a -- concrete
 any     :: Parser e a -- any
 withAny :: (a -> Parser e b) -> Parser e b -- CPS any

Then any @Word8 would parse any Word8.

raehik · 2022-11-04T16:21:36Z

I decided on getX to better separate primitive parsers from the combinators (distinction is a little arbitrary). It conveniently side steps any possible export overlaps. And I went with parsing any value as the "default" probably because those are the parsers I use personally. But thinking about the whole module as a Parse.X prefix, X and anyX are sensible.

The readX and scanX functions were the big outliers. I redesigned the machine integer parsers to export lots more definitions, including scanX as getXOfUnsafe (with your recommendation, XUnsafe). readInteger was renamed to getAsciiDecimalInteger (or, anyAsciiDecimalInteger). I kind of don't mind having lots of exports, and would shy away from the type class approach -- I've done similar before in my own code for elegance, but for whatever reason it rubs me the wrong way for a low level library.

Thanks for the suggestions! I'll make a note to refactor using that "more consistent *parsec" scheme.

raehik · 2022-11-04T16:29:14Z

Is the modularization sensible and useful to you? I find it easier to understand the parser internals, and it lets us export internals/unsupported functionality easier, but otherwise it was mostly on a whim. (I am leaning towards explicitly re-exporting everything in the main parser module for a better Haddock experience.)

edit: and do we have to worry about performance changes due to splitting parsers up between modules? I want to say INLINE pragmas solve that, but I don't really know.

AndrasKovacs · 2022-11-04T20:42:15Z

I do like the modularization. I also like exporting everything in the parser module.

Indeed INLINE prevents performance issues arising from module splitting. Alternatively, we could use -fexpose-all-unfoldings which AFAIK neutralizes module splitting effects regardless of inline pragmas. I don't think it's needed though because everything that matters is already inline.

In the holidays, around December, I'll try to join the refactoring efforts here, and I'd like to overhaul tutorials, docs & error message infrastructure too.

AndrasKovacs · 2022-12-29T09:35:52Z

Hi, what's the status of this PR? I'd like to do a release sometime next month.

raehik · 2023-01-13T09:12:46Z

I'm actively looking at this again. It was already quite broken, now moreso and needs to be rewritten for ParserT. Which reminds me, I spoke to @dminuoso the other day about a potential shim for Basic & Stateful - that would be fantastic, since this change would otherwise be coming with lots of repeated definitions (and documentation!).

raehik · 2023-01-18T15:17:41Z

The discussion here is useful, but the code isn't -- I'm closing this and referencing it in the new PR #36 .

raehik mentioned this pull request Oct 20, 2022

GHC 9.4 support #25

Closed

raehik changed the title ~~partial work supporting GHC 9.4~~ various refactoring; GHC 9.4 support Oct 25, 2022

raehik force-pushed the ghc-9.4-tests branch from a9e5cfd to 02f7e06 Compare October 26, 2022 12:21

raehik added 2 commits October 26, 2022 13:45

partial work supporting GHC 9.4

8afaa68

fix benchmarks

a634c0d

raehik force-pushed the ghc-9.4-tests branch from e0ea7c6 to a634c0d Compare October 26, 2022 12:45

raehik added 9 commits October 26, 2022 13:53

CI: +GHC 9.4

3a00c4f

CI: build with LLVM

5aef355

split Addr# primitives into module

64e1b04

cleanup

82e5801

more cleanup

9b9bc66

more fiddling

f6e8091

last tweaking

19bd208

more fiddling

dde03d7

add efficient varint (protobuf) parser

e29f98b

raehik force-pushed the ghc-9.4-tests branch from c2697d7 to e29f98b Compare October 28, 2022 17:18

raehik mentioned this pull request Jan 13, 2023

module refactoring; GHC 9.4 support #36

Merged

5 tasks

raehik closed this Jan 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

various refactoring; GHC 9.4 support #29

various refactoring; GHC 9.4 support #29

raehik commented Oct 20, 2022 •

edited

raehik commented Oct 26, 2022 •

edited

AndrasKovacs commented Oct 29, 2022 •

edited

raehik commented Nov 4, 2022 •

edited

raehik commented Nov 4, 2022 •

edited

AndrasKovacs commented Nov 4, 2022 •

edited

AndrasKovacs commented Dec 29, 2022

raehik commented Jan 13, 2023

raehik commented Jan 18, 2023

various refactoring; GHC 9.4 support #29

various refactoring; GHC 9.4 support #29

Conversation

raehik commented Oct 20, 2022 • edited

To-dos

Things done

Support GHC 9.4; Remove FlatParse.Internal.UnboxedNumerics

Break up main parser module into smaller submodules

Lay clear naming conventions

raehik commented Oct 26, 2022 • edited

AndrasKovacs commented Oct 29, 2022 • edited

raehik commented Nov 4, 2022 • edited

raehik commented Nov 4, 2022 • edited

AndrasKovacs commented Nov 4, 2022 • edited

AndrasKovacs commented Dec 29, 2022

raehik commented Jan 13, 2023

raehik commented Jan 18, 2023

raehik commented Oct 20, 2022 •

edited

Support GHC 9.4; Remove `FlatParse.Internal.UnboxedNumerics`

raehik commented Oct 26, 2022 •

edited

AndrasKovacs commented Oct 29, 2022 •

edited

raehik commented Nov 4, 2022 •

edited

raehik commented Nov 4, 2022 •

edited

AndrasKovacs commented Nov 4, 2022 •

edited