various refactoring; GHC 9.4 support #29
From a very quick benchmark check, it seems we haven't lost performance by removing many manually unboxed operations, though I did notice an unusual pattern of worsening performance on newer GHCs.
I find the
So in this approach, we would be keeping *parsec conventions but aiming for more consistency than in the *parsec libraries. Alternatively, we could try to just rethink API and naming from the ground up with no regard for *parsecs at all. In this case, we could consider less orthodox things like using classes for primitive parsers and making use of
    class PrimParser a where
      get     :: a -> Parser e a                  -- concrete
      any     :: Parser e a                       -- any
      withAny :: (a -> Parser e b) -> Parser e b  -- CPS any
Then
I decided on The Thanks for the suggestions! I'll make a note to refactor using that "more consistent *parsec" scheme.
Is the modularization sensible and useful to you? I find it easier to understand the parser internals, and it lets us export internals/unsupported functionality more easily, but otherwise it was mostly on a whim. (I am leaning towards explicitly re-exporting everything in the main parser module for a better Haddock experience.) edit: and do we have to worry about performance changes due to splitting parsers up between modules? I want to say
I do like the modularization. I also like exporting everything in the parser module. Indeed In the holidays, around December, I'll try to join the refactoring efforts here, and I'd like to overhaul tutorials, docs & error message infrastructure too.
Hi, what's the status of this PR? I'd like to do a release sometime next month.
I'm actively looking at this again. It was already quite broken, is now more so, and needs to be rewritten for
The discussion here is useful, but the code isn't, so I'm closing this and referencing it in the new PR #36.
This is kind of a playground for things I had been thinking about. It should not be merged without lots of history cleanup (I think it would more likely happen piecemeal, in a few chunks). Please throw any criticism or thoughts at me here!
To-dos
Things done
Support GHC 9.4; Remove FlatParse.Internal.UnboxedNumerics
The shims in UnboxedNumerics allowed working with unboxed machine integers across various compiler versions. This was accomplished with lots of CPP macros.
Lots of the explicit unboxing is removed. Where previously we did unary/binary operations on unboxed values, we now wrap them with their respective boxed type constructor (e.g. W8# :: Word8). There is little CPP, and it no longer impacts any types.
Break up main parser module into smaller submodules
Basic is ~1100 lines long. Many primitive parsers and combinators don't interact with each other, so putting them in their own modules isn't particularly hard, and lets us work with smaller modules.
Integer parsers were a reason for this: I've pulled them into a 350 line module with more docs.
FlatParse.Basic still exports everything at the end of the day.
Does this potentially impact performance due to differences in how programs are optimized cross-module? Or are we OK because almost everything is INLINEd?
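A minimal sketch of the two points above, not code from this PR: a primitive byte parser boxes the primop result with W8# straight away, and carries an INLINE pragma so that GHC records its unfolding in the interface file for call sites in other modules. Input, Parser and anyWord8 are hypothetical stand-ins for illustration, not flatparse's real definitions. The expression W8# (indexWord8OffAddr# ...) should typecheck both before and after GHC 9.2 without CPP, since the primop's result type and the constructor's field type changed together.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Addr#, Int (I#), indexWord8OffAddr#)
import GHC.Word (Word8 (W8#))

-- Hypothetical toy input: a base address plus the number of readable bytes.
data Input = Input Addr# Int

-- Hypothetical toy parser: takes the input and a position, and returns
-- the result together with the new position, or Nothing on failure.
newtype Parser a = Parser { runParser :: Input -> Int -> Maybe (a, Int) }

-- Read one byte. The primop result is wrapped in W8# immediately, so the
-- surrounding code only ever sees the boxed Word8 type.
anyWord8 :: Parser Word8
anyWord8 = Parser $ \(Input addr len) pos@(I# i) ->
  if pos >= 0 && pos < len
    then Just (W8# (indexWord8OffAddr# addr i), pos + 1)
    else Nothing
-- INLINE makes GHC expose the unfolding in the interface file, so this
-- definition can be inlined even when it lives in a separate submodule.
{-# INLINE anyWord8 #-}
```

Whether INLINE alone fully preserves performance across the module split is something benchmarks would still need to confirm.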
Lay clear naming conventions
get* (this used to be split between none, any*, read*). parse is another candidate.
with*
Word8 -> Parser e () are getXOf (idea taken from the CBOR library)
*Unsafe suffix means unsafe, *# suffix means some arguments are unboxed
The default "view" taken is bytewise, so take and takeRest now refer to the ByteString functions; takeString, takeRestString do it for Strings (this is swapped from current).
Maybe there should be a text-focused interface exported separately, which would export all the old names e.g. char, string, take using String.
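The bytewise-by-default naming can be sketched with a toy parser over strict ByteString. This is only an illustration under stated assumptions: the Parser type here is hypothetical and much simpler than flatparse's, getWord8Of is an invented instance of the getXOf pattern, and the String variants decode one byte per Char via Data.ByteString.Char8 rather than doing any real text decoding.

```haskell
import Prelude hiding (take)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- Hypothetical toy parser: consumes a prefix of the input and returns
-- the result together with the remaining input, or Nothing on failure.
newtype Parser a = Parser { runParser :: B.ByteString -> Maybe (a, B.ByteString) }

-- Unsuffixed names work on bytes and return ByteString.
take :: Int -> Parser B.ByteString
take n = Parser $ \inp ->
  if n >= 0 && B.length inp >= n then Just (B.splitAt n inp) else Nothing

takeRest :: Parser B.ByteString
takeRest = Parser $ \inp -> Just (inp, B.empty)

-- *String names return String instead (byte-per-Char here, as a simplification).
takeString :: Int -> Parser String
takeString n = Parser $ \inp -> case runParser (take n) inp of
  Just (bs, rest) -> Just (BC.unpack bs, rest)
  Nothing         -> Nothing

takeRestString :: Parser String
takeRestString = Parser $ \inp -> Just (BC.unpack inp, B.empty)

-- getXOf pattern: succeed only if the next value equals the given one.
getWord8Of :: Word8 -> Parser ()
getWord8Of w = Parser $ \inp -> case B.uncons inp of
  Just (x, rest) | x == w -> Just ((), rest)
  _                       -> Nothing
```

A separately exported text-focused interface could then rebind take to the String version without touching the bytewise module.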