Playing with ideas on how to better componentize internal parsing #127
Maybe to finish a thought on what the design would end up being:
I keep going back and forth between wanting to componentize things in Parsers.jl and making sure we have something relatively static for CSV.jl, so we can try to achieve a good workflow there where the Parsers.jl code would almost never be recompiled.
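To make the "relatively static" option concrete, here is a minimal Julia sketch. It assumes a hypothetical `StaticOpts` struct and hypothetical `skipspaces`, `parsefield`, and `parseall` functions (none of these are Parsers.jl API). Every layer (whitespace stripping, quoting, an integer type parser, the delimiter) is always compiled into one method and consults a runtime flag, so toggling `stripwhitespace` changes behavior without requiring a new compiled specialization.

```julia
# Hypothetical sketch of a single "static" parsing stack; not the Parsers.jl API.
# Every layer is always compiled in and checks a runtime flag instead of being
# composed in or out, so one specialization serves every configuration.
struct StaticOpts
    stripwhitespace::Bool
    quoted::Bool
    oq::UInt8     # open-quote byte
    cq::UInt8     # close-quote byte
    delim::UInt8  # field delimiter
end

# Whitespace layer: skip spaces only when the flag is on; otherwise a no-op.
function skipspaces(buf, pos, len, opts::StaticOpts)
    if opts.stripwhitespace
        while pos <= len && buf[pos] == UInt8(' ')
            pos += 1
        end
    end
    return pos
end

# Parse one non-negative Int field starting at pos; returns (value, nextpos, ok).
function parsefield(buf::AbstractVector{UInt8}, pos::Int, len::Int, opts::StaticOpts)
    pos = skipspaces(buf, pos, len, opts)
    # quote layer: only active if enabled and the field starts with a quote
    inquote = opts.quoted && pos <= len && buf[pos] == opts.oq
    inquote && (pos += 1)
    # type layer: accumulate decimal digits
    x, start = 0, pos
    while pos <= len && UInt8('0') <= buf[pos] <= UInt8('9')
        x = 10x + (buf[pos] - UInt8('0'))
        pos += 1
    end
    pos > start || return (x, pos, false)
    if inquote
        (pos <= len && buf[pos] == opts.cq) || return (x, pos, false)
        pos += 1
    end
    pos = skipspaces(buf, pos, len, opts)
    # delimiter layer: expect the delimiter unless we're at end of input
    if pos <= len
        buf[pos] == opts.delim || return (x, pos, false)
        pos += 1
    end
    return (x, pos, true)
end

# Parse every field in buf; return nothing on the first failure.
function parseall(buf, opts::StaticOpts)
    vals = Int[]
    pos, len = 1, length(buf)
    while pos <= len
        x, pos, ok = parsefield(buf, pos, len, opts)
        ok || return nothing
        push!(vals, x)
    end
    return vals
end

buf = Vector{UInt8}(" 1 ,\"23\", 456")
strip_on  = StaticOpts(true,  true, UInt8('"'), UInt8('"'), UInt8(','))
strip_off = StaticOpts(false, true, UInt8('"'), UInt8('"'), UInt8(','))
parseall(buf, strip_on)   # [1, 23, 456]
parseall(buf, strip_off)  # nothing: the leading space is not skipped
```

The trade-off in this design is that every flag is checked on every call even when a layer is effectively unused; that runtime cost is what buys a single, compiled-once method.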
Ok, this code actually parses Ints/floats now. And pretty fast too! Only a 10-20% regression w/o any code inspection/tuning, so that's pretty good IMO.
Codecov Report
Base: 89.52% // Head: 90.98% // Increases project coverage by +1.45%
@@ Coverage Diff @@
## main #127 +/- ##
==========================================
+ Coverage 89.52% 90.98% +1.45%
==========================================
Files 9 9
Lines 2445 1608 -837
==========================================
- Hits 2189 1463 -726
+ Misses 256 145 -111
Ok @nickrobinson251, this is passing all tests for me locally, and passes CSV.jl tests with this PR.
I haven't even tried to run this code yet, so it's very much WIP, but I'm pretty much done for today and maybe the next few days, so I wanted to put what I have up in case anyone else wants to take a look.
The main new code is all in the `src/components.jl` file, where I try a functional "layer" approach for the different types of parsing we're doing: delimited, quoted, strip whitespace, etc.
Things I like:
- custom "stacks" by including/excluding certain components; this would solve the current `xparse` vs `xparse2` awkwardness
- file of code, and it's factored in such a way that `strings.jl` now uses a few pieces as needed and we avoid the tons of duplicated code we had before, more easily than the old design where everything was a big monolith
- reviving the precompile efforts. I think a big part of the challenge is just the sheer size of the `xparse` function body.
Things I don't like:
- `Parsers.Options` object to pass around the various layer configurations (`delim`, `oq`, `cq`, etc.), and then the layers are separate functions. Like, if I didn't include/want a `delim`, the `delimiter` layer could still be included, and vice versa. If the `delimiter` layer wasn't included in a stack, the `Parsers.Options` still has the `delim` field. At first, I tried making a separate `Delimiter` struct with just the delimiter-related `Parsers.Options` fields and having that be the `delimiter` function, but that means you can't have a "static stack", since building the stack would rely on knowing all the layer options. Keeping the configs in `Options` means we could theoretically have a compiled-once, reused-forever parsing stack, but like I said, it feels smelly to need both.
- if you tried an order other than `delimiter(quoted(sentinel(typeparser)))`, I wouldn't be surprised if it just didn't work. It should be fine to exclude certain layers, but it just doesn't feel like we quite have the flexibility I had imagined we'd have by componentizing.
- to just have a single static stack function that we compile once, where layers would just skip themselves if needed (e.g. if `stripwhitespace=false`, the `whitespace` layer would still be included, but would just check `opts.stripwhitespace` and skip itself). It's probably better overall for avoiding any parsing layer-stack recompilation costs. But it also feels a little lame, since we're basically not taking advantage of the componentizing at all. I had also imagined letting users compose their own parsing "stacks" for custom parsing scenarios, but as I said, I'm not sure whether the components are too brittle to compose very well. Probably worth experimenting more on this.
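The layer-composition idea, and why the order of layers matters, can be illustrated with a small Julia sketch. Everything here is hypothetical and simplified: there is no `sentinel` layer, and `Opts`, `typeparser`, `quoted`, `delimiter`, and `parseall` are stand-ins, not the actual contents of `src/components.jl`. Each layer takes an inner parser and returns a closure, while all configuration stays in one options object passed down the stack, mirroring the `Parsers.Options` arrangement described above.

```julia
# Hypothetical sketch of composable parsing layers; not the Parsers.jl API.

# All layer configuration lives in one options object (cf. Parsers.Options).
struct Opts
    delim::UInt8  # field delimiter
    oq::UInt8     # open-quote byte
    cq::UInt8     # close-quote byte
end

# Innermost layer: parse a non-negative Int at buf[pos]; returns (value, nextpos, ok).
function typeparser(buf::AbstractVector{UInt8}, pos::Int, len::Int, opts::Opts)
    x, start = 0, pos
    while pos <= len && UInt8('0') <= buf[pos] <= UInt8('9')
        x = 10x + (buf[pos] - UInt8('0'))
        pos += 1
    end
    return (x, pos, pos > start)
end

# Quote layer: if the field starts with opts.oq, run the inner parser inside
# the quotes and require a matching opts.cq afterwards.
function quoted(inner)
    return function (buf, pos, len, opts)
        (pos <= len && buf[pos] == opts.oq) || return inner(buf, pos, len, opts)
        x, pos, ok = inner(buf, pos + 1, len, opts)
        (ok && pos <= len && buf[pos] == opts.cq) || return (x, pos, false)
        return (x, pos + 1, true)
    end
end

# Delimiter layer: after the inner parser finishes, expect opts.delim
# (or end of input) and consume it.
function delimiter(inner)
    return function (buf, pos, len, opts)
        x, pos, ok = inner(buf, pos, len, opts)
        ok || return (x, pos, false)
        if pos <= len
            buf[pos] == opts.delim || return (x, pos, false)
            pos += 1
        end
        return (x, pos, true)
    end
end

# Drive a composed stack over a whole buffer of delimited fields.
function parseall(fieldstack, buf, opts)
    vals = Int[]
    pos, len = 1, length(buf)
    while pos <= len
        x, pos, ok = fieldstack(buf, pos, len, opts)
        ok || error("parse error at byte $pos")
        push!(vals, x)
    end
    return vals
end

opts = Opts(UInt8(','), UInt8('"'), UInt8('"'))
buf = Vector{UInt8}("1,\"23\",456")
fieldstack = delimiter(quoted(typeparser))
parseall(fieldstack, buf, opts)  # [1, 23, 456]

# Swapping the order puts the delimiter check inside the quotes, so it sees
# the closing '"' where it expects ',' and the quoted field fails.
bad = quoted(delimiter(typeparser))
bad(Vector{UInt8}("\"23\",4"), 1, 6, opts)  # (23, 4, false)
```

Because each composed stack is a different closure type, Julia will generally compile a separate specialization of `parseall` for each distinct stack, which is the recompilation cost a single static stack avoids.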
I haven't looked at performance yet (or even run this code, as mentioned).