Rewrite the parser #78
Comments
I would like to take on some subset of this parser rewrite once I am finished with withoutboats/notty#23 (never closer), unfortunately I will be back at my dayjob next week so I'm not sure how much time/energy I will have to spend on it after that. Would it be possible to break this work up into smaller chunks/issues that can be incrementally tackled and which would target a "1.x" branch on this repository? Those smaller-chunk'd issues could be grouped under a "1.0" milestone in this github issue interface. Before work starts on interface changes, more details about what you have in mind would be nice so that whoever picks up the work can proceed with confidence in the implementation. I'd actually be more than happy to break this issue up into smaller ones based on the current outline which you can then fill out with additional details in comments whenever you have time. Let me know if this approach sounds sane. |
Thanks for the interest @waynr! Awhile back I managed to rewrite the lexer at least, although I haven't gotten around to morphing that into a parser. I'd be totally fine breaking up the work here, and that branch would probably need some cleanup as well before pushing much more on it :( |
@alexcrichton I have a work-in-progress TOML parser here. It's nearing its first release. There are some things you might like about it:
It parses to a It does do duplicate key checks, invalid table checks, heterogeneous array checks, and invalid datetime checks on the fly. But duplicate key and invalid tables are required otherwise they would overwrite and invalidate key/values already in the map. Mixed array checks are very simple, I don't see any benefit of deferring this to a post-parse step. The datetime check is one that can be deferred. If this slows down the parser too much I can make this optional depending on a flag and move it to a post-parsing
I don't understand what you mean by 'opaque error type which contains source information directly'. Currently my parser returns itself due to the way parsing structs are implemented in nom (you can read about how I ended up with this implementation here). Right now you have to call
For the first release it will only be able to parse a document, print a parsed document, get values, get a key's child keys, and set values (and there are a few caveats with setting arrays and inline tables). Adding new key/values and new tables, deleting key/values and tables, and key modification (except for inline table key modification which is possible right now) are slated for the second release which I hope will come quickly after the first. |
Holy cow, awesome work @joelself! Some thoughts of mine:
I'm pretty excited about this, though, the ability for Cargo to modify a TOML document preserving formatting and comments would be huge! It would unlock quite a bit of functionality I've wanted to have for quite a while now :) Would you be interested in merging your work back into this repo? Or would you like to keep it separate for now? |
The external API has to be stable because I'm constantly changing how the internals work. Partly because I'm indecisive and partly because this is babby's first Rust project, so I find myself writing some code then looking at it a few weeks later and thinking "What idiot wrote this crap?" I should probably change my errors. At first I had them return a key or vector of keys (if a key was involved) and then a Value or vector of values and a line number. But I recently changed it to only return a line number and a key[s] that you have to look up in the parser to get the value[s]. Going back to returning the value with the error will be more consistent since every error type that involves a value will contain a value instead of just the errors that can't store their values in the internal map containing a value. I started the project from scratch, so it doesn't have anything in common with toml-rs. I don't know how I could merge into it. Are there any particular features or APIs you want? Because right now I'm just targeting features for cargo-bump (for 0.1.0) and then for cargo-edit (for 0.2.0, although 0.1.0 will enable |
Ah ok, was just curious! Some idea features I can think of for a library might be:
I think that's what I could think of at least? If you're interested in merging back here I can definitely try to help out, but if not then sounds good to me as well :) |
tomllib (I renamed toml_parser) is out now at version 0.1.1. It is currently limited to parsing documents, getting values, setting/changing values and getting subkeys of a key. This should be enough to get cargo-bump going. For the next set of features I'm targeting cargo-edit. |
Thanks for the update @joelself! Sounds like some great progress is being made :) |
I've rewritten the parser in the serde 0.9 rewrite: #137 It does everything here except preserve whitespace/comments, but the tokenizer at the lowest level now supports this, so I hope it'll be easier to prototype such support in the future. |
Fixed in spirit in #137 |
There's a number of pain points with the current parser:
Cow
to slice back to the original source and only unescape when necessary. Arrays and maps may be able to be based on an iterator?toml::Value
but instead some form of AST. This AST may not have validation like no duplicate keys, no redefinitions, or arrays are homogneous. That kind of verification can perhaps be delayed to another step.Result
with an opaque error type which contains source information directly.The text was updated successfully, but these errors were encountered: