Example parsers #14

Open
Geal opened this Issue Feb 26, 2015 · 57 comments

@Geal
Owner
Geal commented Feb 26, 2015 edited

We currently have a few example parsers. In order to test the project and make it useful, other formats can be implemented. Here is a list, if anyone wants to try it:

@thehydroimpulse

I'm writing a Thrift library for Rust that'll use Nom for both the IDL and the network protocol, so that can be another example (although in a different repo).

@Geal
Owner
Geal commented Apr 3, 2015

Nice idea, that will be useful! Please notify me when it is done, I will add a link in this list.

@filipegoncalves
Contributor

This looks interesting. Is anyone actively working on any of these parsers? I'd like to work on a few of these.

@Geal
Owner
Geal commented Apr 27, 2015

I have some code for a GIF one at https://github.com/Geal/gif.rs but it is hard to test, since the graphical tools in Piston change a lot.

You can pick any of them. Network packets may be the easiest, since they don't require a decompression phase.

I am using the gif example to see what kind of API can be built over nom. Most of the parsing examples are done as one pass over the data, but often there is some logic on the side, and it is not easy to encode correctly.

@Geal
Owner
Geal commented May 5, 2015

@elij this is a great idea! Was it easy to do?

@elij
elij commented May 5, 2015

yup it's a great framework -- though I struggled a bit with eof so I borrowed some code from rust-config (https://github.com/elij/fastq.rs/blob/master/src/parser.rs#L69) -- is there a better solution?

@Geal
Owner
Geal commented May 5, 2015

yes, eof should be a parser provided by nom, I am just waiting for @filipegoncalves to send a PR 😉
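
For readers wondering what an eof parser amounts to, here is a minimal hand-rolled sketch in plain Rust. It does not use nom's API: the `ParseResult` type and the `eof`, `line`, and `all_lines` functions are hypothetical names chosen for illustration, and the line-based format is only an assumption to give `eof` something to follow.

```rust
/// Remaining input plus the parsed value, or an error message.
/// (Hypothetical type for this sketch, not nom's IResult.)
type ParseResult<'a, T> = Result<(&'a [u8], T), String>;

/// Succeeds only when no input is left; consumes nothing.
fn eof(input: &[u8]) -> ParseResult<'_, ()> {
    if input.is_empty() {
        Ok((input, ()))
    } else {
        Err(format!(
            "expected end of input, found {} trailing byte(s)",
            input.len()
        ))
    }
}

/// Parses one line terminated by '\n' and returns it without the newline.
fn line(input: &[u8]) -> ParseResult<'_, &[u8]> {
    match input.iter().position(|&b| b == b'\n') {
        Some(i) => Ok((&input[i + 1..], &input[..i])),
        None => Err("expected a newline-terminated line".to_string()),
    }
}

/// Parses as many lines as possible, then insists that nothing is left over.
fn all_lines(mut input: &[u8]) -> ParseResult<'_, Vec<&[u8]>> {
    let mut lines = Vec::new();
    while let Ok((rest, l)) = line(input) {
        lines.push(l);
        input = rest;
    }
    // Without this check, trailing garbage would be silently accepted.
    let (rest, ()) = eof(input)?;
    Ok((rest, lines))
}
```

With these definitions, `all_lines(b"ab\ncd\n")` succeeds with two lines, while `all_lines(b"ab\ncd")` fails because the unterminated `cd` is still in the input when `eof` runs.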

@filipegoncalves
Contributor

Hah, sorry for my silence. I've been busy lately. I just sent a PR (#31).

I will be working on one of these example parsers as soon as I get some spare time. There are some great ideas in here!

@Keruspe
Contributor
Keruspe commented May 29, 2015

I might give tar a try

@nelsonjchen
Contributor

Does this check off PCAP?

https://github.com/richo/pcapng-rs

@Geal
Owner
Geal commented Jun 19, 2015

pcap-ng and pcap are two different formats, right? It seems the consensus now is to move everything to pcap-ng, though.

@TechnoMancer

I will try a FLAC parser, need to add quite a few things for it though.

@badboy
Contributor
badboy commented Jul 17, 2015

ISO8601 is done in https://github.com/badboy/iso8601 (I hope it's mostly correct.)

@Geal
Owner
Geal commented Jul 17, 2015

ok, it should be up to date. More to come 😄

@sbeckeriv

WARC file format released. https://crates.io/crates/warc_parser

@Geal
Owner
Geal commented Aug 24, 2015

@sbeckeriv great, thanks!

@porglezomp

It might be informative to try parsing the Rust grammar with nom, if nobody has yet. In any case, I'd like to see a few programming languages on that list, since that's my use case.

@Geal
Owner
Geal commented Sep 15, 2015

@porglezomp programming languages examples would definitely be useful, but the Rust grammar might be a bit too much for the first attempt. Which other languages would you like to handle?

@porglezomp

Yeah, I'm aware of the scale problem of Rust. I don't want to write that one, but I think it's a good holy grail for any parser library written in Rust. I'd like to try parsing the Lua grammar first, I think.

I recommend adding to the list:

  • Programming Languages
    • Rust
    • Lua (I'll do this)
    • Python (or some other whitespace significant language)
    • C
@Geal
Owner
Geal commented Sep 15, 2015

ok, I added them to the list :)

@chriskrycho

You have INI marked as done; do you have a link to it? (I'd love to use this for some tooling I'm hoping to build in 2016; need a good non-trivial example for it, though.)

@chriskrycho

Thanks very much, @badboy!

@fbernier

I'll try to make the TOML parser very soon.

@Geal
Owner
Geal commented Nov 16, 2015

Actually, I think I should rewrite that INI parser, now that more convenient combinators are available.
Also, I should really work on that combinator for space separated stuff

@Geal
Owner
Geal commented Nov 16, 2015

@fbernier great! Please keep me posted!

@l0calh05t

Maybe add a simple example for trailing commas in lists? Python has those, but is quite complex. Can't think of a simple example though.

@johshoff

That IRC example is no longer using nom. The parser was moved into its own repository: https://github.com/Detegr/RBot-parser

@Geal
Owner
Geal commented Nov 17, 2015

@l0calh05t to parse something like [a,b,c,] or [a,b,c] ?
@johshoff fixed, thanks

@l0calh05t

@Geal yes, exactly

@Geal
Owner
Geal commented Nov 22, 2015

@l0calh05t for [a,b,c], you can parse with delimited!(char!('['), separated_list!( char!(','), alphabetic), char!(']')).
For [a,b,c,], you can have delimited!(char!('['), many0!(terminated!(alphabetic, char!(','))), char!(']')).

A parser that would handle both cases is much trickier.

@l0calh05t

Handling both is really the more interesting case, and it is what Python needs, for example.

@Keruspe
Contributor
Keruspe commented Nov 22, 2015

Could there be something like maybe_char!(',') which would read a char, consume it if it's ',' or backtrack if it isn't?

EDIT: actually that's probably what opt!(char!(',')) would do, so you just have to take the one that parses [a,b,c] and stick that before the ']' or am I missing something?

@l0calh05t

The problem is that this won't work unless a lookahead of more than one character is added automatically.

@Geal
Owner
Geal commented Nov 22, 2015

In fact, it is easier than I thought, but requires some work:

preceded!(
  char!('['),
  terminated!(
    separated_list!(
      char!(','),
      alphabetic
    ),
    terminated!(
      opt!(char!(',')),
      char!(']')
    )
  )
)

opt! will return an option of the result of its child parser (Some if success, None if failure), so it will accept the trailing comma.
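
The approach above can also be sketched without macros. The following is a hand-rolled illustration in plain Rust, not nom's API: `Parsed`, `ch`, `alphabetic`, `separated_list`, and `list` are hypothetical functions that only mirror the shape of the combinators. The point it tries to show is that the separated list hands back a trailing comma unconsumed, so the optional comma step can pick it up.

```rust
/// Remaining input plus the parsed value, or None on failure.
/// (Hypothetical helper type; nom's own result type is richer.)
type Parsed<'a, T> = Option<(&'a str, T)>;

/// Matches exactly one expected character.
fn ch(expected: char, input: &str) -> Parsed<'_, char> {
    let mut chars = input.chars();
    match chars.next() {
        Some(c) if c == expected => Some((chars.as_str(), c)),
        _ => None,
    }
}

/// Matches one or more ASCII letters.
fn alphabetic(input: &str) -> Parsed<'_, &str> {
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}

/// Zero or more items separated by commas. A comma is consumed only when
/// an item follows it, so a trailing comma stays in the remaining input.
fn separated_list(mut input: &str) -> Parsed<'_, Vec<&str>> {
    let mut items = Vec::new();
    match alphabetic(input) {
        Some((rest, item)) => {
            items.push(item);
            input = rest;
        }
        None => return Some((input, items)),
    }
    loop {
        // Try ", item"; on failure keep `input` pointing before the comma.
        match ch(',', input).and_then(|(rest, _)| alphabetic(rest)) {
            Some((rest, item)) => {
                items.push(item);
                input = rest;
            }
            None => return Some((input, items)),
        }
    }
}

/// '[' items optional-',' ']'
fn list(input: &str) -> Parsed<'_, Vec<&str>> {
    let (input, _) = ch('[', input)?;
    let (input, items) = separated_list(input)?;
    // The opt!-like step: take the comma if present, otherwise move on.
    let input = match ch(',', input) {
        Some((rest, _)) => rest,
        None => input,
    };
    let (input, _) = ch(']', input)?;
    Some((input, items))
}
```

Both `list("[a,b,c]")` and `list("[a,b,c,]")` yield the items `a`, `b`, `c`, while `list("[a,b,,]")` is rejected. Note that this sketch also accepts `[,]`, which a stricter grammar such as Python's would not.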

@badboy
Contributor
badboy commented Nov 23, 2015

I wrote a simplistic bencode parser: nom-bencode.

Not sure if it covers everything (yet).

@passcod
passcod commented Nov 28, 2015

I've started a TOML parser as a learning project: https://github.com/passcod/noml

@sourrust
Contributor
sourrust commented Jan 9, 2016

At this point I'm ready to share my flac implementation as an example parser.

@joeblew99

Hey sourrust. Nom looks like a great way to do this.
I am interested in parsing different video formats with nom. If anyone knows of existing Rust libs in this space, I could start porting some to nom. Worth a crack to see how it goes.

I am very curious about using the streaming capabilities of nom. For my use case I want to stream data between servers, manipulate frames, and then fan it back into the main stream.
I would love to get some feedback on some potential gotchas.
Doing this type of work should ultimately feed back into making nom better.

@tomjakubowski

Correct me if I'm wrong, but the linked Redis project doesn't seem to use nom.

@porglezomp

I agree, I checked the history of its Cargo.toml and at no point was nom listed as a dependency. I'm not sure how it ended up on the list, but it looks like it should be taken off.

@badboy
Contributor
badboy commented Mar 21, 2016

It does in another branch, which is still not merged because time.

@thehydroimpulse

@Geal you can remove my Thrift library as an example as I'm no longer using Nom in it.

@joelself
Contributor

I've released a TOML parser. It doesn't let you modify everything possible in the document or create documents from scratch, but does correctly parse TOML, report errors, allow some modification and then output the document with comments and whitespace intact.

@moosingin3space
moosingin3space commented May 17, 2016 edited

I've started working on a parser for IP, TCP, UDP, and Ethernet headers. It is located at https://github.com/moosingin3space/pktparse-rs.
Warning: there is little to no documentation right now!

@lyze
lyze commented Jun 6, 2016

Java class file parser! It is part of a larger class project.

The parser uses helper macros based on #160 to get more backtracking support.

@pwoolcoc
Contributor

Not sure if it's worth putting here or not, but I'm using nom to parse strings for the tracery library I am writing for rust: https://github.com/pwoolcoc/tracery-rs

@jethrogb
Contributor

A subset of C, namely C literals and expressions: https://crates.io/crates/cexpr

@gz
gz commented Sep 7, 2016 edited

FYI I used nom to parse the linux perf data format (https://github.com/gz/rust-perfcnt/blob/master/src/linux/parser.rs) in case you want to add it. In comparison to most examples listed here it parses binary data.

Also, it's roughly 25x faster than an equivalent parser written in python ;)

@nickbabcock
Contributor

Boxcars is an example of a Rocket League replay parser with serde serialization. I hope boxcars can serve as a good example of Rust code using nom and serde, as extensive examples are hard to come by. While it lacks user-friendly error messages, among other issues, its tests and documentation strive to be thorough.

@dtolnay
dtolnay commented Oct 31, 2016 edited

> Yeah, I'm aware of the scale problem of Rust. I don't want to write that one, but I think it's a good holy grail for any parser library written in Rust.

As of version 0.10.0, syn is now able to parse practically all of Rust syntax. One of my test cases is to parse the entire github.com/rust-lang/rust repo into an AST and print it back out, asserting that the output is identical to the original.

I am technically not using nom but instead a fork which removes the IResult::Incomplete variant. I found that the extra macro code generated to handle Incomplete was more than doubling the compile time for something that I didn't even want. Nevertheless, the code is enough like nom that I think we can check off the box.
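
To illustrate what an "incomplete" outcome is for, and why supporting it means every combinator has a third case to handle, here is a small sketch in plain Rust. The `Step` enum and the `frame` function are hypothetical and are not nom's actual `IResult` definition; the one-byte length prefix is an assumed toy format.

```rust
/// Hypothetical three-state parse result for a streaming parser.
#[derive(Debug, PartialEq)]
enum Step<'a, T> {
    /// Parsing succeeded: remaining input and the parsed value.
    Done(&'a [u8], T),
    /// The input is malformed; more data would not help.
    Error(&'static str),
    /// Not enough bytes yet; carries how many more are needed.
    Incomplete(usize),
}

/// Parses a frame: a one-byte length followed by that many payload bytes.
fn frame(input: &[u8]) -> Step<'_, &[u8]> {
    let (len, rest) = match input.split_first() {
        Some((&len, rest)) => (len as usize, rest),
        // Not even the length byte has arrived.
        None => return Step::Incomplete(1),
    };
    if len == 0 {
        return Step::Error("empty frames are not allowed");
    }
    if rest.len() < len {
        // The caller should read more data and retry from the same offset.
        return Step::Incomplete(len - rest.len());
    }
    Step::Done(&rest[len..], &rest[..len])
}
```

A parser that only ever sees complete buffers never produces the `Incomplete` case, which is the situation where dropping the variant, as the fork described above does, removes code without losing functionality.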

Example snippet to parse one arm of a match expression:

named!(match_arm -> Arm, do_parse!(
    attrs: many0!(outer_attr) >>
    pats: separated_nonempty_list!(punct!("|"), pat) >>
    guard: option!(preceded!(keyword!("if"), expr)) >>
    punct!("=>") >>
    body: alt!(
        map!(block, |blk| ExprKind::Block(BlockCheckMode::Default, blk).into())
        |
        expr
    ) >>
    (Arm {
        attrs: attrs,
        pats: pats,
        guard: guard.map(Box::new),
        body: Box::new(body),
    })
));
@Geal
Owner
Geal commented Nov 1, 2016

@dtolnay syn is an amazing example, thanks for your hard work :)

@Geal
Owner
Geal commented Nov 1, 2016

@dtolnay could I get your input on #356? It might fix your issues with compile times, so I'd like to get your thoughts on this.

@J-F-Liu
J-F-Liu commented Dec 23, 2016

I am writing a PDF library using nom to parse PDF syntax. Released v0.1.0 just now.
https://github.com/J-F-Liu/lopdf

@valarauca

So I've implemented an EDI parser for the ANS standard EDI for work with this. Awesome library, really useful. Sadly that's owned by my employer.

I've started implementing an x64 assembler with nom. I'm really struggling with writing the parser. The main reason is register names have a lot of overlap, and are very short. For example r8, r8w, r11, and r12d. Ideally I want to map these to an enum. map!() makes this easy, but how can I match those terms in nom?

@Keruspe
Contributor
Keruspe commented Dec 24, 2016

I converted several "keys" to enum values in my brainfuck parser, might or might not be relevant to your needs. See the first parsers defined with "named!" https://github.com/Keruspe/brainfuck.rs/blob/master/src/parser.rs
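
One common way to deal with overlapping names such as r8 and r8w is to prefer the longest name that matches. The sketch below shows that idea in plain Rust without nom: the `Register` enum, the `REGISTERS` table, and the `register` function are hypothetical and cover only the four names mentioned above. In a combinator library, the same effect is usually obtained by trying longer alternatives before shorter ones.

```rust
/// A handful of x86-64 register names, enough to show the overlap problem.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Register {
    R8,
    R8w,
    R11,
    R12d,
}

/// Name table (partial, for illustration only).
const REGISTERS: &[(&str, Register)] = &[
    ("r8", Register::R8),
    ("r8w", Register::R8w),
    ("r11", Register::R11),
    ("r12d", Register::R12d),
];

/// Returns the remaining input and the register whose name is the longest
/// prefix of `input`, or None if no known name matches.
fn register(input: &str) -> Option<(&str, Register)> {
    REGISTERS
        .iter()
        .filter(|(name, _)| input.starts_with(*name))
        // Longest match wins, so "r8w" is preferred over "r8".
        .max_by_key(|(name, _)| name.len())
        .map(|&(name, reg)| (&input[name.len()..], reg))
}
```

Here `register("r8w, r11")` yields `Register::R8w` with `", r11"` left over, and `register("r8, r8w")` yields `Register::R8`. A real assembler would probably also check that the matched name is not followed by another identifier character.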
