[Question] How do I combine these parsers? #15
Comments
Apologies, this was a case of rubber duck debugging. I went with the following (not prime-time ready, just to demonstrate the idea):
Thanks for the follow-up! Just to throw another idea in there: when I find I'm trying to do context-sensitive parsing like this, I usually fall back to using the parser only for the underlying grammar (i.e. splitting the options out, in this case) and then do the validation as a completely separate second step on the results of the parse. (Hope this makes sense.) Cheers,
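To make the suggestion concrete, here is a minimal sketch in Python (not Superpower/C#) of the two-step approach applied to the "-abwdc" example from the question. The first step only recognizes the shape of a switch; the second step applies the rules about which options may appear together. The option sets and function names are assumptions for illustration only.

```python
# Hypothetical option sets, taken from the example in the question:
# a-d are binary switches, w-z each require a value.
BINARY = set("abcd")
VALUED = set("wxyz")


def split_switch(arg):
    """Step 1, grammar only: '-' or '/' followed by one or more letters."""
    if len(arg) < 2 or arg[0] not in "-/" or not arg[1:].isalpha():
        raise ValueError(f"not a switch: {arg!r}")
    return list(arg[1:])


def validate(options):
    """Step 2, context-sensitive rules, checked on the result of step 1."""
    unknown = [o for o in options if o not in BINARY and o not in VALUED]
    if unknown:
        raise ValueError("unknown options: " + ", ".join(unknown))
    valued = [o for o in options if o in VALUED]
    if len(valued) > 1:
        raise ValueError("at most one value switch allowed, got: " + ", ".join(valued))
    return options


options = validate(split_switch("-abwdc"))
print(options)  # ['a', 'b', 'w', 'd', 'c']
```

Requiring exactly one value switch instead of at most one is a one-line change in `validate`, which is the point of the approach: the rules stay out of the grammar.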
I'm going to write some thoughts here, my objective being:
I invite you, @nblumhardt, to have a look at what I've done and validate my thinking. This is optional, as I know we are all quite busy, but I'm hoping that we both get something from it: I will get a better understanding of how to use your library, and you will get another perspective and possibly some ideas for improvements. Having said that, I fully understand if there are more important things in your life right now. In the next few minutes I'll start posting my thoughts below in this thread. This is the repository with the experiment that I'm going to walk through below. It can serve as a complete working reference in case certain pieces of the explanation below do not make sense.
So I have a practical task at hand which I wanted to use to try out Superpower. My requirements can be summed up as follows:
I tried not to over-complicate this description, yet I wanted the task to be non-trivial. I understand that these requirements are quite specific and it is not likely that someone else would need to do the same thing; I did not try to be generic. I tried to leverage Superpower for something non-trivial. Overall I'm quite pleased with the result: the whole thing takes about 150 lines excluding comments and, except for one method, is quite easy to understand. Here is how I approached it.
Let's start with the SwitchDescription class:
We are going to be passing an array of these to our command line parser. Most of the fields are self-explanatory and logically follow from the requirements. I'll talk a bit more about
Those map directly to the four value switches from the requirements and the binary switch. A program using the parser will probably also have an
The
So far we have talked about things outside of the CommandLineParser. Those are pretty standard; now let's see how we can design the parser with the help of Superpower. We face a particular challenge here: the tokenizer/parser model of Superpower assumes that we are tokenizing and parsing some continuous text. In our case we don't have that text; instead we are passed a string array of

The two passes are still useful. We can view it like this. Each

The paragraph above defines what our parser (second pass) will look like. In order for it to do its job, each incoming token (corresponding to each of

Defining ArgumentType is easy, based on the discussion above. It will look like this:
Note that
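A minimal sketch, in Python rather than C#, of how the first pass can work when the input is an argument array instead of one continuous text: each array element is looked at in isolation and produces exactly one token, and the second pass then works on the resulting token sequence. The token kinds "switches" and "value" are hypothetical stand-ins, since the actual ArgumentType members are not shown here. Keeping the argv index on each token is one way to still point errors at a specific argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str   # hypothetical kinds: "switches" or "value"
    text: str   # the raw command-line argument
    index: int  # position in argv, so errors can point at the offending argument


def first_pass(argv):
    """Turn each argv element into exactly one token; no cross-argument rules here."""
    tokens = []
    for index, arg in enumerate(argv):
        is_switch = len(arg) > 1 and arg[0] in "-/"
        tokens.append(Token("switches" if is_switch else "value", arg, index))
    return tokens


for token in first_pass(["-abw", "out.txt", "/c"]):
    print(token.index, token.kind, token.text)
```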
Now we need something into which we can parse each single
From the example above it distills to the following class that we call
This is quite straightforward. We can have zero or more binary switches and zero or one value switch, and if there are zero binary switches and zero value switches, an instance of this class signifies a value (that is, not a switch, but the value of a preceding switch). Let's codify this relationship by adding a property
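A rough Python rendering of such a class, assuming it holds a list of binary switches, an optional value switch and the raw argument text. The class and member names are hypothetical, since the original class name is not shown. The derived property encodes the rule from the paragraph above: no switches at all means the argument is a value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedArgument:  # hypothetical name
    binary_switches: List[str] = field(default_factory=list)  # zero or more
    value_switch: Optional[str] = None                        # zero or one
    text: str = ""  # raw argument; meaningful when this is a plain value

    @property
    def is_value(self) -> bool:
        """True when the argument carries no switches, i.e. it is the value of a preceding switch."""
        return not self.binary_switches and self.value_switch is None


print(ParsedArgument(text="out.txt").is_value, ParsedArgument(value_switch="w").is_value)  # True False
```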
Hopefully what's going on above is obvious; it was just explained in the previous paragraph. The only unknown part is
Now that we have a good idea what our

Since Superpower is a parser combinator library, it makes sense to start by defining partial parsers that we will combine into the final parser. The trickiest part of this discussion follows. First, I want to construct a parser that would parse a part of

With short names this is not too difficult; we can use
If it did, we'd have a slightly easier time. As it is, let us define a null parser:
We will use this parser as the seed in the LINQ Aggregate method to combine a list of parsers, one for each of the long names, with

Another peculiarity of the next method we'll write is that it needs to return separate parsers for binary and non-binary switches. This is so that we can use them separately to construct a rule that allows only one value switch but any number of binary switches. With that in mind, here is the code to achieve that:
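The seed-and-fold idea can be sketched in a few lines of Python, as a toy model rather than Superpower's API. It assumes the "null parser" is one that always fails, which makes it a neutral starting value for an or-style combinator: folding [p1, p2] gives fail-or-p1-or-p2, which behaves like p1-or-p2. The switch table and long names are hypothetical.

```python
from functools import reduce

# A parser is a function (text, pos) -> (value, new_pos) on success, or None on failure.
# This only mirrors the idea; it is not Superpower's API.


def literal(s):
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def fail(text, pos):
    """The 'null parser': always fails, so it is a neutral seed for or_else."""
    return None


def or_else(left, right):
    def parse(text, pos):
        result = left(text, pos)
        return result if result is not None else right(text, pos)
    return parse


def choice(parsers):
    """Fold the parsers with or_else, seeded with fail (like LINQ Aggregate with a seed)."""
    return reduce(or_else, parsers, fail)


# Hypothetical switch table: (long name, takes a value?)
SWITCHES = [("all", False), ("brief", False), ("output", True), ("width", True)]


def switch_parsers(switches):
    """Return separate parsers for binary switches and for value switches."""
    binary = choice([literal(name) for name, takes_value in switches if not takes_value])
    valued = choice([literal(name) for name, takes_value in switches if takes_value])
    return binary, valued


binary, valued = switch_parsers(SWITCHES)
print(binary("brief", 0), valued("brief", 0))  # ('brief', 5) None
```

The seed also covers the empty case: with no value switches, `choice([])` is simply `fail`, whereas `reduce` without an initial value raises a `TypeError` on an empty list. Since the first matching alternative wins, a long name that is a prefix of another (say "out" and "output") would need to come after the longer one.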
Unfortunately this code is a bit difficult to parse, so I'm open to any suggestions for improvement. What's important here is that by providing different combinations of
For the next step, let's tackle the two parsers for short names. We need to construct a rule that allows us to have zero or more binary switches and zero or one value switch. Here is how we do it:
This is much easier to understand than the previous piece of code. First, we use our previous function to get the binary and the value parsers, and then we combine those according to the rule in the requirements. Finally, we make sure that the input includes the switch signs

Dealing with long names is even easier:
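One way to see why this rule fits ordinary combinators: "zero or more binary switches and at most one value switch, in any order" describes the same set of strings as prefix B* (V B*)?, i.e. some binary switches, then optionally one value switch followed by more binary switches. Below is a toy Python sketch of that grammar with hand-rolled many/optional/sequence combinators. It illustrates the idea only; it is not the Superpower code from the repository, and the switch letters are the hypothetical ones from the question.

```python
# A parser is a function (text, pos) -> (value, new_pos) on success, or None on failure.


def one_of(chars):
    def parse(text, pos):
        if pos < len(text) and text[pos] in chars:
            return text[pos], pos + 1
        return None
    return parse


def many(parser):
    """Zero or more repetitions; always succeeds (parser must consume input)."""
    def parse(text, pos):
        items = []
        while (result := parser(text, pos)) is not None:
            value, pos = result
            items.append(value)
        return items, pos
    return parse


def optional(parser):
    """Zero or one occurrence; yields None when the parser does not match."""
    def parse(text, pos):
        result = parser(text, pos)
        return result if result is not None else (None, pos)
    return parse


def sequence(*parsers):
    """All parsers in order; yields the list of their values."""
    def parse(text, pos):
        values = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


binary = one_of("abcd")  # hypothetical binary switches
valued = one_of("wxyz")  # hypothetical value switches

# prefix B* (V B*)?  ==  any number of binary switches, at most one value switch, any order
short_switches = sequence(one_of("-/"), many(binary), optional(sequence(valued, many(binary))))


def parse_short(text):
    """Return (binary switches, value switch or None), or None if text does not match."""
    result = short_switches(text, 0)
    if result is None or result[1] != len(text):
        return None
    (_prefix, before, tail), _ = result
    if tail is None:
        return before, None
    value_switch, after = tail
    return before + after, value_switch


print(parse_short("-abwdc"))  # (['a', 'b', 'd', 'c'], 'w')
print(parse_short("-awx"))    # None
```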
Finally, we combine each pair like this:
These two parsers will be used in the final tokenizer to parse an
Now we are ready to write the tokenizer code:
Again, this is pretty straightforward. One thing you'll notice is that Superpower won't let you create a token based on anything but TextSpan. Frankly, I would like Token to have a generic parameter that would allow any type of underlying token, not just text. Because I cannot pass already parsed

The tokenizer will give us a list of tokens for our parser, and the type of these tokens will be of
We are getting to the end. We will need two helper methods. The first of them should really be part of Superpower:
This is similar to

The second method is for getting use if the
The final part of the solution is the parser:
The above is also more or less straightforward. Note how we use

Now we are done. For ease of use, we should chain the tokenizer and the parser like this:
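A minimal Python sketch of what the second pass has to check. It assumes each argument has already been turned into a (binary switches, value switch, raw text) triple by the first pass, and that, as described earlier in the thread, a plain value is only valid directly after a value switch. The representation and names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical shape of one first-pass result:
# (binary switches, value switch or None, raw argument text).
# An argument with no switches at all is a plain value.
Arg = Tuple[List[str], Optional[str], str]


def is_value(arg: Arg) -> bool:
    binaries, value_switch, _ = arg
    return not binaries and value_switch is None


def second_pass(args: List[Arg]):
    """Every value switch must be immediately followed by a value argument.
    Returns (set of binary switches, {value switch: value})."""
    flags, values = set(), {}
    i = 0
    while i < len(args):
        binaries, value_switch, text = args[i]
        if is_value(args[i]):
            raise ValueError(f"argument {i} ({text!r}) is a value with no preceding value switch")
        flags.update(binaries)
        if value_switch is not None:
            if i + 1 >= len(args) or not is_value(args[i + 1]):
                raise ValueError(f"switch {value_switch!r} in argument {i} needs a value")
            values[value_switch] = args[i + 1][2]
            i += 1  # the value is consumed together with its switch
        i += 1
    return flags, values


args = [(["a", "b"], "w", "-abw"), ([], None, "out.txt"), (["c"], None, "/c")]
print(second_pass(args))
```

Chaining the passes is then a single call such as `second_pass(first_pass(argv))`, where `first_pass` is a hypothetical function that turns `argv` into such triples.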
@nblumhardt, and I'm done. I'd appreciate your opinion. The entirety of the code is here.
Thank you in advance; I'd appreciate your time if you could have a look at this. I know it's a big ask ;)
Hi Andrew! I'd love to dig into this - working on Superpower is a lot of fun, and it definitely does need thoughtful input like this - but I'm short of the time to properly digest such an enormous thread right now :-) Perhaps, leaving this here for reference and for others to explore, we could consider one smaller point at a time? (Didn't intend to close this earlier, sorry - comment button is badly placed on GitHub :-))
Struggling to move anything forward for you on this one, may have to pass - but hope it's all going well!
@nblumhardt Was not really holding my breath ;) Thank you!
Some further thoughts here:
For good error reporting, which is the major design goal of Superpower, you need TextSpan data so that the error position (line/column) can be tracked. Your normal tokenizer will usually build a

It appears that the result of the Recognizer parsing does not have to be a

In this case though

A solution to this could be a new
One possible solution is to use a proposed
Again,
So the thing is that you only need
@AndrewSav I'm pretty thankful for your stream of consciousness. I was looking to see if anyone had implemented a command-line parser library using Superpower, as the whole tokenization feature seems like it would allow people to extend it with varying grammars, and the parser combinator feature would allow nesting grammars.
Try only makes sense for built-in parsers for identifiers, reserved symbols, operators, and reserved operators (e.g. for future use). Arbitrary lookahead is not a good default, because it increases run-time cost and makes analyzing the performance of parser-combinator-based grammars more difficult. The risk of using lookahead with Or is that you could end up with an ambiguous parse forest. To provide some lookahead, the parser combinators I mentioned are often called lexeme parsers, because they only require one lookahead.
An example would be helpful; otherwise it's a bit too abstract. I'm sure there are examples; I just haven't come across any in anything I've done so far with Superpower.
Superpower is never ambiguous. It always prefers the first valid match.
Which parser combinators did you mention? I must have missed that.
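Since an example was asked for, below is a toy Python model (not Superpower code) of the difference between a committed choice, where the second alternative is only tried if the first failed without consuming input, and the same choice with the first branch wrapped in a try-style combinator that backtracks. As far as I understand, Superpower's Or behaves like the committed version and Try provides the backtracking; the `literal` parser here is an assumption that reports failure at the point of mismatch.

```python
# Toy model of committed choice vs. backtracking. A parser returns
# ("ok", value, new_pos) on success or ("err", fail_pos) on failure;
# fail_pos > start means the parser consumed input before failing.


def literal(s):
    """Matches s; on a mismatch it reports failure where the mismatch happened."""
    def parse(text, pos):
        i = 0
        while i < len(s) and pos + i < len(text) and text[pos + i] == s[i]:
            i += 1
        if i == len(s):
            return ("ok", s, pos + i)
        return ("err", pos + i)
    return parse


def or_(left, right):
    """Try right only if left failed without consuming any input."""
    def parse(text, pos):
        result = left(text, pos)
        if result[0] == "err" and result[1] == pos:
            return right(text, pos)
        return result
    return parse


def try_(parser):
    """On failure, pretend no input was consumed, so or_ will try the next alternative."""
    def parse(text, pos):
        result = parser(text, pos)
        return ("err", pos) if result[0] == "err" else result
    return parse


committed = or_(literal("width"), literal("wide"))
backtracking = or_(try_(literal("width")), literal("wide"))

print(committed("wide", 0))     # ('err', 3)
print(backtracking("wide", 0))  # ('ok', 'wide', 4)
```

The cost argument above is visible here: `try_` lets the parser re-read input it has already examined, which is the kind of lookahead that is better opted into explicitly than made the default.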
Let's assume I have Group 1 of parsers a, b, c and d, and Group 2 of parsers w, x, y and z.
I would like to combine these parsers so that the resulting parser would parse any combination of parsers from Group 1 and exactly one parser from Group 2, in any order.
Is this possible?
It is quite possible that I'm attacking the problem from the wrong angle (an XY problem); here is what I'm really trying to solve.
I'd like to parse a command-line switch that starts with "-" or "/", followed by a number of options, for example "-abwdc".
Among these options there can be as many binary ones as desired (a, b, c or d, which do not require a parameter to follow) but only one that requires a parameter (w, x, y or z). The options can be specified in any order.