-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve ambiguities locally? #23
Comments
mlemerre ***@***.***> writes:
Hi,
I just ported a whole parser using pacomb and it is very nice.
Thanks!
I will profit of your remarks and this discussion to ameliorate the
documentation.
One thing that I don't understand well is how to deal with ambiguity. Specifically, the merge
function is attached to the %parser declaration, so as I understand
it, the whole file is parsed, and then ambiguities are resolved;
A representation of the grammar is build at runtime, translated to
parser combinators that are used for parsing. Ambiguïties are dealt with
only at parsing and locally, if you wish.
I cite the toy example of the documentation:
let%parser ***@***.*** (+.)] rec bin_seq =
() => 1.0
; (t1::bin_seq) 'a' (t2::bin_seq) => t1 *. t2
This grammar rule is highly ambiguous, and the ***@***.*** indication]
indicates how to deal with two parsing semantics for the same portion of
the parsed string. Here addition is not really something you will do in
practice, but gives a nice function computing Catalan numbers ;-)
You could more realistically have a "Or" constructor if the result of
parsing is an abstract syntax tree and your grammar is really ambiguous
(typically for parsing natural languages and to treat ambiguities
locally to avoid an explosion of the size of the parse tree)
let%parser ***@***.*** (fun x y -> Or(x,y))] ....
If you think your grammar in semantically unambiguous you could use
let%parser ***@***.*** (fun x y -> if eq x y then x else assert false)] ...
where eq is the equality you wish.
This is also a good choice to debug ambiguities if you think your
grammar should not be ambiguous at all. May be it should be the default
for pacomb, but it is hard to chose an suitable equality as OCaml
polymorphic equality does not work on closure.
but this can already be made
using parse_all_buffer.
The doc says this is not really a good idea, you will have rapidely an
exponential number of parse trees.
What I would prefer to do is to resolve ambiguites locally, i.e. being
able to attach merge function to rules, but I could not find how to do
that.
It was there: https://raffalli.eu/pacomb/pacomb/index.html#ambiguous
The only thing I found is partial_parse_buffer, but I don't think using directly would work correctly.
No this is only usefull if you do not want to parse the whole buffer
input. This has little to do with ambiguity.
Thanks again for you interest,
Christophe
…--
Christophe Raffalli
tél: +689 87 23 11 48
web: http://raffalli.eu
my mails should pass DKIM/SPF tests/mes messages doivent passer les tests DKIM/SPF
|
Thanks for your answer! I think my previous question was not very clear. If I only have one ambiguous recursive rule, the documentation makes it clear how to resolve the ambiguity:
Now, if I have mutually recursive rules, I would normally write:
But how can I now attach different merge functions to f and g? I think I can only attach a merge function to f. |
(By the way, the link in the github readme to the full documentation is broken) |
Hello
Thanks for your answer! I think my previous question was not very clear. If I only have one ambiguous recursive rule, the documentation makes it clear how to
resolve the ambiguity:
let%parser ***@***.*** f] rec f = ...
Now, if I have mutually recursive rules, I would normally write:
let%parser rec f = ...
and g = ...
let%parser ***@***.*** merge_f] rec f = ...
and ***@***.*** merge_g] g = ...
should work. I use that with ***@***.***. Did you try that ?
Cheers,
Christophe
…
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.
--
Christophe Raffalli
tél: +689 87 23 11 48
web: http://raffalli.eu
my mails should pass DKIM/SPF tests/mes messages doivent passer les tests DKIM/SPF
|
mlemerre ***@***.***> writes:
(By the way, the link in the github readme to the full documentation is broken)
Thanks, I fixed it.
… —
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.
--
Christophe Raffalli
tél: +689 87 23 11 48
web: http://raffalli.eu
my mails should pass DKIM/SPF tests/mes messages doivent passer les tests DKIM/SPF
|
I assume that you wrote
(github removes everything that looks like a mail addresse). Actually I tried:
which does not work. Only let %parser [@merge merge_f] works. If I underestood correctly, the merge function would then be attached to the rule f? In that case, I could reorder my grammer (but I would still have a problem if I want to attach two merge functions to two different rules). |
Actually, it is working. I must have been doing something wrong, I am very sorry. |
It seems that raising Give_up in the merge function discards one of the parsetrees. I wonder if this is expected behaviour it seems hard to predict which parse tree will be discarded. (I am using merge function to detect and report parsing ambiguities; failwith works here, but Give_up seemed convenient to report the location of the problem). |
Hello
Yes the semantics of give_up is to reject parse trees. If you do it in merge it is a bit strange. A nice solution would be using effect from 5.0. otherwise decorate you parse tree with positions.
I could actually have the default ambiguity message report positions. I probably only have access to the right position of the ambiguity, but it would be something better. To have the beginning of the ambiguous region your code must record it. Positions are not preserved by default because people that wants ultra fast parser would complain...
Cheers
Christophe
Le 1 février 2024 05:15:43 GMT-10:00, mlemerre ***@***.***> a écrit :
…It seems that raising Give_up in the merge function discards one of the parsetrees. I wonder if this is expected behaviour it seems hard to predict which parse tree will be discarded.
(I am using merge function to detect and report parsing ambiguities; failwith works here, but Give_up seemed convenient to report the location of the problem).
--
Reply to this email directly or view it on GitHub:
#23 (comment)
You are receiving this because you commented.
Message ID: ***@***.***>
|
mlemerre ***@***.***> writes:
It seems that raising Give_up in the merge function discards one of the parsetrees. I wonder if this is expected behaviour
it seems hard to predict which parse tree will be discarded.
Yes. Maybe an error in this case would be nice. Or at least write
something in the documentation.
(I am using merge function to detect and report parsing ambiguities; failwith works here, but Give_up seemed
convenient to report the location of the problem).
I just added a ***@***.***_with_pos] and much better for you
let _ = Grammar.set_debug_merge Format.std_formatter
You shoud call this before the compilation of the grammar.
This in on github ...
Using merge is expensive, so you do not want to do it when it is
unnecessary Therefore I find the above to be the best ?
… —
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.
--
Christophe Raffalli
tél: +689 87 23 11 48
web: http://raffalli.eu
my mails should pass DKIM/SPF tests/mes messages doivent passer les tests DKIM/SPF
|
The merge_with_pos would be perfect from my use case, where I use merge to display the possible parse trees (to ask the user to disambiguate). But you eventually removed it? I understand that computing pos is expensive, but I would have thought that passing around spos would not be... For context: my use case is to use case to detect ambiguities. The grammar describes complex types meant to be used by various people, and I believe that resolving all ambiguities in the grammar by giving arbitrary priorities to all constructs is bad. I think it is safer to only have the obvious priorities, and to warn the user when there are multiple parse trees for its file, asking him to parenthesize. I have just caught some errors in our tests using this design. This is why I needed a parser with support for ambiguous parsing, and pacomb worked very nicely. It is a great package which deserves to be more well known! |
Hi,
I just ported a whole parser using pacomb and it is very nice. One thing that I don't understand well is how to deal with ambiguity. Specifically, the merge function is attached to the %parser declaration, so as I understand it, the whole file is parsed, and then ambiguities are resolved; but this can already be made using parse_all_buffer. What I would prefer to do is to resolve ambiguites locally, i.e. being able to attach merge function to rules, but I could not find how to do that. The only thing I found is partial_parse_buffer, but I don't think using directly would work correctly.
Is there a way to solve this problem? Maybe by adding a and%parser rule suffices?
The text was updated successfully, but these errors were encountered: