-
Notifications
You must be signed in to change notification settings - Fork 631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More documentation on grammars and parsing #12717
Conversation
dev/doc/parsing.md
Outdated
|
||
TODO: creates wit_ variable--makes it possible to refer to when using alternative prodn syntax? | ||
|
||
TODO: best practice - use this or GRAMMAR EXTEND? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use GRAMMAR EXTEND, except when you know what you're doing, which is the unique privilege of ssr developers (and even them, sometimes they do strange stuff).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There must be more to the story. I see GRAMMAR EXTEND is used in 17 mlg files. Is part of this tech debt (e.g. in g_indfun.mlg
)? If so, I can say that. Also I can say that new tactics and commands should not be added with GRAMMAR EXTEND.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The funind instance looks like an oversight. This should be turned into a VERNAC ARGUMENT EXTEND
since there is not much more to it.
The typical use of GRAMMAR EXTEND
is for low-level complex syntactic constructs, typically when implementing a full-blown language grammar (e.g. gallina, LtacX, ssr). Not for the casual user, thus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. The commands in g_proofs.mlg
, g_vernac.mlg
and g_ltac.mlg
, inside GRAMMAR EXTEND
are OK because they're big extensions? And they provide the semantic information through other means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a nutshell, yes.
dev/doc/parsing.md
Outdated
END | ||
``` | ||
|
||
TODO: creates wit_ variable--makes it possible to refer to when using alternative prodn syntax? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wit variables are used by VERNAC / TACTIC EXTEND
that uses those arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, it seems tedious to add every new nonterminal with ARGUMENT EXTEND
instead of, say, using GRAMMAR EXTEND
. Is ARGUMENT EXTEND
always preferable to GRAMMAR EXTEND
? If not, when should it be used? Seems to me it might be a struggle to get everyone to use it, and considerable work to (perhaps) convert existing code to use it as well.
Also, FWIW, "ARGUMENT" doesn't seem like a very descriptive name for what it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is critical to use those macros. GRAMMAR EXTEND
is just about syntax, when ARGUMENT EXTEND
is about semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. How can I briefly explain which nonterminals it's critical for, because there are tons of nonterminals defined in GRAMMAR EXTEND
? If people understand the reason for the guideline they're more likely to follow it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except if you reimplement a tactic language, you shouldn't use GRAMMAR EXTEND
.
Updated in a new commit. If this looks OK, please approve. I expect @Zimmi48 will want to review the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work thanks! Here are my comments on the markdown part:
The newest documentation for camlp5 is available at https://camlp5.github.io/doc/htmlc/. I suggest you refer to it instead of the one hosted on gforge, especially given that the EOL of gforge has been announced.
I also suggest that you add a link to this new documentation from the index available in dev/README.md
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to the refman LGTM as well. I just suggest fixing the links to the camlp5 documentation to not point to gforge and I suggest adding direct links to the Coq sources in the master branch on GitHub (including the non-existent yet dev/doc/parsing.md
).
Can @ppedrot be the assignee for this PR? |
@Zimmi48 In |
Thanks for bringing this up. The link should be removed. The content of this file has moved to the contributing guide. |
@ppedrot Updated. I will squash after you approve. |
Squashed and updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, here are some comments.
Somehow, many features could be added to the grammar engine (automatic factorization, local precedences, local tokens, combining lexical and parsing analysis, ...). I don't know well the state-of-the-art in parsing, maybe are there tools or algorithms abroad that we could reuse or get inspiration from.
a few are defined directly in OCaml code. Since many developers have worked on the parser | ||
over the years, this code can be idiosyncratic, reflecting various design concepts. | ||
|
||
* The parser is a recursive descent parser that, by default, only looks at the next token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practise, it looks at the longuests sequence of terminals ahead (I don't know if there is a terminology for that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, IIRC it checks that the longest sequence of nonterminals in the current parse entry matches, so it handles more than 1 symbol of lookahead in that particular case, but it doesn't handle lookahead for some cases when 2 productions differ by a terminal vs a nonterminal or between 2 different nonterminals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think you are right.
By the way, did you see my little experiment at #12744? One day, we should try to ensure that infix symbols with no associativity have their no associativity respected.
Another question I'm wondering about is whether we should but NEXT
, SELF
or operconstr AT LEVEL XX
on the right-hard side of a notation recursive on its right-hand side, or instead rely on the associativity.
More generally, I'm less and less convinced that a global associativity per level is the right concept we need. Maybe should we consider separatedly the associating behavior on the left-hand side and on the right-hand side (knowing that having both side associative is excluded). By doing so, we could have levels which mixes right and no associativity, and levels which mixes left and no associativity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I hadn't seen #12744. I made some comments there.
I don't regularly check for new PRs. But feel free to include @jfehrle
in the PR if you want me to look at something.
- [Other components](#other-components) | ||
- [Parsing productions](#parsing-productions) | ||
- [Lookahead](#lookahead) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a way to have the table of contents automatically generated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be helpful, but I'm not aware there's anything that does that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in Markdown. That's a very simple and limited language.
``` | ||
|
||
On the other hand, `OPT` works just fine when the parser has already found the | ||
right production. For example `Back` and `Back <natural>` can be combined using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure we could improve the camlp5 engine to automatically do this expansion, i.e. at the time of inserting the rule in the tree, or even at runtime (at least when in the same entry).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how often we need or would want to handle this case. I included it in this write up because it was a simple example, not because I decided (or thought much about whether) the limitation really impacts users much. I think I'd first analyze the grammar, including the numerous edits that doc_grammar
makes for the doc and see how often this feature would be helpful. I'd also consider whether such a change would impact other improvements we may want to make to the parser--which requires having a wish list.
In a lot of the Coq code, we should think about how to make things simpler, more consistent and easier to understand/modify. And add new features when they give a clear user benefit.
|
||
This command doesn't display all nonterminals of the grammar. For example, | ||
productions shown by `Print Grammar tactic` refer to nonterminals `tactic_then_locality` | ||
and `tactic_then_gen` which are not shown and can't be printed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a branch which add supports for Print Grammar
of any entry. I'm not fully satisfied by it, though because collisions can happen, and entries should probably structured. E.g. an API with commands such as Print Grammar Tactic tactic_then_gen
or Print Grammar Constr fields_def
would already be better. Maybe would we need even stronger structuration.
The branch is https://github.com/herbelin/github-coq/pull/new/master+support-more-printable-grammar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd need to figure out how to handle locally-defined nonterminals that have the same name. How would we show such nonterminals to users in a non-confusing way? We could somehow require unique names or we'd need a way of qualifying names (ugh). Ltac2 has 35 or more such symbols while the rest of the grammar has maybe 6 or 7.
I'm not sure Print Grammar
is hugely useful in the long term. Once the grammar in the doc is updated and we update the mlgs to match that as much as possible, it's probably better for users to look in the doc. Of course, the doc won't show productions added by notations, but these additions appear only on specific nonterminals. The doc would not cover additions from plugins external to the Coq source tree either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a branch which add supports for Print Grammar of any entry
I think it's a bad idea to export such internals to end-users. A better way to do that would be to have a way to recursively extract all entries accessible from a few selected roots and display that to the user, with on-the-fly deambiguation of the entry names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not far from thinking that this is indeed too internal to be worth being exposed. However, it happens to be useful for debugging. Maybe should it be reserved to developers.
Yeah, though beyond the scope of this PR. We could create a new topic to get into more detail. Nonetheless, a few thoughts: It's not clear to me that the parsing algorithm significantly limits our ability to evolve Coq for the forseeable future. I'm sure some parts of the grammar can be improved. Here are some new capabilities that could be interesting:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good to go from my point of view except the two minor comments below.
I assume that the leftover TODO in the new document are on purpose.
Note that the productions printed by this command are represented in the form used by | ||
|Coq|'s parser (coqpp), which differs from how productions are shown in the documentation. | ||
Developer documentation for parsing is in | ||
`doc/parsing.md <http://github.com/coq/coq/blob/master/doc/parsing.md>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`doc/parsing.md <http://github.com/coq/coq/blob/master/doc/parsing.md>`_. | |
`dev/doc/parsing.md <http://github.com/coq/coq/blob/master/dev/doc/parsing.md>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, ready to go, thanks. The TODOs are intentional.
@jfehrle despite the TODOs, this is ready to be merged right? |
Yes, ready to merge, thanks. A few TODOs are OK in developer documentation. (I don't put user-visible TODOs in the user documentation.) |
Fixes: #12522 and adds some developer documentation of parsing.
Perhaps @ppedrot or other @coq/parsing-maintainers could help me resolve the
TODO
s in the developer documentation? Thanks.