More documentation on grammars and parsing #12717

jfehrle · 2020-07-21T07:01:55Z

Fixes: #12522 and adds some developer documentation of parsing.

Perhaps @ppedrot or other @coq/parsing-maintainers could help me resolve the TODOs in the developer documentation? Thanks.

dev/doc/parsing.md

ppedrot · 2020-07-21T11:39:09Z

dev/doc/parsing.md

+
+  TODO: creates wit_ variable--makes it possible to refer to when using alternative prodn syntax?
+
+  TODO: best practice - use this or GRAMMAR EXTEND?


Don't use GRAMMAR EXTEND unless you know what you're doing, which is the unique privilege of ssr developers (and even they sometimes do strange stuff).

There must be more to the story. I see GRAMMAR EXTEND is used in 17 mlg files. Is part of this tech debt (e.g. in g_indfun.mlg)? If so, I can say that. Also I can say that new tactics and commands should not be added with GRAMMAR EXTEND.

The funind instance looks like an oversight. This should be turned into a VERNAC ARGUMENT EXTEND since there is not much more to it.

The typical use of GRAMMAR EXTEND is for low-level complex syntactic constructs, typically when implementing a full-blown language grammar (e.g. gallina, LtacX, ssr). Not for the casual user, thus.

OK. The commands in g_proofs.mlg, g_vernac.mlg and g_ltac.mlg, inside GRAMMAR EXTEND are OK because they're big extensions? And they provide the semantic information through other means?

In a nutshell, yes.

ppedrot · 2020-07-21T11:39:51Z

dev/doc/parsing.md

+  END
+  ```
+
+  TODO: creates wit_ variable--makes it possible to refer to when using alternative prodn syntax?


wit variables are used by the VERNAC / TACTIC EXTEND declarations that use those arguments.

IIUC, it seems tedious to add every new nonterminal with ARGUMENT EXTEND instead of, say, using GRAMMAR EXTEND. Is ARGUMENT EXTEND always preferable to GRAMMAR EXTEND? If not, when should it be used? Seems to me it might be a struggle to get everyone to use it, and considerable work to (perhaps) convert existing code to use it as well.

Also, FWIW, "ARGUMENT" doesn't seem like a very descriptive name for what it does.

It is critical to use those macros. GRAMMAR EXTEND is just about syntax, whereas ARGUMENT EXTEND is about semantics.

OK. How can I briefly explain which nonterminals it's critical for, because there are tons of nonterminals defined in GRAMMAR EXTEND? If people understand the reason for the guideline they're more likely to follow it.

Unless you are reimplementing a tactic language, you shouldn't use GRAMMAR EXTEND.

jfehrle · 2020-07-22T19:17:06Z

Updated in a new commit. If this looks OK, please approve. I expect @Zimmi48 will want to review the .rst file changes.

Zimmi48

Great work, thanks! Here are my comments on the markdown part:

The newest documentation for camlp5 is available at https://camlp5.github.io/doc/htmlc/. I suggest you refer to it instead of the one hosted on gforge, especially given that the EOL of gforge has been announced.

I also suggest that you add a link to this new documentation from the index available in dev/README.md.

dev/doc/parsing.md

Zimmi48

The changes to the refman LGTM as well. I just suggest fixing the links to the camlp5 documentation so they don't point to gforge, and I suggest adding direct links to the Coq sources in the master branch on GitHub (including the not-yet-existing dev/doc/parsing.md).

doc/sphinx/user-extensions/syntax-extensions.rst

Zimmi48 · 2020-07-23T08:25:06Z

Can @ppedrot be the assignee for this PR?

jfehrle · 2020-07-23T18:05:11Z

@Zimmi48 In doc/README.md the link to MERGING.md is broken, but I can't figure out what it should be.

Zimmi48 · 2020-07-23T18:35:37Z

Thanks for bringing this up. The link should be removed. The content of this file has moved to the contributing guide.

jfehrle · 2020-07-23T18:43:35Z

@ppedrot Updated. I will squash after you approve.

jfehrle · 2020-07-23T23:43:11Z

Squashed and updated.

herbelin

Hi, here are some comments.

Somehow, many features could be added to the grammar engine (automatic factorization, local precedences, local tokens, combining lexical and parsing analysis, ...). I don't know the state of the art in parsing well; maybe there are tools or algorithms out there that we could reuse or get inspiration from.

herbelin · 2020-07-22T14:26:58Z

dev/doc/parsing.md

+  a few are defined directly in OCaml code.  Since many developers have worked on the parser
+  over the years, this code can be idiosyncratic, reflecting various design concepts.
+
+* The parser is a recursive descent parser that, by default, only looks at the next token


In practice, it looks at the longest sequence of terminals ahead (I don't know if there is a term for that).

Yeah, IIRC it checks that the longest sequence of nonterminals in the current parse entry matches, so it handles more than 1 symbol of lookahead in that particular case, but it doesn't handle lookahead for some cases when 2 productions differ by a terminal vs a nonterminal or between 2 different nonterminals.

Yes, I think you are right.
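
To make the lookahead point concrete, here is a toy sketch in Python. It is not the camlp5 or Coq parsing engine; the symbol representation and the helper names `leading_terminals`, `choose_one_token` and `choose_longest_prefix` are hypothetical. It only contrasts picking a production from the next token alone with picking it from the whole run of terminals at the start of each production.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: ("t", x) is a terminal, ("n", x) a nonterminal.
Symbol = Tuple[str, str]
Production = List[Symbol]


def leading_terminals(prod: Production) -> List[str]:
    """Terminals at the start of a production, up to the first nonterminal."""
    lead = []
    for kind, name in prod:
        if kind != "t":
            break
        lead.append(name)
    return lead


def choose_one_token(prods: List[Production], tokens: List[str]) -> Optional[int]:
    """Commit to the first production whose first terminal matches the next token."""
    for i, prod in enumerate(prods):
        lead = leading_terminals(prod)
        if lead and tokens[:1] == lead[:1]:
            return i
    return None


def choose_longest_prefix(prods: List[Production], tokens: List[str]) -> Optional[int]:
    """Pick the production whose entire leading run of terminals matches the
    input, preferring the longest run (ties go to the earliest production)."""
    best, best_len = None, -1
    for i, prod in enumerate(prods):
        lead = leading_terminals(prod)
        if tokens[:len(lead)] == lead and len(lead) > best_len:
            best, best_len = i, len(lead)
    return best


# Illustrative productions only; they are not taken from the Coq grammar files.
prods = [
    [("t", "Show"), ("t", "Proof")],
    [("t", "Show"), ("t", "Goal"), ("n", "natural")],
]
tokens = ["Show", "Goal", "3"]
one_token_choice = choose_one_token(prods, tokens)        # commits to production 0
prefix_choice = choose_longest_prefix(prods, tokens)      # selects production 1
```

In this toy model, two productions that share the same terminal prefix and then differ at a nonterminal still tie, which is the kind of case the replies above say is not handled.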

By the way, did you see my little experiment at #12744? One day, we should try to ensure that infix symbols with no associativity have their lack of associativity respected.

Another question I'm wondering about is whether we should put NEXT, SELF or operconstr AT LEVEL XX on the right-hand side of a notation that is recursive on its right-hand side, or instead rely on the associativity.

More generally, I'm less and less convinced that a global associativity per level is the right concept. Maybe we should consider separately the associating behavior on the left-hand side and on the right-hand side (knowing that having both sides associative is excluded). By doing so, we could have levels which mix right and no associativity, and levels which mix left and no associativity.

No, I hadn't seen #12744. I made some comments there.

I don't regularly check for new PRs. But feel free to include @jfehrle in the PR if you want me to look at something.

dev/doc/parsing.md

herbelin · 2020-07-22T14:30:47Z

dev/doc/parsing.md

+  - [Other components](#other-components)
+  - [Parsing productions](#parsing-productions)
+  - [Lookahead](#lookahead)
+


Isn't there a way to have the table of contents automatically generated?

That would be helpful, but I'm not aware there's anything that does that.

Not in Markdown. That's a very simple and limited language.

dev/doc/parsing.md

herbelin · 2020-07-24T17:21:16Z

dev/doc/parsing.md

+  ```
+
+On the other hand, `OPT` works just fine when the parser has already found the
+right production.  For example `Back` and `Back <natural>` can be combined using


I'm pretty sure we could improve the camlp5 engine to automatically do this expansion, i.e. at the time of inserting the rule in the tree, or even at runtime (at least when in the same entry).

I'm not sure how often we need or would want to handle this case. I included it in this write up because it was a simple example, not because I decided (or thought much about whether) the limitation really impacts users much. I think I'd first analyze the grammar, including the numerous edits that doc_grammar makes for the doc and see how often this feature would be helpful. I'd also consider whether such a change would impact other improvements we may want to make to the parser--which requires having a wish list.

In a lot of the Coq code, we should think about how to make things simpler, more consistent and easier to understand/modify. And add new features when they give a clear user benefit.
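
As a rough illustration of the expansion discussed in this thread, the sketch below assumes that "this expansion" means rewriting a production containing `OPT x` into the alternatives with and without `x`. The list-of-symbols representation and the `expand_opt` name are hypothetical; nothing here reflects how camlp5 stores or inserts rules.

```python
from itertools import product
from typing import List, Tuple, Union

# Hypothetical representation: a plain string is an ordinary symbol,
# and ("OPT", "x") marks an optional symbol x.
Sym = Union[str, Tuple[str, str]]


def expand_opt(production: List[Sym]) -> List[List[str]]:
    """Expand every OPT symbol into two alternatives (present, absent)."""
    choices = []
    for sym in production:
        if isinstance(sym, tuple) and sym[0] == "OPT":
            choices.append([[sym[1]], []])   # with the symbol, then without it
        else:
            choices.append([[sym]])          # mandatory symbol: a single choice
    # The Cartesian product of the per-symbol choices enumerates all alternatives.
    return [[s for part in combo for s in part] for combo in product(*choices)]


# The Back / Back <natural> example from the quoted documentation:
alternatives = expand_opt(["Back", ("OPT", "natural")])
# alternatives == [["Back", "natural"], ["Back"]]
```

A production with k OPT symbols expands to 2^k alternatives, which is one reason to first measure how often the case arises, as suggested above.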

herbelin · 2020-07-24T17:24:32Z

doc/sphinx/user-extensions/syntax-extensions.rst

-
+   This command doesn't display all nonterminals of the grammar.  For example,
+   productions shown by `Print Grammar tactic` refer to nonterminals `tactic_then_locality`
+   and `tactic_then_gen` which are not shown and can't be printed.


I have a branch which adds support for Print Grammar of any entry. I'm not fully satisfied with it, though, because collisions can happen, and entries should probably be structured. E.g. an API with commands such as Print Grammar Tactic tactic_then_gen or Print Grammar Constr fields_def would already be better. Maybe we would need even stronger structuring.

The branch is https://github.com/herbelin/github-coq/pull/new/master+support-more-printable-grammar

We'd need to figure out how to handle locally-defined nonterminals that have the same name. How would we show such nonterminals to users in a non-confusing way? We could somehow require unique names or we'd need a way of qualifying names (ugh). Ltac2 has 35 or more such symbols while the rest of the grammar has maybe 6 or 7.

I'm not sure Print Grammar is hugely useful in the long term. Once the grammar in the doc is updated and we update the mlgs to match that as much as possible, it's probably better for users to look in the doc. Of course, the doc won't show productions added by notations, but these additions appear only on specific nonterminals. The doc would not cover additions from plugins external to the Coq source tree either.

I have a branch which adds support for Print Grammar of any entry

I think it's a bad idea to export such internals to end-users. A better way to do that would be to have a way to recursively extract all entries accessible from a few selected roots and display that to the user, with on-the-fly disambiguation of the entry names.

I'm not far from thinking that this is indeed too internal to be worth exposing. However, it happens to be useful for debugging. Maybe it should be reserved for developers.
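
The "extract everything reachable from a few roots, then disambiguate names" idea can be sketched as a graph traversal. This is only a toy model in Python: the `(namespace, name)` pairs, the sample grammar, and the function names are hypothetical and do not correspond to any existing Coq API.

```python
from collections import Counter, deque
from typing import Dict, List, Tuple

# Hypothetical representation: an entry is (namespace, name); the grammar maps
# each entry to the entries its productions refer to.
Entry = Tuple[str, str]


def reachable_entries(grammar: Dict[Entry, List[Entry]], roots: List[Entry]) -> List[Entry]:
    """Breadth-first traversal from the roots, in discovery order."""
    unique_roots = list(dict.fromkeys(roots))
    seen, order, queue = set(unique_roots), [], deque(unique_roots)
    while queue:
        entry = queue.popleft()
        order.append(entry)
        for ref in grammar.get(entry, []):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return order


def display_names(entries: List[Entry]) -> Dict[Entry, str]:
    """Show the bare name when it is unique among the listed entries,
    and a namespace-qualified name only when names collide."""
    counts = Counter(name for _, name in entries)
    return {e: (e[1] if counts[e[1]] == 1 else f"{e[0]}.{e[1]}") for e in entries}


# Made-up grammar with two local nonterminals that share the name "binder".
grammar = {
    ("ltac", "tactic"): [("ltac", "tactic_then_gen"), ("ltac", "binder")],
    ("ltac", "tactic_then_gen"): [("ltac", "tactic")],
    ("constr", "term"): [("constr", "binder")],
    ("ltac2", "unused"): [],
}
entries = reachable_entries(grammar, [("ltac", "tactic"), ("constr", "term")])
names = display_names(entries)
```

Qualifying only the colliding names keeps the common case readable, at the cost of a name changing its displayed form when a new collision appears.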

jfehrle · 2020-07-24T21:50:05Z

Somehow, many features could be added to the grammar engine (automatic factorization, local precedences, local tokens, combining lexical and parsing analysis, ...)

Yeah, though beyond the scope of this PR. We could create a new topic to get into more detail.

Nonetheless, a few thoughts: It's not clear to me that the parsing algorithm significantly limits our ability to evolve Coq for the foreseeable future. I'm sure some parts of the grammar can be improved. Here are some new capabilities that could be interesting:

- Functions that permit recovering the symbolic and string form of grammar productions with the filename and line number where they were defined (whether in .mlg files or from notations in .v files) and associate them with parser action routines
- An ability to generate parse trees that correspond exactly to the grammar for debugging and testing. (I've made some good progress on this.)
- Make the Coq parser reusable in external tools.
- More informative syntax errors, such as including more of the grammar/parse tree or suggesting corrections similar to what the OCaml compiler does. Fully documenting the grammar in the doc probably makes this a bit less important.
- Create a routine that analyzes the grammar to identify ambiguous productions and/or productions that shadow (or partly shadow) other productions. This could generate warnings or errors both at build time and at runtime (when defining notations or loading a plugin).
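
For the last item, a very rough sketch of what such an analysis might look like, written in Python. It relies on a strong simplifying assumption that is not claimed to match Coq's parser: alternatives of one entry are tried in order and the first match wins. Under that assumption, identical productions are flagged as duplicates and an earlier production that is a strict prefix of a later one is flagged as shadowing it. The function name and the list-of-strings representation are hypothetical.

```python
from typing import List, Tuple


def find_conflicts(productions: List[List[str]]) -> List[Tuple[str, int, int]]:
    """Report (kind, earlier_index, later_index) for suspicious pairs of productions."""
    conflicts = []
    for i, early in enumerate(productions):
        for j in range(i + 1, len(productions)):
            late = productions[j]
            if early == late:
                # Same symbols twice: the later production can never add anything.
                conflicts.append(("duplicate", i, j))
            elif late[:len(early)] == early:
                # The earlier production matches every input the later one starts with.
                conflicts.append(("shadows", i, j))
    return conflicts


# Made-up productions for a single entry.
report = find_conflicts([
    ["Back"],
    ["Back", "natural"],
    ["Undo"],
    ["Undo"],
])
# report == [("shadows", 0, 1), ("duplicate", 2, 3)]
```

A real analysis would also have to account for levels, associativity and productions added by notations, which this sketch ignores.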

jfehrle · 2020-07-30T04:34:58Z

@Zimmi48: Can @ppedrot be the assignee for this PR?

Yes, no, or waiting? Unless @herbelin wants changes--but I think we had a good chat that's beyond the scope of the PR. @Zimmi48, if you're good with this, would you approve?

Thanks, guys.

Zimmi48

This is good to go from my point of view except for the two minor comments below.
I assume that the leftover TODOs in the new document are on purpose.

dev/README.md

Zimmi48 · 2020-08-03T11:03:01Z

doc/sphinx/user-extensions/syntax-extensions.rst

-   Note that the productions printed by this command are represented in the form used by
-   |Coq|'s parser (coqpp), which differs from how productions are shown in the documentation.
+   Developer documentation for parsing is in
+   `doc/parsing.md <http://github.com/coq/coq/blob/master/doc/parsing.md>`_.


Suggested change

`doc/parsing.md <http://github.com/coq/coq/blob/master/doc/parsing.md>`_.

`dev/doc/parsing.md <http://github.com/coq/coq/blob/master/dev/doc/parsing.md>`_.

Updated, ready to go, thanks. The TODOs are intentional.

ppedrot · 2020-08-11T17:00:15Z

@jfehrle despite the TODOs, this is ready to be merged, right?

jfehrle · 2020-08-11T18:27:27Z

Yes, ready to merge, thanks. A few TODOs are OK in developer documentation. (I don't put user-visible TODOs in the user documentation.)

jfehrle added the kind: documentation Additions or improvement to documentation. label Jul 21, 2020

jfehrle requested a review from a team as a code owner July 21, 2020 07:01

ppedrot reviewed Jul 21, 2020

Zimmi48 reviewed Jul 23, 2020

Zimmi48 added this to the 8.12.1 milestone Jul 23, 2020

ppedrot approved these changes Jul 23, 2020

jfehrle force-pushed the on_grammars branch from de30d4c to 1f4290a Compare July 23, 2020 23:42

herbelin reviewed Jul 24, 2020

herbelin mentioned this pull request Jul 25, 2020

Adding support for recursive blocs and associativity in ARGUMENT EXTEND #12743

Closed

4 tasks

ppedrot self-assigned this Jul 31, 2020

Zimmi48 approved these changes Aug 3, 2020

More documentation on grammars and parsing

1121a2d

jfehrle force-pushed the on_grammars branch from 1f4290a to 1121a2d Compare August 3, 2020 17:55

coqbot added this to Request 8.12.1 inclusion in Coq 8.12 Aug 11, 2020

ppedrot merged commit e0e07f5 into coq:master Aug 11, 2020

Zimmi48 added a commit to Zimmi48/coq that referenced this pull request Aug 12, 2020

Backport PR coq#12717: More documentation on grammars and parsing

a1cb604

coqbot moved this from Request 8.12.1 inclusion to Shipped in 8.12.1 in Coq 8.12 Aug 12, 2020

