Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More documentation on grammars and parsing #12717

Merged
merged 1 commit into from
Aug 11, 2020
Merged

Conversation

jfehrle
Copy link
Member

@jfehrle jfehrle commented Jul 21, 2020

Fixes: #12522 and adds some developer documentation of parsing.

Perhaps @ppedrot or other @coq/parsing-maintainers could help me resolve the TODOs in the developer documentation? Thanks.

@jfehrle jfehrle added the kind: documentation Additions or improvement to documentation. label Jul 21, 2020
@jfehrle jfehrle requested a review from a team as a code owner July 21, 2020 07:01
dev/doc/parsing.md Outdated Show resolved Hide resolved
dev/doc/parsing.md Outdated Show resolved Hide resolved
dev/doc/parsing.md Outdated Show resolved Hide resolved

TODO: creates wit_ variable--makes it possible to refer to when using alternative prodn syntax?

TODO: best practice - use this or GRAMMAR EXTEND?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use GRAMMAR EXTEND, except when you know what you're doing, which is the unique privilege of ssr developers (and even them, sometimes they do strange stuff).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There must be more to the story. I see GRAMMAR EXTEND is used in 17 mlg files. Is part of this tech debt (e.g. in g_indfun.mlg)? If so, I can say that. Also I can say that new tactics and commands should not be added with GRAMMAR EXTEND.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The funind instance looks like an oversight. This should be turned into a VERNAC ARGUMENT EXTEND since there is not much more to it.

The typical use of GRAMMAR EXTEND is for low-level complex syntactic constructs, typically when implementing a full-blown language grammar (e.g. gallina, LtacX, ssr). Not for the casual user, thus.

Copy link
Member Author

@jfehrle jfehrle Jul 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. The commands in g_proofs.mlg, g_vernac.mlg and g_ltac.mlg, inside GRAMMAR EXTEND are OK because they're big extensions? And they provide the semantic information through other means?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a nutshell, yes.

END
```

TODO: creates wit_ variable--makes it possible to refer to when using alternative prodn syntax?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wit variables are used by VERNAC / TACTIC EXTEND that uses those arguments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, it seems tedious to add every new nonterminal with ARGUMENT EXTEND instead of, say, using GRAMMAR EXTEND. Is ARGUMENT EXTEND always preferable to GRAMMAR EXTEND? If not, when should it be used? Seems to me it might be a struggle to get everyone to use it, and considerable work to (perhaps) convert existing code to use it as well.

Also, FWIW, "ARGUMENT" doesn't seem like a very descriptive name for what it does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is critical to use those macros. GRAMMAR EXTEND is just about syntax, when ARGUMENT EXTEND is about semantics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. How can I briefly explain which nonterminals it's critical for, because there are tons of nonterminals defined in GRAMMAR EXTEND? If people understand the reason for the guideline they're more likely to follow it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except if you reimplement a tactic language, you shouldn't use GRAMMAR EXTEND.

@jfehrle
Copy link
Member Author

jfehrle commented Jul 22, 2020

Updated in a new commit. If this looks OK, please approve. I expect @Zimmi48 will want to review the .rst file changes.

Copy link
Member

@Zimmi48 Zimmi48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work thanks! Here are my comments on the markdown part:

The newest documentation for camlp5 is available at https://camlp5.github.io/doc/htmlc/. I suggest you refer to it instead of the one hosted on gforge, especially given that the EOL of gforge has been announced.

I also suggest that you add a link to this new documentation from the index available in dev/README.md.

dev/doc/parsing.md Outdated Show resolved Hide resolved
dev/doc/parsing.md Outdated Show resolved Hide resolved
dev/doc/parsing.md Outdated Show resolved Hide resolved
dev/doc/parsing.md Outdated Show resolved Hide resolved
dev/doc/parsing.md Outdated Show resolved Hide resolved
dev/doc/parsing.md Outdated Show resolved Hide resolved
Copy link
Member

@Zimmi48 Zimmi48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to the refman LGTM as well. I just suggest fixing the links to the camlp5 documentation to not point to gforge and I suggest adding direct links to the Coq sources in the master branch on GitHub (including the non-existent yet dev/doc/parsing.md).

doc/sphinx/user-extensions/syntax-extensions.rst Outdated Show resolved Hide resolved
@Zimmi48 Zimmi48 added this to the 8.12.1 milestone Jul 23, 2020
@Zimmi48
Copy link
Member

Zimmi48 commented Jul 23, 2020

Can @ppedrot be the assignee for this PR?

@jfehrle
Copy link
Member Author

jfehrle commented Jul 23, 2020

@Zimmi48 In doc/README.md the link to MERGING.md is broken, but I can't figure out what it should be.

@Zimmi48
Copy link
Member

Zimmi48 commented Jul 23, 2020

Thanks for bringing this up. The link should be removed. The content of this file has moved to the contributing guide.

@jfehrle
Copy link
Member Author

jfehrle commented Jul 23, 2020

@ppedrot Updated. I will squash after you approve.

@jfehrle
Copy link
Member Author

jfehrle commented Jul 23, 2020

Squashed and updated.

Copy link
Member

@herbelin herbelin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, here are some comments.

Somehow, many features could be added to the grammar engine (automatic factorization, local precedences, local tokens, combining lexical and parsing analysis, ...). I don't know well the state-of-the-art in parsing, maybe are there tools or algorithms abroad that we could reuse or get inspiration from.

a few are defined directly in OCaml code. Since many developers have worked on the parser
over the years, this code can be idiosyncratic, reflecting various design concepts.

* The parser is a recursive descent parser that, by default, only looks at the next token
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practise, it looks at the longuests sequence of terminals ahead (I don't know if there is a terminology for that).

Copy link
Member Author

@jfehrle jfehrle Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, IIRC it checks that the longest sequence of nonterminals in the current parse entry matches, so it handles more than 1 symbol of lookahead in that particular case, but it doesn't handle lookahead for some cases when 2 productions differ by a terminal vs a nonterminal or between 2 different nonterminals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you are right.

By the way, did you see my little experiment at #12744? One day, we should try to ensure that infix symbols with no associativity have their no associativity respected.

Another question I'm wondering about is whether we should but NEXT, SELF or operconstr AT LEVEL XX on the right-hard side of a notation recursive on its right-hand side, or instead rely on the associativity.

More generally, I'm less and less convinced that a global associativity per level is the right concept we need. Maybe should we consider separatedly the associating behavior on the left-hand side and on the right-hand side (knowing that having both side associative is excluded). By doing so, we could have levels which mixes right and no associativity, and levels which mixes left and no associativity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I hadn't seen #12744. I made some comments there.

I don't regularly check for new PRs. But feel free to include @jfehrle in the PR if you want me to look at something.

dev/doc/parsing.md Show resolved Hide resolved
- [Other components](#other-components)
- [Parsing productions](#parsing-productions)
- [Lookahead](#lookahead)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a way to have the table of contents automatically generated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be helpful, but I'm not aware there's anything that does that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in Markdown. That's a very simple and limited language.

dev/doc/parsing.md Outdated Show resolved Hide resolved
dev/doc/parsing.md Outdated Show resolved Hide resolved
```

On the other hand, `OPT` works just fine when the parser has already found the
right production. For example `Back` and `Back <natural>` can be combined using
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure we could improve the camlp5 engine to automatically do this expansion, i.e. at the time of inserting the rule in the tree, or even at runtime (at least when in the same entry).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how often we need or would want to handle this case. I included it in this write up because it was a simple example, not because I decided (or thought much about whether) the limitation really impacts users much. I think I'd first analyze the grammar, including the numerous edits that doc_grammar makes for the doc and see how often this feature would be helpful. I'd also consider whether such a change would impact other improvements we may want to make to the parser--which requires having a wish list.

In a lot of the Coq code, we should think about how to make things simpler, more consistent and easier to understand/modify. And add new features when they give a clear user benefit.


This command doesn't display all nonterminals of the grammar. For example,
productions shown by `Print Grammar tactic` refer to nonterminals `tactic_then_locality`
and `tactic_then_gen` which are not shown and can't be printed.
Copy link
Member

@herbelin herbelin Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a branch which add supports for Print Grammar of any entry. I'm not fully satisfied by it, though because collisions can happen, and entries should probably structured. E.g. an API with commands such as Print Grammar Tactic tactic_then_gen or Print Grammar Constr fields_def would already be better. Maybe would we need even stronger structuration.

The branch is https://github.com/herbelin/github-coq/pull/new/master+support-more-printable-grammar

Copy link
Member Author

@jfehrle jfehrle Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need to figure out how to handle locally-defined nonterminals that have the same name. How would we show such nonterminals to users in a non-confusing way? We could somehow require unique names or we'd need a way of qualifying names (ugh). Ltac2 has 35 or more such symbols while the rest of the grammar has maybe 6 or 7.

I'm not sure Print Grammar is hugely useful in the long term. Once the grammar in the doc is updated and we update the mlgs to match that as much as possible, it's probably better for users to look in the doc. Of course, the doc won't show productions added by notations, but these additions appear only on specific nonterminals. The doc would not cover additions from plugins external to the Coq source tree either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a branch which add supports for Print Grammar of any entry

I think it's a bad idea to export such internals to end-users. A better way to do that would be to have a way to recursively extract all entries accessible from a few selected roots and display that to the user, with on-the-fly deambiguation of the entry names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not far from thinking that this is indeed too internal to be worth being exposed. However, it happens to be useful for debugging. Maybe should it be reserved to developers.

@jfehrle
Copy link
Member Author

jfehrle commented Jul 24, 2020

Somehow, many features could be added to the grammar engine (automatic factorization, local precedences, local tokens, combining lexical and parsing analysis, ...)

Yeah, though beyond the scope of this PR. We could create a new topic to get into more detail.

Nonetheless, a few thoughts: It's not clear to me that the parsing algorithm significantly limits our ability to evolve Coq for the forseeable future. I'm sure some parts of the grammar can be improved. Here are some new capabilities that could be interesting:

  • Functions that permit recovering the symbolic and string form of grammar productions with the filename and line number where they were defined (whether in .mlg files or from notations in .v files) and associate them with parser action routines
  • An ability to generate parse trees that correspond exactly to the grammar for debugging and testing. (I've made some good progress on this.)
  • Make the Coq parser reusable in external tools.
  • More informative syntax errors, such as including more of the grammar/parse tree or suggesting corrections similar to what the OCaml compiler does. Fully documenting the grammar in the doc probably makes this a bit less important.
  • Create a routine that analyzes the grammar to identify ambiguous productions and/or productions that shadow (or partly shadow) other productions. This could generate warnings or errors both at build time and at runtime (when defining notations or loading a plugin).

@jfehrle
Copy link
Member Author

jfehrle commented Jul 30, 2020

@Zimmi48: Can @ppedrot be the assignee for this PR?

Yes, no, or waiting? Unless @herbelin wants changes--but I think we had a good chat that's beyond the scope of the PR. @Zimmi48, if you're good with this, would you approve?

Thanks, guys.

@ppedrot ppedrot self-assigned this Jul 31, 2020
Copy link
Member

@Zimmi48 Zimmi48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good to go from my point of view except the two minor comments below.
I assume that the leftover TODO in the new document are on purpose.

dev/README.md Outdated Show resolved Hide resolved
Note that the productions printed by this command are represented in the form used by
|Coq|'s parser (coqpp), which differs from how productions are shown in the documentation.
Developer documentation for parsing is in
`doc/parsing.md <http://github.com/coq/coq/blob/master/doc/parsing.md>`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`doc/parsing.md <http://github.com/coq/coq/blob/master/doc/parsing.md>`_.
`dev/doc/parsing.md <http://github.com/coq/coq/blob/master/dev/doc/parsing.md>`_.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, ready to go, thanks. The TODOs are intentional.

@ppedrot
Copy link
Member

ppedrot commented Aug 11, 2020

@jfehrle despite the TODOs, this is ready to be merged right?

@jfehrle
Copy link
Member Author

jfehrle commented Aug 11, 2020

Yes, ready to merge, thanks. A few TODOs are OK in developer documentation. (I don't put user-visible TODOs in the user documentation.)

@coqbot coqbot added this to Request 8.12.1 inclusion in Coq 8.12 Aug 11, 2020
@ppedrot ppedrot merged commit e0e07f5 into coq:master Aug 11, 2020
Zimmi48 added a commit to Zimmi48/coq that referenced this pull request Aug 12, 2020
@coqbot coqbot moved this from Request 8.12.1 inclusion to Shipped in 8.12.1 in Coq 8.12 Aug 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: documentation Additions or improvement to documentation.
Projects
No open projects
Coq 8.12
  
Shipped in 8.12.1
Development

Successfully merging this pull request may close these issues.

Documentation needed for the output of "Print Custom Grammar"
4 participants