Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mangle auto-generated names #6582

Merged
merged 1 commit into from Mar 8, 2018
Merged

Conversation

jashug
Copy link
Contributor

@jashug jashug commented Jan 11, 2018

Kind: feature.
Related to #268, #2301.
RFC: Request For Comments

Tactic scripts relying on auto-generated names is a persistent problem.
It makes them more sensitive to

  • changes in the global or local contexts
  • changes to the name generation system
  • changes in implementation of tactics

#268 attempts to address the problem by tracking which names are generated,
and emitting a warning when they are used (for some definitions of generated and used).
However, despite (from reading the comment logs) heroic work by @herbelin and everyone involved, #268 appears to have stalled.

This is a different approach to the same problem. We introduce an option that hijacks the name generation system to generate completely different names (_0, _1, _2, ... by default, in general foo0, foo1, foo2, ... for user provided foo).
Basically it's a variant of the approach in this comment.
I take a pretty broad view of what counts as a generated name, since many default names can be renamed to avoid shadowing names in the global/local context.
If instead the default was guaranteed to be the name you get, even if that shadows other names, then we could leave those as non-generated. (c.f. #6577)

I like to think of it as an emergency drill for having the name generation system radically change.

The intent is that development will proceed as usual, and after a proof is completed
you can either add Set Mangle Names in the file, or run coqc/coqide with -mangle-names.
While Set Mangle Names is set, printing is also affected, but Unset Mangle Names can be used to locally restore some nice names for printing.
I think it is possible to write ltac such that it works both with and without Set Mangle Names. However, I believe that if you Set Mangle Names Prefix "foo" for some identifier "foo" that doesn't appear as the prefix of any identifier in your development,
then successful compilation ensures your script is immune to failures caused by changes in the environment or name generation system that cause unexpected renaming of your hypotheses.
(Failure due to tactics refusing to shadow are still plausible, again c.f. #6577)

  • Handle Set Mangle Names Prefix "_" gracefully
  • Add command line flag
  • Corresponding documentation was added / updated.
  • Entry added in CHANGES.

@Zimmi48 Zimmi48 added kind: feature New user-facing feature request or implementation. needs: feedback Feeback from users and testers would be welcome. labels Jan 12, 2018
@jashug
Copy link
Contributor Author

jashug commented Jan 15, 2018

This is also somehow related to the discussion in #307.

Copy link
Member

@Zimmi48 Zimmi48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain a bit more what is the use-case you envision for this kind of feature? I think I wouldn't like to work with it with an on-going development because the goal printing is obscured by this mangling:

Goal forall n m, n + m = m + n.

gets printed

1 subgoal
______________________________________(1/1)
forall _0 _1 : nat, _0 + _1 = _1 + _0

or

1 subgoal
______________________________________(1/1)
forall x0 x1 : nat, x0 + x1 = x1 + x0

with option:

Set Mangle Names Prefix "x".

So would this feature be a way to quickly fix a script that is suddenly failing because of names?

@@ -190,9 +190,45 @@ let it_mkLambda_or_LetIn_name env sigma b hyps =
(**********************************************************************)
(* Fresh names *)

(* Introduce a mode where auto-generated names are mangled
to test dependece of scripts on auto-generated names *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

@jashug
Copy link
Contributor Author

jashug commented Jan 16, 2018

The intent is that you first write your proofs with the option off, with the intent not to rely on auto-generated names. This part is the same as the current situation.
Then, if you want to check that you are really not using any auto-generated names, you compile with -mangle-names. If the development still compiles, great. If not, you accidentally used an auto-generated name.
Where compilation fails tells you where to look for the mistake. If you can't find the mistake just by looking, you can Set Mangle Names above the failing proof, and observe where it fails. While debugging, you can locally Unset Mangle Names (in the middle of a proof) to restore some nicer names in printing of the goal.

That's how I was imagining this feature would be used. It's intended to be comparable to a linter.

@Zimmi48
Copy link
Member

Zimmi48 commented Jan 16, 2018

Thanks for the explanation. The use case is really interesting and I had completely missed it from the initial description. Now, reading it again I can see it was already sketched there but not as precisely.

@Zimmi48
Copy link
Member

Zimmi48 commented Jan 16, 2018

Is there any advantage to change the auto-generated identifier prefix rather than just appending an additional prefix to it? (To avoid the goals to become too obscure when debugging the use of an auto-generated name.)

@jashug
Copy link
Contributor Author

jashug commented Jan 16, 2018

It's simpler (and being simpler here means less maintenance), it's independent of any changes in how default prefixes are chosen, it is easier to find short strings s such that for all strings of digits n, sn is not used, compared to s where for all strings t, stn is not used. It was also slightly easier to change the code to use this system. (in next_ident_away, id gets mangled twice, which is idempotent)

None of these are very significant considerations, and I did consider prepending instead. But I couldn't come up with a short generic default prefix that was likely to be unused. (_0, _1, _2 seem unlikely variable names, while people with C / Python / JavaScript backgrounds may be using _foo and _bar, which can sometimes become _foo0 and _foo1)
If the string prepended is too long, the goals become hard to read for a different reason.

I'm definitely open to other algorithms proposed in light of the above.

On the other hand, I am worried that the quadratic performance of generating new identifiers with the same prefix will be a noticeable problem. Prepending would address that issue.

@ppedrot
Copy link
Member

ppedrot commented Jan 16, 2018

Isn't that patch breaking many scripts that rely on the half-specified behaviour of tactic-in-term naming strategy for bound variables? From the code it looks like refine (fun x => _) with this option set would change the x to some mangled name unconditionally. But it is a contract (at least in Ltac2) that the name is preserved.

@jashug
Copy link
Contributor Author

jashug commented Jan 16, 2018

Refine with binders still works. My patch gives

Set Mangle Names.
Axiom x : Set.
Goal nat -> x.
refine (fun x => _).
(* 1 subgoal
x : nat
______________________________________(1/1)
Top.x *)

(side note, I didn't know Ltac would fall back to the qualified name for the axiom x here. I approve)
It seems that refine takes the approach of alpha-renaming hypotheses that conflict with the names it introduces, and definitely maintaining the user given names. The alpha-renamed stuff gets mangled, while the given names are not.

This seems very robust behavior. If only all the introduction tactics (intro / set) worked this way.

By the way, we also have

Set Mangle Names.
Goal nat -> Set -> Type.
refine (fun x x => _).
(* 1 subgoal
_1 : nat
x : Set
______________________________________(1/1)
Type *)

(Why this is _1 and not _0 is a bit strange, but probably not important)

@herbelin
Copy link
Member

Hi, sorry to come late to the discussion, which is very interesting. In particular, I'm sympathetic to the idea of not preventing ourselves to possibly move to a radically new naming model, and to the idea that it may be the right time, after so many years of experience, for designing a "conceptual" model about names, capitalizing on this experience.

I shall make some miscellaneous remarks, maybe some of them you will find commonplace, but, at least with the hope to suscitate further discussions or experimentations.

I like very much the idea of cartouches, probably not as an exclusive approach, but as one of the possible ways to refer to an hypothesis (resting on a mathematical usage where one can refer to a statement or an hypothesis either by what it says, or by a reference, such as (*) or [3]). On the other side, I wonder how this scales to big proofs where we have many hypotheses with similar statement.

I also consider fresh as the indispensable tool to deal with name in tactics.

I like the idea that a script should be invariant by the choice of generated names very much too.

On the other side, it is comfortable not to think about names and to use the names offered by the system (and in particular, I think it is worth to teach Coq recipes to invent "nice" names, as I was trying to do with induction and as @Matafou precisely was working on also at some time).

I regularly wonders whether things could be made simpler by having more powerful user interfaces. With user interfaces which know the structure of binders in a text and which are able to change all occurrences of the same name (its binding position and its bound occurrences) at the same time. With user interfaces which know in particular which names are generated and which are able to rename all occurrences of a generated name whenever a conflict arrive.

Now, it is also the case that it is technically convenient to think at Coq as a tool which can be controlled textually. Otherwise said, how to drive such user interface features by requests to a core system. Thus, I believe that we strongly need to raise proof scripts at the level of an object of study, with a language to describe proof scripts, to describe bindings within a proof scripts, to describe dependencies, to apply transformations to proof scripts, as we do for proof terms but at a higher level (as was actually already contemplated in the good old Mowgli project with unibo).

@jashug
Copy link
Contributor Author

jashug commented Feb 15, 2018

The behavior of abstract under this patch (mangling all hypothesis names in the context) is consistent with observable behavior (c.f. #3146); hypothesis names in the context can change inside abstract depending on the global environment. Thus this patch itself is working as intended.

@Matafou
Copy link
Contributor

Matafou commented Feb 16, 2018

A few remarks:

I like very much the idea of cartouches, probably not as an exclusive approach, but as one of the possible ways to refer to an hypothesis (resting on a mathematical usage where one can refer to a statement or an hypothesis either by what it says, or by a reference, such as (*) or [3]).

Note that with something like cartouches you can do:

rename < forall _:nat, _ < _ > into hyp3.

and then use hyp3 if you like. That is more or less what you are describing.

I like the idea that a script should be invariant by the choice of generated names very much too.

The other choice (that I use) is to make name generation itself invariant to definition changes. I don't know which is better. I will definitively give a real try to cortouche.

I regularly wonders whether things could be made simpler by having more powerful user interfaces.
[...]
Now, it is also the case that it is technically convenient to think at Coq as a tool which can be controlled textually.

It is not hard to implement something in IDEs that would help writing cortouche by clicking on a hypothesis (with some help from the prover I guess). That would not allow for refactoring but would make easy to write script that do not need refactoring.

@jashug
Copy link
Contributor Author

jashug commented Feb 17, 2018

In response to the comment by @Zimmi48 about goal printing, that the names in goals are mangled says that printing (of the goals, at least Check as well, and probably other places) is asking for fresh names everywhere. For example

Check (forall (n n : nat), n = n).
(* nat -> forall n0 : nat, n0 = n0 : Prop *)

where I believe the name n is claimed by the anonymous nat -> leading to the second nat being renamed to n0. Additionally, the goal printing is influenced by the global environment:
Goal (forall (n n : nat), n = n). prints differently if we write Axiom n : True. before.

We could envision a different printing strategy, which renames as refine does, guaranteeing the inner-most name is preserved (except for interaction with Section).

On the other hand, the Goal printing accurately reflects the names introduced by the plain intro tactic (I think), so any change might have to apply to both. And changing the behavior of the intro tactic would, again, be much easier if proof scripts were independent of the auto-generated name scheme.

Given the intended use as a linter, this shouldn't be a blocker for this PR. I'm working on rebasing and adding documentation; it should be ready for review after that.

@jashug jashug changed the title [In Progress] Mangle auto-generated names Mangle auto-generated names Feb 17, 2018
@jashug
Copy link
Contributor Author

jashug commented Feb 17, 2018

I rebased and added at least minimal documentation.
There turned out not to be much to benchmark here: In the normal mode this should have practically zero effect (one or two extra dereferences and boolean tests on each fresh name), and I would be very surprised if any large developments compile out of the box under the new mode.

I consider this ready for review by any interested parties.

@herbelin
Copy link
Member

Second round at thinking at this PR: it is remarkably short for a pretty powerful feature.

I wonder whether variants could be imagined. For instance, rather than having a mangle-names option on by default, can we imagine that Qed triggers a replay of the goal using a different prefix and alert about using auto-generated names. In this case, if I'm not mistaken, we would not even need to use a specific _ prefix but just the existing naming rules. It would only be at the second step that a new prefix could be used to detect the inconsistencies. But maybe that's after all already the methodology that you have in mind.

On a related subject, I had a dream that a file translator could automatically make (most of) names which are generated explicit in existing scripts. Maybe one day.

BTW, @jashug: did you make progress on #6781 or is it untractable?

@jashug
Copy link
Contributor Author

jashug commented Feb 26, 2018

I started work on #6781, but got blocked on #6785, which caused for example

Set Implicit Arguments.
Inductive ex A (P : A -> Type) := ex_intro : forall a, P a -> ex (A := A) P.

to fail, because the argument name A got changed to _0.
Unless this behavior is changed, the automatic implicit argument behavior interacts very poorly with this option. Now I'm reworking the implicit argument system...

@herbelin
Copy link
Member

So, you want to "mangle" not only proofs but statements?

Alternatively, the "mangling" could give: Inductive ex _A (P : _A -> Type) := ex_intro : forall a, P a -> ex (_A := _A) P?

@jashug
Copy link
Contributor Author

jashug commented Feb 26, 2018

No, I think that would be a step in the wrong direction. I think the implicit argument system should be changed so that if you ask for arguments named A and P, the default names are guaranteed to be A and P, where as currently you can get back e.x. A0 or _0.

@jashug
Copy link
Contributor Author

jashug commented Feb 27, 2018

Currently, the default names of named/implicit arguments (the id in foo (id := value)) are not guaranteed to be what you ask for. So "correct" code for ex with the option on would be
Inductive ex A (P : A -> Type) := ex_intro : forall a, P a -> ex (_0 := A) P. just changing the named argument label.
So code like the below works:

Fixpoint foo {arg_1 arg_1 _ : nat} :=
  match arg_1 with
  | O => 3
  | S n => foo (H := 5) (arg_2 := n) (arg_1 := S arg_1)
  end.
Eval hnf in foo (arg_2 := 1) (arg_3 := 5).

This mangling option points this fact out, by changing the argument names to _0, _1, ....
(Strictly speaking, that the name of the third argument is arg_3 on the outside remains true even with the option, because that is hardcoded in impargs.ml and is not a fresh name).

@herbelin
Copy link
Member

@jashug: thanks for your example (and for the pathological examples given at #6785). They obviously have to be fixed and there are certainly different possible remedies. To complete the picture, I can explain what was my motivation for using the name, say a, given by Print foo in foo (a:=t) (or more generally the name shown in the context for hypothesis called with a named argument) rather than the name textually written in the definition of foo.

The typical (borderline) situation to address was typically the following:

Definition a p := forall n, n+p=0.
Goal forall n, a n -> n=0.
unfold a. 
(* forall n : nat, (forall n0 : nat, n0 + n = 0) -> n = 0 *)
intros.
(* n : nat  H : forall n0 : nat, n0 + n = 0 |- n = 0 *)
apply H with (n0 := 0).
(* works *)

To add to the zoo of pathological examples, I thought the following would have worked but it does not:

Axiom b : forall p n, n + p = 0.
Goal forall n, n=0.
intros.
Check b.
(* forall p n : nat, n + p = 0 *)
Check b n.
(* forall n0 : nat, n0 + n = 0 *)
Fail apply (b n (n0:=0)).
(* Wrong argument name: n0. *)
Fail apply (b n (n:=0)).
(* Wrong argument name: n. *)

@jashug
Copy link
Contributor Author

jashug commented Feb 28, 2018

There are no implicit arguments in the above, right? It looks like apply ... with ... is using the same syntax for a slightly different action. (Of course, using n0 is still fragile if the naming scheme changes). So in your second example, b has no implicit arguments, so since named arguments are currently only supported for implicit arguments, it doesn't work.

For example, the below works:

Axiom b : forall p n a, n + a + p = 0.
Arguments b p [n] a.
Goal forall n, n=0.
intros.
Check b.
(* b : forall p n a : nat, n + a + p = 0 *)
Check b n.
(* b n : forall n0 a : nat, n0 + a + n = 0 *)
apply (b n (n:=0) 0).

(Added a so that n wouldn't become maximal and inserted as evar)

@herbelin
Copy link
Member

There are no implicit arguments in the above, right? It looks like apply ... with ... is using the same syntax for a slightly different action.

Oh right, sorry. (So please see my message as only a parenthesis in the discussion.)

@ppedrot
Copy link
Member

ppedrot commented Feb 28, 2018

For the record, I think this feature is the correct answer to the name generation problem. In particular, I'd definitely favour it by orders of magnitude over #268, which is trying to solve a slightly different but (IMO) unsolvable problem.

@Zimmi48
Copy link
Member

Zimmi48 commented Mar 6, 2018

This feature is ready basically, isn't it? It's just waiting for an approval.

@jashug
Copy link
Contributor Author

jashug commented Mar 6, 2018

Yeah, I think this is ready.
I'm not sure how usable it will be until issues like #6785 and #3146 are addressed, but there are probably workarounds for a determined enough user.

@maximedenes
Copy link
Member

maximedenes commented Mar 7, 2018

@ppedrot do you approve this PR? I mean, you said you preferred this one over #268, but are you concretely in favor of merging this one now?

@herbelin
Copy link
Member

herbelin commented Mar 7, 2018

I'm not sure how usable it will be until issues like #6785 and #3146 are addressed, but there are probably workarounds for a determined enough user.

I'd be confident that we can eventually do something for those two too and not have to wait to merge the current one.

Copy link
Member

@ppedrot ppedrot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For once, I'd be bold and support the merge the PR. It doesn't affect users if they're not using the option, and because it is typically used to make scripts more robust, it doesn't prevent us from changing the algorithm later to do something even more strict like e.g. generating purely random names.

@maximedenes
Copy link
Member

@jashug Thanks a lot for this work!

@maximedenes maximedenes merged commit 5968648 into coq:master Mar 8, 2018
maximedenes added a commit that referenced this pull request Mar 8, 2018
@jashug jashug deleted the autogen-name-mangling branch March 8, 2018 17:23
@jashug jashug mentioned this pull request Jul 22, 2021
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: feature New user-facing feature request or implementation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants