
Map opaque definitions to lossy accumulators in the VM #18917

Draft · wants to merge 3 commits into base: master

Conversation

@silene (Contributor) commented Apr 9, 2024

This pull request introduces a new kind of accumulator that discards its arguments. If this accumulator is still present at the end of the reduction, the conversion check fails with an error message, because we do not have enough information to conclude. This accumulator is used for opaque definitions (i.e., Qed). This obviously breaks completeness when such definitions are not eliminated during computations, but I cannot imagine a case where the user would meaningfully want them to survive a reduction in the VM.

Currently, the bytecode compiler does not take advantage of the existence of this accumulator to remove useless computations.

On the following testcase, memory consumption is reduced by 2.2 GB, no major GC is needed, and time is divided by 3.

Require Import List Program BinPos.

Section Rev.
Variable T : Type.
Record vec n := { list_of_vec : list T; vec_spec : length list_of_vec = n }.

Definition build l := Build_vec (length l) l eq_refl.

Lemma length_rev' l : @length T (rev' l) = length l.
Proof. unfold rev'. rewrite <- rev_alt. apply rev_length. Qed.

Program Definition rev {n} (v : vec n) :=
  Build_vec n (rev' (list_of_vec _ v)) _.
Next Obligation. rewrite length_rev'. apply vec_spec. Qed.
End Rev.

Definition v := build _ (0::1::2::3::4::5::nil).
Time Eval vm_compute in list_of_vec _ _ (Pos.iter (rev _) v 10000000).

@silene silene added the part: VM Virtual machine. label Apr 9, 2024
@coqbot-app coqbot-app bot added the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label Apr 9, 2024
@ppedrot (Member) commented Apr 9, 2024

Some quick remarks:

  • We need something like that for SProp anyway.
  • I think that, like in the kernel, we want to separate conversion and reduction modes. For reduction we want to keep the accumulated terms around; for conversion we don't really care (except for completeness).

@silene (Contributor, author) commented Apr 9, 2024

My point is that, even for reduction, you never want to keep the accumulated terms around. (And just to be clear, my pull request drops accumulated terms only for Qeded definitions, not for Inductives and Axioms, where it makes sense to keep them.)

@ppedrot (Member) commented Apr 9, 2024

@silene but if you want to go that way, why don't you just write a specific variant of the VM that just doesn't care about open terms? Most things you will ever want to compute are closed booleans anyway.

@silene (Contributor, author) commented Apr 9, 2024

Unfortunately no. That was my original impression too, but it cannot be made to work. (I have a branch named VMaccu that dates back 8 years and actually implemented that.)

The thing is, you are never computing closed booleans. You are always computing true = true, which is an open term!
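Concretely, a minimal sketch (the arithmetic is arbitrary): the VM runs on the goal's type, which is headed by the inductive eq and is therefore open, even though the proof term that will eventually inhabit it is closed.

```coq
(* The goal "Nat.eqb (100 * 100) 10000 = true" is an application of
   the inductive "eq", hence an open term from the VM's point of
   view; vm_compute reduces it to "true = true". *)
Goal Nat.eqb (100 * 100) 10000 = true.
Proof. vm_compute. reflexivity. Qed.
```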

@JasonGross (Member)

This pull request introduces a new kind of accumulator that discards its arguments.

This is great! I've wanted something like this (though more generally than just in the VM; I've wanted it for terms in general) for a decade.

This accumulator is used for opaque definitions (i.e., Qed). This obviously breaks completeness when such definitions are not eliminated during computations, but I cannot imagine a case where the user would meaningfully want them to survive a reduction in the VM.

What are module-sealed definitions considered? I have use-cases that require leaving around sealed opaque constants from VM reduction. As long as this is possible, working around a lack of completeness around Qed-opaque constants should be easy.

The thing is, you are never computing closed booleans. You are always computing true = true, which is an open term!

How is eq_refl bool true open?

@silene (Contributor, author) commented Apr 9, 2024

What are module-sealed definitions considered?

If I understand correctly what you mean, they are currently considered the same as axioms and inductive families:

Module Type T. Parameter P : bool -> Prop. End T.
Module M : T.
  Definition P : bool -> Prop. Proof. exact (fun b => b = true). Qed.
  Fail Eval vm_compute in P false. (* Error: vm_compute reduction has produced an incomplete term. *)
End M.
Eval vm_compute in M.P false. (* = M.P false : Prop *)

How is eq_refl bool true open?

It is closed, but this is not what the VM is called on (or any other reduction machinery of Coq). The VM is called on its type, which is eq b true, with b hopefully reducing to true, so that eq_refl true has a type convertible to eq b true.

Think about it. You always call vm_compute on a goal, which is a type, almost always axiomatic or inductive, hence open.

@JasonGross (Member)

You always call vm_compute on a goal, which is a type, almost always axiomatic or inductive, hence open.

Ah, right. I think the actual confusion on my end, though, was that inductive types (such as eq bool true true) are considered open.

@ppedrot (Member) commented Apr 9, 2024

almost always axiomatic or inductive, hence open.

There is no fundamental reason why we implement inductive types as accumulators. We know the arity of the inductive type statically, we could choose a representation that does not discard the head but treats it as a particular value of type Type.

@silene (Contributor, author) commented Apr 9, 2024

First, the only thing an inductive type can do from a computational point of view is to accumulate the arguments that are passed to it. So, calling it an accumulator is just fine. We can tweak the implementation, but it will not fundamentally change what they are.

Second, if you mean that we do not need any of the other behaviors associated with accumulators (mostly, constructing match expressions around them), then I mostly agree. This is what the VMaccu branch was doing, and it was a nice improvement in code complexity: "17 files changed, 121 insertions(+), 398 deletions(-)". But this has much stronger consequences, and even the standard library needed modifications. So, I do not even want to imagine the amount of breakage in user developments.

@ppedrot (Member) commented Apr 9, 2024

But this has much stronger consequences and even the standard library needed modifications

What I am proposing is an internal change of representation of inductive types in the VM, it shouldn't affect any user-facing code. Basically, rather than implementing it as an accumulator, implement it as a closure of static arity (the length of the context of the inductive, essentially) that returns a block with a specific tag. Just like what the representation of Prod does, but with a different tag, and remembering the underlying inductive. This way you can have lossy accumulators but the VM still works when checking eq bool true true ≡ eq bool (eval big_term) true.
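As a concrete instance of the conversion problem, with Nat.eqb 1000 1000 standing in for big_term:

```coq
(* Both sides of the goal are headed by the inductive "eq"; the
   argument on the right still has to be evaluated before the
   conversion check can conclude. *)
Goal (true = true) = (Nat.eqb 1000 1000 = true).
Proof. vm_compute. reflexivity. Qed.
```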

@silene (Contributor, author) commented Apr 9, 2024

First, without non-lossy accumulators, there is breakage all over the place, which is why I gave up on the original VMaccu experiment. You could argue that this is because people have been using vm_compute in places where they should never have (e.g., vm_compute; destruct p, as encountered in the standard library). So, if we change vm_compute, it is their own fault when it breaks. But I prefer to be conservative and not break existing user developments. So, non-lossy accumulators are here to stay.

Second, a non-lossy accumulator is already a closure. We could add a new construct so that, for inductive types, these closures have a static arity instead of a dynamic one. But I doubt it would cause a noticeable speedup. In fact, the evaluation of the code for non-lossy accumulators (ACCUMULATE) is already much faster than that for standard closures (RESTART; GRAB), which is kind of ironic. So, I don't see any point in making inductive types a special case, but I am surely misunderstanding what you propose.

@silene (Contributor, author) commented Apr 10, 2024

@coqbot run full ci

@coqbot-app coqbot-app bot removed the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label Apr 10, 2024
They do not aggregate information during computations and thus are fast
(no memory allocation, no costly copy). The downside is that the
conversion check will fail if any incomplete accumulator remains in the
final reduced term.
@silene silene added the request: full CI Use this label when you want your next push to trigger a full CI. label Apr 10, 2024
@coqbot-app coqbot-app bot removed the request: full CI Use this label when you want your next push to trigger a full CI. label Apr 10, 2024
coqbot-app bot commented Apr 10, 2024

🔴 CI failures at commit b13ae7f without any failure in the test-suite

✔️ Corresponding jobs for the base commit eccca3a succeeded

❔ Ask me to try to extract minimal test cases that can be added to the test-suite

🏃 @coqbot ci minimize will minimize the following targets: ci-aac_tactics, ci-mtac2, ci-perennial
  • You can also pass me a specific list of targets to minimize as arguments.

@silene (Contributor, author) commented Apr 10, 2024

Regarding the failure in aac-tactics, it happens because aac_normalize calls vm_compute on a function env_sym that contains opaque proof terms about the reified symbols (e.g., of type Proper eq). Before, the strong normalization of env_sym was silently exploding but it was eventually succeeding. Now, it errors out due to the opaque proofs. Replacing vm_compute by cbn fixes the issue. (And it even makes sense, because computing the strong normal form of env_sym is just pointless.)

coqbot-app bot commented Apr 10, 2024

🔴 CI failures at commit cc5e20f without any failure in the test-suite

✔️ Corresponding jobs for the base commit eccca3a succeeded

❔ Ask me to try to extract minimal test cases that can be added to the test-suite

🏃 @coqbot ci minimize will minimize the following targets: ci-aac_tactics, ci-fiat_crypto, ci-fourcolor, ci-jasmin, ci-metacoq, ci-mtac2, ci-perennial
  • You can also pass me a specific list of targets to minimize as arguments.

@silene (Contributor, author) commented Apr 10, 2024

For Perennial, the calls to vm_compute produce terms like

(Works val #(U64_val 6) = Works val #(U64_val 6))

which seems nice enough. But if you display the implicit arguments, you get a 1100-line term, which shows that its strong normalization did explode. And you cannot even replace the calls to vm_compute by something else, because the constants involved are "basically transparent but considered opaque for reduction", as Coq says. In other words, Perennial uses vm_compute to force reduction despite opacity.

@JasonGross (Member)

And you cannot even replace the calls to vm_compute by something else, because the constants involved are "basically transparent but considered opaque for reduction", as Coq says

Can we not replace it with a with_strategy ... cbv? Or if we need to make everything transparent, perhaps with_strategy can be extended with a way of specifying "all constants"?

I'm curious about @coqbot CI minimize ci-fiat_crypto

coqbot-app bot commented Apr 10, 2024

I am now running minimization at commit cc5e20f on requested target ci-fiat_crypto. I'll come back to you with the results once it's done.

@silene (Contributor, author) commented Apr 10, 2024

For MetaCoq, vm_compute is called on terms such as

let (x, _) :=
   the_loop_terminate 1000000000 [] (NegSpaceClause [Eqv (Var 1) (Var 1)] [] [])
     {|
       M.this := M.Raw.Node Black M.Raw.Leaf (PureClause [] [] 1 eq_refl) M.Raw.Leaf;
       M.is_ok := M.Raw.add_ok M.Raw.Leaf (PureClause [] [] 1 eq_refl)
     |} (PureClause [] [] 1 eq_refl) in
x

Since the_loop_terminate is opaque (generated by Function), no meaningful result is produced before the pull request, and the normalization fails after the pull request. I wonder whether this code might have bitrotten and is no longer meaningful.

@silene (Contributor, author) commented Apr 10, 2024

Can we not replace it with a with_strategy ... cbv?

Indeed, it works. The implicit arguments are still huge, so the strong normalization takes some noticeable time. (About one second to check 3 + 3 ~~> 6.)

But that is only if I know beforehand which constants to make transparent. If I try to blindly run cbv, it fills all the memory and Coq dumps its core. So, I have no way to guess which constants I need to make transparent. (There are about 200 constants that are opaque.)

Having some with_strategy every_transparent would indeed help.
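For reference, a minimal sketch of the workaround under discussion, with a made-up constant c marked opaque for reduction; level 0 restores the default transparency for the duration of the tactic.

```coq
Definition c := 3.
Opaque c.  (* now "basically transparent but considered opaque for reduction" *)

Goal c + c = 6.
Proof.
  (* Temporarily lower the strategy level so that cbv unfolds c. *)
  with_strategy 0 [c] cbv.
  reflexivity.
Qed.
```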

@Blaisorblade (Contributor)

@Janno pointed out we have code normalizing AVL trees, and those embed Qed-opaque proofs. What then?

@JasonGross (Member)

@Janno pointed out we have code normalizing AVL trees, and those embed Qed-opaque proofs. What then?

If there is some small fixed set of Qed-opaque proofs, we can just seal them in modules.

@silene (Contributor, author) commented Apr 10, 2024

we have code normalizing AVL trees, and those embed Qed-opaque proofs. What then?

So that there is no misunderstanding, there is an issue only if the final result produced by vm_compute still contains those Qed-opaque proofs. If they are only transient during the reduction, then there is no issue. (On the contrary, this pull request will drastically reduce the memory consumption in that case.)
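To make the distinction concrete, a small sketch (the lemma and names are illustrative):

```coq
Lemma le01 : 0 <= 1. Proof. repeat constructor. Qed.

(* Transient: the opaque proof is consumed during reduction and does
   not occur in the result, so nothing is lost.  Result: 1. *)
Eval vm_compute in proj1_sig (exist (fun n => 0 <= n) 1 le01).

(* Surviving: the opaque proof remains in the final value; with this
   pull request, vm_compute reports an incomplete term instead. *)
Eval vm_compute in exist (fun n => 0 <= n) 1 le01.
```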

@silene (Contributor, author) commented Apr 10, 2024

Regarding the Mtac2 failure, as far as I can guess from the code, it seems that the failing tactic creates an identifier using M.nu, then vm_compute is called on an expression containing this identifier, and it chokes. So, the question is what is being used to represent identifiers. (Since it chokes, I suppose a Qeded constant plays the role of the fresh name, but it is not obvious from the code.)

@Janno (Contributor) commented Apr 10, 2024

Regarding the Mtac2 failure, as far as I can guess from the code, it seems that the failing tactic creates an identifier using M.nu, then vm_compute is called on an expression containing this identifier, and it chokes. So, the question is what is being used to represent identifiers. (Since it chokes, I suppose a Qeded constant plays the role of the fresh name, but it is not obvious from the code.)

M.nu generates fresh variables in the Coq context. Unless the type given to M.nu contains Qeded constants, I don't see why vm_compute would now fail.

@silene (Contributor, author) commented Apr 10, 2024

As far as I can tell, the type seems innocuous. But how is the fresh variable added to the context? My understanding is probably naive (I prefer to stay away from this part of Coq), but I thought that Coq did not let you add an arbitrary name to the context. This name has to be backed up by an actual definition, or at the very least by an evar.

@Janno (Contributor) commented Apr 10, 2024

M.nu is always followed by M.abs_fun or some other abstraction primitive which then turns the variable into a binding term constructor. The binder must not escape the continuation of M.nu. If it does Mtac2 raises an exception. \nu (x : False), M.ret x is disallowed but \nu (x : False), M.abs_fun x x is fine because it simply returns fun x : False, x.

@JasonGross (Member)

Since it chokes, I suppose a Qeded constant plays the role of the fresh name, but it is not obvious from the code.

Is abstract used under the hood?

@JasonGross (Member)

I'm curious about @coqbot CI minimize ci-fiat_crypto

Minimizer seems to be having some trouble, but it looks like we're just using Compute on things that produce exist, and, from OOB discussions with Andres, we're fine just replacing Compute with Eval compute in.

@silene (Contributor, author) commented Apr 10, 2024

Sorry about the noise, M.nu was a red herring. I had missed that vm_compute was called on a term containing M.bind (why?!), which is opaque, thus causing an expected failure. So, a reduced example would be as follows:

Definition bar :=
  let m :=
    let w := reduce RedVmCompute (M.bind (M.ret 0) (fun v => M.ret v)) in
    w in
  ltac:(mrun m). (* Failure, as expected *)

But there is something extremely strange about this example, which is why I had a hard time getting rid of M.nu. As you can see, there is a trivial let-binding w above. If I manually reduce it, then it works, and I have absolutely no idea why.

Definition bar :=
  let m := reduce RedVmCompute (M.bind (M.ret 0) (fun v => M.ret v)) in
  ltac:(mrun m). (* Success ??? *)

@Janno (Contributor) commented Apr 10, 2024

As you can see, there is a trivial let-binding w above. If I manually reduce it, then it works, and I have absolutely no idea why.

let x := reduce t in .. is a trick to get definitional equality between x and t. It is in some sense an Mtac2 primitive. And it is only recognized in exactly that form. Removing the outer let binding leaves the interpreter with just reduce .. which is not recognized as anything special and thus treated as the identity function that it actually is defined as.

I had missed that vm_compute was called on a term containing M.bind (why?!)

I've now had a few minutes to re-familiarize myself with the code. We are looking at a meta-meta-program. Computation on meta-programs is to be expected. As to why I chose vm_compute instead of lazy: I cannot remember. I suspect I had much bigger test cases that ran faster with vm_compute but given that there are user terms (or, rather, types) involved I probably planned on switching to something else eventually. I suggest ignoring Mtac2 for now. If this PR progresses I will fix the breakage.

@Blaisorblade (Contributor)

@JasonGross @silene as a user, why should vm_compute treat Qed-opaque and module-opaque constants differently?

If the point is just a performance heuristic, it would be more orthogonal to treat it as such: discardability should be a per-constant flag. It could default to true for Qed terms and false otherwise, but it should remain tweakable.
Alternatively, a global flag to control this could also be okay?

What are module-sealed definitions considered? I have use-cases that require leaving around sealed opaque constants from VM reduction. As long as this is possible, working around a lack of completeness around Qed-opaque constants should be easy.

That requires duplicating all statements, and moving all relevant clients from one constant to the other. Much more work than tweaking a flag. It's not even local work: if somebody wants to use vm_compute on something like FMapAVL.t, and there isn't already a separate "proof stripping" function, you need to write quite a lot of code for it.

@silene (Contributor, author) commented Apr 11, 2024

why should vm_compute treat Qed-opaque and module-opaque constants differently?

Note that it is not so much the module-opaque constants that are treated differently. It is Qed versus Parameter. In one case, the user explicitly marked the body as being irrelevant for computations, while in the other case, there is not even a body in the first place (so it is hard to guess what the intent of the user was). It just so happens that module-opaque constants are indistinguishable from Parameters, as far as the VM is concerned.
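Spelled out on a small example (with this pull request applied; the names are mine):

```coq
(* A body explicitly marked irrelevant for computation: lossy. *)
Lemma qed_eq : 1 + 1 = 2. Proof. reflexivity. Qed.
Eval vm_compute in qed_eq.
(* With this PR: error, an incomplete term was produced. *)

(* No body at all: kept as a regular, non-lossy accumulator. *)
Parameter ax : nat.
Eval vm_compute in S ax.  (* = S ax : nat *)
```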

Alternatively, a global flag to control this could also be okay?

No. Or rather it should be clear that such a flag can only impact future definitions (because the previous ones have already been compiled to bytecode), which is presumably not what you expect.

As for the local attribute, I am fine with it and I actually considered it. It is presumably a one-liner in safe_typing.ml, but I have just no idea what this line looks like.

if somebody wants to use vm_compute on something like FMapAVL.t

Let me stress again that it only matters if you expect the end result of vm_compute to actually contain values of type FMapAVL.t. If such Qed-encumbered values are only transient during the reduction, you will not notice any difference of behavior (except a speed up). Also, is FMapAVL.t even a good example? I was under the impression that it was a proof-free type:

Inductive tree {elt : Type} :=
  | Leaf : tree
  | Node : tree -> key -> elt -> tree -> int -> tree.

@ppedrot (Member) commented Apr 11, 2024

because the previous ones have already been compiled to bytecode

Why can't we parameterize the runtime behaviour of accumulators with a flag passed to the VM? This would be the same bytecode, just that it behaves differently depending on the context. Note that we already have a similar mechanism for SProp values in the kernel machine, so as to differentiate conversion (where we erase irrelevant values) from reduction (where we care about the underlying term).

@silene (Contributor, author) commented Apr 11, 2024

Why can't we parameterize the runtime behaviour of accumulators with a flag passed to the VM?

Hmm... Let me think out loud. So, when the VM executes ACCUMULATE (or PROJ or ...) on a non-lossy accumulator of both tag 0 and inner tag 0 (case Aid (ConstKey _)), it would check the global boolean flag and turn the resulting accumulator into a lossy one. I guess that would work. (It might break vm_compute involving section variables, but who cares?) I will submit a parallel pull request.

However, it would prevent something I had on the old branch and that I would have liked to restore. When compiling an application (lossy arg1 arg2 ...), the bytecode compiler would eagerly drop the arguments, rather than having them evaluated and dynamically dropped at runtime. This made a noticeable difference in performance. But obviously, such an optimization can only be performed if we know that opaque constants will be lossy. If the information is only available at runtime, the optimization becomes impossible.

@JasonGross (Member)

This made a noticeable difference in performance. But obviously, such an optimization can only be performed if we know that opaque constants will be lossy. If the information is only available at runtime, the optimization becomes impossible.

Is the size of byte code or the time to generate byte code ever a bottleneck? If not, we can just generate two copies of the byte code for each definition, one to use in lossy mode and one to use in lossless mode.

@ppedrot (Member) commented Apr 11, 2024

size of byte code

Bytecode is already a major contributor to the size of vo files... (Probably for bad reasons, but that'd require fixing it first.)

@JasonGross (Member)

Hm, what about wrapping the entire application in some sort of "maybe lossy" wrapper? That is, do the equivalent of

Definition maybe_lossy {A} (f : unit -> A) : A := f tt.

and compiling (lossy arg1 arg2 ...) to (maybe_lossy (fun _ => lossy arg1 arg2 ...))? This way you won't bother evaluating the arguments.

@silene (Contributor, author) commented Apr 11, 2024

So, when the VM executes ACCUMULATE (or PROJ or ...) on a non-lossy accumulator of both tag 0 and inner tag 0 (case Aid (ConstKey _)), it would check the global boolean flag and turn the resulting accumulator into a lossy one. I guess that would work.

So, #18927 shows that this criterion is not quite right, as it breaks all the primitive types (int, array, etc.).

@SkySkimmer SkySkimmer added kind: performance Improvements to performance and efficiency. kind: redesign The same functionality is being re-implemented in a different way. and removed kind: redesign The same functionality is being re-implemented in a different way. labels Apr 16, 2024
@andres-erbsen (Contributor) commented Apr 17, 2024

Exciting work! On https://xavierleroy.org/publi/extensional-binary-tries.pdf benchmark "repeated", I get that this draft PR already reduces the slowdown from using a sigma type from 5x to 2.6x, and (EDIT) just 1.4x if instead using a dedicated primitive record.

@github-actions github-actions bot added the needs: rebase Should be rebased on the latest master to solve conflicts or have a newer CI run. label May 21, 2024