
Map opaque definitions to lossy accumulators in the VM #18917

Draft · wants to merge 3 commits into base: master

Conversation

@silene (Contributor) commented Apr 9, 2024

This pull request introduces a new kind of accumulator that discards its arguments. If this accumulator is still present at the end of the reduction, the conversion check fails with an error message, because we do not have enough information to conclude. This accumulator is used for opaque definitions (i.e., Qed). This obviously breaks completeness when such definitions are not eliminated during computations, but I cannot imagine a case where the user would meaningfully want them to survive a reduction in the VM.

Currently, the bytecode compiler does not take advantage of the existence of this accumulator to remove useless computations.

On the following testcase, memory consumption is reduced by 2.2 GB, no major GC is needed, and time is divided by 3.

Require Import List Program BinPos.

Section Rev.
Variable T : Type.
Record vec n := { list_of_vec : list T; vec_spec : length list_of_vec = n }.

Definition build l := Build_vec (length l) l eq_refl.

Lemma length_rev' l : @length T (rev' l) = length l.
Proof. unfold rev'. rewrite <- rev_alt. apply rev_length. Qed.

Program Definition rev {n} (v : vec n) :=
  Build_vec n (rev' (list_of_vec _ v)) _.
Next Obligation. rewrite length_rev'. apply vec_spec. Qed.
End Rev.

Definition v := build _ (0::1::2::3::4::5::nil).
Time Eval vm_compute in list_of_vec _ _ (Pos.iter (rev _) v 10000000).

@silene silene added the part: VM Virtual machine. label Apr 9, 2024
@coqbot-app coqbot-app bot added the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label Apr 9, 2024
@ppedrot (Member) commented Apr 9, 2024

Some quick remarks:

  • We need something like that for SProp anyway.
  • I think that, like in the kernel, we want to separate conversion and reduction modes. For reduction we want to keep the accumulated terms around; for conversion we don't really care (except for completeness).

@silene (Contributor, author) commented Apr 9, 2024

My point is that, even for reduction, you never want to keep the accumulated terms around. (And just to be clear, my pull request drops accumulated terms only for Qeded definitions, not for Inductives and Axioms, where it makes sense to keep them.)

@ppedrot (Member) commented Apr 9, 2024

@silene but if you want to go that way, why don't you just write a specific variant of the VM that just doesn't care about open terms? Most things you will ever want to compute are closed booleans anyway.

@silene (Contributor, author) commented Apr 9, 2024

Unfortunately no. That was my original impression too, but it cannot be made to work. (I have a branch named VMaccu that dates back 8 years and actually implemented that.)

The thing is, you are never computing closed booleans. You are always computing true = true, which is an open term!
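Concretely, a minimal sketch (the arithmetic is arbitrary): the VM runs on the goal's type, which is headed by the inductive eq and is therefore open, even though the proof term that will eventually inhabit it is closed.

```coq
(* The goal "Nat.eqb (100 * 100) 10000 = true" is an application of
   the inductive "eq", hence an open term from the VM's point of
   view; vm_compute reduces it to "true = true". *)
Goal Nat.eqb (100 * 100) 10000 = true.
Proof. vm_compute. reflexivity. Qed.
```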

@JasonGross (Member)

This pull request introduces a new kind of accumulator that discards its arguments.

This is great! I've wanted something like this (though more generally than just in the VM; I've wanted it for terms in general) for a decade.

This accumulator is used for opaque definitions (i.e., Qed). This obviously breaks completeness when such definitions are not eliminated during computations, but I cannot imagine a case where the user would meaningfully want them to survive a reduction in the VM.

What are module-sealed definitions considered? I have use-cases that require leaving around sealed opaque constants from VM reduction. As long as this is possible, working around a lack of completeness around Qed-opaque constants should be easy.

The thing is, you are never computing closed booleans. You are always computing true = true, which is an open term!

How is eq_refl bool true open?

@silene (Contributor, author) commented Apr 9, 2024

What are module-sealed definitions considered?

If I understand correctly what you mean, they are currently considered the same as axioms and inductive families:

Module Type T. Parameter P : bool -> Prop. End T.
Module M : T.
  Definition P : bool -> Prop. Proof. exact (fun b => b = true). Qed.
  Fail Eval vm_compute in P false. (* Error: vm_compute reduction has produced an incomplete term. *)
End M.
Eval vm_compute in M.P false. (* = M.P false : Prop *)

How is eq_refl bool true open?

It is closed, but this is not what the VM is called on (or any other reduction machinery of Coq). The VM is called on its type, which is eq b true, with b hopefully reducing to true, so that eq_refl true has a type convertible to eq b true.

Think about it. You always call vm_compute on a goal, which is a type, almost always axiomatic or inductive, hence open.

@JasonGross (Member)

You always call vm_compute on a goal, which is a type, almost always axiomatic or inductive, hence open.

Ah, right. I think the actual confusion on my end, though, was that inductive types (such as eq bool true true) are considered open.

@ppedrot (Member) commented Apr 9, 2024

almost always axiomatic or inductive, hence open.

There is no fundamental reason why we implement inductive types as accumulators. We know the arity of the inductive type statically, we could choose a representation that does not discard the head but treats it as a particular value of type Type.

@silene (Contributor, author) commented Apr 9, 2024

First, the only thing an inductive type can do from a computational point of view is to accumulate the arguments that are passed to it. So, calling it an accumulator is just fine. We can tweak the implementation, but it will not fundamentally change what they are.

Second, if you mean that we do not need any of the other behaviors associated with accumulators (mostly, constructing match expressions around them), then I mostly agree. This is what the VMaccu branch was doing, and it was a nice improvement in code complexity: "17 files changed, 121 insertions(+), 398 deletions(-)". But this has much stronger consequences, and even the standard library needed modifications. So, I do not even want to imagine the amount of breakage in user developments.

@ppedrot (Member) commented Apr 9, 2024

But this has much stronger consequences and even the standard library needed modifications

What I am proposing is an internal change of representation of inductive types in the VM, it shouldn't affect any user-facing code. Basically, rather than implementing it as an accumulator, implement it as a closure of static arity (the length of the context of the inductive, essentially) that returns a block with a specific tag. Just like what the representation of Prod does, but with a different tag, and remembering the underlying inductive. This way you can have lossy accumulators but the VM still works when checking eq bool true true ≡ eq bool (eval big_term) true.
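As a concrete instance of the conversion problem, with Nat.eqb 1000 1000 standing in for big_term:

```coq
(* Both sides of the goal are headed by the inductive "eq"; the
   argument on the right still has to be evaluated before the
   conversion check can conclude. *)
Goal (true = true) = (Nat.eqb 1000 1000 = true).
Proof. vm_compute. reflexivity. Qed.
```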

@silene (Contributor, author) commented Apr 9, 2024

First, without non-lossy accumulators, there is breakage all over the place, which is why I gave up on the original VMaccu experiment. You could argue that this is because people have been using vm_compute in places where they should never have (e.g., vm_compute; destruct p, as encountered in the standard library). So, if we change vm_compute, it is their own fault when it breaks. But I prefer to be conservative and not break existing user developments. So, non-lossy accumulators are here to stay.

Second, a non-lossy accumulator is already a closure. We could add a new construct so that, for inductive types, these closures have a static arity instead of a dynamic one. But I doubt it would cause a noticeable speedup. In fact, the evaluation of the code for non-lossy accumulators (ACCUMULATE) is already much faster than that for standard closures (RESTART; GRAB), which is kind of ironic. So, I don't see any point in making inductive types a special case, but I am surely misunderstanding what you propose.

@silene (Contributor, author) commented Apr 10, 2024

@coqbot run full ci

@coqbot-app coqbot-app bot removed the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label Apr 10, 2024
They do not aggregate information during computations and thus are fast
(no memory allocation, no costly copy). The downside is that the
conversion check will fail if any incomplete accumulator remains in the
final reduced term.
@silene silene added the request: full CI Use this label when you want your next push to trigger a full CI. label Apr 10, 2024
@coqbot-app coqbot-app bot removed the request: full CI Use this label when you want your next push to trigger a full CI. label Apr 10, 2024
coqbot-app bot commented Apr 10, 2024

🔴 CI failures at commit b13ae7f without any failure in the test-suite

✔️ Corresponding jobs for the base commit eccca3a succeeded

❔ Ask me to try to extract minimal test cases that can be added to the test-suite

🏃 @coqbot ci minimize will minimize the following targets: ci-aac_tactics, ci-mtac2, ci-perennial
  • You can also pass me a specific list of targets to minimize as arguments.

@silene (Contributor, author) commented Apr 10, 2024

Regarding the failure in aac-tactics, it happens because aac_normalize calls vm_compute on a function env_sym that contains opaque proof terms about the reified symbols (e.g., of type Proper eq). Before, the strong normalization of env_sym was silently exploding but it was eventually succeeding. Now, it errors out due to the opaque proofs. Replacing vm_compute by cbn fixes the issue. (And it even makes sense, because computing the strong normal form of env_sym is just pointless.)

coqbot-app bot commented Apr 10, 2024

🔴 CI failures at commit cc5e20f without any failure in the test-suite

✔️ Corresponding jobs for the base commit eccca3a succeeded

❔ Ask me to try to extract minimal test cases that can be added to the test-suite

🏃 @coqbot ci minimize will minimize the following targets: ci-aac_tactics, ci-fiat_crypto, ci-fourcolor, ci-jasmin, ci-metacoq, ci-mtac2, ci-perennial
  • You can also pass me a specific list of targets to minimize as arguments.

@silene (Contributor, author) commented Apr 10, 2024

For Perennial, the calls to vm_compute produce terms like

(Works val #(U64_val 6) = Works val #(U64_val 6))

which seems nice enough. But if you display the implicit arguments, you get a 1100-line term, which shows that its strong normalization did explode. And you cannot even replace the calls to vm_compute by something else, because the constants involved are "basically transparent but considered opaque for reduction", as Coq says. In other words, Perennial uses vm_compute to force reduction despite opacity.

@JasonGross (Member)

And you cannot even replace the calls to vm_compute by something else, because the constants involved are "basically transparent but considered opaque for reduction", as Coq says

Can we not replace it with a with_strategy ... cbv? Or if we need to make everything transparent, perhaps with_strategy can be extended with a way of specifying "all constants"?

I'm curious about @coqbot CI minimize ci-fiat_crypto

coqbot-app bot commented Apr 10, 2024

I am now running minimization at commit cc5e20f on requested target ci-fiat_crypto. I'll come back to you with the results once it's done.

@silene (Contributor, author) commented Apr 10, 2024

For MetaCoq, vm_compute is called on terms such as

let (x, _) :=
   the_loop_terminate 1000000000 [] (NegSpaceClause [Eqv (Var 1) (Var 1)] [] [])
     {|
       M.this := M.Raw.Node Black M.Raw.Leaf (PureClause [] [] 1 eq_refl) M.Raw.Leaf;
       M.is_ok := M.Raw.add_ok M.Raw.Leaf (PureClause [] [] 1 eq_refl)
     |} (PureClause [] [] 1 eq_refl) in
x

Since the_loop_terminate is opaque (generated by Function), no meaningful result is produced before the pull request, and the normalization fails after the pull request. I wonder whether this code might have bitrotten and is no longer meaningful.

@silene (Contributor, author) commented Apr 10, 2024

Can we not replace it with a with_strategy ... cbv?

Indeed, it works. The implicit arguments are still huge, so the strong normalization takes some noticeable time. (About one second to check 3 + 3 ~~> 6.)

But that is only if I know beforehand which constants to make transparent. If I try to blindly run cbv, it fills all the memory and Coq dumps its core. So, I have no way to guess which constants I need to make transparent. (There are about 200 constants that are opaque.)

Having some with_strategy every_transparent would indeed help.
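For reference, a minimal sketch of the workaround under discussion, with a made-up constant c marked opaque for reduction; level 0 restores the default transparency for the duration of the tactic.

```coq
Definition c := 3.
Opaque c.  (* now "basically transparent but considered opaque for reduction" *)

Goal c + c = 6.
Proof.
  (* Temporarily lower the strategy level so that cbv unfolds c. *)
  with_strategy 0 [c] cbv.
  reflexivity.
Qed.
```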

@Blaisorblade (Contributor)

@Janno pointed out we have code normalizing AVL trees, and those embed Qed-opaque proofs. What then?

@JasonGross (Member)

@Janno pointed out we have code normalizing AVL trees, and those embed Qed-opaque proofs. What then?

If there is some small fixed set of Qed-opaque proofs, we can just seal them in modules.

@silene (Contributor, author) commented Apr 10, 2024

we have code normalizing AVL trees, and those embed Qed-opaque proofs. What then?

So that there is no misunderstanding, there is an issue only if the final result produced by vm_compute still contains those Qed-opaque proofs. If they are only transient during the reduction, then there is no issue. (On the contrary, this pull request will drastically reduce the memory consumption in that case.)
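To make the distinction concrete, a small sketch (the lemma and names are illustrative):

```coq
Lemma le01 : 0 <= 1. Proof. repeat constructor. Qed.

(* Transient: the opaque proof is consumed during reduction and does
   not occur in the result, so nothing is lost.  Result: 1. *)
Eval vm_compute in proj1_sig (exist (fun n => 0 <= n) 1 le01).

(* Surviving: the opaque proof remains in the final value; with this
   pull request, vm_compute reports an incomplete term instead. *)
Eval vm_compute in exist (fun n => 0 <= n) 1 le01.
```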

@silene (Contributor, author) commented Apr 10, 2024

Regarding the Mtac2 failure, as far as I can guess from the code, it seems that the failing tactic creates an identifier using M.nu, then vm_compute is called on an expression containing this identifier, and it chokes. So, the question is what is being used to represent identifiers. (Since it chokes, I suppose a Qeded constant plays the role of the fresh name, but it is not obvious from the code.)

@Janno (Contributor) commented Apr 10, 2024

Regarding the Mtac2 failure, as far as I can guess from the code, it seems that the failing tactic creates an identifier using M.nu, then vm_compute is called on an expression containing this identifier, and it chokes. So, the question is what is being used to represent identifiers. (Since it chokes, I suppose a Qeded constant plays the role of the fresh name, but it is not obvious from the code.)

M.nu generates fresh variables in the Coq context. Unless the type given to M.nu contains Qeded constants, I don't see why vm_compute would now fail.

@silene (Contributor, author) commented Apr 10, 2024

As far as I can tell, the type seems innocuous. But how is the fresh variable added to the context? My understanding is probably naive (I prefer to stay away from this part of Coq), but I thought that Coq did not let you add an arbitrary name to the context. This name has to be backed up by an actual definition, or at the very least by an evar.

@Janno (Contributor) commented Apr 10, 2024

M.nu is always followed by M.abs_fun or some other abstraction primitive which then turns the variable into a binding term constructor. The binder must not escape the continuation of M.nu. If it does Mtac2 raises an exception. \nu (x : False), M.ret x is disallowed but \nu (x : False), M.abs_fun x x is fine because it simply returns fun x : False, x.

@JasonGross (Member)

Since it chokes, I suppose a Qeded constant plays the role of the fresh name, but it is not obvious from the code.

Is abstract used under the hood?

@JasonGross (Member)

I'm curious about @coqbot CI minimize ci-fiat_crypto

Minimizer seems to be having some trouble, but it looks like we're just using Compute on things that produce exist, and, from OOB discussions with Andres, we're fine just replacing Compute with Eval compute in.

@silene (Contributor, author) commented Apr 10, 2024

Sorry about the noise, M.nu was a red herring. I had missed that vm_compute was called on a term containing M.bind (why?!), which is opaque, thus causing an expected failure. So, a reduced example would be as follows:

Definition bar :=
  let m :=
    let w := reduce RedVmCompute (M.bind (M.ret 0) (fun v => M.ret v)) in
    w in
  ltac:(mrun m). (* Failure, as expected *)

But there is something extremely strange about this example, which is why I had a hard time getting rid of M.nu. As you can see, there is a trivial let-binding w above. If I manually reduce it, then it works, and I have absolutely no idea why.

Definition bar :=
  let m := reduce RedVmCompute (M.bind (M.ret 0) (fun v => M.ret v)) in
  ltac:(mrun m). (* Success ??? *)

@Janno (Contributor) commented Apr 10, 2024

As you can see, there is a trivial let-binding w above. If I manually reduce it, then it works, and I have absolutely no idea why.

let x := reduce t in .. is a trick to get definitional equality between x and t. It is in some sense an Mtac2 primitive. And it is only recognized in exactly that form. Removing the outer let binding leaves the interpreter with just reduce .. which is not recognized as anything special and thus treated as the identity function that it actually is defined as.

I had missed that vm_compute was called on a term containing M.bind (why?!)

I've now had a few minutes to re-familiarize myself with the code. We are looking at a meta-meta-program. Computation on meta-programs is to be expected. As to why I chose vm_compute instead of lazy: I cannot remember. I suspect I had much bigger test cases that ran faster with vm_compute but given that there are user terms (or, rather, types) involved I probably planned on switching to something else eventually. I suggest ignoring Mtac2 for now. If this PR progresses I will fix the breakage.

@Blaisorblade (Contributor)

@JasonGross @silene as a user, why should vm_compute treat Qed-opaque and module-opaque constants differently?

If the point is just a performance heuristic, it would be more orthogonal to treat it as such: discardability should be a per-constant flag. It could default to true for Qed terms and false otherwise, but it should remain tweakable.
Alternatively, a global flag to control this could also be okay?

What are module-sealed definitions considered? I have use-cases that require leaving around sealed opaque constants from VM reduction. As long as this is possible, working around a lack of completeness around Qed-opaque constants should be easy.

That requires duplicating all statements, and moving all relevant clients from one constant to the other. Much more work than tweaking a flag. It's not even local work: if somebody wants to use vm_compute on something like FMapAVL.t, and there isn't already a separate "proof stripping" function, you need to write quite a lot of code for it.

@silene (Contributor, author) commented Apr 11, 2024

why should vm_compute treat Qed-opaque and module-opaque constants differently?

Note that it is not so much the module-opaque constants that are treated differently. It is Qed versus Parameter. In one case, the user explicitly marked the body as being irrelevant for computations, while in the other case, there is not even a body in the first place (so it is hard to guess what the intent of the user was). It just so happens that module-opaque constants are indistinguishable from Parameters, as far as the VM is concerned.
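Spelled out on a small example (with this pull request applied; the names are mine):

```coq
(* A body explicitly marked irrelevant for computation: lossy. *)
Lemma qed_eq : 1 + 1 = 2. Proof. reflexivity. Qed.
Eval vm_compute in qed_eq.
(* With this PR: error, an incomplete term was produced. *)

(* No body at all: kept as a regular, non-lossy accumulator. *)
Parameter ax : nat.
Eval vm_compute in S ax.  (* = S ax : nat *)
```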

Alternatively, a global flag to control this could also be okay?

No. Or rather it should be clear that such a flag can only impact future definitions (because the previous ones have already been compiled to bytecode), which is presumably not what you expect.

As for the local attribute, I am fine with it and I actually considered it. It is presumably a one-liner in safe_typing.ml, but I have just no idea what this line looks like.

if somebody wants to use vm_compute on something like FMapAVL.t

Let me stress again that it only matters if you expect the end result of vm_compute to actually contain values of type FMapAVL.t. If such Qed-encumbered values are only transient during the reduction, you will not notice any difference of behavior (except a speed up). Also, is FMapAVL.t even a good example? I was under the impression that it was a proof-free type:

Inductive tree {elt : Type} :=
  | Leaf : tree
  | Node : tree -> key -> elt -> tree -> int -> tree.

@ppedrot (Member) commented Apr 11, 2024

because the previous ones have already been compiled to bytecode

Why can't we parameterize the runtime behaviour of accumulators with a flag passed to the VM? This would be the same bytecode, just that it behaves differently depending on the context. Note that we already have a similar mechanism for SProp values in the kernel machine, so as to differentiate conversion (where we erase irrelevant values) from reduction (where we care about the underlying term).

@silene (Contributor, author) commented Apr 11, 2024

Why can't we parameterize the runtime behaviour of accumulators with a flag passed to the VM?

Hmm... Let me think out loud. So, when the VM executes ACCUMULATE (or PROJ or ...) on a non-lossy accumulator of both tag 0 and inner tag 0 (case Aid (ConstKey _)), it would check the global boolean flag and turn the resulting accumulator into a lossy one. I guess that would work. (It might break vm_compute involving section variables, but who cares?) I will submit a parallel pull request.

However, it would prevent something I had on the old branch and that I would have liked to restore. When compiling an application (lossy arg1 arg2 ...), the bytecode compiler would eagerly drop the arguments, rather than having them evaluated and dynamically dropped at runtime. This made a noticeable difference in performance. But obviously, such an optimization can only be performed if we know that opaque constants will be lossy. If the information is only available at runtime, the optimization becomes impossible.

@JasonGross (Member)

This made a noticeable difference in performance. But obviously, such an optimization can only be performed if we know that opaque constants will be lossy. If the information is only available at runtime, the optimization becomes impossible.

Is the size of byte code or the time to generate byte code ever a bottleneck? If not, we can just generate two copies of the byte code for each definition, one to use in lossy mode and one to use in lossless mode.

@ppedrot (Member) commented Apr 11, 2024

size of byte code

Bytecode is already a major contributor to the size of vo files... (Probably for bad reasons, but that'd require fixing it first.)

@JasonGross (Member)

Hm, what about wrapping the entire application in some sort of "maybe lossy" wrapper? That is, do the equivalent of

Definition maybe_lossy {A} (f : unit -> A) : A := f tt.

and compiling (lossy arg1 arg2 ...) to (maybe_lossy (fun _ => lossy arg1 arg2 ...))? This way you won't bother evaluating the arguments.

@silene (Contributor, author) commented Apr 11, 2024

So, when the VM executes ACCUMULATE (or PROJ or ...) on a non-lossy accumulator of both tag 0 and inner tag 0 (case Aid (ConstKey _)), it would check the global boolean flag and turn the resulting accumulator into a lossy one. I guess that would work.

So, #18927 shows that this criterion is not quite right, as it breaks all the primitive types (int, array, etc.).

@SkySkimmer SkySkimmer added kind: performance Improvements to performance and efficiency. kind: redesign The same functionality is being re-implemented in a different way. and removed kind: redesign The same functionality is being re-implemented in a different way. labels Apr 16, 2024
@andres-erbsen (Contributor) commented Apr 17, 2024

Exciting work! On https://xavierleroy.org/publi/extensional-binary-tries.pdf benchmark "repeated", I get that this draft PR already reduces the slowdown from using a sigma type from 5x to 2.6x, and (EDIT) just 1.4x if instead using a dedicated primitive record.

@github-actions github-actions bot added the needs: rebase Should be rebased on the latest master to solve conflicts or have a newer CI run. label May 21, 2024