Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speed up lia by vm_computing the certificate checker only at Qed time #15654

Open
samuelgruetter opened this issue Feb 9, 2022 · 8 comments
Open
Labels
kind: design discussion Discussion about the design of a feature. kind: performance Improvements to performance and efficiency. part: micromega The lia, nia, lra, nra and psatz tactics. Also the legacy omega tactic.

Comments

@samuelgruetter
Copy link
Contributor

This issue is actually good news: lia has become so fast that now evaluating the reflective certificate checker becomes a noticeable proportion (ca 25% in one example) of its running time.

Here's an example of the kind of goals that I'd like to solve as fast as possible:

Require Import Coq.ZArith.ZArith. Open Scope Z_scope.
Require Import Coq.micromega.Lia.

Unset Lia Cache. (* to get reproducible timings, makes quite a difference *)

Lemma foo: forall (word : Type) (xs x : list word) (w2 w3 : Z),
 (w2 - w3) mod 2 ^ 32 <> 0 ->
 (w2 - w3) mod 2 ^ 32 = 8 * Z.of_nat (length x) ->
 forall w0 w1 : Z,
 (w0 - w1) mod 2 ^ 32 = 8 * Z.of_nat (length xs) ->
 0 <= w0 < 2 ^ 32 ->
 0 <= w1 < 2 ^ 32 ->
 0 <= w2 < 2 ^ 32 ->
 0 <= w3 < 2 ^ 32 ->
 (w2 - ((w3 + ((w2 - w3) mod 2 ^ 32 / 2 ^ 4 * 2 ^ 3) mod 2 ^ 32) mod 2 ^ 32
  + 8 mod 2 ^ 32) mod 2 ^ 32) mod 2 ^ 32 =
 8 * Z.of_nat (length x - S (Z.to_nat
         (((w3 + ((w2 - w3) mod 2 ^ 32 / 2 ^ 4 * 2 ^ 3) mod 2 ^ 32) mod 2 ^ 32 - w3)
          mod 2 ^ 32 / (8 mod 2 ^ 32)))).
Proof.
  intros. Z.div_mod_to_equations.

  Set Ltac Profiling.
  Time lia.
  Show Ltac Profile.

Time Qed. (* 0.34 secs *)

It displays the following Ltac profile on my machine:

total time:      0.387s

 tactic                                   local  total   calls       max
────────────────────────────────────────┴──────┴──────┴───────┴─────────┘
─lia -----------------------------------   0.0% 100.0%       1    0.387s
─xlia (tactic) -------------------------  71.1%  97.4%       1    0.377s
─zchecker ------------------------------   0.0%  26.3%       1    0.102s
─exact (uconstr) -----------------------  25.6%  25.6%       1    0.099s
─Zify.zify -----------------------------   0.1%   2.6%       1    0.010s

 tactic                                   local  total   calls       max
────────────────────────────────────────┴──────┴──────┴───────┴─────────┘
─lia -----------------------------------   0.0% 100.0%       1    0.387s
 ├─xlia (tactic) -----------------------  71.1%  97.4%       1    0.377s
 │└zchecker ----------------------------   0.0%  26.3%       1    0.102s
 │└exact (uconstr) ---------------------  25.6%  25.6%       1    0.099s
 └─Zify.zify ---------------------------   0.1%   2.6%       1    0.010s

In this issue, I'll focus on the 25.6% of the time spent in exact (uconstr) (but of course, if this example inspires @fajb to optimize the xlia part that takes 71.1%, that would be a welcome side effect of this issue... 😉).

Currently, zchecker is implemented as follows:

Ltac zchecker :=
  let __wit := fresh "__wit" in
  let __varmap := fresh "__varmap" in
  let __ff := fresh "__ff" in
  intros __wit __varmap __ff; exact
   (ZMicromega.ZTautoChecker_sound __ff __wit
      (@eq_refl bool true <: @eq bool (ZMicromega.ZTautoChecker __ff __wit) true)
      (VarMap.find 0 __varmap))

Note that <: means VMCast, ie it uses vm_compute to prove that the inferred type matches the annotated type.
But this means that vm_compute is run twice: Once during the invocation of zchecker (even though we already know the equality check will succeed) and a second time when checking the Qed.

Now, if I override zchecker as follows:

Ltac zchecker ::=
  let __wit := fresh "__wit" in
  let __varmap := fresh "__varmap" in
  let __ff := fresh "__ff" in
  intros __wit __varmap __ff;
  simple refine (ZMicromega.ZTautoChecker_sound __ff __wit _ (VarMap.find 0 __varmap));
  vm_cast_no_check (@eq_refl bool true).

and remove the Ltac Profiling commands, but keep the Time commands, and run in bash

time for i in {1..10}; do coqc lia_slow_exact.v; done

and import the printed numbers in a spreadsheet, and add up all lia times and all Qed times, the lia time decreases by 25%, and the Qed time remains the same (I think the 1% improvement I measured is just noise).

So it might make sense to open a PR that changes zchecker as suggested above, but since I'm not very familiar with vm_cast_no_check wizardry, nor with lia internals, I first wanted to ask if someone (@JasonGross ?) has ideas to improve the vm_cast_no_check aspect, or if someone (@fajb ?) has comments on the lia aspects?

Coq version: 8.15+rc1

@JasonGross
Copy link
Member

I think this used to be wrapped in abstract vm_cast_no_check to speed up Qed-checking-time ... I don't recall why that was dropped (maybe to make lia support evars?), but maybe it's worth resurrecting abstract?

@Alizter Alizter added kind: design discussion Discussion about the design of a feature. kind: performance Improvements to performance and efficiency. part: micromega The lia, nia, lra, nra and psatz tactics. Also the legacy omega tactic. labels Feb 10, 2022
@samuelgruetter
Copy link
Contributor Author

Replacing vm_cast_no_check (@eq_refl bool true) by abstract (vm_cast_no_check (@eq_refl bool true)) increases both the running time of lia and of Qed by more than 2x on this example... Or did you mean something else?

@JasonGross
Copy link
Member

It's supposed to move the time from Qed to the tactic call, not double them both. What if you replace Qed with Defined? What if you replace abstract with clear; abstract?

@fajb
Copy link
Contributor

fajb commented Feb 10, 2022

Hi,
If I recall correctly, abstract was dropped because it failed in rare cases. I don't know the current status but abstract used to have a bad reputation but ZArith_hints calls abstract lia.
It may be also be worth looking at the abstract generated lemma to make sure instantiation is nice.
As @JasonGross said, abstract is generated on the fly an intermediate lemma that is therefore only typechecked once.
@samuelgruetter , vm_cast_no_check means that the conversion is not run. The proof system trusts that it will return true. If there is a bug in lia you will only get a typing error at Qed. So wrapping abstract around vm_cast_no_check localise the potential error.

@samuelgruetter
Copy link
Contributor Author

Am I assuming correctly that if zchecker fails, it's always a bug?

If yes, I'd actually prefer to run all the certificate checking at Qed time, rather than to slow down every interactive step. One single lia invocation is not too bad, but if each interactive step tries to rewrite in * and uses lia as the sidecondition solver for the rewrite lemmas, the time spent in lia quickly adds up and makes the interactive experience painfully slow. On the other hand, having to take a short break at each Qed is much less of a problem.

As for being able to better localize the error: If someone reports that they get a typechecking error at Qed involving (ZMicromega.ZTautoChecker __ff __wit), it should be quite obvious that it's a lia problem, and in order to localize the error, we could advise the person reporting the bug to override zchecker with ::= and to replace vm_cast_no_check by exact.

@Blaisorblade
Copy link
Contributor

As for being able to better localize the error: If someone reports that they get a typechecking error at Qed involving (ZMicromega.ZTautoChecker __ff __wit), it should be quite obvious that it's a lia problem, and in order to localize the error, we could advise the person reporting the bug to override zchecker with ::= and to replace vm_cast_no_check by exact.

I wonder if we could instead get a mutable flag to set, whatever the default? That way a user can enable/disable the early checking for their development without duplicating the zchecker body, and having to keep it in sync with upstream.

@fajb
Copy link
Contributor

fajb commented Feb 14, 2022

I like the idea of a Ltac flag.
Expert users get a performance gain by disabling a safety check.
This is transparent for the rest of us.

@silene
Copy link
Contributor

silene commented Feb 15, 2022

For the record, the following is what I use in CoqInterval:

Ltac do_reduction nocheck native :=
  clear ;
  lazymatch nocheck with
  | true =>
    match native with
    | true => native_cast_no_check (eq_refl true)
    | false => vm_cast_no_check (eq_refl true)
    end
  | false =>
    (abstract
    match native with
    | true => native_cast_no_check (eq_refl true)
    | false => vm_cast_no_check (eq_refl true)
    end) ||
    fail "Numerical evaluation failed to conclude. You may want to adjust some parameters"
  end.

In other words, reflexive tactics are controlled by two boolean flags. One flag indicates whether the proof term should be type-checked immediately (default behavior) or if the check should be delayed at Qed time. The other flag indicates whether to use vm_compute (default behavior) or native_compute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: design discussion Discussion about the design of a feature. kind: performance Improvements to performance and efficiency. part: micromega The lia, nia, lra, nra and psatz tactics. Also the legacy omega tactic.
Projects
None yet
Development

No branches or pull requests

6 participants