speed up `lia` by vm_computing the certificate checker only at Qed time #15654

samuelgruetter · 2022-02-09T22:52:21Z

This issue is actually good news: lia has become so fast that now evaluating the reflective certificate checker becomes a noticeable proportion (ca 25% in one example) of its running time.

Here's an example of the kind of goals that I'd like to solve as fast as possible:

Require Import Coq.ZArith.ZArith. Open Scope Z_scope.
Require Import Coq.micromega.Lia.

Unset Lia Cache. (* to get reproducible timings, makes quite a difference *)

Lemma foo: forall (word : Type) (xs x : list word) (w2 w3 : Z),
 (w2 - w3) mod 2 ^ 32 <> 0 ->
 (w2 - w3) mod 2 ^ 32 = 8 * Z.of_nat (length x) ->
 forall w0 w1 : Z,
 (w0 - w1) mod 2 ^ 32 = 8 * Z.of_nat (length xs) ->
 0 <= w0 < 2 ^ 32 ->
 0 <= w1 < 2 ^ 32 ->
 0 <= w2 < 2 ^ 32 ->
 0 <= w3 < 2 ^ 32 ->
 (w2 - ((w3 + ((w2 - w3) mod 2 ^ 32 / 2 ^ 4 * 2 ^ 3) mod 2 ^ 32) mod 2 ^ 32
  + 8 mod 2 ^ 32) mod 2 ^ 32) mod 2 ^ 32 =
 8 * Z.of_nat (length x - S (Z.to_nat
         (((w3 + ((w2 - w3) mod 2 ^ 32 / 2 ^ 4 * 2 ^ 3) mod 2 ^ 32) mod 2 ^ 32 - w3)
          mod 2 ^ 32 / (8 mod 2 ^ 32)))).
Proof.
  intros. Z.div_mod_to_equations.

  Set Ltac Profiling.
  Time lia.
  Show Ltac Profile.

Time Qed. (* 0.34 secs *)

It displays the following Ltac profile on my machine:

total time:      0.387s

 tactic                                   local  total   calls       max
────────────────────────────────────────┴──────┴──────┴───────┴─────────┘
─lia -----------------------------------   0.0% 100.0%       1    0.387s
─xlia (tactic) -------------------------  71.1%  97.4%       1    0.377s
─zchecker ------------------------------   0.0%  26.3%       1    0.102s
─exact (uconstr) -----------------------  25.6%  25.6%       1    0.099s
─Zify.zify -----------------------------   0.1%   2.6%       1    0.010s

 tactic                                   local  total   calls       max
────────────────────────────────────────┴──────┴──────┴───────┴─────────┘
─lia -----------------------------------   0.0% 100.0%       1    0.387s
 ├─xlia (tactic) -----------------------  71.1%  97.4%       1    0.377s
 │└zchecker ----------------------------   0.0%  26.3%       1    0.102s
 │└exact (uconstr) ---------------------  25.6%  25.6%       1    0.099s
 └─Zify.zify ---------------------------   0.1%   2.6%       1    0.010s

In this issue, I'll focus on the 25.6% of the time spent in exact (uconstr) (but of course, if this example inspires @fajb to optimize the xlia part that takes 71.1%, that would be a welcome side effect of this issue... 😉).

Currently, zchecker is implemented as follows:

Ltac zchecker :=
  let __wit := fresh "__wit" in
  let __varmap := fresh "__varmap" in
  let __ff := fresh "__ff" in
  intros __wit __varmap __ff; exact
   (ZMicromega.ZTautoChecker_sound __ff __wit
      (@eq_refl bool true <: @eq bool (ZMicromega.ZTautoChecker __ff __wit) true)
      (VarMap.find 0 __varmap))

Note that <: means VMCast, ie it uses vm_compute to prove that the inferred type matches the annotated type.
But this means that vm_compute is run twice: Once during the invocation of zchecker (even though we already know the equality check will succeed) and a second time when checking the Qed.

Now, if I override zchecker as follows:

Ltac zchecker ::=
  let __wit := fresh "__wit" in
  let __varmap := fresh "__varmap" in
  let __ff := fresh "__ff" in
  intros __wit __varmap __ff;
  simple refine (ZMicromega.ZTautoChecker_sound __ff __wit _ (VarMap.find 0 __varmap));
  vm_cast_no_check (@eq_refl bool true).

and remove the Ltac Profiling commands, but keep the Time commands, and run in bash

time for i in {1..10}; do coqc lia_slow_exact.v; done

and import the printed numbers in a spreadsheet, and add up all lia times and all Qed times, the lia time decreases by 25%, and the Qed time remains the same (I think the 1% improvement I measured is just noise).

So it might make sense to open a PR that changes zchecker as suggested above, but since I'm not very familiar with vm_cast_no_check wizardry, nor with lia internals, I first wanted to ask if someone (@JasonGross ?) has ideas to improve the vm_cast_no_check aspect, or if someone (@fajb ?) has comments on the lia aspects?

Coq version: 8.15+rc1

The text was updated successfully, but these errors were encountered:

JasonGross · 2022-02-10T00:53:36Z

I think this used to be wrapped in abstract vm_cast_no_check to speed up Qed-checking-time ... I don't recall why that was dropped (maybe to make lia support evars?), but maybe it's worth resurrecting abstract?

samuelgruetter · 2022-02-10T01:34:06Z

Replacing vm_cast_no_check (@eq_refl bool true) by abstract (vm_cast_no_check (@eq_refl bool true)) increases both the running time of lia and of Qed by more than 2x on this example... Or did you mean something else?

JasonGross · 2022-02-10T05:42:48Z

It's supposed to move the time from Qed to the tactic call, not double them both. What if you replace Qed with Defined? What if you replace abstract with clear; abstract?

fajb · 2022-02-10T09:01:29Z

Hi,
If I recall correctly, abstract was dropped because it failed in rare cases. I don't know the current status but abstract used to have a bad reputation but ZArith_hints calls abstract lia.
It may be also be worth looking at the abstract generated lemma to make sure instantiation is nice.
As @JasonGross said, abstract is generated on the fly an intermediate lemma that is therefore only typechecked once.
@samuelgruetter , vm_cast_no_check means that the conversion is not run. The proof system trusts that it will return true. If there is a bug in lia you will only get a typing error at Qed. So wrapping abstract around vm_cast_no_check localise the potential error.

samuelgruetter · 2022-02-10T16:50:48Z

Am I assuming correctly that if zchecker fails, it's always a bug?

If yes, I'd actually prefer to run all the certificate checking at Qed time, rather than to slow down every interactive step. One single lia invocation is not too bad, but if each interactive step tries to rewrite in * and uses lia as the sidecondition solver for the rewrite lemmas, the time spent in lia quickly adds up and makes the interactive experience painfully slow. On the other hand, having to take a short break at each Qed is much less of a problem.

As for being able to better localize the error: If someone reports that they get a typechecking error at Qed involving (ZMicromega.ZTautoChecker __ff __wit), it should be quite obvious that it's a lia problem, and in order to localize the error, we could advise the person reporting the bug to override zchecker with ::= and to replace vm_cast_no_check by exact.

Blaisorblade · 2022-02-14T17:53:16Z

As for being able to better localize the error: If someone reports that they get a typechecking error at Qed involving (ZMicromega.ZTautoChecker __ff __wit), it should be quite obvious that it's a lia problem, and in order to localize the error, we could advise the person reporting the bug to override zchecker with ::= and to replace vm_cast_no_check by exact.

I wonder if we could instead get a mutable flag to set, whatever the default? That way a user can enable/disable the early checking for their development without duplicating the zchecker body, and having to keep it in sync with upstream.

fajb · 2022-02-14T18:07:04Z

I like the idea of a Ltac flag.
Expert users get a performance gain by disabling a safety check.
This is transparent for the rest of us.

silene · 2022-02-15T07:54:50Z

For the record, the following is what I use in CoqInterval:

Ltac do_reduction nocheck native :=
  clear ;
  lazymatch nocheck with
  | true =>
    match native with
    | true => native_cast_no_check (eq_refl true)
    | false => vm_cast_no_check (eq_refl true)
    end
  | false =>
    (abstract
    match native with
    | true => native_cast_no_check (eq_refl true)
    | false => vm_cast_no_check (eq_refl true)
    end) ||
    fail "Numerical evaluation failed to conclude. You may want to adjust some parameters"
  end.

In other words, reflexive tactics are controlled by two boolean flags. One flag indicates whether the proof term should be type-checked immediately (default behavior) or if the check should be delayed at Qed time. The other flag indicates whether to use vm_compute (default behavior) or native_compute.

Alizter added kind: design discussion Discussion about the design of a feature. kind: performance Improvements to performance and efficiency. part: micromega The lia, nia, lra, nra and psatz tactics. Also the legacy omega tactic. labels Feb 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

speed up `lia` by vm_computing the certificate checker only at Qed time #15654

speed up `lia` by vm_computing the certificate checker only at Qed time #15654

samuelgruetter commented Feb 9, 2022

JasonGross commented Feb 10, 2022

samuelgruetter commented Feb 10, 2022

JasonGross commented Feb 10, 2022

fajb commented Feb 10, 2022

samuelgruetter commented Feb 10, 2022

Blaisorblade commented Feb 14, 2022

fajb commented Feb 14, 2022

silene commented Feb 15, 2022 •

edited

speed up lia by vm_computing the certificate checker only at Qed time #15654

speed up lia by vm_computing the certificate checker only at Qed time #15654

Comments

samuelgruetter commented Feb 9, 2022

JasonGross commented Feb 10, 2022

samuelgruetter commented Feb 10, 2022

JasonGross commented Feb 10, 2022

fajb commented Feb 10, 2022

samuelgruetter commented Feb 10, 2022

Blaisorblade commented Feb 14, 2022

fajb commented Feb 14, 2022

silene commented Feb 15, 2022 • edited

speed up `lia` by vm_computing the certificate checker only at Qed time #15654

speed up `lia` by vm_computing the certificate checker only at Qed time #15654

silene commented Feb 15, 2022 •

edited