Major `lia` and `Qed` performance regression in 8.10.1 #11063

robbertkrebbers · 2019-11-07T13:56:44Z

Description of the problem

As our CI indicates, there are some major performance regressions (of 305%) in std++ when switching from Coq 8.9.x to 8.10.1. See here.

I managed to extract at least one concrete minimal test case for the regression. The Time Qed in the below takes close to 0 time with Coq 8.9.x and more than 1 sec with Coq 8.10.1.

Require Import Psatz ZArith Program.

Arguments compose _ _ _ _ _ _ / : assert.

Definition max_infinite {A} (f : A -> A -> A) (a : A) (lt : A -> A -> Prop)
  (Hf : forall x x' y, f x (f x' y) = f x' (f x y)) : True := I.

Program Definition Z_infinite: True :=
  max_infinite (Z.max ∘ Z.succ) 0%Z Z.lt _.
Next Obligation. intros; simpl; lia. Time Qed.

The actual (non-minimized) code can be found here: https://gitlab.mpi-sws.org/iris/stdpp/blob/master/theories/infinite.v#L116

Coq Version

$ coqtop -v
The Coq Proof Assistant, version 8.10.1 (October 2019)
compiled on Oct 25 2019 10:21:28 with OCaml 4.05.0

The text was updated successfully, but these errors were encountered:

robbertkrebbers · 2019-11-07T14:01:18Z

This is strange. If I copy/paste the goal in the Next Obligation into a separate Lemma then it appears to be all fine:

Lemma foo x x' y :
  Z.max (Z.succ x) (Z.max (Z.succ x') y) = Z.max (Z.succ x') (Z.max (Z.succ x) y).
Proof. lia. Time Qed. (* takes close to 0 time *)

Janno · 2019-11-07T16:01:00Z

git bisect has chosen a winner:

6f1634d2f822037a482436a64d3ef3bfb2fac2a0 is the first bad commit
commit 6f1634d2f822037a482436a64d3ef3bfb2fac2a0
Author: Frédéric Besson <frederic.besson@inria.fr>
Date:   Fri Mar 8 09:56:49 2019 +0100

    Several improvements and fixes of Lia
    
    - Improved reification for Micromega (support for #8764)
    
    - Fixes #9268: Do not take universes into account in lia reification
    
      Improve #9291 by threading the evar_map during reification.
      Universes are unified.
    
    - Remove (potentially cyclic) dependency over lra for Rle_abs
    
    - Towards a complete simplex-based lia
    
      fixes  #9615
    
    Lia is now exclusively using cutting plane proofs.
    For this to always work, all the variables need to be positive.
    Therefore, lia is pre-processing the goal  for each variable x
    it introduces the constraints x = y - z , y>=0 , z >= 0
    for some fresh variable y and z.
    
    For scalability, nia is currently NOT performing this pre-processing.
    
    - Lia is using the FSet library
    
      manual merge of commit #230899e87c51c12b2f21b6fedc414d099a1425e4
      to work around a "leaked" hint breaking compatibility of eauto

 CHANGES.md                          |   5 +
 doc/sphinx/proof-engine/tactics.rst |   2 +
 doc/stdlib/hidden-files             |   1 +
 plugins/micromega/DeclConstant.v    |  68 +++
 plugins/micromega/Lia.v             |  19 +-
 plugins/micromega/Lqa.v             |   3 +-
 plugins/micromega/Lra.v             |   2 +-
 plugins/micromega/MExtraction.v     |   6 +-
 plugins/micromega/QMicromega.v      |  37 +-
 plugins/micromega/RMicromega.v      | 289 ++++++++---
 plugins/micromega/Refl.v            |  15 +
 plugins/micromega/RingMicromega.v   |  68 +--
 plugins/micromega/Tauto.v           | 675 +++++++++++++++++---------
 plugins/micromega/VarMap.v          |  30 +-
 plugins/micromega/ZMicromega.v      | 533 ++++++++++++++++++--
 plugins/micromega/certificate.ml    | 236 +++++----
 plugins/micromega/certificate.mli   |  17 +-
 plugins/micromega/coq_micromega.ml  | 942 ++++++++++++++++++------------------
 plugins/micromega/coq_micromega.mli |   1 +
 plugins/micromega/g_micromega.mlg   |   3 +
 plugins/micromega/micromega.ml      | 554 ++++++++++++++++-----
 plugins/micromega/micromega.mli     | 294 ++++++++---
 plugins/micromega/mutils.ml         |  74 ++-
 plugins/micromega/mutils.mli        |  19 +-
 plugins/micromega/polynomial.ml     | 233 +++++++--
 plugins/micromega/polynomial.mli    |  48 ++
 plugins/micromega/simplex.ml        | 216 +++++----
 plugins/micromega/vect.ml           |  43 ++
 plugins/micromega/vect.mli          |  19 +-
 test-suite/.csdp.cache              | Bin 165200 -> 169367 bytes
 test-suite/micromega/example_nia.v  |   6 +
 test-suite/micromega/rsyntax.v      |  75 +++
 test-suite/micromega/zomicron.v     |  39 +-
 test-suite/output/MExtraction.v     |   6 +-
 theories/FSets/FSetEqProperties.v   |  10 +-
 theories/FSets/FSetProperties.v     |  38 +-
 theories/Reals/Rbasic_fun.v         |   5 +-
 37 files changed, 3346 insertions(+), 1285 deletions(-)
 create mode 100644 plugins/micromega/DeclConstant.v
 create mode 100644 test-suite/micromega/rsyntax.v

This was done by looking for commits with a ~5x slowdown. The commit itself is about 10x slower on my machine. It is possible that there has been a gradual decline totaling less than 5x slowdown before this commit but I somehow doubt that judging by the commit message and the scope of the commit.

robbertkrebbers · 2019-11-07T16:03:32Z

cc @fajb

Blaisorblade · 2019-11-07T16:04:14Z

What happens with Unset Simplex? Per release notes:

The tactics lia, nia, lra, nra are now using a novel Simplex-based proof engine. In case of regression, unset Simplex to get the venerable Fourier-based engine (#8457, by Fréderic Besson).

Janno · 2019-11-07T16:05:28Z

What happens with Unset Simplex? Per release notes:

Same time on my machine (at that exact commit, at least.)

ppedrot · 2019-11-07T16:10:44Z

Profiling indicates that the time is passed in kernel conversion, so most likely some conversion problem went wrong. We should diff the terms generated by the two versions.

fajb · 2019-11-07T17:34:02Z

The culprit seems to be lia but:

This is fast:
Time Definition Z_infinite: True:= (max_infinite (Z.max ∘ Z.succ) 0%Z Z.lt (ltac:( (intros ; simpl; lia)))).

This is also fast:
Program Definition Z_infinite3: True := max_infinite (Z.max ∘ Z.succ) 0%Z Z.lt _. Next Obligation. intros; simpl; nia. Time Qed.

In this case, nia is using a less costly algorithm -- the proof by reflection is cheaper.
Yet, this does not completely explain why this has only a significant impact in the context of Program.

SkySkimmer · 2019-11-07T18:47:03Z

The prg_reduce which is

coq/vernac/obligations.ml

Lines 265 to 268 in 0afbbb0

    
           let reduce c = 
        
             let env = Global.env () in 
        
             let sigma = Evd.from_env env in 
        
             EConstr.Unsafe.to_constr (Reductionops.clos_norm_flags CClosure.betaiota env sigma (EConstr.of_constr c))

changes the proof in such a way that checking it becomes slower (tested by replacing it by the identity, which makes the Qed fast). AFAICT the specific issue is a disappearing VM cast.

Diff between doing lia in a Lemma vs obligation:

@@ -1,12 +1,8 @@
-orig_proof =
+Z_infinite_obligation_1 = 
 fun x x' y : Z =>
-(fun
-   lemma : (fun t : Z =>
-            Z.max (Z.succ x) (Z.max (Z.succ x') y) = Z.max (Z.succ x') (Z.max (Z.succ x) y) <->
-            t = Z.max (x' + 1) (Z.max (x + 1) y)) (Z.max (x + 1) (Z.max (x' + 1) y)) =>
- Morphisms.iff_flip_impl_subrelation
-   (Z.max (Z.succ x) (Z.max (Z.succ x') y) = Z.max (Z.succ x') (Z.max (Z.succ x) y))
-   (Z.max (x + 1) (Z.max (x' + 1) y) = Z.max (x' + 1) (Z.max (x + 1) y)) lemma)
+Morphisms.iff_flip_impl_subrelation
+  (Z.max (Z.succ x) (Z.max (Z.succ x') y) = Z.max (Z.succ x') (Z.max (Z.succ x) y))
+  (Z.max (x + 1) (Z.max (x' + 1) y) = Z.max (x' + 1) (Z.max (x + 1) y))
   (eq_ind (Z.max (Z.succ x) (Z.max (Z.succ x') y))
      (fun t : Z =>
       Z.max (Z.succ x) (Z.max (Z.succ x') y) = Z.max (Z.succ x') (Z.max (Z.succ x) y) <->
@@ -755,8 +751,9 @@
                                  RingMicromega.Flhs := EnvRing.PEX 1;
                                  RingMicromega.Fop := RingMicromega.OpEq;
                                  RingMicromega.Frhs := EnvRing.PEX 2 |} ())))))))) in
-     ZMicromega.ZTautoCheckerExt_sound __ff __wit
-       (eq_refl <: ZMicromega.ZTautoCheckerExt __ff __wit = true) (VarMap.find 0%Z __varmap) in
+     ZMicromega.ZTautoCheckerExt_sound __ff __wit eq_refl (VarMap.find 0%Z __varmap) in
    __arith z6 z4 y z3 z0 x' z2 x z z5 z1 H2 H1 Heqz4 H0 H Heqz0 Heqz2 Heqz)
      : forall x x' y : Z,
-       Z.max (Z.succ x) (Z.max (Z.succ x') y) = Z.max (Z.succ x') (Z.max (Z.succ x) y)
+       (Z.max ∘ Z.succ) x ((Z.max ∘ Z.succ) x' y) = (Z.max ∘ Z.succ) x' ((Z.max ∘ Z.succ) x y)

I don't think the fun lemma cut is critical so it's probably the vm cast.

Janno · 2019-11-08T19:21:20Z

I am not quite following. Why was this particular commit responsible for the slowdown? It didn't change whether or not Program reduces anything, and I don't think it changed how the kernel removes cast during reduction. The commit also did not introduce new casts around eq_refl. So what exactly changed? Did the commit introduce new occurrences with eq_refl <: .. whose performance relies on the cast not being removed accidentally? (Whereas any existing occurrences, if any, were not so sensitive to having their corresponding cast accidentally removed by Program?)

SkySkimmer · 2019-11-08T19:51:17Z

I think that commit made the produced term more complicated, which makes the use of the vm more important.
The timings look like this:

|--------+------------------+------------+--------------------|
|        | standalone lemma | obligation | obligation / lemma |
|--------+------------------+------------+--------------------|
|    8.9 | 0.02s            | 0.07s      |                3.5 |
|--------+------------------+------------+--------------------|
| master | 0.07s            | 0.8s       |               11.5 |
|--------+------------------+------------+--------------------|

robbertkrebbers · 2019-11-08T19:53:34Z

I would say that the factor 3.5 on 8.9 is already suspicious. Why should the obligation be more expensive to check?

RalfJung · 2019-12-02T14:19:54Z

@SkySkimmer is there a way to get that VM cast back in without reverting the entire patch?

It's not just std++ that is affected by this: the already very slow weak-memory version of lambdaRust got slowed down from 17min to 25min of overall execution time. Here's the per-file instruction count diff.

RalfJung · 2019-12-02T15:34:26Z

Actually that problem seems unrelated to lia after all -- stubbing out most lia calls in the two slowest files does not make any difference.

maximedenes · 2019-12-05T14:58:00Z

So it seems there are two problems here:

Program should not throw away VM casts
The lia commit mentioned above has introduced a performance regression

I can try to fix the first, @fajb can you comment on the second?

fajb · 2019-12-05T15:21:38Z

I am working on it PR #11047
I still have a few things to fix but I am optimistic.
If VM casts are not thrown away, will there be a noticeable regression?
Anyway, the PR should fix this.

- PR coq#9725 fixes completness bugs introduces some inefficiency. The current PR intends to fix the inefficiency while retaining completness. The fix removes a pre-processing step and instead relies on a more elaborate proof format introducing positivity constraints on the fly. fixes coq#11063, fixes coq#11242 - Better interval analysis for product of variables. (fixes coq#11191 , first case) - Generation of positivity constrains for Z.pow (fixes coq#11191 , second case) - More aggressive pruning of (useless) hypotheses

PR coq#9725 fixes completness bugs introduces some inefficiency. The current PR intends to fix the inefficiency while retaining completness. The fix removes a pre-processing step and instead relies on a more elaborate proof format introducing positivity constraints on the fly. Solve bootstrapping issues: RMicromega <-> Rbase <-> lia. Fixes coq#11063 and fixes coq#11242

PR coq#9725 fixes completness bugs introduces some inefficiency. The current PR intends to fix the inefficiency while retaining completness. The fix removes a pre-processing step and instead relies on a more elaborate proof format introducing positivity constraints on the fly. Solve bootstrapping issues: RMicromega <-> Rbase <-> lia. Fixes coq#11063 and fixes coq#11242 and fixes coq#11270

PR coq#9725 fixes completness bugs introduces some inefficiency. The current PR intends to fix the inefficiency while retaining completness. The fix removes a pre-processing step and instead relies on a more elaborate proof format introducing positivity constraints on the fly. Solve bootstrapping issues: RMicromega <-> Rbase <-> lia. Fixes coq#11063 and fixes coq#11242 and fixes coq#11270 (cherry picked from commit 7d961a9)

robbertkrebbers added kind: regression Problems that were not present in previous versions. kind: performance Improvements to performance and efficiency. labels Nov 7, 2019

RalfJung added the Coq Consortium support services Reporter subscribed Coq Consortium support services and requests processing by Consortium engineers. label Dec 2, 2019

maximedenes self-assigned this Dec 5, 2019

maximedenes mentioned this issue Dec 5, 2019

Lia: various impovements (support for #8764, fix #9268 and #9615) #9725

Merged

fajb mentioned this issue Dec 6, 2019

[micromega] primitive support for boolean operators #11047

Closed

maximedenes mentioned this issue Dec 6, 2019

lia performance regression #11242

Closed

fajb mentioned this issue Dec 9, 2019

[micromega] fix efficiency regression #11263

Merged

maximedenes closed this as completed in 7d961a9 Dec 18, 2019

coqbot added this to the 8.11.0 milestone Dec 18, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Major `lia` and `Qed` performance regression in 8.10.1 #11063

Major `lia` and `Qed` performance regression in 8.10.1 #11063

robbertkrebbers commented Nov 7, 2019 •

edited

Loading

robbertkrebbers commented Nov 7, 2019 •

edited

Loading

Janno commented Nov 7, 2019

robbertkrebbers commented Nov 7, 2019

Blaisorblade commented Nov 7, 2019

Janno commented Nov 7, 2019

ppedrot commented Nov 7, 2019

fajb commented Nov 7, 2019

SkySkimmer commented Nov 7, 2019 •

edited

Loading

Janno commented Nov 8, 2019

SkySkimmer commented Nov 8, 2019

robbertkrebbers commented Nov 8, 2019

RalfJung commented Dec 2, 2019

RalfJung commented Dec 2, 2019

maximedenes commented Dec 5, 2019

fajb commented Dec 5, 2019

Major lia and Qed performance regression in 8.10.1 #11063

Major lia and Qed performance regression in 8.10.1 #11063

Comments

robbertkrebbers commented Nov 7, 2019 • edited Loading

Description of the problem

Coq Version

robbertkrebbers commented Nov 7, 2019 • edited Loading

Janno commented Nov 7, 2019

robbertkrebbers commented Nov 7, 2019

Blaisorblade commented Nov 7, 2019

Janno commented Nov 7, 2019

ppedrot commented Nov 7, 2019

fajb commented Nov 7, 2019

SkySkimmer commented Nov 7, 2019 • edited Loading

Janno commented Nov 8, 2019

SkySkimmer commented Nov 8, 2019

robbertkrebbers commented Nov 8, 2019

RalfJung commented Dec 2, 2019

RalfJung commented Dec 2, 2019

maximedenes commented Dec 5, 2019

fajb commented Dec 5, 2019

Major `lia` and `Qed` performance regression in 8.10.1 #11063

Major `lia` and `Qed` performance regression in 8.10.1 #11063

robbertkrebbers commented Nov 7, 2019 •

edited

Loading

robbertkrebbers commented Nov 7, 2019 •

edited

Loading

SkySkimmer commented Nov 7, 2019 •

edited

Loading