Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major lia and Qed performance regression in 8.10.1 #11063

Closed
robbertkrebbers opened this issue Nov 7, 2019 · 15 comments · Fixed by #11263
Closed

Major lia and Qed performance regression in 8.10.1 #11063

robbertkrebbers opened this issue Nov 7, 2019 · 15 comments · Fixed by #11263
Assignees
Labels
Coq Consortium support services Reporter subscribed Coq Consortium support services and requests processing by Consortium engineers. kind: performance Improvements to performance and efficiency. kind: regression Problems that were not present in previous versions.
Milestone

Comments

@robbertkrebbers
Copy link
Contributor

robbertkrebbers commented Nov 7, 2019

Description of the problem

As our CI indicates, there are some major performance regressions (of 305%) in std++ when switching from Coq 8.9.x to 8.10.1. See here.

I managed to extract at least one concrete minimal test case for the regression. The Time Qed in the below takes close to 0 time with Coq 8.9.x and more than 1 sec with Coq 8.10.1.

Require Import Psatz ZArith Program.

Arguments compose _ _ _ _ _ _ / : assert.

Definition max_infinite {A} (f : A -> A -> A) (a : A) (lt : A -> A -> Prop)
  (Hf : forall x x' y, f x (f x' y) = f x' (f x y)) : True := I.

Program Definition Z_infinite: True :=
  max_infinite (Z.max ∘ Z.succ) 0%Z Z.lt _.
Next Obligation. intros; simpl; lia. Time Qed.

The actual (non-minimized) code can be found here: https://gitlab.mpi-sws.org/iris/stdpp/blob/master/theories/infinite.v#L116

Coq Version

$ coqtop -v
The Coq Proof Assistant, version 8.10.1 (October 2019)
compiled on Oct 25 2019 10:21:28 with OCaml 4.05.0
@robbertkrebbers
Copy link
Contributor Author

robbertkrebbers commented Nov 7, 2019

This is strange. If I copy/paste the goal in the Next Obligation into a separate Lemma then it appears to be all fine:

Lemma foo x x' y :
  Z.max (Z.succ x) (Z.max (Z.succ x') y) = Z.max (Z.succ x') (Z.max (Z.succ x) y).
Proof. lia. Time Qed. (* takes close to 0 time *)

@robbertkrebbers robbertkrebbers added kind: regression Problems that were not present in previous versions. kind: performance Improvements to performance and efficiency. labels Nov 7, 2019
@Janno
Copy link
Contributor

Janno commented Nov 7, 2019

git bisect has chosen a winner:

6f1634d2f822037a482436a64d3ef3bfb2fac2a0 is the first bad commit
commit 6f1634d2f822037a482436a64d3ef3bfb2fac2a0
Author: Frédéric Besson <frederic.besson@inria.fr>
Date:   Fri Mar 8 09:56:49 2019 +0100

    Several improvements and fixes of Lia
    
    - Improved reification for Micromega (support for #8764)
    
    - Fixes #9268: Do not take universes into account in lia reification
    
      Improve #9291 by threading the evar_map during reification.
      Universes are unified.
    
    - Remove (potentially cyclic) dependency over lra for Rle_abs
    
    - Towards a complete simplex-based lia
    
      fixes  #9615
    
    Lia is now exclusively using cutting plane proofs.
    For this to always work, all the variables need to be positive.
    Therefore, lia is pre-processing the goal  for each variable x
    it introduces the constraints x = y - z , y>=0 , z >= 0
    for some fresh variable y and z.
    
    For scalability, nia is currently NOT performing this pre-processing.
    
    - Lia is using the FSet library
    
      manual merge of commit #230899e87c51c12b2f21b6fedc414d099a1425e4
      to work around a "leaked" hint breaking compatibility of eauto

 CHANGES.md                          |   5 +
 doc/sphinx/proof-engine/tactics.rst |   2 +
 doc/stdlib/hidden-files             |   1 +
 plugins/micromega/DeclConstant.v    |  68 +++
 plugins/micromega/Lia.v             |  19 +-
 plugins/micromega/Lqa.v             |   3 +-
 plugins/micromega/Lra.v             |   2 +-
 plugins/micromega/MExtraction.v     |   6 +-
 plugins/micromega/QMicromega.v      |  37 +-
 plugins/micromega/RMicromega.v      | 289 ++++++++---
 plugins/micromega/Refl.v            |  15 +
 plugins/micromega/RingMicromega.v   |  68 +--
 plugins/micromega/Tauto.v           | 675 +++++++++++++++++---------
 plugins/micromega/VarMap.v          |  30 +-
 plugins/micromega/ZMicromega.v      | 533 ++++++++++++++++++--
 plugins/micromega/certificate.ml    | 236 +++++----
 plugins/micromega/certificate.mli   |  17 +-
 plugins/micromega/coq_micromega.ml  | 942 ++++++++++++++++++------------------
 plugins/micromega/coq_micromega.mli |   1 +
 plugins/micromega/g_micromega.mlg   |   3 +
 plugins/micromega/micromega.ml      | 554 ++++++++++++++++-----
 plugins/micromega/micromega.mli     | 294 ++++++++---
 plugins/micromega/mutils.ml         |  74 ++-
 plugins/micromega/mutils.mli        |  19 +-
 plugins/micromega/polynomial.ml     | 233 +++++++--
 plugins/micromega/polynomial.mli    |  48 ++
 plugins/micromega/simplex.ml        | 216 +++++----
 plugins/micromega/vect.ml           |  43 ++
 plugins/micromega/vect.mli          |  19 +-
 test-suite/.csdp.cache              | Bin 165200 -> 169367 bytes
 test-suite/micromega/example_nia.v  |   6 +
 test-suite/micromega/rsyntax.v      |  75 +++
 test-suite/micromega/zomicron.v     |  39 +-
 test-suite/output/MExtraction.v     |   6 +-
 theories/FSets/FSetEqProperties.v   |  10 +-
 theories/FSets/FSetProperties.v     |  38 +-
 theories/Reals/Rbasic_fun.v         |   5 +-
 37 files changed, 3346 insertions(+), 1285 deletions(-)
 create mode 100644 plugins/micromega/DeclConstant.v
 create mode 100644 test-suite/micromega/rsyntax.v

This was done by looking for commits with a ~5x slowdown. The commit itself is about 10x slower on my machine. It is possible that there has been a gradual decline totaling less than 5x slowdown before this commit but I somehow doubt that judging by the commit message and the scope of the commit.

@robbertkrebbers
Copy link
Contributor Author

cc @fajb

@Blaisorblade
Copy link
Contributor

What happens with Unset Simplex? Per release notes:

The tactics lia, nia, lra, nra are now using a novel Simplex-based proof engine. In case of regression, unset Simplex to get the venerable Fourier-based engine (#8457, by Fréderic Besson).

@Janno
Copy link
Contributor

Janno commented Nov 7, 2019

What happens with Unset Simplex? Per release notes:

Same time on my machine (at that exact commit, at least.)

@ppedrot
Copy link
Member

ppedrot commented Nov 7, 2019

Profiling indicates that the time is passed in kernel conversion, so most likely some conversion problem went wrong. We should diff the terms generated by the two versions.

@fajb
Copy link
Contributor

fajb commented Nov 7, 2019

The culprit seems to be lia but:

This is fast:
Time Definition Z_infinite: True:= (max_infinite (Z.max ∘ Z.succ) 0%Z Z.lt (ltac:( (intros ; simpl; lia)))).

This is also fast:
Program Definition Z_infinite3: True := max_infinite (Z.max ∘ Z.succ) 0%Z Z.lt _. Next Obligation. intros; simpl; nia. Time Qed.

In this case, nia is using a less costly algorithm -- the proof by reflection is cheaper.
Yet, this does not completely explain why this has only a significant impact in the context of Program.

@SkySkimmer
Copy link
Contributor

SkySkimmer commented Nov 7, 2019

The prg_reduce which is

coq/vernac/obligations.ml

Lines 265 to 268 in 0afbbb0

let reduce c =
let env = Global.env () in
let sigma = Evd.from_env env in
EConstr.Unsafe.to_constr (Reductionops.clos_norm_flags CClosure.betaiota env sigma (EConstr.of_constr c))
changes the proof in such a way that checking it becomes slower (tested by replacing it by the identity, which makes the Qed fast). AFAICT the specific issue is a disappearing VM cast.

Diff between doing lia in a Lemma vs obligation:

@@ -1,12 +1,8 @@
-orig_proof =
+Z_infinite_obligation_1 = 
 fun x x' y : Z =>
-(fun
-   lemma : (fun t : Z =>
-            Z.max (Z.succ x) (Z.max (Z.succ x') y) = Z.max (Z.succ x') (Z.max (Z.succ x) y) <->
-            t = Z.max (x' + 1) (Z.max (x + 1) y)) (Z.max (x + 1) (Z.max (x' + 1) y)) =>
- Morphisms.iff_flip_impl_subrelation
-   (Z.max (Z.succ x) (Z.max (Z.succ x') y) = Z.max (Z.succ x') (Z.max (Z.succ x) y))
-   (Z.max (x + 1) (Z.max (x' + 1) y) = Z.max (x' + 1) (Z.max (x + 1) y)) lemma)
+Morphisms.iff_flip_impl_subrelation
+  (Z.max (Z.succ x) (Z.max (Z.succ x') y) = Z.max (Z.succ x') (Z.max (Z.succ x) y))
+  (Z.max (x + 1) (Z.max (x' + 1) y) = Z.max (x' + 1) (Z.max (x + 1) y))
   (eq_ind (Z.max (Z.succ x) (Z.max (Z.succ x') y))
      (fun t : Z =>
       Z.max (Z.succ x) (Z.max (Z.succ x') y) = Z.max (Z.succ x') (Z.max (Z.succ x) y) <->
@@ -755,8 +751,9 @@
                                  RingMicromega.Flhs := EnvRing.PEX 1;
                                  RingMicromega.Fop := RingMicromega.OpEq;
                                  RingMicromega.Frhs := EnvRing.PEX 2 |} ())))))))) in
-     ZMicromega.ZTautoCheckerExt_sound __ff __wit
-       (eq_refl <: ZMicromega.ZTautoCheckerExt __ff __wit = true) (VarMap.find 0%Z __varmap) in
+     ZMicromega.ZTautoCheckerExt_sound __ff __wit eq_refl (VarMap.find 0%Z __varmap) in
    __arith z6 z4 y z3 z0 x' z2 x z z5 z1 H2 H1 Heqz4 H0 H Heqz0 Heqz2 Heqz)
      : forall x x' y : Z,
-       Z.max (Z.succ x) (Z.max (Z.succ x') y) = Z.max (Z.succ x') (Z.max (Z.succ x) y)
+       (Z.max ∘ Z.succ) x ((Z.max ∘ Z.succ) x' y) = (Z.max ∘ Z.succ) x' ((Z.max ∘ Z.succ) x y)

I don't think the fun lemma cut is critical so it's probably the vm cast.

@Janno
Copy link
Contributor

Janno commented Nov 8, 2019

I am not quite following. Why was this particular commit responsible for the slowdown? It didn't change whether or not Program reduces anything, and I don't think it changed how the kernel removes cast during reduction. The commit also did not introduce new casts around eq_refl. So what exactly changed? Did the commit introduce new occurrences with eq_refl <: .. whose performance relies on the cast not being removed accidentally? (Whereas any existing occurrences, if any, were not so sensitive to having their corresponding cast accidentally removed by Program?)

@SkySkimmer
Copy link
Contributor

I think that commit made the produced term more complicated, which makes the use of the vm more important.
The timings look like this:

|--------+------------------+------------+--------------------|
|        | standalone lemma | obligation | obligation / lemma |
|--------+------------------+------------+--------------------|
|    8.9 | 0.02s            | 0.07s      |                3.5 |
|--------+------------------+------------+--------------------|
| master | 0.07s            | 0.8s       |               11.5 |
|--------+------------------+------------+--------------------|

@robbertkrebbers
Copy link
Contributor Author

I would say that the factor 3.5 on 8.9 is already suspicious. Why should the obligation be more expensive to check?

@RalfJung RalfJung added the Coq Consortium support services Reporter subscribed Coq Consortium support services and requests processing by Consortium engineers. label Dec 2, 2019
@RalfJung
Copy link
Contributor

RalfJung commented Dec 2, 2019

@SkySkimmer is there a way to get that VM cast back in without reverting the entire patch?

It's not just std++ that is affected by this: the already very slow weak-memory version of lambdaRust got slowed down from 17min to 25min of overall execution time. Here's the per-file instruction count diff.

@RalfJung
Copy link
Contributor

RalfJung commented Dec 2, 2019

Actually that problem seems unrelated to lia after all -- stubbing out most lia calls in the two slowest files does not make any difference.

@maximedenes
Copy link
Member

So it seems there are two problems here:

  • Program should not throw away VM casts
  • The lia commit mentioned above has introduced a performance regression

I can try to fix the first, @fajb can you comment on the second?

@fajb
Copy link
Contributor

fajb commented Dec 5, 2019

I am working on it PR #11047
I still have a few things to fix but I am optimistic.
If VM casts are not thrown away, will there be a noticeable regression?
Anyway, the PR should fix this.

fajb added a commit to fajb/coq that referenced this issue Dec 6, 2019
- PR coq#9725 fixes completness bugs introduces some inefficiency.
  The current PR intends to fix the inefficiency
  while retaining completness. The fix removes a pre-processing
  step and instead relies on a more elaborate proof
  format introducing positivity constraints on the fly.

fixes coq#11063, fixes coq#11242

- Better interval analysis for product of variables.
  (fixes coq#11191 , first case)

- Generation of positivity constrains for Z.pow
  (fixes coq#11191 , second case)

- More aggressive pruning of (useless) hypotheses
fajb added a commit to fajb/coq that referenced this issue Dec 9, 2019
PR coq#9725 fixes completness bugs introduces some inefficiency. The
current PR intends to fix the inefficiency while retaining
completness. The fix removes a pre-processing step and instead relies
on a more elaborate proof format introducing positivity constraints on
the fly.

Solve bootstrapping issues: RMicromega <-> Rbase <-> lia.

Fixes coq#11063 and fixes coq#11242
fajb added a commit to fajb/coq that referenced this issue Dec 12, 2019
PR coq#9725 fixes completness bugs introduces some inefficiency. The
current PR intends to fix the inefficiency while retaining
completness. The fix removes a pre-processing step and instead relies
on a more elaborate proof format introducing positivity constraints on
the fly.

Solve bootstrapping issues: RMicromega <-> Rbase <-> lia.

Fixes coq#11063 and fixes coq#11242 and fixes coq#11270
fajb added a commit to fajb/coq that referenced this issue Dec 12, 2019
PR coq#9725 fixes completness bugs introduces some inefficiency. The
current PR intends to fix the inefficiency while retaining
completness. The fix removes a pre-processing step and instead relies
on a more elaborate proof format introducing positivity constraints on
the fly.

Solve bootstrapping issues: RMicromega <-> Rbase <-> lia.

Fixes coq#11063 and fixes coq#11242 and fixes coq#11270
ejgallego pushed a commit to fajb/coq that referenced this issue Dec 13, 2019
PR coq#9725 fixes completness bugs introduces some inefficiency. The
current PR intends to fix the inefficiency while retaining
completness. The fix removes a pre-processing step and instead relies
on a more elaborate proof format introducing positivity constraints on
the fly.

Solve bootstrapping issues: RMicromega <-> Rbase <-> lia.

Fixes coq#11063 and fixes coq#11242 and fixes coq#11270
ejgallego pushed a commit to fajb/coq that referenced this issue Dec 14, 2019
PR coq#9725 fixes completness bugs introduces some inefficiency. The
current PR intends to fix the inefficiency while retaining
completness. The fix removes a pre-processing step and instead relies
on a more elaborate proof format introducing positivity constraints on
the fly.

Solve bootstrapping issues: RMicromega <-> Rbase <-> lia.

Fixes coq#11063 and fixes coq#11242 and fixes coq#11270
ejgallego pushed a commit to ejgallego/coq that referenced this issue Dec 14, 2019
PR coq#9725 fixes completness bugs introduces some inefficiency. The
current PR intends to fix the inefficiency while retaining
completness. The fix removes a pre-processing step and instead relies
on a more elaborate proof format introducing positivity constraints on
the fly.

Solve bootstrapping issues: RMicromega <-> Rbase <-> lia.

Fixes coq#11063 and fixes coq#11242 and fixes coq#11270
fajb added a commit to fajb/coq that referenced this issue Dec 16, 2019
PR coq#9725 fixes completness bugs introduces some inefficiency. The
current PR intends to fix the inefficiency while retaining
completness. The fix removes a pre-processing step and instead relies
on a more elaborate proof format introducing positivity constraints on
the fly.

Solve bootstrapping issues: RMicromega <-> Rbase <-> lia.

Fixes coq#11063 and fixes coq#11242 and fixes coq#11270
@coqbot coqbot added this to the 8.11.0 milestone Dec 18, 2019
ppedrot pushed a commit to ppedrot/coq that referenced this issue Dec 18, 2019
PR coq#9725 fixes completness bugs introduces some inefficiency. The
current PR intends to fix the inefficiency while retaining
completness. The fix removes a pre-processing step and instead relies
on a more elaborate proof format introducing positivity constraints on
the fly.

Solve bootstrapping issues: RMicromega <-> Rbase <-> lia.

Fixes coq#11063 and fixes coq#11242 and fixes coq#11270

(cherry picked from commit 7d961a9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Coq Consortium support services Reporter subscribed Coq Consortium support services and requests processing by Consortium engineers. kind: performance Improvements to performance and efficiency. kind: regression Problems that were not present in previous versions.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants