
Spaces.No.Core: speed up the two slowest lines #1779

Merged
merged 1 commit into from
Oct 10, 2023

Conversation

jdchristensen
Collaborator

Some simple changes make the two slowest lines instantaneous. I have no idea why one spot requires univalence to be passed in. Never seen that before. I tried various phrasings of that line, but all were slow unless the ua argument was given.

@@ -1065,7 +1067,7 @@ Section RaiseSort.
        strip_truncations.
        destruct sh as [[l sh]|[r sh]].
        + apply lt_l with l.
-         apply IHL0, (@No_decode_le _).
+         apply IHL0, (@No_decode_le ua). (* Need to pass in [Univalence] to make this fast. *)
          rewrite p; exact sh.
        + apply lt_r with r.
          apply IHR, No_decode_le.
Collaborator Author

At this spot with the same lemma (and many other spots), Coq finds Univalence quickly.

Collaborator

@Alizter Alizter Oct 9, 2023

That is absolutely bizarre. Could it be that Coq is searching the typeclass database for any instances that produce univalence? I wonder if we have some false promises lying around, like being able to produce a term of any type, which could cause typeclass search to spin out.

If you do Set Typeclasses Debug you can see what it is looking for.
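A minimal sketch of such a debugging session (the tactic shown is a stand-in for whichever slow line is being investigated):

        (* Log typeclass resolution only around the slow step. *)
        Set Typeclasses Debug.
        Time apply IHL0, No_decode_le. (* the log lists each instance Coq considers *)
        Unset Typeclasses Debug.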

Contributor

There might also be a universe issue? If Set Typeclasses Debug doesn't give enough information (i.e., if the tc log is relatively short but spends a bunch of time producing no output), you can also do Set Typeclasses Debug Verbosity 2 and get all the things it runs unification/conversion on before deciding to discard things that don't match.

Collaborator

Also, Set Typeclasses Depth 3 or another small number can let you understand more concisely what Coq is spending its time searching for.

Contributor

> i.e., if the tc log is relatively short but spends a bunch of time producing no output

Also btw, if you want an automated way to search for this sort of thing, you can pipe the output of coqc through https://github.com/JasonGross/coq-scripts/blob/master/timing/insert-timings.sh (I think we have this as a submodule here in etc/coq-scripts/timing/insert-timings.sh?) and then sort by timing.

Contributor

Hmm, suspicious. I don't suppose Set Printing All. Set Printing Universes. gives anything interesting?

Summoning @ppedrot , do you know of any hotspots that would cause a .16 second lag between

Debug: considering goal 1 of status initial

and

Debug: 1: looking for Univalence without backtracking

in typeclass resolution?

Contributor

Wait, how big is the context here @jdchristensen ? Does Time evar (ua2 : Univalence). also take .16s?

Collaborator Author

> Time evar (ua2 : Univalence).

That takes 0s.

Collaborator Author

The context has 23 things in it, and (at least as displayed by default), none of them is particularly large.

Collaborator Author

> I don't suppose Set Printing All. Set Printing Universes. gives anything interesting?

I didn't see anything that caught my eye in either the context or the goal with those set. Both are still relatively compact. And there's really no visible difference with the next goal a few lines down. Also note that Univalence is in Set, in case that matters.

Comment on lines +925 to +926

        (* Coq can find this proof by typeclass search, but helping it out makes this faster. *)
        { intro x; nrapply istrunc_prod; nrapply istrunc_arrow; exact _. }
Contributor

I think we might be able to globally speed up truncation tc search with judicious use of #[export] Hint Mode ... : typeclass_instances. and #[export] Hint Opaque ... : typeclass_instances. We can use the first to say, e.g., "if you're looking to prove IsTrunc _ ?e and ?e is headed by an evar, quit early", and the second to say "don't spend a bunch of time trying to unfold definitions to see if IsTrunc n <some definition with a complicated body> matches IsTrunc n <some other definition with a different complicated body>".

Collaborator

@JasonGross Those kinds of typeclass speedups are really valuable for us, but obviously it's very difficult to know all the tricks. Would you be able to open an issue explaining some of them? The kinds of classes that we use quite often (and often with issues) are truncatedness and the wildcat ones. Our mathclasses code also has a lot of these issues, but that's less of a priority right now.

Contributor

I don't think I have a large bag of tricks here. Roughly the universal trick is "set up typeclass search such that for every typeclass problem, there is at most one 'right' answer; the 'right' answer should be obvious to the coder just looking at the goal; try to get Coq to consider only the 'right' answer" (Hint Mode and Hint Opaque are useful for this last part). Relaxations of this rule are permissible, but they introduce slowness exponential in the nesting depth, so make exceptions sparingly, ideally with bounded depth (Hint Cut ... : typeclass_instances can help with this), and make sure to order the instance priorities in a way that cuts the search short as early as possible.

Basically, if you're designing something to be solved by typeclasses, you should be able to run the full resolution in your head, including paths that fail, for all the goals you care about; if you can't, you're probably expecting magic that you will pay for in performance.

I guess one trick for helping with performance is doing Hint Constants Opaque : typeclass_instances at the outset, and then only doing Hint Transparent foo : typeclass_instances when you're sure. (And probably you want to pair #[export] Hint Transparent with #[export] Hint Unfold; if you're allowing tc to see through the constant, you very likely have no business depending on it being folded.) (And even better to use an idiom like #[local] Hint Transparent foo : typeclass_instances. #[export] Instance : TheClass foo := _. so that you don't pollute conversion problems with unfolding foo when really you only need to unfold it for one particular instance.)
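The local-transparent idiom described above might look like the following sketch, where foo and TheClass are hypothetical names standing in for a real definition and class:

        Definition foo := (* ...some definition... *) Unit.
        (* Transparent only within this file, so later conversion problems
           elsewhere still see [foo] as opaque: *)
        #[local] Hint Transparent foo : typeclass_instances.
        (* tc search may unfold [foo] just to find this one instance: *)
        #[export] Instance theclass_foo : TheClass foo := _.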
Set Typeclasses Debug Verbosity 2. Set Debug "tactic-unification,unification". Set Printing All. is absolutely essential to debugging failures then, though, since often it won't be clear why tc isn't considering some instance when it seems to you like it should.

Feel free to make an issue or a doc out of this.

Collaborator Author

Yes, it would be great if some things were just faster by default, thanks to some clever settings. I don't understand these options well enough to play with this myself.

Contributor

I would do something like

        #[export] Hint Constants Opaque : typeclass_instances. (* all unfoldable constants have to be declared as such explicitly *)
        #[export] Hint Mode IsTrunc ! ! : typeclass_instances. (* if the level and type both have some known structure, we can proceed *)
        #[export] Hint Mode IsTrunc - + : typeclass_instances. (* if the type is fully known we can use tc to infer the level *)
        #[export] Hint Mode IsHProp ! : typeclass_instances.
        #[export] Hint Mode IsHSet ! : typeclass_instances.
        #[export] Hint Mode Contr ! : typeclass_instances.
        #[export] Hint Mode IsEquiv - - + : typeclass_instances. (* only infer equivalence if the function is known; - - ! would also be fine if we want to infer IsEquiv id without needing the types to be known *)
        #[export] Hint Mode Equiv + + : typeclass_instances. (* only infer Equiv if both types are fully known *)

and then see what breaks and why, and adjust from there.


If you describe what typeclasses are used in wildcats / mathclasses and how the information flow works, I might be able to guess a good starting point.

Collaborator Author

IsHProp, IsHSet and Contr are notations, so I had to exclude them. And we already set IsTrunc to be Opaque in typeclass search, so those lines are probably not needed in general.

I added the rest right before the slow spot here (again, not very slow, 0.15s) and it changed to 0.007s, as fast as my version. In other words, with the above, I can just write refine (No_ind_hprop _ _). and it's as fast as my version. Nice! I checked, and it's just the Hint Constants Opaque that affects the speed here.

However, I'm not sure how practical it will be to add Hint Constants Opaque to the library in general. We'd then need to add a lot of hints about unfolding throughout. If someone wants to add Hint Constants Opaque to Overture.v and then go through and add Hint Mode as needed to make the library build, that would be an interesting experiment.

Contributor

> then go through and add Hint Mode as needed to make the library build

It's not Hint Mode but Hint Transparent that needs to be added. Roughly the rule of thumb is that you need to mark transparent any definition whose truncatedness you want tc to infer, but for which no trunc instance is declared. (For example, if you want to use instances of IsTrunc _ { a : A & ... } interchangeably with instances of IsTrunc _ (hfiber ...).) There are a couple of exceptions, though I expect them to be quite rare for IsTrunc. One example is that if you prove forall (A : Type) (ls : list A), IsHProp (nil = ls) and then want to automatically infer forall (A B : Type) (f : A -> B) (ls : list B), IsHProp (map f nil = ls), you'll need to declare map transparent. I wouldn't be all that surprised if none of the truncation instances broke.
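A sketch of the map example, with a hypothetical instance name:

        (* Suppose this instance is declared: *)
        #[export] Instance ishprop_eq_nil {A : Type} (ls : list A)
          : IsHProp (nil = ls). Admitted.
        (* For tc search to solve [IsHProp (map f nil = ls)] from it, it
           must reduce [map f nil] to [nil], so [map] must be transparent: *)
        #[export] Hint Transparent map : typeclass_instances.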

Collaborator Author

Thanks, @JasonGross! I'll copy this to the issue as well.

@@ -1065,7 +1067,7 @@ Section RaiseSort.
        strip_truncations.
        destruct sh as [[l sh]|[r sh]].
        + apply lt_l with l.
-         apply IHL0, (@No_decode_le _).
+         apply IHL0, (@No_decode_le ua). (* Need to pass in [Univalence] to make this fast. *)
Collaborator

Maybe the unification of apply does something faster when the univalence argument is not an evar?
Is refine faster than apply here?

Collaborator Author

@SkySkimmer No, refine takes the same amount of time. But here are some strange observations. The last two are really mysterious to me!

        (* nrefine is "notypeclasses refine" *)
        Time apply (No_decode_le). (* slow *)
        Time apply (@No_decode_le). (* fails, unable to find Univalence *)
        Time apply (@No_decode_le ua). (* fast *)
        Time refine (@No_decode_le ua _ _ _ _). (* fast *)
        Time refine (@No_decode_le _ _ _ _ _). (* slow *)
        Time nrefine (@No_decode_le ua _ _ _ _). (* slow! *)
        Time nrefine (@No_decode_le _ _ _ _ _). (* slow!  (and leaves Univalence as a goal, of course) *)

cc @JasonGross

Collaborator Author

Oh, and with the last one, it's still slow to solve the isolated Univalence goal:

        Unshelve.
        Time 2: exact _.

Collaborator

It's also slow if you use typeclasses eauto instead of exact _?

Collaborator Author

@SkySkimmer Yes, it's the same. BTW, everything labelled "slow" takes essentially the same amount of time, around 0.28s on my machine, and everything labelled "fast" is 0s or very close to 0s. It's weird that the last one breaks the problem into two steps, and both steps take 0.28s.

Collaborator Author

In the situation where all I'm trying to do is solve the goal Univalence using exact _, if I clear just this one thing from the context:

sh : le' {{ fun l : L => No_raise (xL l) | (fun r : R => No_raise (xR r)) // rxcut }} (No_raise (xL0 l))

then it becomes fast. Strangely, an identical thing is in the context for the next goal, which is fast without manually supplying ua.
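Schematically (not a literal script), the experiment was:

        (* With the full context, solving the [Univalence] goal is slow: *)
        Time exact _. (* ~0.28s *)
        (* After clearing the one [le'] hypothesis, it is fast: *)
        clear sh.
        Time exact _. (* ~0s *)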

Collaborator

The problem seems to be in the generation of the "local" hint database corresponding to the local context. It reduces in an inefficient way just to find out that it's not a class.

Collaborator Author

@SkySkimmer Great! Glad that this led to something useful. If it works, I'll add a comment here saying that the ua can be removed when we move to Coq 8.19.

@jdchristensen
Collaborator Author

Thanks for all the ideas here. The slowdown with univalence seems to be a quirk that hasn't bothered us elsewhere, so I'm fine with just hardcoding the solution that works here. The slowdown with typeclass search involving truncation is a broader problem that could be tackled another time. I'll open an issue. For now, I'll just merge this.

@jdchristensen jdchristensen merged commit 31d7fee into HoTT:master Oct 10, 2023
23 checks passed
@jdchristensen jdchristensen deleted the faster_surreals branch October 10, 2023 14:30