Discrimination trees for instance search #7109

plt-amy · 2024-02-12T20:58:28Z

This PR optimizes instance search by storing global instances in a discrimination tree(-like; see below) data structure, and using this to narrow the list of possible candidates, instead of the previous strategy of linearly trying all instances in scope. The performance gains are significant: see this comment below for some numbers.

The initial lookup in the discrimination tree still has to be followed by a linear narrowing of the candidates using the conversion checker proper, since some (classes of) terms are all bunched together — e.g., all sorts become one Key, and non-constant lambda abstractions become wildcards.

The trade-off here is (essentially) how much work it takes to turn a term into a Key, vs. how many candidate checks we expect the extra accuracy to save. For example, an accurate representation of constant lambdas (or reflexive paths) takes more work, but allows us to narrow Dec (PathP (λ i → A) x y) instances based on A --- since, without this extra bit of work, Dec (PathP (λ i → Bool) x y) and Dec (PathP (λ i → Nat) x y) look indistinguishable to the discrimination tree.

The representation of the tree data structure implemented here looks less like a Trie, and more like the abstract machine's fast case trees. Indeed they are essentially the same, with the difference being entirely in how the lookup function uses the tree to traverse the term: rather than blocking on metavariables, we instead explore all the possibilities; and rather than having fallback cases, for when none of the branches matched, there is an "and-also" used to implement overlap. This representation essentially bakes in a form of "path compression" for the instance tree.

Consider this instance from the 1Lab:

instance
  extensionality-uncurry
    : ∀ {ℓ ℓ' ℓ''} {A : Type ℓ} {B : A → Type ℓ'} {C : Type ℓ''}
    → Extensionality (Σ A B → C)

Naïvely, we could turn the "instance head", Extensionality ..., into a list of Keys, by repeatedly destructuring the term and noting down the actual rigid symbols we run into. The result is as follows:

term: Extensionality (Σ @2 @1 → @0)
path: [Extensionality², _, Fun, Σ⁴, _, _, _, _, _]

Here, _ is used wherever the term wasn't rigid enough. In addition to any variables bound by the type of the instance, they include (e.g.) level arguments. If we use this key to generate a literal Trie, there would be indirections standing for each of the underscores, including those at the end, which is quite expensive; Moreover, to look up in this trie, we would have to force the type of the instance constraint quite far, to generate the corresponding key.

By representing the instance tree through the phrasing of case trees, we simultaneously remove the extra indirections and record exactly what parts of the term need to be forced to look up the instance candidates. From the instance above, we obtain the case tree fragment

  case 0 of
    Extensionality² → case 1 of
      Fun → case 1 of
        Σ⁴ → done {extensionality-uncurry}

Note that the "variable to case on" field uses the same scoping as the AM's case trees, i.e. the term is unpacked into the spine for matching. This case tree precisely records that we have to force (a) the type of the instance meta itself, to find the class name; (b) the argument to Extensionality, to find out whether it is a function type; and (c) the domain of the function type, to find out whether it is a Σ.

The PR also adds benchmarking points for the various parts of instance search (and adds a benchmarking point for evalTCM, used in unquoting tactics), and quite verbose ticky profiling using --profile=instances; see this comment for information on how to read the output.

plt-amy · 2024-02-23T06:17:54Z

I actually managed to get motivated to finish this today, though not before getting nerd-sniped by a GHC bug.. Anyway, this PR is pretty much finished: the instances pulled in from the scope are now stored in a discrimination tree(-like) data structure (it did not change since last week; the same remarks about the case-tree-like representation being an implementation of path compression apply).

Other than wiring it up to the type checker, what did change was the handling of eta-equality. The simplest approach is to just replace everything that might be of an eta-record type by a dummy; this is simpler than the "correct" approach of having an EtaDT, corresponding to FEta, in the discrimination tree, but unlike for actual pattern matching, being sloppy here does not cost us correctness, only potential performance.

Perhaps more importantly, I added a billion profiling points everywhere. Running with --profile=instances will ~~vomit kilobytes of junk onto your terminal~~ do ticky profiling for basically every relevant part of the internals of instance selection, including on a per-class basis. Here's the highlights. First, there's a global counter for checking candidates:

checkCandidateForMeta                                          292,529
checkCandidateForMeta: no                                      127,999
checkCandidateForMeta: yes                                     116,200

Then, for each "class", we have a block like this, tallying the performance of the discrimination tree. In order, we have the total number of instance constraints headed by this name; the total number of candidates that the discrimination tree excluded; the number of times that there was only one possible candidate (this includes locals), and the total number of checkCandidateForMeta that instances of this class were responsible for.

class H-Level: attempts                                          6,249
class H-Level: discarded early                                 116,848
class H-Level: only one candidate                                1,567
class H-Level: total candidates visited                         25,071

A refreshingly simple counter, this time for the function which implements sorting candidates for #6955. The 1Lab uses getInstances a lot, so I wanted to know exactly how much time was being spent in sorting, to figure out whether optimising it is worth the effort (it probably isn't.)

doesCandidateSpecialise                                          1,358

Back to the useful statistics, we have a tally of what, if anything, made the discrimination tree matcher go exploring. "Exploration" is what I call what happens whenever you have something like

instance
  foo : X A
  bar : X B

⊢ _ : X ?0

Since ?0 might become either A or B in the future, instead of taking any of the branches, we take all of the branches in that position. Ideally, these would all be metavariables, but there are things which don't yet meaningfully contribute to the lookup process, e.g. sorts or lambdas. It looks like this:

explore: Def                                                     4,720
explore: Lit                                                     1,798
explore: Meta                                                   37,317
explore: Var                                                     1,015
explore: constant function                                           4 -- Lam _ NoAbs{}
flex term blocking instance                                     71,716 -- all of the above + non-specific

The benchmarking for instance search also got an overhaul, since it was pretty hard to tell where time was being spent before; this included splitting off evalTCM into its own category. I hope the names here are self-explanatory.

Typing.Reflection                       25,423ms           
Typing.InstanceSearch                      216ms  (8,109ms)
Typing.InstanceSearch.FilterCandidates   6,913ms           
Typing.InstanceSearch.InitialCandidates    783ms           
Typing.InstanceSearch.OrderCandidates      196ms

plt-amy · 2024-02-23T07:04:35Z

~~I'm convinced that instance projections exist solely to torment me.~~ After venting my anger with instance projections on Mastodon, I managed to whac-a-mole my way into the right context to get their types to make sense, so both the test suite and the standard library tests now pass.

plt-amy · 2024-02-25T21:09:49Z

Here are some more complete performance numbers for the 1Lab, this patch vs. master (with my profiling changes cherry-picked in). The time spent filtering candidates was essentially halved, though we do see a slight uptick in the time spent finding instance candidates to compensate (which makes sense, since this function now has to do some actual work). Additionally, a significant percentage of the time saved is in the occurs checker; presumably this is simply from it being called less.

Profiling point	Master	This patch
Total	703,384ms	693,122ms
Typing	48,922ms	50,900ms
Typing.Reflection	195,049ms	209,717ms
Typing.CheckRHS	71,251ms	73,941ms
Typing.InstanceSearch	918ms	852ms
Typing.InstanceSearch.FilterCandidates	64,238ms	34,063ms
Typing.InstanceSearch.InitialCandidates	1,954ms	7,546ms
Typing.InstanceSearch.OrderCandidates	840ms	797ms
Typing.OccursCheck	44,405ms	30,683ms

While 7 seconds might seem like a bit much spent looking up instances, keep in mind that the 1Lab calls instance search 4,695,879 times --- so, on average, looking up instance candidates takes 0.0016 ms per constraint. Here are some other interesting things I've noticed staring at the per-class statistics.

Overlap with sorts

This is now fixed

In the 1Lab, we have an Underlying class, which equips types with a privileged projection into universes — things like projecting the type of objects of a category. The most basic instances serve to demonstrate the problem:

instance
  Underlying-Type : ∀ {ℓ} → Underlying (Type ℓ)
  Underlying-n-Type : ∀ {ℓ n} → Underlying (n-Type ℓ n)

In the discrimination tree, Type becomes a FlexK --- in effect, the discrimination tree thinks the type of the first instance is Underlying-Type : ∀ {A} → Underlying A. This results in the Underlying class having these funny stats at the end:

class Underlying: attempts                                                        280,570
class Underlying: discarded early                                               1,232,608
class Underlying: total candidates visited                                        908,626

While quite a few instances are discarded early, not a single instance constraint had a single candidate after filtering with the discrimination tree. This could easily be remedied by adding a SortK constructor to Key.

Classes on universe-polymorphic data types

The discrimination tree hates classes like Functor when they're made universe-polymorphic. Taking a look at this code:

record Functor {adj : Level → Level} (M : ∀ {ℓ} → Type ℓ → Type (adj ℓ)) : Typeω where
  field
    map : ∀ {ℓ} {ℓ'} {A : Type ℓ} {B : Type ℓ'} → (A → B) → M A → M B

instance
  Functor-List : Functor List
  Functor-List = record { map = go } where
    go : ∀ {ℓ ℓ'} {A : Type ℓ} {B : Type ℓ'} → (A → B) → List A → List B
    go f []       = []
    go f (x ∷ xs) = f x ∷ go f xs

The "key" generated for Functor-List is Functor, _, _, because the type is really Functor (λ {l} → List {?}). The discrimination tree avoids dealing with lambdas entirely, so these classes essentially do not benefit. Conversely, the lookup code is smart enough to avoid forcing anything other than the outermost Term (to get the Functor class name). These classes all look like this:

class Bind: attempts                     228,834
class Bind: discarded early              0
class Bind: only one candidate           1,629
class Bind: total candidates visited     799,800

The PathP conundrum

This is now fixed

Because instances are added under the normalised form of their types, any x ≡ y or Path A x y will be reduced to PathP (λ i → A) x y. This comes up a lot when Dec is used as a class. While Dec does get some amount of early filtering done, this basically stops at the outermost constructor of the type being decided. Deciding equality will do a linear search over the instances. This results in these sad-looking stats:

class Dec: attempts                             93,971
class Dec: discarded early                      540,581
class Dec: only one candidate                   6,203
class Dec: total candidates visited             1,082,244

Changing splitTermKey so that it unpacks constant lambdas seems like a win in this case. I don't have enough patience to do a full build with/without this (700 seconds is quite a long time), but Data.Reflection.Term is one of the heaviest modules when it comes to the number of Discrete instances flying around. Unpacking constant lambdas patch improves the situation quite a bit:

class Dec: attempts                       61
class Dec: discarded early               244
class Dec: total candidates visited    1,220
--- after ---
class Dec: attempts                       61
class Dec: discarded early             1,891
class Dec: only one candidate             61
class Dec: total candidates visited       61

Every single Dec constraint now avoids the linear search step. However, it's important to keep in mind that this improvement in the ticky stats might not actually translate to an improvement in wall-clock time. The results for this particular module seem very promising:

Profiling point	As-is	Constant lambdas
Typing.InstanceSearch	12ms	9ms
Typing.InstanceSearch.FilterCandidates	459ms	136ms
Typing.InstanceSearch.InitialCandidates	16ms	14ms

jespercockx

This looks really awesome. I didn't have the time to really dive into the details of the implementation, but I scrolled through and left a few comments. In any case, since there are good performance improvements and there do not seem to be any regressions in the test suite I am happy to approve this.

src/full/Agda/TypeChecking/DiscrimTree/Types.hs

src/full/Agda/TypeChecking/InstanceArguments.hs

src/full/Agda/TypeChecking/DiscrimTree.hs

I have no idea why but this causes the 1Lab's implementation of "deriving show" for lists to no longer termination-check; this would probably affect other users, too, and possibly even the stdlib/cubical test suites.

Selfishly bumping to include agda/agda#7109. Go forth and instance search

plt-amy force-pushed the aliao/instance-discrim-tree branch from 0a72ae2 to 545437d Compare February 22, 2024 17:07

plt-amy marked this pull request as ready for review February 23, 2024 05:19

plt-amy force-pushed the aliao/instance-discrim-tree branch from 38e703e to 2aecf36 Compare February 23, 2024 05:29

plt-amy force-pushed the aliao/instance-discrim-tree branch from d7eda4c to 828fdd7 Compare February 25, 2024 22:04

plt-amy requested a review from jespercockx February 26, 2024 00:14

plt-amy added the instance Instance resolution label Feb 26, 2024

plt-amy added this to the 2.6.5 milestone Feb 26, 2024

jespercockx approved these changes Feb 26, 2024

View reviewed changes

plt-amy added 11 commits March 9, 2024 01:35

wip: start working on discrimination trees

1819929

wip: handle eta equality while adding instances

aaf18e7

finish impl of discrimination tree instance search

6a79260

fixup: revert the immediate postponement change

8e96e88

I have no idea why but this causes the 1Lab's implementation of "deriving show" for lists to no longer termination-check; this would probably affect other users, too, and possibly even the stdlib/cubical test suites.

fixup: don't prune constructors from the instance table

0b9b996

fixup: use projections' top-level types for the tree

92aabef

fixup: use record patterns in the API test

478b907

fixup: more context whac-a-mole

dfa9a6a

Some fine tuning of instance indexing

da23a92

address review feedback

9c64b76

changelog for discrim trees

9a0adb1

plt-amy force-pushed the aliao/instance-discrim-tree branch from 2ff567a to 9a0adb1 Compare March 9, 2024 04:44

plt-amy merged commit aa5d8bb into master Mar 9, 2024
37 checks passed

plt-amy deleted the aliao/instance-discrim-tree branch March 9, 2024 05:27

plt-amy mentioned this pull request Mar 9, 2024

chore: bump agda plt-amy/1lab#368

Merged

plt-amy added a commit to plt-amy/1lab that referenced this pull request Mar 9, 2024

chore: bump agda (#368)

37ee089

Selfishly bumping to include agda/agda#7109. Go forth and instance search

plt-amy mentioned this pull request Mar 11, 2024

Instance overlap pragmas #7183

Merged

andreasabel modified the milestones: 2.6.5, 2.7.0 Apr 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discrimination trees for instance search #7109

Discrimination trees for instance search #7109

plt-amy commented Feb 12, 2024 •

edited

plt-amy commented Feb 23, 2024

plt-amy commented Feb 23, 2024 •

edited

plt-amy commented Feb 25, 2024 •

edited

jespercockx left a comment

Discrimination trees for instance search #7109

Discrimination trees for instance search #7109

Conversation

plt-amy commented Feb 12, 2024 • edited

plt-amy commented Feb 23, 2024

plt-amy commented Feb 23, 2024 • edited

plt-amy commented Feb 25, 2024 • edited

Overlap with sorts

Classes on universe-polymorphic data types

The PathP conundrum

jespercockx left a comment

Choose a reason for hiding this comment

plt-amy commented Feb 12, 2024 •

edited

plt-amy commented Feb 23, 2024 •

edited

plt-amy commented Feb 25, 2024 •

edited