Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discrimination trees for instance search #7109

Merged
merged 11 commits into from Mar 9, 2024
Merged

Conversation

plt-amy
Copy link
Member

@plt-amy plt-amy commented Feb 12, 2024

This PR optimizes instance search by storing global instances in a discrimination tree(-like; see below) data structure, and using this to narrow the list of possible candidates, instead of the previous strategy of linearly trying all instances in scope. The performance gains are significant: see this comment below for some numbers.

The initial lookup in the discrimination tree still has to be followed by a linear narrowing of the candidates using the conversion checker proper, since some (classes of) terms are all bunched together — e.g., all sorts become one Key, and non-constant lambda abstractions become wildcards.

The trade-off here is (essentially) how much work it takes to turn a term into a Key, vs. how many candidate checks we expect the extra accuracy to save. For example, an accurate representation of constant lambdas (or reflexive paths) takes more work, but allows us to narrow Dec (PathP (λ i → A) x y) instances based on A --- since, without this extra bit of work, Dec (PathP (λ i → Bool) x y) and Dec (PathP (λ i → Nat) x y) look indistinguishable to the discrimination tree.


The representation of the tree data structure implemented here looks less like a Trie, and more like the abstract machine's fast case trees. Indeed they are essentially the same, with the difference being entirely in how the lookup function uses the tree to traverse the term: rather than blocking on metavariables, we instead explore all the possibilities; and rather than having fallback cases, for when none of the branches matched, there is an "and-also" used to implement overlap. This representation essentially bakes in a form of "path compression" for the instance tree.

Consider this instance from the 1Lab:

instance
  extensionality-uncurry
    :  {ℓ ℓ' ℓ''} {A : Type ℓ} {B : A  Type ℓ'} {C : Type ℓ''}
     Extensionality (Σ A B  C)

Naïvely, we could turn the "instance head", Extensionality ..., into a list of Keys, by repeatedly destructuring the term and noting down the actual rigid symbols we run into. The result is as follows:

term: Extensionality (Σ @2 @1 → @0)
path: [Extensionality², _, Fun, Σ⁴, _, _, _, _, _]

Here, _ is used wherever the term wasn't rigid enough. In addition to any variables bound by the type of the instance, they include (e.g.) level arguments. If we use this key to generate a literal Trie, there would be indirections standing for each of the underscores, including those at the end, which is quite expensive; Moreover, to look up in this trie, we would have to force the type of the instance constraint quite far, to generate the corresponding key.

By representing the instance tree through the phrasing of case trees, we simultaneously remove the extra indirections and record exactly what parts of the term need to be forced to look up the instance candidates. From the instance above, we obtain the case tree fragment

  case 0 of
    Extensionality² → case 1 of
      Fun → case 1 of
        Σ⁴ → done {extensionality-uncurry}

Note that the "variable to case on" field uses the same scoping as the AM's case trees, i.e. the term is unpacked into the spine for matching. This case tree precisely records that we have to force (a) the type of the instance meta itself, to find the class name; (b) the argument to Extensionality, to find out whether it is a function type; and (c) the domain of the function type, to find out whether it is a Σ.

The PR also adds benchmarking points for the various parts of instance search (and adds a benchmarking point for evalTCM, used in unquoting tactics), and quite verbose ticky profiling using --profile=instances; see this comment for information on how to read the output.

@plt-amy plt-amy marked this pull request as ready for review February 23, 2024 05:19
@plt-amy
Copy link
Member Author

plt-amy commented Feb 23, 2024

I actually managed to get motivated to finish this today, though not before getting nerd-sniped by a GHC bug.. Anyway, this PR is pretty much finished: the instances pulled in from the scope are now stored in a discrimination tree(-like) data structure (it did not change since last week; the same remarks about the case-tree-like representation being an implementation of path compression apply).

Other than wiring it up to the type checker, what did change was the handling of eta-equality. The simplest approach is to just replace everything that might be of an eta-record type by a dummy; this is simpler than the "correct" approach of having an EtaDT, corresponding to FEta, in the discrimination tree, but unlike for actual pattern matching, being sloppy here does not cost us correctness, only potential performance.

Perhaps more importantly, I added a billion profiling points everywhere. Running with --profile=instances will vomit kilobytes of junk onto your terminal do ticky profiling for basically every relevant part of the internals of instance selection, including on a per-class basis. Here's the highlights. First, there's a global counter for checking candidates:

checkCandidateForMeta                                          292,529
checkCandidateForMeta: no                                      127,999
checkCandidateForMeta: yes                                     116,200

Then, for each "class", we have a block like this, tallying the performance of the discrimination tree. In order, we have the total number of instance constraints headed by this name; the total number of candidates that the discrimination tree excluded; the number of times that there was only one possible candidate (this includes locals), and the total number of checkCandidateForMeta that instances of this class were responsible for.

class H-Level: attempts                                          6,249
class H-Level: discarded early                                 116,848
class H-Level: only one candidate                                1,567
class H-Level: total candidates visited                         25,071

A refreshingly simple counter, this time for the function which implements sorting candidates for #6955. The 1Lab uses getInstances a lot, so I wanted to know exactly how much time was being spent in sorting, to figure out whether optimising it is worth the effort (it probably isn't.)

doesCandidateSpecialise                                          1,358

Back to the useful statistics, we have a tally of what, if anything, made the discrimination tree matcher go exploring. "Exploration" is what I call what happens whenever you have something like

instance
  foo : X A
  bar : X B

⊢ _ : X ?0

Since ?0 might become either A or B in the future, instead of taking any of the branches, we take all of the branches in that position. Ideally, these would all be metavariables, but there are things which don't yet meaningfully contribute to the lookup process, e.g. sorts or lambdas. It looks like this:

explore: Def                                                     4,720
explore: Lit                                                     1,798
explore: Meta                                                   37,317
explore: Var                                                     1,015
explore: constant function                                           4 -- Lam _ NoAbs{}
flex term blocking instance                                     71,716 -- all of the above + non-specific

The benchmarking for instance search also got an overhaul, since it was pretty hard to tell where time was being spent before; this included splitting off evalTCM into its own category. I hope the names here are self-explanatory.

Typing.Reflection                       25,423ms           
Typing.InstanceSearch                      216ms  (8,109ms)
Typing.InstanceSearch.FilterCandidates   6,913ms           
Typing.InstanceSearch.InitialCandidates    783ms           
Typing.InstanceSearch.OrderCandidates      196ms           

@plt-amy
Copy link
Member Author

plt-amy commented Feb 23, 2024

I'm convinced that instance projections exist solely to torment me. After venting my anger with instance projections on Mastodon, I managed to whac-a-mole my way into the right context to get their types to make sense, so both the test suite and the standard library tests now pass.

@plt-amy
Copy link
Member Author

plt-amy commented Feb 25, 2024

Here are some more complete performance numbers for the 1Lab, this patch vs. master (with my profiling changes cherry-picked in). The time spent filtering candidates was essentially halved, though we do see a slight uptick in the time spent finding instance candidates to compensate (which makes sense, since this function now has to do some actual work). Additionally, a significant percentage of the time saved is in the occurs checker; presumably this is simply from it being called less.

Profiling point Master This patch
Total 703,384ms 693,122ms
Typing 48,922ms 50,900ms
Typing.Reflection 195,049ms 209,717ms
Typing.CheckRHS 71,251ms 73,941ms
Typing.InstanceSearch 918ms 852ms
Typing.InstanceSearch.FilterCandidates 64,238ms 34,063ms
Typing.InstanceSearch.InitialCandidates 1,954ms 7,546ms
Typing.InstanceSearch.OrderCandidates 840ms 797ms
Typing.OccursCheck 44,405ms 30,683ms

While 7 seconds might seem like a bit much spent looking up instances, keep in mind that the 1Lab calls instance search 4,695,879 times --- so, on average, looking up instance candidates takes 0.0016 ms per constraint. Here are some other interesting things I've noticed staring at the per-class statistics.

Overlap with sorts

This is now fixed

In the 1Lab, we have an Underlying class, which equips types with a privileged projection into universes — things like projecting the type of objects of a category. The most basic instances serve to demonstrate the problem:

instance
  Underlying-Type :  {ℓ}  Underlying (Type ℓ)
  Underlying-n-Type :  {ℓ n}  Underlying (n-Type ℓ n)

In the discrimination tree, Type becomes a FlexK --- in effect, the discrimination tree thinks the type of the first instance is Underlying-Type : ∀ {A} → Underlying A. This results in the Underlying class having these funny stats at the end:

class Underlying: attempts                                                        280,570
class Underlying: discarded early                                               1,232,608
class Underlying: total candidates visited                                        908,626

While quite a few instances are discarded early, not a single instance constraint had a single candidate after filtering with the discrimination tree. This could easily be remedied by adding a SortK constructor to Key.

Classes on universe-polymorphic data types

The discrimination tree hates classes like Functor when they're made universe-polymorphic. Taking a look at this code:

record Functor {adj : Level  Level} (M :  {ℓ}  Type ℓ  Type (adj ℓ)) : Typeω where
  field
    map :  {ℓ} {ℓ'} {A : Type ℓ} {B : Type ℓ'}  (A  B)  M A  M B

instance
  Functor-List : Functor List
  Functor-List = record { map = go } where
    go :  {ℓ ℓ'} {A : Type ℓ} {B : Type ℓ'}  (A  B)  List A  List B
    go f []       = []
    go f (x ∷ xs) = f x ∷ go f xs

The "key" generated for Functor-List is Functor, _, _, because the type is really Functor (λ {l} → List {?}). The discrimination tree avoids dealing with lambdas entirely, so these classes essentially do not benefit. Conversely, the lookup code is smart enough to avoid forcing anything other than the outermost Term (to get the Functor class name). These classes all look like this:

class Bind: attempts                     228,834
class Bind: discarded early              0
class Bind: only one candidate           1,629
class Bind: total candidates visited     799,800

The PathP conundrum

This is now fixed

Because instances are added under the normalised form of their types, any x ≡ y or Path A x y will be reduced to PathP (λ i → A) x y. This comes up a lot when Dec is used as a class. While Dec does get some amount of early filtering done, this basically stops at the outermost constructor of the type being decided. Deciding equality will do a linear search over the instances. This results in these sad-looking stats:

class Dec: attempts                             93,971
class Dec: discarded early                      540,581
class Dec: only one candidate                   6,203
class Dec: total candidates visited             1,082,244

Changing splitTermKey so that it unpacks constant lambdas seems like a win in this case. I don't have enough patience to do a full build with/without this (700 seconds is quite a long time), but Data.Reflection.Term is one of the heaviest modules when it comes to the number of Discrete instances flying around. Unpacking constant lambdas patch improves the situation quite a bit:

class Dec: attempts                       61
class Dec: discarded early               244
class Dec: total candidates visited    1,220
--- after ---
class Dec: attempts                       61
class Dec: discarded early             1,891
class Dec: only one candidate             61
class Dec: total candidates visited       61

Every single Dec constraint now avoids the linear search step. However, it's important to keep in mind that this improvement in the ticky stats might not actually translate to an improvement in wall-clock time. The results for this particular module seem very promising:

Profiling point As-is Constant lambdas
Typing.InstanceSearch 12ms 9ms
Typing.InstanceSearch.FilterCandidates 459ms 136ms
Typing.InstanceSearch.InitialCandidates 16ms 14ms

@plt-amy plt-amy added the instance Instance resolution label Feb 26, 2024
@plt-amy plt-amy added this to the 2.6.5 milestone Feb 26, 2024
Copy link
Member

@jespercockx jespercockx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really awesome. I didn't have the time to really dive into the details of the implementation, but I scrolled through and left a few comments. In any case, since there are good performance improvements and there do not seem to be any regressions in the test suite I am happy to approve this.

src/full/Agda/TypeChecking/DiscrimTree/Types.hs Outdated Show resolved Hide resolved
src/full/Agda/TypeChecking/InstanceArguments.hs Outdated Show resolved Hide resolved
src/full/Agda/TypeChecking/DiscrimTree.hs Outdated Show resolved Hide resolved
src/full/Agda/TypeChecking/DiscrimTree.hs Show resolved Hide resolved
@plt-amy plt-amy force-pushed the aliao/instance-discrim-tree branch from 2ff567a to 9a0adb1 Compare March 9, 2024 04:44
@plt-amy plt-amy merged commit aa5d8bb into master Mar 9, 2024
37 checks passed
@plt-amy plt-amy deleted the aliao/instance-discrim-tree branch March 9, 2024 05:27
plt-amy added a commit to plt-amy/1lab that referenced this pull request Mar 9, 2024
Selfishly bumping to include agda/agda#7109. Go forth and instance
search
@andreasabel andreasabel modified the milestones: 2.6.5, 2.7.0 Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
instance Instance resolution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants